managed string library

Robert Seacord

Jun 15, 2006, 2:33:58 AM

The SEI has published CMU/SEI-2006-TR-006 "Specifications for Managed
Strings" and released a "proof-of-concept" implementation of the managed
string library.

The specification, source code for the library, and other resources
related to managed strings are available for download from the CERT web
site at:

http://www.cert.org/secure-coding/managedstring.html

The following is a brief summary of the managed string library:

The managed string library was developed in response to the need for a
string library that can improve the quality and security of newly
developed C-language programs while eliminating obstacles to widespread
adoption and possible standardization. As the name implies, the managed
string library is based on a dynamic approach; memory is allocated and
reallocated as required. This approach eliminates the possibility of
unbounded copies, null-termination errors, and truncation by ensuring
that there is always adequate space available for the resulting string
(including the terminating null character). The one exception is if
memory is exhausted; that is treated as an error condition. In this way,
the managed string library accomplishes the goal of indicating either
success or failure. The managed string library also protects against
improper data sanitization by (optionally) ensuring that all characters
in a string belong to a predefined set of "safe" characters.

rCs

--
Robert C. Seacord
Senior Vulnerability Analyst
CERT/CC

Work: 412-268-7608
FAX: 412-268-6989
--
comp.lang.c.moderated - moderation address: cl...@plethora.net -- you must
have an appropriate newsgroups line in your header for your mail to be seen,
or the newsgroup name in square brackets in the subject line. Sorry.

webs...@gmail.com

Jul 10, 2006, 7:11:27 PM

Robert Seacord wrote:
> The SEI has published CMU/SEI-2006-TR-006 "Specifications for Managed
> Strings" and released a "proof-of-concept" implementation of the managed
> string library.
>
> The specification, source code for the library, and other resources
> related to managed strings are available for download from the CERT web
> site at:
>
> http://www.cert.org/secure-coding/managedstring.html
>
> The following is a brief summary of the managed string library:
>
> The managed string library was developed in response to the need for a
> string library that can improve the quality and security of newly
> developed C-language programs while eliminating obstacles to widespread
> adoption and possible standardization.

I'm wondering whether or not you compared it to other available
libraries such as my own ( http://bstring.sf.net/ ) or James Antill's
( http://www.and.org/vstr/ ) before engaging in this effort?

I understand the need for this solely from the security focus, but
your effort looks like it was born out of a very direct and narrow
approach that takes absolutely nothing else into account. Besides
being slow, the API is somewhat cumbersome, which makes inline usage
impossible. The whole charset filtering thing is not multithreading
friendly, and IMHO, a poorly focused solution to the system() problem.
Instead of hiding all the functionality in the managed string, why not
instead have a "filterstring()" function, or better yet, have a
"safesystem()" function? I have a more complete discussion in the
Bstrlib documentation (
http://bstring.cvs.sourceforge.net/*checkout*/bstring/tree/bstrlib.txt?pathrev=HEAD
, search for "Managed String Library").

Beyond just security concerns, there is 1) The "Software Crisis"
concern. This is the concern that writing software in a scalable
manner (i.e., writing millions of bug-free lines of code) is difficult
to do. 2) Performance. C's string operations often take an additional
O(n) penalty for having to implicitly call strlen(), or the
equivalent, redundantly. 3) The Clib's poor base functionality (no
insert/delete, split, replace, substring functions), additionally
hampered by its inability to deal with aliasing (so strcat(p,p) leads
to UB even though it has a compelling intuitive meaning).

Obviously, I think Bstrlib fares well by such criteria. My point,
though, is that the "Managed String Library" really does not do very
well at all. I think this is important because I think it will be hard
to compel developers to use Managed Strings (or TR 24731) if it
continues to propagate *other* weaknesses of the C std library while
only attempting to solve a very narrow problem.

I.e., I believe Bstrlib to be a better *SECURITY* solution than Managed
Strings (or TR 24731, or pretty much anything else short of a change in
language altogether) because it is more *compelling* to programmers to
use for *OTHER* reasons. I.e., it is not security gained by some
additionally necessary expended effort focused solely on security, but
rather it's a better way of dealing with strings overall which happens
to supply a good security system as a bonus. Bstrlib has been put
through its paces by other programmers who use it, regardless of their
reason (safety/security is certainly not the only reason). With
Managed Strings, it appears that you are starting from scratch.

Also unlike any other solution, the Bstrlib webpage also includes a
public statement on security, which gives a whole set of security
assertions that auditors can test the library against (
http://bstring.cvs.sourceforge.net/*checkout*/bstring/tree/security.txt?pathrev=HEAD
) How do Managed Strings compare in this regard? For example, in
terms of password input/manipulation, can your proposal make guarantees
that copies of string content be kept out of the heap (from a free or
realloc) outside of programmer control?

Since I have not submitted Bstrlib to the ANSI C committee, perhaps you
see this as a moot point. But what I would suggest to you is that you
at least *study* my library first, and see what ideas from it you
should incorporate into your proposal.

While I see the merit of your intentions, I don't see your effort as
quite there yet. Especially in light of open source alternatives such
as my library. At the very least, I think you should take libraries
such as mine as a yardstick by which to compare yours in developing it.

--
Paul Hsieh
http://www.pobox.com/~qed/
http://bstring.sf.net/

Jonathan Leffler

Jul 25, 2006, 6:53:02 AM

webs...@gmail.com wrote:
> [...] (so strcat(p,p) leads
> to UB even though it has a compelling intuitive meaning).

What's the compelling intuitive meaning? To me, it means copy
characters from the start of p over the null that used to mark the end
of p and keep going until you crash.

--
Jonathan Leffler #include <disclaimer.h>
Email: jlef...@earthlink.net, jlef...@us.ibm.com
Guardian of DBD::Informix v2005.02 -- http://dbi.perl.org/

webs...@gmail.com

Jul 27, 2006, 5:00:07 PM

Jonathan Leffler wrote:
> webs...@gmail.com wrote:
> > [...] (so strcat(p,p) leads
> > to UB even though it has a compelling intuitive meaning).
>
> What's the compelling intuitive meaning? To me, it means copy
> characters from the start of p over the null that used to mark the end
> of p and keep going until you crash.

That's not an intuitive meaning. It's just an understanding of an
implementation anomaly. Perhaps for you, implementation details
change your intuition.

Most people would intuitively think of this as simply replacing the
string with a doubled version of itself -- i.e., it's analogous to the
C++ expression p += p for std::strings (and to be honest, I don't know
if that's legal or not), or just p = p + p in most other programming
languages.

You only *know* that this is not the case, because you know that strcat
is implemented as some variation of { d += strlen(d); while (*d++ =
*s++); } instead of { size_t ld = strlen(d), ls = strlen(s); memmove
(d+ld, s, ls); d[ld+ls] = '\0'; }. You know this because the first
variation is going to be faster. This is not intuition -- it's just a
technical calculation.
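
For concreteness, the two variants described above can be written out as complete functions. This is only a sketch: the names strcat_scan and strcat_memmove are made up for illustration, and neither is claimed to be what any particular C library actually ships.

```c
#include <assert.h>
#include <string.h>

/* Variant 1: find the end of d, then copy bytes until the terminator has
 * been copied. If s overlaps the region being written (e.g. s == d), the
 * terminator is overwritten before it is read, so do not call it that way. */
char *strcat_scan(char *d, const char *s)
{
    assert(d != NULL && s != NULL);
    char *out = d + strlen(d);
    while ((*out++ = *s++) != '\0')
        ;
    return d;
}

/* Variant 2: measure both strings first, then move the bytes with
 * memmove(), which is specified to handle overlapping regions. The caller
 * must still provide room for strlen(d) + strlen(s) + 1 bytes. */
char *strcat_memmove(char *d, const char *s)
{
    assert(d != NULL && s != NULL);
    size_t ld = strlen(d), ls = strlen(s);
    memmove(d + ld, s, ls);
    d[ld + ls] = '\0';
    return d;
}
```

With char p[8] = "ab", calling strcat_memmove(p, p) leaves "abab" in p, because both lengths are taken before anything is written. The same call to strcat_scan never sees a terminator and runs off the end of the array.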

Andrew Poelstra

Jul 27, 2006, 6:31:09 PM

On 2006-07-27, webs...@gmail.com <webs...@gmail.com> wrote:
> Jonathan Leffler wrote:
>> webs...@gmail.com wrote:
>> > [...] (so strcat(p,p) leads
>> > to UB even though it has a compelling intuitive meaning).
>>
>> What's the compelling intuitive meaning? To me, it means copy
>> characters from the start of p over the null that used to mark the end
>> of p and keep going until you crash.
>
> That's not an intuitive meaning. Its just an understanding of an
> implementation anomoly. Perhaps for you, implementation details
> changes your intuition.
>

No, knowledge that C strings are null-terminated (which any C programmer
needs to know) suggests that intuitively. Either you calculate strlen(),
add a counter variable, and `for' your way through the string, or you
eliminate the counter and superfluous call to strlen(), and code it
efficiently.

It's more intuitive to use the more efficient, less code-intensive, and
easier-to-read version.

> Most people would intuitively think of this as simply replacing the
> string with a doubled version of itself -- i.e., its analogous to the
> C++ expression p += p for std::string's (and to be honest, I don't know
> if that's legal or not), or just p = p + p in most other programming
> languages.
>

"Most people" are not C programmers; if you know enough to use strcat(),
you should have an understanding of how C strings work. (And indeed, I've
never seen a C textbook that introduced strcat() prior to introducing
C-style strings.) (Although I've heard of some pretty terrible textbooks on
this group that I was fortunate enough to avoid!)

> You only *know* that this is not the case, because you know that strcat
> is implemented as some variation of { d += strlen(d); while (*d++ =
> *s++); } instead of { size_t ld = strlen(d), ls = strlen(s); memmove
> (d+ld, s, ls); d[ld+ls] = '\0'; }. You know this because the first
> variation is going to be faster. This is not intuition -- its just a
> technical calculation.
>

IMHO, _intuitively_, there is no other way to implement strcat().

--
Andrew Poelstra <website down>
To reach my email, use <email also down>
New server ETA: 1 days

Keith Thompson

Jul 27, 2006, 6:36:51 PM

webs...@gmail.com writes:
> Jonathan Leffler wrote:
>> webs...@gmail.com wrote:
>> > [...] (so strcat(p,p) leads
>> > to UB even though it has a compelling intuitive meaning).
>>
>> What's the compelling intuitive meaning? To me, it means copy
>> characters from the start of p over the null that used to mark the end
>> of p and keep going until you crash.
>
> That's not an intuitive meaning. Its just an understanding of an
> implementation anomoly. Perhaps for you, implementation details
> changes your intuition.

[...]

In this case, intuition is not necessary. If you read the standard's
description of strcat(), you'll see:

... If copying takes place between objects that overlap, the
behavior is undefined.

Any decent description of strcat() (in a man page or text book, for
example) should have similar wording; if it doesn't, that's the fault
of the author of the documentation.

If you attempt to use strcat(), or any other function, without reading
a description of how it's supposed to work, you can't reasonably
expect any particular behavior.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

James Dennett

Jul 27, 2006, 10:56:00 PM

Andrew Poelstra wrote:
> On 2006-07-27, webs...@gmail.com <webs...@gmail.com> wrote:
>> Jonathan Leffler wrote:
>>> webs...@gmail.com wrote:
>>>> [...] (so strcat(p,p) leads
>>>> to UB even though it has a compelling intuitive meaning).
>>> What's the compelling intuitive meaning? To me, it means copy
>>> characters from the start of p over the null that used to mark the end
>>> of p and keep going until you crash.
>> That's not an intuitive meaning. Its just an understanding of an
>> implementation anomoly. Perhaps for you, implementation details
>> changes your intuition.
>>
>
> No, knowledge that C strings are null-terminated (which any C programmer
> needs to know) suggests that intuitively. Either you calculate strlen(),
> add a counter variable, and `for' your way through the string, or you
> eliminate the counter and superfluous call to strlen(), and code it
> efficiently.
>
> It's more intuitive to use the more efficient, less code-intensive, and
> easier-to-read version.

[...]

> IMHO, _intuitively_, there is no other way to implement strcat().

The argument from an implementor's perspective shows how
easily we forget what was intuitive before we came to
think in terms of how to implement string functionality
in C.

Some can't even think of what strcat intuitively means
(append one string to another) without considering how
it's most sensibly implemented in terms of pointer
operations.

-- James

webs...@gmail.com

Jul 28, 2006, 12:35:58 PM

Keith Thompson wrote:
> webs...@gmail.com writes:
> > Jonathan Leffler wrote:
> >> webs...@gmail.com wrote:
> >> > [...] (so strcat(p,p) leads
> >> > to UB even though it has a compelling intuitive meaning).
> >> What's the compelling intuitive meaning? To me, it means copy
> >> characters from the start of p over the null that used to mark the end
> >> of p and keep going until you crash.
> >
> > That's not an intuitive meaning. Its just an understanding of an
> > implementation anomoly. Perhaps for you, implementation details
> > changes your intuition.
> [...]
>
> In this case, intuition is not necessary.

Just *SAYING* this is the ultimate indictment of the C language. If
the language doesn't match your intuition, then it just takes that
much more effort to program in it.

> [...] If you read the standard's
> description of strcat(), you'll see:
>
> ... If copying takes place between objects that overlap, the
> behavior is undefined.
>
> Any decent description of strcat() (in a man page

The latest cygwin man page makes no mention of this and WATCOM C/C++'s
documentation omits this.

> [...] or text book, for
> example) should have similar wording; if it doesn't, that's the fault
> of the author of the documentation.

Here's the first hit on google:

http://www.cplusplus.com/ref/cstring/strcat.html

and the second:

http://www.mkssoftware.com/docs/man3/strcat.3.asp

Here's the wikipedia entry as of 07/28/2006:

http://en.wikipedia.org/wiki/Strcat

and here's the Open BSD documentation that it links to:

http://www.openbsd.org/cgi-bin/man.cgi?query=strcat

So I guess none of that counts as "decent documentation".

> If you attempt to use strcat(), or any other function, without reading
> a description of how it's supposed to work, you can't reasonably
> expect any particular behavior.

Right. That's because C is a "throwback to a forgotten era" kind of
language. Compare this to languages like Lua, Ruby and Python, where
your first guess as to how something works after seeing one example of
it has a 99% chance of being correct, and a 99% chance that you don't
even have a candidate second guess in mind.

webs...@gmail.com

Jul 28, 2006, 1:14:56 PM

Andrew Poelstra wrote:
> On 2006-07-27, webs...@gmail.com <webs...@gmail.com> wrote:
> > Jonathan Leffler wrote:
> >> webs...@gmail.com wrote:
> >> > [...] (so strcat(p,p) leads
> >> > to UB even though it has a compelling intuitive meaning).
> >>
> >> What's the compelling intuitive meaning? To me, it means copy
> >> characters from the start of p over the null that used to mark the end
> >> of p and keep going until you crash.
> >
> > That's not an intuitive meaning. Its just an understanding of an
> > implementation anomoly. Perhaps for you, implementation details
> > changes your intuition.
>
> No, knowledge that C strings are null-terminated (which any C programmer
> needs to know) suggests that intuitively. Either you calculate strlen(),
> add a counter variable, and `for' your way through the string, or you
> eliminate the counter and superfluous call to strlen(), and code it
> efficiently.

Right -- you are either with us, or you are with the terrorists.

> It's more intuitive to use the more efficient, less code-intensive, and
> easier-to-read version.

There is a *third option*. Skip the first character which overwrites
the '\0', do the append starting from src+1, then go back and do the
'\0' overwrite at the end. This extra work adds at most O(1) to the
execution time. Voila, like magic you have an aliasing-safe strcat().
Ain't the "as-if" rule wonderful?

But this is just coding jujitsu, and has nothing to do with intuition.
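
A minimal sketch of that third option, assuming the description is read as "save the first source character, copy the rest one slot further along, then store the saved character over the old terminator". The name strcat_deferred is hypothetical, and this has only been reasoned through for the case where src points into dest's own string or at an unrelated object.

```c
#include <assert.h>
#include <string.h>

/* Append src to dest, deferring the write that destroys dest's terminator
 * until every source byte has been read. The caller must still provide
 * room for strlen(dest) + strlen(src) + 1 bytes. */
char *strcat_deferred(char *dest, const char *src)
{
    assert(dest != NULL && src != NULL);
    char *end = dest + strlen(dest);   /* points at dest's '\0' */
    if (*src == '\0')
        return dest;                   /* nothing to append */

    char first = *src;                 /* remember the first character */
    char *out = end + 1;               /* skip the slot holding '\0' */
    const char *in = src + 1;
    while ((*out++ = *in++) != '\0')   /* copy the rest, terminator included */
        ;
    *end = first;                      /* now overwrite the old terminator */
    return dest;
}
```

With char p[8] = "ab", strcat_deferred(p, p) produces "abab": the reads stop at the original terminator, which is still intact because the only write to that slot happens last. The extra work is one saved character and one store, which is the O(1) overhead mentioned above.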

> > Most people would intuitively think of this as simply replacing the
> > string with a doubled version of itself -- i.e., its analogous to the
> > C++ expression p += p for std::string's (and to be honest, I don't know
> > if that's legal or not), or just p = p + p in most other programming
> > languages.
>
> "Most people" are not C programmers; if you know enough to use strcat(),
> you should have an understanding of how C strings work.

Yes, but this "knowledge" is simply bent by force into shape by what
the standard tells you. I.e., it's working *against* your intuition.
Which was kind of my point.

> > You only *know* that this is not the case, because you know that strcat
> > is implemented as some variation of { d += strlen(d); while (*d++ =
> > *s++); } instead of { size_t ld = strlen(d), ls = strlen(s); memmove
> > (d+ld, s, ls); d[ld+ls] = '\0'; }. You know this because the first
> > variation is going to be faster. This is not intuition -- its just a
> > technical calculation.
>
> IMHO, _intuitively_, there is no other way to implement strcat().

That is why you fail.

Tell me this. How does *your* intuition tell you how memmove() is
implemented? Keep in mind that this guy actually is completely
aliasing safe.

SuperKoko

Jul 28, 2006, 2:18:36 PM

webs...@gmail.com wrote:
> Jonathan Leffler wrote:
> > webs...@gmail.com wrote:
> > > [...] (so strcat(p,p) leads
> > > to UB even though it has a compelling intuitive meaning).
> >
> > What's the compelling intuitive meaning? To me, it means copy
> > characters from the start of p over the null that used to mark the end
> > of p and keep going until you crash.
>
> That's not an intuitive meaning. Its just an understanding of an
> implementation anomoly. Perhaps for you, implementation details
> changes your intuition.

Intuitive or not, that was not obvious for a beginner in C89, but that
should be obvious in C99, even for a beginner:
char* strcat (char * restrict, const char * restrict);

Thanks to "restrict", the function has a better documentation.

Andrew Poelstra:

> IMHO, _intuitively_, there is no other way to implement strcat().
>

But there are other ways to implement it....
Borland C++ 5.0 and Digital Mars Compiler use alternative
implementations (and they behave weird too, but in another way).

Al Balmer

Jul 28, 2006, 2:29:18 PM

HP-UX man page:
"Character movement is performed differently in different
implementations, so moves involving overlapping source and destination
strings may yield surprises."

--
Al Balmer
Sun City, AZ

Andrew Poelstra

Jul 28, 2006, 2:41:05 PM

On 2006-07-28, webs...@gmail.com <webs...@gmail.com> wrote:

> Andrew Poelstra wrote:
>> No, knowledge that C strings are null-terminated (which any C programmer
>> needs to know) suggests that intuitively. Either you calculate strlen(),
>> add a counter variable, and `for' your way through the string, or you
>> eliminate the counter and superfluous call to strlen(), and code it
>> efficiently.
>
> Right -- you are either with us, or you are with the terrorists.
>

;-)

>> It's more intuitive to use the more efficient, less code-intensive, and
>> easier-to-read version.
>
> There is a *third option*. Skip the first character which overwrites
> the '\0', do the append starting from src+1, then go back and do the
> '\0' overwrite at the end. This extra work adds at most O(1) to the
> execution time. Voila, like magic you have an aliasing safe strcat().
> Ain't the "as-if" rule wonderful?
>

If you start from src+1, you still have to store the original value of
src, or use a counter variable. My points on code-intensivity and ease
of reading still stand.

> But this is just coding jujitsu, and has nothing to do with intuition.
>

It's clever, I admit. However, jumping through hoops to manage odd
inputs which could be fixed with an
assert (dest > src + strlen(src) + 1);
doesn't bode well to me.

>> > Most people would intuitively think of this as simply replacing the
>> > string with a doubled version of itself -- i.e., its analogous to the
>> > C++ expression p += p for std::string's (and to be honest, I don't know
>> > if that's legal or not), or just p = p + p in most other programming
>> > languages.
>>
>> "Most people" are not C programmers; if you know enough to use strcat(),
>> you should have an understanding of how C strings work.
>
> Yes, but this "knowledge" is simply bent by force into shape by what
> the standard tells you. I.e., its working *against* your intuition.
> Which was kind of my point.
>

There are many reasons why I don't qualify as a "normal person" or
"average programmer"; my intuition agrees with that of strcat().

>> > You only *know* that this is not the case, because you know that strcat
>> > is implemented as some variation of { d += strlen(d); while (*d++ =
>> > *s++); } instead of { size_t ld = strlen(d), ls = strlen(s); memmove
>> > (d+ld, s, ls); d[ld+ls] = '\0'; }. You know this because the first
>> > variation is going to be faster. This is not intuition -- its just a
>> > technical calculation.
>>
>> IMHO, _intuitively_, there is no other way to implement strcat().
>
> That is why you fail.
>

I'm trying to avoid a flamewar.

> Tell me this. How does *your* intuition tell you how memmove() is
> implemented? Keep in mind that this guy actually is completely
> aliasing safe.
>

I suspect that memmove() memcpy()'s the data to a safe place and
memcpy()'s it back to the dest. This adds an intermediate step
which prevents problems with src and dest overlapping.

That doesn't sound particularly efficient to me, though; I assume
that compiler/library writers have found much better ways to code
it.

--
Andrew Poelstra <website down>
To reach my email, use <email also down>

New server ETA: 8 hours

Andrew Poelstra

Jul 28, 2006, 2:42:18 PM

On 2006-07-28, SuperKoko <tabk...@yahoo.fr> wrote:
> Andrew Poelstra:
>> IMHO, _intuitively_, there is no other way to implement strcat().
>>
> But there are other ways to implement it....
> Borland C++ 5.0 and Digital Mars Compiler use alternative
> implementations (and they behave weird too, but in another way).
>

Allow me to rephrase that:
IMHO, there is no other _intuitive_ way to implement strcat().

--
Andrew Poelstra <website down>
To reach my email, use <email also down>

New server ETA: 1 days

David R Tribble

Jul 28, 2006, 2:44:31 PM

Andrew Poelstra wrote:
>> IMHO, _intuitively_, there is no other way to implement strcat().
>

websnarf wrote:
> That is why you fail.
>
> Tell me this. How does *your* intuition tell you how memmove() is
> implemented? Keep in mind that this guy actually is completely
> aliasing safe.

...because it's specified that way. And memcpy() is not specified
that way. strcpy() is not specified that way, either.

Are you arguing for ignoring the function specifications?

-drt

David R Tribble

Jul 28, 2006, 2:51:56 PM

websnarf wrote:
>> Tell me this. How does *your* intuition tell you how memmove() is
>> implemented? Keep in mind that this guy actually is completely
>> aliasing safe.
>

Andrew Poelstra wrote:
> I suspect that memmove() memcpy()'s the data to a safe place and
> memcpy()'s it back to the dest. This adds an intermediate step
> which prevents problems with src and dest overlapping.
>
> That doesn't sound particularly efficient to me, though; I assume
> that compiler/library writers have found much better ways to code it.

The "typical" implementation of memmove() does a range check on
its arguments to determine if overlapping areas exists, and if so, does
a copy in reverse (high to low) direction. Such an algorithm is
efficient on CPUs with native bidirectional loop (REP) instructions.
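
As a rough byte-at-a-time sketch of that idea (not any vendor's actual code, and memmove_sketch is a made-up name): the version below simplifies the range check to a single direction test, copying backward whenever dest sits above src. The comparison goes through uintptr_t, which is an optional C99 type whose mapping from pointers is implementation-defined, so this assumes a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy n bytes from src to dest, choosing the direction so that no source
 * byte is overwritten before it has been read. */
void *memmove_sketch(void *dest, const void *src, size_t n)
{
    unsigned char *d = dest;
    const unsigned char *s = src;
    assert(n == 0 || (d != NULL && s != NULL));

    if ((uintptr_t)d <= (uintptr_t)s) {
        /* dest is at or below src: a low-to-high copy is safe */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        /* dest is above src: copy high-to-low */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    return dest;
}
```

Given char buf[8] = "abcdef", memmove_sketch(buf + 2, buf, 4) yields "ababcd", where a plain forward loop would have smeared "ab" across the destination.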

-drt

kuy...@wizard.net

Jul 28, 2006, 4:09:21 PM

webs...@gmail.com wrote:
> Keith Thompson wrote:

...

> > [...] If you read the standard's
> > description of strcat(), you'll see:
> >
> > ... If copying takes place between objects that overlap, the
> > behavior is undefined.
> >
> > Any decent description of strcat() (in a man page
>
> The latest cygwin man page makes no mention of this and WATCOM C/C++'s
> documentation omits this.
>
> > [...] or text book, for
> > example) should have similar wording; if it doesn't, that's the fault
> > of the author of the documentation.
>
> Here's the first hit on google:
>
> http://www.cplusplus.com/ref/cstring/strcat.html
>
> and the second:
>
> http://www.mkssoftware.com/docs/man3/strcat.3.asp
>
> Here's the wikipedia entry as of 07/28/2006:
>
> http://en.wikipedia.org/wiki/Strcat
>
> and here's the Open BSD documentation that it links to:
>
> http://www.openbsd.org/cgi-bin/man.cgi?query=strcat
>
> So I guess none of that counts as "decent documentation".

Correct. You disagree?

The man pages on the machines I use most often are much better:

Linux: "The strings may not overlap, and the dest string must have
enough space for the result."

Irix: "If overflow of s1 occurs, or copying takes place when s1 and s2

Richard Tobin

Jul 28, 2006, 6:18:04 PM

In article <slrneckmka.6...@poelstra01.lan>,
Andrew Poelstra <apoe...@poelstra01.lan> wrote:

>Allow me to rephrase that:
> IMHO, there is no other _intuitive_ way to implement strcat().

Languages are supposed to be intuitive for users, not implementers.

-- Richard

Keith Thompson

Jul 28, 2006, 8:17:13 PM

webs...@gmail.com writes:
> Keith Thompson wrote:
>> webs...@gmail.com writes:
>> > Jonathan Leffler wrote:
>> >> webs...@gmail.com wrote:
>> >> > [...] (so strcat(p,p) leads
>> >> > to UB even though it has a compelling intuitive meaning).
>> >> What's the compelling intuitive meaning? To me, it means copy
>> >> characters from the start of p over the null that used to mark the end
>> >> of p and keep going until you crash.
>> >
>> > That's not an intuitive meaning. Its just an understanding of an
>> > implementation anomoly. Perhaps for you, implementation details
>> > changes your intuition.
>> [...]
>>
>> In this case, intuition is not necessary.
>
> Just *SAYING* this is the ultimate indictment of the C language. If
> the language doesn't match your intuition, then its just takes that
> much more effort to program in it.

News flash: C is not the most intuitive and beginner-friendly language
ever invented. Does this come as a surprise to you?

In a language where strings are first-class objects, and you can pass
them around as values, use them as operands in expressions, and so
forth, I'd expect something called "strcat" to behave in some
reasonable intuitive manner. I'd still need to see the declaration to
know how to use it, but it would probably be safe to assume that
something like

s1 = strcat(s2, s3)

would do the obvious thing.

C is not like that. Strings are not a data type, they're a data
format, "a contiguous sequence of characters terminated by and
including the first null character", subject to all of C's
complications regarding arrays and pointers. If you think you can
guess, with 99% certainty, how strcat() is going to behave based on
that, you're likely to be disappointed.

>> [...] If you read the standard's
>> description of strcat(), you'll see:
>>
>> ... If copying takes place between objects that overlap, the
>> behavior is undefined.
>>
>> Any decent description of strcat() (in a man page
>
> The latest cygwin man page makes no mention of this and WATCOM C/C++'s
> documentation omits this.

The Cygwin man page doesn't mention this, but it's not intended to be
complete:

strcat is part of the libc library. The full documentation for
libc is maintained as a Texinfo manual. If info and libc are
properly installed at your site, the command

info libc

will give you access to the complete manual.

I'm not convinced that's a good idea, but it's explicitly acknowledged
with a reference to the complete documentation.

"info libc" doesn't work for me under Cygwin (I don't know why, but
the reason is clearly irrelevant), but on another system the section
on strcat clearly says:

This function has undefined results if the strings overlap.

I don't know about Watcom.

>> [...] or text book, for
>> example) should have similar wording; if it doesn't, that's the fault
>> of the author of the documentation.
>
> Here's the first hit on google:
>
> http://www.cplusplus.com/ref/cstring/strcat.html
>
> and the second:
>
> http://www.mkssoftware.com/docs/man3/strcat.3.asp
>
> Here's the wikipedia entry as of 07/28/2006:
>
> http://en.wikipedia.org/wiki/Strcat
>
> and here's the Open BSD documentation that it links to:
>
> http://www.openbsd.org/cgi-bin/man.cgi?query=strcat
>
> So I guess none of that counts as "decent documentation".

I agree. I don't know what cplusplus.com is, and I'm not too
surprised by an error like this in Wikipedia (possibly someone here
will correct it soon). I am surprised that the OpenBSD documentation
doesn't mention this. That's a problem -- but not a problem with C
itself.

>> If you attempt to use strcat(), or any other function, without reading
>> a description of how it's supposed to work, you can't reasonably
>> expect any particular behavior.
>
> Right. That's because C is a "throwback to a forgotten era" kind of
> language. Compare this to languages like Lua, Ruby and Python, where
> your first guess as to how something works after seeing one example of
> it has a 99% chance of being correct, and a 99% chance that you don't
> even have a candidate second guess in mind.

Then by all means feel free to go and use those languages. Nobody
here will stop you.

I won't comment on the "throwback to a forgotten era" remark, but
apart from that I think your statement is pretty much factually
correct. You cannot generally look at the name of a C function, or
even a one-sentence description, and infer how it's going to behave in
all circumstances. I don't recall anyone claiming that you could.

Keith Thompson

Jul 28, 2006, 8:19:55 PM

webs...@gmail.com writes:
> Andrew Poelstra wrote:
[...]

>> No, knowledge that C strings are null-terminated (which any C programmer
>> needs to know) suggests that intuitively. Either you calculate strlen(),
>> add a counter variable, and `for' your way through the string, or you
>> eliminate the counter and superfluous call to strlen(), and code it
>> efficiently.
>
> Right -- you are either with us, or you are with the terrorists.

First you wrote "Yeah, sieg heil!" in a recent thread in comp.lang.c,
and now you bring terrorists into a discussion of C strings.

I'll say it again:

Shove it, Paul.

Keith Thompson

Jul 28, 2006, 8:44:00 PM

Andrew Poelstra <apoe...@poelstra01.lan> writes:
[...]

> It's clever, I admit. However, jumping through hoops to manage odd
> inputs which could be fixed with an
> assert (dest > src + strlen(src) + 1);
> doesn't bode well to me.

That assert doesn't bode well. It invokes undefined behavior if dest
and src don't point into the same object (or just past the end of it).
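
One way to sketch such a check without a relational comparison between unrelated pointers is to compare the addresses as integers. This is an assumption-laden illustration, not a portable guarantee: uintptr_t is optional, the pointer-to-integer mapping is implementation-defined, and the result is only meaningful on a flat address space. The names regions_overlap and strcat_checked are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if [a, a+na) and [b, b+nb) share at least one byte. */
int regions_overlap(const void *a, size_t na, const void *b, size_t nb)
{
    uintptr_t ua = (uintptr_t)a, ub = (uintptr_t)b;
    if (na == 0 || nb == 0)
        return 0;
    return ua < ub ? (ub - ua < na) : (ua - ub < nb);
}

/* strcat with the overlap precondition checked up front. Reading both
 * lengths is harmless even when the arguments alias. */
char *strcat_checked(char *dest, const char *src)
{
    size_t ld = strlen(dest), ls = strlen(src);
    assert(!regions_overlap(dest, ld + ls + 1, src, ls + 1));
    memcpy(dest + ld, src, ls + 1);    /* copies the terminator too */
    return dest;
}
```

Here strcat_checked(p, p) trips the assertion instead of scribbling past the array, while a call with distinct buffers behaves like strcat().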

Old Wolf

Jul 28, 2006, 9:59:55 PM

webs...@gmail.com wrote:
> Just *SAYING* this is the ultimate indictment of the C language. If
> the language doesn't match your intuition, then its just takes that
> much more effort to program in it.

You're confusing 'intuition' with 'expectation'. Windows users who
try to use an X-windows desktop, find it 'unintuitive' that the active
window is the one with the mouse over it (rather than the one they
last clicked on).

But this is not a matter of intuition, it is just a matter of
them expecting what they are used to. A person who has
not used either system before, would be fine either way.

Regarding the string issue, you would only expect strcat(p, p)
to double p if you had some sort of mental image of p as a
string object. But it is no such thing. It's an array of char. If
you don't understand this, then you are not going to get far
with C programming.

webs...@gmail.com

Jul 29, 2006, 12:32:05 AM

Andrew Poelstra wrote:
> On 2006-07-28, webs...@gmail.com <webs...@gmail.com> wrote:
> > Andrew Poelstra wrote:
> >> It's more intuitive to use the more efficient, less code-intensive, and
> >> easier-to-read version.
> >
> > There is a *third option*. Skip the first character which overwrites
> > the '\0', do the append starting from src+1, then go back and do the
> > '\0' overwrite at the end. This extra work adds at most O(1) to the
> > execution time. Voila, like magic you have an aliasing safe strcat().
> > Ain't the "as-if" rule wonderful?
>
> If you start from src+1, you still have to store the original value of
> src, or use a counter variable. My points on code-intensivity and ease
> of reading still stand.

You typically read the source for your compiler's standard runtime
library? You seriously intuitively expect the source for your C lib
functions to be readable? They may be, but if your vendor is serious
about performance (fast strlens, strcpys, memcpys, and even strstr
implementations are quite convoluted), chances are they aren't really.

> > But this is just coding jujitsu, and has nothing to do with intuition.
>
> It's clever, I admit. However, jumping through hoops to manage odd
> inputs which could be fixed with an
> assert (dest > src + strlen(src) + 1);
> doesn't bode well to me.

How does this *fix* anything? If src happens to be on an earlier
address (and not overlap at all) then what exactly is that assert
doing?

> >> > Most people would intuitively think of this as simply replacing the
> >> > string with a doubled version of itself -- i.e., its analogous to the
> >> > C++ expression p += p for std::string's (and to be honest, I don't know
> >> > if that's legal or not), or just p = p + p in most other programming
> >> > languages.
> >>
> >> "Most people" are not C programmers; if you know enough to use strcat(),
> >> you should have an understanding of how C strings work.
> >
> > Yes, but this "knowledge" is simply bent by force into shape by what
> > the standard tells you. I.e., its working *against* your intuition.
> > Which was kind of my point.
>
> There are many reasons why I don't qualify as a "normal person" or
> "average programmer"; my intuition agrees with that of strcat().

But you don't have a good justification for this. This intuition can't
have existed before you learned the C language, as there is no other
language or string model with that kind of problem. Meaning, that is
not really intuition.

> > Tell me this. How does *your* intuition tell you how memmove() is
> > implemented? Keep in mind that this guy actually is completely
> > aliasing safe.
>
> I suspect that memmove() memcpy()'s the data to a safe place and
> memcpy()'s it back to the dest. This adds an intermediate step
> which prevents problems with src and dest overlapping.

(I would throw away any compiler that did that. I'm pretty sure there
are basically no C compilers that do anything like that. Interesting
that your "intuition" seems to have failed you here. You can always
pick between forward and backward copy depending on the order of the
pointers, and memmove will work fine -- you can even then continue to
implement block tricks as is commonly done with memcpy().)

Your *intuition* starts looking a little more like my first "alternate"
suggestion for how strcat might be implemented. So it looks like
you've rushed right over the fence to my side because ...

> That doesn't sound particularly efficient to me, though; I assume
> that compiler/library writers have found much better ways to code
> it.

... that's right! It's *NOT* efficient, because your idea was just
driven by your *intuition*. So if I simply tell you that I've
implemented strcat_allow_alias(), doesn't your intuition end up matching
my first alternative strcat implementation idea?

Yet in *BOTH* cases, it turns out that serious implementations of an
aliasing-safe strcat and of memmove do not cost significantly more than
the aliasing-unsafe strcat or memcpy functions. So the C standard
chooses to pay the penalty of mismatching people's intuition in order
to gain some trivial, almost unmeasurable efficiency (and TR24731 is of
no help since it takes the exact same position as the original Clib
functions).
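
[A minimal sketch, not from the thread, of the forward-or-backward choice described above. The name move_bytes is hypothetical. Ordering the two pointers by converting them to uintptr_t is an assumption that holds on typical flat-address-space platforms; the C standard does not guarantee it, and a real library would rely on whatever its own compiler permits.]

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overlap-safe copy: picks the copy direction from the
   relative order of the two addresses. */
void *move_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Assumption: converted addresses order the same way as the bytes
       they refer to (true on flat address spaces, not guaranteed). */
    if ((uintptr_t)d < (uintptr_t)s) {
        /* Destination starts lower: copy forward. */
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else if ((uintptr_t)d > (uintptr_t)s) {
        /* Destination starts higher: copy backward so source bytes
           are read before they are overwritten. */
        for (size_t i = n; i > 0; i--)
            d[i - 1] = s[i - 1];
    }
    /* d == s: nothing to do. */
    return dst;
}
```

The only cost over a plain forward copy is the address comparison made once per call, which is the point being argued here.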

webs...@gmail.com

Jul 29, 2006, 12:34:02 AM

You, of course, see the word "intuition" constantly being cited in all
the sentences written above right?

Default User

Jul 29, 2006, 1:08:48 AM

Keith Thompson wrote:

> webs...@gmail.com writes:

> > Right -- you are either with us, or you are with the terrorists.
>
> First you wrote "Yeah, sieg heil!" in a recent thread in comp.lang.c,
> and now you bring terrorists into a discussion of C strings.

Let's just ignore the troll.

Brian

Andrew Poelstra

Jul 29, 2006, 1:28:42 AM

On 2006-07-29, Keith Thompson <ks...@mib.org> wrote:
> Andrew Poelstra <apoe...@poelstra01.lan> writes:
> [...]
>> It's clever, I admit. However, jumping through hoops to manage odd
>> inputs which could be fixed with an
>> assert (dest > src + strlen(src) + 1);
>> doesn't bode well to me.
>
> That assert doesn't bode well. It invokes undefined behavior if dest
> and src don't point into the same object (or just past the end of it).
>

Let's see... What I meant was:
assert ((dest - src) > 0 && (dest - src) < strlen (src));

Is that right?

--
Andrew Poelstra <website down>
To reach my email, use <email also down>

New server ETA: Who knows?

Andrew Poelstra

Jul 29, 2006, 1:37:40 AM

On 2006-07-29, webs...@gmail.com <webs...@gmail.com> wrote:
> Andrew Poelstra wrote:
>> assert (dest > src + strlen(src) + 1);
>

> How does this *fix* anything? If src happens to be on an earlier
> address (and not overlap at all) then what exactly is that assert
> doing?
>

There are a lot of things wrong with that code. I stand corrected.

>> There are many reasons why I don't qualify as a "normal person" or
>> "average programmer"; my intuition agrees with that of strcat().
>
> But you don't have a good justification for this. This intuition can't
> have existed before you learned the C language, as there is no other
> language or string model with that kind of problem. Meaning, that is
> not really intuition.
>

Actually, in my assembler days I used C-style strings, and I was a /lot/
more concerned with how they were implemented, given that my assembler
wasn't going to help me at all with overruns, etc.

C++ allows C-style strings. 0-termination is actually a very simple and
easy-to-understand way to represent strings. Knowing that C represents
strings that way, I can think of intuitive behavior for strcat().

Given another language with a different string representation, I'd
assume other behaviors, yes. Given another language with the /same/
string representation, my point still stands. This wasn't caused by
learning C; it was caused by learning C-style strings.

>> I suspect that memmove() memcpy()'s the data to a safe place and
>> memcpy()'s it back to the dest. This adds an intermediate step
>> which prevents problems with src and dest overlapping.
>

> Your *intuition* starts looking a little more like my first "alternate"
> suggestion for how strcat might be implemented. So it looks like
> you've rushed right over the fence to my side because ...
>

memmove() is specified to handle overlapping memory boundaries; memcpy()
and strcat() are not. That much is in the Standard, and memmove()'s
specification overrides my intuitive assumption that one shouldn't pass
overlapping memory to the function.

--
Andrew Poelstra <website down>
To reach my email, use <email also down>

New server ETA: 1 days

webs...@gmail.com

Jul 29, 2006, 2:18:54 AM

Keith Thompson wrote:
> webs...@gmail.com writes:
> > Keith Thompson wrote:
> >> webs...@gmail.com writes:
> >> > Jonathan Leffler wrote:
> >> >> webs...@gmail.com wrote:
> >> >> > [...] (so strcat(p,p) leads
> >> >> > to UB even though it has a compelling intuitive meaning).
> >> >> What's the compelling intuitive meaning? To me, it means copy
> >> >> characters from the start of p over the null that used to mark the end
> >> >> of p and keep going until you crash.
> >> >
> >> > That's not an intuitive meaning. Its just an understanding of an
> >> > implementation anomoly. Perhaps for you, implementation details
> >> > changes your intuition.
> >> [...]
> >>
> >> In this case, intuition is not necessary.
> >
> > Just *SAYING* this is the ultimate indictment of the C language. If
> > the language doesn't match your intuition, then its just takes that
> > much more effort to program in it.
>
> News flash: C is not the most intuitive and beginner-friendly language
> ever invented. Does this come as a surprise to you?

Huh? No, but apparently it's a surprise to Jonathan Leffler and Andrew
Poelstra. They seem to be arguing the case that it *does* match their
intuition.

My *original* contention is that any proposal such as Robert
Seacord's managed string library should go ahead and pay the basically
zero penalty of actually *matching* intuition (as compared to the
enormous penalty he seems willing to pay for automatic character set
filtering). Everything else so far is just people's false projections
about what I said *beyond* this, and my responses to them.

> In a language where strings are first-class objects, and you can pass
> them around as values, use them as operands in expressions, and so
> forth, I'd expect something called "strcat" to behave in some
> reasonable intuitive manner.

Yeah, that's nice. C is basically the only example of a language of
its kind, yet you feel not the slightest problem with making sweeping
generalizations about it based on its properties. That's kind of like
saying that Joseph Lieberman can't be elected president of the US
because he's Jewish. I mean, that's nonsense (he's a right-wing
Democrat, which means he has no serious base of support outside his
state) in the same way your idea here is nonsense.

First-class or not, strcat *CAN* be implemented as aliasing safe, at
very little cost. The fact that it isn't is a choice that was made;
it's nothing more than that. It's certainly not a *property* of
low-level languages (in assembly language, for example, there is no
assertion or expectation of being unable to deal with aliased "objects"),
or a property of the fact that it's not a first-class value (bstrlib is
the obvious counter-example of this).

> [...] I'd still need to see the declaration to

> know how to use it, but it would probably be safe to assume that
> something like
>
> s1 = strcat(s2, s3)
>
> would do the obvious thing.

You mean if it were a first class value? But you are making a false
association here. There is no reason you cannot perform in-place
mutation of first class values. So the API could still have the same
basic functionality as the current strcat (i.e., two operands, and
modifying the destination.)

> C is not like that. Strings are not a data type, they're a data
> format, "a contiguous sequence of characters terminated by and
> including the first null character", subject to all of C's
> complications regarding arrays and pointers. If you think you can
> guess, with 99% certainty, how strcat() is going to behave based on
> that, you're likely to be disappointed.

These *complications*, as you suggest, have nothing to do with it. It's
all down to pure choice at the specification level. The guesses are
only wrong because the standard chooses that they should be wrong.

> >> [...] If you read the standard's
> >> description of strcat(), you'll see:
> >>
> >> ... If copying takes place between objects that overlap, the
> >> behavior is undefined.
> >>
> >> Any decent description of strcat() (in a man page
> >
> > The latest cygwin man page makes no mention of this and WATCOM C/C++'s
> > documentation omits this.
>
> The Cygwin man page doesn't mention this, but it's not intended to be
> complete:
>
> strcat is part of the libc library. The full documentation for
> libc is maintained as a Texinfo manual. If info and libc are
> properly installed at your site, the command
>
> info libc
>
> will give you access to the complete manual.
>
> I'm not convinced that's a good idea, but it's explicitly acknowledged
> with a reference to the complete documentation.
>
> "info libc" doesn't work for me under Cygwin (I don't know why, but
> the reason is clearly irrelevant),

It works on my system. info libc does nothing more than document the
standard include contents. info strcat just re-echos the man page.

> [...] but on another system the section

> on strcat clearly says:
>
> This function has undefined results if the strings overlap.
>
> I don't know about Watcom.

Well, I just told you about Watcom, so now you do (it reads
substantially similar to the man pages). It's all downloadable from the
Open Watcom site if you care.

> >> [...] or text book, for
> >> example) should have similar wording; if it doesn't, that's the fault
> >> of the author of the documentation.
> >
> > Here's the first hit on google:
> >
> > http://www.cplusplus.com/ref/cstring/strcat.html
> >
> > and the second:
> >
> > http://www.mkssoftware.com/docs/man3/strcat.3.asp
> >
> > Here's the wikipedia entry as of 07/28/2006:
> >
> > http://en.wikipedia.org/wiki/Strcat
> >
> > and here's the Open BSD documentation that it links to:
> >
> > http://www.openbsd.org/cgi-bin/man.cgi?query=strcat
> >
> > So I guess none of that counts as "decent documentation".
>
> I agree. I don't know what cplusplus.com is, and I'm not too
> surprised by an error like this in Wikipedia (possibly someone here
> will correct it soon). I am surprised that the OpenBSD documentation
> doesn't mention this.

So there we have it. The standard for "decent" documentation as you
suggest appears to be quite high, and is certainly different from what
is commonly available.

> [...] That's a problem -- but not a problem with C itself.

If you could just *stop* with the false projection for one second. You
know there is a reason why I quote other text when I post responses.

> >> If you attempt to use strcat(), or any other function, without reading
> >> a description of how it's supposed to work, you can't reasonably
> >> expect any particular behavior.
> >
> > Right. That's because C is a "throwback to a forgotten era" kind of
> > language. Compare this to languages like Lua, Ruby and Python, where
> > your first guess as to how something works after seeing one example of
> > it has a 99% chance of being correct, and a 99% chance that you don't
> > even have a candidate second guess in mind.
>
> Then by all means feel free to go and use those languages. Nobody
> here will stop you.

It's always this false choice with you. I have to completely throw out
my investment in learning this language because it makes a number of
idiotic decisions through nothing other than poor choices.

We're not even talking about what language I *USE* for whatever I am
doing. Remember, this thread started as a discussion about improving
the standard, and as I understand it, has reached the level of a
serious official proposal. Citing other languages ought to be a
standard part of such a discussion without you pulling out this tired
old canard all the time.

webs...@gmail.com

Jul 29, 2006, 2:34:49 AM

SuperKoko wrote:
> webs...@gmail.com wrote:
> > Jonathan Leffler wrote:
> > > webs...@gmail.com wrote:
> > > > [...] (so strcat(p,p) leads
> > > > to UB even though it has a compelling intuitive meaning).
> > >
> > > What's the compelling intuitive meaning? To me, it means copy
> > > characters from the start of p over the null that used to mark the end
> > > of p and keep going until you crash.
> >
> > That's not an intuitive meaning. Its just an understanding of an
> > implementation anomoly. Perhaps for you, implementation details
> > changes your intuition.
>
> Intuitive or not, that was not obvious for a beginner in C89, but that
> should be obvious in C99, even for a beginner:
> char* strcat (char * restrict, const char * restrict);
>
> Thanks to "restrict", the function has a better documentation.

Yes, I am aware of this. So the documentation has moved into the
platform's header files. Anyways, are you aware of any university
program teaching C programming based on the C99 standard? Somehow I
doubt any significant percentage of novices are learning C via the C99
standard.

> Andrew Poelstra:
> > IMHO, _intuitively_, there is no other way to implement strcat().
>
> But there are other ways to implement it....
> Borland C++ 5.0 and Digital Mars Compiler use alternative
> implementations (and they behave weird too, but in another way).

Well, same with the Solaris compiler (without the weird behavior). But
who's counting actual modern stuff?

SuperKoko

Jul 29, 2006, 6:15:37 AM

webs...@gmail.com wrote:
> first-class or not, strcat *CAN* be implemented as aliasing safe, at
> very little cost. The fact that it doesn't is a choice that was made;
> its nothing more than that. Its certainly not a *property* of
> low-level languages (in assembly language, for example, there is no
> assertion or expection of being unable to deal with aliased "objects"),
> or a property of the fact that its not a first class value (bstrlib is
> the obvious counter-example of this.)
>

I agree.
In fact, I think that functions should tend to abstract away details.
The implementation details of strcat should not artificially change the
interface.

Making strcat work with aliasing would not cost much, and would
increase the safety of C.

IMHO, it would improve the C standard at quasi-zero cost:
1) It breaks no code.
2) It's a library issue: easy to implement in any actual C
implementation.

The only tradeoff would be efficiency... But I think that it can be
implemented efficiently.

Keith Thompson

Jul 29, 2006, 6:22:43 AM

Andrew Poelstra <apoe...@poelstra01.lan> writes:
> On 2006-07-29, Keith Thompson <ks...@mib.org> wrote:
>> Andrew Poelstra <apoe...@poelstra01.lan> writes:
>> [...]
>>> It's clever, I admit. However, jumping through hoops to manage odd
>>> inputs which could be fixed with an
>>> assert (dest > src + strlen(src) + 1);
>>> doesn't bode well to me.
>>
>> That assert doesn't bode well. It invokes undefined behavior if dest
>> and src don't point into the same object (or just past the end of it).
>>
>
> Let's see... What I meant was:
> assert ((dest - src) > 0 && (dest - src) < strlen (src));
>
> Is that right?

No. If dest and src don't point into the same object, both "dest > src"
and "dest - src" invoke undefined behavior.

Of course an implementation of a library function is free to use these
constructs if it knows how the compiler is going to treat them.

Richard Tobin

Jul 29, 2006, 8:41:34 AM

In article <slrneclsga.3...@poelstra01.lan>,
Andrew Poelstra <apoe...@poelstra01.lan> wrote:

>Let's see... What I meant was:
> assert ((dest - src) > 0 && (dest - src) < strlen (src));
>
>Is that right?

If pointers are not known to point to within the same object, you can
only compare them for equality. So the only way to achieve what you
want (within the letter of the law of the C standard) is to loop
through the addresses in the two objects, testing whether they are
equal. You could optimise that a bit and avoid testing all pairs.
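
[A minimal sketch, not from the thread, of the equality-only test described above. The name regions_overlap is hypothetical. It assumes both regions are non-empty and lie entirely within valid objects. Two such regions overlap exactly when the start of one lies inside the other, so it is enough to test each start against the other region's addresses instead of all pairs.]

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reports whether [a, a+alen) and [b, b+blen)
   share at least one byte, using only == on pointers. */
bool regions_overlap(const char *a, size_t alen,
                     const char *b, size_t blen)
{
    /* Does b start somewhere inside a's region? */
    for (size_t i = 0; i < alen; i++)
        if (a + i == b)
            return true;

    /* Does a start somewhere inside b's region? */
    for (size_t j = 0; j < blen; j++)
        if (b + j == a)
            return true;

    return false;
}
```

This takes alen + blen comparisons per call, which is why library implementations prefer a direct pointer comparison when they know how their compiler treats it.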

-- Richard

Joe Wright

Jul 29, 2006, 9:35:10 AM

Andrew Poelstra wrote:

> On 2006-07-27, webs...@gmail.com <webs...@gmail.com> wrote:
>> Jonathan Leffler wrote:
>>> webs...@gmail.com wrote:
>>>> [...] (so strcat(p,p) leads
>>>> to UB even though it has a compelling intuitive meaning).
>>> What's the compelling intuitive meaning? To me, it means copy
>>> characters from the start of p over the null that used to mark the end
>>> of p and keep going until you crash.
>> That's not an intuitive meaning. Its just an understanding of an
>> implementation anomoly. Perhaps for you, implementation details
>> changes your intuition.
>>
>

> No, knowledge that C strings are null-terminated (which any C programmer
> needs to know) suggests that intuitively. Either you calculate strlen(),
> add a counter variable, and `for' your way through the string, or you
> eliminate the counter and superfluous call to strlen(), and code it
> efficiently.
>

> It's more intuitive to use the more efficient, less code-intensive, and
> easier-to-read version.
>

>> Most people would intuitively think of this as simply replacing the
>> string with a doubled version of itself -- i.e., its analogous to the
>> C++ expression p += p for std::string's (and to be honest, I don't know
>> if that's legal or not), or just p = p + p in most other programming
>> languages.
>>
>
> "Most people" are not C programmers; if you know enough to use strcat(),

> you should have an understanding of how C strings work. (And indeed, I've
> never seen a C textbook that introduced strcat() prior to introducing
> C-style strings.) (Although I've heard of some pretty terrible textbooks
> on this group that I was fortunate enough to avoid!)

>
>> You only *know* that this is not the case, because you know that strcat
>> is implemented as some variation of { d += strlen(d); while (*d++ =
>> *s++); } instead of { size_t ld = strlen(d), ls = strlen(s); memmove
>> (d+ld, s, ls); d[ld+ls] = '\0'; }. You know this because the first
>> variation is going to be faster. This is not intuition -- its just a
>> technical calculation.
>>
>

> IMHO, _intuitively_, there is no other way to implement strcat().
>

#include <string.h>

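/* strcat() variant that also handles the dst == src case, catstr(p, p). */
char *catstr(char *dst, char *src) {
    if (dst == src) {
        size_t s, siz = strlen(dst);
        s = siz;
        src += siz;    /* src now points at dst's terminating '\0' */
        /* Copy backward, '\0' first. s is unsigned, so --s at 0 wraps
           to a huge value and ends the loop. */
        while (s <= siz) {
            src[s] = dst[s];
            --s;
        }
    } else {
        strcat(dst, src);
    }
    return dst;
}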

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

mithra

Jul 29, 2006, 4:01:22 PM

webs...@gmail.com wrote:
> SuperKoko wrote:
> > webs...@gmail.com wrote:
> > > Jonathan Leffler wrote:
> > > > webs...@gmail.com wrote:
> > > > > [...] (so strcat(p,p) leads
> > > > > to UB even though it has a compelling intuitive meaning).
> > > >
> > > > What's the compelling intuitive meaning? To me, it means copy
> > > > characters from the start of p over the null that used to mark the end
> > > > of p and keep going until you crash.

>...

This is computer *science*, not "Getting in Touch with Your Feelings
101". Intuition only serves to guess at translating pseudocode to a
first pass at real code, looking up the function defs. in the process.

I use Python and Perl a lot, and I *always* keep a copy of their
function library definitions up when I code them. I don't rely on
intuition, and I have been programming since the 'dark ages'.
Imagination, inspiration, experience, research, experimentation, and
perhaps intuition as filler in the pseudo code until I can look it up.

Back in 1980 I wrote my own wrapper for strcat() that checks sizes
and calls realloc() as needed. Obviously, not intuitively, I used it
with pointers as destinations, and for the few microseconds in computing
time that it cost me 20+ years ago I gained the security that
something untoward wouldn't have a customer calling me in the wee hours
of the morning. It has been in my personal library, along with a lot of
other string manipulation code, since then.
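
[A minimal sketch, not from the thread and not the poster's actual code, of a size-checking append that grows its destination with realloc(). The name append_grow and the explicit capacity parameter are hypothetical. It assumes *pdst is NULL or came from malloc()/realloc(), that src does not point into *pdst, and that the combined length does not overflow size_t.]

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable append. *pdst is a heap string (or NULL) and
   *pcap its allocated size in bytes. Returns 0 on success, or -1 if
   memory is exhausted, in which case *pdst is left unchanged. */
int append_grow(char **pdst, size_t *pcap, const char *src)
{
    size_t dlen = *pdst ? strlen(*pdst) : 0;
    size_t slen = strlen(src);
    size_t need = dlen + slen + 1;    /* +1 for the terminating '\0' */

    if (need > *pcap) {
        /* Assumption: src does not alias *pdst, because realloc()
           may move the destination block. */
        char *p = realloc(*pdst, need);
        if (p == NULL)
            return -1;
        *pdst = p;
        *pcap = need;
    }

    /* Copy src including its terminator onto the end of the string. */
    memcpy(*pdst + dlen, src, slen + 1);
    return 0;
}
```

Starting from `char *s = NULL; size_t cap = 0;`, two calls appending "foo" and then "bar" leave s holding "foobar"; the caller frees s when done.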

C acts as a portable assembly language extension. Keep that in mind and
save intuition for VB.

BTW, the FreeBSD man page for strcat() offers:
...
    The strcat() and strncat() functions append a copy of the
    null-terminated string append to the end of the null-terminated
    string s, then add a terminating `\0'. The string s must have
    sufficient space to hold the result.

    The strncat() function appends not more than count characters from
    append, and then adds a terminating `\0'.
...
    The strcat() function is easily misused in a manner which enables
    malicious users to arbitrarily change a running program's
    functionality through a buffer overflow attack. (See the FSA.)

    Avoid using strcat(). Instead, use strncat() or strlcat() and
    ensure that no more characters are copied to the destination buffer
    than it can hold.

Curtis W. Rendon
--
One OS to rule them all, One OS to find them,
One OS to bring them all and in the darkness bind them
In the Land of Redmond where the Shadows lie.

Chris Torek

Jul 29, 2006, 4:33:34 PM

>Andrew Poelstra wrote:
>> IMHO, _intuitively_, there is no other way to implement strcat()

[than the usual version that self-destructs when you do

    char buf[100] = "apple";

    strcat(buf, buf); /* desired: appleapple\0 */
or
    strcat(buf, buf + 2); /* desired: appleple\0 */
]

In article <Z4ydnZhpdsaX-VbZ...@comcast.com>

Joe Wright <joeww...@comcast.net> wrote:
>char *catstr(char *dst, char *src) {
> if (dst == src) {

[rest snipped]

This handles the:

catstr(buf, buf);

case, but not the:

catstr(buf, buf + 2);

case. To handle both, perhaps something like:

    /* returns strlen(result) */
    size_t catstr(char *dst, const char *src) {
        size_t dstlen = strlen(dst), srclen = strlen(src);

        memmove(dst + dstlen, src, srclen + 1);
        return dstlen + srclen;
    }

would be better. (Untested.)
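
[A small check, not from the thread, that exercises the two overlapping cases against the memmove-based function above. catstr is reproduced as posted so the block is self-contained. The helper catstr_check is hypothetical and returns the number of failed cases.]

```c
#include <stddef.h>
#include <string.h>

/* As posted above: append src to dst, overlap allowed.
   Returns strlen(result). */
size_t catstr(char *dst, const char *src) {
    size_t dstlen = strlen(dst), srclen = strlen(src);

    memmove(dst + dstlen, src, srclen + 1);
    return dstlen + srclen;
}

/* Hypothetical check helper: counts cases that do not produce the
   desired string or length. */
int catstr_check(void) {
    int failures = 0;
    char a[100] = "apple";
    char b[100] = "apple";

    /* Source is the whole destination string. */
    if (catstr(a, a) != 10 || strcmp(a, "appleapple") != 0)
        failures++;

    /* Source is a suffix of the destination string. */
    if (catstr(b, b + 2) != 8 || strcmp(b, "appleple") != 0)
        failures++;

    return failures;
}
```

Both lengths are taken before anything is written, and memmove() is specified to handle overlapping regions, so neither case depends on the copy direction.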
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Joe Wright

Jul 29, 2006, 7:15:30 PM

Chris Torek wrote:
>> Andrew Poelstra wrote:
>>> IMHO, _intuitively_, there is no other way to implement strcat()
> [than the usual version that self-destructs when you do
> char buf[100] = "apple";
>
> strcat(buf, buf); /* desired: appleapple\0 */
> or
> strcat(buf, buf + 2); /* desired: appleple\0 */
> ]
>
> In article <Z4ydnZhpdsaX-VbZ...@comcast.com>
> Joe Wright <joeww...@comcast.net> wrote:
>> char *catstr(char *dst, char *src) {
>> if (dst == src) {
> [rest snipped]
>
> This handles the:
>
> catstr(buf, buf);
>
> case, but not the:
>
> catstr(buf, buf + 2);
>
> case. To handle both, perhaps something like:
>
> /* returns strlen(result) */
> size_t catstr(char *dst, const char *src) {
> size_t dstlen = strlen(dst), srclen = strlen(src);
>
> memmove(dst + dstlen, src, srclen + 1);
> return dstlen + srclen;
> }
>
> would be better. (Untested.)

I wasn't aware that 'catstr(buf, buf + 2)' was at issue.

David R Tribble

Aug 1, 2006, 5:25:25 PM

Andrew Poelstra wrote:
>> IMHO, _intuitively_, there is no other way to implement strcat().
>

Paul Hsieh wrote:
>> That is why you fail.
>>
>> Tell me this. How does *your* intuition tell you how memmove() is
>> implemented? Keep in mind that this guy actually is completely
>> aliasing safe.
>

David R Tribble wrote:
>> ...because it's specified that way. And memcpy() is not specified
>> that way. strcpy() is not specified that way, either.
>>
>> Are you arguing for ignoring the function specifications?
>

Paul Hsieh wrote:
> You, of course, see the word "intuition" constantly being cited in all
> the sentences written above right?

Yeah. Okay, my intuition tells me that memcpy() is going to be
implemented in the most efficient way to copy a block of bytes
to another block and ignore any issues of overlapping blocks,
because that's the way the function is specified.

My intuition tells me that memmove() is going to be implemented
so that overlapping blocks are copied correctly with a possible
loss of efficiency, because that's the way the function is specified.

My intuition tells me that strcat() is going to be implemented
so that one string is appended to another, using the '\0' characters
to signal the end of the strings, and ignoring the possibility of
overlapping strings (e.g., strcat(p,p)), because that's the way
the function is specified:

    7.21.3.1
    [...]
    The strcat function appends a copy of the string pointed to
    by s2 (including the terminating null character) to the end of
    the string pointed to by s1. The initial character of s2 overwrites
    the null character at the end of s1. If copying takes place
    between objects that overlap, the behavior is undefined.

The phrases "overwrites the null character at the end of s1"
and "if the objects overlap the behavior is undefined" are a
pretty clear indication that 'strcat(p,p)' will invoke u.b. and
probably corrupt the resulting string value of p. At least that's
what my intuition tells me.

-drt

Douglas A. Gwyn

Aug 1, 2006, 6:05:29 PM

David R Tribble wrote:
> Yeah. Okay, my intuition tells me that memcpy() is going to be
> implemented in the most efficient way to copy a block of bytes
> to another block and ignore any issues of overlapping blocks,
> because that's the way the function is specified.
> My intuition tells me that memmove() is going to be implemented
> so that overlapping blocks are copied correctly with a possible
> loss of efficiency, because that's the way the function is specified.
> My intuition tells me that strcat() is going to be implemented
> so that one string is appended to another, using the '\0' characters
> to signal the end of the strings, and ignoring the possibility of
> overlapping strings (e.g., strcat(p,p)), because that's the way
> the function is specified:

> ...

> The phrases "overwrites the null character at the end of s1"
> and "if the objects overlap the behavior is undefined" are a
> pretty clear indication that 'strcat(p,p)' will invoke u.b. and
> probably corrupt the resulting string value of p. At least that's
> what my intuition tells me.

It comes down to expectations. If the programmer expects "char"
to mean "character" and "strcat" to mean "concatenate character
strings", then he is simply wrong. There is a relationship
between the concepts and the corresponding implementations, but
not an exact mapping between them, and it is the differences that
can cause trouble when the programmer's mental model is wrong.

C, C++, and many other languages do provide the *means* for a
programmer to create his own object types and support functions
that more closely fit whatever model he has. If the standard
library facility (which met legacy requirements sufficiently
well) doesn't meet your current requirements, don't (mis)use it;
provide your own implementation. (Be sure to choose new names.)

James Dennett

Aug 9, 2006, 12:52:14 PM

Jonathan Leffler wrote:
> webs...@gmail.com wrote:
>> [...] (so strcat(p,p) leads
>> to UB even though it has a compelling intuitive meaning).
>
> What's the compelling intuitive meaning? To me, it means copy
> characters from the start of p over the null that used to mark the end
> of p and keep going until you crash.

The simpler expectation from the interface is "append
a copy of the string *currently* pointed to by p to p",
i.e., append it to itself.

Other languages that support this via notation such
as s+=s; or s = s+s implement it this way.

If you think of strcat in terms of its implementation
then your expectation is natural.

-- James

Jun Woong

Aug 9, 2006, 12:52:10 PM

Jonathan Leffler wrote:

> webs...@gmail.com wrote:
> > [...] (so strcat(p,p) leads
> > to UB even though it has a compelling intuitive meaning).
>
> What's the compelling intuitive meaning? To me, it means copy
> characters from the start of p over the null that used to mark the end
> of p and keep going until you crash.
>

I am also not sure what intuitive meaning was intended, but
thinking about memcpy vs. memmove, if the string copying functions had
been defined to copy the source string into an intermediate buffer before
putting it into the destination, there would be no crash. I suspect
that's the compelling intuitive meaning he intended.
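
A minimal sketch of that intermediate-buffer idea, for illustration only.
The name strcat_via_temp is made up here, it is not a standard function,
and the caller is assumed to have made dst large enough for the result:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: append src to dst by way of a temporary copy,
   so that src may alias dst (for example strcat_via_temp(p, p)).
   Returns dst, or NULL if the temporary buffer cannot be allocated,
   in which case dst is left unchanged. */
char *strcat_via_temp(char *dst, const char *src)
{
    size_t n = strlen(src);            /* measured before anything is written */
    char *tmp = malloc(n + 1);

    if (tmp == NULL)
        return NULL;
    memcpy(tmp, src, n + 1);           /* snapshot of src, terminator included */
    memcpy(dst + strlen(dst), tmp, n + 1);
    free(tmp);
    return dst;
}
```

With char p[16] = "abc", the call strcat_via_temp(p, p) leaves "abcabc"
in p. The cost is an allocation and a second copy on every call, which is
the efficiency objection raised elsewhere in this thread.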

--
Jun, Woong (woong at icu.ac.kr)
Samsung Electronics Co., Ltd.

``All opinions are mine and do not represent any organization''

kuy...@wizard.net

Aug 10, 2006, 5:00:54 AM

James Dennett wrote:
> Jonathan Leffler wrote:
> > webs...@gmail.com wrote:
> >> [...] (so strcat(p,p) leads
> >> to UB even though it has a compelling intuitive meaning).
> >
> > What's the compelling intuitive meaning? To me, it means copy
> > characters from the start of p over the null that used to mark the end
> > of p and keep going until you crash.
>
> The simpler expectation from the interface is "append
> a copy of the string *currently* pointed to by p to p",
> i.e., append it to itself.
>
> Other languages that support this via notation such
> as s+=s; or s = s+s implement it this way.
>
> If you think of strcat in terms of its implementation

or, in terms of its specification by the standard,

> then your expectation is natural.
--

Robert Seacord

Aug 25, 2006, 1:34:46 AM

Paul,

I did have a look at James Antill's Vstr implementation and I discuss
it in the strings chapter of my book on "Secure Coding in C and C++"
which happens to be available online at:

http://www.informit.com/articles/article.asp?p=430402&seqNum=8&rl=1

I was not aware of your implementation, so I did not evaluate it.

Our goal in writing the managed string library was to define an API that
could be used to program more securely. To my mind, writing secure code
also encompasses eliminating defects in general so I believe this
library also addresses these concerns. You also commented that "the
API is somewhat cumbersome, which makes inline usage impossible". This
was an intentional decision on our part, as in-line usage typically
prevents/discourages a user from checking the return status of the function.

Performance was not a major objective for the reference implementation
of the API. The idea was to allow library vendors and other interested
parties such as yourself the opportunity to provide more efficient
implementations.

We should have a complete implementation of the API available shortly.
I'll post a further announcement to these news groups when it is ready.

Thanks,
rCs

webs...@gmail.com

Aug 25, 2006, 4:28:23 AM

Robert Seacord wrote:
> I did have a look at James Antill's Vstr implementation and I discuss
> it in the strings chapter of my book on "Secure Coding in C and C++"
> which happens to be available online at:
>
> http://www.informit.com/articles/article.asp?p=430402&seqNum=8&rl=1
>
> I was not aware of your implementation, so I did not evaluate it.

In James' own string library comparisons he makes mention of the Better
String Library, and it's about the only one he doesn't eviscerate in his
evaluation. Or you could have just searched for "string library" in
Google (Bstrlib is currently hit #2).

> Our goal in writing the managed string library was to define an API that
> could be used to program more securely.

I think both James and I had this in mind when we were developing our
respective libraries as well. The main difference being that we both
also decided it was worthwhile to satisfy even more criteria. I was
more focussed on functionality and ease of use or "Software Crisis"
kind of problems, whereas James was clearly more focussed on ultimate
performance in networking or highly IO centric environments.

And I think you *missed* the crucial point about aliasing entirely.
This is a safety/security issue (at least it is as much as NULL
parameter detection is) and yet it appears to be unaddressed in your
library. (I have not evaluated Vstr deeply enough to know whether or
not he solved that issue, since it only compiles with the
gnu-toolchain).

Another security issue is the question of auditing. As you know, prior
to long term real world usage the only way you can have any sort of
assurance about the security of any system is by having security
experts audit the code. The Better String Library highly facilitates
this by its "security statement". This statement declares up front
exactly what Bstrlib's asserted functionality is with respect to
security. In this way an auditor can delineate his/her strategy by 1)
verifying that the library does what it says (by examining the source
code of Bstrlib) and 2) verifying that the facilities claimed are
sufficient to meet the security requirements of the rest of the
program. This delineation is important as it gives a bounded and well
defined way that the auditor can evaluate all this "extra code" that
Bstrlib provides that another developer has not written themselves.

With the managed string library being proposed for the next C
standard, the auditor ends up having to "trust" the compiler if it's a
closed source compiler, as well as trusting your design. But worse
yet, without any security statement, an auditor will have a harder time
knowing what exactly they should expect the managed library to deliver
from the point of view of security.

> [...] To my mind, writing secure code
> also encompasses eliminating defects in general so I believe this
> library also addresses these concerns. You also commented that "the
> API is somewhat cumbersome, which makes inline usage impossible". This
> was an intentional decision on our part, as in-line usage typically
> prevents/discourages a user from checking the return status of the function.

Some might say that it discourages users from using the library at all.
In Bstrlib I introduce the concept of "error propagation". Along with
supporting inline usage, errors are detected and passed through the
calls (error-in produces error-out.) This more closely matches the
"laziness" of users, in that they need not check each call, they only
need to check the last call in the chain of calls.
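
To make the "error-in produces error-out" idea concrete, here is a small
sketch. It is NOT the Bstrlib API: the type xstr, the functions xstr_from
and xstr_cat, and the fixed capacity XSTR_CAP are all invented for this
illustration, and a real library would manage memory dynamically.

```c
#include <stddef.h>
#include <string.h>

#define XSTR_CAP 32   /* arbitrary capacity chosen for the sketch */

/* Hypothetical value type; not taken from any real library. */
typedef struct {
    int    err;               /* nonzero once any step has failed */
    size_t len;               /* length of data, excluding the terminator */
    char   data[XSTR_CAP];
} xstr;

/* Build an xstr from a C string; a NULL or oversized input yields an
   error value instead of a valid string. */
xstr xstr_from(const char *s)
{
    xstr r = { 0, 0, "" };
    size_t n = (s != NULL) ? strlen(s) : 0;

    if (s == NULL || n >= XSTR_CAP) {
        r.err = 1;
        return r;
    }
    memcpy(r.data, s, n + 1);
    r.len = n;
    return r;
}

/* Concatenate a and b. If either input already carries an error, or the
   result would not fit, the result carries an error too. */
xstr xstr_cat(xstr a, xstr b)
{
    if (a.err || b.err || a.len + b.len >= XSTR_CAP) {
        a.err = 1;
        a.len = 0;
        a.data[0] = '\0';
        return a;
    }
    memcpy(a.data + a.len, b.data, b.len + 1);
    a.len += b.len;
    return a;
}
```

A chain such as xstr_cat(xstr_cat(xstr_from("foo"), xstr_from(NULL)),
xstr_from("!")) can be written inline, and only the err field of the final
result needs to be tested, because the failure of the inner xstr_from(NULL)
is carried through every later call.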

You might argue that this is just a different way to solve the same
problem, except that Bstrlib's method enjoys two huge advantages: 1) it
vastly simplifies error checking without compromising correctness,
which is important for dealing with unintentional leaks due to exiting
before freeing resources. 2) it allows one to continue to write code
very concisely which in most cases will lead to easier maintenance.

By *only* focussing on the problem from the very narrow point of view
of security, you are missing ideas like this. And you consequently
lose the potential to appeal to programmers for other reasons, which is
important if you want people to seriously adopt these new functions.
This is why there is such a thing as cherry flavored cough syrup, or
mint flavored toothpaste.

> Performance was not a major objective for the reference implementation
> of the API. The idea was to allow library vendors and other interested
> parties such as yourself the opportunity to provide more efficient
> implementations.

I have no idea how I would improve the efficiency of pervasive
character set filtering over the more obvious alternative of performing
the filtering just at the time that system() is called. The
alternative is different semantically -- but I believe that this
difference is what is called for. Charset filtering is not a generally
useful feature *UNLESS* you are calling system() (*and* under the
assumption that this is good enough for system call safety). I.e., I
don't believe I *can* solve the performance problems of the managed
string design.
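
As an illustration of filtering only at the point of use, the sketch below
checks the pieces of a command line against a whitelist just before the
string would be handed to system(). The names SAFE_CHARS, is_safe_argument
and build_command are hypothetical, and the whitelist itself is an
assumption; whether any such set is actually sufficient for system() safety
is exactly the open question noted above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical whitelist; the right set depends on the shell and command. */
static const char SAFE_CHARS[] =
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789_-./";

/* Returns 1 if s is non-empty and consists only of SAFE_CHARS. */
int is_safe_argument(const char *s)
{
    return s != NULL && *s != '\0' && s[strspn(s, SAFE_CHARS)] == '\0';
}

/* Writes "prog arg" into out (capacity cap). Returns 0 on success, or -1
   if either part fails the check or the result does not fit. Only on
   success should out be passed on to system(). */
int build_command(char *out, size_t cap, const char *prog, const char *arg)
{
    int n;

    if (!is_safe_argument(prog) || !is_safe_argument(arg))
        return -1;
    n = snprintf(out, cap, "%s %s", prog, arg);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

The check runs once per command rather than on every string operation, so
ordinary strings elsewhere in the program pay nothing for it.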

If you ignore performance to this degree, you will immediately create
resistance among developers who will shy away from using "managed
strings" because of some performance penalty they perceive. It sets up
a "false dichotomy" in the minds of developers that safety must come at
the expense of performance. A quick look at either Bstrlib or Vstr
shows that this dichotomy is not true. Both substantially out-perform
the standard C library functions (with Vstr, it's kind of conditional on
the kind of code, but when it wins it usually wins big) and both also
deliver far more safety.

> We should have a complete implementation of the API available shortly.
> I'll post a further announcement to these news groups when it is ready.

Ok, but I think the major problem is with the design not any
implementation.

[comp.lang.c.moderated removed because posting there appears to delay
posts for weeks]

CBFalconer

Aug 31, 2006, 3:16:18 PM

Robert Seacord wrote:
>
> I did have a look at James Antill's Vstr implementation and I
> discuss it in the strings chapter of my book on "Secure Coding in
> C and C++" which happens to be available online at:
>
> http://www.informit.com/articles/article.asp?p=430402&seqNum=8&rl=1
>
> I was not aware of your implementation, so I did not evaluate it.

And, due to the woeful lack of quotation and attribution, nobody
else is aware of it or anything else.

--
Chuck F (cbfal...@yahoo.com) (cbfal...@maineline.net)
Available for consulting/temporary embedded and systems.
<http://cbfalconer.home.att.net> USE maineline address!

James Dennett

Aug 31, 2006, 3:18:00 PM

kuy...@wizard.net wrote:
> James Dennett wrote:
>> Jonathan Leffler wrote:
>>> webs...@gmail.com wrote:
>>>> [...] (so strcat(p,p) leads
>>>> to UB even though it has a compelling intuitive meaning).
>>> What's the compelling intuitive meaning? To me, it means copy
>>> characters from the start of p over the null that used to mark the end
>>> of p and keep going until you crash.
>> The simpler expectation from the interface is "append
>> a copy of the string *currently* pointed to by p to p",
>> i.e., append it to itself.
>>
>> Other languages that support this via notation such
>> as s+=s; or s = s+s implement it this way.
>>
>> If you think of strcat in terms of its implementation
>
> or, in terms of it's specification by the standard,

But the point was to think of the *intuitive* meaning of
strcat, not its formally specified meaning. The standard
doesn't capture the intuitive (or, if you prefer, naive)
expectation. Which is fine by me, as I don't expect that
intuition will be sufficient for robust programming.

-- James

Douglas A. Gwyn

Aug 31, 2006, 3:18:06 PM

"Robert Seacord" <r...@sei.cmu.edu> wrote in message
news:clcm-2006...@plethora.net...

> was an intentional decision on our part, as in-line usage typically
> prevents/discourages a user from checking the return status of the
> function.

My point of view is that requiring the programmer to explicitly test
for correctness is not appreciably better than the current situation,
and that usage errors (as opposed to expected "failures" such as
testing for the existence of a file by name) are best handled by
throwing an exception, so that *some* strategy for handling such
errors is *always* in place. With nested exception handlers, this
strategy can be established at the lowest feasible level for an
intelligent recovery procedure, or allowed to default to a higher-
level strategy that provides a "coarser grained" recovery. The
programmer still retains total control (if he wants to exert it), but
the exceptional-case handling does not clutter the main-line logic.

And yes, nested exception handling is certainly possible in
Standard C; there have been several implementations.
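
One common way to get nested handlers in Standard C is a linked stack of
jmp_buf records driven by setjmp/longjmp. The sketch below is only an
illustration of that general technique, not any of the implementations
referred to above; every name in it (handler, throw_error, risky, inner,
outer, ERR_MINOR, ERR_MAJOR) is invented for the example.

```c
#include <setjmp.h>
#include <stddef.h>
#include <stdlib.h>

enum { ERR_MINOR = 1, ERR_MAJOR = 2 };

/* One record per active handler; records form a stack through prev. */
typedef struct handler {
    jmp_buf env;
    struct handler *prev;
} handler;

static handler *top = NULL;    /* innermost active handler */
static int pending = 0;        /* code of the error being delivered */

/* Deliver an error to the innermost handler, popping it first. */
static void throw_error(int code)
{
    handler *h = top;

    if (h == NULL)
        abort();               /* no handler installed anywhere */
    top = h->prev;
    pending = code;
    longjmp(h->env, 1);
}

/* Stands in for real work that may detect a usage error. */
static void risky(int code)
{
    if (code != 0)
        throw_error(code);
}

/* Low-level strategy: recover from ERR_MINOR here, pass the rest up.
   Returns 0 on success, 1 if it recovered locally. */
static int inner(int code)
{
    handler h;

    h.prev = top;
    top = &h;
    if (setjmp(h.env) == 0) {
        risky(code);
        top = h.prev;          /* normal exit: pop our handler */
        return 0;
    }
    if (pending == ERR_MINOR)
        return 1;
    throw_error(pending);      /* rethrow to the enclosing handler */
    return -1;                 /* not reached */
}

/* Higher-level, coarser strategy: catch whatever inner let through.
   Returns inner's result, or 2 if the error was handled here. */
int outer(int code)
{
    handler h;

    h.prev = top;
    top = &h;
    if (setjmp(h.env) == 0) {
        int r = inner(code);
        top = h.prev;
        return r;
    }
    return 2;
}
```

Here outer(0) returns 0, outer(ERR_MINOR) returns 1 because inner recovers
on its own, and outer(ERR_MAJOR) returns 2 because inner rethrows to the
enclosing handler. The main-line call to risky() carries no error tests.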

kuy...@wizard.net

Sep 17, 2006, 8:28:48 PM

James Dennett wrote:
> kuy...@wizard.net wrote:
> > James Dennett wrote:

...

> >> If you think of strcat in terms of its implementation
> >
> > or, in terms of it's specification by the standard,
>
> But the point was to think of the *intuitive* meaning of
> strcat, not its formally specified meaning. The standard
> doesn't capture the intuitive (or, if you prefer, naive)
> expectation. Which is fine by me, as I don't expect that
> intuition will be sufficient for robust programming.

I think that was pretty much my point. I didn't have any intuition
about what strcat() would do when I first heard about it. I read its
specification, and expected it to operate as specified. For functions
with more generic names, such as "open", I do have some expectations
that should be met, though they're pretty vague expectations. But
"strcat"? I know it's short for "string catenate", but that's only
obvious after having read the specification. Anybody who would
abbreviate a name that way isn't giving me any intuitive expectations
about what they might have intended by the name. It's not quite as bad
as "grep", but it comes close.

Douglas A. Gwyn

Sep 17, 2006, 8:28:56 PM

James Dennett wrote:
> But the point was to think of the *intuitive* meaning of
> strcat, not its formally specified meaning.

Since intuition is so subjective, more care is needed.
*String concatenation* is one thing, strcat is another.
The concatenation of "abc" with "efg" is "abcefg", but
when you're talking about the data accessed by strcat
there are also storage locations involved, not some
abstract value space. And it is not at all evident
that there is only one "right" way to handle that
storage in cases where there is overlap between input
and output.

From the point of view of run-time efficiency, if a
strcat-like function were required to produce well-
defined behavior of the kind that some seem to think
is desired, it would have to, for *every* invocation,
perform some additional testing to determine whether
there is overlap, or else it would have to use a
considerably less efficient method all the time. The
trade-off would not be acceptable to many users (who
currently don't have any problem using strcat properly).

We went through this with memcpy, and the result was
to provide a separate "better-defined" function memmove.
Note that the advent of memmove did not cause a mass
exodus away from using memcpy, because many programmers
are able to use both of them as appropriate, and cases
of potential overlap are relatively uncommon.
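
A short example of using each "as appropriate". The helper names
prepend_char and copy_record are hypothetical; the point is only which
standard function fits which situation.

```c
#include <stddef.h>
#include <string.h>

/* Insert ch at the front of the string in buf. The source and destination
   ranges overlap, so memmove is required here. The caller must ensure
   buf has room for one more character. */
void prepend_char(char *buf, char ch)
{
    size_t n = strlen(buf) + 1;    /* include the terminator */

    memmove(buf + 1, buf, n);
    buf[0] = ch;
}

/* Copy n bytes between two distinct objects. The caller guarantees that
   they do not overlap, so memcpy is appropriate. */
void copy_record(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}
```

With char b[8] = "bcd", prepend_char(b, 'a') leaves "abcd" in b. Replacing
the memmove with memcpy there would make the behavior undefined.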

> The standard
> doesn't capture the intuitive (or, if you prefer, naive)
> expectation. Which is fine by me, as I don't expect that
> intuition will be sufficient for robust programming.

Indeed, we hear frequent demands that the C standard
ought to specify things so that programmers don't need
to know the specifications or think about what they're
doing. That approach to programming doesn't get one
very far before getting into trouble.

webs...@gmail.com

Sep 18, 2006, 4:24:56 AM

Douglas A. Gwyn wrote:
> James Dennett wrote:
> > But the point was to think of the *intuitive* meaning of
> > strcat, not its formally specified meaning.
>
> Since intuition is so subjective, more care is needed.
> *String concatenation* is one thing, strcat is another.
> The concatenation of "abc" with "efg" is "abcefg", but
> when you're talking about the data accessed by strcat
> there are also storage locations involved, not some
> abstract value space. And it is not at all evident
> that there is only one "right" way to handle that
> storage in cases where there is overlap between input
> and output.

So the correct answer is to twist your intuition to match the standard?
People's intuitions do not easily map to definitionism such as this.
By creating this alternate explanation, you are directly acting against
people's intuition. And in the end you are working really hard to
defend something that doesn't have a justifiably solid defence.

> From the point of view of run-time efficiency, if a
> strcat-like function were required to produce well-
> defined behavior of the kind that some seem to think
> is desired, it would have to, for *every* invocation,
> perform some additional testing to determine whether
> there is overlap, or else it would have to use a
> considerably less efficient method all the time.

This is utterly false on its face. I've given a solution *IN THIS
THREAD* that would obviously run in time equivalent to the
straightforward, typical non-safe method that is currently endorsed.

If such a thing were true, for example, Bstrlib would have a hard time
keeping up with the performance of the standard library (Bstrlib is
aliasing safe). Bstrlib annihilates the standard library on
performance across the board on many platforms and compilers that were
tested. This achievement does not come from nowhere of course -- just
doing a brief survey of the source code of most standard C compilers,
it is actually fairly straightforward to outperform the standard library
on many functions, particularly string functions. I have on my
assembly examples web page a demonstration for hugely accelerating the
implementation of "strlen" over most x86 compilers -- it's *much* faster
than either the expected implementation or nearly all implementations I
have actually encountered (Sun went ahead and implemented code similar
to mine for their Solaris compilers a few years ago). You cannot speak
of performance without speaking to implementation details.

You people who do not understand performance really should probably not
pretend to comment on it as if from a position of authority. In
particular you can't on the one hand claim that you can't characterize
performance of C in principle, then turn around and claim that some
change in specification will change performance, and of course,
ultimately just be plain wrong anyway.

> [...] The
> trade-off would not be acceptable to many users (who
> currently don't have any problem using strcat properly).

Since there is no down side (except slightly increased code footprint)
there would be nothing to object to.

> We went through this with memcpy, and the result was
> to provide a separate "better-defined" function memmove.

That is because on hardware that existed at the time there was a
measurable difference on that function. On today's hardware, there is
no difference in performance between memmove and memcpy, BTW. This is
not true of strcat, however, which would never be slower.

> Note that the advent of memmove did not cause a mass
> exodus away from using memcpy, because many programmers
> are able to use both of them as appropriate, and cases
> of potential overlap are relatively uncommon.

That's only because it was overshadowed by the larger mass exodus
towards using Perl, Python, C++, Java, etc. People who still use C,
mostly buy into C's weaknesses and just live with it for whatever
reasons.

> > The standard
> > doesn't capture the intuitive (or, if you prefer, naive)
> > expectation. Which is fine by me, as I don't expect that
> > intuition will be sufficient for robust programming.

That's a truism that exists in a narrow field of programming languages.
My point is that it exists to a large extent in C (especially its
libraries) for no good reason.

> Indeed, we hear frequent demands that the C standard
> ought to specify things so that programmers don't need
> to know the specifications or think about what they're
> doing.

Actually what we often hear is over-generalized hyperbole from the
committee and committee apologists who feel that they don't need to
address any problems with the language.

> [...] That approach to programming doesn't get one
> very far before getting into trouble.

You, of course, have never attempted to program in the language
"Python".

You have also lost sight, completely, of the whole point of this
thread. The whole idea of secure programming is concerned with dealing
with programmers' inadvertent bugs. My claim is that if you more
closely align the programming language with people's intuition and
expectation, then the number of bugs and security flaws will naturally
decrease.

TR 24731 misses this point completely, and instead just exposes the
flaws more explicitly. But you can always stuff RSIZE_MAX into the
extra length parameter, and basically gain no more security. Automated
tools can assist you in finding buffer overflow flaws, and potential
buffer overflow flaws based on old legacy code exactly as effectively
as trying to do so while porting to TR 24731. I.e., in real effect,
this proposal will actually do basically nothing. Robert Seacord has
clearly done much better by doing buffer management for you with his
proposed "managed strings" library, however, he has ignored the usage
and intuition impact.
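
To illustrate the point about the extra length parameter without relying
on an actual TR 24731 implementation, here is a hypothetical stand-in.
bounded_cat is not a TR 24731 function and its failure behavior is
simplified; it only shows that the check is as good as the bound the
caller supplies.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded append: dstmax is the caller's claim about how
   many bytes dst can hold. Returns 0 on success, -1 if the arguments are
   invalid or the result would not fit; dst is unchanged on failure. */
int bounded_cat(char *dst, size_t dstmax, const char *src)
{
    size_t dlen, slen;

    if (dst == NULL || src == NULL || dstmax == 0)
        return -1;
    for (dlen = 0; dlen < dstmax && dst[dlen] != '\0'; ++dlen)
        ;                              /* find the terminator within the bound */
    if (dlen == dstmax)
        return -1;                     /* dst not terminated within dstmax */
    slen = strlen(src);
    if (slen >= dstmax - dlen)
        return -1;                     /* no room for src plus terminator */
    memcpy(dst + dlen, src, slen + 1);
    return 0;
}
```

A call such as bounded_cat(buf, sizeof buf, s) is checked against the real
size of buf. A call such as bounded_cat(buf, SIZE_MAX, s) compiles just as
readily, and every test in the function then passes no matter how small
buf really is, so the overflow is back.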

C99 has not been widely adopted and it never will be, and it's primarily
because it just doesn't offer anything that people really care about.
People care about performance, safety, scalability, and C99 offers
precious little on any of those fronts, even though in this language there
is no shortage of fertile ground for expanding in all those areas. If
you want C0x to have any impact at all, you have to deliver something
on these fronts that you utterly failed to do with C99. And imho neither
TR 24731, nor managed strings rise to that level which is what I'm
trying to point out.

Robert Seacord and the Microsofties simply do not rise high enough to
meet the challenge, and the ANSI C committee is too blind to allow,
encourage or seek improvements anyways. The problem with these
proposals is that they don't go far enough to address the problem --
the committee reacts by saying that they go too far to solve a problem
that they don't believe exists.

CBFalconer

Sep 18, 2006, 6:39:52 AM

webs...@gmail.com wrote:
>
... snip ...

>
> You have also lost sight, completely, of the whole point of this
> thread. The whole idea of secure programming is concerned with
> dealing with programmer's inadvertent bugs. My claim, is that if
> you more closely align the programming language with people's
> intuition and expectation, then the number of bugs and security
> flaws will naturally decrease.

And you have lost track of the reasons for having different
languages in the first place. If you want secure programming,
there are quite adequate languages for the purpose, such as Ada and
Pascal. There is no need to destroy C.

--
"The most amazing achievement of the computer software industry
is its continuing cancellation of the steady and staggering
gains made by the computer hardware industry..." - Petroski

--
Posted via a free Usenet account from http://www.teranews.com

Bjorn Reese

Sep 18, 2006, 12:35:50 PM

CBFalconer wrote:

> And you have lost track of the reasons for having different
> languages in the first place. If you want secure programming,
> there are quite adequate languages for the purpose, such as Ada and
> Pascal. There is no need to destroy C.

What makes you think that C will be destroyed if it included better
solutions (that is, solutions with a better cognitive fit for the
majority of users) than TR 24731 and managed strings?

--
mail1dotstofanetdotdk

Douglas A. Gwyn

Sep 18, 2006, 6:49:04 PM

webs...@gmail.com wrote:
> So the correct answer is to twist your intuition to match the standard?

No, I'm saying that intuition is a function of experience,
and thus it may vary among individuals. You have heard in
this thread from people who deny that their intuition about
the expected behavior of strcat(a,a) matches yours.

> > From the point of view of run-time efficiency, if a
> > strcat-like function were required to produce well-
> > defined behavior of the kind that some seem to think
> > is desired, it would have to, for *every* invocation,
> > perform some additional testing to determine whether
> > there is overlap, or else it would have to use a
> > considerably less efficient method all the time.
> This is utterly false on its face. I've given a solution *IN THIS
> THREAD* that so obviously would run in equivalent time as the straight
> forward typical non-safe method that is currently endorsed.

On some platforms, compilers implement strcat, strcpy, and
similar functions using string-op microcoded instructions,
which can malfunction pretty badly when the source and
destination objects overlap. Thus to perform according to
your preferred specification, additional testing would be
necessary to detect that possibility and use alternate,
generally slower code when there would be a problem.
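
For concreteness, the "test, then fall back" structure described here might
look like the following sketch. The names ranges_overlap and strcat_checked
are hypothetical, and comparing addresses through uintptr_t assumes a flat
address space, which ISO C does not guarantee.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if [a, a+an) and [b, b+bn) share at least one byte.
   Assumption: converting pointers to uintptr_t preserves their ordering,
   which holds on common flat-memory platforms only. */
static int ranges_overlap(const void *a, size_t an, const void *b, size_t bn)
{
    uintptr_t pa = (uintptr_t)a;
    uintptr_t pb = (uintptr_t)b;

    return pa < pb + bn && pb < pa + an;
}

/* Append src to dst, with defined results even when the two overlap.
   The caller must ensure dst has room for the result. */
char *strcat_checked(char *dst, const char *src)
{
    size_t dlen = strlen(dst);
    size_t slen = strlen(src);         /* both measured before any write */
    char *dend = dst + dlen;

    if (!ranges_overlap(dend, slen + 1, src, slen + 1)) {
        memcpy(dend, src, slen + 1);   /* common case: fast path */
    } else {
        memmove(dend, src, slen);      /* overlap: use the tolerant copy */
        dend[slen] = '\0';
    }
    return dst;
}
```

The extra work on every call is the strlen of src plus one range
comparison; how much that matters next to a given platform's string
instructions is the point under dispute.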

> > Note that the advent of memmove did not cause a mass
> > exodus away from using memcpy, because many programmers
> > are able to use both of them as appropriate, and cases
> > of potential overlap are relatively uncommon.
> That's only because it was overshadowed by the larger mass exodus
> towards using Perl, Python, C++, Java, etc. People who still use C,
> mostly buy into C's weaknesses and just live with it for whatever
> reasons.

No, obviously I was talking about the effect on C programming.
If memmove's "superior" semantics were so attractive, it would
have supplanted memcpy *among C programmers*, but it hasn't.

> > Indeed, we hear frequent demands that the C standard
> > ought to specify things so that programmers don't need
> > to know the specifications or think about what they're
> > doing.

> > [...] That approach to programming doesn't get one
> > very far before getting into trouble.
> You, of course, have never attempted to program in the language
> "Python".

Actually I have, but I stand by my statement that catering
to ignorance and laziness is not good for software quality.

> You have also lost sight, completely, of the whole point of this
> thread. The whole idea of secure programming is concerned with dealing
> with programmer's inadvertent bugs. My claim, is that if you more
> closely align the programming language with people's intuition and
> expectation, then the number of bugs and security flaws will naturally
> decrease.

Or, you could align the intuition and expectation with reality
and get that same effect. strcat(a,a) is not something that
a reasonable C programmer would think of doing.

> TR 24731 misses this point completely, ...

I have my own criticism of that TR, largely on the basis that
it misses the real problem, which is quality control. Any
attempt at a technological solution to thoughtless programming
practice is doomed to fail, or at best to be an incomplete
solution. To the extent that attention is diverted from the
real causes of erroneous programs, it's a bad thing.

> the committee reacts by saying that they go too far to solve a problem
> that they don't believe exists.

I haven't heard anybody saying that. I have said that these
kinds of technical solutions try to solve the wrong problem.

webs...@gmail.com

Sep 19, 2006, 3:54:58 AM

Douglas A. Gwyn wrote:
> webs...@gmail.com wrote:
> > So the correct answer is to twist your intuition to match the standard?
>
> No, I'm saying that intuition is a function of experience,
> and thus it may vary among individuals. You have heard in
> this thread from people who deny that their intuition about
> the expected behavior of strcat(a,a) matches yours.

I have only heard people disagree with this *after* they have been
indoctrinated by the dictates of the C standard. I have not heard of
people (in this thread or otherwise) *unindoctrinated* whose intuition
is different. In this thread, these are just people mischaracterizing
what is meant by the word intuition.

But of course there is no shortage of people in this thread who are
honest. Remember I never *defined* what I thought strcat(p,p) should
do -- the honest people knew what I meant without prompting.

> > > From the point of view of run-time efficiency, if a
> > > strcat-like function were required to produce well-
> > > defined behavior of the kind that some seem to think
> > > is desired, it would have to, for *every* invocation,
> > > perform some additional testing to determine whether
> > > there is overlap, or else it would have to use a
> > > considerably less efficient method all the time.
> >
> > This is utterly false on its face. I've given a solution *IN THIS
> > THREAD* that so obviously would run in equivalent time as the straight
> > forward typical non-safe method that is currently endorsed.
>
> On some platforms, compilers implement strcat, strcpy, and
> similar functions using string-op microcoded instructions,

This has nothing to do with it ...

> which can malfunction pretty badly when the source and
> destination objects overlap. Thus to perform according to
> your preferred specification, additional testing would be
> necessary to detect that possibility and use alternate,
> generally slower code when there would be a problem.

And that is incorrect. Here:

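#include <string.h>

/* Append src to dst; dst and src may point into the same string. */
char * safestrcat (char * dst, const char * src) {
    if (*src) {
        char * dend = dst + strlen (dst);
        /* Copy everything after the first character of src.  When src
           lies inside dst, the bytes read end at dend and the bytes
           written start at dend + 1, so this strcpy sees no overlap. */
        strcpy (dend + 1, src + 1);
        *dend = *src;   /* finally overwrite the old terminator */
    }
    return dst;
}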

Now tell me this cannot be translated to an optimized solution on any
platform.

> > > Note that the advent of memmove did not cause a mass
> > > exodus away from using memcpy, because many programmers
> > > are able to use both of them as appropriate, and cases
> > > of potential overlap are relatively uncommon.
> > That's only because it was overshadowed by the larger mass exodus
> > towards using Perl, Python, C++, Java, etc. People who still use C,
> > mostly buy into C's weaknesses and just live with it for whatever
> > reasons.
>
> No, obviously I was talking about the effect on C programming.

So was I. It has *reduced* the number of C programmers. The problem
of memcpy() versus memmove() is minor by comparison, and programmers
left before they had to worry about such things. Some people in this
world carry things to their logical conclusion -- obviously memmove vs
memcpy is merely a single straw on the camel's back. And like any
classic public relations person, you argue for the straw.

> If memmove's "superior" semantics were so attractive, it would
> have supplanted memcpy *among C programmers*, but it hasn't.

First of all, it's not necessarily superior if 99% of the time you know
ahead of time that the memory is not going to overlap. The C textbooks
do an incredibly shoddy job on this point, and it's much like people
using p++ instead of p+=1. People just do what they are used to.

I didn't claim that memmove was necessarily superior. But
understanding the difference and using it as an analogy for people's
disingenuous claims about how they think about intuition served my
purposes (it's obvious from the pathetic responses in this thread that
barely anyone has a clue of how memmove is properly implemented, meaning
that implementation-based intuition is pretty ridiculous in real life).

> > > Indeed, we hear frequent demands that the C standard
> > > ought to specify things so that programmers don't need
> > > to know the specifications or think about what they're
> > > doing.
> > > [...] That approach to programming doesn't get one
> > > very far before getting into trouble.
> > You, of course, have never attempted to program in the language
> > "Python".
>
> Actually I have, but I stand by my statement that catering
> to ignorance and laziness is not good for software quality.

So you are aware of how to program in a more serious programming
language, and yet you appreciate nothing of it. Then clearly this gulf
is ideological.

This isn't about this false dichotomy you cling to with so much
ferocious desperation. If a programmer is lazy there is nothing you
can do about it. But if a programmer has a finite amount of energy it
might be worthwhile to meet them in the middle -- especially when it
doesn't actually cost you anything (outside of your own delusions and
paranoia I mean).

> > You have also lost sight, completely, of the whole point of this
> > thread. The whole idea of secure programming is concerned with dealing
> > with programmers' inadvertent bugs. My claim is that if you more
> > closely align the programming language with people's intuition and
> > expectation, then the number of bugs and security flaws will naturally
> > decrease.
>
> Or, you could align the intuition and expectation with reality
> and get that same effect.

That is why you fail.

You can't align people's intuition and expectation to anything. You
just can't do it. You can only get people to lie about it (obvious
reference to 1984); they "learn" that their intuition is wrong. You
will not get the same effect because you can't align people's
intuition. Worse yet, the attempt to do so is ridiculously expensive.

> [...] strcat(a,a) is not something that
> a reasonable C programmer would think of doing.

But it is something every reasonable programmer might think of doing.
Notice that the only real distinction is the word "C".

> > TR 24731 misses this point completely, ...
>
> I have my own criticism of that TR, largely on the basis that
> it misses the real problem, which is quality control.

Did you know that Microsoft has the largest quality control
organization of any software institution by far? They also produce the
most bugs of any software institution I have ever heard of too.

There have been real studies of this problem (I'm thinking of studies
cited by a luminary from Lucent who gave a talk on this from a few
years ago.) Post-development testing and QA tends to capture some
percentage of bugs, but is hardly the answer; there are just too many
corner cases, and people just don't think about them from the outside.

According to these studies, the best solutions were always the ones
that lived closest to the programmer while they were developing. The
best they found at the time was direct source code peer review (which
explains the rise of the recent "pair programming" paradigm in
"extreme programming".) But this is clearly too expensive and is going
to have a high degree of variability depending on the skill of the
reviewers.

I can attest to this, as the last serious bug I dealt with in Bstrlib
was based on a "memory overflow attack". An ordinary tester just would
not even be able to set up an appropriate test easily (I went back to
*16* bit compilers to set up my testing framework for this.) This is
one of those problems that would have lain dormant waiting to spring
its head as people started transitioning towards 64 bit systems for
standard development. The point, of course, is that nobody ever
reported this bug to me as nobody saw it fail in any test. It required
an insight by me, the developer, while reviewing the code. It was not
technically an independent review of course, except that I came back to
looking at it after a long time away from it (so it was an
approximation of "independent review".)

This leads us to obvious alternatives: 1) changing the programming
language 2) pervasive use of lint or defect detection tools 3)
modifying the language itself through libraries.

The vast majority of programmers have clearly chosen option #1.
Probably because #2 costs money and offers no guarantees, and there
hasn't been much culture for #3. Changing the standard could
address #1 and #3 simultaneously -- if only there were some motivation
to do so.

> [...] Any
> attempt at a technological solution to thoughtless programming
> practice is doomed to fail, or at best to be an incomplete
> solution.

That's why, of course, they don't make automatic shifting cars. And of
course, that's why they don't put seatbelts in cars either. After all,
they are just doomed technologies.

> [...] To the extent that attention is diverted from the
> real causes of erroneous programs, it's a bad thing.

If you make the real issue disappear through avoiding it at the point
of design, then it isn't a diversion. That is to say, it's possible to
solve some of these problems completely at the level of the programming
language itself. Once you consider this in the light of the
realization that programmers have a finite amount of energy with which
to produce programs, this should make the motivation for making the
programming language less error prone pretty compelling.

> > the committee reacts by saying that they go too far to solve a problem
> > that they don't believe exists.
>
> I haven't heard anybody saying that. I have said that these
> kinds of technical solutions try to solve the wrong problem.

Yeah, well it's easy to say things. Especially when you are not being
called to account for the things you say.

Richard Heathfield

Sep 19, 2006, 7:53:30 AM

webs...@gmail.com said:

> Douglas A. Gwyn wrote:
>> webs...@gmail.com wrote:
>> > So the correct answer is to twist your intuition to match the standard?
>>
>> No, I'm saying that intuition is a function of experience,
>> and thus it may vary among individuals. You have heard in
>> this thread from people who deny that their intuition about
>> the expected behavior of strcat(a,a) matches yours.
>
> I have only heard from people who disagree with this *after* they have been
> indoctrinated by the dictates of the C standard. I have not heard of
> people (in this thread or otherwise) *unindoctrinated* whose intuition
> is different. In this thread, these are just people mischaracterizing
> what is meant by the word intuition.
>
> But of course there is no shortage of people in this thread who are
> honest. Remember I never *defined* what I thought strcat(p,p) should
> do -- the honest people knew what I meant without prompting.

I haven't a clue what you think strcat(p, p) will do. Do you think that
makes me dishonest? Ah, but clever people know that I'm honest, so if you
think I'm dishonest, that makes you wrong and stupid.

Anyone can flame. Try constructing an argument that isn't offensive and
insulting, and maybe it'll be worth taking the time to read it.

<snip>

> Yeah, well it's easy to say things. Especially when you are not being
> called to account for the things you say.

Quite so.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Francis Glassborow

Sep 19, 2006, 8:03:30 AM

In article <1158652498.0...@h48g2000cwc.googlegroups.com>,
webs...@gmail.com writes

>Yeah, well it's easy to say things. Especially when you are not being
>called to account for the things you say.

There are numerous places where I would debate your statements in the
post from which this is a quote. However, the degree of heat, including
the direct personal attacks that litter your post, leads me to bin the
whole thing. I will confine myself to a single comment:

Intuition is not something that we are born with, it is a term that we
use to refer to our expectations based on prior experience. In that
sense intuition can be, and certainly is, 'educated.' Someone who fails to
adapt their intuition in the light of experience is old beyond their
years.

--
Francis Glassborow ACCU
Author of 'You Can Do It!' and "You Can Program in C++"
see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects

kuy...@wizard.net

Sep 19, 2006, 8:46:35 AM

webs...@gmail.com wrote:
> Douglas A. Gwyn wrote:
...

> > No, I'm saying that intuition is a function of experience,
> > and thus it may vary among individuals. You have heard in
> > this thread from people who deny that their intuition about
> > the expected behavior of strcat(a,a) matches yours.
>
> I have only heard from people who disagree with this *after* they have been
> indoctrinated by the dictates of the C standard. I have not heard of
> people (in this thread or otherwise) *unindoctrinated* whose intuition
> is different. ...

Well, I believe that you could reasonably argue that anyone who knows
enough about C to be a regular participant in this newsgroup has been
"indoctrinated". That makes it a perfect excuse for ignoring anything
anyone says about the matter that is inconsistent with your own
intuition.

> ... In this thread, these are just people mischaracterizing
> what is meant by the word intuition.
>
> But of course there is no shortage of people in this thread who are
> honest. Remember I never *defined* what I thought strcat(p,p) should
> do -- the honest people knew what I meant without prompting.

It seems to me that you've just called me dishonest, and implicitly
called me a liar. On what evidence are you basing your accusation that
I was lying when I wrote: "I didn't have any intuition about what
strcat() would do when I first heard about it. I read its
specification, and expected it to operate as specified."?

> You can't align people's intuition and expectation to anything. You
> just can't do it. You can only get people to lie about it (obvious
> reference to 1984); they "learn" that their intuition is wrong. You
> will not get the same effect because you can't align people's
> intuition. Worse yet, the attempt to do so is ridiculously expensive.

Realigning people's intuition is a routine, everyday event. Your
intuition is not some unalterable platonic ideal that you somehow
become mystically aware of. It is merely the subconscious expectations
you have developed based upon your prior experience, which means that
it's constantly changing as you acquire new experiences. If you have
prior experience with a language where strings are first class objects,
you might reasonably develop an intuition that says that catenating two
strings creates a new string containing a copy of the second string
appended to a copy of the first string. However, without such prior
experience, I don't see any reason why you'd have any particular
expectations about it.

When I first learned C, the three languages I already knew were Fortran
I (sic), APL, and Basic, in that order, with APL as my favorite of the
three. It's been so long since I've done any Fortran I or Basic that I
don't remember how they handled the equivalent of strcat(). Fortran I
was far more primitive than even Fortran IV - I'm not sure it even had
string catenation. APL supports string catenation with precisely the
semantics described above, but it uses the catenate operator ',' to do
it, not a specially named function, so I came to C with no prior
expectations for what a function named "strcat()" would do. If I had
relied upon my APL background, I would have interpreted strcat("hello
", "world!") as a call to a unary function with a single argument
formed by the ',' operator by catenating those two strings, with
redundant parentheses around the argument. It's a good thing I didn't
rely on my intuition for such purposes.

> > [...] strcat(a,a) is not something that
> > a reasonable C programmer would think of doing.
>
> But it is something every reasonable programmer might think of doing.
> Notice that the only real distinction is the word "C".

By using "but" you imply that you're accepting as true the statement
that you're responding to. The combination of his statement and your
statement implies that there is no overlap between the sets of
"reasonable programmers" and "reasonable C programmers". Since the
latter set is a subset of the former set, having an empty overlap
implies that the second set is empty. Are you a C programmer?

Ben Pfaff

Sep 19, 2006, 12:19:33 PM

Richard Heathfield <inv...@invalid.invalid> writes:

> I haven't a clue what you think strcat(p, p) will do. Do you think that
> makes me dishonest? Ah, but clever people know that I'm honest, so if you
> think I'm dishonest, that makes you wrong and stupid.

I think the meaning of strcat(p,p) is related to the question of
how many instances of "ana" there are in "banana". Both
questions have multiple reasonable answers.
--
"The way I see it, an intelligent person who disagrees with me is
probably the most important person I'll interact with on any given
day."
--Billy Chambless

Richard Heathfield

Sep 19, 2006, 12:51:24 PM

Ben Pfaff said:

> I think the meaning of strcat(p,p) is related to the question of
> how many instances of "ana" there are in "banana".

It depends on what kind of banana it is, and what language you speak.
(Curiously, according to M-W the Wolof for "banana" is "banaana".)

It seems that Mr Hsieh's interpretation of strcat(p, p) concerns a different
strcat in a different language, because sure as bananas is banaanas he
isn't talking about strcat in C. Even if he thinks he is.

David Wagner

Sep 19, 2006, 3:40:19 PM

Richard Heathfield wrote:
>It seems that Mr Hsieh's interpretation of strcat(p, p) concerns a different
>strcat in a different language, because sure as bananas is banaanas he
>isn't talking about strcat in C. Even if he thinks he is.

You're missing his point. Once you've been steeped in C lore, of course
you know what strcat() does. Any good C programmer does, once they've
learned the language. But learning the language, in this case, consists
of unlearning your intuition about what it means to concatenate strings.
The natural intuition about what strcat() should do would be that it
concatenates strings ("str" "cat", get it?). And there is a natural
definition of what it means to concatenate strings. Unfortunately,
C's strcat() does not use that natural definition; it does not match
the natural intuition you might have before you were exposed to C.
That's a problem, because it means that learning C requires unlearning
your intuition. Unlearning an old intuition and learning a new one
is twice as hard as learning something that you never had any prior
intuition about. Strcat() is just one example of this kind of phenomenon;
it occurs in many funny places in the C language. The folks here are
so steeped in C that they've probably forgotten what it was like to
originally learn C and be surprised by some of its oddball semantics.
Those oddball semantics were justified in the days of PDP-11 where CPU
performance was more important than making the programmer's life easier.
Today, those design choices are debatable.
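
To make the contrast concrete: in ISO C, strcat(p, p) is undefined because the source and destination overlap. A minimal sketch of an in-place append that does tolerate aliasing is shown below. The name cat_alias_safe and its capacity parameter are hypothetical; this is not the interface of Bstrlib, the managed string library, or TR 24731. The idea is simply to measure both strings before writing anything and to use memmove() for the copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append src to the string in dst, where dst is a
 * buffer of cap bytes already holding a terminated string.
 * Returns 0 on success, -1 if the result would not fit.
 * src may alias dst (e.g. cat_alias_safe(p, cap, p)). */
int cat_alias_safe(char *dst, size_t cap, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t dlen = strlen(dst);
    size_t slen = strlen(src);      /* measured before any write */

    assert(dlen < cap);             /* dst must be a valid string in cap */

    if (slen >= cap - dlen)
        return -1;                  /* no room for slen bytes plus '\0' */

    memmove(dst + dlen, src, slen); /* safe even if the regions overlap */
    dst[dlen + slen] = '\0';
    return 0;
}
```

With a 16-byte buffer holding "abc", cat_alias_safe(buf, sizeof buf, buf) leaves "abcabc" in buf. The extra cost over strcat() is one strlen() of the source, which is the kind of trade-off the thread is arguing about.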

Douglas A. Gwyn

Sep 19, 2006, 3:29:16 PM

Richard Heathfield wrote:
> It seems that Mr Hsieh's interpretation of strcat(p, p) concerns a different
> strcat in a different language, because sure as bananas is banaanas he
> isn't talking about strcat in C. Even if he thinks he is.

In fairness, he's arguing what he thinks it *should* be
(according to his notion, which he seems to think is the
only sensible or "honest" one).

One wonders whether he thinks that a general matrix
multiplication function matmul(a,b,c,l,m,n) is stupid
if it doesn't work when invoked as matmul(a,a,a,n,n,n).
The conceptual issues are pretty much the same as for
string concatenation.
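
The aliasing issue for matrices can be sketched as follows. The argument order and meaning here are assumptions made for illustration (c receives the l-by-n product of the l-by-m matrix a and the m-by-n matrix b, all row-major); the post does not define matmul. A straightforward triple loop writing directly into c gives wrong answers when c aliases a or b, so this hypothetical variant pays for a temporary buffer instead.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature: c (l x n) = a (l x m) * b (m x n), row-major.
 * Returns 0 on success, -1 on allocation failure or size overflow.
 * c may alias a and/or b. */
int matmul_alias_safe(const double *a, const double *b, double *c,
                      size_t l, size_t m, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    if (l == 0 || n == 0)
        return 0;                   /* empty result, nothing to do */
    if (l > SIZE_MAX / sizeof(double) / n)
        return -1;                  /* l * n * sizeof(double) would wrap */

    double *tmp = malloc(l * n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    for (size_t i = 0; i < l; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < m; k++)
                sum += a[i * m + k] * b[k * n + j];
            tmp[i * n + j] = sum;   /* inputs are never overwritten here */
        }
    }

    memcpy(c, tmp, l * n * sizeof *tmp);
    free(tmp);
    return 0;
}
```

The cost of tolerating matmul(a,a,a,n,n,n) is an allocation and an extra copy on every call, including the calls where nothing aliases. That is the trade-off being pointed at.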

David Wagner

Sep 19, 2006, 3:46:23 PM

>On what evidence are you basing your accusation that
>I was lying when I wrote: "I didn't have any intuition about what
>strcat() would do when I first heard about it. I read its
>specification, and expected it to operate as specified."?

Are you sure you really had zero intuitions or guesses about what
strcat() does before you read the manual? If I had told you that
strcat()'s semantics were to start up a flight simulator, play the
Yankee Doodle Dandy over the speakers, then erase every 7th file on the
filesystem, you wouldn't be surprised in the least? You wouldn't find
those semantics counterintuitive? If you say so, I'll believe you,
but if so, I doubt that your case is representative of the programmer
population at large. It's only natural to look at the name "strcat",
recognize that it is referring to concatenation of strings (anyone
who has used /bin/cat on Unix should know what "cat" is short for),
make a guess that maybe strcat() concatenates strings -- and then have
your guess be proven wrong. That's the sense in which strcat() doesn't
really match the natural intuition.

Of course, for those rare folks who approach strcat() with absolutely
zero intuitions, guesses, or preconceptions about its semantics before
reading the manual, it doesn't matter what semantics we assign to that
function, as long as we document them. But I would bet that the majority
of programmers start off with some intuitions or guesses about what a
function like strcat() does, just from its name, and if we violate those
intuitions, then that has a cost. It leads to programmer surprise, and
may lead to increased incidence of bugs. We shouldn't incur those kinds
of costs unthinkingly.

Harald van Dĳk

Sep 19, 2006, 3:53:18 PM

Not stupid, but unintuitive. Which may be acceptable if the advantages
are high enough.

Michal Necasek

Sep 19, 2006, 3:54:54 PM

David Wagner wrote:

> Are you sure you really had zero intuitions or guesses about what
> strcat() does before you read the manual?
>

I thought it was for stringing cats together. The name clearly
suggests that, doesn't it?

> It's only natural to look at the name "strcat",
> recognize that it is referring to concatenation of strings (anyone
> who has used /bin/cat on Unix should know what "cat" is short for),
>

There are quite a few C programmers who aren't all that familiar with
UNIX... any assumptions about familiarity with 'cat' are likely unfounded.

Michal

Richard Heathfield

Sep 19, 2006, 5:55:31 PM

Michal Necasek said:

> David Wagner wrote:
>
>> It's only natural to look at the name "strcat",
>> recognize that it is referring to concatenation of strings (anyone
>> who has used /bin/cat on Unix should know what "cat" is short for),
>
> There are quite a few C programmers who aren't all that familiar with
> UNIX... any assumptions about familiarity with 'cat' are likely unfounded.

Yup. I cut my C teeth in MS-DOS, and then moved on to Windows. I'd probably
been doing C for - oh, maybe eight or nine years before I finally settled
into a decent Linux distro. Slow thinker that I am, I was about to say that
I didn't know what /bin/cat is, but I just worked out that David means
cat(1), which I generally (mis?)use as a clone of MS-DOS's TYPE command.
So, for me, cat(1) means "clueful advanced type"!

David has a point that many of us probably don't remember when we first
learned strcat. But, being slower of thought than most other geniuses, I
was a slave to the docs when learning C - admittedly the Turbo C reference
manual rather than anything authoritative - and so I read up very carefully
on each function before trying to use it. Not always carefully enough,
alas, but that's humans for you.

But in a way, that's neither here nor there. I'm not overly interested in
what neophytes assume about C, except insofar as understanding those
assumptions helps me to help them when they seek such help. I'm certainly
not interested in changing the language for the sake of making C easier to
learn. It's already as simple as it can reasonably be expected to be (less
so since C99, though), and I think Einstein had exactly the right attitude
about simplicity.

No, what I object to is Mr Hsieh's apparent assumption that anyone who
disagrees with him is either stupid or dishonest. Just because /his/
intuition leads him to a particular conclusion, that doesn't mean that
other people who are at least as bright and honest as he is will
necessarily be led to the same conclusion by /their/ intuition.

Douglas A. Gwyn

Sep 19, 2006, 5:45:04 PM

David Wagner wrote:
> ...It's only natural to look at the name "strcat",
> recognize that it is referring to concatenation of strings (anyone
> who has used /bin/cat on Unix should know what "cat" is short for), ...

Good example; what should "cat a > a" do?

There is a big difference between abstract values in some
mathematical model space and actual behavior where real
devices have to be used to implement some approximation to
that model. Naturally it is nicer the closer these match,
but in reality there are often choices to be made and
various trade-offs to be evaluated. A computing
professional can reasonably be expected to learn the
documented properties of the components with which he works.

kuy...@wizard.net

Sep 19, 2006, 6:17:10 PM

David Wagner wrote:
> >On what evidence are you basing your accusation that
> >I was lying when I wrote: "I didn't have any intuition about what
> >strcat() would do when I first heard about it. I read its
> >specification, and expected it to operate as specified."?
>
> Are you sure you really had zero intuitions or guesses about what
> strcat() does before you read the manual?

Yes. I find nothing intuitive about that name, it clearly says string
catenation, but those words never did, and still don't, convey a unique
meaning to me.

> ... It's only natural to look at the name "strcat",
> recognize that it is referring to concatenation of strings

Yes, after reading the documentation I realized that this is what
strcat() was an abbreviation for, but it's not something I would
consider obvious without the other str*() functions as analogs. All of
the str*() functions are described on the same man page, so I was
introduced to all of them at the same time.

Even knowing that it does string catenation doesn't resolve the issue.
Does it work on null-terminated strings, counted strings, or strings
delimited by '"' characters? Is it an in-place catenation, or a
catenation that puts the result in a new location - I know of
non-programming uses of the word catenate that could justify either
expectation. If it is an in-place catenation, how does it handle the
possibility of overlap between the input strings? If it did use a new
location, is that location user-provided, static, or dynamically
allocated? Until I read the documentation for strcat(), I didn't know
the answers to any of those questions, and had no particular
expectations. After I read the documentation, I knew, and didn't have
any seriously preconceived expectations that needed to be revised.
If I'd ever used another library before which had a function named
something like strcat(), it would be a different matter, I'd expect
similar behavior to whatever that other library did. But I got all of
my initial expectations for strcat() from the documentation.
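
One of the alternatives listed above, null-terminated strings with the result placed in a new, dynamically allocated location, might look like the sketch below. The name concat_new is hypothetical and is not taken from any library discussed in this thread. Because neither input is modified, the overlap question does not arise; the price is an allocation that the caller must free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string holding a
 * followed by b, or NULL on allocation failure or size overflow.
 * The caller owns the result and must free() it. */
char *concat_new(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    size_t la = strlen(a);
    size_t lb = strlen(b);

    if (lb > SIZE_MAX - la - 1)
        return NULL;                /* la + lb + 1 would wrap around */

    char *result = malloc(la + lb + 1);
    if (result == NULL)
        return NULL;

    memcpy(result, a, la);
    memcpy(result + la, b, lb + 1); /* includes the terminating '\0' */
    return result;
}
```

Here concat_new(p, p) is well defined for any string p, whereas strcat(p, p) is not.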

Richard Heathfield

Sep 19, 2006, 6:52:31 PM

Douglas A. Gwyn said:

> David Wagner wrote:
>> ...It's only natural to look at the name "strcat",
>> recognize that it is referring to concatenation of strings (anyone
>> who has used /bin/cat on Unix should know what "cat" is short for), ...
>
> Good example; what should "cat a > a" do?

So I was curious...

me@here:~> cat > del.me
Now is the time for all good men to party.
me@here:~> cat del.me > del.me
cat: del.me: input file is output file
me@here:~> cat < del.me > del.me
me@here:~> cat del.me
me@here:~>

I was impressed that cat was able to detect the first collision. I didn't
expect it. So I was half-expecting it to be able to handle the second, too,
and mildly disappointed when it couldn't.

<snip>

> A computing
> professional can reasonably be expected to learn the
> documented properties of the components with which he works.

You'd've thunk so, wouldn't you? But nowadays, many computing professionals
can't even work out which way up an email is supposed to go. Sigh.

David Wagner

Sep 19, 2006, 7:06:27 PM

Richard Heathfield wrote:
>David has a point that many of us probably don't remember when we first
>learned strcat. But, being slower of thought than most other geniuses, I
>was a slave to the docs when learning C - admittedly the Turbo C reference
>manual rather than anything authoritative - and so I read up very carefully
>on each function before trying to use it. Not always carefully enough,
>alas, but that's humans for you.

I wonder if part of the issue here is that folks on this newsgroup are
not representative of the programmer population at large. The folks
on this newsgroup are incredibly knowledgeable about C. Most of the
folks who post here can be fairly characterized as C gurus -- you all
are 3 sigmas out. But many C programmers are not C gurus. Anyone can
design a language where it is _possible_ for gurus to program securely.
Anyone can design a language where it is possible for people who are 3
sigmas out and who have memorized the official language specification to
program securely. But the important challenge is to design a language
where most programmers -- say, programmers who are 1 sigma out, but who
aren't necessarily experts on the official C spec -- can build programs
securely. The real challenge is to design languages that maximize the
chances that programs built by ordinary mortals will be secure.

I think there is a temptation to assume that everyone in the world is
like you. There is a temptation to design the language to optimize for
a user audience who looks just like the designers. But that temptation
is dangerous, because the language designers (and the folks who post to
this newsgroup) are not representative of programmers at large.

Too many of the posts here seem to draw a false dichotomy: either you are
a C guru (of the level of folks who post here), or else you are a clueless
neophyte. But in reality, things are not black-and-white. There is
an enormous population of programmers who are not clueless neophytes,
but who are also not experts on the esoterics of the official C99 spec.
Just because they don't have all the corner cases of the spec memorized
doesn't mean that they are idiots. I submit that we should be thinking
about how to design our languages and libraries with that vast majority
of programmers in mind as our intended user audience. Most programmers
are well-intentioned but not infallible. Many programmers are expert
in one area or another, but not necessarily experts on the C spec.

I suggest that we should be thinking about how to design the language
and libraries to minimize the chances that these programmers will
inadvertently introduce security bugs and to maximize the chances that
the code they write will be secure. Among other things, this involves
choosing our APIs (both the semantics of the interfaces, and the names of
the interfaces) to minimize the cognitive burden, to make the semantics
as intuitive as possible, and to make the names help to remind you of
the actual semantics as much as possible. Maybe it's too late to make
good choices here for C. But even if we're stuck with the choices that
were made long ago, we should try to appreciate the costs of those choices
forthrightly -- not downplay or ignore them.

P.S. I'm not necessarily talking about making it easier to learn the
language. I'm talking about making it easier to avoid making mistakes
in the language, and about making it easier to avoid inadvertently
introducing bugs and security holes into your code. Well-chosen libraries
can help with that. Making the names match the semantics can help
with that. Choosing the semantics to be as intuitive as possible can
help with that. This is just a matter of good taste, good design,
and good engineering.

>No, what I object to is Mr Hsieh's apparent assumption that anyone who
>disagrees with him is either stupid or dishonest. Just because /his/
>intuition leads him to a particular conclusion, that doesn't mean that
>other people who are at least as bright and honest as he is will
>necessarily be led to the same conclusion by /their/ intuition.

Understandable. I agree with you here. But I also think he has some
valid points that maybe people haven't fully appreciated, and so I wanted
to call out those points that do seem like they are worth discussing.

Richard Tobin

Sep 19, 2006, 7:22:47 PM

In article <aMWdnWccRev...@bt.com>,
Richard Heathfield <inv...@invalid.invalid> wrote:

>me@here:~> cat del.me > del.me
>cat: del.me: input file is output file
>me@here:~> cat < del.me > del.me

>I was impressed that cat was able to detect the first collision. I didn't
>expect it. So I was half-expecting it to be able to handle the second, too,
>and mildly disappointed when it couldn't.

The second case can make sense, or rather cat can't so easily tell
that it doesn't, because it doesn't open both the files. When cat's
input and output are both provided by the operating system, the source
and destination could be in the same file and yet not overlap.
Suppose that standard input is positioned 100 bytes from the end of a
file that's more than 200 bytes long, and standard output is
positioned at the beginning. I would expect the last 100 bytes to be
copied to the beginning, and this works on the machines I tried it on:

(cp junk junk1; (dd bs=100 count=2 >/dev/null ; cat) <junk1) >junk1

(junk1 should contain, say, 100 a's, 100 b's, and 100 c's. The cp is
necessary to get something in the file junk even though the shell
redirection empties it).

In your first case, cat is always copying from the beginning of the
file, so the source and destination are bound to overlap. I suppose
cat could examine the initial file offsets to determine that your
second case can't work either.
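
For what it's worth, a check of the "input file is output file" kind can be sketched on a POSIX system by comparing device and inode numbers from fstat(). This is an assumption about how such a test could be written, not a description of what any particular cat does, and the function names are hypothetical. Note that it would also flag the non-overlapping case described above; telling the two apart would additionally need the file offsets.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if both descriptors refer to the same
 * file (same device and inode), 0 if not, -1 if fstat() fails. */
int same_file(int fd_a, int fd_b)
{
    struct stat sa, sb;

    if (fstat(fd_a, &sa) != 0 || fstat(fd_b, &sb) != 0)
        return -1;

    return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}

/* Convenience wrapper for the "cat < f > f" situation. */
int stdin_is_stdout(void)
{
    return same_file(STDIN_FILENO, STDOUT_FILENO);
}
```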

-- Richard

Michael Mair

Sep 19, 2006, 7:54:50 PM

David Wagner schrieb:

In principle, I agree with you.
However, I had not yet had a single English lesson when I first
encountered C and certainly had not heard of Unix and only vaguely of
MS-DOS.
I read the C book I had and tried to understand the explanations
in the appendix. As I mostly worked on system level, I did not use
much of the C standard library apart from sprintf() and some math
functions...
I was perfectly happy with the semantics I found because the
identifiers meant nothing at all to me.
Nowadays, I am _not_ perfectly happy if I find a thing that uses
the same name for semantics different from all other languages I
know. I accept it, though.

If there were a real need to change the semantics of the standard
library as is, then I'd support making names clear, throwing out
unnecessarily dangerous functions, and lifting unnecessary
restrictions.
Adopting one or more sensible libraries as standardised add-ons
to a hosted implementation to provide convenient,
easy-and-safe-to-use standardised ways of doing certain things
would have my approval, too, even though some care would be
necessary to make sure that programmers do not lock themselves
into too small a set of implementations by choosing certain
libraries.

As it is, I do not expect C to evolve any more (at least not
in an accepted way) from "C95 plus most popular parts of C99".

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

David Wagner

Sep 19, 2006, 8:42:25 PM

Douglas A. Gwyn wrote:
>There is a big difference between abstract values in some
>mathematical model space and actual behavior where real
>devices have to be used to implement some approximation to
>that model. Naturally it is nicer the closer these match,
>but in reality there are often choices to be made and
>various trade-offs to be evaluated.

Well, it sounded to me like Mr. Hsieh was trying to argue that
those alleged tradeoffs have been mischaracterized: that the
claims about the disadvantages of nice semantics are inaccurate.

As you say, it would be nicer if the actual semantics of strcat()
matched the natural mathematical semantics. It's nicer, because the
closer the match between actual and expected semantics, the greater the
likelihood that the program will be correct; conversely, the greater the
mismatch, the greater the chance of inadvertent correctness bugs due to
programmer confusion. So, obviously, we would prefer nicer semantics
over less nice semantics, all else being equal.

strcat() uses not-so-nice semantics. As I understand it, the standard
explanation why is because the performance costs of the nice semantics
would be too high -- or so it is claimed, anyway. As I understand
Mr. Hsieh's point, he is disputing that claim. He is arguing that
in fact the performance costs of the nice semantics (compared to the
current un-nice semantics) for strcat() are negligible. He is saying
that his string implementation manages to both use nicer semantics and
get better performance than the current string library. In other words,
he is calling into question the design decisions made in the current
string library.

I think Mr. Hsieh raises some important points. Nitpicks about
the meaning of intuition seem to me to miss the point. Claims that
programmers should "just read the manuals" seem to me to miss the point.
These design choices have consequences, and trying to shift all the
blame to the programmer does not seem like a fully satisfactory response
to me.

(I'm reminded of several plane crashes triggered by human factors flaws
in the pilot's user interface. Plane manufacturers love to blame those
crashes on "pilot error", but sometimes the root cause is that the user
interface was poorly designed and most pilots couldn't have been expected
to get it right. Transferring blame onto the pilot may be the best way to
save the manufacturer money, but these blame transfers aren't always
the best way to make aviation safer.)

Now it may be that, due to legacy considerations, we are stuck with a
sub-optimal string library, and even though better solutions exist, it's
not practically viable to adopt these better solutions. It may be that
there is no hope for improving the C language in this regard. It may be
that, in retrospect, the C language has flaws that weren't recognized
at the time it was designed and that contribute to many security holes
and correctness bugs. All of those things may well be the case. If so,
I think it would be most intellectually honest to admit up front that the
criticisms of the C standard have some validity, even if you believe it
isn't viable to fix those deficiencies at this point.

webs...@gmail.com

Sep 19, 2006, 9:26:01 PM

kuy...@wizard.net wrote:
> webs...@gmail.com wrote:
> > Douglas A. Gwyn wrote:
> ...
> > > No, I'm saying that intuition is a function of experience,
> > > and thus it may vary among individuals. You have heard in
> > > this thread from people who deny that their intuition about
> > > the expected behavior of strcat(a,a) matches yours.
> >
> > I have only heard from people who disagree with this *after* they have been
> > indoctrinated by the dictates of the C standard. I have not heard of
> > people (in this thread or otherwise) *unindoctrinated* whose intuition
> > is different. ...
>
> Well, I believe that you could reasonably argue that anyone who knows
> enough about C to be a regular participant in this newsgroup has been
> "indoctrinated". That makes it a perfect excuse for ignoring anything
> anyone says about the matter that is inconsistent with your own
> intuition.

Because we are seeing two clear categories of responses. Those that
just tell the obvious truth, and those that wish to continue to pursue
this ridiculous contrarian position as if aliasing is something
intertwined into people's intuition.

This categorization isn't just some delusion on my part -- I did not
define what I thought strcat(p,p) should do, yet many people figured it
out without any issue at all (it requires nothing more than honesty)
and on the other hand when I pressed the contrarians about what their
alias-based intuition is all about what do I see?

1) Claims that solutions that satisfy "Hsieh's intuition" (which is so
far still not explicitly declared in this thread, BTW) would be
"slower". Utterly false.
2) Claims that there is only one obvious implementation (and it can't
deal with aliasing) -- I gave one two posts ago that's pretty tight
that doesn't suffer any anti-intuition problem, and even satisfies the
current C specification.
3) Claims that "solving the problem" would require either special
detection or copying through auxiliary buffers.

Think about where your head has to be to be making such fallacious
statements. People who are indoctrinated by some ideology that is
false, can usually be exposed through any simple test of credibility
such as this. In Douglas A. Gwyn's case, his denial extended even
*past* the point where this was made clear (a case of unshakable
indoctrination). So I don't think my categorization is unjustified.

> > ... In this thread, these are just people mischaracterizing
> > what is meant by the word intuition.
> >
> > But of course there is no shortage of people in this thread who are
> > honest. Remember I never *defined* what I thought strcat(p,p) should
> > do -- the honest people knew what I meant without prompting.
>
> It seems to me that you've just called me dishonest, and implicitly
> called me a liar. On what evidence are you basing your accusation that
> I was lying when I wrote: "I didn't have any intuition about what
> strcat() would do when I first heard about it. I read its
> specification, and expected it to operate as specified."?

Did your first reading of the specification include a detailed
description about aliasing? Because that's the way I learned strcat
too -- I read the documentation. As I've pointed out either in this
thread or in others, the vast majority of documentation about strcat
*today* omits mention of the aliasing effect (MSDN appears to be
updated with this information -- however this documentation is very new
relative to when I learned C.)

The first few lines of any description of strcat tells us: "The strcat
function appends a copy of the string pointed to by src (including the
terminating null character) to the end of the string pointed to by
dst", we scroll down to the example they give, and we quickly form an
idea about what this function does. Now if *they* were diligent and
*you* were diligent, then you can read about how aliasing can screw
you in more documentation, but that clearly comes *after* your initial
understanding.

I.e., your intuition should kick in before you think about aliasing,
which is treated as a big nasty corner case for the whole language (and
thus commonly omitted in the standard documentation).

> > You can't align people's intuition and expectation to anything. You
> > just can't do it. You can only get people to lie about it (obvious
> > reference to 1984); they "learn" that their intuition is wrong. You
> > will not get the same effect because you can't align people's
> > intuition. Worse yet, the attempt to do so is ridiculously expensive.
>
> Realigning people's intuition is a routine, everyday event. Your
> intuition is not some unalterable platonic ideal that you somehow
> become mystically aware of.

Yeah, well buffer overflows being added into code even to this day is
also an everyday event. I am suggesting there is a link between those
two things.

> [...] It is merely the subconscious expectations
> you have developed based upon your prior experience, which means that
> it's constantly changing as you acquire new experiences. If you have
> prior experience with a language where strings are first class objects,
> you might reasonably develop an intuition that says that catenating two
> strings creates a new string containing a copy of the second string
> appended to a copy of the first string.

Actually having to *copy* the string is an implementation based
understanding. People are more likely going to think of strings in
terms of their *contents*. I.e., the contents of the first string,
appended with the second as a whole then is stored in the destination
variable. This is pretty much how I think about it in whatever
language I am using -- including the *first* language I learned.

> [...] However, without such prior
> experience, I don't see any reason why you'd have any particular
> expectations about it.
>
> When I first learned C, the three languages I already knew were Fortran
> I (sic), APL, and Basic, in that order, with APL as my favorite of the
> three.

For me it was BASIC, Fortran, Assembly, Pascal and Logo. I even *knew*
about the general aliasing problem because of assembly. (But assembly
is clearly in a special category -- you learn about aliasing from the
ground up.)

> [...] It's been so long since I've done any Fortran I or Basic that I
> don't remember how they handled the equivalent of strcat().

The Fortran language itself bans aliasing of any kind at the source
level. Basic can never have undefined, or truly bizarre behavior from
operations within its own syntax (Basic does not support any kind of
concept of pointers, beyond "peek" and "poke" which are obviously
extensions.)

> [...] Fortran I
> was far more primitive than even Fortran IV - I'm not sure it even had
> string catenation. APL supports string catenation with precisely the
> semantics described above, but it uses the catenate operator ',' to do
> it, not a specially named function, so I came to C with no prior
> expectations for what a function named "strcat()" would do. If I had
> relied upon my APL background, I would have interpreted strcat("hello
> ", "world!") as a call to a unary function with a single argument
> formed by the ',' operator by catenating those two strings, with
> redundant parentheses around the argument. It's a good thing I didn't
> rely on my intuition for such purposes.

Well, I pretty quickly saw the difference between syntax and semantics,
since no two languages I encountered in my early programming days seemed
to be the same. I'm sorry you let that subvert your intuition --
especially since there's no good reason for that to have occurred.

> > > [...] strcat(a,a) is not something that
> > > a reasonable C programmer would think of doing.
> >
> > But it is something every reasonable programmer might think of doing.
> > Notice that the only real distinction is the word "C".
>
> By using "but" you imply that you're accepting as true the statement
> that you're responding to.

That's because "reasonable" is subjective, so it's pointless to argue
against the point directly.

> [...] The combination of his statement and your
> statement implies that there is no overlap between the sets of
> "reasonable programmers" and "reasonable C programmers". Since the
> latter set is a subset of the former set, having an empty overlap
> implies that the second set is empty. Are you a C programmer?

So you've never seen a "reductio ad absurdum" argument before? Here:
http://en.wikipedia.org/wiki/Reductio_ad_absurdum

David R Tribble

Sep 19, 2006, 9:34:44 PM

kuyper wrote:
> When I first learned C, the three languages I already knew were Fortran
> I (sic), APL, and Basic, in that order, with APL as my favorite of the
> three. It's been so long since I've done any Fortran I or Basic that I
> don't remember how they handled the equivalent of strcat(). Fortran I
> was far more primitive than even Fortran IV - I'm not sure it even had
> string catenation. APL supports string catenation with precisely the
> semantics described above, but it uses the catenate operator ',' to do
> it, not a specially named function, so I came to C with no prior
> expectations for what a function named "strcat()" would do.

It's been years (decades) since I wrote any BASIC code, but it
does allow string concatenation:
LET A$ = A$ + A$
or, in some variants:
LET A$ = CONCAT$(A$, A$)

Ironically, even though BASIC has its roots in FORTRAN,
the former treats strings more like first-class objects than
the latter. FORTRAN strings are essentially the same as
C strings, i.e., fixed-length arrays of characters. You might
say that FORTRAN strings are to C char[]'s as BASIC strings
are to C++ std::strings.

Another very important point to be made is that C and FORTRAN pass
strings by reference, whereas other languages pass strings by
value. Thus func(a) can affect the contents of 'a' in the former but
not in the latter languages. I.e., strcat(s, s) affects the value of
s, but CONCAT(A$, A$) does not change A$.
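
To make the distinction concrete in C terms, here is a minimal sketch.
The names capitalize_ref, capitalized and struct boxed are made up for
this illustration. A char array decays to a pointer when passed, so the
callee can change the caller's data; a struct that wraps the array is
copied, so the callee only ever works on its own copy.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper type: a struct is passed by value, array and all. */
struct boxed { char text[16]; };

/* Receives a pointer to the caller's array and modifies it in place. */
static void capitalize_ref(char *s)
{
    assert(s != NULL);
    if (s[0] == 'h')
        s[0] = 'H';
}

/* Receives a copy of the struct; the caller's original is untouched. */
static struct boxed capitalized(struct boxed b)
{
    if (b.text[0] == 'h')
        b.text[0] = 'H';
    return b;
}
```

Calling capitalize_ref(s) changes s itself, whereas capitalized(b)
returns a new value and leaves b alone, which is roughly the BASIC
behavior described above.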

Which is the whole point: you can't use preexisting assumptions
from one language to guide you too deeply in learning a new one.

-drt

webs...@gmail.com

Sep 19, 2006, 9:48:17 PM

Douglas A. Gwyn wrote:
> Richard Heathfield wrote:
> > It seems that Mr Hsieh's interpretation of strcat(p, p) concerns a different
> > strcat in a different language, because sure as bananas is banaanas he
> > isn't talking about strcat in C. Even if he thinks he is.
>
> In fairness, he's arguing what he thinks it *should* be
> (according to his notion, which he seems to think is the
> only sensible or "honest" one).
>
> One wonders whether he thinks

You know I'm still in the room.

> [...] that a general matrix
> multiplication function matmul(a,b,c,l,m,n) is stupid
> if it doesn't work when invoked as matmul(a,a,a,n,n,n).

Actually in this case it's absolutely clear. Matrix multiplication is
of great concern to *non-programmers*. The mountain of steeped
intuition that you would be trying to undo by not supporting aliasing
is just enormous. Mathematics is not going to yield to computer
science (though * as multiplication has crept into publications, this
is clearly superficial) or more specifically bad computer
implementations. For example, in Fortran (the language many science
people start from) they already deal with the issue by disallowing
aliasing at the source level.

And once again, you are not thinking correctly about the performance.
The cost for getting it right is trivially small in comparison to
ignoring the aliasing case. In this case you really do want to detect
and copy -- but the cost of the main core is high enough that that's
just not going to make a difference.

> The conceptual issues are pretty much the same as for
> string concatenation.

Uh -- no, you just wish this for some reason.

You can implement an aliasing safe strcat() *TODAY* even compliant with
the standard as it exists, at no cost at all. And yet people don't
because people accept the shabby state of the C standard for some
reason. It's apathy all around, and for no good reason.

Matrix multiplication is a completely different issue -- that's a
concern for people who start from their mathematical intuition and
*WILL NOT* bend their understanding no matter what you tell them. You could
put it in the standard if you wanted, but people would just balk at it
for sure. In this case people actually care (I mean subject to the
assumption that they care about C for some reason); you just wouldn't
get away with it.

kuy...@wizard.net

Sep 19, 2006, 11:32:29 PM

webs...@gmail.com wrote:
> kuy...@wizard.net wrote:
> > webs...@gmail.com wrote:
> > > Douglas A. Gwyn wrote:
> > ...

...

> 1) Claims that solutions that satisfy "Hsieh's intuition" (which is so
> far still not explicitly declared in this thread, BTW)

I thought that you yourself had pretty explicitly stated the behavior
that you consider intuitive, in your message containing the header
"Date: 27 Jul 2006 14:00:07 -0700":

"Most people would intuitively think of this as simply replacing the
string with a doubled version of itself -- i.e., its analogous to the
C++ expression p += p for std::string's (and to be honest, I don't know
if that's legal or not), or just p = p + p in most other programming
languages."

> > > ... In this thread, these are just people mischaracterizing
> > > what is meant by the word intuition.
> > >
> > > But of course there are no shortage of people in this thread who are
> > > honest. Remember I never *defined* what I thought strcat(p,p) should
> > > do -- the honest people knew what I meant without prompting.
> >
> > It seems to me that you've just called me dishonest, and implicitly
> > called me a liar. On what evidence are you basing your accusation that
> > I was lying when I wrote: "I didn't have any intuition about what
> > strcat() would do when I first heard about it. I read its
> > specification, and expected it to operate as specified."?
>
> Did your first reading of the specification include a detailed
> description about aliasing?

It was 1979, and I no longer have access to the precise text that I was
reading at the time to verify whether or not its description was
complete. However, I do know that I never even considered using
strcat() in the fashion that you've called intuitive, so I suspect that
I must have learned about the restrictions somewhere along the way, and
almost certainly from that text. I don't believe it was as a result of
trial and error learning; I've generally seen little point in using
strcat() - the in-place catenation it performs has seldom been what I
needed. I've generally found sprintf() more convenient for most of the
purposes I might otherwise have used strcat() for.

> ... Because that's the way I learned strcat
> too -- I read the documentation. As I've pointed out either in this
> thread or in others, the vast majority of documentation about strcat
> *today* omits mention of the aliasing effect (MSDN appears to be
> updated with this information -- however this documentation is very new
> relative to when I learned C.)

I can't speak of "the vast majority of documentation", I only have
access to a small variety of different sources of documentation, but
the documentation I have access to does describe the problem clearly
and accurately.

...

> Well, I pretty quickly saw the difference between syntax and semantics,
> since no two languages I encountered in my early programming days seemed
> to be the same. I'm sorry you let that subvert your intuition --
> especially since there's no good reason for that to have occurred.

There was no "subversion", just the accumulation of additional
experiences from which to form my expectations for future experiences.

Douglas A. Gwyn

Sep 20, 2006, 10:08:20 AM

David R Tribble wrote:
> Another very important point to be made is that C and FORTRAN pass
> strings by reference, whereas other languages pass strings by
> value. Thus func(a) can affect the contents of 'a' in the former but
> not in the latter languages. I.e., strcat(s, s) affects the value of
> s, but CONCAT(A$, A$) does not change A$.

Yes, in C programming aliasing (storage) issues are important,
and an abstract-value model is of no help in resolving them.

James Antill

Sep 20, 2006, 6:04:43 PM

On Tue, 19 Sep 2006 11:53:30 +0000, Richard Heathfield wrote:

> I haven't a clue what you think strcat(p, p) will do.

I'd assume that Paul is using strcat() to mean "string concatenation",
where, I'd argue, most people expect an atomic concatenation. E.g.,
given:

x = "abcd"
y = x + x
x += x

...I'd argue that most people expect x to have the same value as y.
Stupidly converting that to C, you get:

char x[9];
char y[9];

strcpy(x, "abcd");
sprintf(y, "%s%s", x, x);
strcat(x, x);

...but, of course, the last call is bad.

And, to be fair to Paul, I've seen my share of C code that does something
like:

strcpy(x, x + 1);
sprintf(x, "%s%d", x, foo);

...under similar ignorance of the C API specifications.
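
For what it's worth, both calls can be written without overlapping
arguments. A rough sketch, assuming the caller knows the capacity of the
buffer; drop_first_char and append_int are names invented here, not
anything from the library under discussion:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Instead of strcpy(x, x + 1): memmove is specified to handle overlap. */
static void drop_first_char(char *x)
{
    assert(x != NULL);
    if (*x != '\0')
        memmove(x, x + 1, strlen(x));   /* strlen(x) bytes covers x + 1 plus its terminator */
}

/* Instead of sprintf(x, "%s%d", x, foo): format only the new part,
 * starting at the current end of the string.
 * Returns 0 on success, -1 if the result did not fit in cap bytes. */
static int append_int(char *x, size_t cap, int value)
{
    size_t len;
    int n;

    assert(x != NULL);
    len = strlen(x);
    assert(len < cap);
    n = snprintf(x + len, cap - len, "%d", value);
    return (n >= 0 && (size_t)n < cap - len) ? 0 : -1;
}
```

Neither function ever hands the same bytes to a library routine as both
source and destination, except through memmove, which allows it.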

--
James Antill -- ja...@and.org
http://www.and.org/and-httpd

Douglas A. Gwyn

Sep 21, 2006, 12:29:11 PM

James Antill wrote:
> And, to be fair to Paul, I've seen my share of C code that does something
> like:
> strcpy(x, x + 1);
> sprintf(x, "%s%d", x, foo);
> ...under similar ignorance of the C API specifications.

Indeed, there is a much broader issue than the mere specification
of the one function "strcat". As another poster pointed out, in
C such "array" parameters are actually pointer parameters, so the
array objects are essentially passed "by reference", not by the
object values; *in any PL* this would raise the question of what
happens when the references are aliases. Many PLs simply outlaw
programs that contain such aliasing. C allows the aliasing,
except when the programmer has used "restrict" qualification or
when the pointed-to types are incompatible, and it is then up to
the API designer to decide what the semantics are going to be.
In the case of strcat, the official specification clearly
indicates that aliased arguments are not allowed, via "restrict"
qualifiers on the parameters in the function declaration.
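
For reference, C99 declares the function as
char *strcat(char * restrict s1, const char * restrict s2). Below is a
sketch of an API making the same promise; append_disjoint is a
hypothetical name used only for illustration. The qualifier documents
the no-alias contract, it does not check it at run time.

```c
#include <assert.h>
#include <string.h>

/* restrict tells both the reader and the compiler that dst and src
 * must not overlap; calling this with overlapping arguments is
 * undefined behavior, exactly as with the standard strcat. */
static char *append_disjoint(char *restrict dst, const char *restrict src)
{
    char *end;

    assert(dst != NULL && src != NULL);
    end = dst + strlen(dst);
    while ((*end++ = *src++) != '\0')
        ;   /* copy up to and including the terminator */
    return dst;
}
```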

SuperKoko

Sep 21, 2006, 3:22:48 PM

What do you think that strcat should do?
Intuitively, for someone who has never used C but only other languages,
the unintuitive thing about strcat is not that strcat(p,p) doesn't
work, but that strcat("hello ", "world!") doesn't work!

Intuitively for a newbie, strcat(x,y) doesn't modify y nor x and
returns a new string...

Obviously, this semantic shall not change in next C standards (though,
higher level string functions might be added).

Now, more seriously.
Consider a C programmer who has just learnt that strcat doesn't do what
he expected (i.e. he now understands that the dest string must be
large enough to contain new characters, etc.).
In order to learn that he read the manual... And I do think that any
decent manual should warn the reader that dest and src must not be
aliased.
I just read the man page (in French) and that's mentioned. Good.
This programmer should have no real problem.
If you read a C99 manual, that would even be more obvious (thanks to
the restrict keyword).
Moreover, C99 programmers are very accustomed to the "restrict" keyword,
and know that aliasing might fail anywhere if they don't check the
manual.

But I fear that there are many Bad Manuals that don't specify it... (I
would like a confirmation, since I'm not sure).

Now, a second question:
For an intermediate C programmer who knows C enough to understand the
design principles of the C library... Does the expression strcat(p,p)
seem dubious enough to him that he checks the manual as soon as he
wants to use it?
The answer is not obvious... I think that many programmers would.... But
perhaps some programmers would not... In the latter case, it's really
harmful.
If that latter case is frequent enough, it might be worth revisiting
the specification of strcat.

Now, I think that there is a far more unintuitive semantic in the C
standard library, that may produce bugs in the code of advanced
programmers... I think it's unintuitive even for an experienced
programmer who thinks he knows the language well.

That's the semantic of memcpy and memmove if their third argument is 0.
They require that the pointers point to objects, implying that:
memcpy(dest, NULL, 0);
has undefined behavior (though it doesn't crash on implementations I
know).

1) I've read the man page of my Linux distrib, and that thing is NOT
specified in it! :(
I fear that most docs don't talk about that issue at all.
2) If the standard specified that this is valid and does nothing,
implementations would have only either a zero-overhead on
non-obfuscated platforms, or a minor overhead on those obfuscated
platforms, namely, a conditional test:

if (n>0) {
/* do the stuff */
}
Instead of:
/* do the stuff */

This overhead is almost negligible on platforms where memcpy and memmove
are not inlined (and, memcpy and memmove are not often inlined, at least
with implementations I know).
This overhead might be non-negligible on platforms where memcpy and
memmove are inlined, though it's not huge at all, and programmers are
accustomed to the fact that calling memcpy or memmove on very small
memory blocks has a fixed, non-negligible overhead, and program in a
way that doesn't greatly decrease performance if memcpy or memmove have
this fixed overhead.

My point is that, specifying that memcpy(dest, NULL, 0) is ok, would be
almost negligible on all existing platforms, and would have a
zero-overhead on many common platforms.

Now, allowing aliasing in strcat is probably far less beneficial, since
it's far less counter-intuitive for intermediate & advanced
programmers, and very uncommon (whereas code such as memcpy(dest,
ptr_that_might_be_null, size_that_might_be_zero) is not uncommon).

One might think that it's hard to write an efficient implementation of
strcat accepting pointer aliasing.
But there seems to be a portable, efficient implementation:

websn...@gmail.com wrote:
> char * safestrcat (char * dst, const char * src) {
> if (*src) {
> char * dend = dst + strlen (dst);
> strcpy (dend + 1, src + 1);
> *dend = *src;
> }
> return dst;
> }
>
>
> Now tell me this cannot be translated to an optimized solution on any platform.
>
It seems good to me. There seems to be no aliasing problem, and there
seems to be no overhead.

In that case, it might be worth considering either adding a safestrcat
function to the standard in the spirit of memcpy vs memmove, or updating
the specification of strcat (probably the simplest solution) and
optionally providing a no_alias_strcat.

kuy...@wizard.net

Sep 21, 2006, 3:51:54 PM

SuperKoko wrote:
...

> One might think that it's hard to write an efficient implementation of
> strcat accepting pointer aliasing.
> But there seems to be a portable, efficient implementation:
>
> websn...@gmail.com wrote:
> > char * safestrcat (char * dst, const char * src) {
> > if (*src) {
> > char * dend = dst + strlen (dst);
> > strcpy (dend + 1, src + 1);
> > *dend = *src;
> > }
> > return dst;
> > }
> >
> >
> > Now tell me this cannot be translated to an optimized solution on any platform.
> >

> It seems good to me. There seems to be no aliasing problem, ...

Not quite. I strongly suspect that anyone who's surprised by the fact
that strcat(p,p) has undefined behavior is likely to be equally
surprised by the fact that the following code also has undefined
behavior:

char name[] = "Paul Hsieh";
char* space = strchr(name, ' ');
if(space)
{
    *space = '\0'; // Split into two parts at space character.
    safestrcat(name, space+1); // Merge first and second part of name.
}
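
One way to get defined behavior for this case too is to measure both
strings before writing anything and let memmove do the copy. This is
only a sketch, under the made-up name overlap_strcat. It costs an extra
pass over src, so it is not the zero-overhead routine claimed above, and
the caller still has to provide enough room in dst.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical overlap-tolerant concatenation. dst must have room for
 * strlen(dst) + strlen(src) + 1 bytes; that is not checked here. */
static char *overlap_strcat(char *dst, const char *src)
{
    size_t dlen, slen;

    assert(dst != NULL && src != NULL);
    dlen = strlen(dst);
    slen = strlen(src);              /* measured before any byte is written */
    memmove(dst + dlen, src, slen);  /* memmove permits overlapping regions */
    dst[dlen + slen] = '\0';
    return dst;
}
```

With this version the name-splitting example and the plain
overlap_strcat(p, p) case both give the "value" result, because the
length of src is fixed before its terminator can be overwritten.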

David Wagner

Sep 21, 2006, 5:51:35 PM

SuperKoko wrote:
>And I do think that any decent manual should warn the reader that
>dest and src must not be aliased.

Of course, a good manual should mention all required preconditions.

But my suggestion is that it would have been even better still for
the semantics of strcat() to have been designed so that no such restriction
is needed. Whenever you introduce restrictions like that, they have costs.
They add some mental burden to the programmer, who has to remember whether
strcat() can be safely used in the presence of aliasing or not.
And they introduce an opportunity for error.

As Mr. Hsieh has pointed out, it is possible to have a cleaner semantics
for strcat(), to eliminate this opportunity for error, and all while
maintaining performance that is as good or better than current implementations
of strcat(). I consider that a pretty serious criticism.

This is not just about Good Manuals vs Bad Manuals. This is about
Good Design of library interfaces.

>For an intermediate C programmer who knows C enough to understand the
>design principles of the C library... Does the expression strcat(p,p)
>seem dubious enough to him that he checks the manual as soon as he
>wants to use it?
>The answer is not obvious... I think that many programmers would.... But
>perhaps some programmers would not... In the latter case, it's really
>harmful.
>If that latter case is frequent enough, it might be worth revisiting
>the specification of strcat.

Yes, of course some programmers may fail to double-check the manual,
perhaps because they don't realize the need or because they misremembered
what the manual said. As long as the number of programmers who fail
to check the manual is non-zero, there is some risk here. Why would we
want to take on unnecessary risks without any compensating benefits?

>That's the semantic of memcpy and memmove if their third argument is 0.
>They require that the pointers point to objects, implying that:
>memcpy(dest, NULL, 0);
>has undefined behavior (though it doesn't crash on implementations I
>know).

Interesting, I didn't know that. Thanks for pointing that out.

>Now, allowing aliasing in strcat is probably far less beneficial, [...]

The claim is not that the design of strcat() is somehow catastrophic.
On its own, it's a minor irritant -- very minor. But this is just one
example. If this is one example of a larger phenomenon, then the sum
of many minor irritants can become a major problem.

>One might think that it's hard to write an efficient implementation of
>strcat accepting pointer aliasing.

>But there seems to be a portable, efficient implementation: [...]

Yes, as Mr. Hsieh already stated.

av

Sep 21, 2006, 8:48:09 PM

Re: meaning of strcat(p,p)

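char* strcat(char* a, char* b)
{char *p, *h;
 if(a==0 || b==0) return 0;          /* 0 for error */
 for(p=a; *p; ++p) ;                 /* find the end of a */
 h=p;                                /* remember the original end of a */
 if(a==b) {for(; b<h; ) *p++=*b++;   /* copy only the original contents */
           *p=0;
           return a;
          }
 while(b!=a && *b) *p++=*b++;        /* stop if b runs into a */
 if(b==a) {*h=0; return 0;}          /* overlap: undo and report error */
 *p=0;
 return a;
}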

not tested...
0 for error
and "strcat(p,p);" is ok (if p has right mem space)
don't know for strcat(p, p+4); or strcat(p+4, p);

Douglas A. Gwyn

Sep 22, 2006, 1:34:43 PM

SuperKoko wrote:
> Intuitively, for someone who has never used C but only other languages,
> the unintuitive thing about strcat is not that strcat(p,p) doesn't
> work, but that strcat("hello ", "world!") doesn't work!
> Intuitively for a newbie, strcat(x,y) doesn't modify y nor x and
> returns a new string...

Good points.

> ... Does the expression strcat(p,p) seem dubious enough to him
> that he checks the manual as soon as he wants to use it?

It would to a moderately experienced C programmer. Actually this
is something a C99 compiler could flag, since the prototype has
restrict qualifiers on its pointer parameters.

> My point is that, specifying that memcpy(dest, NULL, 0) is ok, would be
> almost negligible on all existing platforms, and would have a
> zero-overhead on many common platforms.

The C standard does explicitly allow null pointer arguments for
a few functions, where a good case was made for programming
convenience. However, for a lot of cases such as memcpy, it is
not likely that a correct program would even try to invoke the
function with a null pointer, so even if one wanted to extend
the existing semantics to give meaning to that, it isn't clear
that one is doing the programmer any favor. Far better if the
function *traps* (assert fails, etc.) for a detectable invalid
argument, than that it silently provides some artificial
semantics that is probably not what the program was meant to do.

On the rare occasions when it makes sense for an algorithm to
have somehow created a null pointer value at that point in the
code, it is easy enough for the caller of memcpy to do his own
explicit test, rather than depending on memcpy to do it. That
is also more general, in that the programmer gets to pick the
appropriate semantics.
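
A caller-side test of that kind might look like the following sketch.
copy_bytes is a hypothetical wrapper, and skipping the call when n is
zero is just one possible policy; a caller could equally well choose to
trap instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: memcpy is never reached with n == 0, so a null
 * src or dst is harmless in that case. */
static void *copy_bytes(void *dst, const void *src, size_t n)
{
    if (n > 0) {
        assert(dst != NULL && src != NULL);   /* trap a genuinely invalid call */
        memcpy(dst, src, n);
    }
    return dst;
}
```

The policy lives in the caller's own code, so a program that wants the
assert for every call, including n == 0, can simply move it out of the
if.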

There are implementations of C library functions that do detect
many instances of improper pointer values, etc. The standard
does not require that, because even a single extra instruction
per function invocation can be considered unacceptable when
multiplied by all calls of every function for every program for
every platform, especially when there is *no value at all* for
correctly-written programs.

SuperKoko

Sep 23, 2006, 7:06:26 AM

Well, but this issue (I think) only concerns absolute beginners
(it might be interesting to add a new safe & easy to use string API,
but it would be a distinct API, of course).
Moreover, such a programmer will see that his program doesn't work
well, and will have to check the spec.

Anyway, the main aim, here, is to prevent intermediate programmers from
accidentally aliasing pointers when they don't know the strcat
specification well.

> Of course, a good manual should mention all required preconditions.
>
> But my suggestion is that it would have been even better still for
> the semantics of strcat() to have been designed so that no such restriction
> is needed. Whenever you introduce restrictions like that, they have costs.
> They add some mental burden to the programmer, who has to remember whether
> or not strcat() can be safely used in the presence of aliasing or not.
> And they introduce an opportunity for error.
>

I agree.

> As Mr. Hsieh has pointed out, it is possible to have a cleaner semantics
> for strcat(), to eliminate this opportunity for error, and all while
> maintaining performance that is as good or better than current implementations
> of strcat(). I consider that a pretty serious criticism.
>

Well, I'll look at the benefits/tradeoffs of specifying that:

Benefits:
1) Easier to remember the spec, especially for intermediate
programmers.
2) Fewer bugs in intermediate programmers' code caused by ignorance
of the spec.

Tradeoffs:
1) None I see.

Tradeoffs that are NOT present:
1) Compatibility problem:
It's perfectly compatible with the current standard
2) Performance problems:
No apparent performance tradeoff, even on exotic machines (the
solution given by Hsieh is portable).

How the transition would be done:
Current implementations really can misbehave when the arguments alias,
so, as a first step, implementations would integrate the new behavior
into their libraries.
As a second step, programmers would be able to use strcat(p,p)
reliably on all platforms.
In the interim, programmers would have to remember the old
specification of strcat, since new compliant implementations would not
yet be available on all platforms.
That's not a big deal, especially because such advanced programmers
would avoid aliasing everywhere, at least for a long time.
The transition will probably not be hard.

So, I guess you've entirely convinced me.
That's not a major modification, but a minor modification with
non-negligible (though not huge) benefits and no noticeable tradeoff.

> Yes, of course some programmers may fail to double-check the manual,
> perhaps because they don't realize the need or because they misremembered
> what the manual said. As long as the number of programmers who fail
> to check the manual is non-zero, there is some risk here. Why would we
> want to take on unnecessary risks without any compensating benefits?

Yes. My point was that, if the tradeoffs were really non-negligible
(but I think they're negligible or zero), it would be worth (though
it's no longer necessary) actually measuring whether this fraction of
programmers is 1/1000000 or 1/10000 or 1/1000 or 1/100 or
1/10. It's probably somewhere between a few percent and 1/1000. I
don't know the exact number.

webs...@gmail.com

Sep 23, 2006, 12:28:16 PM

It's a concern for programmers who do it, see that it works fine, and
are none the wiser.

Obviously I introduced strcpy into the equation, forgetting that that
guy has *technically* weak aliasing semantics too. Clearly I am
intending the "obvious implementation" of strcpy (noticing the irony
here) which would never have an aliasing problem in that situation. So
let me accept a correction and also require that it be written on top
of semisafestrcpy() which, however it's implemented (but 99% of existing
implementations already satisfy this criterion), does not fail so long
as the destination pointer doesn't alias to the same object *after* the
source parameter (something that is basically true today, even if the
spec doesn't require it). In such circumstances, the safestrcat()
function would work as advertised.
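
To make that property concrete, here is a minimal sketch of a plain
front-to-back copy. The name forward_strcpy is hypothetical and merely
stands in for semisafestrcpy(), whose actual code is not shown in this
post. Writes always trail the reads, so the copy is well behaved when
the destination sits at or before the source inside the same array,
and it is not safe when the destination points into the source string
after its start.

```c
#include <stddef.h>

/* Hypothetical stand-in for semisafestrcpy(): a simple forward copy.
 * Safe when dst is at or before src in the same array, or when the two
 * strings do not overlap. NOT safe when dst lies inside the source
 * string after src, because the terminator would be overwritten
 * before it is read. */
static char *forward_strcpy(char *dst, const char *src)
{
    size_t i = 0;
    char c;

    do {
        c = src[i];             /* read the byte first ... */
        dst[i] = c;             /* ... then write it at the same offset */
        i++;
    } while (c != '\0');        /* the terminator is copied as well */

    return dst;
}
```

For example, with char buf[16] = "abcdef", calling
forward_strcpy(buf, buf + 2) leaves "cdef" in buf.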

Keep in mind, that even using block techniques, hardware string
instructions, etc, there is no issue. It's only really twisted
implementations with non-linear memory where there can be a real
screw-up here.

> (it might be interesting to add a new safe & easy to use string API,
> but it would be a distinct API, of course).

This was the point of TR 24731 and "Managed Strings". (I've responded
claiming that they both failed to meet the standard of either safe or
easy, by missing aliasing and other issues.) Bstrlib exists and
although I have not submitted it as a proposal to the ANSI folks (there
are technical and philosophical reasons for not doing so) it should at
the very least serve as a model by which any other similar extensions
are judged. By his own admission, Robert Seacord says he didn't even
look at Bstrlib. I'm sorry, but at this point it's not just a matter of
my personal ego. Bstrlib, by now, has enough propagation that it would
*have* to be an obvious thing to at least check for ideas before going
off to write your own string library.

> Moreover, such a programmer will see that his program doesn't work
> well, and will have to check the spec.

Actually, it will work just fine on nearly any existing system. It's
not technically well defined, but the fact that it works anyway should
tell us something.

> Anyway, the main aim, here, is to prevent intermediate programmers from
> accidentally aliasing pointers when they don't know the strcat
> specification well.

Well there are other aims -- it reduces what you need to understand to
be able to use the library properly, and expands the functionality to
include anything that is reasonably possible.

> David Wagner wrote:
> > Of course, a good manual should mention all required preconditions.
> >
> > But my suggestion is that it would have been even better still for
> > the semantics of strcat() to have been designed so that no such restriction
> > is needed. Whenever you introduce restrictions like that, they have costs.
> > They add some mental burden to the programmer, who has to remember whether
> > or not strcat() can be safely used in the presence of aliasing or not.
> > And they introduce an opportunity for error.
>
> I agree.
>
> > As Mr. Hsieh has pointed out, it is possible to have a cleaner semantics
> > for strcat(), to eliminate this opportunity for error, and all while
> > maintaining performance that is as good or better than current implementations
> > of strcat(). I consider that a pretty serious criticism.
>
> Well, I'll look at the benefits/tradeoffs of specifying that:
>
> Benefits:
> 1) Easier to remember the spec, especially for intermediate
> programmers.
> 2) Fewer bugs in intermediate programmers' code caused by ignorance
> of the spec.

You forgot:

3) Increases the actual functionality of the function (how else am I
supposed to concatenate a string or some tail of it to itself?)

The current standard simply takes functionality *away* from us.

> Tradeoffs:
> 1) None I see.

Actually the ANSI C committee would have to swallow some pride or
something. That cost is apparently extremely high.

> Tradeoffs that are NOT present:
> 1) Compatibility problem:
> It's perfectly compatible with the current standard

Correct. Implementations today can *already* solve the problem if they
feel like it. But there is the ego/pride cost for the ANSI C
committee. So my main point is that the problem can and should be
addressed in TR 24731 and the "Managed String" proposals, where the
analysis is the same. (Notice that MS went on various excursions to
solve all sorts of multi-threading issues in TR 24731 which are
technically irrelevant to the scope of the ANSI standard -- and yet
they ignore aliasing (beyond sticking "restrict" everywhere) which is
an existing problem for all platforms.)

> 2) Performance problems:
> No apparent performance tradeoff, even on exotic machines (the
> solution given by Hsieh is portable).
>
> How the transition would be done:
> Current implementations really can misbehave when the arguments alias,
> so, as a first step, implementations would integrate the new behavior
> into their libraries.
> As a second step, programmers would be able to use strcat(p,p)
> reliably on all platforms.
> In the interim, programmers would have to remember the old
> specification of strcat, since new compliant implementations would not
> yet be available on all platforms.
> That's not a big deal, especially because such advanced programmers
> would avoid aliasing everywhere, at least for a long time.
> The transition will probably not be hard.

Oh no -- the damage in the real strcat() has been done. We can't
actually save that, since many compilers just will not be updated
(Microsoft's comments about C99 are very telling.) The new APIs (TR
24731 or Managed Strings) are really the only place where there is any
hope. Users of Bstrlib don't need to worry about any of this of
course.

> So, I guess you've entirely convinced me.
> That's not a major modification, but a minor modification with
> non-negligible (though not huge) benefits and no noticeable tradeoff.

The C library is filled with those, BTW.

> > Yes, of course some programmers may fail to double-check the manual,
> > perhaps because they don't realize the need or because they misremembered
> > what the manual said. As long as the number of programmers who fail
> > to check the manual is non-zero, there is some risk here. Why would we
> > want to take on unnecessary risks without any compensating benefits?

Sure. Now think about more experienced programmers who simply would
like to concatenate the tail of a string to the end of it. Should we
just throw our hands up? Give up any hope of hardware specific
acceleration and do things by hand (if I care about portability, for
example) and write a char-by-char loop? I guess we should compute the
length, and do things with memmove -- funny that, perhaps Bstrlib might
be worth taking a look at after all.
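
For what it's worth, the "compute the length, and do things with
memmove" route can be sketched in portable C as follows. The function
name cat_overlap_ok and its capacity parameter are hypothetical, and
this is not taken from Bstrlib or from any proposal discussed in this
thread; it is just one way to get a concatenation that stays defined
when src points into dst.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of any library discussed here.
 * Appends src to dst, where src may point into dst itself.
 * cap is the total size of the buffer that dst points to.
 * Returns dst, or NULL (leaving dst unchanged) if the result
 * would not fit. */
static char *cat_overlap_ok(char *dst, size_t cap, const char *src)
{
    size_t dlen = strlen(dst);  /* measure both strings before writing */
    size_t slen = strlen(src);

    if (dlen >= cap || slen >= cap - dlen)
        return NULL;            /* no room for the bytes plus terminator */

    memmove(dst + dlen, src, slen);  /* memmove tolerates overlap */
    dst[dlen + slen] = '\0';         /* terminate only after the copy */
    return dst;
}
```

With char s[16] = "hello", cat_overlap_ok(s, sizeof s, s + 3) yields
"hellolo", and cat_overlap_ok(s, sizeof s, s) on "abc" yields "abcabc".
Both lengths are taken before anything is written, so no pointer
comparison between the two arguments is needed.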