Is strncmp allowed to read past first difference?

Andreas Schwab

Apr 4, 2011, 4:53:43 AM

Is the following program strictly compliant?

#include <string.h>

char s1[10] = "1234567890";
char s2[10] = "1234567891";

int
main (void)
{
  return strncmp (s1, s2, 42) == 0;
}

Or in the context of the implementation, is strncmp allowed to peek past
the first differing character up to the size as passed to it, assuming no
zero byte occurs before it, even if that would cause the program to
crash if those addresses were inaccessible?

The standard says that strncmp does not peek past the first zero
character, but, AFAICS, doesn't say that it stops reading at the first
differing characters. So I would say that the program above is causing
undefined behavior.
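
For reference, here is a minimal sketch of the kind of implementation under which the program above never touches memory outside the two arrays. It is not taken from any real C library, and the name bytewise_strncmp is made up for illustration. It compares one character at a time and returns as soon as the result is known, so with the arrays above it stops at index 9.

```c
#include <stddef.h>

/* Hypothetical name; a byte-at-a-time comparison that returns as soon as
   the result is known, so it never reads past the first difference. */
int bytewise_strncmp(const char *s1, const char *s2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c1 = (unsigned char)s1[i];
        unsigned char c2 = (unsigned char)s2[i];
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;   /* first differing pair decides */
        if (c1 == '\0')
            return 0;                  /* both arrays end here */
    }
    return 0;                          /* n characters matched */
}
```

Whether an implementation is required to behave like this sketch, rather than merely allowed to, is exactly the open question.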

Andreas.

--
Andreas Schwab, sch...@redhat.com
GPG Key fingerprint = D4E8 DBE3 3813 BB5D FA84 5EC7 45C6 250E 6F00 984E
"And now for something completely different."

James Kuyper

Apr 4, 2011, 7:22:59 AM

On 04/04/2011 04:53 AM, Andreas Schwab wrote:
> Is the following program strictly compliant?
>
> #include<string.h>
>
> char s1[10] = "1234567890";
> char s2[10] = "1234567891";
>
> int
> main (void)
> {
> return strncmp (s1, s2, 42) == 0;
> }
>
> Or in the context of the implementation, is strncmp allowed to peek past
> the first differing character up to the size as passed to it, assuming no
> zero byte occurs before it, even if that would cause the program to
> crash if those addresses were inaccessible?
>
> The standard says that strncmp does not peek past the first zero
> character, but, AFAICS, doesn't say that it stops reading at the first
> differing characters. So I would say that the program above is causing
> undefined behavior.

One could argue that permission to read from the string pointed at
by s2 is implied only by the permission to compare characters. That
permission runs out at the first null character, or the first character
which does not match, or the Nth character, whichever comes first. If
you were to complain that this is a weak argument, I would have to agree.

However, without such an interpretation, most of the other string
functions in the standard library suffer from similar problems. I think
it's pretty clearly not the intent of the committee to allow any
standard library function which takes a pointer to a string as an
argument, to use that pointer to read past the end (or before the
beginning) of the string pointed at.

--
James Kuyper

Andreas Schwab

Apr 4, 2011, 9:17:53 AM

James Kuyper <james...@verizon.net> writes:

> However, without such an interpretation, most of the other string
> functions in the standard library suffer from similar problems. I
> think it's pretty clearly not the intent of the committee to allow any
> standard library function which takes a pointer to a string as an
> argument, to use that pointer to read past the end (or before the
> beginning) of the string pointed at.

But unlike the other string functions, the strncmp arguments are not
strings, but arrays of characters. What is unclear is whether the size
of each array is required to be at least n if it contains no zero bytes.
There is no such ambiguity in the other length-limited functions
(strncat and strncpy), since those always read n characters in the absence
of zero bytes.

lawrenc...@siemens.com

Apr 4, 2011, 7:57:26 PM

Andreas Schwab <sch...@redhat.com> wrote:

> Or in the context of the implementation, is strncmp allowed to peek past
> the first differing character up to the size as passed to it, assuming no
> zero byte occurs before it, even if that would cause the program to
> crash if those addresses were inaccessible?

I don't see anything in the description of the function that disallows
it, so I would say that it is allowed.
--
Larry Jones

Hello, I'm wondering if you sell kegs of dynamite. -- Calvin

Tim Rentsch

Apr 10, 2011, 3:51:35 PM

lawrenc...@siemens.com writes:

> Andreas Schwab <sch...@redhat.com> wrote:
>
>> Or in the context of the implementation, is strncmp allowed to peek past
>> the first differing character up to the size as passed to it, assuming no
>> zero byte occurs before it, even if that would cause the program to
>> crash if those addresses were inaccessible?
>
> I don't see anything in the description of the function that disallows
> it, so I would say that it is allowed.

The conclusion seems right but the reasoning seems wrong.
It does seem implicit in the description of strncmp that these
characters may be read, but I think it's more usual for
functions (and other language semantics) to be permitted
to do only what the Standard explicitly allows. Saying
the description doesn't disallow it and therefore it's allowed
is at odds with pretty much every other section in the Standard.

dbrower

Apr 11, 2011, 10:43:07 AM

On Apr 10, 12:51 pm, Tim Rentsch <t...@alumni.caltech.edu> wrote:

If a further read isn't going to SEGV, I don't see how the language rules
could prohibit it. If the observable result is the same, the implementation
is free to do what seems to it best. Having the library read the string in
32 or 64 bit words, w/o SEGV, is essentially the same as the hardware
reading a full cache line, isn't it?

-dB

Keith Thompson

Apr 11, 2011, 11:33:04 AM

Certainly it can read anything it likes if it doesn't change the
behavior. The question is whether it can read past the first
non-matching character even if it does change behavior.

The declaration of strncmp() is:

int strncmp(const char *s1, const char *s2, size_t n);

and the description is:

The strncmp function compares not more than n characters
(characters that follow a null character are not compared)
from the array pointed to by s1 to the array pointed to by s2.

The strncmp function returns an integer greater than, equal to,
or less than zero, accordingly as the possibly null-terminated
array pointed to by s1 is greater than, equal to, or less than
the possibly null-terminated array pointed to by s2.

I agree with Lawrence, based on the description, that strncmp() *may*
read all n characters, even if it doesn't have to. The standard
doesn't specify *how* strncmp() determines the result.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Tim Rentsch

Apr 11, 2011, 1:56:02 PM

dbrower <dbr...@gmail.com> writes:

Certainly an implementation is allowed to read the whole
buffer (or anything else for that matter) if it _doesn't_
change the observable behavior. The question is whether
an implementation is allowed to read the "unnecessary" non-null
characters if doing so _does_ change the observable behavior.

Tim Rentsch

Apr 11, 2011, 2:18:49 PM

Keith Thompson <ks...@mib.org> writes:

Like I said, I agree with his conclusion, just not with
the reasoning he used to get there.

> based on the description, that strncmp() *may*
> read all n characters, even if it doesn't have to. The standard
> doesn't specify *how* strncmp() determines the result.

Except strncmp() may not (observably) read any characters in
one of the argument arrays after the first null character in
that array. For example, I think everyone expects that code
like

char big[1000] = "foobas";
...
strncmp( big, "foo", 1000 );

is well defined, not undefined. The description seems intended
to allow each argument array to be read up to 'n' characters or
up to the first null character, whichever comes first, and this
allowance holds for each argument array independent of the
other. It would be nice if the description were more clear on
this point, although if someone asked for a list of passages in
the Standard that should be revised for clarity, this one
wouldn't be anywhere close to my top ten.
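
To make that reading concrete, here is a sketch of an implementation that relies on it. This is an illustration only, not code from any real library, and the names bounded_len and length_first_strncmp are invented. It first scans each array independently, up to n characters or the first null character, and only then compares. Under this reading the call with big and "foo" is fine, while a call on two unterminated 10-character arrays with n = 42 would read out of bounds.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of characters before the first null
   character, but never more than n. */
static size_t bounded_len(const char *s, size_t n)
{
    size_t i = 0;
    while (i < n && s[i] != '\0')
        i++;
    return i;
}

/* Hypothetical strncmp that scans both arrays before comparing anything. */
int length_first_strncmp(const char *s1, const char *s2, size_t n)
{
    size_t len1 = bounded_len(s1, n);
    size_t len2 = bounded_len(s2, n);
    size_t common = len1 < len2 ? len1 : len2;
    int r = memcmp(s1, s2, common);   /* compares as unsigned char */
    if (r != 0)
        return r;
    if (len1 == len2)
        return 0;
    /* The shorter array ended with a null character, which sorts first. */
    return len1 > len2 ? 1 : -1;
}
```

With big holding "foobas", length_first_strncmp(big, "foo", 1000) reads 7 characters of big and 4 of "foo" and returns a positive value, as strncmp would.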

William Ahern

Apr 11, 2011, 3:41:44 PM

Keith Thompson <ks...@mib.org> wrote:
<snip>

> Certainly it can read anything it likes if it doesn't change the
> behavior. The question is whether it can read past the first
> non-matching character even if it does change behavior.

> The declaration of strncmp() is:

> int strncmp(const char *s1, const char *s2, size_t n);

> and the description is:

> The strncmp function compares not more than n characters
> (characters that follow a null character are not compared)
> from the array pointed to by s1 to the array pointed to by s2.

> The strncmp function returns an integer greater than, equal to,
> or less than zero, accordingly as the possibly null-terminated
> array pointed to by s1 is greater than, equal to, or less than
> the possibly null-terminated array pointed to by s2.

> I agree with Lawrence, based on the description, that strncmp() *may*
> read all n characters, even if it doesn't have to. The standard
> doesn't specify *how* strncmp() determines the result.

C99 7.21.4p1 specifies this:

The sign of a nonzero value returned by the comparison functions
memcmp, strcmp, and *strncmp* is determined by the sign of the
difference between the values of the first pair of characters (both
interpreted as unsigned char) that differ in the objects being
compared.

If you read the definition of strncmp as giving permission to read only to
compare, and only to compare until a return value is derivable, then it
can't read past the first differing pair of characters. That's a tad more
persuasive than this thread has gotten so far, I think. But maybe not
sufficiently.

lawrenc...@siemens.com

Apr 11, 2011, 3:36:28 PM

Tim Rentsch <t...@alumni.caltech.edu> wrote:
> lawrenc...@siemens.com writes:
>
> > Andreas Schwab <sch...@redhat.com> wrote:
> >
> >> Or in the context of the implementation, is strncmp allowed to peek past
> >> the first differing character up to the size as passed to it, assuming no
> >> zero byte occurs before it, even if that would cause the program to
> >> crash if those addresses were inaccessible?
> >
> > I don't see anything in the description of the function that disallows
> > it, so I would say that it is allowed.
>
> The conclusion seems right but the reasoning seems wrong.
> It does seem implicit in the description of strncmp that these
> characters may be read, but I think it's more usual for
> functions (and other language semantics) to be permitted
> to do only what the Standard explicitly allows.

Yes, I was perhaps overly terse. The description of strncmp says that
it "compares not more than n characters" and goes on to say that
"characters that follow a null character are not compared", but fails to
say anything about characters that follow a mismatch. So, in the
absence of null characters, there seems to be general license to read
all n characters and no prohibition against reading past a mismatch.
--
Larry Jones

If I was being raised in a better environment, I wouldn't
do things like that. -- Calvin

Vincent Lefevre

Apr 14, 2011, 10:47:23 AM

In article <sq1c78-...@jones.homeip.net>,
lawrenc...@siemens.com wrote:

> Yes, I was perhaps overly terse. The description of strncmp says that
> it "compares not more than n characters" and goes on to say that
> "characters that follow a null character are not compared", but fails to
> say anything about characters that follow a mismatch.

Because when there is a mismatch, the result is well-defined. So,
the standard doesn't have to say anything.

Note that the text about the null character is necessary because
strncmp compares arrays of characters, not strings. Think about:

strncmp ("foo\0abc", "foo\0def", 5);

It should return 0. Without the text "characters that follow a null
character are not compared", the result would have been negative.
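
A small sketch of that difference, using only strncmp and memcmp from <string.h>. The helper name differs_only_after_null is made up for this illustration. It reports the case where strncmp sees equal arrays but the first n bytes are not all identical, which can only happen when the bytes differ after a null character.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 when strncmp treats the two arrays as
   equal although memcmp finds a difference within the first n bytes. */
int differs_only_after_null(const char *a, const char *b, size_t n)
{
    return strncmp(a, b, n) == 0 && memcmp(a, b, n) != 0;
}
```

For the arrays "foo\0abc" and "foo\0def" with n = 5 this returns 1: strncmp stops comparing at the null character, while memcmp goes on to compare 'a' with 'd'.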

> So, in the absence of null characters, there seems to be general
> license to read all n characters and no prohibition against reading
> past a mismatch.

But the standard says nothing about the size of the arrays; and in
particular, it doesn't say that the arrays have at least n characters.
So, I think that an implementation must not read more than necessary
to deduce the result (if this changes the behavior, like giving a
segfault).

--
Vincent Lefèvre <vin...@vinc17.net> - Web: <http://www.vinc17.net/>
100% accessible validated (X)HTML - Blog: <http://www.vinc17.net/blog/>
Work: CR INRIA - computer arithmetic / Arénaire project (LIP, ENS-Lyon)

lawrenc...@siemens.com

Apr 14, 2011, 1:40:39 PM

Vincent Lefevre <vincen...@vinc17.net> wrote:
>
> But the standard says nothing about the size of the arrays; and in
> particular, it doesn't say that the arrays have at least n characters.
> So, I think that an implementation must not read more than necessary
> to deduce the result (if this changes the behavior, like giving a
> segfault).

But it doesn't say that it only reads as many characters as are
necessary to determine the result, all it says is that it doesn't read
more than n characters and that it doesn't read past null bytes. Plus,
it says in 7.24.1 that "Various methods are used for determining the
lengths of the arrays...", and then goes on to say, "Where an argument
declared as size_t n specifies the length of the array...", which would
seem to apply in this case.
--
Larry Jones

I'll be a hulking, surly teen-ager before you know it!! -- Calvin

Vincent Lefevre

Apr 14, 2011, 7:37:11 PM

In article <n5oj78-...@jones.homeip.net>,
lawrenc...@siemens.com wrote:

> Vincent Lefevre <vincen...@vinc17.net> wrote:
> >
> > But the standard says nothing about the size of the arrays; and in
> > particular, it doesn't say that the arrays have at least n characters.
> > So, I think that an implementation must not read more than necessary
> > to deduce the result (if this changes the behavior, like giving a
> > segfault).

> But it doesn't say that it only reads as many characters as are
> necessary to determine the result, all it says is that it doesn't read
> more than n characters

No, it doesn't say that. It says: "The strncmp function *compares*
not more than n characters". "compares", not "reads". One could
imagine that an implementation could read more than n characters
(indeed some processors are faster when reading data by block),
which could be problematic.

So, one should assume that an implementation isn't allowed to do
more than what the standard requires (except if this doesn't change
the behavior). Otherwise more or less any function would have
undefined behavior.

> and that it doesn't read past null bytes.

ditto: "compare", not "read".

> Plus, it says in 7.24.1 that "Various methods are used for
> determining the lengths of the arrays...",

but it doesn't say what method is used.

> and then goes on to say, "Where an argument declared as size_t n
> specifies the length of the array...", which would seem to apply in
> this case.

The description of strncmp does not say that n specifies the length
of the array. And it's quite clear that it does *not* specify the
length of the array, as I wouldn't imagine that

strncmp (s, "foo", 100);

would be disallowed (the length of the array "foo" is 4, certainly
not 100).
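
A call of this shape is usually a prefix test. For code that wants to stay clear of the disputed wording, one possible sketch passes the length of the prefix as n, so the count never exceeds the size of the prefix array, and s is only read up to its own null character. The name has_prefix is hypothetical, and the sketch assumes both arguments are strings.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper; assumes s and prefix are null-terminated strings. */
bool has_prefix(const char *s, const char *prefix)
{
    /* n is the prefix length, so neither array is read beyond its
       terminating null character under any reading of the standard. */
    return strncmp(s, prefix, strlen(prefix)) == 0;
}
```

For example, has_prefix("foobar", "foo") is true and has_prefix("fo", "foo") is false.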

Dag-Erling Smørgrav

Apr 16, 2011, 4:13:50 AM

Vincent Lefevre <vincen...@vinc17.net> writes:
> So, one should assume that an implementation isn't allowed to do
> more than what the standard requires (except if this doesn't change
> the behavior). Otherwise more or less any function would have
> undefined behavior.

One should assume that an implementation is allowed to do whatever the
hell it pleases as long as the result is indistinguishable from what the
standard says. It is, after all, the *implementation*.

DES
--
Dag-Erling Smørgrav - d...@des.no

Tim Rentsch

Apr 16, 2011, 11:58:41 AM

lawrenc...@siemens.com writes:

Oh good, that's basically the same reasoning that
I followed.

Tim Rentsch

Apr 16, 2011, 12:26:36 PM

William Ahern <wil...@wilbur.25thandClement.com> writes:

An interesting argument, but not a convincing argument. This
paragraph applies equally to memcmp, strcmp, and strncmp, and
clearly the conclusion doesn't hold for memcmp (see 7.21.4.1p2);
so, it has no bearing on that aspect of strncmp. The statement
here is only descriptive of the return value for comparison
functions, not restrictive as to what objects can be read.

Tim Rentsch

Apr 16, 2011, 1:13:14 PM

Vincent Lefevre <vincen...@vinc17.net> writes:

> In article <n5oj78-...@jones.homeip.net>,
> lawrenc...@siemens.com wrote:
>
>> Vincent Lefevre <vincen...@vinc17.net> wrote:
>> >
>> > But the standard says nothing about the size of the arrays; and in
>> > particular, it doesn't say that the arrays have at least n characters.
>> > So, I think that an implementation must not read more than necessary
>> > to deduce the result (if this changes the behavior, like giving a
>> > segfault).
>
>> But it doesn't say that it only reads as many characters as are
>> necessary to determine the result, all it says is that it doesn't read
>> more than n characters
>
> No, it doesn't say that. It says: "The strncmp function *compares*
> not more than n characters". "compares", not "reads". [snip]

That seems like a silly distinction. Surely we can infer
that to compare a value in X and a value in Y we must
first read the value in X, and read the value in Y.

> So, one should assume that an implementation isn't allowed to do
> more than what the standard requires (except if this doesn't change
> the behavior). Otherwise more or less any function would have
> undefined behavior.

Forgive me for being harsh, but I think the reasoning
here is bogus. The same reasoning applied to strcmp
would imply that it's okay to call strcmp on strings
that aren't null terminated but differ in some earlier
(legal) position. I don't think that's what people
expect, and it seems unreasonable to believe that all
implementations observe such a restriction. For example,
it seems perfectly plausible for an implementation of
strcmp() to begin

size_t length_1 = strlen( s1 );
size_t length_2 = strlen( s2 );

and I expect most implementors would think these lines
wouldn't cause any problems with conformance.
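
Filled out, such an implementation might look like the sketch below. The name strlen_first_strcmp and everything after the two strlen calls are assumptions made for illustration, not something any particular library is known to do. The point is that both arguments are scanned to their terminating null characters before a single comparison is made.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strcmp that measures both strings before comparing them. */
int strlen_first_strcmp(const char *s1, const char *s2)
{
    size_t length_1 = strlen(s1);
    size_t length_2 = strlen(s2);
    size_t shorter = length_1 < length_2 ? length_1 : length_2;

    /* Include the terminating null of the shorter string, so a proper
       prefix compares less than the longer string. */
    return memcmp(s1, s2, shorter + 1);
}
```

Given an argument that is not null-terminated, this reads past the end of the array even when the two arguments differ in their first character.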

>> and that it doesn't read past null bytes.
>
> ditto: "compare", not "read".
>
>> Plus, it says in 7.24.1 that "Various methods are used for
>> determining the lengths of the arrays...",
>
> but it doesn't say what method is used.
>
>> and then goes on to say, "Where an argument declared as size_t n
>> specifies the length of the array...", which would seem to apply in
>> this case.
>
> The description of strncmp does not say that n specifies the length
> of the array. And it's quite clear that it does *not* specify the
> length of the array,

7.21.4.4p3 says in part:

accordingly as the possibly null-terminated array pointed to
by s1 is greater than, equal to, or less than the possibly
null-terminated array pointed to by s2.

Although it would benefit from being made more explicit, this
phrasing shows pretty clearly that strncmp() considers its arguments
to be, independently, possibly null-terminated arrays. And if
the arguments are indeed independent arrays, then n must specify
their lengths (if not null-terminated).

> as I wouldn't imagine that
>
> strncmp (s, "foo", 100);
>
> would be disallowed (the length of the array "foo" is 4, certainly
> not 100).

The 100 is meant as the length only if the array is not
null-terminated.

Considering the phrasing in 7.21.4.4p3, I think the only
sensible conclusion is that the intended semantics allows
up to 'n' characters to be read (but not past the first
null character) independently in either array, regardless
of the function's return value. Of course I would agree
that the existing wording could be improved and made more
explicit, but is there really any doubt here about what
reading is expected?

Phil Carmody

Apr 17, 2011, 8:45:31 AM

Tim Rentsch <t...@alumni.caltech.edu> writes:

> Vincent Lefevre <vincen...@vinc17.net> writes:
> > lawrenc...@siemens.com wrote:
> >> Vincent Lefevre <vincen...@vinc17.net> wrote:
> >> >
> >> > But the standard says nothing about the size of the arrays; and in
> >> > particular, it doesn't say that the arrays have at least n characters.
> >> > So, I think that an implementation must not read more than necessary
> >> > to deduce the result (if this changes the behavior, like giving a
> >> > segfault).
> >
> >> But it doesn't say that it only reads as many characters as are
> >> necessary to determine the result, all it says is that it doesn't read
> >> more than n characters
> >
> > No, it doesn't say that. It says: "The strncmp function *compares*
> > not more than n characters". "compares", not "reads". [snip]
>
> That seems like a silly distinction. Surely we can infer
> that to compare a value in X and a value in Y we must
> first read the value in X, and read the value in Y.

You have the implication the wrong way round. You may not compare
unless you've read, clearly, but you may read and then not compare.

An Alpha simply cannot read a single byte, for example, so it must
read more, even if it doesn't use the rest of the word for comparison.

Phil
--
"At least you know where you are with Microsoft."
"True. I just wish I'd brought a paddle." -- Matthew Vernon

Keith Thompson

Apr 17, 2011, 9:43:10 AM

Phil Carmody <thefatphi...@yahoo.co.uk> writes:
>> Vincent Lefevre <vincen...@vinc17.net> writes:
>> > lawrenc...@siemens.com wrote:
>> >> Vincent Lefevre <vincen...@vinc17.net> wrote:
>> >> >
>> >> > But the standard says nothing about the size of the arrays; and in
>> >> > particular, it doesn't say that the arrays have at least n characters.
>> >> > So, I think that an implementation must not read more than necessary
>> >> > to deduce the result (if this changes the behavior, like giving a
>> >> > segfault).
>> >
>> >> But it doesn't say that it only reads as many characters as are
>> >> necessary to determine the result, all it says is that it doesn't read
>> >> more than n characters
>> >
>> > No, it doesn't say that. It says: "The strncmp function *compares*
>> > not more than n characters". "compares", not "reads". [snip]
>>
>> That seems like a silly distinction. Surely we can infer
>> that to compare a value in X and a value in Y we must
>> first read the value in X, and read the value in Y.
>
> You have the implication the wrong way round. You may not compare
> unless you've read, clearly, but you may read and then not compare
>
> An Alpha simply cannot read a single byte, for example, so it must
> read more, even if it doesn't use the rest of the word for comparison.

Right, but on an Alpha, reading the word containing a given byte
can't cause any visible problems if reading the byte itself (if
that were possible) wouldn't; the boundary between readable memory
and non-readable memory can't occur in the middle of a word.

Of course an implementation can read anything it likes if doing so
has no effect.
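
The word-read argument comes down to address arithmetic, which can be sketched as follows. The assumptions here are mine, not the standard's: the word size is a power of two, the page size is a multiple of the word size, and memory protection works at page granularity. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: does the aligned word containing byte_addr lie
   entirely within the page containing byte_addr?
   Assumes word_size is a power of two and page_size is a multiple of it. */
bool aligned_word_stays_in_page(uintptr_t byte_addr,
                                uintptr_t word_size,
                                uintptr_t page_size)
{
    uintptr_t word_start = byte_addr & ~(word_size - 1);
    uintptr_t word_last = word_start + word_size - 1;
    uintptr_t page = byte_addr / page_size;
    return word_start / page_size == page && word_last / page_size == page;
}
```

Under those assumptions the function returns true for every address, which is why an aligned word read cannot fault when the byte read would not. An unaligned or multi-word read gives no such guarantee.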

The real question is: may an implementation of strncmp read extra bytes
*when that could have visible effects*?

The code in the original post was:

#include <string.h>

char s1[10] = "1234567890";
char s2[10] = "1234567891";

int
main (void)
{
return strncmp (s1, s2, 42) == 0;
}

The standard's description of strncmp is:

The strncmp function compares not more than n characters
(characters that follow a null character are not compared)
from the array pointed to by s1 to the array pointed to by s2.

The strncmp function returns an integer greater than, equal to,
or less than zero, accordingly as the possibly null-terminated
array pointed to by s1 is greater than, equal to, or less
than the possibly null-terminated array pointed to by s2.

Neither array contains a string (they're not '\0'-terminated), and
the third argument is 42, so one could argue that the call gives
strncmp permission to read as many as 42 bytes from each array.
If I write
s1[10] == s2[10]
my program's behavior is undefined; does writing
strncmp (s1, s2, 42)
potentially cause the same problem?

I know what the answer *should* be. strncmp shouldn't read past the
bounds of the array, because it doesn't need to. But I'm not quite
convinced that the standard explicitly forbids it to do so.
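
Until the wording is settled, a program can sidestep the question by never passing a count larger than the smaller array. A minimal sketch, reusing the two arrays from the original post; the name compare_within_bounds is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

static const char s1[10] = "1234567890";   /* no room for a terminating null */
static const char s2[10] = "1234567891";

/* Hypothetical helper: clamp n to the size of the smaller array so that
   strncmp has no license to read outside either object. */
int compare_within_bounds(size_t n)
{
    size_t limit = sizeof s1 < sizeof s2 ? sizeof s1 : sizeof s2;
    if (n > limit)
        n = limit;
    return strncmp(s1, s2, n);
}
```

compare_within_bounds(42) then behaves like strncmp (s1, s2, 10) and returns a negative value, whichever reading of the standard is correct.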

Tim Rentsch

Apr 17, 2011, 11:41:04 AM

Phil Carmody <thefatphi...@yahoo.co.uk> writes:

> Tim Rentsch <t...@alumni.caltech.edu> writes:
>> Vincent Lefevre <vincen...@vinc17.net> writes:
>> > lawrenc...@siemens.com wrote:
>> >> Vincent Lefevre <vincen...@vinc17.net> wrote:
>> >> >
>> >> > But the standard says nothing about the size of the arrays; and in
>> >> > particular, it doesn't say that the arrays have at least n characters.
>> >> > So, I think that an implementation must not read more than necessary
>> >> > to deduce the result (if this changes the behavior, like giving a
>> >> > segfault).
>> >
>> >> But it doesn't say that it only reads as many characters as are
>> >> necessary to determine the result, all it says is that it doesn't read
>> >> more than n characters
>> >
>> > No, it doesn't say that. It says: "The strncmp function *compares*
>> > not more than n characters". "compares", not "reads". [snip]
>>
>> That seems like a silly distinction. Surely we can infer
>> that to compare a value in X and a value in Y we must
>> first read the value in X, and read the value in Y.
>
> You have the implication the wrong way round. You may not compare
> unless you've read, clearly, but you may read and then not compare

> [snip example]

Did you misunderstand what I wrote? My implication is the same
as yours: 'compare implies read', not 'read implies compare'.

Vincent Lefevre

Apr 18, 2011, 3:24:55 AM

In article <86fwpid...@ds4.des.no>,
Dag-Erling Smørgrav <d...@des.no> wrote:

Yes, that's what I meant by "except if this doesn't change the behavior".

Vincent Lefevre

Apr 18, 2011, 3:40:50 AM

In article <kfnwriu...@x-alumni2.alumni.caltech.edu>,
Tim Rentsch <t...@alumni.caltech.edu> wrote:

> Vincent Lefevre <vincen...@vinc17.net> writes:

> > No, it doesn't say that. It says: "The strncmp function *compares*
> > not more than n characters". "compares", not "reads". [snip]

> That seems like a silly distinction. Surely we can infer
> that to compare a value in X and a value in Y we must
> first read the value in X, and read the value in Y.

The distinction is important for characters that are *not* compared.

> > So, one should assume that an implementation isn't allowed to do
> > more than what the standard requires (except if this doesn't change
> > the behavior). Otherwise more or less any function would have
> > undefined behavior.

> Forgive me for being harsh, but I think the reasoning
> here is bogus. The same reasoning applied to strcmp
> would imply that it's okay to call strcmp on strings
> that aren't null terminated but differ in some earlier
> (legal) position. [...]

No, the same reasoning cannot be applied to strcmp, because the
standard (7.21.4.2#2 in TC3) explicitly says: "The strcmp function
compares the *string* pointed to by s1 to the *string* pointed to
by s2."

It says "string", not "array", and ditto for 7.21.4.2#3. So, the
strcmp arguments must be strings, thus null-terminated.

> >> and then goes on to say, "Where an argument declared as size_t n
> >> specifies the length of the array...", which would seem to apply in
> >> this case.
> >
> > The description of strncmp does not say that n specifies the length
> > of the array. And it's quite clear that it does *not* specify the
> > length of the array,

> 7.21.4.4p3 says in part:

> accordingly as the possibly null-terminated array pointed to
> by s1 is greater than, equal to, or less than the possibly
> null-terminated array pointed to by s2.

> Although it would benefit from being made more explicit, this
> phrasing shows pretty clearly that strncmp() considers its arguments
> to be, independently, possibly null-terminated arrays. And if
> the arguments are indeed independent arrays, then n must specify
> their lengths (if not null-terminated).

The standard doesn't say that.

> > as I wouldn't imagine that
> >
> > strncmp (s, "foo", 100);
> >
> > would be disallowed (the length of the array "foo" is 4, certainly
> > not 100).

> The 100 is meant as the length only if the array is not
> null-terminated.

The standard doesn't make a difference concerning their length
depending on whether the arrays are null-terminated or not.

> Considering the phrasing in 7.21.4.4p3, I think the only
> sensible conclusion is that the intended semantics allows
> up to 'n' characters to be read (but not past the first
> null character) independently in either array, regardless
> of the function's return value. Of course I would agree
> that the existing wording could be improved and made more
> explicit, but is there really any doubt here about what
> reading is expected?

I don't see how you come to this conclusion. You're assuming
things not said in the standard.

Vincent Lefevre

Apr 18, 2011, 4:05:54 AM

In article <ln4o5x6...@nuthaus.mib.org>,
Keith Thompson <ks...@mib.org> wrote:

> Phil Carmody <thefatphi...@yahoo.co.uk> writes:
> >> Vincent Lefevre <vincen...@vinc17.net> writes:
> >> > No, it doesn't say that. It says: "The strncmp function *compares*
> >> > not more than n characters". "compares", not "reads". [snip]
> >>
> >> That seems like a silly distinction. Surely we can infer
> >> that to compare a value in X and a value in Y we must
> >> first read the value in X, and read the value in Y.
> >
> > You have the implication the wrong way round. You may not compare
> > unless you've read, clearly, but you may read and then not compare
> >
> > An Alpha simply cannot read a single byte, for example, so it must
> > read more, even if it doesn't use the rest of the word for comparison.

> Right, but on an Alpha, reading the word containing a given byte
> can't cause any visible problems if reading the byte itself (if
> that were possible) wouldn't; the boundary between readable memory
> and non-readable memory can't occur in the middle of a word.

One can take another example: on the ARM, it can be faster to read
several words (placed in several registers) with a single instruction.
And if some words are in non-readable memory, using this instruction
isn't allowed as it would fail.

> Of course an implementation can read anything it likes if doing so
> has no effect.

> The real question, may an implementation of strncmp read extra bytes
> *when that could have visible effects*?

> The code in the original post was:

> #include <string.h>

> char s1[10] = "1234567890";
> char s2[10] = "1234567891";

> int
> main (void)
> {
> return strncmp (s1, s2, 42) == 0;
> }

> The standard's description of strncmp is:

> The strncmp function compares not more than n characters
> (characters that follow a null character are not compared)
> from the array pointed to by s1 to the array pointed to by s2.

> The strncmp function returns an integer greater than, equal to,
> or less than zero, accordingly as the possibly null-terminated
> array pointed to by s1 is greater than, equal to, or less
> than the possibly null-terminated array pointed to by s2.

> Neither array contains a string (they're not '\0'-terminated), and
> the third argument is 42, so one could argue that the call gives
> strncmp permission to read as many as 42 bytes from each array.

But I don't think the standard gives you this permission.
Certainly not explicitly.

> If I write
> s1[10] == s2[10]
> my program's behavior is undefined; does writing
> strncmp (s1, s2, 42)
> potentially cause the same problem?

> I know what the answer *should* be. strncmp shouldn't read past the
> bounds of the array, because it doesn't need to. But I'm not quite
> convinced that the standard explicitly forbids it to do so.

I'd say that it doesn't explicitly allow this. And as this is
not necessary, the implementation shouldn't assume that it may read
past differing characters.

BTW I think that using something like

strncmp (s1, s2, SIZE_MAX)

to compare '\n'-terminated character sequences[*] that are known to
be different (by the context) would be quite convenient. Note that
one cannot use strcmp here because strcmp works on strings, and the
character sequences here are not null-terminated.

[*] I don't say "strings" just to avoid the ambiguity with C, but
in the manual of some application that would work on such data, they
could be described as "strings".
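
A minimal sketch of a comparison that sidesteps the question for such data. The helper name nlcmp is hypothetical (it is not a standard function). It reads one character at a time and stops at the first difference or the first '\n', so it does not depend on how strncmp treats n:

```c
#include <stddef.h>

/* Hypothetical helper: compares two '\n'-terminated character sequences.
   It reads each sequence only up to and including the first position at
   which the characters differ or both are '\n'. */
int nlcmp(const char *s1, const char *s2)
{
    for (size_t i = 0; ; i++) {
        unsigned char c1 = (unsigned char)s1[i];
        unsigned char c2 = (unsigned char)s2[i];
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
        if (c1 == '\n')
            return 0;   /* both sequences ended at the same position */
    }
}
```

The caller still has to guarantee that the sequences differ or reach a '\n' before either array ends; '\n' is ordered like any other character, as strncmp would order it.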

Vincent Lefevre

Apr 18, 2011, 4:08:29 AM

In article <kfnwriu...@x-alumni2.alumni.caltech.edu>,
Tim Rentsch <t...@alumni.caltech.edu> wrote:

> Vincent Lefevre <vincen...@vinc17.net> writes:

> > No, it doesn't say that. It says: "The strncmp function *compares*
> > not more than n characters". "compares", not "reads". [snip]

> That seems like a silly distinction. Surely we can infer
> that to compare a value in X and a value in Y we must
> first read the value in X, and read the value in Y.

The distinction is important for characters that are *not* compared.

> > So, one should assume that an implementation isn't allowed to do
> > more than what the standard requires (except if this doesn't change
> > the behavior). Otherwise more or less any function would have
> > undefined behavior.

> Forgive me for being harsh, but I think the reasoning
> here is bogus. The same reasoning applied to strcmp
> would imply that it's okay to call strcmp on strings
> that aren't null terminated but differ in some earlier
> (legal) position. [...]

No, the same reasoning cannot be applied to strcmp, because the
standard (7.21.4.2#2 in TC3) explicitly says: "The strcmp function
compares the *string* pointed to by s1 to the *string* pointed to
by s2."

It says "string", not "array", and ditto for 7.21.4.2#3. So, the
strcmp arguments must be strings, thus null-terminated.

> >> and then goes on to say, "Where an argument declared as size_t n
> >> specifies the length of the array...", which would seem to apply in
> >> this case.
> >
> > The description of strncmp does not say that n specifies the length
> > of the array. And it's quite clear that it does *not* specify the
> > length of the array,

> 7.21.4.4p3 says in part:

> accordingly as the possibly null-terminated array pointed to
> by s1 is greater than, equal to, or less than the possibly
> null-terminated array pointed to by s2.

> Although it would benefit from being made more explicit, this
> phrasing shows pretty clearly that strncmp() considers its arguments
> to be, independently, possibly null-terminated arrays. And if
> the arguments are indeed independent arrays, then n must specify
> their lengths (if not null-terminated).

The standard doesn't say that.

> > as I wouldn't imagine that
> >
> > strncmp (s, "foo", 100);
> >
> > would be disallowed (the length of the array "foo" is 4, certainly
> > not 100).

> The 100 is meant as the length only if the array is not
> null-terminated.

The standard doesn't make a difference concerning their length
depending on whether the arrays are null-terminated or not.

> Considering the phrasing in 7.21.4.4p3, I think the only
> sensible conclusion is that the intended semantics allows
> up to 'n' characters to be read (but not past the first
> null character) independently in either array, regardless
> of the function's return value. Of course I would agree
> that the existing wording could be improved and made more
> explicit, but is there really any doubt here about what
> reading is expected?

I don't see how you come to this conclusion. You're assuming
things not said in the standard.

--

Tim Rentsch

Apr 18, 2011, 5:21:30 AM

Vincent Lefevre <vincen...@vinc17.net> writes:

> In article <kfnwriu...@x-alumni2.alumni.caltech.edu>,
> Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
>> Vincent Lefevre <vincen...@vinc17.net> writes:
>
>> > No, it doesn't say that. It says: "The strncmp function *compares*
>> > not more than n characters". "compares", not "reads". [snip]
>
>> That seems like a silly distinction. Surely we can infer
>> that to compare a value in X and a value in Y we must
>> first read the value in X, and read the value in Y.
>
> The distinction is important for characters that are *not* compared.

Your point is getting lost here. For what Larry was saying (it
was included in my posting but I guess you snipped it), it
doesn't matter whether "reads" or "compares" is used. Perhaps it
is true that the distinction matters in regards to _other_
matters, but in the context of Larry's statement there doesn't
seem to be any significant difference. Do you think otherwise?
Then please explain what it is.

>> > So, one should assume that an implementation isn't allowed to do
>> > more than what the standard requires (except if this doesn't change
>> > the behavior). Otherwise more or less any function would have
>> > undefined behavior.
>
>> Forgive me for being harsh, but I think the reasoning
>> here is bogus. The same reasoning applied to strcmp
>> would imply that it's okay to call strcmp on strings
>> that aren't null terminated but differ in some earlier
>> (legal) position. [...]
>
> No, the same reasoning cannot be applied to strcmp, because the
> standard (7.21.4.2#2 in TC3) explicitly says: "The strcmp function
> compares the *string* pointed to by s1 to the *string* pointed to
> by s2."
>
> It says "string", not "array", and ditto for 7.21.4.2#3. So, the
> strcmp arguments must be strings, thus null-terminated.

Okay, now you're cheating. You took a fuzzy argument, and let it
work one way in one case, and another way in the other case. You
can't have it both ways.

>> >> and then goes on to say, "Where an argument declared as size_t n
>> >> specifies the length of the array...", which would seem to apply in
>> >> this case.
>> >
>> > The description of strncmp does not say that n specifies the length
>> > of the array. And it's quite clear that it does *not* specify the
>> > length of the array,
>
>> 7.21.4.4p3 says in part:
>
>> accordingly as the possibly null-terminated array pointed to
>> by s1 is greater than, equal to, or less than the possibly
>> null-terminated array pointed to by s2.
>
>> Although it would benefit from being made more explicit, this
>> phrasing shows pretty clearly that strncmp() considers its arguments
>> to be, independently, possibly null-terminated arrays. And if
>> the arguments are indeed independent arrays, then n must specify
>> their lengths (if not null-terminated).
>
> The standard doesn't say that.

I believe I clearly differentiated the text of the standard
and what I infer to be the intended reading of that text. If
you want to disagree with my inference, that's fine, but please
don't put words in my mouth.

>> > as I wouldn't imagine that
>> >
>> > strncmp (s, "foo", 100);
>> >
>> > would be disallowed (the length of the array "foo" is 4, certainly
>> > not 100).
>
>> The 100 is meant as the length only if the array is not
>> null-terminated.
>
> The standard doesn't make a difference concerning their length
> depending on whether the arrays are null-terminated or not.

I think you meant the standard doesn't draw a distinction
concerning their length, etc. In any case, whether that
inference is true or not depends on how the stated requirements
are read (interpreted) by whoever is doing the reading.
Obviously different people read these particular requirements
in several different ways, so for some of them this conclusion
might hold whereas for others it wouldn't.

>> Considering the phrasing in 7.21.4.4p3, I think the only
>> sensible conclusion is that the intended semantics allows
>> up to 'n' characters to be read (but not past the first
>> null character) independently in either array, regardless
>> of the function's return value. Of course I would agree
>> that the existing wording could be improved and made more
>> explicit, but is there really any doubt here about what
>> reading is expected?
>
> I don't see how you come to this conclusion.

Yes, I can believe that you don't.

> You're assuming things not said in the standard.

Yes, of course I do. So does everyone who reads it. One
difference is, I try to make my assumptions consciously and
explicitly. Have you thought about what your operating
assumptions are for determining what the Standard really
requires? Can you explain what those assumptions are?

If we can bring the discussion up a level, I think we may be
working at cross purposes here. Are you trying to argue how
the Standard _must_ be understood on this point? How it
_ought_ to be understood on this point? How the committee
_expects_ it will be understood? How the committee believes
it _should_ be understood? What the Standard _ought_ to
require, regardless of what it does require? How the Standard
will be understood here, _given certain assumptions_? Which
of these applies to your comments? Or, if none of them do,
how would you express what it is you are hoping to accomplish?

Keith Thompson

Apr 18, 2011, 11:27:01 AM

Vincent Lefevre <vincen...@vinc17.net> writes:
> In article <ln4o5x6...@nuthaus.mib.org>,
> Keith Thompson <ks...@mib.org> wrote:

[snip]

>> Of course an implementation can read anything it likes if doing so
>> has no effect.
>
>> The real question, may an implementation of strncmp read extra bytes
>> *when that could have visible effects*?
>
>> The code in the original post was:
>
>> #include <string.h>
>
>> char s1[10] = "1234567890";
>> char s2[10] = "1234567891";
>
>> int
>> main (void)
>> {
>> return strncmp (s1, s2, 42) == 0;
>> }
>
>> The standard's description of strncmp is:
>
>> The strncmp function compares not more than n characters
>> (characters that follow a null character are not compared)
>> from the array pointed to by s1 to the array pointed to by s2.
>
>> The strncmp function returns an integer greater than, equal to,
>> or less than zero, accordingly as the possibly null-terminated
>> array pointed to by s1 is greater than, equal to, or less
>> than the possibly null-terminated array pointed to by s2.
>
>> Neither array contains a string (they're not '\0'-terminated), and
>> the third argument is 42, so one could argue that the call gives
>> strncmp permission to read as many as 42 bytes from each array.
>
> But I don't think the standard gives you this permission.
> Certainly not explicitly.

Ok. So exactly what permission does it give? I don't think there's
a very clear answer to that in the wording of the standard.

>> If I write
>> s1[10] == s2[10]
>> my program's behavior is undefined; does writing
>> strncmp (s1, s2, 42)
>> potentially cause the same problem?
>
>> I know what the answer *should* be. strncmp shouldn't read past the
>> bounds of the array, because it doesn't need to. But I'm not quite
>> convinced that the standard explicitly forbids it to do so.
>
> I'd say that it doesn't explicitly allow this. And as this is
> not necessary, the implementation shouldn't assume that it may read
> past differing characters.

I agree that it *shouldn't*, but I don't see that the standard
actually says that it *may not*.

> BTW I think that using something like
>
> strncmp (s1, s2, SIZE_MAX)
>
> to compare '\n'-terminated character sequences[*] that are known to
> be different (by the context) would be quite convenient. Note that
> one cannot use strcmp here because strcmp works on strings, and the
> character sequences here are not null-terminated.
>
> [*] I don't say "strings" just to avoid the ambiguity with C, but
> in the manual of some application that would work on such data, they
> could be described as "strings".

I think memcmp() would make more sense.

Note that for this to be useful, you'd have to know that the arrays
differ somewhere within their actual sizes, but not know (or not be
able to compute easily) what those sizes are. Otherwise you could
just pass the size of the arrays (or the lesser of their sizes if
they differ) as the third argument.
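
When the sizes are known, the approach described above can be sketched roughly as follows. The helper name cmp_known_sizes and the tie-break rule for arrays of different sizes are assumptions made for illustration; only the use of memcmp with the lesser size comes from the discussion:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compares two arrays whose sizes are known and never
   reads more bytes than the shorter array holds. */
int cmp_known_sizes(const void *a, size_t a_size,
                    const void *b, size_t b_size)
{
    size_t n = a_size < b_size ? a_size : b_size;
    int r = memcmp(a, b, n);
    if (r != 0)
        return r;
    /* Assumption of this sketch: when the common prefix is equal,
       the shorter array orders first. */
    return (a_size > b_size) - (a_size < b_size);
}
```

With the arrays from the original post, cmp_known_sizes(s1, sizeof s1, s2, sizeof s2) touches exactly 10 bytes of each array.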

Vincent Lefevre

Apr 19, 2011, 6:55:49 AM

In article <kfnfwpf...@x-alumni2.alumni.caltech.edu>,
Tim Rentsch <t...@alumni.caltech.edu> wrote:

> Vincent Lefevre <vincen...@vinc17.net> writes:

> > In article <kfnwriu...@x-alumni2.alumni.caltech.edu>,
> > Tim Rentsch <t...@alumni.caltech.edu> wrote:
> >
> >> Vincent Lefevre <vincen...@vinc17.net> writes:
> >
> >> > No, it doesn't say that. It says: "The strncmp function *compares*
> >> > not more than n characters". "compares", not "reads". [snip]
> >
> >> That seems like a silly distinction. Surely we can infer
> >> that to compare a value in X and a value in Y we must
> >> first read the value in X, and read the value in Y.
> >
> > The distinction is important for characters that are *not* compared.

> Your point is getting lost here. For what Larry was saying (it
> was included in my posting but I guess you snipped it), it
> doesn't matter whether "reads" or "compares" is used.

Larry said:

| But it doesn't say that it only reads as many characters as are
| necessary to determine the result, all it says is that it doesn't read
| more than n characters

and this is incorrect. The standard uses the word "compare", not
"read" (which would not make sense). But...

> Perhaps it is true that the distinction matters in regards to
> _other_ matters, but in the context of Larry's statement there
> doesn't seem to be any significant difference. Do you think
> otherwise? Then please explain what it is.

If you say that what Larry meant (by changing "read" to "compare"
in his statement) is: "the standard doesn't say that strncmp only
compares as many characters as are necessary to determine the result,
all it says is that it doesn't compare more than n characters".

That's true, but anyway the standard doesn't say that the
implementation is allowed to compare n characters for every
input (even not null-terminated).

> >> > So, one should assume that an implementation isn't allowed to do
> >> > more than what the standard requires (except if this doesn't change
> >> > the behavior). Otherwise more or less any function would have
> >> > undefined behavior.
> >
> >> Forgive me for being harsh, but I think the reasoning
> >> here is bogus. The same reasoning applied to strcmp
> >> would imply that it's okay to call strcmp on strings
> >> that aren't null terminated but differ in some earlier
> >> (legal) position. [...]
> >
> > No, the same reasoning cannot be applied to strcmp, because the
> > standard (7.21.4.2#2 in TC3) explicitly says: "The strcmp function
> > compares the *string* pointed to by s1 to the *string* pointed to
> > by s2."
> >
> > It says "string", not "array", and ditto for 7.21.4.2#3. So, the
> > strcmp arguments must be strings, thus null-terminated.

> Okay, now you're cheating. You took a fuzzy argument, and let it
> work one way in one case, and another way in the other case. You
> can't have it both ways.

I don't understand what you mean here. For strcmp, the standard
says "string", and for strncmp, the standard says "array". That's
an important difference.

If the standard had said "string" for strncmp, then I would agree that
reading n (non-null) characters would have been correct even if there
were a difference earlier, because a string would guarantee that the
first n characters (if non-null) are in readable memory.

> >> >> and then goes on to say, "Where an argument declared as size_t n
> >> >> specifies the length of the array...", which would seem to apply in
> >> >> this case.
> >> >
> >> > The description of strncmp does not say that n specifies the length
> >> > of the array. And it's quite clear that it does *not* specify the
> >> > length of the array,
> >
> >> 7.21.4.4p3 says in part:
> >
> >> accordingly as the possibly null-terminated array pointed to
> >> by s1 is greater than, equal to, or less than the possibly
> >> null-terminated array pointed to by s2.
> >
> >> Although it would benefit from being made more explicit, this
> >> phrasing shows pretty clearly that strncmp() considers its arguments
> >> to be, independently, possibly null-terminated arrays. And if
> >> the arguments are indeed independent arrays, then n must specify
> >> their lengths (if not null-terminated).
> >
> > The standard doesn't say that.

> I believe I clearly differentiated the text of the standard
> and what I infer to be the intended reading of that text. If
> you want to disagree with my inference, that's fine, but please
> don't put words in my mouth.

Well I meant that your inference is incorrect because it is not
based on what the standard really says.

> >> > as I wouldn't imagine that
> >> >
> >> > strncmp (s, "foo", 100);
> >> >
> >> > would be disallowed (the length of the array "foo" is 4, certainly
> >> > not 100).
> >
> >> The 100 is meant as the length only if the array is not
> >> null-terminated.
> >
> > The standard doesn't make a difference concerning their length
> > depending on whether the arrays are null-terminated or not.

> I think you meant the standard doesn't draw a distinction
> concerning their length, etc. In any case, whether that
> inference is true or not depends on how the stated requirements
> are read (interpreted) by whoever is doing the reading.
> Obviously different people read these particular requirements
> in several different ways, so for some of them this conclusion
> might hold whereas for others it wouldn't.

only by assuming things not stated by the standard.

> >> Considering the phrasing in 7.21.4.4p3, I think the only
> >> sensible conclusion is that the intended semantics allows
> >> up to 'n' characters to be read (but not past the first
> >> null character) independently in either array, regardless
> >> of the function's return value. Of course I would agree
> >> that the existing wording could be improved and made more
> >> explicit, but is there really any doubt here about what
> >> reading is expected?
> >
> > I don't see how you come to this conclusion.

> Yes, I can believe that you don't.

> > You're assuming things not said in the standard.

> Yes, of course I do.

Which one(s) here? If this is just 7.21.4.4p3, could you explain?
e.g. what you said in <kfnwriu...@x-alumni2.alumni.caltech.edu>

"And if the arguments are indeed independent arrays, then n must
specify their lengths (if not null-terminated)."

Why "n must specify their lengths"? Why cannot the arrays have
a smaller length (in the case this is sufficient to deduce the
result)? And why would a null character take the precedence
over n *concerning the length of the array*?

> So does everyone who reads it. One
> difference is, I try to make my assumptions consciously and
> explicitly. Have you thought about what your operating
> assumptions are for determining what the Standard really
> requires? Can you explain what those assumptions are?

I didn't assume anything. For instance, the standard doesn't say
anything about the lengths of the arrays, so that one knows nothing
particular about them (just that it is necessary to satisfy the
semantics of strncmp).

> If we can bring the discussion up a level, I think we may be
> working at cross purposes here. Are you trying to argue how
> the Standard _must_ be understood on this point? How it
> _ought_ to be understood on this point? How the committee
> _expects_ it will be understood? How the committee believes
> it _should_ be understood? What the Standard _ought_ to
> require, regardless of what it does require? How the Standard
> will be understood here, _given certain assumptions_? Which
> of these applies to your comments? Or, if none of them do,
> how would you express what it is you are hoping to accomplish?

Ideally the answer should be the same for any of them (without
additional assumptions). Currently I disagree with your reasoning,
but perhaps it is incomplete (see the above questions).

Vincent Lefevre

Apr 19, 2011, 7:46:36 AM

In article <lnsjtf6...@nuthaus.mib.org>,
Keith Thompson <ks...@mib.org> wrote:

I'd say: no particular permissions, i.e. just what is implied to
satisfy the semantics (I mean that if two characters need to be
compared to satisfy the semantics, then there is an implicit
permission to read them).

Now, one may wonder how we can know whether two characters need to be
compared or not. This is quite easy. The standard doesn't say in which
order the characters are compared (potentially in parallel), but if
k denotes the position of the first differing characters or null
characters (k = n if there are none), then the result depends on
the values of all the characters up to this position k, i.e. it is
necessary to read and compare all these characters (up to allowed
optimizations, as usual). And conversely, the result can be deduced
from the values of these characters only.
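
As an illustration of this reading, here is a minimal sketch of a strncmp that touches nothing beyond position k. The name strncmp_minimal is hypothetical, and this is not claimed to be how any real library implements strncmp:

```c
#include <stddef.h>

/* Illustrative only: a character-at-a-time strncmp. It reads positions
   0..k of each array, where k is the position of the first differing
   characters or the first null character, and at most n characters. */
int strncmp_minimal(const char *s1, const char *s2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c1 = (unsigned char)s1[i];
        unsigned char c2 = (unsigned char)s2[i];
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;   /* first difference decides */
        if (c1 == '\0')
            return 0;                  /* both arrays end here */
    }
    return 0;                          /* first n characters are equal */
}
```

For the arrays in the original post, this version reads only the 10 characters of each array even when n is 42, because the arrays differ at index 9.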

> >> If I write
> >> s1[10] == s2[10]
> >> my program's behavior is undefined; does writing
> >> strncmp (s1, s2, 42)
> >> potentially cause the same problem?
> >
> >> I know what the answer *should* be. strncmp shouldn't read past the
> >> bounds of the array, because it doesn't need to. But I'm not quite
> >> convinced that the standard explicitly forbids it to do so.
> >
> > I'd say that it doesn't explicitly allow this. And as this is
> > not necessary, the implementation shouldn't assume that it may read
> > past differing characters.

> I agree that it *shouldn't*, but I don't see that the standard
> actually says that it *may not*.

The user has provided two arrays of length 10, on which the semantics
of strncmp (whatever the value of n) is clearly defined (because of
the differing characters). So, from the specification of strncmp
(which doesn't require any additional constraint on the length of
the arrays), I assume that

strncmp (s1, s2, 42)

will give the expected answer (no undefined behavior).

> > BTW I think that using something like
> >
> > strncmp (s1, s2, SIZE_MAX)
> >
> > to compare '\n'-terminated character sequences[*] that are known to
> > be different (by the context) would be quite convenient. Note that
> > one cannot use strcmp here because strcmp works on strings, and the
> > character sequences here are not null-terminated.
> >
> > [*] I don't say "strings" just to avoid the ambiguity with C, but
> > in the manual of some application that would work on such data, they
> > could be described as "strings".

> I think memcmp() would make more sense.

But how would you determine the value of n without doing the same
work first?

The standard says for memcmp:

"The memcmp function compares the first n characters of the object
pointed to by s1 to the first n characters of the object pointed to
by s2."

so that both s1 and s2 must have at least n characters.

> Note that for this to be useful, you'd have to know that the arrays
> differ somewhere within their actual sizes, but not know (or not be
> able to compute easily) what those sizes are. Otherwise you could
> just pass the size of the arrays (or the lesser of their sizes if
> they differ) as the third argument.

Yes, this is the problem: I don't know what the sizes are.

James Kuyper

Apr 19, 2011, 8:52:45 AM

On 04/19/2011 07:46 AM, Vincent Lefevre wrote:
> In article <lnsjtf6...@nuthaus.mib.org>,
> Keith Thompson <ks...@mib.org> wrote:
>
>> Vincent Lefevre <vincen...@vinc17.net> writes:
>>> In article <ln4o5x6...@nuthaus.mib.org>,
>>> Keith Thompson <ks...@mib.org> wrote:

...

>>>> The code in the original post was:
>>>
>>>> #include <string.h>
>>>
>>>> char s1[10] = "1234567890";
>>>> char s2[10] = "1234567891";
>>>
>>>> int
>>>> main (void)
>>>> {
>>>> return strncmp (s1, s2, 42) == 0;
>>>> }

...

>>>> Neither array contains a string (they're not '\0'-terminated), and
>>>> the third argument is 42, so one could argue that the call gives
>>>> strncmp permission to read as many as 42 bytes from each array.
>>>
>>> But I don't think the standard gives you this permission.
>>> Certainly not explicitly.
>
>> Ok. So exactly what permission does it give? I don't think there's
>> a very clear answer to that in the wording of the standard.
>
> I'd say: no particular permissions, i.e. just what is implied to
> satisfy the semantics (I mean that if two characters need to be
> compared to satisfy the semantics, then there is an implicit
> permission to read them).

I think replacing "particular" with "explicit" would make more sense;
those requirements seem pretty particular to me, but I can agree that
they are not explicit.

That interpretation prohibits an implementation that loads an entire
word's worth of characters from each array for comparison, compares the
words for equality, and only bothers breaking the words up into
individual characters for individual comparisons if the words are
different. This approach could significantly speed up strncmp() on some
platforms; are you sure the committee intended to prohibit it? Would it
be a good idea to prohibit it?
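
A rough sketch of the word-at-a-time approach in question, assuming unsigned long as the "word" and the hypothetical name strncmp_wordwise; real library versions are typically written differently and in assembly. The point of the sketch is the stated precondition: whole words are loaded before it is known where the first difference lies.

```c
#include <stddef.h>
#include <string.h>

#define WORD sizeof(unsigned long)

/* Sketch of a word-at-a-time strncmp. Precondition (an assumption of this
   sketch, and exactly the point under dispute): both arrays are readable
   up to the end of the word that contains the first difference, the first
   null character, or the n-th character, whichever comes first. */
int strncmp_wordwise(const char *s1, const char *s2, size_t n)
{
    size_t i = 0;

    while (n - i >= WORD) {
        unsigned long w1, w2;
        memcpy(&w1, s1 + i, WORD);   /* loads WORD characters at once */
        memcpy(&w2, s2 + i, WORD);
        if (w1 != w2 || memchr(&w1, '\0', WORD) != NULL)
            break;                   /* difference or null in this word */
        i += WORD;
    }

    /* Resolve the interesting word, or the tail, one character at a time. */
    for (; i < n; i++) {
        unsigned char c1 = (unsigned char)s1[i];
        unsigned char c2 = (unsigned char)s2[i];
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
        if (c1 == '\0')
            return 0;
    }
    return 0;
}
```

Called on the 10-character arrays from the original post with n equal to 42, the second word load would cover characters past index 9, beyond the end of both arrays, although the result is already determined at index 9. That load is harmless only if the platform guarantees those bytes are readable.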
--
James Kuyper

lawrenc...@siemens.com

Apr 19, 2011, 10:27:59 AM

Vincent Lefevre <vincen...@vinc17.net> wrote:
>
> If you say that what Larry meant (by changing "read" to "compare"
> in his statement) is: "the standard doesn't say that strncmp only
> compares as many characters as are necessary to determine the result,
> all it says is that it doesn't compare more than n characters".
>
> That's true, but anyway the standard doesn't say that the
> implementation is allowed to compare n characters for every
> input (even not null-terminated).

Right -- the standard doesn't clearly say one way or the other, so a
strictly conforming program isn't allowed to depend on it being one way
or the other. Which means that an implementation is free to do as it
likes.
--
Larry Jones

Please tell me I'm adopted. -- Calvin

Florian Weimer

Apr 19, 2011, 1:29:35 PM

* Tim Rentsch:

> Except strncmp() may not (observably) read any characters in
> one of the argument arrays after the first null character in
> that array. For example, I think everyone expects that code
> like
>
> char big[1000] = "foobas";
> ...
> strncmp( big, "foo", 1000 );

The open question is whether you may pass pointers to arrays shorter
than 1000 characters when you specify 1000 as the character count.
This is not clear at all, unfortunately.

On the other hand, your example had better be valid because lots of
code relies on it. 8-)

Wojtek Lerch

Apr 19, 2011, 2:02:58 PM

On 19/04/2011 10:27 AM, lawrenc...@siemens.com wrote:
> Right -- the standard doesn't clearly say one way or the other, so a
> strictly conforming program isn't allowed to depend on it being one way
> or the other. Which means that an implementation is free to do as it
> likes.

Is that really how it works? I thought that if the standard doesn't
clearly say one way or the other, then it's simply not clear what a
strictly conforming program is allowed to depend on or what an implementation is
required to do, but it's wise for them to try to err on the side of
caution. And if someone cares enough, they can report the ambiguity as
a defect in the standard and request that the Committee clarify their
intent.

You're not saying that there are places in the standard that were
intentionally made unclear, are you?

lawrenc...@siemens.com

Apr 19, 2011, 4:27:09 PM

Wojtek Lerch <wojt...@yahoo.ca> wrote:
>
> Is that really how it works? I thought that if the standard doesn't
> clearly say one way or the other, then it's simply not clear what a
> strictly conforming program is allowed to depend on or what an implementation is
> required to do, but it's wise for them to try to err on the side of
> caution.

"A strictly conforming program...shall not produce output dependent on
any unspecified...behavior...."

> You're not saying that there are places in the standard that were
> intentionally made unclear, are you?

No, but there are things that have been deliberately left unspecified or
under-specified (although I don't think this is one of them).
--
Larry Jones

Do you think God lets you plea bargain? -- Calvin

Wojtek Lerch

Apr 19, 2011, 11:32:44 PM

On 19/04/2011 4:27 PM, lawrenc...@siemens.com wrote:
> Wojtek Lerch<wojt...@yahoo.ca> wrote:
>>
>> Is that really how it works? I thought that if the standard doesn't
>> clearly say one way or the other, then it's simply not clear what a
>> strictly conforming program is allowed to depend on or what an implementation is
>> required to do, but it's wise for them to try to err on the side of
>> caution.
>
> "A strictly conforming program...shall not produce output dependent on
> any unspecified...behavior...."

And what unspecified behaviour are you referring to in this case?

3.4.4 "unspecified behavior: use of an unspecified value, or other
behavior where this International Standard provides two or more
possibilities and imposes no further requirements on which is chosen in
any instance"

I don't think this covers cases where the standard is unclear or
ambiguous and the two possibilities are "X is allowed" and "X is
forbidden" (note that those two are neither values nor behaviours).
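
For contrast, here is a small sketch of behaviour that 3.4.4 clearly does cover: the order in which the two operands of + are evaluated. The names mark and evaluation_order are hypothetical. The standard provides two possibilities (left operand first, or right operand first) and does not say which one is chosen:

```c
static int trace[2];
static int count;

/* Records the order in which the calls happen. */
static int mark(int id)
{
    trace[count++] = id;
    return id;
}

/* Returns 12 if the left operand was evaluated first, 21 otherwise.
   Either result is permitted, so a strictly conforming program must not
   let its output depend on which one it gets. */
int evaluation_order(void)
{
    count = 0;
    (void)(mark(1) + mark(2));   /* order of the two calls is unspecified */
    return trace[0] * 10 + trace[1];
}
```

Here both outcomes are behaviours the implementation may exhibit, which is different from a disagreement about whether the text permits something at all.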

>> You're not saying that there are places in the standard that were
>> intentionally made unclear, are you?
>
> No, but there are things that have been deliberately left unspecified or
> under-specified (although I don't think this is one of them).

I assume you don't mean "unspecified" in the sense defined in 3.4.4?

James Kuyper

Apr 20, 2011, 6:36:45 AM

On 04/19/2011 11:32 PM, Wojtek Lerch wrote:
> On 19/04/2011 4:27 PM, lawrenc...@siemens.com wrote:
>> Wojtek Lerch<wojt...@yahoo.ca> wrote:
>>>
>>> Is that really how it works? I thought that if the standard doesn't
>>> clearly say one way or the other, then it's simply not clear what a
>>> strictly conforming program is allowed to depend on or what an implementation is
>>> required to do, but it's wise for them to try to err on the side of
>>> caution.
>>
>> "A strictly conforming program...shall not produce output dependent on
>> any unspecified...behavior...."
>
> And what unspecified behaviour are you referring to in this case?
>
> 3.4.4 "unspecified behavior: use of an unspecified value, or other
> behavior where this International Standard provides two or more
> possibilities and imposes no further requirements on which is chosen in
> any instance"
>
> I don't think this covers cases where the standard is unclear or
> ambiguous and the two possibilities are "X is allowed" and "X is
> forbidden" (note that those two are neither values nor behaviours).

You're right, the "two or more possibilities" he's referring to cannot
be "x is allowed" "x is forbidden". They have to be "x happens" and "x
does not happen": in other words, what 3.4.4 is saying is that unless
the standard explicitly says otherwise, "x is allowed". Keep in mind
that "says otherwise" includes what the standard says about the behavior
of strncmp(): that behavior must occur as described, so any unspecified
behavior that occurs cannot, among other things, prevent strncmp() from
returning its specified value. Thus, anything, like a memory access
violation, which would abort the program and thereby prevent strncmp()
from returning its specified value is prohibited.

If a program calls strncmp() with arguments that require it to access
memory locations which are inaccessible in order to perform the behavior
specified for strncmp() by the standard, then the onus for that
violation is on the caller, and strncmp() is under no obligation to
prevent that violation. However, strncmp() doesn't need to read any
memory after the first difference, so if it chooses to do so anyway, it
must protect any such read against the possibility of fatal access
violations.

>>> You're not saying that there are places in the standard that were
>>> intentionally made unclear, are you?
>>
>> No, but there are things that have been deliberately left unspecified or
>> under-specified (although I don't think this is one of them).
>
> I assume you don't mean "unspecified" in the sense defined in 3.4.4?

I think he meant it in precisely that sense, but as applied to a
different pair of possibilities than you did.
--
James Kuyper

Tim Rentsch

Apr 20, 2011, 1:17:02 PM

Florian Weimer <f...@deneb.enyo.de> writes:

> * Tim Rentsch:
>
>> Except strncmp() may not (observably) read any characters in
>> one of the argument arrays after the first null character in
>> that array. For example, I think everyone expects that code
>> like
>>
>> char big[1000] = "foobas";
>> ...
>> strncmp( big, "foo", 1000 );
>
> The open question is whether you may pass pointers to arrays shorter
> than 1000 characters when you specify 1000 as the character count.
> This is not clear at all, unfortunately.

I agree it is not as clear as it could be, and perhaps should be.
But not clear at all? If it really weren't clear /at all/ then
there would have been some question or DR about it sometime in the
20 years since the official description of strncmp() was written.
So apparently it's clear enough so implementors haven't felt any
need to ask about it.

> On the other hand, your example had better be valid because lots of
> code relies on it. 8-)

I don't know of anyone who contends that the Standard allows
undefined behavior for this kind of call to strncmp().
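
For reference, the call in question can be written out as a small C fragment. This is only a sketch: the helper names `strncmp_sign` and `big_versus_foo` are made up for illustration, and the result is reduced to its sign because the sign is all the standard specifies.

```c
#include <assert.h>
#include <string.h>

/* Reduce strncmp's result to -1, 0 or 1; only the sign is specified. */
int strncmp_sign(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    int r = strncmp(a, b, n);
    return (r > 0) - (r < 0);
}

/* The call discussed above: a 1000-element array against "foo". */
int big_versus_foo(void)
{
    char big[1000] = "foobas";  /* elements after the string are zero */
    return strncmp_sign(big, "foo", 1000);
}
```

The comparison is decided at index 3, where 'b' in big meets the terminating null of "foo", so the count of 1000 is never reached.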

Tim Rentsch

Apr 20, 2011, 1:35:07 PM

Vincent Lefevre <vincen...@vinc17.net> writes:

That isn't really a response to the question. What
are you hoping to accomplish by these postings?

(I have some other comments/responses but I have to
defer a longer reply until later.)

Tim Rentsch

Apr 20, 2011, 1:50:25 PM

lawrenc...@siemens.com writes:

But, you do agree that the description of strncmp() is expected
to be understood as unspecified/under-specified, in that it
allows reading of characters after the point of mismatch
(with no reads in an array after a null in that array), even
though such reads are not absolutely necessary to determine
the return value. Right?

That is, even though unspecified behavior (and the resulting
possible undefined behavior) may not have been deliberate,
AFAYK this meaning is what was intended - yes?

Wojtek Lerch

Apr 21, 2011, 4:33:16 PM

On 20/04/2011 6:36 AM, James Kuyper wrote:
> On 04/19/2011 11:32 PM, Wojtek Lerch wrote:
>> 3.4.4 "unspecified behavior: use of an unspecified value, or other
>> behavior where this International Standard provides two or more
>> possibilities and imposes no further requirements on which is chosen in
>> any instance"
>>
>> I don't think this covers cases where the standard is unclear or
>> ambiguous and the two possibilities are "X is allowed" and "X is
>> forbidden" (note that those two are neither values nor behaviours).
>
> You're right, the "two or more possibilities" he's referring to cannot
> be "x is allowed" "x is forbidden". They have to be "x happens" and "x
> does not happen": in other words, what 3.4.4 is saying is that unless
> the standard explicitly says otherwise, "x is allowed".

Um no, 3.4.4 is just the definition of "unspecified". It says nothing
about any behaviours except those that the standard refers to as
"unspecified". Whether x is allowed can potentially depend on many
things that the standard says explicitly, implies, or is silent about,
and also on what exactly you mean by "allowed" and whether x is an
action by a program or by the implementation.

(In this case, substitute "x is allowed" with "the implementation is
allowed to access all the n bytes if none of them is a null character",
or with "the program is allowed to provide arrays that are shorter than
n bytes, as long as they differ or are null-terminated". At least one
of those must be false, even though the standard does not explicitly say
that.)

> Keep in mind
> that "says otherwise" includes what the standard says about the behavior
> of strncmp(): that behavior must occur as described, so any unspecified
> behavior that occurs cannot, among other things, prevent strncmp() from
> returning its specified value.

Except in programs that have undefined behaviour because they have
violated some requirement of the standard.

> Thus, anything, like a memory access
> violation, which would abort the program and thereby prevent strncmp()
> from returning its specified value is prohibited.

Unless the program has undefined behaviour.

> If a program calls strncmp() with arguments that require it to access
> memory locations which are inaccessible in order to perform the behavior
> specified for strncmp() by the standard,

... or just arguments for which the standard does not define the
behaviour, or specifically says that they cause undefined behaviour,...

> then the onus for that
> violation is on the caller, and strncmp() is under no obligation to
> prevent that violation. However, strncmp() doesn't need to read any
> memory after the first difference, so if it chooses to do so anyway, it
> must protect any such read against the possibility of fatal access
> violations.

What strncmp() "needs" to read is the wrong question. The right
questions are what conditions have to be met by the program to avoid
undefined behaviour, and if they are met, what conditions must then be
met by the implementation to ensure conformance. In this case, the text
says that the function compares no more than n characters, but not
beyond the first null character; I don't find it outrageously illogical
to argue that it's therefore the program's responsibility to ensure that
the arguments point to arrays that are either null-terminated or at
least n characters long. Just because it might be possible for the
implementation to fulfill the same description of semantics if the
pre-conditions were looser does not mean that they actually are looser.

>>>> You're not saying that there are places in the standard that were
>>>> intentionally made unclear, are you?
>>>
>>> No, but there are things that have been deliberately left unspecified or
>>> under-specified (although I don't think this is one of them).
>>
>> I assume you don't mean "unspecified" in the sense defined in 3.4.4?
>
> I think he meant it in precisely that sense, but as applied to a
> different pair of possibilities than you did.

Perhaps. But, frankly, I am not really in a mood for second-guessing
what he might possibly have meant any more than I'm in the mood for
second-guessing the standard. As far as I am concerned, neither he nor
the standard is clear about their intent, and unless they're willing to
clarify it, I don't see much point in trying to guess it.

James Kuyper

Apr 21, 2011, 6:05:59 PM

On 04/21/2011 04:33 PM, Wojtek Lerch wrote:
> On 20/04/2011 6:36 AM, James Kuyper wrote:
>> On 04/19/2011 11:32 PM, Wojtek Lerch wrote:
>>> 3.4.4 "unspecified behavior: use of an unspecified value, or other
>>> behavior where this International Standard provides two or more
>>> possibilities and imposes no further requirements on which is chosen in
>>> any instance"
>>>
>>> I don't think this covers cases where the standard is unclear or
>>> ambiguous and the two possibilities are "X is allowed" and "X is
>>> forbidden" (note that those two are neither values nor behaviours).
>>
>> You're right, the "two or more possibilities" he's referring to cannot
>> be "x is allowed" "x is forbidden". They have to be "x happens" and "x
>> does not happen": in other words, what 3.4.4 is saying is that unless
>> the standard explicitly says otherwise, "x is allowed".
>
> Um no, 3.4.4 is just the definition of "unspecified". It says nothing
> about any behaviours except those that the standard refers to as
> "unspecified". Whether x is allowed can potentially depend on many
> things that the standard says explicitly, implies, or is silent about,
> and also on what exactly you mean by "allowed" and whether x is an
> action by a program or by the implementation.

In this case, by "x is allowed", I mean that an implementation remains
conforming regardless of whether or not it does "x". When the standard
leaves it unspecified whether or not x occurs, then conformance with the
standard cannot depend upon whether or not "x" occurs.

I agree that when the standard is silent about the behavior,
"unspecified" is not the correct word; the correct word is undefined.
The standard must, at least implicitly, provide two or more permitted
behaviors, in order for the choice between those behaviors to be
unspecified.

...

>> Keep in mind
>> that "says otherwise" includes what the standard says about the behavior
>> of strncmp(): that behavior must occur as described, so any unspecified
>> behavior that occurs cannot, among other things, prevent strncmp() from
>> returning its specified value.
>
> Except in programs that have undefined behaviour because they have
> violated some requirement of the standard.

Of course.

>> Thus, anything, like a memory access
>> violation, which would abort the program and thereby prevent strncmp()
>> from returning its specified value is prohibited.
>
> Unless the program has undefined behaviour.

Agreed.

>> If a program calls strncmp() with arguments that require it to access
>> memory locations which are inaccessible in order to perform the behavior
>> specified for strncmp() by the standard,
>
> ... or just arguments for which the standard does not define the
> behaviour, or specifically says that they cause undefined behaviour,...

Agreed.

>> then the onus for that
>> violation is on the caller, and strncmp() is under no obligation to
>> prevent that violation. However, strncmp() doesn't need to read any
>> memory after the first difference, so if it chooses to do so anyway, it
>> must protect any such read against the possibility of fatal access
>> violations.
>
> What strncmp() "needs" to read is the wrong question. The right
> questions are what conditions have to be met by the program to avoid
> undefined behaviour, and if they are met, what conditions must then be
> met by the implementation to ensure conformance. In this case, the text
> says that the function compares no more than n characters, but not
> beyond the first null character; I don't find it outrageously illogical
> to argue that it's therefore the program's responsibility to ensure that
> the arguments point to arrays that are either null-terminated or at
> least n characters long. Just because it might be possible for the
> implementation to fulfill the same description of semantics if the
> pre-conditions were looser does not mean that they actually are looser.

Behavior can be undefined by reason of the absence of a definition; but
that rule is frequently misused. The key point is that an applicable
definition must be actually absent; it can't merely fail to say anything
explicit about a special case; as long as what it does say can apply to
that case. The definition provided for strncmp() does cover the case
you're worried about; that it doesn't say anything special about that
case merely implies that there is nothing special to say about that
case; it doesn't make the behavior in that case undefined.

Definitions of the behavior don't have to provide an explicit list of
everything that is not allowed to happen; requiring that they do so
would force the replacement of each and every clause in the standard
with something bigger than the Library of Congress. When one particular
behavior is defined by the standard, no other behavior that is within
the scope of this standard is allowed to happen (except insofar as it is
covered by the as-if rule).
--
James Kuyper

Wojtek Lerch

Apr 21, 2011, 11:14:45 PM

On 21/04/2011 6:05 PM, James Kuyper wrote:
> On 04/21/2011 04:33 PM, Wojtek Lerch wrote:
>> On 20/04/2011 6:36 AM, James Kuyper wrote:
>>> You're right, the "two or more possibilities" he's referring to cannot
>>> be "x is allowed" "x is forbidden". They have to be "x happens" and "x
>>> does not happen": in other words, what 3.4.4 is saying is that unless
>>> the standard explicitly says otherwise, "x is allowed".
>>
>> Um no, 3.4.4 is just the definition of "unspecified". [...]
>
> In this case, by "x is allowed", I mean that an implementation remains
> conforming regardless of whether or not it does "x". When the standard
> leaves it unspecified whether or not x occurs, then conformance with the
> standard cannot depend upon whether or not "x" occurs.

Sure. The only thing I was disagreeing with there was that it's 3.4.4
that's saying that. :)

> I agree that when the standard is silent about the behavior,
> "unspecified" is not the correct word; the correct word is undefined.
> The standard must, at least implicitly, provide two or more permitted
> behaviors, in order for the choice between those behaviors to be
> unspecified.

Actually, when the two possibilities are "x happens" and "x does not
happen", I think it's perfectly acceptable to say that it's unspecified
whether x happens or not, even if the standard doesn't explicitly say
that, or even mention that those two possibilities exist. I would tend
to avoid using the word "undefined" when discussing such cases.

>>> [...] However, strncmp() doesn't need to read any
>>> memory after the first difference, so if it chooses to do so anyway, it
>>> must protect any such read against the possibility of fatal access
>>> violations.
>>
>> What strncmp() "needs" to read is the wrong question. [...]

> Behavior can be undefined by reason of the absence of a definition; but
> that rule is frequently misused.

Perhaps; but I was not trying to apply it here.

> [...] The definition provided for strncmp() does cover the case
> you're worried about; that it doesn't say anything special about that
> case merely implies that there is nothing special to say about that
> case; it doesn't make the behavior in that case undefined.

The definition provided for strncmp() consists of two parts. The first
part says that the function compares no more than n characters from each
array and that characters that follow a null character are not compared.
The second part specifies what value the function returns.

This is how I would summarize the two competing interpretations that
this thread is about:

#1 The two parts should be interpreted somewhat separately. The first
part implies that in the absence of a null character the function is
allowed to compare n bytes (regardless of whether it needs to or not).
This constitutes a pre-condition that the program must satisfy, or
otherwise the behaviour is undefined (not by omission, but by failing to
ensure that what the standard allows the implementation to do is safe).
The second part does not cancel any undefined behaviour that a program
has invoked by violating the first part.

#2 The two parts should be interpreted together, as a description of an
algorithm (a little bit like the standard's description of asctime()).
Since that algorithm doesn't compare characters following the first
difference, the implementation is not allowed to read them either, and
programs are not required to ensure that reading them would be safe.

Personally I don't have the feeling that the text clearly favours either
one of those interpretations. As a programmer, I prefer #1 because it's
safer. If I were an implementer, I'd prefer #2 for the same reason.
But as far as comp.std.c goes, I think the important thing is that this
is an ambiguity in the standard that deserves to be recognized as a defect.
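
As a minimal sketch of what the strictest reading of #2 would permit, here is a character-by-character comparison that never touches an element after the first difference or the first null. The name `ref_strncmp` is hypothetical and this is not taken from any real library; it only illustrates that such an implementation satisfies both parts of the description.

```c
#include <assert.h>
#include <stddef.h>

/* Reference comparison: reads element i of each array only if all
   earlier elements were equal and non-null, and only while i < n. */
int ref_strncmp(const char *s1, const char *s2, size_t n)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;
    assert(n == 0 || (s1 != NULL && s2 != NULL));

    for (size_t i = 0; i < n; i++) {
        if (p1[i] != p2[i])
            return (p1[i] > p2[i]) - (p1[i] < p2[i]); /* first difference decides */
        if (p1[i] == '\0')
            return 0;                                 /* both strings end here */
    }
    return 0;                                         /* first n characters equal */
}
```

With the two 10-element arrays from the original question and n equal to 42, this function stops at index 9 and never reads outside either array. Under #1 the same call would still be undefined, because the implementation would be entitled to read further.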

James Kuyper

Apr 22, 2011, 7:04:47 AM

On 04/21/2011 11:14 PM, Wojtek Lerch wrote:
> On 21/04/2011 6:05 PM, James Kuyper wrote:

...

>> I agree that when the standard is silent about the behavior,
>> "unspecified" is not the correct word; the correct word is undefined.
>> The standard must, at least implicitly, provide two or more permitted
>> behaviors, in order for the choice between those behaviors to be
>> unspecified.
>
> Actually, when the two possibilities are "x happens" and "x does not
> happen", I think it's perfectly acceptable to say that it's unspecified
> whether x happens or not, even if the standard doesn't explicitly say
> that, or even mention that those two possibilities exist. I would tend
> to avoid using the word "undefined" when discussing such cases.

True, but when the standard is silent about something, there's usually
also a "y happens" and a "x and y both happen", among infinitely many
other possibilities. In this case, y might be "strncmp writes into the
arrays" or "strncmp() opens a file with a name matching the first string
argument". The standard doesn't say anything more to prohibit either of
those options, than it does to prohibit reading past the characters that
need to be read in order to determine strncmp()'s return value. I
believe that the description that is provided implicitly allows
strncmp() to do anything whose observable consequences are restricted to
those specified in the description. This includes reading past the
needed characters, but only so long as doing so has no observable
consequences; a memory access violation is observable in this sense.

...

> #2 The two parts should be interpreted together, as a description of an
> algorithm (a little bit like the standard's description of asctime()).
> Since that algorithm doesn't compare characters following the first
> difference, the implementation is not allowed to read them either, and
> programs are not required to ensure that reading them would be safe.

That's basically my understanding, except that I wouldn't say that
strncmp() can't read additional characters beyond the ones it needs to.
It's allowed to read anything it wants to, but only insofar as such
reading is covered by the as-if rule.

...

> But as far as comp.std.c goes, I think the important thing is that this
> is an ambiguity in the standard that deserves to be recognized as a defect.

It doesn't feel ambiguous to me.
--
James Kuyper

Florian Weimer

Apr 22, 2011, 5:45:21 PM

* Tim Rentsch:

> I don't know of anyone who contends that the Standard allows
> undefined behavior for this kind of call to strncmp().

I believe Andreas' question was prompted by a change to GNU libc which
assumes undefined behavior in this case. Someone wrote that patch and
apparently assumes that there is a loophole here. 8-)

Phil Carmody

Apr 24, 2011, 10:25:26 AM

Tim Rentsch <t...@alumni.caltech.edu> writes:

Bizarre. Total brainfart.

Phil
--
"At least you know where you are with Microsoft."
"True. I just wish I'd brought a paddle." -- Matthew Vernon

Vincent Lefevre

Apr 27, 2011, 7:18:50 AM

In article <iok0iu$an7$1...@dont-email.me>,
James Kuyper <james...@verizon.net> wrote:

> That interpretation prohibits an implementation that loads an entire
> word's worth of characters from each array for comparison, compares the
> words for equality, and only bothers breaking the words up into
> individual characters for individual comparisons if the words are
> different. This approach could significantly speed up strncmp() on some
> platforms; are you sure the committee intended to prohibit it? Would it
> be a good idea to prohibit it?

It depends on whether one is willing to regard

strncmp( big, "foo", 1000 );

as undefined behavior or not.
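
The word-at-a-time approach quoted above can be modelled in portable C roughly as follows. This is a sketch under stated assumptions, not how any real library does it: `chunked_strncmp` is a hypothetical name, and the extra parameters `cap1` and `cap2` (the number of bytes the caller vouches are readable in each array) are not part of the real strncmp() interface. They are added so that the whole-word loads stay within defined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model of a word-at-a-time comparison. cap1 and cap2 are hypothetical:
   the number of bytes known to be readable at s1 and s2. */
int chunked_strncmp(const char *s1, size_t cap1,
                    const char *s2, size_t cap2, size_t n)
{
    const size_t W = sizeof(unsigned long);
    size_t safe = cap1 < cap2 ? cap1 : cap2;  /* readable in both arrays */
    size_t i = 0;

    assert(s1 != NULL && s2 != NULL);
    if (safe > n)
        safe = n;

    /* Fast path: whole words, only while a full word is known readable. */
    while (safe - i >= W) {
        unsigned long w1, w2;
        memcpy(&w1, s1 + i, W);
        memcpy(&w2, s2 + i, W);
        if (w1 != w2 || memchr(&w1, '\0', W) != NULL)
            break;  /* a difference or a null lies inside this word */
        i += W;
    }

    /* Slow path: single characters, stopping at the first difference or
       null. Reading past cap1 or cap2 here can only happen when the
       arrays are equal and unterminated up to that point, which is the
       caller's error under either interpretation. */
    for (; i < n; i++) {
        unsigned char c1 = (unsigned char)s1[i];
        unsigned char c2 = (unsigned char)s2[i];
        if (c1 != c2)
            return (c1 > c2) - (c1 < c2);
        if (c1 == '\0')
            return 0;
    }
    return 0;
}
```

A real strncmp() has no such size parameters, which is exactly why the question matters: without them, a whole-word load near the end of a short array is only acceptable if it cannot fault.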

Vincent Lefevre

Apr 27, 2011, 7:22:55 AM

In article <kfn4o5s...@x-alumni2.alumni.caltech.edu>,
Tim Rentsch <t...@alumni.caltech.edu> wrote:

> Vincent Lefevre <vincen...@vinc17.net> writes:

> > In article <kfnfwpf...@x-alumni2.alumni.caltech.edu>,
> > Tim Rentsch <t...@alumni.caltech.edu> wrote:
> >>
> >> If we can bring the discussion up a level, I think we may be
> >> working at cross purposes here. Are you trying to argue how
> >> the Standard _must_ be understood on this point? How it
> >> _ought_ to be understood on this point? How the committee
> >> _expects_ it will be understood? How the committee believes
> >> it _should_ be understood? What the Standard _ought_ to
> >> require, regardless of what it does require? How the Standard
> >> will be understood here, _given certain assumptions_? Which
> >> of these applies to your comments? Or, if none of them do,
> >> how would you express what it is you are hoping to accomplish?
> >
> > Ideally the answer should be the same for any of them (without
> > additional assumptions). Currently I disagree with your reasoning,
> > but perhaps it is incomplete (see the above questions).

> That isn't really a response to the question. What
> are you hoping to accomplish by these postings?

I don't understand what you mean.

My point is about what the standard says. Not more.

James Kuyper

Apr 27, 2011, 7:36:11 AM

On 04/27/2011 07:18 AM, Vincent Lefevre wrote:
> In article <iok0iu$an7$1...@dont-email.me>,
> James Kuyper <james...@verizon.net> wrote:
>
>> That interpretation prohibits an implementation that loads an entire
>> word's worth of characters from each array for comparison, compares the
>> words for equality, and only bothers breaking the words up into
>> individual characters for individual comparisons if the words are
>> different. This approach could significantly speed up strncmp() on some
>> platforms; are you sure the committee intended to prohibit it? Would it
>> be a good idea to prohibit it?
>
> It depends on whether one is willing to regard
>
> strncmp( big, "foo", 1000 );
>
> as undefined behavior or not.

The standard provides a definition of the behavior, so we cannot infer
undefined behavior due to the absence of a definition. That implies that
if reading beyond the end of "foo" could cause behavior inconsistent
with that description, the implementation is obligated to ensure that
strncmp() does not do so.

Your statement of the implied requirement seems somewhat stronger; it
seems to prohibit reading additional characters, even if there's no
resulting problematic behavior. However, under the as-if rule, that
doesn't really matter.
--
James Kuyper

Vincent Lefevre

Apr 27, 2011, 8:46:38 AM

In article <91ca4b...@mid.individual.net>,
Wojtek Lerch <wojt...@yahoo.ca> wrote:

> The definition provided for strncmp() consists of two parts. The first
> part says that the function compares no more than n characters from each
> array and that characters that follow a null character are not compared.
> The second part specifies what value the function returns.

This ("[...] consists of two parts") is not true at all. There's
everything else in the standard that can relate to strncmp, and in
particular 7.21.4p1, where strncmp is explicitly mentioned:

The sign of a nonzero value returned by the comparison functions
memcmp, strcmp, and strncmp is determined by the sign of the
difference between the values of the first pair of characters (both
interpreted as unsigned char) that differ in the objects being
compared.

I'd say that's the "real" first part. 7.21.4.4p2 completes this
description by saying that one may need to stop earlier in the
comparisons (from a *semantics* point of view):

The strncmp function compares not more than n characters (characters
that follow a null character are not compared) from the array
pointed to by s1 to the array pointed to by s2.

> This is how I would summarize the two competing interpretations that
> this thread is about:

> #1 The two parts should be interpreted somewhat separately. The first
> part implies that in the absence of a null character the function is
> allowed to compare n bytes (regardless of whether it needs to or not).
> This constitutes a pre-condition that the program must satisfy, or
> otherwise the behaviour is undefined (not by omission, but by failing to
> ensure that what the standard allows the implementation to do is safe).
> The second part does not cancel any undefined behaviour that a program
> has invoked by violating the first part.

I wonder whether it is correct to interpret parts separately.
By taking clauses out of context, one can probably find
contradictions elsewhere in the standard.

Note that "not more than n ..." is not equivalent to "n except
in the case ... [where it is less than n]". If this is what is
meant, this should be rewritten as "[...] compares the first n
characters, except [...]" (note: "the first n characters" would
be the same wording as in the memcmp description).

I would agree with the implication only if you consider that
everything that is not forbidden is allowed: it is not forbidden
to compare n characters (except when there is a null character),
so it is allowed to compare them. But then I would say that since
it is not forbidden to *read* n+1 characters (or read characters
after a null character), then it is allowed to read them, and
strncmp would be rather useless.

So, your interpretation is based on something that is not explicitly
said by the standard.

> #2 The two parts should be interpreted together, as a description of an
> algorithm (a little bit like the standard's description of asctime()).

I wouldn't say that they (with the "real" first part I've mentioned
above) describe an algorithm, but that they describe the semantics
(as a recursive function -- this looks like an algorithm but that's
not important at all).

> Since that algorithm doesn't compare characters following the first
> difference, the implementation is not allowed to read them either, and
> programs are not required to ensure that reading them would be safe.

This is not my reasoning, though the conclusion would be the same
here. My point is that the standard doesn't say what the size of
the objects is, so the implementation shouldn't assume anything about
their size, *provided* that the behavior (specified by the semantics)
is well-defined: the objects must have at least size 1; if the first
characters are the same and not null and n >= 2, then the objects
must have at least size 2, otherwise the behavior would be obviously
undefined; and so on...
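
The inductive size argument above can be written down as a short C function. This is only an illustrative sketch; `chars_examined` is a hypothetical name, and the function simply counts how many leading elements of each array the semantics force a comparison to look at.

```c
#include <assert.h>
#include <stddef.h>

/* Number of leading characters of each array that a character-by-character
   comparison has to examine: it stops after the first difference, after
   the first null character, or after n characters, whichever comes first.
   Under the reading above, this is the minimum size each object must have. */
size_t chars_examined(const char *s1, const char *s2, size_t n)
{
    size_t i = 0;
    assert(n == 0 || (s1 != NULL && s2 != NULL));
    while (i < n) {
        unsigned char c1 = (unsigned char)s1[i];
        unsigned char c2 = (unsigned char)s2[i];
        i++;
        if (c1 != c2 || c1 == '\0')
            break;
    }
    return i;
}
```

For the two 10-element arrays from the original question the count is 10, whatever value of n at least 10 is passed.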

> Personally I don't have the feeling that the text clearly favours either
> one of those interpretations. As a programmer, I prefer #1 because it's
> safer. If I were an implementer, I'd prefer #2 for the same reason.
> But as far as comp.std.c goes, I think the important thing is that this
> is an ambiguity in the standard that deserves to be recognized as a defect.

I agree.

Vincent Lefevre

Apr 27, 2011, 9:10:57 AM

In article <ip8v3c$45p$1...@dont-email.me>,
James Kuyper <james...@verizon.net> wrote:

> On 04/27/2011 07:18 AM, Vincent Lefevre wrote:
> > In article <iok0iu$an7$1...@dont-email.me>,
> > James Kuyper <james...@verizon.net> wrote:
> >
> >> That interpretation prohibits an implementation that loads an entire
> >> word's worth of characters from each array for comparison, compares the
> >> words for equality, and only bothers breaking the words up into
> >> individual characters for individual comparisons if the words are
> >> different. This approach could significantly speed up strncmp() on some
> >> platforms; are you sure the committee intended to prohibit it? Would it
> >> be a good idea to prohibit it?
> >
> > It depends on whether one is willing to regard
> >
> > strncmp( big, "foo", 1000 );
> >
> > as undefined behavior or not.

> The standard provides a definition of the behavior, so we cannot infer
> undefined behavior due to the absence of a definition.

[...]

I agree (though some people would interpret the standard differently).
But then, the implementation you proposed would no longer work
correctly on such a case.

What I want to say is that the fact that the standard doesn't allow
reading characters after a null character (e.g. in the case where
there would be a memory boundary, so that the as-if rule doesn't
apply) somewhat limits the implementation. And there would be almost
no benefit in allowing it to read n characters when there are no null
characters.

Or do you know how an implementation could take advantage of
reading n characters, e.g. if the processor has an instruction to
read 16 bytes at a time very quickly?

Such an implementation should work on both

char s1[17] = "12345678901234567";
char s2[17] = "12345678910000000";
strncmp (s1, s2, 17);

and

char s1[10] = "123456789\0";
char s2[10] = "1234567891";
strncmp (s1, s2, 17);

James Kuyper

Apr 27, 2011, 9:34:54 AM

On 04/27/2011 09:10 AM, Vincent Lefevre wrote:
> In article <ip8v3c$45p$1...@dont-email.me>,
> James Kuyper <james...@verizon.net> wrote:
>
>> On 04/27/2011 07:18 AM, Vincent Lefevre wrote:
>>> In article <iok0iu$an7$1...@dont-email.me>,
>>> James Kuyper <james...@verizon.net> wrote:
>>>
>>>> That interpretation prohibits an implementation that loads an entire
>>>> word's worth of characters from each array for comparison, compares the
>>>> words for equality, and only bothers breaking the words up into
>>>> individual characters for individual comparisons if the words are
>>>> different. This approach could significantly speed up strncmp() on some
>>>> platforms; are you sure the committee intended to prohibit it? Would it
>>>> be a good idea to prohibit it?
>>>
>>> It depends on whether one is willing to regard
>>>
>>> strncmp( big, "foo", 1000 );
>>>
>>> as undefined behavior or not.
>
>> The standard provides a definition of the behavior, so we cannot infer
>> undefined behavior due to the absence of a definition.
> [...]
>
> I agree (though some people would interpret the standard differently).
> But then, the implementation you proposed would no longer work
> correctly on such a case.

That would be true only if it violated memory protected against being
read. I'd expect hardware memory protection to be enforced only at word
boundaries, so there shouldn't be a problem.

> What I want to say is that the fact that the standard doesn't allow
> reading characters after a null character (e.g. in the case where
> there would be a memory boundary, so that the as-if rule doesn't
> apply) somewhat limits the implementation. And there would be almost
> no benefit in allowing it to read n characters when there are no null
> characters.
>
> Or do you know how an implementation could take advantage of
> reading n characters, e.g. if the processor has an instruction to
> read 16 bytes at a time very quickly?
>
> Such an implementation should work on both
>
> char s1[17] = "12345678901234567";
> char s2[17] = "12345678910000000";
> strncmp (s1, s2, 17);
>
> and
>
> char s1[10] = "123456789\0";
> char s2[10] = "1234567891";
> strncmp (s1, s2, 17);

I see no reason why it shouldn't work on a machine with 16-byte words,
so long as read-protection is enforced by the hardware only at word
boundaries.
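
The word-boundary argument is plain address arithmetic and can be sketched in C. The helper `read_straddles` is a hypothetical name, and the sizes used in the note below (16-byte reads, a 4096-byte protection granule) are assumptions for illustration, not properties of any particular machine. Addresses are assumed small enough that addr + width does not wrap around.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the `width` bytes starting at `addr` cross a multiple of
   `granule`, i.e. the read touches two different protection units. */
bool read_straddles(uintptr_t addr, uintptr_t width, uintptr_t granule)
{
    assert(width > 0 && granule > 0);
    return addr / granule != (addr + width - 1) / granule;
}
```

If the granule is a multiple of the read width and the read is aligned to its own width, the function returns false for every address, so an aligned read that contains at least one valid byte cannot fault. An unaligned 16-byte read, or a 16-byte read on a machine that protects memory in 4-byte units, can straddle a boundary.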

--
James Kuyper

Vincent Lefevre

Apr 27, 2011, 12:21:17 PM

In article <ip961v$ssr$1...@dont-email.me>,
James Kuyper <james...@verizon.net> wrote:

> On 04/27/2011 09:10 AM, Vincent Lefevre wrote:
> > I agree (though some people would interpret the standard differently).
> > But then, the implementation you proposed would no longer work
> > correctly on such a case.

> That would be true only if it violated memory protected against being
> read. I'd expect hardware memory protection to be enforced only at word
> boundaries, so there shouldn't be a problem.

I was thinking about a block of words (like on the ARM).

> > What I want to say is that the fact that the standard doesn't allow
> > reading characters after a null character (e.g. in the case where
> > there would be a memory boundary, so that the as-if rule doesn't
> > apply) somewhat limits the implementation. And there would be almost
> > no benefit in allowing it to read n characters when there are no null
> > characters.
> >
> > Or do you know how an implementation could take advantage of
> > reading n characters, e.g. if the processor has an instruction to
> > read 16 bytes at a time very quickly?
> >
> > Such an implementation should work on both
> >
> > char s1[17] = "12345678901234567";
> > char s2[17] = "12345678910000000";
> > strncmp (s1, s2, 17);
> >
> > and
> >
> > char s1[10] = "123456789\0";
> > char s2[10] = "1234567891";
> > strncmp (s1, s2, 17);

> I see no reason why it shouldn't work on a machine with 16-byte words,
> so long as read-protection is enforced by the hardware only at word
> boundaries.

But on a machine with 4-byte words?

Anyway, do you have a practical implementation that would work on

char s1[17] = "12345678901234567";
char s2[17] = "12345678910000000";
strncmp (s1, s2, 17);

and

char s1[10] = "123456789\0";
char s2[10] = "1234567891";
strncmp (s1, s2, 17);

but not on

char s1[10] = "1234567890";
char s2[10] = "1234567891";
strncmp (s1, s2, 17);

?

James Kuyper

Apr 27, 2011, 8:33:58 PM

On 04/27/2011 12:21 PM, Vincent Lefevre wrote:
> In article <ip961v$ssr$1...@dont-email.me>,
> James Kuyper <james...@verizon.net> wrote:
>
>> On 04/27/2011 09:10 AM, Vincent Lefevre wrote:
>>> I agree (though some people would interpret the standard differently).
>>> But then, the implementation you proposed would no longer work
>>> correctly on such a case.
>
>> That would be true only if it violated memory protected against being
>> read. I'd expect hardware memory protection to be enforced only at word
>> boundaries, so there shouldn't be a problem.
>
> I was thinking about a block of words (like on the ARM).

I'm not familiar with how that works.

...

>>> Or do you know how an implementation could take the advantage of
>>> reading n characters, e.g. if the processor has an instruction to
>>> read 16 bytes at a time very quickly?
>>>
>>> Such an implementation should work on both
>>>
>>> char s1[17] = "12345678901234567";
>>> char s2[17] = "12345678910000000";
>>> strncmp (s1, s2, 17);
>>>
>>> and
>>>
>>> char s1[10] = "123456789\0";
>>> char s2[10] = "1234567891";
>>> strncmp (s1, s2, 17);
>
>> I see no reason why it shouldn't work on a machine with 16-byte words,
>> so long as read-protection is enforced by the hardware only at word
>> boundaries.
>
> But on a machine with 4-byte words?

The actual word size and the actual sizes of the strings are irrelevant.
All that matters is that every word that is overlapped by either s1 or
s2 should be readable. That's sufficient to allow implementation of
strncmp() to use word-oriented machine instructions to speed up processing.

As long as that is the case, then the fact that some of the bytes in
those words are not actually part of s1 or s2 should not be problematic.
It's technically prohibited to read those bytes, but as a practical
matter it's covered by the as-if rule; there should be no detectable
consequences of having done so, if it's done right.

> Anyway, do you have a practical implementation that would work on
>
> char s1[17] = "12345678901234567";
> char s2[17] = "12345678910000000";
> strncmp (s1, s2, 17);
>
> and
>
> char s1[10] = "123456789\0";
> char s2[10] = "1234567891";
> strncmp (s1, s2, 17);
>
> but not on
>
> char s1[10] = "1234567890";
> char s2[10] = "1234567891";
> strncmp (s1, s2, 17);
>
> ?

No, the implementation I'm thinking of would handle both equally well.
I'm not sure I get the point of your question.

It's not a "practical implementation", I'd have to learn a lot of things
I don't currently know, and have no current need to know, in order to
create such an implementation for a particular platform. But I believe
it can be done.
--
James Kuyper

Vincent Lefevre

Apr 28, 2011, 6:25:17 AM

In article <ipaclo$3vu$1...@dont-email.me>,
James Kuyper <james...@verizon.net> wrote:

> On 04/27/2011 12:21 PM, Vincent Lefevre wrote:
> > In article <ip961v$ssr$1...@dont-email.me>,
> > James Kuyper <james...@verizon.net> wrote:
> >> That would be true only if it violated memory protected against being
> >> read. I'd expect hardware memory protection to be enforced only at word
> >> boundaries, so there shouldn't be a problem.
> >
> > I was thinking about a block of words (like on the ARM).

> I'm not familiar with how that works.

Basically, there is an instruction that allows one to load several
consecutive words in several registers, with no additional memory
alignment requirement (just the usual 32-bit one). It is faster
than using several single-word transfer instructions.

http://www.heyrick.co.uk/assembler/str.html#stm

> > Anyway, do you have a practical implementation that would work on
> >
> > char s1[17] = "12345678901234567";
> > char s2[17] = "12345678910000000";
> > strncmp (s1, s2, 17);
> >
> > and
> >
> > char s1[10] = "123456789\0";
> > char s2[10] = "1234567891";
> > strncmp (s1, s2, 17);
> >
> > but not on
> >
> > char s1[10] = "1234567890";
> > char s2[10] = "1234567891";
> > strncmp (s1, s2, 17);
> >
> > ?

> No, the implementation I'm thinking of would handle both equally well.
> I'm not sure I get the point of your question.

The point of my question is to answer your question from
<iok0iu$an7$1...@dont-email.me> on Tue, 19 Apr 2011 08:52:45 -0400,
where you said:

| That interpretation prohibits an implementation that loads an entire
| word's worth of characters from each array for comparison, compares the
| words for equality, and only bothers breaking the words up into
| individual characters for individual comparisons if the words are
| different. This approach could significantly speed up strncmp() on some
| platforms; are you sure the committee intended to prohibit it? Would it
| be a good idea to prohibit it?

I'm saying that not allowing

char s1[10] = "1234567890";
char s2[10] = "1234567891";
strncmp (s1, s2, 17);

would *not* lead to faster implementations (perhaps except in very
rare cases), because

char s1[10] = "123456789\0";
char s2[10] = "1234567891";
strncmp (s1, s2, 17);

must still be allowed. So, I think that there is no reason to
prohibit the former, as this would not affect the performance
of implementations.
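
To make the three calls concrete, here is a minimal sketch of a
byte-at-a-time comparison that stops at the first difference, the first
null character, or after n characters, whichever comes first. It is not
taken from any real libc. The name bytewise_strncmp and the last_read
out-parameter are assumptions added purely for illustration, so that the
highest index actually read can be inspected.

```c
#include <stddef.h>

/* Illustrative only: compares at most n characters and records in
   *last_read the highest index that was read from either array.
   If n is 0, nothing is read and *last_read is left untouched. */
int bytewise_strncmp(const char *s1, const char *s2, size_t n,
                     size_t *last_read)
{
    for (size_t i = 0; i < n; i++) {
        unsigned char c1 = (unsigned char)s1[i];
        unsigned char c2 = (unsigned char)s2[i];
        *last_read = i;
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;   /* first difference: stop here */
        if (c1 == '\0')
            return 0;                  /* both strings ended: stop here */
    }
    return 0;                          /* n characters compared, all equal */
}
```

With such a loop, all three calls above stop at index 9, so none of them
touches a byte outside a 10-character array, whatever value of n is
passed.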

James Kuyper

Apr 28, 2011, 7:19:22 AM

On 04/28/2011 06:25 AM, Vincent Lefevre wrote:
> In article <ipaclo$3vu$1...@dont-email.me>,
> James Kuyper <james...@verizon.net> wrote:
>
>> On 04/27/2011 12:21 PM, Vincent Lefevre wrote:
>>> In article <ip961v$ssr$1...@dont-email.me>,
>>> James Kuyper <james...@verizon.net> wrote:
>>>> That would be true only if it violated memory protected against being
>>>> read. I'd expect hardware memory protection to be enforced only at word
>>>> boundaries, so there shouldn't be a problem.
>>>
>>> I was thinking about a block of words (like on the ARM).
>
>> I'm not familiar with how that works.
>
> Basically, there is an instruction that allows one to load several
> consecutive words in several registers, with no additional memory
> alignment requirement (just the usual 32-bit one). It is faster
> than using several single-word transfer instructions.

Then in that case, a minor variation on the same concept applies: as
long as it's possible for the implementation to ensure that the entire
block of words is safely readable, it's no problem if some of those
bytes are outside of the memory reserved for the strings you're
comparing. Of course, ensuring that they're safely readable would, I
presume, be more difficult in this case?

...

> The point of my question is to answer your question from
> <iok0iu$an7$1...@dont-email.me> on Tue, 19 Apr 2011 08:52:45 -0400,
> where you said:
>
> | That interpretation prohibits an implementation that loads an entire
> | word's worth of characters from each array for comparison, compares the
> | words for equality, and only bothers breaking the words up into
> | individual characters for individual comparisons if the words are
> | different. This approach could significantly speed up strncmp() on some
> | platforms; are you sure the committee intended to prohibit it? Would it
> | be a good idea to prohibit it?
>
> I'm saying that not allowing
>
> char s1[10] = "1234567890";
> char s2[10] = "1234567891";
> strncmp (s1, s2, 17);
>
> would *not* lead to faster implementations (perhaps except in very
> rare cases), because
>
> char s1[10] = "123456789\0";
> char s2[10] = "1234567891";
> strncmp (s1, s2, 17);
>
> must still be allowed. So, I think that there is no reason to
> prohibit the former, as this would not affect the performance
> of implementations.

I was saying that prohibiting an implementation from reading any
character it doesn't need to read, in order to determine the return
value of strncmp(), would force the use of single character operations
for both of these cases. That is, it would do so if the as-if rule
didn't provide an escape hatch.

My more lenient interpretation, that an implementation is allowed to
read extra characters, so long as it avoids any problematic
side-effects, would allow use of word-oriented operations, so long as
any memory access restrictions obey word boundaries. Given the existence
of the as-if rule, my interpretation is essentially equivalent to yours.
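
A rough sketch of the word-oriented approach, under stated assumptions:
the name wordwise_strncmp and the helper has_zero_byte are hypothetical,
a "word" is taken to be 4 bytes, and the caller is assumed to guarantee
that every 4-byte group the loop loads is readable. That last assumption
is exactly the point under debate, since the loop may load up to 3 bytes
beyond the first null character or the first difference. A real
implementation would presumably use aligned loads so that a word never
straddles a protection boundary; this sketch does not attempt that and
uses memcpy() for the loads to stay clear of alignment issues.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the four bytes of w is zero (classic bit trick). */
static int has_zero_byte(uint32_t w)
{
    return ((w - 0x01010101u) & ~w & 0x80808080u) != 0;
}

/* Illustrative only: compares 4 bytes at a time while the words are
   equal and contain no null, then falls back to single characters. */
int wordwise_strncmp(const char *s1, const char *s2, size_t n)
{
    size_t i = 0;

    while (n - i >= sizeof(uint32_t)) {
        uint32_t w1, w2;
        memcpy(&w1, s1 + i, sizeof w1);   /* may read past a null/difference */
        memcpy(&w2, s2 + i, sizeof w2);
        if (w1 != w2 || has_zero_byte(w1))
            break;                        /* let the byte loop decide */
        i += sizeof(uint32_t);
    }

    for (; i < n; i++) {
        unsigned char c1 = (unsigned char)s1[i];
        unsigned char c2 = (unsigned char)s2[i];
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
        if (c1 == '\0')
            return 0;
    }
    return 0;
}
```

On buffers that are padded so the extra bytes are readable, this returns
the same sign as a character-at-a-time loop; the disagreement in this
thread is only about what happens when they are not.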

--
James Kuyper

Tim Rentsch

Apr 28, 2011, 5:10:40 PM

James Kuyper <james...@verizon.net> writes:

> On 04/04/2011 04:53 AM, Andreas Schwab wrote:
>> Is the following program strictly compliant?
>>
>> #include<string.h>
>>
>> char s1[10] = "1234567890";
>> char s2[10] = "1234567891";
>>
>> int
>> main (void)
>> {
>> return strncmp (s1, s2, 42) == 0;
>> }
>>
>> Or in the context of the implementation, is strncmp allowed to peek past
>> the first differing character upto the size as passed to it, assuming no
>> zero byte occurs before it, even if that would cause the program to
>> crash if those addresses were inaccessible?
>>
>> The standard says that strncmp does not peek past the first zero
>> character, but, AFAICS, doesn't say that it stops reading at the first
>> differing characters. So I would say that the program above is causing
>> undefined behavior.
>
> One could argue that permission to read from the string pointed at
> by s2 is implied only by the permission to compare characters. That
> permission runs out at the first null character, or the first
> character which does not match, or the Nth character, whichever comes
> first. If you were to complain that this is a weak argument, I would
> have to agree.
>
> However, without such an interpretation, most of the other string
> functions in the standard library suffer from similar problems. I
> think it's pretty clearly not the intent of the committee to allow any
> standard library function which takes a pointer to a string as an
> argument, to use that pointer to read past the end (or before the
> beginning) of the string pointed at.

An unconvincing argument without any supporting references, and
none are given.

In fact the argument is not just unconvincing but wrong. What
constitutes a string is defined in 7.1.1p1. Strings are not
the same as arrays. The arguments to strncmp() are arrays,
not strings (7.21.4.4p3).

Tim Rentsch

Apr 28, 2011, 5:29:47 PM

James Kuyper <james...@verizon.net> writes:

> On 04/27/2011 07:18 AM, Vincent Lefevre wrote:
>> In article <iok0iu$an7$1...@dont-email.me>,
>> James Kuyper <james...@verizon.net> wrote:
>>
>>> That interpretation prohibits an implementation that loads an entire
>>> word's worth of characters from each array for comparison, compares the
>>> words for equality, and only bothers breaking the words up into
>>> individual characters for individual comparisons if the words are
>>> different. This approach could significantly speed up strncmp() on some
>>> platforms; are you sure the committee intended to prohibit it? Would it
>>> be a good idea to prohibit it?
>>
>> It depends on whether one may accept to regard
>>
>> strncmp( big, "foo", 1000 );
>>
>> as undefined behavior or not.
>
> The standard provides a definition of the behavior, so we cannot infer
> undefined behavior due to the absence of a definition. That implies that
> if reading beyond the end of "foo" could cause behavior inconsistent
> with that description, the implementation is obligated to ensure that
> strncmp() does not do so.

This is begging the question. We are allowed to conclude that
strncmp() may not exhibit undefined behavior in such cases only
if we assume the description of strncmp()'s semantics doesn't
allow the possibility of undefined behavior in those cases.
Not whether the behavior _could_ be defined, but whether
the behavior of any possible implementation that matches the
description _must_ be defined. (And it needn't, but more
about that shortly...)

Tim Rentsch

Apr 28, 2011, 6:15:04 PM

James Kuyper <james...@verizon.net> writes:

> On 04/19/2011 11:32 PM, Wojtek Lerch wrote:
>> On 19/04/2011 4:27 PM, lawrenc...@siemens.com wrote:
>>> Wojtek Lerch<wojt...@yahoo.ca> wrote:
>>>>
>>>> Is that really how it works? I thought that if the standard doesn't
>>>> clearly say one way or the other, then it's simply not clear what a
>>>> strictly conforming program is allowed to depend on or what an implementation is
>>>> required to do, but it's wise for them to try to err on the side of
>>>> caution.
>>>
>>> "A strictly conforming program...shall not produce output dependent on
>>> any unspecified...behavior...."
>>
>> And what unspecified behaviour are you referring to in this case?
>>
>> 3.4.4 "unspecified behavior: use of an unspecified value, or other
>> behavior where this International Standard provides two or more
>> possibilities and imposes no further requirements on which is chosen in
>> any instance"
>>
>> I don't think this covers cases where the standard is unclear or
>> ambiguous and the two possibilities are "X is allowed" and "X is
>> forbidden" (note that those two are neither values nor behaviours).
>
> You're right, the "two or more possibilities" he's referring to cannot
> be "x is allowed" "x is forbidden". They have to be "x happens" and "x
> does not happen": in other words, what 3.4.4 is saying is that unless
> the standard explicitly says otherwise, "x is allowed".

That's wrong, and also a mischaracterization. It's wrong because
3.4.4 is just a definition of unspecified behavior. It's a
mischaracterization because unspecified behavior doesn't allow
behaviors except those that the Standard already provides; it's
not necessary to explicitly rule out a behavior that isn't already
in the set of behaviors that the Standard's descriptions permit.

> Keep in mind
> that "says otherwise" includes what the standard says about the behavior
> of strncmp(): that behavior must occur as described, so any unspecified
> behavior that occurs cannot, among other things, prevent strncmp() from
> returning its specified value. Thus, anything, like a memory access
> violation, which would abort the program and thereby prevent strncmp()
> from returning its specified value is prohibited.

Here we have an argument that is either wrong or circular.
Certainly it is possible for unspecified behavior to lead to
undefined behavior for some choices of possible permitted
behaviors. If the semantics of strncmp() include unspecified
behavior (and I believe they do), and if some of those possible
permitted behaviors lead to undefined behavior (and I believe
everyone is in agreement that they would, if they are indeed
permitted), then strncmp() can exhibit undefined behavior in
such cases. By assuming those possibilities that lead to
undefined behavior aren't allowed, we can conclude that there
will be no undefined behavior, but that's a circular argument.
Saying the behavior _could_ be well-defined is correct, but
that's not an argument that all possible permitted behaviors
_must_ be well-defined. The challenge is to prove that last
clause without assuming it.

> If a program calls strncmp() with arguments that require it to access
> memory locations which are inaccessible in order to perform the behavior
> specified for strncmp() by the standard, then the onus for that
> violation is on the caller, and strncmp() is under no obligation to
> prevent that violation. However, strncmp() doesn't need to read any
> memory after the first difference, so if it chooses to do so anyway, it
> must protect any such read against the possibility of fatal access
> violations.

This is proof by repeated assertion.

Tim Rentsch

Apr 28, 2011, 6:28:14 PM

James Kuyper <james...@verizon.net> writes:

> [snip] when the standard is silent, about something, there's usually
> also a "y happens" and a "x and y both happen", among infinitely many
> other possibilities. In this case, y might be "strncmp writes into the
> arrays" or "strncmp() opens a file with a name matching the first string
> argument". The standard doesn't say anything more to prohibit either of
> those options, than it does to prohibit reading past the characters that
> need to be read to in order to determine strncmp()'s return value. I
> believe that the description that is provided implicitly allows
> strncmp() to do anything whose observable consequences are restricted to
> those specified in the description.

It allows strncmp() to do anything whose observable consequences
are restricted to those of the behaviors allowed by the description
and the rest of the Standard. The question is, which behaviors
does the description allow?

> This includes reading past the
> needed characters, but only so long as doing so has no observable
> consequences; a memory access violation is observable in this sense.

Here again this begs the question. The conclusion follows only
if we assume the description does not allow those behaviors
that lead to the behavior being undefined.

Tim Rentsch

Apr 28, 2011, 6:32:05 PM

Florian Weimer <f...@deneb.enyo.de> writes:

> * Tim Rentsch:
>
>> I don't know of anyone who contends that the Standard allows
>> undefined behavior for this kind of call to strncmp().
>> [snipped example restored]
>>
>> char big[1000] = "foobas";
>> ...
>> strncmp( big, "foo", 1000 );

> I believe Andreas' question was prompted by a change to GNU libc which
> assumes undefined behavior in this case. Someone wrote that patch and
> apparently assumes that there is a loophole here. 8-)

That may be true, but his example at the beginning of this
thread certainly doesn't match the more common example here.

Tim Rentsch

Apr 28, 2011, 7:27:05 PM

Vincent Lefevre <vincen...@vinc17.net> writes:

> In article <kfn4o5s...@x-alumni2.alumni.caltech.edu>,
> Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
>> Vincent Lefevre <vincen...@vinc17.net> writes:
>
>> > In article <kfnfwpf...@x-alumni2.alumni.caltech.edu>,
>> > Tim Rentsch <t...@alumni.caltech.edu> wrote:
>> >>
>> >> If we can bring the discussion up a level, I think we may be
>> >> working at cross purposes here. Are you trying to argue how
>> >> the Standard _must_ be understood on this point? How it
>> >> _ought_ to be understood on this point? How the committee
>> >> _expects_ it will be understood? How the committee believes
>> >> it _should_ be understood? What the Standard _ought_ to
>> >> require, regardless of what it does require? How the Standard
>> >> will be understood here, _given certain assumptions_? Which
>> >> of these applies to your comments? Or, if none of them do,
>> >> how would you express what it is you are hoping to accomplish?
>> >
>> > Ideally the answer should be the same for any of them (without
>> > additional assumptions). Currently I disagree with your reasoning,
>> > but perhaps it is incomplete (see the above questions).
>
>> That isn't really a response to the question. What
>> are you hoping to accomplish by these postings?
>
> I don't understand what you mean.
>
> My point is about what the standard says. Not more.

I think you must mean something different. There is no
disagreement about what the Standard says; the text is the same
for everyone who reads it. The question is about the _meaning_
of the text, or what meaning we understand the text to convey;
in other words, What are we allowed to conclude about which
behaviors are permitted of a conforming implementation?
Different people reach different understandings; I'm hoping
to elicit which people and which kinds of understandings
are intended to be the subject of your statements. If you're
talking only about your own understanding then there isn't much
for me to say except "uh huh". If you're talking about how
or what other person(s) understand some sort of meaning in
the Standard, then we have more to talk about. Make sense?

Tim Rentsch

Apr 28, 2011, 7:57:15 PM

Vincent Lefevre <vincen...@vinc17.net> writes:

> In article <kfnfwpf...@x-alumni2.alumni.caltech.edu>,
> Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
>> Vincent Lefevre <vincen...@vinc17.net> writes:
>[snipping to refocus the discussion]
>
>> > You're assuming things not said in the standard.
>
>> Yes, of course I do.
> [snip]
>> So does everyone who reads it. One
>> difference is, I try to make my assumptions consciously and
>> explicitly. Have you thought about what your operating
>> assumptions are for determining what the Standard really
>> requires? Can you explain what those assumptions are?
>
> I didn't assume anything. [snip]

How do you reconcile that statement with the one (shown at the
end of the next excerpt) in your earlier posting?

Vincent Lefevre <vincen...@vinc17.net> writes:

> In article <n5oj78-...@jones.homeip.net>,
> lawrenc...@siemens.com wrote:
>
>> Vincent Lefevre <vincen...@vinc17.net> wrote:
>> >
>> > But the standard says nothing about the size of the arrays; and in
>> > particular, it doesn't say that the arrays have at least n characters.
>> > So, I think that an implementation must not read more than necessary
>> > to deduce the result (if this changes the behavior, like giving a
>> > segfault).
>
>> But it doesn't say that it only reads as many characters as are
>> necessary to determine the result, all it says is that it doesn't read
>> more than n characters
>
> No, it doesn't say that. It says: "The strncmp function *compares*
> not more than n characters". "compares", not "reads". One could
> imagine that an implementation could read more than n characters
> (indeed some processors are faster when reading data by block),
> which could be problematic.
>
> So, one should assume that an implementation isn't allowed to do
> more than what the standard requires (except if this doesn't change
> the behavior). [snip]

You start off by saying an assumption should be made, and then
say you aren't assuming anything? Sorta seems like riding a
single horse in two different directions, doesn't it?

In fact, I think there is a key assumption you are making,
namely, that because a result _could_ be well-defined in certain
cases that it must in fact _be_ well-defined in those cases. I
believe that's a bad assumption; the return value of a function
is always conditional, in the sense that it depends on the
function not wandering off into (a permitted) undefined behavior.
(Absent of course explicit statements to the contrary, of which I
expect there are for some library functions.) Consider the
following proposed definition of strncmp():

int
strncmp( const char *s1, const char *s2, size_t n ){
  size_t n1 = 0, n2 = 0;
  do ; while( n1 < n && s1[ n1++ ] );
  do ; while( n2 < n && s2[ n2++ ] );
  return memcmp( s1, s2, n1 < n2 ? n1 : n2 );
}

How does this definition fail to meet the description of
strncmp()? It compares no more than 'n' characters. It
doesn't compare characters after a null character. Because of
how memcmp() works, it returns the right result for its
arguments. Is there any text in the Standard you can point to
that precludes this definition from an otherwise conforming
implementation? On the face of it this definition does seem
to match the description.
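
For anyone who wants to try the definition without colliding with the
library's own strncmp(), a small sketch follows. The names
scan_then_memcmp and sign_of are hypothetical, and the empty do-while
loops are written as plain while loops, which is intended to be
equivalent. Note the assumption that matters: each array must be
readable up to its first null character or up to n characters,
independently of the other array.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only: the proposed definition under a different name.
   It scans each array on its own (up to the first null or n chars)
   before comparing anything, then lets memcmp() do the comparison. */
int scan_then_memcmp(const char *s1, const char *s2, size_t n)
{
    size_t n1 = 0, n2 = 0;

    while (n1 < n && s1[n1++] != '\0') { }   /* length of s1 prefix */
    while (n2 < n && s2[n2++] != '\0') { }   /* length of s2 prefix */

    return memcmp(s1, s2, n1 < n2 ? n1 : n2);
}

/* Reduce a comparison result to -1, 0 or 1, since only the sign of
   the value returned by strncmp()/memcmp() is specified. */
int sign_of(int x)
{
    return (x > 0) - (x < 0);
}
```

On arrays that really do hold n accessible characters, comparing
sign_of(scan_then_memcmp(a, b, n)) with sign_of(strncmp(a, b, n)) should
always agree; the interesting cases are the ones where the arrays are
shorter than n, where this definition reads more than a
stop-at-first-difference loop would.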

Tim Rentsch

Apr 28, 2011, 8:30:45 PM

Andreas Schwab <sch...@redhat.com> writes:

> Is the following program strictly compliant?
>
> #include <string.h>
>
> char s1[10] = "1234567890";
> char s2[10] = "1234567891";
>
> int
> main (void)
> {
> return strncmp (s1, s2, 42) == 0;
> }
>
> Or in the context of the implementation, is strncmp allowed to peek past
> the first differing character upto the size as passed to it, assuming no
> zero byte occurs before it, even if that would cause the program to
> crash if those addresses were inaccessible?
>
> The standard says that strncmp does not peek past the first zero
> character, but, AFAICS, doesn't say that it stops reading at the first
> differing characters. So I would say that the program above is causing
> undefined behavior.

I spent some time reviewing the thread, and consulting the
Standard. I now believe this conclusion is right, and also that
it is solid.

The reason is 7.1.4p1, which has two key passages. The first is
this:

Each of the following statements applies unless explicitly
stated otherwise in the detailed descriptions that follow:

Note: '_unless explicitly stated otherwise_'. Then the second is
one of the covered statements:

If a function argument is described as being an array, the
pointer actually passed to the function shall have a value
such that all address computations and accesses to objects
(that would be valid if the pointer did point to the first
element of such an array) are in fact valid.

Note: '_all_ address computations and accesses to objects'.

The first two arguments to strncmp() are each described individually
as an array (specifically, as a 'possibly null-terminated array'), in
7.21.4.4p3. There is no statement explicitly contravening the
provisions of 7.1.4p1. Said provisions therefore apply; note that
they do not depend on the other argument array. Hence each argument
array must allow valid accesses up to the first null character or up
to 'n' characters if there is no null character in that range,
regardless of where in the other array a null character exists or
where the first mismatch occurs. If that's not the case, 7.1.4p1
makes undefined behavior explicit.
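
Under the reading given above, the condition on each argument can be
written down as a small check. This is only a sketch of that reading,
not something the Standard or any library provides: the function name
prefix_is_accessible and its size parameter (the number of characters
the caller knows to be valid behind the pointer) are assumptions made
for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative only: returns 1 if an array of `size` characters
   satisfies the reading above for a call with limit n, i.e. it either
   contains a null character within its first n characters or it is at
   least n characters long. Returns 0 otherwise. */
int prefix_is_accessible(const char *a, size_t size, size_t n)
{
    size_t limit = n < size ? n : size;

    if (memchr(a, '\0', limit) != NULL)
        return 1;          /* null found inside the valid region */
    return size >= n;      /* no null: need n valid characters */
}
```

For the original example, an array of 10 characters holding "1234567890"
with no terminating null fails this check for n = 42 but passes it for
n = 10, regardless of what the other array contains.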

Phil Carmody

Apr 29, 2011, 3:12:40 PM

Tim Rentsch <t...@alumni.caltech.edu> writes:
> Consider the
> following proposed definition of strncmp():
>
> int
> strncmp( const char *s1, const char *s2, size_t n ){
> size_t n1 = 0, n2 = 0;
> do ; while( n1 < n && s1[ n1++ ] );
> do ; while( n2 < n && s2[ n2++ ] );
> return memcmp( s1, s2, n1 < n2 ? n1 : n2 );
> }
>
> How does this definition fail to meet the description of
> strncmp()? It compares no more than 'n' characters.

I see it comparing up to n characters in each array against 0
before it's even started the memcmp().

Tim Rentsch

Apr 29, 2011, 6:37:19 PM

Phil Carmody <thefatphi...@yahoo.co.uk> writes:

> Tim Rentsch <t...@alumni.caltech.edu> writes:
>> Consider the
>> following proposed definition of strncmp():
>>
>> int
>> strncmp( const char *s1, const char *s2, size_t n ){
>> size_t n1 = 0, n2 = 0;
>> do ; while( n1 < n && s1[ n1++ ] );
>> do ; while( n2 < n && s2[ n2++ ] );
>> return memcmp( s1, s2, n1 < n2 ? n1 : n2 );
>> }
>>
>> How does this definition fail to meet the description of
>> strncmp()? It compares no more than 'n' characters.
>
> I see it comparing up to n characters in each array against 0
> before it's even started the memcmp().

Yes it does. In what way does that conflict
with the Standard's specifications?

Vincent Lefevre

May 4, 2011, 6:47:27 AM

In article <kfnwrie...@x-alumni2.alumni.caltech.edu>,
Tim Rentsch <t...@alumni.caltech.edu> wrote:

> Vincent Lefevre <vincen...@vinc17.net> writes:

> > In article <kfnfwpf...@x-alumni2.alumni.caltech.edu>,
> > Tim Rentsch <t...@alumni.caltech.edu> wrote:
> >
> >> Vincent Lefevre <vincen...@vinc17.net> writes:
> >[snipping to refocus the discussion]
> >
> >> > You're assuming things not said in the standard.
> >
> >> Yes, of course I do.
> > [snip]
> >> So does everyone who reads it. One
> >> difference is, I try to make my assumptions consciously and
> >> explicitly. Have you thought about what your operating
> >> assumptions are for determining what the Standard really
> >> requires? Can you explain what those assumptions are?
> >
> > I didn't assume anything. [snip]

> How do you reconcile that statement with the one (shown at the
> end of the next excerpt) in your earlier posting?

Sorry, by "anything" I meant anything not said in the standard that
would change the specifications.

The assumption I made below is completely obvious, in the sense
that without it, it would not be possible to write a single
portable program.

> Vincent Lefevre <vincen...@vinc17.net> writes:

> > In article <n5oj78-...@jones.homeip.net>,
> > lawrenc...@siemens.com wrote:
> >
> >> Vincent Lefevre <vincen...@vinc17.net> wrote:
> >> >
> >> > But the standard says nothing about the size of the arrays; and in
> >> > particular, it doesn't say that the arrays have at least n characters.
> >> > So, I think that an implementation must not read more than necessary
> >> > to deduce the result (if this changes the behavior, like giving a
> >> > segfault).
> >
> >> But it doesn't say that it only reads as many characters as are
> >> necessary to determine the result, all it says is that it doesn't read
> >> more than n characters
> >
> > No, it doesn't say that. It says: "The strncmp function *compares*
> > not more than n characters". "compares", not "reads". One could
> > imagine that an implementation could read more than n characters
> > (indeed some processors are faster when reading data by block),
> > which could be problematic.
> >
> > So, one should assume that an implementation isn't allowed to do
> > more than what the standard requires (except if this doesn't change
> > the behavior). [snip]

> You start off by saying an assumption should be made, and then
> say you aren't assuming anything? Sorta seems like riding a
> single horse in two different directions, doesn't it?

No, because without this assumption (e.g. if the implementation is
allowed to read any number of characters it wants, even if this can
crash the machine) it wouldn't be possible to use strncmp at all.

> In fact, I think there is a key assumption you are making,
> namely, that because a result _could_ be well-defined in certain
> cases that it must in fact _be_ well-defined in those cases. I
> believe that's a bad assumption; the return value of a function
> is always conditional, in the sense that it depends on the
> function not wandering off into (a permitted) undefined behavior.
> (Absent of course explicit statements to the contrary, of which I
> expect there are for some library functions.) Consider the
> following proposed definition of strncmp():

> int
> strncmp( const char *s1, const char *s2, size_t n ){
> size_t n1 = 0, n2 = 0;
> do ; while( n1 < n && s1[ n1++ ] );
> do ; while( n2 < n && s2[ n2++ ] );
> return memcmp( s1, s2, n1 < n2 ? n1 : n2 );
> }

> How does this definition fail to meet the description of
> strncmp()? It compares no more than 'n' characters. It
> doesn't compare characters after a null character. Because of
> how memcmp() works, it returns the right result for its
> arguments. Is there any text in the Standard you can point to
> that precludes this definition from an otherwise conforming
> implementation? On the face of it this definition does seem
> to match the description.

I disagree because this wouldn't rule out:

int
strncmp( const char *s1, const char *s2, size_t n ){
  size_t n1 = 0, n2 = 0, i;
  for (i = 0; i < 17; i++) s1[i];
  do ; while( n1 < n && s1[ n1++ ] );
  do ; while( n2 < n && s2[ n2++ ] );
  return memcmp( s1, s2, n1 < n2 ? n1 : n2 );
}

Indeed it compares no more than 'n' characters. It doesn't compare
characters after a null character. Because of how memcmp() works,
it returns the right result for its arguments.

However, such a class of implementations is obviously incorrect
(replace 17 by any arbitrarily large constant you want...).
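
For contrast, here is a minimal sketch of the conventional byte-at-a-time
approach, which reads nothing beyond the first differing character, the
first null character, or the n-th character, whichever comes first. The
name strncmp_minimal is hypothetical and the code is not taken from any
particular C library; it only illustrates the kind of implementation for
which the program at the top of the thread would be harmless.

```c
#include <stddef.h>

/* Hypothetical name; an illustration, not any real library's strncmp. */
int strncmp_minimal(const char *s1, const char *s2, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* Compare as unsigned char, as the string functions are specified to. */
        unsigned char c1 = (unsigned char)s1[i];
        unsigned char c2 = (unsigned char)s2[i];

        if (c1 != c2)
            return c1 < c2 ? -1 : 1;   /* stop at the first mismatch */
        if (c1 == '\0')
            return 0;                  /* stop at a shared null character */
    }
    return 0;                          /* first n characters are equal */
}
```

With the two 10-character arrays from the original question and n = 42,
this version reads indices 0 through 9 of each array and nothing more.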

Vincent Lefevre

May 4, 2011, 6:31:11 AM

In article <kfn4o5i...@x-alumni2.alumni.caltech.edu>,
Tim Rentsch <t...@alumni.caltech.edu> wrote:

> Vincent Lefevre <vincen...@vinc17.net> writes:
[...]

> > My point is about what the standard says. Not more.

> I think you must mean something different. There is no
> disagreement about what the Standard says; the text is the same
> for everyone who reads it. The question is about the _meaning_
> of the text, or what meaning we understand the text to convey;
> in other words, What are we allowed to conclude about which
> behaviors are permitted of a conforming implementation?
> Different people reach different understandings; I'm hoping
> to elicit which people and which kinds of understandings
> are intended to be the subject of your statements. If you're
> talking only about your own understanding then there isn't much
> for me to say except "uh huh". If you're talking about how
> or what other person(s) understand some sort of meaning in
> the Standard, then we have more to talk about. Make sense?

There's the meaning. Still, you are not allowed to say gratuitously
e.g. that some object has size n if the standard doesn't say anything
about its size or it cannot be deduced from other properties.

Vincent Lefevre

May 4, 2011, 6:57:45 AM

In article <kfnsjt1...@x-alumni2.alumni.caltech.edu>,
Tim Rentsch <t...@alumni.caltech.edu> wrote:

> Andreas Schwab <sch...@redhat.com> writes:

> > Is the following program strictly compliant?
> >
> > #include <string.h>
> >
> > char s1[10] = "1234567890";
> > char s2[10] = "1234567891";
> >
> > int
> > main (void)
> > {
> > return strncmp (s1, s2, 42) == 0;
> > }
> >
> > Or in the context of the implementation, is strncmp allowed to peek past
> > the first differing character upto the size as passed to it, assuming no
> > zero byte occurs before it, even if that would cause the program to
> > crash if those addresses were inaccessible?
> >
> > The standard says that strncmp does not peek past the first zero
> > character, but, AFAICS, doesn't say that it stops reading at the first
> > differing characters. So I would say that the program above is causing
> > undefined behavior.

> I spent some time reviewing the thread, and consulting the
> Standard. I now believe this conclusion is right, and also that
> it is solid.

I disagree.

> The reason is 7.1.4p1, which has two key passages. The first is
> this:

> Each of the following statements applies unless explicitly
> stated otherwise in the detailed descriptions that follow:

> Note: '_unless explicitly stated otherwise_'. Then the second is
> one of the covered statements:

> If a function argument is described as being an array, the
> pointer actually passed to the function shall have a value
> such that all address computations and accesses to objects
> (that would be valid if the pointer did point to the first
> element of such an array) are in fact valid.

> Note: '_all_ address computations and accesses to objects'.

OK. Here you have two arrays of size 10. If an implementation
does an access after the 10th element, it is the implementation
that is buggy.

The standard doesn't say that the 3rd argument n of strncmp
is the size of the arrays. It is just implied that their sizes
are *at most* n. So...

> The first two arguments to strncmp() are each described individually
> as an array (specifically, as a 'possibly null-terminated array'), in
> 7.21.4.4p3. There is no statement explicitly contravening the
> provisions of 7.1.4p1. Said provisions therefore apply; note that
> they do not depend on the other argument array. Hence each argument
> array must allow valid accesses up to the first null character or up
> to 'n' characters if there is no null character in that range,
> regardless of where in the other array a null character exists or
> where the first mismatch occurs. If that's not the case, 7.1.4p1
> makes undefined behavior explicit.

Your conclusion is incorrect.

Tim Rentsch

May 23, 2011, 6:09:06 PM

Vincent Lefevre <vincen...@vinc17.net> writes:

> In article <kfn4o5i...@x-alumni2.alumni.caltech.edu>,
> Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
>> Vincent Lefevre <vincen...@vinc17.net> writes:
> [...]
>> > My point is about what the standard says. Not more.
>
>> I think you must mean something different. There is no
>> disagreement about what the Standard says; the text is the same
>> for everyone who reads it. The question is about the _meaning_
>> of the text, or what meaning we understand the text to convey;
>> in other words, What are we allowed to conclude about which
>> behaviors are permitted of a conforming implementation?
>> Different people reach different understandings; I'm hoping
>> to elicit which people and which kinds of understandings
>> are intended to be the subject of your statements. If you're
>> talking only about your own understanding then there isn't much
>> for me to say except "uh huh". If you're talking about how
>> or what other person(s) understand some sort of meaning in
>> the Standard, then we have more to talk about. Make sense?
>
> There's the meaning.

There is no "the" meaning. The whole issue is that different
people ascribe different meanings to the same text. It would be
nice if the different reasonings that various people use could be
illuminated, so that they may be discussed and compared. Whose
rules of reasoning do you mean to make comments about?

> Still, you are not allowed to say gratuitously
> e.g. that some object has size n if the standard doesn't say anything
> about its size or it cannot be deduced from other properties.

This phrasing seems rather arrogant, doesn't it? What makes you
the authority on what is allowed or whether some statement is
gratuitous or not?

In any case, the question I am hoping to address is which
deductions are considered valid by different people, and why,
because I believe differences of opinion on that question
are at the core of the arguments in this discussion.

Tim Rentsch

May 23, 2011, 7:02:37 PM

Vincent Lefevre <vincen...@vinc17.net> writes:

> In article <kfnwrie...@x-alumni2.alumni.caltech.edu>,
> Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
>> Vincent Lefevre <vincen...@vinc17.net> writes:
>
>> > In article <kfnfwpf...@x-alumni2.alumni.caltech.edu>,
>> > Tim Rentsch <t...@alumni.caltech.edu> wrote:
>> >
>> >> Vincent Lefevre <vincen...@vinc17.net> writes:
>> >[snipping to refocus the discussion]
>> >
>> >> > You're assuming things not said in the standard.
>> >
>> >> Yes, of course I do.
>> > [snip
>> >> So does everyone who reads it. One
>> >> difference is, I try to make my assumptions consciously and
>> >> explicitly. Have you thought about what your operating
>> >> assumptions are for determining what the Standard really
>> >> requires? Can you explain what those assumptions are?
>> >
>> > I didn't assume anything. [snip]
>
>> How do you reconcile that statement with the one (shown at the
>> end of the next excerpt) in your earlier posting?
>
> Sorry, by "anything" I meant anything not said in the standard
> that would change the specifications.

The problem is that "the specifications" is not well defined.
Different people reach different conclusions about what the
text is meant to specify, or should specify, or does specify.
Any conclusions about what the text of the Standard means
requires interpretation, especially since the Standard is
written in not completely precise language, and what the
Standard "specifies" is one example of such conclusions.

> The assumption I made below is completely obvious, in the sense
> that without it, it would not be possible to write a single
> portable program.

You say that like it's the only assumption that would make the
consequent be true. That's wrong. It's obvious there are other
assumptions, or other sets of assumptions, which would enable the
writing of portable programs. There's no reason to expect that
people should prefer your assumption to _all_ the other possible
assumptions that are just as useful in this particular regard.

Again, there are other assumptions that could be made that would
allow strncmp() to be used in a well-defined way. There is
nothing special about the assumption you propose; it is simply
one of many that would allow conclusions to be drawn about what
section 7.21.4.4 specifies exactly for the semantics of strncmp().

The difference between my example function and your example
function is that your example function doesn't conform to any
sensible interpretation of the phrase "the array" that appears
in 7.21.4.4 paragraph 3, and mine does. Your function fails
the "Is there any text in the Standard..." question with
respect to this paragraph.
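
A compilable restatement of the scan-first definition quoted earlier in
this exchange, renamed so that it does not clash with the library's
strncmp (the name strncmp_scan_first is hypothetical). It is only a
sketch for experimenting. On inputs where each array is readable up to
its first null character or up to n characters, it agrees in sign with a
conventional strncmp, but it reads each array to its own end before
comparing anything. With the two unterminated 10-character arrays from
the original question and n = 42 it would read s1[10], which is exactly
the case under dispute.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name; a restatement of the quoted definition for testing. */
int strncmp_scan_first(const char *s1, const char *s2, size_t n)
{
    size_t n1 = 0, n2 = 0;

    /* Extent of each array: up to and including its first null, capped at n. */
    while (n1 < n && s1[n1++] != '\0')
        ;
    while (n2 < n && s2[n2++] != '\0')
        ;

    /* Compare only the shorter of the two extents. */
    return memcmp(s1, s2, n1 < n2 ? n1 : n2);
}
```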

Tim Rentsch

May 24, 2011, 2:26:00 AM

Vincent Lefevre <vincen...@vinc17.net> writes:

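> OK. Here you have two arrays of size 10. If an implementation
> does an access after the 10th element, it is the implementation
> that is buggy.

The Standard doesn't say that. You must be making some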
assumption to reach that conclusion.

> The standard doesn't say that the 3rd argument n of strncmp
> is the size of the arrays.

That's true, the Standard contains no such text. Nor does it
contain text that says characters after the first mismatch are
not accessed; nor does it contain text that says characters
after a corresponding position in the other array that contains a
null character are not accessed. That is key: although 7.21.4.4
talks about arrays, it does not identify in unambiguous language
just which arrays are being referred to.

> It is just implied that their sizes
> are *at most* n. So...

I agree that the text of 7.21.4.4p2 does imply that their sizes
are at most n, but there is additional text in 7.21.4.4p3, also
relevant, which says:

The strncmp function returns an integer greater than, equal to,
or less than zero accordingly as the possibly null-terminated
array pointed to by s1 is greater than, equal to, or less than
the possibly null-terminated array pointed to by s2.

Suppose we have an argument array pointed to by s1 that does not
contain a null in its first n characters. What array is meant by
the phrase "the possibly null-terminated array pointed to by s1"?
There are three plausible meanings:

1. The first n characters of the array starting at *s1;

2. An initial portion of the array starting at *s1, up
to and just including the point where the corresponding
position in the array starting at *s2 contains a null
character (assuming such exists before the n'th position);
or,

3. An initial portion of the array starting at *s1, up
to and just including the point where the corresponding
position in the array starting at *s2 contains its
first mismatching character (assuming such exists before
the n'th position) relative to the array starting at *s1.

I expect everyone would agree that one of these possibilities is
what the authors intended. Which one is meant is important
because of the provision in 7.1.4p1. More specifically, the
stipulation in 7.1.4p1 provides undefined behavior if the
actual array argument doesn't "measure up to" the array
parameter described in 7.21.4.4p3.

I believe case (1) best fits the actual text of 7.21.4.4p3. The
reason is, cases (2) and (3) both depend on the contents of *s2
for their meaning, but the actual phrase used - "the possibly
null-terminated array pointed to by s1" - doesn't mention s2. If
this phrase were meant to be defined relative to the array
starting at *s2, it's natural to expect that s2 would be mentioned
within it; since that isn't so, possibility (1) is the only
plausible alternative that readily corresponds to the actual
phrase used.
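
To make the three readings concrete, here is a minimal sketch that
computes how many leading characters of s1 "the array pointed to by s1"
would span under each one. The enum and the function s1_extent are
hypothetical names introduced only for illustration; nothing like them
appears in the Standard. The sketch assumes the situation posed above,
namely that s1 contains no null character in its first n characters, and
that every character it inspects is readable.

```c
#include <stddef.h>

/* The three candidate readings, numbered as in the list above. */
enum reading { FIRST_N = 1, UP_TO_S2_NULL = 2, UP_TO_MISMATCH = 3 };

/* Number of leading characters of s1 covered by "the array" under reading r.
   Assumes s1 has no null character among its first n characters. */
size_t s1_extent(const char *s1, const char *s2, size_t n, enum reading r)
{
    size_t i;

    switch (r) {
    case FIRST_N:                 /* (1): depends on n alone */
        return n;
    case UP_TO_S2_NULL:           /* (2): depends on where s2 has a null */
        for (i = 0; i < n; i++)
            if (s2[i] == '\0')
                return i + 1;
        return n;
    case UP_TO_MISMATCH:          /* (3): depends on the first mismatch */
        for (i = 0; i < n; i++)
            if (s1[i] != s2[i])
                return i + 1;
        return n;
    }
    return n;
}
```

For s1 = "abcdef", s2 = "axc" and n = 5, the three readings give 5, 4 and
2 characters respectively. Only reading (1) can be evaluated without
looking at s2, which is the property the argument above relies on.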

>> The first two arguments to strncmp() are each described individually
>> as an array (specifically, as a 'possibly null-terminated array'), in
>> 7.21.4.4p3. There is no statement explicitly contravening the
>> provisions of 7.1.4p1. Said provisions therefore apply; note that
>> they do not depend on the other argument array. Hence each argument
>> array must allow valid accesses up to the first null character or up
>> to 'n' characters if there is no null character in that range,
>> regardless of where in the other array a null character exists or
>> where the first mismatch occurs. If that's not the case, 7.1.4p1
>> makes undefined behavior explicit.
>
> Your conclusion is incorrect.

I understand that is your opinion; I just don't see any
reason to give that opinion much weight. ISTM your
argument rests entirely on an assumption that is offered
as being self-evident but isn't very compelling. Do you
have any more substantial reasoning to offer than simply
repeating your position?

Vincent Lefevre

May 27, 2011, 10:54:21 AM

In article <kfn39k4...@x-alumni2.alumni.caltech.edu>,
Tim Rentsch <t...@alumni.caltech.edu> wrote:

> Vincent Lefevre <vincen...@vinc17.net> writes:

> > OK. Here you have two arrays of size 10. If an implementation
> > does an access after the 10th element, it is the implementation
> > that is buggy.

> The Standard doesn't say that. You must be making some
> assumption to reach that conclusion.

Well, of course, an implementation is allowed to do anything
it wants as long as it doesn't change the behavior, but what
I meant is that if one has:

char a[10];

and the implementation places the array so that the address
a+10 is not readable, and if a function provided by the
implementation accesses a+10 while working on this array
(assuming the function knows that the size may be <= 10),
then the implementation is buggy.

But I think the only point on which we disagree is what are
the sizes of the arrays in strncmp (see below).

> > The standard doesn't say that the 3rd argument n of strncmp
> > is the size of the arrays.

> That's true, the Standard contains no such text. Nor does it
> contain text that says characters after the first mismatch are
> not accessed; nor does it contain text that says characters
> after a corresponding position in the other array that contains a
> null character are not accessed. That is key: although 7.21.4.4
> talks about arrays, it does not identify in unambiguous language
> just which arrays are being referred to.

As I said: one needs to assume that the implementation isn't
allowed to do more than necessary (if this can change the
visible behavior). Otherwise something like

int main(void)
{
return 0;
}

would no longer be guaranteed to work. Indeed the standard doesn't
say that when a program starts, the address 0 (which is protected
on various platforms) is not accessed.

> > It is just implied that their sizes
> > are *at most* n. So...

> I agree that the text of 7.21.4.4p2 does imply that their sizes
> are at most n, but there is additional text in 7.21.4.4p3, also
> relevant, which says:

> The strncmp function returns an integer greater than, equal to,
> or less than zero accordingly as the possibly null-terminated
> array pointed to by s1 is greater than, equal to, or less than
> the possibly null-terminated array pointed to by s2.

> Suppose we have an argument array pointed to by s1 that does not
> contain a null in its first n characters. What array is meant by
> the phrase "the possibly null-terminated array pointed to by s1"?
> There are three plausible meanings:

> 1. The first n characters of the array starting at *s1;

> 2. An initial portion of the array starting at *s1, up
> to and just including the point where the corresponding
> position in the array starting at *s2 contains a null
> character (assuming such exists before the n'th position);
> or,

> 3. An initial portion of the array starting at *s1, up
> to and just including the point where the corresponding
> position in the array starting at *s2 contains its
> first mismatching character (assuming such exists before
> the n'th position) relative to the array starting at *s1.

There's a 4th one:

4. The array starting at *s1.

By (4), I mean that I don't try to regard this sentence as an attempt
to define the size of the array. IMHO, the "Returns" clause would be
a bad place to define the meaning of the arguments and constraints on
them anyway.

I see the "possibly null-terminated" as a recall of the fact that
"abc\0d" and "\abc\0e" with n = 5 are equal (because for a function
working on arrays, not strings, it may be a bit strange that the
null character plays a particular role).

> I expect everyone would agree that one of these possibilities is
> what the authors intended. Which one is meant is important
> because of the provision in 7.1.4p1. More specifically, the
> stipulation in 7.1.4p1 provides undefined behavior if the
> actual array argument doesn't "measure up to" the array
> parameter described in 7.21.4.4p3.

I think that if the author meant something more than (4), it would
be a defect in the standard.

> I believe case (1) best fits the actual text of 7.21.4.4p3. The
> reason is, cases (2) and (3) both depend on the contents of *s2
> for their meaning, but the actual phrase used - "the possibly
> null-terminated array pointed to by s1" - doesn't mention s2. If
> this phrase were meant to be defined relative to the array
> starting at *s2, it's natural to expect that s2 would be mentioned
> within it; since that isn't so, possibility (1) is the only
> plausible alternative that readily corresponds to the actual
> phrase used.

And what about (4)?

Vincent Lefevre

May 27, 2011, 11:07:12 AM

In article <kfn39k4...@x-alumni2.alumni.caltech.edu>,
Tim Rentsch <t...@alumni.caltech.edu> wrote:

> Vincent Lefevre <vincen...@vinc17.net> writes:

> > OK. Here you have two arrays of size 10. If an implementation
> > does an access after the 10th element, it is the implementation
> > that is buggy.

> The Standard doesn't say that. You must be making some
> assumption to reach that conclusion.

Well, of course, an implementation is allowed to do anything
it wants as long as it doesn't change the behavior, but what
I meant is that if one has:

char a[10];

and the implementation places the array so that the address
a+10 is not readable, and if a function provided by the
implementation accesses a+10 while working on this array
(assuming the function knows that the size may be <= 10),
then the implementation is buggy.

But I think the only point on which we disagree is what are
the sizes of the arrays in strncmp (see below).

> > The standard doesn't say that the 3rd argument n of strncmp
> > is the size of the arrays.

> That's true, the Standard contains no such text. Nor does it
> contain text that says characters after the first mismatch are
> not accessed; nor does it contain text that says characters
> after a corresponding position in the other array that contains a
> null character are not accessed. That is key: although 7.21.4.4
> talks about arrays, it does not identify in unambiguous language
> just which arrays are being referred to.

As I said: one needs to assume that the implementation isn't
allowed to do more than necessary (if this can change the
visible behavior). Otherwise something like

int main(void)
{
return 0;
}

would no longer be guaranteed to work. Indeed the standard doesn't
say that when a program starts, the address 0 (which is protected
on various platforms) is not accessed.

> > It is just implied that their sizes
> > are *at most* n. So...

> I agree that the text of 7.21.4.4p2 does imply that their sizes
> are at most n, but there is additional text in 7.21.4.4p3, also
> relevant, which says:

> The strncmp function returns an integer greater than, equal to,
> or less than zero accordingly as the possibly null-terminated
> array pointed to by s1 is greater than, equal to, or less than
> the possibly null-terminated array pointed to by s2.

> Suppose we have an argument array pointed to by s1 that does not
> contain a null in its first n characters. What array is meant by
> the phrase "the possibly null-terminated array pointed to by s1"?
> There are three plausible meanings:

> 1. The first n characters of the array starting at *s1;

> 2. An initial portion of the array starting at *s1, up
> to and just including the point where the corresponding
> position in the array starting at *s2 contains a null
> character (assuming such exists before the n'th position);
> or,

> 3. An initial portion of the array starting at *s1, up
> to and just including the point where the corresponding
> position in the array starting at *s2 contains its
> first mismatching character (assuming such exists before
> the n'th position) relative to the array starting at *s1.

There's a 4th one:

4. The array starting at *s1.

By (4), I mean that I don't try to regard this sentence as an attempt
to define the size of the array. IMHO, the "Returns" clause would be
a bad place to define the meaning of the arguments and constraints on
them anyway.

I see the "possibly null-terminated" as a recall of the fact that
"abc\0d" and "abc\0e" with n = 5 are equal (because for a function
working on arrays, not strings, it may be a bit strange that the
null character plays a particular role).
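
The point about the null character can be checked with a short sketch.
The helper names same_as_strings and same_as_bytes are hypothetical,
introduced only for this illustration; the calls to strncmp and memcmp
are the standard ones. Both literals occupy six bytes, so n = 5 stays
within bounds.

```c
#include <stddef.h>
#include <string.h>

/* Nonzero when strncmp treats the first n characters as equal;
   comparison stops at a shared null character. */
int same_as_strings(const char *a, const char *b, size_t n)
{
    return strncmp(a, b, n) == 0;
}

/* Nonzero when all n bytes are equal; null bytes get no special treatment. */
int same_as_bytes(const char *a, const char *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}
```

For "abc\0d" and "abc\0e" with n = 5, same_as_strings reports equality
because the comparison ends at the null in position 3, while
same_as_bytes reports a difference at position 4.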

> I expect everyone would agree that one of these possibilities is
> what the authors intended. Which one is meant is important
> because of the provision in 7.1.4p1. More specifically, the
> stipulation in 7.1.4p1 provides undefined behavior if the
> actual array argument doesn't "measure up to" the array
> parameter described in 7.21.4.4p3.

I think that if the author meant something more than (4), it would
be a defect in the standard.

> I believe case (1) best fits the actual text of 7.21.4.4p3. The
> reason is, cases (2) and (3) both depend on the contents of *s2
> for their meaning, but the actual phrase used - "the possibly
> null-terminated array pointed to by s1" - doesn't mention s2. If
> this phrase were meant to be defined relative to the array
> starting at *s2, it's natural to expect that s2 would be mentioned
> within it; since that isn't so, possibility (1) is the only
> plausible alternative that readily corresponds to the actual
> phrase used.

And what about (4)?

Tim Rentsch

Jun 1, 2011, 1:29:31 AM

Vincent Lefevre <vincen...@vinc17.net> writes:

> In article <kfn39k4...@x-alumni2.alumni.caltech.edu>,
> Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
>> Vincent Lefevre <vincen...@vinc17.net> writes:

>---> [tim says: snipped context restored]
>--->
>---> > In article <kfnsjt1...@x-alumni2.alumni.caltech.edu>,
>---> > Tim Rentsch <t...@alumni.caltech.edu> wrote:
>---> >
>---> >> Andreas Schwab <sch...@redhat.com> writes:
>---> >
>---> >> > Is the following program strictly compliant?
>---> >> >
>---> >> > #include <string.h>
>---> >> >
>---> >> > char s1[10] = "1234567890";
>---> >> > char s2[10] = "1234567891";
>---> >> >
>---> >> > int
>---> >> > main (void)
>---> >> > {
>---> >> > return strncmp (s1, s2, 42) == 0;
>---> >> > }
>---> >> >

>> > OK. Here you have two arrays of size 10. If an implementation
>> > does an access after the 10th element, it is the implementation
>> > that is buggy.
>
>> The Standard doesn't say that. You must be making some
>> assumption to reach that conclusion.
>
> Well, of course, an implementation is allowed to do anything
> it wants as long as it doesn't change the behavior, but what
> I meant is that if one has:
>
> char a[10];
>
> and the implementation places the array so that the address
> a+10 is not readable, and if a function provided by the
> implementation accesses a+10 while working on this array
> (assuming the function knows that the size may be <= 10),
   ^^^^^^^^
> then the implementation is buggy.

The ^'ed text sort of proves my point about making an assumption,
doesn't it?

I was surprised to see that you snipped some text that provided
some necessary context for my comments, and more surprised that
you didn't bother to note the snippage. Did you not understand
that the snipped context was relevant? Or did you intentionally
present my comments out of context?

> But I think the only point on which we disagree is what are
> the sizes of the arrays in strncmp (see below).
>
>> > The standard doesn't say that the 3rd argument n of strncmp
>> > is the size of the arrays.
>
>> That's true, the Standard contains no such text. Nor does it
>> contain text that says characters after the first mismatch are
>> not accessed; nor does it contain text that says characters
>> after a corresponding position in the other array that contains a
>> null character are not accessed. That is key: although 7.21.4.4
>> talks about arrays, it does not identify in unambiguous language
>> just which arrays are being referred to.
>
> As I said: one needs to assume that the implementation isn't
> allowed to do more than necessary (if this can change the
> visible behavior).

That assertion is simply false. There are in fact a countable
infinity of different assumptions one might make, all consistent
with the text written in the Standard, and all allowing inferences
to be drawn about which programs are guaranteed to work (with both
the guaranteed set and its complement being non-trivial), that
have different consequences than those you suggest. I
understand that you would like to make _this_ assumption, but it
is not a necessary assumption, nor even a very well-defined one
since what is "necessary" is itself subject to debate.

> Otherwise something like
>
> int main(void)
> {
> return 0;
> }
>
> would no longer be guaranteed to work.

That depends on what assumptions one believes should be made
to interpret text in the Standard.

> Indeed the standard doesn't
> say that when a program starts, the address 0 (which is protected
> on various platforms) is not accessed.

That's true. However, it's obviously possible to reach that
conclusion without making the allegedly necessary assumption,
since many people do reach that conclusion without making said
assumption.

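> There's a 4th one:
>
> 4. The array starting at *s1.
>
> By (4), I mean that I don't try to regard this sentence as an attempt
> to define the size of the array. IMHO, the "Returns" clause would be
> a bad place to define the meaning of the arguments and constraints on
> them anyway.

Apparently you're hearing things that I'm not saying. In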
particular I did not say that this sentence "defines" the size of
the array. My question is, What array is _meant_ by the phrase in
question, regardless of how or where that array is specified?
Obviously _some_ array is meant, otherwise "the <adjective> array"
would have been written differently. So which array is it?

> I see the "possibly null-terminated" as a recall of the fact that
> "abc\0d" and "\abc\0e" with n = 5 are equal (because for a function
> working on arrays, not strings, it may be a bit strange that the
> null character plays a particular role).

Presumably you meant the '\a' in the second string to be just 'a',
not '\a'. Under that presumption, this case isn't helpful because
the different possibilities all boil down to the same array (plus
it doesn't match the conditions of the question -- that *s1 not
contain a null in its first n characters).

>> I expect everyone would agree that one of these possibilities is
>> what the authors intended. Which one is meant is important
>> because of the provision in 7.1.4p1. More specifically, the
>> stipulation in 7.1.4p1 provides undefined behavior if the
>> actual array argument doesn't "measure up to" the array
>> parameter described in 7.21.4.4p3.
>
> I think that if the author meant something more than (4), it would
> be a defect in the standard.

On the contrary, if the author meant no particular array (but just
some unspecified array starting at *s1), he wouldn't have used the
word "the" in front of "possibly null-terminated array".

>> I believe case (1) best fits the actual text of 7.21.4.4p3. The
>> reason is, cases (2) and (3) both depend on the contents of *s2
>> for their meaning, but the actual phrase used - "the possibly
>> null-terminated array pointed to by s1" - doesn't mention s2. If
>> this phrase were meant to be defined relative to the array
>> starting at *s2, it's natural to expect that s2 would be mentioned
>> within it; since that isn't so, possibility (1) is the only
>> plausible alternative that readily corresponds to the actual
>> phrase used.
>
> And what about (4)?

The question is not whether this sentence dictates which array is
specified, but which array is being referred to by the phrase,
regardless of where or how said array is specified. (4) can't be
the answer because it doesn't address the question.

Tim Rentsch

Jun 1, 2011, 1:32:09 AM

Vincent Lefevre <vincen...@vinc17.net> writes:

[snip]

> I see the "possibly null-terminated" as a recall of the fact that
> "abc\0d" and "abc\0e" with n = 5 are equal [snip]

I see this almost-duplicate makes the correction that I
presumed in my other response. (I didn't check if there
were other differences.)