Vincent Lefevre <vincent-n...@vinc17.net> writes: > In article <kfnwriuce6t....@x-alumni2.alumni.caltech.edu>, > Tim Rentsch <t...@alumni.caltech.edu> wrote:
>> Vincent Lefevre <vincent-n...@vinc17.net> writes:
>> > No, it doesn't say that. It says: "The strncmp function *compares* >> > not more than n characters". "compares", not "reads". [snip]
>> That seems like a silly distinction. Surely we can infer >> that to compare a value in X and a value in Y we must >> first read the value in X, and read the value in Y.
> The distinction is important for characters that are *not* compared.
Your point is getting lost here. For what Larry was saying (it was included in my posting but I guess you snipped it), it doesn't matter whether "reads" or "compares" is used. Perhaps it is true that the distinction matters in regards to _other_ matters, but in the context of Larry's statement there doesn't seem to be any significant difference. Do you think otherwise? Then please explain what it is.
>> > So, one should assume that an implementation isn't allowed to do >> > more than what the standard requires (except if this doesn't change >> > the behavior). Otherwise more or less any function would have >> > undefined behavior.
>> Forgive me for being harsh, but I think the reasoning >> here is bogus. The same reasoning applied to strcmp >> would imply that it's okay to call strcmp on strings >> that aren't null terminated but differ in some earlier >> (legal) position. [...]
> No, the same reasoning cannot be applied to strcmp, because the > standard (7.21.4.2#2 in TC3) explicitly says: "The strcmp function > compares the *string* pointed to by s1 to the *string* pointed to > by s2."
> It says "string", not "array", and ditto for 7.21.4.2#3. So, the > strcmp arguments must be strings, thus null-terminated.
Okay, now you're cheating. You took a fuzzy argument, and let it work one way in one case, and another way in the other case. You can't have it both ways.
>> >> and then goes on to say, "Where an argument declared as size_t n >> >> specifies the length of the array...", which would seem to apply in >> >> this case.
>> > The description of strncmp does not say that n specifies the length >> > of the array. And it's quite clear that it does *not* specify the >> > length of the array,
>> 7.21.4.4p3 says in part:
>> accordingly as the possibly null-terminated array pointed to >> by s1 is greater than, equal to, or less than the possibly >> null-terminated array pointed to by s2.
>> Although it would benefit from being made more explicit, this >> phrasing shows pretty clearly that strncmp() considers its arguments >> to be, independently, possibly null-terminated arrays. And if >> the arguments are indeed independent arrays, then n must specify >> their lengths (if not null-terminated).
> The standard doesn't say that.
I believe I clearly differentiated the text of the standard and what I infer to be the intended reading of that text. If you want to disagree with my inference, that's fine, but please don't put words in my mouth.
>> > as I wouldn't imagine that
>> > strncmp (s, "foo", 100);
>> > would be disallowed (the length of the array "foo" is 4, certainly >> > not 100).
>> The 100 is meant as the length only if the array is not >> null-terminated.
> The standard doesn't make a difference concerning their length > depending on whether the arrays are null-terminated or not.
I think you meant the standard doesn't draw a distinction concerning their length, etc. In any case, whether that inference is true or not depends on how the stated requirements are read (interpreted) by whoever is doing the reading. Obviously different people read these particular requirements in several different ways, so for some of them this conclusion might hold whereas for others it wouldn't.
>> Considering the phrasing in 7.21.4.4p3, I think the only >> sensible conclusion is that the intended semantics allows >> up to 'n' characters to be read (but not past the first >> null character) independently in either array, regardless >> of the function's return value. Of course I would agree >> that the existing wording could be improved and made more >> explicit, but is there really any doubt here about what >> reading is expected?
> I don't see how you come to this conclusion.
Yes, I can believe that you don't.
> You're assuming things not said in the standard.
Yes, of course I do. So does everyone who reads it. One difference is, I try to make my assumptions consciously and explicitly. Have you thought about what your operating assumptions are for determining what the Standard really requires? Can you explain what those assumptions are?
If we can bring the discussion up a level, I think we may be working at cross purposes here. Are you trying to argue how the Standard _must_ be understood on this point? How it _ought_ to be understood on this point? How the committee _expects_ it will be understood? How the committee believes it _should_ be understood? What the Standard _ought_ to require, regardless of what it does require? How the Standard will be understood here, _given certain assumptions_? Which of these applies to your comments? Or, if none of them do, how would you express what it is you are hoping to accomplish?
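A minimal C sketch of the per-argument reading argued earlier in this post: each array may be read up to n characters, but never past its own first null character, regardless of where the two arrays differ. The helper name max_chars_readable is hypothetical. It only illustrates that interpretation, and it is itself safe to call only when the argument already satisfies the bound it describes.

```c
#include <stddef.h>

/* Hypothetical helper illustrating the per-argument reading (an
   interpretation, not normative text): how many characters of ONE
   strncmp argument an implementation could read, namely up to n,
   but not past that argument's first null character. It reads s
   itself, so it is only safe when s already satisfies that bound. */
size_t max_chars_readable(const char *s, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (s[i] == '\0')
            return i + 1;   /* the null is read; nothing after it is */
    }
    return n;               /* no null among the first n characters */
}
```

Under this reading, strncmp (s, "foo", 100) may touch at most 4 characters of "foo", while an array with no null character passed with n = 42 would have to hold all 42.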
Vincent Lefevre <vincent-n...@vinc17.net> writes: > In article <ln4o5x6ljl....@nuthaus.mib.org>, > Keith Thompson <ks...@mib.org> wrote: [snip] >> Of course an implementation can read anything it likes if doing so >> has no effect.
>> The real question is: may an implementation of strncmp read extra bytes >> *when that could have visible effects*?
>> int
>> main (void)
>> {
>>   return strncmp (s1, s2, 42) == 0;
>> }
>> The standard's description of strncmp is:
>> The strncmp function compares not more than n characters >> (characters that follow a null character are not compared) >> from the array pointed to by s1 to the array pointed to by s2.
>> The strncmp function returns an integer greater than, equal to, >> or less than zero, accordingly as the possibly null-terminated >> array pointed to by s1 is greater than, equal to, or less >> than the possibly null-terminated array pointed to by s2.
>> Neither array contains a string (they're not '\0'-terminated), and >> strncmp permission to read as many as 42 bytes from each array.
> But I don't think the standard gives you this permission. > Certainly not explicitly.
Ok. So exactly what permission does it give? I don't think there's a very clear answer to that in the wording of the standard.
>> If I write >> s1[10] == s2[10] >> my program's behavior is undefined; does writing >> strncmp (s1, s2, 42) >> potentially cause the same problem?
>> I know what the answer *should* be. strncmp shouldn't read past the >> bounds of the array, because it doesn't need to. But I'm not quite >> convinced that the standard explicitly forbids it to do so.
> I'd say that it doesn't explicitly allow it to do this. And as this is > not necessary, the implementation shouldn't assume that it may read > past differing characters.
I agree that it *shouldn't*, but I don't see that the standard actually says that it *may not*.
> BTW I think that using something like
> strncmp (s1, s2, SIZE_MAX)
> to compare '\n'-terminated character sequences[*] that are known to > be different (by the context) would be quite convenient. Note that > one cannot use strcmp here because strcmp works on strings, and the > character sequences here are not null-terminated.
> [*] I don't say "strings" just to avoid the ambiguity with C, but > in the manual of some application that would work on such data, they > could be described as "strings".
I think memcmp() would make more sense.
Note that for this to be useful, you'd have to know that the arrays differ somewhere within their actual sizes, but not know (or not be able to compute easily) what those sizes are. Otherwise you could just pass the size of the arrays (or the lesser of their sizes if they differ) as the third argument.
-- Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst> Nokia "We must do something. This is something. Therefore, we must do this." -- Antony Jay and Jonathan Lynn, "Yes Minister"
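When both sizes are known, the suggestion of passing the lesser size can be sketched as follows. The helper compare_sized is hypothetical: memcmp is given only the smaller size, because memcmp compares the first n characters of both objects and so needs both to have at least n, and equal prefixes are then ordered by length, as with strings.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compare two character sequences whose sizes are
   known, without relying on any terminator. */
int compare_sized(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;   /* never exceed either object */
    int r = memcmp(a, b, n);
    if (r != 0)
        return r;                           /* first differing character decides */
    return (alen > blen) - (alen < blen);   /* shorter common prefix sorts first */
}
```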
> Vincent Lefevre <vincent-n...@vinc17.net> writes: > > In article <kfnwriuce6t....@x-alumni2.alumni.caltech.edu>, > > Tim Rentsch <t...@alumni.caltech.edu> wrote:
> >> Vincent Lefevre <vincent-n...@vinc17.net> writes:
> >> > No, it doesn't say that. It says: "The strncmp function *compares* > >> > not more than n characters". "compares", not "reads". [snip]
> >> That seems like a silly distinction. Surely we can infer > >> that to compare a value in X and a value in Y we must > >> first read the value in X, and read the value in Y.
> > The distinction is important for characters that are *not* compared. > Your point is getting lost here. For what Larry was saying (it > was included in my posting but I guess you snipped it), it > doesn't matter whether "reads" or "compares" is used.
Larry said:
| But it doesn't say that it only reads as many characters as are | necessary to determine the result, all it says is that it doesn't read | more than n characters
and this is incorrect. The standard uses the word "compare", not "read" (which would not make sense). But...
> Perhaps it is true that the distinction matters in regards to > _other_ matters, but in the context of Larry's statement there > doesn't seem to be any significant difference. Do you think > otherwise? Then please explain what it is.
If you say that what Larry meant (by changing "read" to "compare" in his statement) is: "the standard doesn't say that strncmp only compares as many characters as are necessary to determine the result, all it says is that it doesn't compare more than n characters".
That's true, but anyway the standard doesn't say that the implementation is allowed to compare n characters for every input (even for inputs that are not null-terminated).
> >> > So, one should assume that an implementation isn't allowed to do > >> > more than what the standard requires (except if this doesn't change > >> > the behavior). Otherwise more or less any function would have > >> > undefined behavior.
> >> Forgive me for being harsh, but I think the reasoning > >> here is bogus. The same reasoning applied to strcmp > >> would imply that it's okay to call strcmp on strings > >> that aren't null terminated but differ in some earlier > >> (legal) position. [...]
> > No, the same reasoning cannot be applied to strcmp, because the > > standard (7.21.4.2#2 in TC3) explicitly says: "The strcmp function > > compares the *string* pointed to by s1 to the *string* pointed to > > by s2."
> > It says "string", not "array", and ditto for 7.21.4.2#3. So, the > > strcmp arguments must be strings, thus null-terminated. > Okay, now you're cheating. You took a fuzzy argument, and let it > work one way in one case, and another way in the other case. You > can't have it both ways.
I don't understand what you mean here. For strcmp, the standard says "string", and for strncmp, the standard says "array". That's an important difference.
If the standard had said "string" for strncmp, then I would agree that reading n (non-null) characters would have been correct even if there were a difference earlier, because a string would guarantee that the first n characters (if non-null) are in readable memory.
> >> >> and then goes on to say, "Where an argument declared as size_t n > >> >> specifies the length of the array...", which would seem to apply in > >> >> this case.
> >> > The description of strncmp does not say that n specifies the length > >> > of the array. And it's quite clear that it does *not* specify the > >> > length of the array,
> >> 7.21.4.4p3 says in part:
> >> accordingly as the possibly null-terminated array pointed to > >> by s1 is greater than, equal to, or less than the possibly > >> null-terminated array pointed to by s2.
> >> Although it would benefit from being made more explicit, this > >> phrasing shows pretty clearly that strncmp() considers its arguments > >> to be, independently, possibly null-terminated arrays. And if > >> the arguments are indeed independent arrays, then n must specify > >> their lengths (if not null-terminated).
> > The standard doesn't say that. > I believe I clearly differentiated the text of the standard > and what I infer to be the intended reading of that text. If > you want to disagree with my inference, that's fine, but please > don't put words in my mouth.
Well I meant that your inference is incorrect because it is not based on what the standard really says.
> >> > would be disallowed (the length of the array "foo" is 4, certainly > >> > not 100).
> >> The 100 is meant as the length only if the array is not > >> null-terminated.
> > The standard doesn't make a difference concerning their length > > depending on whether the arrays are null-terminated or not. > I think you meant the standard doesn't draw a distinction > concerning their length, etc. In any case, whether that > inference is true or not depends on how the stated requirements > are read (interpreted) by whoever is doing the reading. > Obviously different people read these particular requirements > in several different ways, so for some of them this conclusion > might hold whereas for others it wouldn't.
only by assuming things not stated by the standard.
> >> Considering the phrasing in 7.21.4.4p3, I think the only > >> sensible conclusion is that the intended semantics allows > >> up to 'n' characters to be read (but not past the first > >> null character) independently in either array, regardless > >> of the function's return value. Of course I would agree > >> that the existing wording could be improved and made more > >> explicit, but is there really any doubt here about what > >> reading is expected?
> > I don't see how you come to this conclusion. > Yes, I can believe that you don't. > > You're assuming things not said in the standard. > Yes, of course I do.
What one(s) here? If this is just 7.21.4.4p3, could you explain? e.g. what you said in <kfnwriuce6t....@x-alumni2.alumni.caltech.edu> "And if the arguments are indeed independent arrays, then n must specify their lengths (if not null-terminated)."
Why "n must specify their lengths"? Why can't the arrays have a smaller length (in cases where this is sufficient to deduce the result)? And why would a null character take precedence over n *concerning the length of the array*?
> So does everyone who reads it. One > difference is, I try to make my assumptions consciously and > explicitly. Have you thought about what your operating > assumptions are for determining what the Standard really > requires? Can you explain what those assumptions are?
I didn't assume anything. For instance, the standard doesn't say anything about the lengths of the arrays, so that one knows nothing particular about them (just that it is necessary to satisfy the semantics of strncmp).
> If we can bring the discussion up a level, I think we may be > working at cross purposes here. Are you trying to argue how > the Standard _must_ be understood on this point? How it > _ought_ to be understood on this point? How the committee > _expects_ it will be understood? How the committee believes > it _should_ be understood? What the Standard _ought_ to > require, regardless of what it does require? How the Standard > will be understood here, _given certain assumptions_? Which > of these applies to your comments? Or, if none of them do, > how would you express what it is you are hoping to accomplish?
Ideally the answer should be the same for any of them (without additional assumptions). Currently I disagree with your reasoning, but perhaps it is incomplete (see the above questions).
> Vincent Lefevre <vincent-n...@vinc17.net> writes: > > In article <ln4o5x6ljl....@nuthaus.mib.org>, > > Keith Thompson <ks...@mib.org> wrote: > [snip] > >> Of course an implementation can read anything it likes if doing so > >> has no effect.
> >> The real question is: may an implementation of strncmp read extra bytes > >> *when that could have visible effects*?
> >> The strncmp function compares not more than n characters > >> (characters that follow a null character are not compared) > >> from the array pointed to by s1 to the array pointed to by s2.
> >> The strncmp function returns an integer greater than, equal to, > >> or less than zero, accordingly as the possibly null-terminated > >> array pointed to by s1 is greater than, equal to, or less > >> than the possibly null-terminated array pointed to by s2.
> >> Neither array contains a string (they're not '\0'-terminated), and > >> strncmp permission to read as many as 42 bytes from each array.
> > But I don't think the standard gives you this permission. > > Certainly not explicitly. > Ok. So exactly what permission does it give? I don't think there's > a very clear answer to that in the wording of the standard.
I'd say: no particular permissions, i.e. just what is implied to satisfy the semantics (I mean that if two characters need to be compared to satisfy the semantics, then there is an implicit permission to read them).
Now, one may wonder how we can know whether two characters need to be compared or not. This is quite easy. The standard doesn't say in which order the characters are compared (potentially in parallel), but if k denotes the position of the first differing characters or null characters (k = n if there are none), then the result depends on the values of all the characters up to this position k, i.e. it is necessary to read and compare all these characters (up to allowed optimizations, as usual). And conversely, the result can be deduced from the values of these characters only.
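The reading above can be sketched as a reference strncmp that reads exactly positions 0 through k of each array, where k is the first position holding a differing pair or a null character (or n if there is none). The name strncmp_minimal is hypothetical; this illustrates that reading and is not how any particular library implements strncmp. Characters are compared as unsigned char, since 7.21.4p1 determines the sign of the result from the first differing pair interpreted as unsigned char.

```c
#include <stddef.h>

/* Hypothetical minimal-read strncmp: touches s1[i] and s2[i] only for
   i = 0..k, where k is the first position with a difference or a null. */
int strncmp_minimal(const char *s1, const char *s2, size_t n)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    for (size_t i = 0; i < n; i++) {
        if (p1[i] != p2[i])
            return p1[i] < p2[i] ? -1 : 1;  /* first difference decides */
        if (p1[i] == '\0')
            return 0;                       /* both ended at the same place */
    }
    return 0;                               /* first n characters are equal */
}
```

With this version, two non-terminated 10-character arrays can be passed with n = 42 without any read past the first difference, provided they do differ within their first 10 characters.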
> >> If I write > >> s1[10] == s2[10] > >> my program's behavior is undefined; does writing > >> strncmp (s1, s2, 42) > >> potentially cause the same problem?
> >> I know what the answer *should* be. strncmp shouldn't read past the > >> bounds of the array, because it doesn't need to. But I'm not quite > >> convinced that the standard explicitly forbids it to do so.
> > I'd say that it doesn't explicitly allow it to do this. And as this is > > not necessary, the implementation shouldn't assume that it may read > > past differing characters. > I agree that it *shouldn't*, but I don't see that the standard > actually says that it *may not*.
The user has provided two arrays of length 10, on which the semantics of strncmp (whatever the value of n) is clearly defined (because of the differing characters). So, from the specification of strncmp (which doesn't require any additional constraint on the length of the arrays), I assume that
strncmp (s1, s2, 42)
will give the expected answer (no undefined behavior).
> > BTW I think that using something like
> > strncmp (s1, s2, SIZE_MAX)
> > to compare '\n'-terminated character sequences[*] that are known to > > be different (by the context) would be quite convenient. Note that > > one cannot use strcmp here because strcmp works on strings, and the > > character sequences here are not null-terminated.
> > [*] I don't say "strings" just to avoid the ambiguity with C, but > > in the manual of some application that would work on such data, they > > could be described as "strings". > I think memcmp() would make more sense.
But how would you determine the value of n without doing the same work first?
The standard says for memcmp:
"The memcmp function compares the first n characters of the object pointed to by s1 to the first n characters of the object pointed to by s2."
so that both s1 and s2 must have at least n characters.
> Note that for this to be useful, you'd have to know that the arrays > differ somewhere within their actual sizes, but not know (or not be > able to compute easily) what those sizes are. Otherwise you could > just pass the size of the arrays (or the lesser of their sizes if > they differ) as the third argument.
Yes, this is the problem: I don't know what the sizes are.
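If the sizes are genuinely unknown, one option that sidesteps the question entirely is a hand-written loop that reads characters in order and stops at the first difference or at a '\n' common to both sequences, so it never reads beyond either terminator. The name compare_nl_terminated is hypothetical. For sequences with no '\0' before their first difference, it orders them the same way strncmp (s1, s2, SIZE_MAX) would under the minimal reading, and unlike that call it also handles two equal sequences safely.

```c
#include <stddef.h>

/* Hypothetical comparison of two '\n'-terminated sequences of unknown
   size. Reads strictly in order; stops at the first difference or at
   a '\n' shared by both, so neither sequence is read past its end. */
int compare_nl_terminated(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    for (;; p1++, p2++) {
        if (*p1 != *p2)
            return *p1 < *p2 ? -1 : 1;  /* compared as unsigned char */
        if (*p1 == '\n')
            return 0;                   /* both sequences ended together */
    }
}
```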
> In article <lnsjtf60my....@nuthaus.mib.org>, > Keith Thompson <ks...@mib.org> wrote:
>> Vincent Lefevre <vincent-n...@vinc17.net> writes: >>> In article <ln4o5x6ljl....@nuthaus.mib.org>, >>> Keith Thompson <ks...@mib.org> wrote: ... >>>> The code in the original post was:
>>>> int
>>>> main (void)
>>>> {
>>>>   return strncmp (s1, s2, 42) == 0;
>>>> }
...
>>>> Neither array contains a string (they're not '\0'-terminated), and
>>>> strncmp permission to read as many as 42 bytes from each array.
>>> But I don't think the standard gives you this permission. >>> Certainly not explicitly.
>> Ok. So exactly what permission does it give? I don't think there's >> a very clear answer to that in the wording of the standard.
> I'd say: no particular permissions, i.e. just what is implied to > satisfy the semantics (I mean that if two characters need to be > compared to satisfy the semantics, then there is an implicit > permission to read them).
I think replacing "particular" with "explicit" would make more sense; those requirements seem pretty particular to me, but I can agree that they are not explicit.
That interpretation prohibits an implementation that loads an entire word's worth of characters from each array for comparison, compares the words for equality, and only bothers breaking the words up into individual characters for individual comparisons if the words are different. This approach could significantly speed up strncmp() on some platforms; are you sure the committee intended to prohibit it? Would it be a good idea to prohibit it? -- James Kuyper
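A rough sketch of the word-at-a-time approach described above, assuming 8-byte words (uint64_t). The name strncmp_wordwise is hypothetical. Each fast-path step loads 8 characters from both arrays even when they differ at the first of them, so it can read up to 7 characters past the first difference or past a null. That is exactly the kind of read the minimal interpretation would forbid.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the 8 bytes in v is zero (a well-known bit trick). */
static int has_zero_byte(uint64_t v)
{
    return ((v - UINT64_C(0x0101010101010101)) & ~v
            & UINT64_C(0x8080808080808080)) != 0;
}

/* Hypothetical word-at-a-time strncmp. As written it is only safe when
   every 8-byte chunk it loads lies inside both arrays; it makes no
   attempt to stay within a page or to stop at a null before loading. */
int strncmp_wordwise(const char *s1, const char *s2, size_t n)
{
    const unsigned char *p1 = (const unsigned char *)s1;
    const unsigned char *p2 = (const unsigned char *)s2;

    /* Fast path: compare 8 characters at a time. Each memcpy reads all
       8 bytes even if the arrays differ at the first one. */
    while (n >= sizeof(uint64_t)) {
        uint64_t w1, w2;
        memcpy(&w1, p1, sizeof w1);
        memcpy(&w2, p2, sizeof w2);
        if (w1 != w2 || has_zero_byte(w1))
            break;              /* settle this chunk byte by byte */
        p1 += sizeof w1;
        p2 += sizeof w2;
        n -= sizeof w1;
    }

    /* Slow path: ordinary byte-by-byte comparison. */
    for (; n > 0; n--, p1++, p2++) {
        if (*p1 != *p2)
            return *p1 < *p2 ? -1 : 1;
        if (*p1 == '\0')
            return 0;
    }
    return 0;
}
```

Library implementations that use wide loads commonly align them so that a single load never straddles a page boundary, which is what keeps such over-reads from faulting in practice.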
> If you say that what Larry meant (by changing "read" to "compare" > in his statement) is: "the standard doesn't say that strncmp only > compares as many characters as are necessary to determine the result, > all it says is that it doesn't compare more than n characters".
> That's true, but anyway the standard doesn't say that the > implementation is allowed to compare n characters for every > input (even not null-terminated).
Right -- the standard doesn't clearly say one way or the other, so a strictly conforming program isn't allowed to depend on it being one way or the other. Which means that an implementation is free to do as it likes. -- Larry Jones
> Except strncmp() may not (observably) read any characters in > one of the argument arrays after the first null character in > that array. For example, I think everyone expects that code > like
The open question is whether you may pass pointers to arrays shorter than 1000 characters when you specify 1000 as the character count. This is not clear at all, unfortunately.
On the other hand, your example should better be valid because lots of code relies on it. 8-)
On 19/04/2011 10:27 AM, lawrence.jo...@siemens.com wrote:
> Right -- the standard doesn't clearly say one way or the other, so a > strictly conforming program isn't allowed to depend on it being one way > or the other. Which means that an implementation is free to do as it > likes.
Is that really how it works? I thought that if the standard doesn't clearly say one way or the other, then it's simply not clear what a strictly conforming program is allowed to depend on or what an implementation is required to do, but it's wise for them to try to err on the side of caution. And if someone cares enough, they can report the ambiguity as a defect in the standard and request that the Committee clarify their intent.
You're not saying that there are places in the standard that were intentionally made unclear, are you?
> Is that really how it works? I thought that if the standard doesn't > clearly say one way or the other, then it's simply not clear what a > strictly conforming program is allowed to depend on or what an implementation is > required to do, but it's wise for them to try to err on the side of > caution.
"A strictly conforming program...shall not produce output dependent on any unspecified...behavior...."
> You're not saying that there are places in the standard that were > intentionally made unclear, are you?
No, but there are things that have been deliberately left unspecified or under-specified (although I don't think this is one of them). -- Larry Jones
>> Is that really how it works? I thought that if the standard doesn't >> clearly say one way or the other, then it's simply not clear what a >> strictly conforming program is allowed to depend on or what an implementation is >> required to do, but it's wise for them to try to err on the side of >> caution.
> "A strictly conforming program...shall not produce output dependent on > any unspecified...behavior...."
And what unspecified behaviour are you referring to in this case?
3.4.4 "unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance"
I don't think this covers cases where the standard is unclear or ambiguous and the two possibilities are "X is allowed" and "X is forbidden" (note that those two are neither values nor behaviours).
>> You're not saying that there are places in the standard that were >> intentionally made unclear, are you?
> No, but there are things that have been deliberately left unspecified or > under-specified (although I don't think this is one of them).
I assume you don't mean "unspecified" in the sense defined in 3.4.4?
>>> Is that really how it works? I thought that if the standard doesn't >>> clearly say one way or the other, then it's simply not clear what a >>> strictly conforming program is allowed to depend on or what an implementation is >>> required to do, but it's wise for them to try to err on the side of >>> caution.
>> "A strictly conforming program...shall not produce output dependent on >> any unspecified...behavior...."
> And what unspecified behaviour are you referring to in this case?
> 3.4.4 "unspecified behavior: use of an unspecified value, or other > behavior where this International Standard provides two or more > possibilities and imposes no further requirements on which is chosen in > any instance"
> I don't think this covers cases where the standard is unclear or > ambiguous and the two possibilities are "X is allowed" and "X is > forbidden" (note that those two are neither values nor behaviours).
You're right, the "two or more possibilities" he's referring to cannot be "x is allowed" and "x is forbidden". They have to be "x happens" and "x does not happen": in other words, what 3.4.4 is saying is that unless the standard explicitly says otherwise, "x is allowed". Keep in mind that "says otherwise" includes what the standard says about the behavior of strncmp(): that behavior must occur as described, so any unspecified behavior that occurs cannot, among other things, prevent strncmp() from returning its specified value. Thus, anything, like a memory access violation, which would abort the program and thereby prevent strncmp() from returning its specified value is prohibited.
If a program calls strncmp() with arguments that require it to access memory locations which are inaccessible in order to perform the behavior specified for strncmp() by the standard, then the onus for that violation is on the caller, and strncmp() is under no obligation to prevent that violation. However, strncmp() doesn't need to read any memory after the first difference, so if it chooses to do so anyway, it must protect any such read against the possibility of fatal access violations.
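One way an implementation might protect such an over-read in practice is to allow a wide load only when it cannot cross into another memory page, since on typical hardware access rights are granted per page. The sketch below assumes a 4096-byte page (ASSUMED_PAGE_SIZE and load_stays_in_page are made-up names) and treats addresses as integers, which is implementation-defined rather than portable ISO C.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: 4096-byte pages. Real code would ask the platform
   (for example POSIX sysconf(_SC_PAGESIZE)) instead of hard-coding it. */
#define ASSUMED_PAGE_SIZE 4096u

/* True if a load of `width` bytes starting at p stays inside the page
   that contains p. If p itself is readable, such a load cannot fault
   on hardware that protects memory per page. The extra bytes are still
   outside the array as far as the C abstract machine is concerned. */
bool load_stays_in_page(const void *p, size_t width)
{
    uintptr_t offset = (uintptr_t)p % ASSUMED_PAGE_SIZE;
    return offset + width <= ASSUMED_PAGE_SIZE;
}
```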
>>> You're not saying that there are places in the standard that were >>> intentionally made unclear, are you?
>> No, but there are things that have been deliberately left unspecified or >> under-specified (although I don't think this is one of them).
> I assume you don't mean "unspecified" in the sense defined in 3.4.4?
I think he meant it in precisely that sense, but as applied to a different pair of possibilities than you did. -- James Kuyper
Florian Weimer <f...@deneb.enyo.de> writes: > * Tim Rentsch:
>> Except strncmp() may not (observably) read any characters in >> one of the argument arrays after the first null character in >> that array. For example, I think everyone expects that code >> like
> The open question is whether you may pass pointers to arrays shorter > than 1000 characters when you specify 1000 as the character count. > This is not clear at all, unfortunately.
I agree it is not as clear as it could be, and perhaps should be. But not clear at all? If it really weren't clear /at all/ then there would have been some question or DR about it sometime in the 20 years since the official description of strncmp() was written. So apparently it's clear enough so implementors haven't felt any need to ask about it.
> On the other hand, your example should better be valid because lots of > code relies on it. 8-)
I don't know of anyone who contends that the Standard allows undefined behavior for this kind of call to strncmp().
Vincent Lefevre <vincent-n...@vinc17.net> writes: > In article <kfnfwpfdied....@x-alumni2.alumni.caltech.edu>, > Tim Rentsch <t...@alumni.caltech.edu> wrote:
>> If we can bring the discussion up a level, I think we may be >> working at cross purposes here. Are you trying to argue how >> the Standard _must_ be understood on this point? How it >> _ought_ to be understood on this point? How the committee >> _expects_ it will be understood? How the committee believes >> it _should_ be understood? What the Standard _ought_ to >> require, regardless of what it does require? How the Standard >> will be understood here, _given certain assumptions_? Which >> of these applies to your comments? Or, if none of them do, >> how would you express what it is you are hoping to accomplish?
> Ideally the answer should be the same for any of them (without > additional assumptions). Currently I disagree with your reasoning, > but perhaps it is incomplete (see the above questions).
That isn't really a response to the question. What are you hoping to accomplish by these postings?
(I have some other comments/responses but I have to defer a longer reply until later.)
>> Is that really how it works? I thought that if the standard doesn't >> clearly say one way or the other, then it's simply not clear what a >> strictly conforming program is allowed to depend on or what an implementation is >> required to do, but it's wise for them to try to err on the side of >> caution.
> "A strictly conforming program...shall not produce output dependent on > any unspecified...behavior...."
>> You're not saying that there are places in the standard that were >> intentionally made unclear, are you?
> No, but there are things that have been deliberately left unspecified or > under-specified (although I don't think this is one of them).
But, you do agree that the description of strncmp() is expected to be understood as unspecified/under-specified, in that it allows reading of characters after the point of mismatch (with no reads in an array after a null in that array), even though such reads are not absolutely necessary to determine the return value. Right?
That is, even though unspecified behavior (and the resulting possible undefined behavior) may not have been deliberate, AFAYK this meaning is what was intended - yes?
> On 04/19/2011 11:32 PM, Wojtek Lerch wrote: >> 3.4.4 "unspecified behavior: use of an unspecified value, or other >> behavior where this International Standard provides two or more >> possibilities and imposes no further requirements on which is chosen in >> any instance"
>> I don't think this covers cases where the standard is unclear or >> ambiguous and the two possibilities are "X is allowed" and "X is >> forbidden" (note that those two are neither values nor behaviours).
> You're right, the "two or more possibilities" he's referring to cannot > be "x is allowed" "x is forbidden". They have to be "x happens" and "x > does not happen": in other words, what 3.4.4 is saying is that unless > the standard explicitly says otherwise, "x is allowed".
Um no, 3.4.4 is just the definition of "unspecified". It says nothing about any behaviours except those that the standard refers to as "unspecified". Whether x is allowed can potentially depend on many things that the standard says explicitly, implies, or is silent about, and also on what exactly you mean by "allowed" and whether x is an action by a program or by the implementation.
(In this case, substitute "x is allowed" with "the implementation is allowed to access all the n bytes if none of them is a null character", or with "the program is allowed to provide arrays that are shorter than n bytes, as long as they differ or are null-terminated". At least one of those must be false, even though the standard does not explicitly say that.)
> Keep in mind that "says otherwise" includes what the standard says about the behavior of strncmp(): that behavior must occur as described, so any unspecified behavior that occurs cannot, among other things, prevent strncmp() from returning its specified value.
Except in programs that have undefined behaviour because they have violated some requirement of the standard.
> Thus, anything, like a memory access violation, which would abort the program and thereby prevent strncmp() from returning its specified value is prohibited.
Unless the program has undefined behaviour.
> If a program calls strncmp() with arguments that require it to access memory locations which are inaccessible in order to perform the behavior specified for strncmp() by the standard,
... or just arguments for which the standard does not define the behaviour, or specifically says that they cause undefined behaviour,...
> then the onus for that violation is on the caller, and strncmp() is under no obligation to prevent that violation. However, strncmp() doesn't need to read any memory after the first difference, so if it chooses to do so anyway, it must protect any such read against the possibility of fatal access violations.
What strncmp() "needs" to read is the wrong question. The right questions are what conditions have to be met by the program to avoid undefined behaviour, and, if they are met, what conditions must then be met by the implementation to ensure conformance. In this case, the text says that the function compares no more than n characters, but not beyond the first null character; I don't find it outrageously illogical to argue that it's therefore the program's responsibility to ensure that the arguments point to arrays that are either null-terminated or at least n characters long. Just because it might be possible for the implementation to fulfill the same description of semantics if the pre-conditions were looser does not mean that they actually are looser.
>>>> You're not saying that there are places in the standard that were intentionally made unclear, are you?
>>> No, but there are things that have been deliberately left unspecified or under-specified (although I don't think this is one of them).
>> I assume you don't mean "unspecified" in the sense defined in 3.4.4?
> I think he meant it in precisely that sense, but as applied to a different pair of possibilities than you did.
Perhaps. But, frankly, I am not really in a mood for second-guessing what he might possibly have meant any more than I'm in the mood for second-guessing the standard. As far as I am concerned, neither he nor the standard is clear about their intent, and unless they're willing to clarify it, I don't see much point in trying to guess it.
> On 20/04/2011 6:36 AM, James Kuyper wrote:
>> On 04/19/2011 11:32 PM, Wojtek Lerch wrote:
>>> 3.4.4 "unspecified behavior: use of an unspecified value, or other behavior where this International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance"
>>> I don't think this covers cases where the standard is unclear or ambiguous and the two possibilities are "X is allowed" and "X is forbidden" (note that those two are neither values nor behaviours).
>> You're right, the "two or more possibilities" he's referring to cannot be "x is allowed" "x is forbidden". They have to be "x happens" and "x does not happen": in other words, what 3.4.4 is saying is that unless the standard explicitly says otherwise, "x is allowed".
> Um no, 3.4.4 is just the definition of "unspecified". It says nothing about any behaviours except those that the standard refers to as "unspecified". Whether x is allowed can potentially depend on many things that the standard says explicitly, implies, or is silent about, and also on what exactly you mean by "allowed" and whether x is an action by a program or by the implementation.
In this case, by "x is allowed", I mean that an implementation remains conforming regardless of whether or not it does "x". When the standard leaves it unspecified whether or not x occurs, then conformance with the standard cannot depend upon whether or not "x" occurs.
I agree that when the standard is silent about the behavior, "unspecified" is not the correct word; the correct word is undefined. The standard must, at least implicitly, provide two or more permitted behaviors, in order for the choice between those behaviors to be unspecified.
...
>> Keep in mind that "says otherwise" includes what the standard says about the behavior of strncmp(): that behavior must occur as described, so any unspecified behavior that occurs cannot, among other things, prevent strncmp() from returning its specified value.
> Except in programs that have undefined behaviour because they have violated some requirement of the standard.
Of course.
>> Thus, anything, like a memory access violation, which would abort the program and thereby prevent strncmp() from returning its specified value is prohibited.
> Unless the program has undefined behaviour.
Agreed.
>> If a program calls strncmp() with arguments that require it to access memory locations which are inaccessible in order to perform the behavior specified for strncmp() by the standard,
> ... or just arguments for which the standard does not define the behaviour, or specifically says that they cause undefined behaviour,...
>> then the onus for that violation is on the caller, and strncmp() is under no obligation to prevent that violation. However, strncmp() doesn't need to read any memory after the first difference, so if it chooses to do so anyway, it must protect any such read against the possibility of fatal access violations.
> What strncmp() "needs" to read is the wrong question. The right questions are what conditions have to be met by the program to avoid undefined behaviour, and, if they are met, what conditions must then be met by the implementation to ensure conformance. In this case, the text says that the function compares no more than n characters, but not beyond the first null character; I don't find it outrageously illogical to argue that it's therefore the program's responsibility to ensure that the arguments point to arrays that are either null-terminated or at least n characters long. Just because it might be possible for the implementation to fulfill the same description of semantics if the pre-conditions were looser does not mean that they actually are looser.
Behavior can be undefined by reason of the absence of a definition, but that rule is frequently misused. The key point is that an applicable definition must actually be absent; it is not enough that the definition merely fails to say anything explicit about a special case, as long as what it does say can apply to that case. The definition provided for strncmp() does cover the case you're worried about; that it doesn't say anything special about that case merely implies that there is nothing special to say about it; it doesn't make the behavior in that case undefined.
Definitions of the behavior don't have to provide an explicit list of everything that is not allowed to happen; requiring that they do so would force the replacement of each and every clause in the standard with something bigger than the Library of Congress. When one particular behavior is defined by the standard, no other behavior that is within the scope of this standard is allowed to happen (except insofar as it is covered by the as-if rule). -- James Kuyper
> On 04/21/2011 04:33 PM, Wojtek Lerch wrote:
>> On 20/04/2011 6:36 AM, James Kuyper wrote:
>>> You're right, the "two or more possibilities" he's referring to cannot be "x is allowed" "x is forbidden". They have to be "x happens" and "x does not happen": in other words, what 3.4.4 is saying is that unless the standard explicitly says otherwise, "x is allowed".
>> Um no, 3.4.4 is just the definition of "unspecified". [...]
> In this case, by "x is allowed", I mean that an implementation remains conforming regardless of whether or not it does "x". When the standard leaves it unspecified whether or not x occurs, then conformance with the standard cannot depend upon whether or not "x" occurs.
Sure. The only thing I was disagreeing with there was that it's 3.4.4 that's saying that. :)
> I agree that when the standard is silent about the behavior, "unspecified" is not the correct word; the correct word is undefined. The standard must, at least implicitly, provide two or more permitted behaviors, in order for the choice between those behaviors to be unspecified.
Actually, when the two possibilities are "x happens" and "x does not happen", I think it's perfectly acceptable to say that it's unspecified whether x happens or not, even if the standard doesn't explicitly say that, or even mention that those two possibilities exist. I would tend to avoid using the word "undefined" when discussing such cases.
>>> [...] However, strncmp() doesn't need to read any memory after the first difference, so if it chooses to do so anyway, it must protect any such read against the possibility of fatal access violations.
>> What strncmp() "needs" to read is the wrong question. [...]
> Behavior can be undefined by reason of the absence of a definition; but that rule is frequently misused.
Perhaps; but I was not trying to apply it here.
> [...] The definition provided for strncmp() does cover the case you're worried about; that it doesn't say anything special about that case merely implies that there is nothing special to say about that case; it doesn't make the behavior in that case undefined.
The definition provided for strncmp() consists of two parts. The first part says that the function compares no more than n characters from each array and that characters that follow a null character are not compared. The second part specifies what value the function returns.
This is how I would summarize the two competing interpretations that this thread is about:
#1 The two parts should be interpreted somewhat separately. The first part implies that in the absence of a null character the function is allowed to compare n bytes (regardless of whether it needs to or not). This constitutes a pre-condition that the program must satisfy, or otherwise the behaviour is undefined (not by omission, but by failing to ensure that what the standard allows the implementation to do is safe). The second part does not cancel any undefined behaviour that a program has invoked by violating the first part.
#2 The two parts should be interpreted together, as a description of an algorithm (a little bit like the standard's description of asctime()). Since that algorithm doesn't compare characters following the first difference, the implementation is not allowed to read them either, and programs are not required to ensure that reading them would be safe.
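For concreteness, here is a rough C sketch of how much of each array the two readings let strncmp() touch. `touched_interp1` and `touched_interp2` are hypothetical helpers written only for this illustration; they are not part of any library, and each one reads no more than the count it reports.

```c
#include <assert.h>
#include <stddef.h>

/* Interpretation #1 (hypothetical helper): strncmp() may touch every
   character of the array s up to and including its first null, capped
   at n, whether or not those characters affect the result. */
size_t touched_interp1(const char *s, size_t n)
{
    assert(s != NULL);
    for (size_t i = 0; i < n; ++i)
        if (s[i] == '\0')
            return i + 1;
    return n;
}

/* Interpretation #2 (hypothetical helper): strncmp() may touch only the
   characters up to and including the first difference or the first null,
   capped at n; the count is the same for both arrays. */
size_t touched_interp2(const char *s1, const char *s2, size_t n)
{
    assert(s1 != NULL && s2 != NULL);
    for (size_t i = 0; i < n; ++i)
        if (s1[i] != s2[i] || s1[i] == '\0')
            return i + 1;
    return n;
}
```

With s1 = "abcdefgh", s2 = "abXdefgh" and n = 8, interpretation #1 allows all 8 characters of each array to be touched, while interpretation #2 allows only the first 3.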
Personally I don't have the feeling that the text clearly favours either one of those interpretations. As a programmer, I prefer #1 because it's safer. If I were an implementer, I'd prefer #2 for the same reason. But as far as comp.std.c goes, I think the important thing is that this is an ambiguity in the standard that deserves to be recognized as a defect.
> On 21/04/2011 6:05 PM, James Kuyper wrote:
...
>> I agree that when the standard is silent about the behavior, "unspecified" is not the correct word; the correct word is undefined. The standard must, at least implicitly, provide two or more permitted behaviors, in order for the choice between those behaviors to be unspecified.
> Actually, when the two possibilities are "x happens" and "x does not happen", I think it's perfectly acceptable to say that it's unspecified whether x happens or not, even if the standard doesn't explicitly say that, or even mention that those two possibilities exist. I would tend to avoid using the word "undefined" when discussing such cases.
True, but when the standard is silent about something, there's usually also a "y happens" and an "x and y both happen", among infinitely many other possibilities. In this case, y might be "strncmp() writes into the arrays" or "strncmp() opens a file with a name matching the first string argument". The standard doesn't say anything more to prohibit either of those options than it does to prohibit reading past the characters that need to be read in order to determine strncmp()'s return value. I believe that the description that is provided implicitly allows strncmp() to do anything whose observable consequences are restricted to those specified in the description. This includes reading past the needed characters, but only so long as doing so has no observable consequences; a memory access violation is observable in this sense.
...
> #2 The two parts should be interpreted together, as a description of an algorithm (a little bit like the standard's description of asctime()). Since that algorithm doesn't compare characters following the first difference, the implementation is not allowed to read them either, and programs are not required to ensure that reading them would be safe.
That's basically my understanding, except that I wouldn't say that the strncmp() can't read additional characters beyond the ones it needs to. It's allowed to read anything it wants to, but only insofar as such reading is covered by the as-if rule.
...
> But as far as comp.std.c goes, I think the important thing is that this is an ambiguity in the standard that deserves to be recognized as a defect.
> I don't know of anyone who contends that the Standard allows > undefined behavior for this kind of call to strncmp().
I believe Andreas' question was prompted by a change to GNU libc which assumes undefined behavior in this case. Someone wrote that patch and apparently assumes that there is a loophole here. 8-)
Tim Rentsch <t...@alumni.caltech.edu> writes:
> Phil Carmody <thefatphil_demun...@yahoo.co.uk> writes:
> > Tim Rentsch <t...@alumni.caltech.edu> writes:
> >> Vincent Lefevre <vincent-n...@vinc17.net> writes:
> >> > It says: "The strncmp function *compares* not more than n characters". "compares", not "reads". [snip]
> >> That seems like a silly distinction. Surely we can infer that to compare a value in X and a value in Y we must first read the value in X, and read the value in Y.
> > You have the implication the wrong way round. You may not compare unless you've read, clearly, but you may read and then not compare [snip example]
> Did you misunderstand what I wrote? My implication is the same as yours: 'compare implies read', not 'read implies compare'.
Bizarre. Total brainfart.
Phil
In article <iok0iu$an...@dont-email.me>, James Kuyper <jameskuy...@verizon.net> wrote:
> That interpretation prohibits an implementation that loads an entire word's worth of characters from each array for comparison, compares the words for equality, and only bothers breaking the words up into individual characters for individual comparisons if the words are different. This approach could significantly speed up strncmp() on some platforms; are you sure the committee intended to prohibit it? Would it be a good idea to prohibit it?
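A rough C sketch of the word-at-a-time strategy just described, for illustration only: `strncmp_wordwise` is a hypothetical name, and the sketch assumes both arrays have at least n readable bytes, which is stronger than either interpretation in this thread requires. Production libraries that use this trick typically make their loads aligned so they cannot cross a page boundary, a platform guarantee that portable C cannot express.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical word-at-a-time strncmp() sketch.
   ASSUMPTION: both arrays have at least n readable bytes. */
int strncmp_wordwise(const char *s1, const char *s2, size_t n)
{
    assert(s1 != NULL && s2 != NULL);
    while (n >= sizeof(uint64_t)) {
        uint64_t w1, w2;
        memcpy(&w1, s1, sizeof w1);   /* load a whole word from each array */
        memcpy(&w2, s2, sizeof w2);
        if (w1 != w2)
            break;                    /* words differ: resolve byte by byte below */
        /* Identical words: a null inside them ends both strings equally. */
        if (memchr(s1, '\0', sizeof w1) != NULL)
            return 0;
        s1 += sizeof w1;
        s2 += sizeof w2;
        n -= sizeof w1;
    }
    /* Byte-by-byte tail, also used to locate the first difference. */
    for (; n > 0; --n, ++s1, ++s2) {
        unsigned char c1 = (unsigned char)*s1;
        unsigned char c2 = (unsigned char)*s2;
        if (c1 != c2)
            return c1 < c2 ? -1 : 1;
        if (c1 == '\0')
            return 0;
    }
    return 0;
}
```

Note that the word loads read characters after the first difference and after a null, which is precisely the behaviour whose legitimacy is in dispute.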
> Vincent Lefevre <vincent-n...@vinc17.net> writes:
> > In article <kfnfwpfdied....@x-alumni2.alumni.caltech.edu>,
> > Tim Rentsch <t...@alumni.caltech.edu> wrote:
> >> If we can bring the discussion up a level, I think we may be working at cross purposes here. Are you trying to argue how the Standard _must_ be understood on this point? How it _ought_ to be understood on this point? How the committee _expects_ it will be understood? How the committee believes it _should_ be understood? What the Standard _ought_ to require, regardless of what it does require? How the Standard will be understood here, _given certain assumptions_? Which of these applies to your comments? Or, if none of them do, how would you express what it is you are hoping to accomplish?
> > Ideally the answer should be the same for any of them (without additional assumptions). Currently I disagree with your reasoning, but perhaps it is incomplete (see the above questions).
> That isn't really a response to the question. What are you hoping to accomplish by these postings?
I don't understand what you mean.
My point is about what the standard says. Not more.
> In article <iok0iu$an...@dont-email.me>,
> James Kuyper <jameskuy...@verizon.net> wrote:
>> That interpretation prohibits an implementation that loads an entire word's worth of characters from each array for comparison, compares the words for equality, and only bothers breaking the words up into individual characters for individual comparisons if the words are different. This approach could significantly speed up strncmp() on some platforms; are you sure the committee intended to prohibit it? Would it be a good idea to prohibit it?
> It depends on whether one may accept to regard
> strncmp( big, "foo", 1000 );
> as undefined behavior or not.
The standard provides a definition of the behavior, so we cannot infer undefined behavior due to the absence of a definition. That implies that if reading beyond the end of "foo" could cause behavior inconsistent with that description, the implementation is obligated to ensure that strncmp() does not do so.
Your statement of the implied requirement seems somewhat stronger; it seems to prohibit reading additional characters, even if there's no resulting problematic behavior. However, under the as-if rule, that doesn't really matter. -- James Kuyper
In article <91ca4bFnj...@mid.individual.net>, Wojtek Lerch <wojte...@yahoo.ca> wrote:
> The definition provided for strncmp() consists of two parts. The first part says that the function compares no more than n characters from each array and that characters that follow a null character are not compared. The second part specifies what value the function returns.
This ("[...] consists of two parts") is simply not true. The rest of the standard can also bear on strncmp, in particular 7.21.4p1, where strncmp is explicitly mentioned:
The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp is determined by the sign of the difference between the values of the first pair of characters (both interpreted as unsigned char) that differ in the objects being compared.
I'd say that's the "real" first part. 7.21.4.4p2 completes this description by saying that one may need to stop earlier in the comparisons (from a *semantics* point of view):
The strncmp function compares not more than n characters (characters that follow a null character are not compared) from the array pointed to by s1 to the array pointed to by s2.
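The two quoted paragraphs can be checked against the library with a small C sketch. `strncmp_sign` is a hypothetical helper that keeps only the sign of the result, since 7.21.4p1 specifies nothing about the magnitude.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reduce strncmp()'s result to -1, 0 or 1, since
   7.21.4p1 specifies only the sign of a nonzero return value. */
int strncmp_sign(const char *s1, const char *s2, size_t n)
{
    int r = strncmp(s1, s2, n);
    return (r > 0) - (r < 0);
}
```

strncmp_sign("\xff", "a", 1) is 1, because the characters are compared as unsigned char even where plain char is signed; strncmp_sign("abcX", "abcY", 3) is 0, because no more than n characters are compared; and strncmp_sign("ab\0X", "ab\0Y", 4) is 0, because characters that follow a null character are not compared.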
> This is how I would summarize the two competing interpretations that this thread is about:
> #1 The two parts should be interpreted somewhat separately. The first part implies that in the absence of a null character the function is allowed to compare n bytes (regardless of whether it needs to or not). This constitutes a pre-condition that the program must satisfy, or otherwise the behaviour is undefined (not by omission, but by failing to ensure that what the standard allows the implementation to do is safe). The second part does not cancel any undefined behaviour that a program has invoked by violating the first part.
I wonder whether it is correct to interpret parts separately. By taking clauses out of context, one can probably find contradictions elsewhere in the standard.
Note that "not more than n ..." is not equivalent to "n except in the case ... [where it is less than n]". If this is what is meant, this should be rewritten as "[...] compares the first n characters, except [...]" (note: "the first n characters" would be the same wording as in the memcmp description).
I would agree with the implication only if you consider that everything that is not forbidden is allowed: it is not forbidden to compare n characters (except when there is a null character), so it is allowed to compare them. But by the same logic, since it is not forbidden to *read* n+1 characters (or to read characters after a null character), it would be allowed to read them, and strncmp would be rather useless.
So, your interpretation is based on something that is not explicitly said by the standard.
> #2 The two parts should be interpreted together, as a description of an algorithm (a little bit like the standard's description of asctime()).
I wouldn't say that they (with the "real" first part I've mentioned above) describe an algorithm, but that they describe the semantics (as a recursive function -- this looks like an algorithm but that's not important at all).
> Since that algorithm doesn't compare characters following the first difference, the implementation is not allowed to read them either, and programs are not required to ensure that reading them would be safe.
This is not my reasoning, though the conclusion would be the same here. My point is that the standard doesn't say what the size of the objects is, so the implementation shouldn't assume anything about their size, *provided* that the behavior (specified by the semantics) is well-defined: the objects must have at least size 1; if the first characters are the same and not null and n >= 2, then the objects must have at least size 2, otherwise the behavior would obviously be undefined; and so on...
> Personally I don't have the feeling that the text clearly favours either one of those interpretations. As a programmer, I prefer #1 because it's safer. If I were an implementer, I'd prefer #2 for the same reason. But as far as comp.std.c goes, I think the important thing is that this is an ambiguity in the standard that deserves to be recognized as a defect.
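The inductive size argument above can be read off a recursive formulation of the semantics. Below is a hypothetical C sketch (`strncmp_rec` is an illustrative name, not a library function) that combines the sign rule of 7.21.4p1 with the limits of 7.21.4.4p2; it is meant to show the argument, not to serve as a practical implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recursive reading of 7.21.4p1 and 7.21.4.4p2. Each call
   inspects only s1[0] and s2[0], and recurses only when they are equal
   and non-null, so the size required of each array grows by one
   character per recursion step. */
int strncmp_rec(const char *s1, const char *s2, size_t n)
{
    assert(s1 != NULL && s2 != NULL);
    if (n == 0)
        return 0;                        /* nothing left to compare */
    unsigned char c1 = (unsigned char)s1[0];
    unsigned char c2 = (unsigned char)s2[0];
    if (c1 != c2)
        return c1 < c2 ? -1 : 1;         /* first difference decides the sign */
    if (c1 == '\0')
        return 0;                        /* characters after a null are not compared */
    return strncmp_rec(s1 + 1, s2 + 1, n - 1);
}
```

For example, strncmp_rec("foo", "foobar", 1000) inspects only the four bytes of "foo" (including its null) before returning a negative value, however large n is.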
> On 04/27/2011 07:18 AM, Vincent Lefevre wrote:
> > In article <iok0iu$an...@dont-email.me>,
> > James Kuyper <jameskuy...@verizon.net> wrote:
> >> That interpretation prohibits an implementation that loads an entire word's worth of characters from each array for comparison, compares the words for equality, and only bothers breaking the words up into individual characters for individual comparisons if the words are different. This approach could significantly speed up strncmp() on some platforms; are you sure the committee intended to prohibit it? Would it be a good idea to prohibit it?
> > It depends on whether one may accept to regard
> > strncmp( big, "foo", 1000 );
> > as undefined behavior or not.
> The standard provides a definition of the behavior, so we cannot infer undefined behavior due to the absence of a definition.
[...]
I agree (though some people would interpret the standard differently). But then, the implementation you proposed would no longer work correctly on such a case.
What I want to say is that the fact that the standard doesn't allow reading characters after a null character (e.g. in the case where there is a memory boundary, so that the as-if rule doesn't apply) somewhat limits the implementation. And there would be almost no benefit in allowing it to read n characters when there are no null characters.
Or do you know how an implementation could take advantage of reading n characters, e.g. if the processor has an instruction to read 16 bytes at a time very quickly?