On 01/05/2018 05:30 PM, William Ahern wrote:
> James R. Kuyper <james...@verizon.net> wrote:
>> On 01/04/2018 07:19 PM, William Ahern wrote:
> <snip>
>>> So you should be able to allocate a char array of size PTRDIFF_MAX+1. Say P
>>
>> Since ptrdiff_t is a signed type, unless PTRDIFF_MAX < INT_MAX,
>> PTRDIFF_MAX+1 involves signed overflow, and therefore undefined
>> behavior; on typical 2's complement systems the undefined behavior will
>> often take the form of giving a result that's negative.
>
> I should have written (size_t)PTRDIFF_MAX+1, or conversely used different
> language or notation that couldn't be conflated with compliant C code.
>
> But thank you for interpreting my pseudo-code literally and then deducing
> falsehood by illustrating the presence of undefined behavior.
My first draft treated it as just pseudo-code, so I simply pointed out
that you should have warned people not to interpret PTRDIFF_MAX+1 as a C
expression. However, I then noticed your claim that "(Q-P)+1 evaluates
to PTRDIFF_MAX+1 in a well-defined manner", which suggested that you
were unaware that the addition has undefined behavior, so I rewrote my
response on that assumption.
Other things you've written, in both this message and the previous one,
suggest that you were in fact aware that it has undefined behavior - but
then why use the incorrect phrase "well-defined" in that sentence? I
was, and remain, confused by that conflict.
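For concreteness, here's a minimal sketch of the distinction (assuming
the usual case where ptrdiff_t is at least as wide as int, so the
addition isn't promoted to a wider type):

    #include <stddef.h>
    #include <stdint.h>

    void demo(void)
    {
        ptrdiff_t bad  = PTRDIFF_MAX + 1;          /* signed overflow:
                                                      undefined behavior */
        size_t    good = (size_t)PTRDIFF_MAX + 1;  /* unsigned arithmetic:
                                                      well-defined, assuming
                                                      SIZE_MAX > PTRDIFF_MAX */
        (void)bad; (void)good;
    }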
...
>>> ... but (Q+1)-P isn't representable as ptrdiff_t and thus
>>> invokes undefined behavior.
>>
>> Correct, and when that is the case, the undefined behavior of that
>> expression overrides the promise implied by the identity that is
>> expressed in terms of that expression.
>
> Conversely, we can avoid extrinsic qualification by not assuming that the
> standard defines the behavior of P+N where N is greater than PTRDIFF_MAX.
That the standard defines the behavior of that expression is not an
assumption, it's a conclusion derived from the words from the standard
containing that definition:
"... if the expression P points to the i-th element of an array object,
the expressions (P)+N ... (where N has the value n) point[s] to ... the
i+n-th ... element of the array object, provided [it] exist[s]." (6.5.6p8).
In this definition, i+n must be interpreted as a mathematical
expression, rather than a C expression. If the (i+n)-th element of the
array object exists, then it's perfectly clear what that definition
means, even if n > PTRDIFF_MAX. There's no direct constraint on the
value of n. The only indirect constraint is that "If both the pointer
operand and the result point to elements of the same array object, or
one past the last element of the array object, the evaluation shall not
produce an overflow; otherwise, the behavior is undefined." Nothing
there constrains n by comparing it with PTRDIFF_MAX.
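As a sketch of what that means in practice (hypothetical, and assuming
an implementation that can actually create an object larger than
PTRDIFF_MAX bytes):

    #include <stdint.h>
    #include <stdlib.h>

    void demo(void)
    {
        size_t n = (size_t)PTRDIFF_MAX + 2;
        char *p = malloc(n);  /* assumes malloc can honor this request */
        if (p != NULL) {
            char *q = p + (n - 1);  /* the offset exceeds PTRDIFF_MAX, yet
                                       the (n-1)-th element exists, so
                                       6.5.6p8 defines q to point at it */
            *q = 'x';
            /* q - p, on the other hand, would be undefined: the result
               is not representable in ptrdiff_t (6.5.6p9). */
            free(p);
        }
    }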
...
> In the context of a discussion about designing what are often called "safe"
> primitives, we should be especially concerned with not only pointing out
> flaws, but pointing out a failure to affirmatively prove correctness. Take a
> function like the following
>
> char *lc(char *src, size_t len) {
> char *p = src;
> while (p - src < srclen) {
I'm going to assume that srclen and len were supposed to be the same. If
that assumption is incorrect, please explain.
> unsigned char c = *p;
> *(unsigned char *)p++ = tolower(c);
> }
> return src;
> }
>
> which uses identical bounds-checking logic to the supposedly correct
> solution used in CERT C Coding Standard rule STR37-C.[1]
>
> It's not "safe" for all valid inputs, presuming the implementation properly
> supports objects greater than PTRDIFF_MAX in size. You explain the reason
> why it's problematic and propose
>
> char *lc(char *src, size_t len) {
> char *p = src, *pe = src + len;
> while (p < pe) {
> unsigned char c = *p;
> *(unsigned char *)p++ = tolower(c);
> }
> return src;
> }
>
> The engineer smartly asks, "but if p - src might be undefined, why wouldn't
> src + len be undefined for the same inputs?"
p-src can have undefined behavior because "If the result is not
representable in an object of that type [ptrdiff_t], the behavior is
undefined." (6.5.6p9). The engineer cannot cite a corresponding clause
that makes the behavior of src+len undefined.
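A sketch of that asymmetry, under the hypothetical assumption that an
object of len bytes really exists at src and that
len == (size_t)PTRDIFF_MAX + 2:

    #include <stddef.h>

    void asymmetry(char *src, size_t len)
    {
        char *p = src + (len - 1);  /* defined by 6.5.6p8: that element
                                       exists */
        ptrdiff_t d = p - src;      /* undefined: PTRDIFF_MAX + 1 is not
                                       representable in ptrdiff_t
                                       (6.5.6p9) */
        char *pe = src + len;       /* no subtraction occurs, so 6.5.6p9
                                       never comes into play */
        (void)d; (void)pe;
    }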
Let me moderate that statement slightly. The engineer cannot point to
any restriction based, directly or indirectly, upon comparing the value
of len with PTRDIFF_MAX. However, he can point to undefined behavior
whenever len > 1, by "omission of any explicit definition of behavior"
(4p6), given that len is (presumably) the actual length of the array. I
presume that's NOT the point you're arguing about, but let me explain it
anyway:
The standard, as written, implies that the only way to create a pointer
one past the end of an array is by adding 1 to a pointer to the last
element of the array. Many people believe that if P points to the last
element of the array, and P-n points within the array, then the standard
requires (P-n)+(n+1) to be an alternative way of calculating such a
pointer. They reach this conclusion by rearranging the math: P-n+n+1
=> P+(-n+n+1) => P+1; but the standard doesn't justify such a
rearrangement. Its general definition of pointer addition and
subtraction would make the rearrangement correct if P pointed anywhere
else in the array, so long as P-n also pointed inside it. However, note
that the definition I quoted earlier ends with the phrase "provided
they exist." When the result would point one past the end of the array,
the corresponding element does NOT exist, so that definition does not
apply and cannot be used to justify the rearrangement.
However, the committee intended this to work, and it does work on
essentially all real-world implementations. I have, in fact, had a
great deal of trouble convincing people that the wording of the
standard doesn't match that intent. Even if I'm right, that's certainly
not what your argument is about.
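To make the two constructions concrete (a sketch with hypothetical
names; assumes src points at the first of len elements, with len >= 2):

    #include <stddef.h>

    void one_past(char *src, size_t len)
    {
        char *pe1 = (src + (len - 1)) + 1;  /* +1 from a pointer to the
                                               last element: the construction
                                               the standard explicitly
                                               defines */
        char *pe2 = src + len;              /* jumps past the end in one
                                               step; the committee intended
                                               this to yield the same
                                               pointer, but the "provided
                                               they exist" wording arguably
                                               covers only pe1 */
        (void)pe1; (void)pe2;
    }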
> ... The best we can say in defense
> of the supposedly better implementation is, apparently, that it is because
> it is.[2]
I prefer saying: "the worse implementation can have undefined behavior
in cases where the better one has undefined behavior only if you believe
the arguments of one crazy pedant that the words of the standard don't
correctly express the intent of the committee. No real-world
implementation has a problem with src+len, where src points at the start
of an array and len is the number of elements in that array."