Static _Bool initialization

Michal Necasek

May 11, 2006, 3:25:18 PM

I'm looking for clarification about initialization of static _Bool
variables in C99. Consider the following:

extern int foo[];
_Bool bar = &foo[0];

By my reading of C99, which could well be incorrect, '&foo[0]' is a
valid constant expression (address constant) and 6.3.1.2 specifies how
such value is converted to _Bool; therefore the 'bar' initializer is
valid C99. Because 6.3.2.3 tells us that the value of '&foo[0]' cannot
be null, a compiler can conclude that 'bar' will be initialized to '1'.

But how about the following:

extern int foo[];
_Bool bar = &foo[42];

In this case, '&foo[42]' might or might not evaluate to a null
pointer, but that is not known at compile time. If we had something like

extern int foo[];
void *baz = &foo[42];

then there'd be no problem, because linkers and loaders can easily
handle this situation. But not so for _Bool.

I can see two possible solutions:

(1) The C99 committee deliberately or accidentally designed the C99
language such that this feature cannot be implemented with currently
available linker/loader technology, or

(2) I missed something in the standard and 'extern int foo[]; _Bool
bar = &foo[42];' is not valid C99.

Which is it? Is a C99 compiler entitled to reject such code, or is it
just put into an impossible situation?


Michal
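
For comparison, here is a minimal sketch of the case that linkers and loaders do handle: keep the address in a pointer object and defer the conversion to _Bool until run time. The array size (64) and the helper name baz_is_nonnull are assumptions made only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: foo is defined here with 64 elements so that &foo[42] is valid. */
static int foo[64];

/* An address constant: the linker/loader can fill this in by relocation. */
static void *baz = &foo[42];

/* The pointer-to-_Bool conversion (C99 6.3.1.2) happens at run time here. */
static bool baz_is_nonnull(void)
{
    bool result = baz;   /* 0 if baz compares equal to 0, otherwise 1 */
    return result;
}
```

The open question in this post is whether the same conversion can be demanded of a static initializer, where no run-time code is available to perform it.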

David R Tribble

May 11, 2006, 5:27:10 PM
Michal Necasek wrote:
> I'm looking for clarification about initialization of static _Bool
> variables in C99. Consider the following:
> [...]

>
> extern int foo[];
> _Bool bar = &foo[42];
>
> In this case, '&foo[42]' might or might not evaluate to a null
> pointer, but that is not known at compile time.

How could &foo[42] evaluate to null?

Are you assuming some kind of wrap-around pointer value, where foo+42
results in an address indistinguishable from null?

In any case, &foo[42] and its equivalent foo+42 do not need to be known
at compile time for the program to compile. The address of foo+42 will
be loaded at runtime. Whether or not this results in an invalid
pointer cannot be known at compile time.

-drt

jacob navia

May 11, 2006, 5:38:50 PM
David R Tribble a écrit :

I think that

_Bool bar &foo[42];

is invalid. Can't assign a pointer to a _Bool, types are incompatible.


Wojtek Lerch

May 11, 2006, 5:35:47 PM
David R Tribble wrote:
> Michal Necasek wrote:
>
>>I'm looking for clarification about initialization of static _Bool
>>variables in C99. Consider the following:
>>[...]
>>
>>extern int foo[];
>>_Bool bar = &foo[42];
>>
>>In this case, '&foo[42]' might or might not evaluate to a null
>>pointer, but that is not known at compile time.
>
>
> How could &foo[42] evaluate to null?

Perhaps he's referring to the "Insufficient guarantees for null
pointers?" thread from March, where some people argued that when foo has
exactly 42 elements, the standard does not guarantee that foo+42 is not
a null pointer.

> Are you assuming some kind of wrap-around pointer value, where foo+42
> results in an address indistinguishable from null?
>
> In any case, &foo[42] and its equivalent foo+42 do not need to be known
> at compile time for the program to compile. The address of foo+42 will
> be loaded at runtime. Whether or not this results in an invalid
> pointer cannot be known at compile time.

The compiler can simply initialize bar to 1 without worrying about the
address of foo+42.

Michal Necasek

May 11, 2006, 6:00:40 PM
jacob navia wrote:

> I think that
>
> _Bool bar &foo[42];
>
> is invalid. Can't assign a pointer to a _Bool, types are incompatible.
>
>

See 6.3.1.2 of C99.


Michal

Michal Necasek

May 11, 2006, 6:11:58 PM
David R Tribble wrote:

> How could &foo[42] evaluate to null?
>
> Are you assuming some kind of wrap-around pointer value, where foo+42
> results in an address indistinguishable from null?
>

Yes.

But, for the sake of argument, it could be

extern int foo;
_Bool bar = &foo - 42;

where '&foo - 42' clearly might evaluate to a null pointer - of course
42 might be replaced by some other constant. (or if not, please point me
to relevant text in C99 which renders the above invalid)

> In any case, &foo[42] and its equivalent foo+42 do not need to be known
> at compile time for the program to compile. The address of foo+42 will
> be loaded at runtime. Whether or not this results in an invalid
> pointer cannot be known at compile time.
>

Yes, exactly. What I'm asking is, by what mechanism is a C99
translator expected to convert an address that will only be known at
runtime to a value of type _Bool in a static initializer? 6.3.1.2 is
quite clear on how the pointer value is to be converted, but it's not at
all clear to me how this conversion is supposed to be implemented in the
case of static initializers.


Michal

Douglas A. Gwyn

May 11, 2006, 6:24:29 PM
Michal Necasek wrote:
> extern int foo[];
> _Bool bar = &foo[42];
> In this case, '&foo[42]' might or might not evaluate to a null
> pointer, but that is not known at compile time.

However, if it did happen to evaluate to something that might
compare equal to a null pointer, the code would already have
undefined behavior, so we really don't care.

David R Tribble

May 11, 2006, 6:48:26 PM
David R Tribble wrote:
>> How could &foo[42] evaluate to null?
>>
>> Are you assuming some kind of wrap-around pointer value, where foo+42
>> results in an address indistinguishable from null?
>

Michal Necasek wrote:
> Yes. But, for the sake of argument, it could be
>
> extern int foo;
> _Bool bar = &foo - 42;
>
> where '&foo - 42' clearly might evaluate to a null pointer - of course
> 42 might be replaced by some other constant. (or if not, please point me
> to relevant text in C99 which renders the above invalid)
>

David R Tribble wrote:
>> In any case, &foo[42] and its equivalent foo+42 do not need to be known
>> at compile time for the program to compile. The address of foo+42 will
>> be loaded at runtime. Whether or not this results in an invalid
>> pointer cannot be known at compile time.
>

Michal Necasek wrote:
> Yes, exactly. What I'm asking is, by what mechanism is a C99
> translator expected to convert an address that will only be known at
> runtime to a value of type _Bool in a static initializer? 6.3.1.2 is
> quite clear on how the pointer value is to be converted, but it's not at
> all clear to me how this conversion is supposed to be implemented in the
> case of static initializers.

For all non-array extern objects, the compiler must assume non-null
addresses. For elements within extern/static arrays, the compiler
must assume that the first element of the array is at a non-null
address. But beyond that (allowing for maximum object sizes) I'm
not sure the standard has anything to say about the addresses of
elements of extern arrays of unknown size. (Does anyone else know?)

-drt

Michal Necasek

May 11, 2006, 7:01:52 PM
Douglas A. Gwyn wrote:

> However, if it did happen to evaluate to something that might
> compare equal to a null pointer, the code would already have
> undefined behavior, so we really don't care.
>

OK, I like that explanation :) Thanks.

How about this:

int foo[15];
_Bool bar = &foo[10];

GCC 3.3.2 rejects this with "initializer element is not constant". GCC
4.0.0 rejects it with "initializer element is not computable at load time".

Is that allowable behaviour of a C99 compiler or not? Or must a C99
compiler accept the code (and initialize 'bar' to 1)?


Michal

Douglas A. Gwyn

May 11, 2006, 7:52:39 PM
David R Tribble wrote:
> ... But beyond that (allowing for maximum object sizes) I'm
> not sure the standard has anything to say about the addresses of
> elements of extern arrays of unknown size. ...

Somewhere among all the standardese (probably under the
additive operators) is a requirement that there be at the
attempted result-address an actual element of an array based
on that pointer, or that the address be "one past the end" of
such an array; otherwise the behavior is undefined. The fact
that the length of the array is determined in a separate
compilation is irrelevant, except that it prevents compile-
time checking against that requirement. (Such checking isn't
required of conforming implementations.)

Michal Necasek

May 11, 2006, 8:29:12 PM
Douglas A. Gwyn wrote:

> Somewhere among all the standardese (probably under the
> additive operators) is a requirement that there be at the
> attempted result-address an actual element of an array based
> on that pointer, or that the address be "one past the end" of
> such an array; otherwise the behavior is undefined.
>

Yes. 6.5.6 (Additive operators), paragraph 8. Says (besides other
things) "If both the pointer operand and the result point to elements of
the same array object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the behavior is
undefined."

This sounds like what I've been looking for. If I'm reading it
correctly, a C99 compiler might conceivably convert all constant address
expressions to a _Bool value of 1, because either that will be the
correct result or the result will be undefined, so it won't be wrong.


Michal
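
As a rough sketch of the range that 6.5.6p8 permits, the function below forms and compares a one-past-the-end pointer but never dereferences it. The array length (15, borrowed from the earlier example) and the name count_elements are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

enum { FOO_LEN = 15 };
static int foo[FOO_LEN];

/* Walks foo using a one-past-the-end pointer as the loop bound.
   Forming and comparing foo + FOO_LEN is defined (C99 6.5.6p8);
   dereferencing it, or forming foo + FOO_LEN + 1, would be undefined. */
static size_t count_elements(void)
{
    const int *end = foo + FOO_LEN;   /* one past the last element */
    size_t n = 0;

    for (const int *p = foo; p != end; ++p)
        ++n;
    return n;
}
```

Every pointer value the loop produces lies inside the array or exactly one past its end, so none of the arithmetic falls under the "otherwise, the behavior is undefined" clause.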

Hans-Bernhard Broeker

May 11, 2006, 9:56:36 PM
Michal Necasek <mic...@scitechsoft.com> wrote:

> This sounds like what I've been looking for. If I'm reading it
> correctly, a C99 compiler might conceivably convert all constant address
> expressions to a _Bool value of 1, because either that will be the
> correct result or the result will be undefined, so it won't be wrong.

Except of course those constant address expressions that *are*,
clearly, NULL.

_Bool f = (void*)0;

is still zero.

--
Hans-Bernhard Broeker (bro...@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Keith Thompson

May 11, 2006, 10:01:35 PM
Hans-Bernhard Broeker <bro...@physik.rwth-aachen.de> writes:
> Michal Necasek <mic...@scitechsoft.com> wrote:
>> This sounds like what I've been looking for. If I'm reading it
>> correctly, a C99 compiler might conceivably convert all constant address
>> expressions to a _Bool value of 1, because either that will be the
>> correct result or the result will be undefined, so it won't be wrong.
>
> Except of course those constant address expressions that *are*,
> clearly, NULL.
>
> _Bool f = (void*)0;
>
> is still zero.

And *possibly* in cases where a pointer just past the end of an array
happens to compare equal to NULL (a case that the standard doesn't
seem to explicitly rule out).

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

jacob navia

May 12, 2006, 1:33:36 AM
Michal Necasek a écrit :

6.3.1.2 speaks about values, not pointers. I fail to see that pointers
can be converted (without a cast) into values like that.
In 6.3.2.3, where the standard speaks about pointers, it enumerates which
conversions are allowed. I fail to see any conversion of pointers to
_Bool, long double or whatever.

float bar = &baz[12]; // This should pass now ?

jacob navia

May 12, 2006, 1:36:09 AM
Michal Necasek a écrit :

lcc-win32 rejects it too.

Keith Thompson

May 12, 2006, 2:27:23 AM
jacob navia <ja...@jacob.remcomp.fr> writes:
> Michal Necasek a écrit :
>> jacob navia wrote:
>>
>>> I think that
>>>
>>> _Bool bar &foo[42];

That should be

_Bool bar = &foo[42];

>>> is invalid. Can't assign a pointer to a _Bool, types are incompatible.
>>>
>>>
>> See 6.3.1.2 of C99.
>

> 6.3.1.2 speaks about values, not pointers. I fail to see that pointers
> can be converted (without a cast) into values like that.
> In 6.3.2.3, where the standard speaks about pointers, it enumerates which
> conversions are allowed. I fail to see any conversion of pointers to
> _Bool, long double or whatever.

C99 6.3.1.2 says:

When any scalar value is converted to _Bool, the result is 0 if
the value compares equal to 0; otherwise, the result is 1.

A pointer value is a scalar value.

C99 6.5.16.1, Simple assignment, says:

Constraints

One of the following shall hold:
[...]
-- the left operand has type _Bool and the right is a pointer.

The same constraints apply to initialization.

> float bar = &baz[12]; // This should pass now ?

No, 6.5.16.1 doesn't allow assigning a pointer to a float, and 6.3.2.3
doesn't define the semantics of converting a pointer to a float.
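
The two clauses quoted above can be sketched together as follows. This is only an illustration under assumed names: baz is given 16 elements here, and pointer_as_bool is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: baz is an ordinary array with at least 13 elements. */
static int baz[16];

/* Simple assignment of a pointer to a _Bool object: permitted by the
   C99 6.5.16.1 constraints, with the value converted per 6.3.1.2. */
static bool pointer_as_bool(const int *p)
{
    bool b;

    b = p;          /* 0 for a null pointer, 1 otherwise */
    return b;
}

/* float f = &baz[12];   <- constraint violation: no clause of 6.5.16.1
                            covers a pointer on the right and a float on
                            the left, so a diagnostic is required. */
```

Calling pointer_as_bool(&baz[12]) yields 1 and pointer_as_bool(NULL) yields 0, with no cast needed in either case.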

kuy...@wizard.net

May 12, 2006, 7:13:02 AM
Michal Necasek wrote:
> int foo[15];
> _Bool bar = &foo[10];
>
> GCC 3.3.2 rejects this with "initializer element is not constant". GCC
> 4.0.0 rejects it with "initializer element is not computable at load time".
>
> Is that allowable behaviour of a C99 compiler or not? Or must a C99
> compiler accept the code (and initialize 'bar' to 1)?

Keith has cited the text that specifies that a _Bool can be initialized
with a pointer value. The other issue is whether this particular
pointer value meets the requirements for an initializer.

If that code fragment occurs at block scope, then both foo and bar
implicitly have static storage duration. 6.7.8p4 says "All the
expressions in an initializer for an object that has static storage
duration shall be constant expressions or string literals." One of the
possible kinds of constant expressions described in section 6 is an
address constant, described in 6.6p9:

"An address constant is a null pointer, a pointer to an lvalue
designating an object of static storage duration, or a pointer to a
function designator; it shall be created explicitly using the unary &
operator or an integer constant cast to pointer type, or implicitly by
the use of an expression of array or function type. The array-subscript
[] and member-access . and -> operators, the address & and indirection
* unary operators, and pointer casts may be used in the creation of an
address constant, but the value of an object shall not be accessed by
use of these operators."

Your initializer is a pointer to an lvalue designating an object of
static storage duration, created by explicitly using the unary &
operator along with the array-subscript operator, without actually
accessing the value of that object. I don't see any problems with it.

If the code fragment you wrote occurred at block scope, there is no
requirement that the initializer be a constant expression. Instead,
6.7.8p11 applies:

"The initializer for a scalar shall be a single expression, optionally
enclosed in braces. The initial value of the object is that of the
expression (after conversion); the same type constraints and
conversions as for simple assignment apply, taking the type of the
scalar to be the unqualified version of its declared type."

The constraints for simple assignment certainly allow this initializer.
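
A minimal sketch of the block-scope case, assuming foo is defined at file scope with 15 elements as in the quoted fragment (the function name block_scope_bar is hypothetical):

```c
#include <assert.h>

static int foo[15];

/* bar has automatic storage duration, so its initializer need not be a
   constant expression (C99 6.7.8p11); the conversion to _Bool is simply
   performed when the declaration is reached at run time. */
static int block_scope_bar(void)
{
    _Bool bar = &foo[10];

    return bar;
}
```

Some compilers may warn that the address always evaluates as true, but that is a diagnostic about style, not a rejection of the code.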

kuy...@wizard.net

May 12, 2006, 8:32:15 AM
kuy...@wizard.net wrote:
...

> If that code fragment occurs at block scope, then both foo and bar
> implicitly have static storage duration. 6.7.8p4 says "All the

What I meant to say was "file scope", not block scope. Sorry for the
confusion.

Michal Necasek

May 12, 2006, 1:09:54 PM
kuy...@wizard.net wrote:

> If that code fragment occurs at file scope, then both foo and bar
> implicitly have static storage duration.
>

Yes, I was talking about file scope.

> Your initializer is a pointer to an lvalue designating an object of
> static storage duration, created by explicitly using the unary &
> operator along with the array-subscript operator, without actually
> accessing the value of that object. I don't see any problems with it.
>

Okay, that's what I figured. I do see the language in C99 that allows
this, and nothing that disallows it. Hence my wondering why GCC has
problems with such a construct.


Michal

Michal Necasek

May 12, 2006, 1:12:58 PM
Keith Thompson wrote:

> And *possibly* in cases where a pointer just past the end of an array
> happens to compare equal to NULL (a case that the standard doesn't
> seem to explicitly rule out).
>

I think it does. Or rather, the standard rules out an overflow on the
address calculation, and I don't see how 'non-null-address + x' could
equal null for any positive x without overflow.


Michal

Wojtek Lerch

May 12, 2006, 1:22:23 PM
"Michal Necasek" <mic...@scitechsoft.com> wrote in message
news:um39g.16127$Lm5....@newssvr12.news.prodigy.com...

I think you're confusing null pointers with address zero. An implementation
could use a very large machine address (such as ~(uintptr_t)0, assuming the
most obvious conversion rules) to represent null pointers.


Skarmander

May 12, 2006, 1:36:38 PM
Even on implementations where the null pointer *is* address zero, the
provision against overflow doesn't apply, since overflow != reduce-modulo.

Compare 6.2.5 (9): "A computation involving unsigned operands can never
overflow, because a result that cannot be represented by the resulting
unsigned integer type is reduced modulo the number that is one greater than
the largest value that can be represented by the resulting type." This makes
it clear that the standard considers overflow a distinct condition and
silent modulo reduction one way to avoid it. C programmers are used to
conflating these things precisely because of how unsigned arithmetic works,
but they're not the same.

Regardless of whether the standard otherwise rules out representing a
one-past-the-end pointer of a particular array with a pointer that compares
equal to a null pointer (and we've had that discussion a while ago), the
provision against overflow does not forbid such a pointer being the result
of pointer arithmetic.

S.
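
The unsigned behaviour described by 6.2.5 (9) can be sketched like this; the helper name add_unsigned is an assumption for illustration:

```c
#include <assert.h>
#include <limits.h>

/* Unsigned addition is reduced modulo UINT_MAX + 1, so in the
   standard's terminology it never "overflows" (C99 6.2.5p9). */
static unsigned add_unsigned(unsigned a, unsigned b)
{
    return a + b;
}
```

For instance, add_unsigned(UINT_MAX, 1u) is 0u, a fully defined result. The standard gives no comparable wording for pointers, which is why the "shall not produce an overflow" clause settles so little.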

Michal Necasek

May 12, 2006, 2:17:28 PM
Wojtek Lerch wrote:

> I think you're confusing null pointers with address zero.
>

Yes, but no. I have conversions to _Bool in mind, and 6.3.1.2
explicitly talks about 0, not null pointers.

Anyways, I'm confused. 6.3.2.3 says: "An integer constant expression
with the value 0, or such an expression cast to type void *, is called a
null pointer constant." I don't quite see how '~(uintptr_t)0' is an
integer constant expression with the value 0? Integer constant
expression it is, but zero it's not... So how does that work?


Michal

Douglas A. Gwyn

May 12, 2006, 2:18:38 PM
Keith Thompson wrote:
> And *possibly* in cases where a pointer just past the end of an array
> happens to compare equal to NULL (a case that the standard doesn't
> seem to explicitly rule out).

It's certainly intended that such a pointer not compare equal
to a null pointer.

Keith Thompson

May 12, 2006, 3:02:54 PM

You might want to read section 5 of the comp.lang.c FAQ,
<http://www.c-faq.com/>.

A null pointer constant and a null pointer value are two different
things. The former is a source code construct; the latter is some
particular run-time value of a pointer type.

~(uintptr_t)0 is not a null pointer constant, but it could correspond
to the representation of a null pointer value. For example, if all
null pointers are represented as all-bits-one, then this:
void *ptr = 0;
will set ptr to all-bits-one.
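
A small sketch of the distinction, with hypothetical helper names. The first function is portable; the second merely reports what the implementation at hand happens to do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Always 1: the constant 0 converts to a null pointer value,
   whatever bit pattern the implementation uses to store it. */
static int zero_constant_yields_null(void)
{
    void *ptr = 0;

    return ptr == NULL;
}

/* Implementation-dependent: 1 if a null pointer happens to be stored
   as all-bits-zero on this implementation, 0 otherwise. */
static int null_is_all_bits_zero(void)
{
    void *ptr = 0;
    unsigned char zeros[sizeof ptr];

    memset(zeros, 0, sizeof zeros);
    return memcmp(&ptr, zeros, sizeof ptr) == 0;
}
```

On the all-bits-one machine imagined above, the first function would still return 1 while the second would return 0.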

Keith Thompson

May 12, 2006, 3:07:40 PM

And it's a pity that the standard doesn't actually express that
intent.

Douglas A. Gwyn

May 12, 2006, 2:53:54 PM
Michal Necasek wrote:
> I think it does. Or rather, the standard rules out an overflow on the
> address calculation, and I don't see how 'non-null-address + x' could
> equal null for any positive x without overflow.

The wraparound would be benign, were it not for the
expectation that the relational operators work in a
certain way when comparing two pointers based on the
same object. (i.e. &a[10] should compare greater
than &a[0], provided that both are valid.)

However, besides wrapping to an address of 0, there
is another possibility: the implementation may treat some
non-zero address value or values as null pointer value(s).
(Perhaps it does that because 0 traps when loaded into an
address register on that particular architecture.) The
standard allows this; null pointer values do *not* have
to be represented as all-0 bit patterns. Under this
design, the apparent concern is that objects might be
allocated carelessly by the implementation, such that the
top+1 accidentally matches one of the implementation's
null-pointer values. The simple answer is that such an
implementation needs to be more careful than that.
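
The relational-operator expectation can be sketched as a check over every pair of positions in an array, one-past-the-end included. The array length (11, so that a[10] exists) and the name ordering_holds are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

static int a[11];

/* Returns 1 if a + i < a + j for every 0 <= i < j <= N, where N is the
   element count; j == N is the one-past-the-end pointer (C99 6.5.8p5). */
static int ordering_holds(void)
{
    const size_t n = sizeof a / sizeof a[0];

    for (size_t i = 0; i <= n; ++i)
        for (size_t j = i + 1; j <= n; ++j)
            if (!(a + i < a + j))
                return 0;
    return 1;
}
```

An implementation whose one-past-the-end address wrapped around to a lower value would have to make this comparison come out right anyway.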

kuy...@wizard.net

May 12, 2006, 3:25:00 PM
Michal Necasek wrote:
> Wojtek Lerch wrote:
>
> > I think you're confusing null pointers with address zero.
> >
> Yes, but no. I have conversions to _Bool in mind, and 6.3.1.2
> explicitly talks about 0, not null pointers.

When a pointer is compared with 0, that 0 is treated as a null pointer,
and that comparison comes out true only if what it is compared with is
a null pointer. Therefore, a pointer value converts to a _Bool value of
0 only if it is a null pointer.

> Anyways, I'm confused. 6.3.2.3 says: "An integer constant expression
> with the value 0, or such an expression cast to type void *, is called a
> null pointer constant." I don't quite see how '~(uintptr_t)0' is an
> integer constant expression with the value 0? Integer constant
> expression it is, but zero it's not... So how does that work?

It isn't. He's not claiming that ~(uintptr_t)0 is a null pointer
constant. He's claiming that a null pointer might be represented, for a
particular implementation, by a pointer that refers to a memory address
of ~(uintptr_t)0. The conversion of pointers to uintptr_t is completely
implementation defined, except for the requirement that
(T*)(uintptr_t)p == p, where T is the type that p points at. However,
on many platforms the result of that conversion is the corresponding
memory address. For an implementation where both of those things are
true,

(uintptr_t)(void*)0 == ~(uintptr_t)0

and

(void*)(uintptr_t)0 == (void*)(~(uintptr_t)0)
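
The round-trip requirement itself can be sketched portably, assuming the implementation provides the optional uintptr_t type. The names object and round_trip_preserves are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

static int object;

/* Converts a void pointer to uintptr_t and back. The result must
   compare equal to the original (C99 7.18.1.4); what integer value
   appears in between is implementation-defined. */
static int round_trip_preserves(void *p)
{
    uintptr_t u = (uintptr_t)p;

    return (void *)u == p;
}
```

Nothing here depends on the integer value of u, which is exactly why the representation of a null pointer can differ between implementations without breaking conforming code.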

Michal Necasek

May 12, 2006, 3:33:28 PM
Keith Thompson wrote:

> You might want to read section 5 of the comp.lang.c FAQ,
> <http://www.c-faq.com/>.
>

If the C-FAQ is correct when it says (in 5.1) that "The address-of
operator & will never yield a null pointer" then all this is irrelevant,
because it'd mean that '(_Bool)&<foo>' has to be 1 for any expression
'foo' allowable in that context.

Or is the C-FAQ misleading on this point?


Michal

Skarmander

May 12, 2006, 3:47:38 PM
In the sense that it's not the standard, yes.

For example, the standard guarantees that &*E is equivalent to E even if E
is a null pointer (and even though *E would yield undefined behavior in this
case), so &<foo> most definitely can yield a null pointer if expressions are
allowed for <foo>. The FAQ is therefore "wrong", but the FAQ isn't trying to
be formally precise and shouldn't be taken literally. It's just trying to
convey the point that no object can have an address that compares equal to
the null pointer.

The current discussion is on pointers that point to one-past-the-end of an
array. Such pointers can be constructed with & and [], and there is a
(fairly esoteric) argument about whether the standard forbids such pointers
to compare equal to the null pointer or not. Consensus is that the standard
*intends* this (since it's most inconvenient if this is possible), but there
doesn't seem to be a specific chain of reasoning that makes them illegal.

S.

Wojtek Lerch

May 12, 2006, 4:12:54 PM
"Skarmander" <inv...@dontmailme.com> wrote in message
news:4464c7a6$0$31653$e4fe...@news.xs4all.nl...

> Wojtek Lerch wrote:
>> "Michal Necasek" <mic...@scitechsoft.com> wrote in message
>> news:um39g.16127$Lm5....@newssvr12.news.prodigy.com...
>>> Keith Thompson wrote:
>>>
>>>> And *possibly* in cases where a pointer just past the end of an array
>>>> happens to compare equal to NULL (a case that the standard doesn't
>>>> seem to explicitly rule out).
>>>>
>>> I think it does. Or rather, the standard rules out an overflow on the
>>> address calculation, and I don't see how 'non-null-address + x' could
>>> equal null for any positive x without overflow.
>>
>> I think you're confusing null pointers with address zero. An
>> implementation could use a very large machine address (such as
>> ~(uintptr_t)0, assuming the most obvious conversion rules) to represent
>> null pointers.
>>
> Even on implementations where the null pointer *is* address zero, the
> provision against overflow doesn't apply, since overflow != reduce-modulo.

The way I see it, the provision against overflow is redundant. It just
confirms, somewhat clumsily, that as long as you stay within the range
described earlier, the addition reliably produces the result described
earlier. It doesn't explain what exactly it refers to as "overflow", but
since it's something that's guaranteed not to happen anyway, I don't think
we have to worry about it.

In general, "overflow" is when the arguments of an operation are such that
it can't produce a result in the normal range; in most such cases, C
explicitly declares undefined behaviour. Many compilers generate opcodes
that produce a result reduced modulo some value in such situations, but that
doesn't mean that those cases don't qualify as "overflow". In particular,
when adding a positive integer to a pointer "wraps" and produces the address
zero, I think it's legitimate to call it overflow.

> Compare 6.2.5 (9): "A computation involving unsigned operands can never
> overflow, because a result that cannot be represented by the resulting
> unsigned integer type is reduced modulo the number that is one greater
> than the largest value that can be represented by the resulting type."
> This makes it clear that the standard considers overflow a distinct
> condition and silent modulo reduction one way to avoid it. C programmers
> are used to conflating these things precisely because of how unsigned
> arithmetic works, but they're not the same.

Agreed, except I think the reason they're used to conflating these things is
because of how *signed* overflow works on their machines. (But I imagine
that the ones who work more with floating point math might be more careful
about the distinction.)

Keith Thompson

May 12, 2006, 4:36:09 PM

The "&" operator will never yield a null pointer if its operand is
valid.

There's a loophole in the wording of the standard; it doesn't
explicitly exclude the possibility that an address just past the end
of an array could compare equal to a null pointer. There are
realistic scenarios in which this could happen. Doug Gwyn has said
here that it was not intended to allow this.

The code in the article that started this thread was:

extern int foo[];
_Bool bar = &foo[42];

If foo is an array of more than 42 elements, there's no problem;
&foo[42] is the address of an object, and it cannot be a null pointer.

If foo is an array of fewer than 42 elements, the expression invokes
undefined behavior, and the compiler can do whatever it likes,
including assigning the value 1 to bar.

If foo is an array of *exactly* 42 elements, the expression is an
address just past the end of the array. If the compiler (which can
make use of whatever system-specific knowledge it likes) happens to
know that this address cannot be null, there's no problem; it can set
bar to 1. But if the compiler can't prove that (&foo[42] != NULL),
then it doesn't know whether bar should be set to 0 or 1.

This situation supports Doug Gwyn's point; allowing an address just
past the end of an array to be null causes serious problems.

It's probably reasonable for any compiler to assume that such an
address cannot be null, rather than stubbornly following the literal
wording of the standard and ignoring the intent. This is a
sufficiently obscure issue that it's probably not worth mentioning in
the FAQ.

I do think that it's worth a DR and an official ruling from the
committee, with a mention in a future TR and in the next C standard,
but that needn't necessarily affect the behavior of any compiler. And
any compiler is free to guarantee that such addresses cannot be null,
even if the standard doesn't explicitly require it to do so.
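
The three cases for &foo[42] can be summarized in a small sketch. The enum and function names are hypothetical, and the code classifies an index against a known length rather than forming any pointer, so it has no undefined behaviour itself.

```c
#include <assert.h>
#include <stddef.h>

enum index_kind {
    INDEX_ELEMENT,        /* address of a real object: cannot be null   */
    INDEX_ONE_PAST_END,   /* valid to form; the loophole case           */
    INDEX_UNDEFINED       /* forming the address is undefined behaviour */
};

/* Classifies &array[index] for an array of the given length. */
static enum index_kind classify_index(size_t length, size_t index)
{
    if (index < length)
        return INDEX_ELEMENT;
    if (index == length)
        return INDEX_ONE_PAST_END;
    return INDEX_UNDEFINED;
}
```

With index 42, a length of 43 gives INDEX_ELEMENT, 42 gives INDEX_ONE_PAST_END, and 41 gives INDEX_UNDEFINED. Only the middle case leaves the value of bar in doubt.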

Skarmander

May 12, 2006, 4:39:52 PM
Yes, I was a bit surprised to find that the standard in fact has no
definition for overflow, especially since it's not at all clear what
overflow is supposed to mean for pointer types.

If we generalize slightly and take overflow as "a valid computation
producing a result that is not representable as a value of the target type",
then what the standard says is nothing more or less than that pointer
arithmetic on valid operands always yields representable results (i.e. no
trap representations or hardware signals).

It seems more a reminder to implementers to make sure this cannot happen
than a true addition to the semantics.

> In general, "overflow" is when the arguments of an operation are such that
> it can't produce a result in the normal range;

I'd argue that's slightly imprecise, even if we probably have the same idea.

> in most such cases, C explicitly declares undefined behaviour. Many
> compilers generate opcodes that produce a result reduced modulo some
> value in such situations, but that doesn't mean that those cases don't
> qualify as "overflow". In particular, when adding a positive integer to a
> pointer "wraps" and produces the address zero, I think it's legitimate to
> call it overflow.
>

You can call it that, as long as we're talking about the cases that yield
undefined behavior anyway.

If we're talking about cases where the operands are valid, we get back to
the original discussion about whether one-past-the-end pointers (which are
possible results of pointer arithmetic) can legally compare equal to a null
pointer.

If you consider a wrap in this case to be an overflow, then the standard
forbids such a wrap, since overflow is not allowed. If you don't consider it
an overflow (the result is representable, after all, and you cannot really
claim to know how it *ought* to be represented if it has the properties it
should have), then the standard doesn't forbid it. Since the standard
doesn't go to much trouble to define overflow, it's not really a good
argument either way.

>> Compare 6.2.5 (9): "A computation involving unsigned operands can never
>> overflow, because a result that cannot be represented by the resulting
>> unsigned integer type is reduced modulo the number that is one greater
>> than the largest value that can be represented by the resulting type."
>> This makes it clear that the standard considers overflow a distinct
>> condition and silent modulo reduction one way to avoid it. C programmers
>> are used to conflating these things precisely because of how unsigned
>> arithmetic works, but they're not the same.
>
> Agreed, except I think the reason they're used to conflating these things is
> because of how *signed* overflow works on their machines.

I think we're both right, actually. :-)

S.

Wojtek Lerch

May 12, 2006, 4:45:37 PM
"Keith Thompson" <ks...@mib.org> wrote in message
news:lnlkt7f...@nuthaus.mib.org...

> The "&" operator will never yield a null pointer if its operand is
> valid.

The "&" operator will never yield a null pointer if its operand is a valid
lvalue; but 6.5.3.2#3 specifically allows applying the "&" operator to an
lvalue that would be invalid in most other contexts:

int *p = &*(int*)0;
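
A minimal sketch of that rule, with hypothetical names. Per 6.5.3.2p3, neither operator is evaluated when & is applied to the result of a unary *, so no null pointer is actually dereferenced.

```c
#include <assert.h>
#include <stddef.h>

static int object;

/* &*e behaves as if both operators were omitted (C99 6.5.3.2p3),
   so the result is e itself, even when e is a null pointer. */
static int *address_of_deref(int *e)
{
    return &*e;
}
```

So address_of_deref(NULL) is a null pointer produced by the & operator, which is the counterexample to the FAQ's literal wording.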

Wojtek Lerch

May 12, 2006, 5:57:47 PM
"Skarmander" <inv...@dontmailme.com> wrote in message
news:4464f298$0$31655$e4fe...@news.xs4all.nl...
> Wojtek Lerch wrote:
...

>>>> "Michal Necasek" <mic...@scitechsoft.com> wrote in message
>>>> news:um39g.16127$Lm5....@newssvr12.news.prodigy.com...
>>>>> Keith Thompson wrote:
>>>>>> And *possibly* in cases where a pointer just past the end of an array
>>>>>> happens to compare equal to NULL (a case that the standard doesn't
>>>>>> seem to explicitly rule out).
>>>>> I think it does. Or rather, the standard rules out an overflow on the
>>>>> address calculation, and I don't see how 'non-null-address + x' could
>>>>> equal null for any positive x without overflow.
...

>> The way I see it, the provision against overflow is redundant. It just
>> confirms, somewhat clumsily, that as long as you stay within the range
>> described earlier, the addition reliably produces the result described
>> earlier. It doesn't explain what exactly it refers to as "overflow", but
>> since it's something that's guaranteed not to happen anyway, I don't
>> think we have to worry about it.
>>
> Yes, I was a bit surprised to find that the standard in fact has no
> definition for overflow, especially since it's not at all clear what
> overflow is supposed to mean for pointer types.
>
> If we generalize slightly and take overflow as "a valid computation
> producing a result that is not representable as a value of the target
> type",

I must be misunderstanding something. The result of any valid expression in
C is a value that has the type of the expression. How could that value not
be representable in its type?

> then what the standard says is nothing more or less than that pointer
> arithmetic on valid operands always yields representable results (i.e. no
> trap representations or hardware signals).

But there's no need to say it. Any expression that doesn't have undefined
behaviour must either yield a value of its type or raise a signal and not
yield a value at all. Since the standard doesn't say that pointer addition
may raise a signal, I don't think it's necessary to add that it does not
raise a signal, and since it has already specified what value it yields when
the operands are valid, I don't think it's meaningful to additionally
promise that there's no overflow when it yields that value.

And it's even less meaningful to add that when the *result* doesn't point to
an element of the same array as the operand, then the behaviour is
undefined. When the behaviour is undefined, the result, if any, is free to
point to anything, including an element of the same array.

> It seems more of a reminder to implementers to make sure this cannot
> happen than as a true addition to the semantics.

Right; but I'd say that this kind of a reminder belongs in a Compiler
Writing for Dummies book rather than in the language standard.

...


> If we're talking about cases where the operands are valid, we get back to
> the original discussion about whether one-past-the-end pointers (which are
> possible results of pointer arithmetic) can legally compare to a null
> pointer.
>
> If you consider a wrap in this case to be an overflow, then the standard
> forbids such a wrap, since overflow is not allowed. If you don't consider
> it an overflow (the result is representable, after all, and you cannot
> really claim to know how it *ought* to be represented if it has the
> properties it should have), then the standard doesn't forbid it. Since the
> standard doesn't go to much trouble to define overflow, it's not really a
> good argument either way.

It's not overflow in terms of C pointer arithmetic, because C pointers
aren't numbers and address zero has no special meaning in C, and a null
pointer is neither "less" nor "greater" than other pointers (except when it
points to one past an array). But it is overflow in terms of hardware
address computation. I'm sure you'll agree that that's an important
distinction, but I'm under the impression that the distinction wasn't that
clear to Michal Necasek when he wrote the article I originally responded to.

...


> I think we're both right, actually. :-)

Of course. :-)


Keith Thompson

May 12, 2006, 6:07:33 PM5/12/06
to

You're right, I didn't think of that case.

I've just sent an e-mail message to Steve Summit, the maintainer of
the comp.lang.c FAQ.

Michal Necasek

May 12, 2006, 6:15:26 PM5/12/06
to
Keith Thompson wrote:

> If foo is an array of *exactly* 42 elements, the expression is an
> address just past the end of the array. If the compiler (which can
> make use of whatever system-specific knowledge it likes) happens to
> know that this address cannot be null, there's no problem; it can set
> bar to 1. But if the compiler can't prove that (&foo[42] != NULL),
> then it doesn't know whether bar should be set to 0 or 1.
>

In which case, the compiler is in serious trouble.

Anyway, thanks for your analysis. It makes good sense.


Michal

Hans-Bernhard Broeker

May 12, 2006, 7:14:58 PM5/12/06
to
Skarmander <inv...@dontmailme.com> wrote:

> Yes, I was a bit surprised to find that the standard in fact has no
> definition for overflow, especially since it's not at all clear what
> overflow is supposed to mean for pointer types.

The standard doesn't have to define overflow because it ruled that you
can't ever have overflow happen in any expression on pointer values
--- they basically put a "here be undefined behaviour" sign on every
conceivable path that would lead from a valid point to overflow.

In other words, "overflow" is just a non-standardese term describing
some of the types of undefined behaviour. The standard doesn't make
distinctions between such sub-categories of UB, so it doesn't have to
define them.

> If we generalize slightly and take overflow as "a valid computation
> producing a result that is not representable as a value of the
> target type", then what the standard says is nothing more or less
> than that pointer arithmetic on valid operands always yields
> representable results (i.e. no trap representations or hardware
> signals).

Not quite. It says that _valid_ pointer arithmetic on valid operands
always yields representable results. There are quite a number of
ways you can perform invalid pointer arithmetic on valid operands, e.g.

int foo[10];
int *bar = foo - 10;
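
For contrast, a small sketch of arithmetic that stays inside the permitted range, assuming the same 10-element array; FOO_LEN and count_elements are names invented for this illustration. Only pointers from foo up to foo + FOO_LEN are ever formed, and the one-past-the-end pointer is compared but never dereferenced.

```c
#include <stddef.h>

#define FOO_LEN 10
static int foo[FOO_LEN];

/* Hypothetical helper: walks the array using only pointers in the
   range [foo, foo + FOO_LEN], all of which are valid to form. */
static size_t count_elements(void)
{
    const int *end = foo + FOO_LEN;   /* one past the end: may be formed and
                                         compared, must not be dereferenced */
    size_t n = 0;
    for (const int *p = foo; p != end; ++p)
        ++n;
    return n;
}
```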

> If we're talking about cases where the operands are valid, we get
> back to the original discussion about whether one-past-the-end
> pointers (which are possible results of pointer arithmetic) can
> legally compare to a null pointer.

A reminder: that's not exactly the "original discussion" of this
thread. It's the subject of another thread that turned out to be
relevant as an aspect of this one, because it's the one special case where

extern int foo[];
_Bool baz = foo + 13;

might actually have to initialize baz to zero. But since Michal (the
OP) asked about this for the context of a C implementation we have
quite complete control over, we can just decide not to go there.

Come to think of it, the above could even constitute a patch for the
above-mentioned loophole in C99's wording. Here's how: The above
definition of baz has, as far as my reading of 6.3.1.2 and 6.6#7 goes,
a valid constant initializer: foo + 13 is an address constant plus an
integer constant expression, and a _Bool can be initialized from a
pointer. Now, if this initial value of "baz" were to depend on
whether foo[] has 12, 13 or 14 elements, this would mean that this
constant initializer isn't actually a compile-time constant. IANALL,
but that just might be usable to make the case that one-past-the-end
pointers can *not* be NULL.
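
As an illustration of the 6.3.1.2 conversion only (it does not settle the one-past-the-end question), here is a run-time sketch; the helper name pointer_to_bool is an assumption of this example. It sticks to the two cases the standard pins down: a pointer to an actual object converts to 1 and a null pointer converts to 0.

```c
#include <stdbool.h>
#include <stddef.h>

static int foo[13];

/* Hypothetical helper: C99 6.3.1.2 says a scalar converted to _Bool
   becomes 0 if it compares equal to 0, and 1 otherwise. */
static bool pointer_to_bool(const int *p)
{
    bool b = p;   /* implicit pointer-to-_Bool conversion */
    return b;
}

/* &foo[0] points to an object, so it cannot be a null pointer. */
static bool first_element_as_bool(void)
{
    return pointer_to_bool(&foo[0]);
}
```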

--
Hans-Bernhard Broeker (bro...@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

kuy...@wizard.net

May 12, 2006, 7:38:34 PM5/12/06
to
Wojtek Lerch wrote:
> "Skarmander" <inv...@dontmailme.com> wrote in message
> news:4464f298$0$31655$e4fe...@news.xs4all.nl...
...

> > If we generalize slightly and take overflow as "a valid computation
> > producing a result that is not representable as a value of the target
> > type",
>
> I must be misunderstanding something. ...

I'm certain that you are, but I'm having trouble imagining what it is.

> ... The result of any valid expression in
> C is a value that has the type of the expression. How could that value not
> be representable in its type?

The expression INT_MAX*INT_MAX is an expression of type 'int' that has
a mathematical value that is much larger than INT_MAX itself, and is
therefore not representable in type 'int'.
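
One way to look at the mathematical value without evaluating the overflowing int expression is to compute it in a wider type. This is only a sketch: it assumes long long is at least twice as wide as int (common, but not guaranteed by the standard), and int_mul_overflows is a name made up here.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper. Assumption: long long is at least twice as wide
   as int, so the exact product of two int values always fits in it. */
static bool int_mul_overflows(int a, int b)
{
    long long exact = (long long)a * (long long)b;   /* mathematical product */
    return exact > INT_MAX || exact < INT_MIN;       /* outside the range of int? */
}
```

Under that assumption, int_mul_overflows(INT_MAX, INT_MAX) reports that the product is out of range, while the int expression itself is never evaluated.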

Skarmander

May 12, 2006, 8:10:17 PM5/12/06
to
It's legal for an implementation to have an int type that cannot represent
the value 32768. The type of the expression 32767 + 1 is int, and the result
is 32768. That result is not in the range of representable values for int,
however.

"Wait a minute," I hear you say. "That's actually undefined behavior, so
it's not a 'valid expression'." True. I cannot define overflow in terms of
computations-that-would-be-valid-if-only-they-did-not-overflow, however,
since that's circular.

So how about "an expression that does not violate constraints" instead of
"valid computation"?

>> It seems more of a reminder to implementers to make sure this cannot
>> happen than as a true addition to the semantics.
>
> Right; but I'd say that this kind of a reminder belongs in a Compiler
> Writing for Dummies book rather than in the language standard.
>

Nobody ever said writing a standard is easy. Maybe we should rather have a
Standard Writing for Dummies book. I fear for the end results, though.

S.

Douglas A. Gwyn

May 12, 2006, 8:02:41 PM5/12/06
to
Keith Thompson wrote:
> "Douglas A. Gwyn" <DAG...@null.net> writes:
> > Keith Thompson wrote:
> >> And *possibly* in cases where a pointer just past the end of an array
> >> happens to compare equal to NULL (a case that the standard doesn't
> >> seem to explicitly rule out).
> > It's certainly intended that such a pointer not compare equal
> > to a null pointer.
> And it's a pity that the standard doesn't actually express that
> intent.

Frankly I think we never thought about that, because there is
no wording in the C standard to lead one to think that there
is any "accidental" s.c. way to produce a null pointer value.
Null pointer values (as you know) are just a special convention
for a supported "in-band" way to signal nowhereness in pointers;
that implies logically that no validly formed object pointer
or function pointer should act as a null pointer.

As a general observation, such in-band encoding of exceptional
conditions is generally not good design; one example in Standard
C is EOF as a getc return value (on a platform where chars and
ints have the same width).

Skarmander

May 12, 2006, 8:41:36 PM5/12/06
to
Hans-Bernhard Broeker wrote:
> Skarmander <inv...@dontmailme.com> wrote:
>
>> Yes, I was a bit surprised to find that the standard in fact has no
>> definition for overflow, especially since it's not at all clear what
>> overflow is supposed to mean for pointer types.
>
> The standard doesn't have to define overflow because it ruled that you
> can't ever have overflow happen in any expression on pointer values
> --- they basically put a "here be undefined behaviour" sign on every
> conceivable path that would lead from a valid point to overflow.
>
> In other words, "overflow" is just a non-standardese term describing
> some of the types of undefined behaviour. The standard doesn't make
> distinctions between such sub-categories of UB, so it doesn't have to
> define them.
>
Then Wojtek Lerch's point seems apt: the standard shouldn't have mentioned
overflow, or indeed anything about the validity of pointer arithmetic
results, at all. Describing what the results should look like is what is
defined, and is sufficient.

>> If we generalize slightly and take overflow as "a valid computation
>> producing a result that is not representable as a value of the
>> target type", then what the standard says is nothing more or less
>> than that pointer arithmetic on valid operands always yields
>> representable results (i.e. no trap representations or hardware
>> signals).
>
> Not quite. It says that _valid_ pointer arithmetic on valid operands
> always yields representable results. There are quite a number of
> ways you can perform invalid pointer arithmetic on valid operands, e.g.
>
> int foo[10];
> int *bar = foo - 10;
>

Yes, an important distinction.

>> If we're talking about cases where the operands are valid, we get
>> back to the original discussion about whether one-past-the-end
>> pointers (which are possible results of pointer arithmetic) can
>> legally compare to a null pointer.
>
> A reminder: that's not exactly the "original discussion" of this
> thread. It's the subject of another thread that turned out to be
> relevant as an aspect of this one, because it's the one special case where
>
> extern int foo[];
> _Bool baz = foo + 13;
>
> might actually have to initialize baz to zero. But since Michal (the
> OP) asked about this for the context of a C implementation we have
> quite complete control over, we can just decide not to go there.
>

Yes, well, who cares about the stack depth when you're having fun?

> Come to think of it, the above could even constitute a patch for the
> above-mentioned loophole in C99's wording. Here's how: The above
> definition of baz has, as far as my reading of 6.3.1.2 and 6.6#7 goes,
> a valid constant initializer: foo + 13 is an address constant plus an
> integer constant expression, and a _Bool can be initialized from a
> pointer. Now, if this initial value of "baz" were to depend on
> whether foo[] has 12, 13 or 14 elements, this would mean that this
> constant initializer isn't actually a compile-time constant. IANALL,
> but that just might be usable to make the case that one-past-the-end
> pointers can *not* be NULL.
>

I don't think so. The standard describes constant expressions with "a
constant expression can be evaluated during translation rather than runtime,
and accordingly may be used in any place that a constant may be". This
expresses intent and goes a long way to making your case, but as far as
semantics go it doesn't rule out that the value of 'baz' can only be
determined when the program is linked. This is still "during translation".
(In fact, it's practically a given that the value of 'foo + 13' can only be
definitively determined when linking, since coordination between translation
units is required; whether 'foo + 13' is a one-past-the-end pointer is
likewise known then.)

All this only adds more circumstantial evidence for the extreme
undesirability of one-past-the-end pointers that evaluate to null pointers
(and the unlikelihood of an implementation using them), but again they are
not ruled out. Nobody *wants* them to be possible, almost certainly nobody
*makes* them possible, the standard makes more sense if we *assume* they are
not possible... but the hole's still there.

S.

Wojtek Lerch

May 12, 2006, 9:30:24 PM5/12/06
to
<kuy...@wizard.net> wrote in message
news:1147477114....@i39g2000cwa.googlegroups.com...

> Wojtek Lerch wrote:
>> "Skarmander" <inv...@dontmailme.com> wrote in message
>> news:4464f298$0$31655$e4fe...@news.xs4all.nl...
> ...
>> > If we generalize slightly and take overflow as "a valid computation
>> > producing a result that is not representable as a value of the target
>> > type",
>>
>> I must be misunderstanding something. ...
>
> I'm certain that you are, but I'm having trouble imagining what it is.

Perhaps I mistook "computation" as referring to a C expression. I can
accept the above, if by "computation" and its "result" it's referring to the
mathematical formula that the C expression represents and its mathematical
value.

Actually, no. The mathematical result of the expression "1.0 / 10" is not
representable as a value of its type on most implementations, but it would
be wrong to call that inaccuracy an overflow. Maybe we should say "outside
of the range of the type" instead of "not representable"? But then how does
it apply to pointers -- a pointer type does not represent a "range"...

Wojtek Lerch

May 12, 2006, 9:50:54 PM5/12/06
to
"Michal Necasek" <mic...@scitechsoft.com> wrote in message
news:2O79g.70176$_S7....@newssvr14.news.prodigy.com...
> Keith Thompson wrote:
>> ... But if the compiler can't prove that (&foo[42] != NULL),
>> then it doesn't know whether bar should be set to 0 or 1.
>>
> In which case, the compiler is in serious trouble.

Not really. Since a null pointer is guaranteed to compare unequal to a
pointer to any object, the compiler can ensure that &foo[42] is not a null
pointer by placing another static object in the space just above foo (I'm
assuming the simple model where the compiler concatenates all static objects
from a translation unit into one (or maybe two) contiguous logical
"segments", and the linker then concatenates the segments from different
translation units). This way, the only spot within the translation unit's
data segment that the linker could possibly place just below where null
points to is the very end of the data segment; if none of the static objects
can be safely at the end of the data segment, the compiler still can pretend
that the last object in the segment is a little bigger than it really is.

Of course, a compiler like that could also have an explicit flag to tell the
linker not to locate the translation unit's data segment just below null.


Keith Thompson

May 12, 2006, 10:04:58 PM5/12/06
to
"Douglas A. Gwyn" <DAG...@null.net> writes:
> Keith Thompson wrote:
>> "Douglas A. Gwyn" <DAG...@null.net> writes:
>> > Keith Thompson wrote:
>> >> And *possibly* in cases where a pointer just past the end of an array
>> >> happens to compare equal to NULL (a case that the standard doesn't
>> >> seem to explicitly rule out).
>> > It's certainly intended that such a pointer not compare equal
>> > to a null pointer.
>> And it's a pity that the standard doesn't actually express that
>> intent.
>
> Frankly I think we never thought about that, because there is
> no wording in the C standard to lead one to think that there
> is any "accidental" s.c. way to produce a null pointer value.

It's a perfectly understandable oversight (though I question the
emphasis on strictly conforming programs).

I think this is comparable to the special case of pointer equality
that allows a pointer just past the end of one object to compare equal
to a pointer to another object if they happen to be adjacent to each
other. That case wasn't mentioned in the C90 standard; it was added
in C99.

Would you agree that this is a similar case, and that it's worth an
explicit statement?

Wojtek Lerch

May 12, 2006, 10:46:51 PM5/12/06
to
"Skarmander" <inv...@dontmailme.com> wrote in message
news:446523e9$0$31641$e4fe...@news.xs4all.nl...

> It's legal for an implementation to have an int type that cannot represent
> the value 32768. The type of the expression 32767 + 1 is int, and the
> result is 32768. That result is not in the range of representable values
> for int, however.

The result of the mathematical operation 32767+1 is 32768. The result of
the C expression 32767+1 is undefined.

> "Wait a minute," I hear you say. "That's actually undefined behavior, so
> it's not a 'valid expression'." True. I cannot define overflow in terms of
> computations-that-would-be-valid-if-only-they-did-not-overflow, however,
> since that's circular.
>
> So how about "an expression that does not violate constraints" instead of
> "valid computation"?

Are you trying to overload the term "constraints"? :)

I think you need to distinguish between the result of the expression (which
is undefined when there's overflow) and the mathematical result of the
computation that the expression represents (which is defined but out of
range). Also, you need to remember about the distinction between overflow,
underflow, and rounding errors.

One problem remains: pointer arithmetic is defined entirely in the standard,
without relying on any mathematical concept resembling pointer-to-integer
addition.


kuy...@wizard.net

May 13, 2006, 12:05:15 AM5/13/06
to

Wojtek Lerch wrote:
> <kuy...@wizard.net> wrote in message
> news:1147477114....@i39g2000cwa.googlegroups.com...
> > Wojtek Lerch wrote:
> >> "Skarmander" <inv...@dontmailme.com> wrote in message
> >> news:4464f298$0$31655$e4fe...@news.xs4all.nl...
> > ...
> >> > If we generalize slightly and take overflow as "a valid computation
> >> > producing a result that is not representable as a value of the target
> >> > type",
> >>
> >> I must be misunderstanding something. ...
> >
> > I'm certain that you are, but I'm having trouble imagining what it is.
>
> Perhaps I mistook "computation" as referring to a C expression.

Yes, I think that was your mistake. This definition describes part of
the process for getting to the result returned by the C expression.
It's not referring to the final result, but what the final result
should be if there were no limitations on what values could be
represented.

> ... I can
> accept the above, if by "computation" and its "result" it's referring to the
> mathematical formula that the C expression represents and its mathematical
> value.

You've got it!

> Actually, no. The mathematical result of the expression "1.0 / 10" is not
> representable as a value of its type on most implementations, but it would
> be wrong to call that inaccuracy an overflow.

There's a distinction that must be made for floating point values,
between a representable value, and an exactly representable value. The
exact mathematical result of any calculation is usually bracketed by two
exactly representable values; I would say that it's reasonable to call
either of those bracketing representations an approximate
representation of the exact result, so that result can be represented,
even if it is with less than perfect precision. Overflow is something
that strictly makes sense only for integer types, and for floating
point types which have no representation for infinity, and should not
be defined in a way that can be interpreted as covering 1.0/10.
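
A quick sketch of the 1.0/10 case, assuming binary floating point such as IEEE 754 (the standard does not require it); tenth_differs_by_precision is a name invented for this example. The quotient is finite and well inside the range of both types, so nothing overflows; each type simply stores its own nearest approximation.

```c
#include <stdbool.h>

/* Hypothetical helper. Assumption: float and double are binary formats
   of different precision, so one tenth is not exactly representable in
   either and the two rounded results differ. */
static bool tenth_differs_by_precision(void)
{
    float  f = 1.0f / 10;   /* nearest float to one tenth */
    double d = 1.0 / 10;    /* nearest double to one tenth */
    return (double)f != d;  /* two in-range approximations of the same value */
}
```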

> ... Maybe we should say "outside
> of the range of the type" instead of "not representable"? But then how does
> it apply to pointers -- a pointer type does not represent a "range"...

The use of "overflow" for pointers only makes sense in a context where
they can be ordered from smallest to largest. The standard doesn't
require such ordering for pointers that don't point to parts of the
same object, so the concept is not portably meaningful. However, it's
not unusual for an implementation to provide a simple linear address
space, and in the context of such an implementation, the concept of
pointer arithmetic overflowing has a very clear, if non-portable,
meaning.

Skarmander

May 13, 2006, 5:33:00 AM5/13/06
to
Wojtek Lerch wrote:
> "Skarmander" <inv...@dontmailme.com> wrote in message
> news:446523e9$0$31641$e4fe...@news.xs4all.nl...
>> It's legal for an implementation to have an int type that cannot represent
>> the value 32768. The type of the expression 32767 + 1 is int, and the
>> result is 32768. That result is not in the range of representable values
>> for int, however.
>
> The result of the mathematical operation 32767+1 is 32768. The result of
> the C expression 32767+1 is undefined.
>
A true but misleading statement, and in no way a contradiction of what I wrote.

The results of C additions are defined using the mathematical concept of
addition ("sum" to be precise). There is no way to meaningfully distinguish
between 32766 + 1 and 32767 + 1 and declare the latter undefined (by an
"exceptional condition" occurring because the "result" is "not in the range
of representable values for its type") if we do not consider the operation
as a mapping from the abstract result of the operation to a concrete
representation as a value of a C type.

As far as I can tell, this is indeed how the standard does it. A phrase like
"if the result is [..] not in the range of representable values for its
type" is only meaningful if the "result" is the result of the abstract
operation represented by the expression. Otherwise, the operational
definitions would be circular or the constraints irrelevant.

>> "Wait a minute," I hear you say. "That's actually undefined behavior, so
>> it's not a 'valid expression'." True. I cannot define overflow in terms of
>> computations-that-would-be-valid-if-only-they-did-not-overflow, however,
>> since that's circular.
>>
>> So how about "an expression that does not violate constraints" instead of
>> "valid computation"?
>
> Are you trying to overload the term "constraints"? :)
>

No. I suppose we could go and argue over whether the expression "32767 + 1"
involves a constraint violation, but this would not be very meaningful, as
the end result is the same.

I hereby give up trying to give a definition of overflow in terms of the
standard, and opine that our intuitive notion of the concept suffices.

> I think you need to distinguish between the result of the expression (which
> is undefined when there's overflow) and the mathematical result of the
> computation that the expression represents (which is defined but out of
> range). Also, you need to remember about the distinction between overflow,
> underflow, and rounding errors.
>

Since underflow and rounding errors do not apply to integer arithmetic (and
don't seem to apply to pointer arithmetic either) and none of those
exceptional conditions apply to the mathematical operations, I fail to see why.

> One problem remains: pointer arithmetic is defined entirely in the standard,
> without relying on any mathematical concept resembling pointer-to-integer
> addition.
>

This is only a problem insofar as it renders the concept of "overflow" on
such additions all but meaningless, but we've already agreed that's an issue.

S.

Wojtek Lerch

May 13, 2006, 1:49:50 PM5/13/06
to
"Skarmander" <inv...@dontmailme.com> wrote in message
news:4465a7cc$0$31655$e4fe...@news.xs4all.nl...

> Wojtek Lerch wrote:
>> "Skarmander" <inv...@dontmailme.com> wrote in message
>> news:446523e9$0$31641$e4fe...@news.xs4all.nl...
>>> It's legal for an implementation to have an int type that cannot
>>> represent the value 32768. The type of the expression 32767 + 1 is int,
>>> and the result is 32768. That result is not in the range of
>>> representable values for int, however.
>>
>> The result of the mathematical operation 32767+1 is 32768. The result of
>> the C expression 32767+1 is undefined.
>>
> A true but misleading statement, and in no way a contradiction of what I
> wrote.

No, but it points out an important distinction. That distinction is what
allows a meaningful non-circular definition of overflow for those C
operators whose semantics are defined in terms of mathematical operations.

> The results of C additions are defined using the mathematical concept of
> addition ("sum" to be precise). There is no way to meaningfully
> distinguish between 32766 + 1 and 32767 + 1 and declare the latter
> undefined (by an "exceptional condition" occurring because the "result" is
> "not in the range of representable values for its type") if we do not
> consider the operation as a mapping from the abstract result of the
> operation to a concrete representation as a value of a C type.

Well, exactly. Signed integer addition is defined in C by a reference to
the mathematical operation of integer addition, and since that mathematical
operation can sometimes produce a mathematical value that doesn't fit into
the range of the appropriate C type, we refer to those cases as overflow.
Unsigned integer addition is defined in C by a reference to a different
mathematical operation, and since that one always produces a value that fits
into the appropriate C type, unsigned addition never overflows. When you're
talking about the "result", it's important to make the distinction between
the result of the mathematical operation that the semantics of the C
operator refers to, and the result of the C operator, if any. Overflow
happens when the two cannot possibly be equal.
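
A rough sketch of that distinction in code; the helper names int_add_overflows and unsigned_add are made up for illustration. The signed check asks whether the mathematical sum would fall outside the range of int without ever forming it, while the unsigned addition is always defined and is reduced modulo UINT_MAX + 1.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if the mathematical sum a + b lies outside
   [INT_MIN, INT_MAX]. The comparisons themselves cannot overflow. */
static bool int_add_overflows(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return false;
}

/* Unsigned addition never overflows: the result is reduced modulo
   UINT_MAX + 1 (C99 6.2.5#9). */
static unsigned unsigned_add(unsigned a, unsigned b)
{
    return a + b;
}
```

On an implementation where INT_MAX is 32767, int_add_overflows(32767, 1) is the 32767 + 1 case discussed above.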

Pointer addition is defined directly, by talking about elements of a C
array, rather than as a mapping of a mathematical operation. You can't
meaningfully define overflow as a situation where the mathematical pointer
addition produces a result that can't be represented as a C pointer.
There's no such thing as mathematical pointer addition. If you want a
meaningful definition of overflow for pointer addition, you have to find a
different way of defining it.

> As far as I can tell, this is indeed how the standard does it. A phrase
> like "if the result is [..] not in the range of representable values for
> its type" is only meaningful if the "result" is the result of the abstract
> operation represented by the expression. Otherwise, the operational
> definitions would be circular or the constraints irrelevant.

Yes, but that doesn't work for pointers. There's no abstract operation,
distinct from C pointer addition, represented by the C addition of an
integer to a pointer.

The C standard has the annoying habit of defining the result of an operation
based on what its result is, even in situations when there's no abstract
mathematical operation that the second instance of "result" could possibly
be interpreted as referring to. "A pointer to an object or incomplete type
may be converted to a pointer to a different object or incomplete type. If
the resulting pointer is not correctly aligned for the pointed-to type, the
behavior is undefined." Was that meant to be circular? Or is it referring
to the "mathematical" result of an "abstract" pointer conversion? No, it
simply means to say that the *original* pointer must be correctly aligned
for the result's pointed-to type, or otherwise the result can be anything,
aligned or not, or there might be no result at all.
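
Read that way, the check belongs on the original pointer, before the conversion. The following is a sketch only and leans on several assumptions: uintptr_t exists, the implementation-defined pointer-to-integer conversion exposes the address in the usual way, and _Alignof is available (it is C11, not C99). The name is_aligned_for_int is invented here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper. Assumptions: uintptr_t is provided, converting a
   pointer to it yields a plain numeric address (implementation-defined),
   and the compiler supports C11 _Alignof. */
static bool is_aligned_for_int(const void *p)
{
    return (uintptr_t)p % _Alignof(int) == 0;
}
```

A caller would test the char pointer with is_aligned_for_int first and only convert it to int * when the check passes.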

>>> "Wait a minute," I hear you say. "That's actually undefined behavior, so
>>> it's not a 'valid expression'." True. I cannot define overflow in terms
>>> of computations-that-would-be-valid-if-only-they-did-not-overflow,
>>> however, since that's circular.
>>>
>>> So how about "an expression that does not violate constraints" instead
>>> of "valid computation"?
>>
>> Are you trying to overload the term "constraints"? :)
>>
> No. I suppose we could go and argue over whether the expression "32767 +
> 1" involves a constraint violation, but this would not be very meaningful,
> as the end result is the same.

It's a constraint violation in a constant expression, but that should have
nothing to do with the definition of overflow. The same kind of an overflow
happens in the expression a+b when a is 32767 and b is 1, and that
expression does not violate any constraint of the C standard (provided, of
course, that a and b are declared correctly).

> I hereby give up trying to give a definition of overflow in terms of the
> standard, and opine that our intuitive notion of the concept suffices.

I think it's clear enough what "overflow" refers to in arithmetical
contexts; and since we seem to agree that the sentence that talks about
overflow in a pointer context is pretty much meaningless, I agree that we
don't really need a general definition that covers all contexts where
"overflow" is mentioned in the C standard.

>> I think you need to distinguish between the result of the expression
>> (which is undefined when there's overflow) and the mathematical result of
>> the computation that the expression represents (which is defined but out
>> of range). Also, you need to remember about the distinction between
>> overflow, underflow, and rounding errors.
>>
> Since underflow and rounding errors do not apply to integer arithmetic
> (and don't seem to apply to pointer arithmetic either) and none of those
> exceptional conditions apply to the mathematical operations, I fail to see
> why.

Because I thought your definition was supposed to be general, rather than
applying just to integer types. It sounded as if you were trying to provide
a definition that applied to the use of the term in the paragraph about
pointer arithmetic, and I saw no reason to believe that you meant to exclude
any arithmetic types.

Skarmander

May 13, 2006, 2:19:45 PM
We are in complete agreement.

> Pointer addition is defined directly, by talking about elements of a C
> array, rather than as a mapping of a mathematical operation. You can't
> meaningfully define overflow as a situation where the mathematical pointer
> addition produces a result that can't be represented as a C pointer.
> There's no such thing as mathematical pointer addition. If you want a
> meaningful definition of overflow for pointer addition, you have to find a
> different way of defining it.
>

Ah, *now* I see what you were getting at... Well, why didn't you say this at
the beginning instead of waffling about C expressions always yielding
representable C values? :-)

>> As far as I can tell, this is indeed how the standard does it. A phrase
>> like "if the result is [..] not in the range of representable values for
>> its type" is only meaningful if the "result" is the result of the abstract
>> operation represented by the expression. Otherwise, the operational
>> definitions would be circular or the constraints irrelevant.
>
> Yes, but that doesn't work for pointers. There's no abstract operation,
> distinct from C pointer addition, represented by the C addition of an
> integer to a pointer.
>

Yes. (Others have pointed out that systems typically implement pointer
addition by something similar or even identical to integer addition, and
this suggests a more explicit version of the operation, but that's not
relevant to this discussion.)

> The C standard has the annoying habit of defining the result of an operation
> based on what its result is, even in situations when there's no abstract
> mathematical operation that the second instance of "result" could possibly
> be interpreted as referring to. "A pointer to an object or incomplete type
> may be converted to a pointer to a different object or incomplete type. If
> the resulting pointer is not correctly aligned for the pointed-to type, the
> behavior is undefined." Was that meant to be circular? Or is it referring
> to the "mathematical" result of an "abstract" pointer conversion? No, it
> simply means to say that the *original* pointer must be correctly aligned
> for the result's pointed-to type, or otherwise the result can be anything,
> aligned or not, or there might be no result at all.
>

Such are the pitfalls of an operational definition, I suppose. Add to that
the fact that it's an operational definition in English, and the stage is set.

>>> I think you need to distinguish between the result of the expression
>>> (which is undefined when there's overflow) and the mathematical result of
>>> the computation that the expression represents (which is defined but out
>>> of range). Also, you need to remember about the distinction between
>>> overflow, underflow, and rounding errors.
>>>
>> Since underflow and rounding errors do not apply to integer arithmetic
>> (and don't seem to apply to pointer arithmetic either) and none of those
>> exceptional conditions apply to the mathematical operations, I fail to see
>> why.
>
> Because I thought your definition was supposed to be general, rather than
> applying just to integer types. It sounded as if you were trying to provide
> a definition that applied to the use of the term in the paragraph about
> pointer arithmetic, and I saw no reason to believe that you meant to exclude
> any arithmetic types.
>

No, it wasn't intended to be general, at least not for the purpose of this
discussion. As a general definition it's no good (and I see how you brought
underflow and rounding errors into it).

S.

Douglas A. Gwyn

May 15, 2006, 3:08:19 PM
Wojtek Lerch wrote:
> Not really. Since a null pointer is guaranteed to compare unequal to a
> pointer to any object, the compiler can ensure that &foo[42] is not a null
> pointer by placing another static object in the space just above foo ...

It has to ensure that that is a valid *address* anyway (except in
special cases where the compiler happens to have enough context
to be sure that the last+1 address will never be computed). In
general it would be more economical to allocate an extra byte in
front of whatever address is being used as a null pointer value,
similar to
extern char __null[2];
#define NULL ((void*)&__null[1]) // not really, but
// you get the idea
in which case no matter what the linker did, last+1 addresses
would never compare equal to the null pointer value.
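
A rough user-level sketch of this scheme is given below. The names
null_guard, SKETCH_NULL and is_sketch_null are invented for the
illustration; a real implementation would arrange this inside its own
headers and run-time library rather than in user code. Because the
stand-in null value is the address of the second byte of a reserved
object, a last+1 address of some other object can at most coincide with
the first byte, never with the null value itself.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the implementation's reserved object. */
static char null_guard[2];

/* Hypothetical null value for this sketch: the address of the SECOND
   byte, so the byte in front of it always belongs to null_guard. */
#define SKETCH_NULL ((void *)&null_guard[1])

/* Returns true when p equals the stand-in null value. */
bool is_sketch_null(const void *p)
{
    return p == SKETCH_NULL;
}

static int array[10];

/* Checks a last+1 address against the stand-in null value. */
bool past_end_is_sketch_null(void)
{
    return is_sketch_null(array + 10);
}
```

Wherever the linker places array, array + 10 may equal &null_guard[0] at
worst, which is why the extra leading byte is enough.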

Douglas A. Gwyn

May 15, 2006, 3:11:28 PM
Keith Thompson wrote:
> Would you agree that this is a similar case, and that it's worth an
> explicit statement?

It's somewhat similar, and I wouldn't object to similar extra
wording to state the assumption explicitly, although it doesn't
seem very likely to be a problem in practice (unlike with
accidental contiguity, which occurs commonly and thus needs
to be clearly allowed [or clearly prohibited]).

Keith Thompson

May 15, 2006, 4:26:44 PM
"Douglas A. Gwyn" <DAG...@null.net> writes:

In one sense, it's not likely to be a problem: an implementation isn't
likely to have a pointer just past the end of an array actually
compare equal to a null pointer.

But in another sense, it's (potentially) a real problem, in that a
programmer can't safely *assume* that such a pointer is non-null.
Also, as discussed in this thread, a compiler implementer might not be
able to determine whether this:

static int array[10];
_Bool b = array+10;

will set b to 1.
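
For comparison, here is a small sketch that performs the same test at run
time instead of in a static initializer. The function name
past_end_is_nonnull is made up for the example, and nothing here settles
what the standard guarantees; it only shows that a run-time comparison
leaves nothing for the compiler or the linker to decide in advance.

```c
#include <stdbool.h>

static int array[10];

/* Computes the one-past-the-end pointer and tests it against null at
   run time. The pointer may be computed and compared, but it must not
   be dereferenced. */
bool past_end_is_nonnull(void)
{
    int *end = array + 10;
    return end != 0;
}
```

On the implementations discussed in this thread the function would be
expected to return true; the open question is only whether a static _Bool
initializer can be given that value before link time.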

Wojtek Lerch

May 15, 2006, 6:07:48 PM
"Douglas A. Gwyn" <DAG...@null.net> wrote in message
news:4468D1A3...@null.net...

> Wojtek Lerch wrote:
>> Not really. Since a null pointer is guaranteed to compare unequal to a
>> pointer to any object, the compiler can ensure that &foo[42] is not a
>> null
>> pointer by placing another static object in the space just above foo ...
>
> It has to ensure that that is a valid *address* anyway (except in
> special cases where the compiler happens to have enough context
> to be sure that the last+1 address will never be computed). In

But we're talking about the case where that valid address may be a
representation of a null pointer, and the decision about that is made by the
linker. If the compiler generates code that computes &foo[42] and compares
it to the special null address at runtime, that's fine, even if it turns out
equal (remember that we're assuming an interpretation that allows &foo[42]
to be a null pointer); the problem is when the compiler must pick an initial
value for a static _Bool because it has no way to tell the linker to make
the comparison. The compiler has no choice but to either pick 0 and somehow
convince the linker to make &foo[42] a null pointer, or pick 1 and fool the
linker into putting the null pointer elsewhere.

> general it would be more economical to allocate an extra byte in
> front of whatever address is being used as a null pointer value,
> similar to
> extern char __null[2];
> #define NULL ((void*)&__null[1]) // not really, but
> // you get the idea
> in which case no matter what the linker did, last+1 addresses
> would never compare equal to the null pointer value.

Right; but I think we were talking about the situation where the linker and
the library are not willing to co-operate, for instance in a system where
the header already defines NULL to be &__null[0] and the "native" compiler
only supports C89 and is not concerned about _Bools statically initialized
to a pointer.


Douglas A. Gwyn

May 15, 2006, 5:33:34 PM
Keith Thompson wrote:
> But in another sense, it's (potentially) a real problem, in that a
> programmer can't safely *assume* that such a pointer is non-null.
> Also, as discussed in this thread, a compiler implementer might not be
> able to determine whether this:
> static int array[10];
> _Bool b = array+10;
> will set b to 1.

I don't think any implementation or program actually gets this
wrong (by which I mean: doesn't reflect the intent of the
standard); the "problem" seems to me to be a pedantic invention
rather than a practical issue.

Douglas A. Gwyn

May 15, 2006, 7:49:06 PM
Wojtek Lerch wrote:
> Right; but I think we were talking about the situation where the linker and
> the library are not willing to co-operate, for instance in a system where
> the header already defines NULL to be &__null[0] and the "native" compiler
> only supports C89 and is not concerned about _Bools statically initialized
> to a pointer.

No matter how inaccessible the linker is, the C implementation is
in charge of the content of the headers and run-time library. It
really ought to choose an appropriate address for the null-pointer
value (when using such a scheme).

Note that there is no issue of the linker determining the proper
initializer for the _Bool object; the compiler can simply convert
&foo[42] to "true" without thinking about it.

SuperKoko

May 21, 2006, 5:52:23 PM

kuy...@wizard.net wrote:

> "An address constant is a null pointer, a pointer to an lvalue
> designating an object of static storage duration, or a pointer to a
> function designator; it shall be created explicitly using the unary &
> operator or an integer constant cast to pointer type, or implicitly by
> the use of an expression of array or function type. The array-subscript
> [] and member-access . and -> operators, the address & and indirection
> * unary operators, and pointer casts may be used in the creation of an
> address constant, but the value of an object shall not be accessed by
> use of these operators."
>
> Your initializer is a pointer to an lvalue designating an object of
> static storage duration, created by explicitly using the unary &
> operator along with the array-subscript operator, without actually
> accessing the value of that object. I don't see any problems with it.
>

If I understand correctly, the pointer must be a pointer to an lvalue
designating an object, or a null pointer.
It seems clear that a pointer pointing one past the last element of an
array is not a pointer to an lvalue designating an object.
Thus, except in the special case where it happens to be null,
it is invalid. That is:
extern char c[]; /* assuming that c contains 10 items */
static _Bool b=c+10; /* undefined behavior if c+10 is not null */

Thus, since the compiler is (from a standard point-of-view) free to
choose where objects are placed... The standard does not guarantee that
c+10 be null.
Thus, the compiler may (perhaps with a "lie") deem that c+10 is not
null, and thus, deem that the behavior is undefined, and thus, adopt
the behavior of assigning true to the _Bool.

undefined behavior means that anything can occur... Including that the
cause that created the UB disappears... :p
So, one may write such code:

extern char c[10];
static _Bool b=c+10;

And see (at runtime) that b is set to true... but that c+10 is null...
And if the programmer were to ask the compiler why this behavior
exists, the "compiler" would answer that c+10 was not null at the
startup of the program, and thus c+10 was neither a pointer to a valid
object nor a null pointer, and thus the behavior became undefined...
And everything went wrong... And, in particular, c+10 "changed" its
value to null.

I'm not sure I expressed myself clearly, but the idea is that:

extern char c[10];
static _Bool b=c+10;

Can't appear in a conformant program, because c+10 may be (very
probably) non-null.

Thus, the compiler can safely set the _Bool to true.

kuy...@wizard.net

May 21, 2006, 8:39:03 PM
SuperKoko wrote:
...

> Thus, the compiler may (perhaps with a "lie") deem that c+10 is not
> null, and thus, deem that the behavior is undefined, and thus, adopt
> the behavior of assigning true to the _Bool.

If the compiler can't tell for certain whether or not c+10 is a null
pointer, it must generate code that produces the correct result
(b==false) in the case where c+10 turns out to be null. It's not
allowed to resolve the uncertainty prematurely with a "lie". If the
platform has characteristics that make it impossible for the compiler
to generate code that behaves as it's required to behave, then a
conforming implementation of C is impossible on that platform.

> undefined behavior means that anything can occur... Including that the
> cause that created the UB disappears... :p
> So, one may write such code:
>
> extern char c[10];
> static _Bool b=c+10;
>
> And see (at runtime) that b is set to true... but that c+10 is null...

Such behavior would render the implementation that generated it
non-conforming. If c+10 is null, it's a legitimate initializer for b,
and therefore no argument based upon the assumption that the behavior
is undefined applies.

...


> I'm not sure I expressed myself clearly, but the idea is that:
>
> extern char c[10];
> static _Bool b=c+10;
>
> Can't appear in a conformant program, because c+10 may be (very
> probably) non-null.

"conformant program" isn't a term defined in the standard. A program
can be "conforming" or "strictly conforming", but not "conformant".

Such code can't occur in a strictly conforming program, for precisely
the reason you give. However, if an implementation chooses to document
circumstances under which c+10 can be null, and if a program is
written in such a fashion as to guarantee that c+10 is null when
translated by that implementation, then that implementation can't use
this as a reason for rejecting the program, and must initialize b with
a value of false. If even one implementation accepts the program, then
it is a conforming program.

Dave Thompson

May 21, 2006, 8:45:47 PM
On Fri, 12 May 2006 21:47:38 +0200, Skarmander
<inv...@dontmailme.com> wrote:

> Michal Necasek wrote:

> > If the C-FAQ is correct when it says (in 5.1) that "The address-of
> > operator & will never yield a null pointer" then all this is irrelevant,
> > because it'd mean that '(_Bool)&<foo>' has to be 1 for any expression
> > 'foo' allowable in that context.
> >
> > Or is the C-FAQ misleading on this point?
> >
> In that sense that it's not the standard, yes.
>
> For example, the standard guarantees that &*E is equivalent to E even if E
> is a null pointer (and even though *E would yield undefined behavior in this
> case), so &<foo> most definitely can yield a null pointer if expressions are
> allowed for <foo>. <snip>

Note that this guarantee in 6.5.3.2p3 is new in C99. In C90 the
abstract semantics of the Standard say you dereference the null
pointer, causing UB, before taking its address. In practice this is
such an obvious and simple optimization that all implementations, or
damn near all, did it, and it 'just worked'. (If there is any implementor
who didn't, they obviously didn't object effectively to the C99 change.)
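
A short sketch of the C99 behavior, assuming a C99 or later compiler; the
function names are invented for the example. Under the C99 rule for &*E
(6.5.3.2p3 in the published standard), neither operator is evaluated, so a
null pointer survives the round trip, and &a[10] is treated as a + 10
without any access to a[10].

```c
#include <stdbool.h>
#include <stddef.h>

/* In C99, &*p is evaluated as if both operators were omitted, so the
   null pointer is never dereferenced and q ends up null. */
bool addr_of_deref_null_is_null(void)
{
    int *p = NULL;
    int *q = &*p;
    return q == NULL;
}

/* Likewise, &a[10] is treated as a + 10: the element one past the end
   is never accessed, only its address is formed. */
bool addr_of_past_end_element(void)
{
    static int a[10];
    return &a[10] == a + 10;
}
```

Under the C90 abstract semantics described above, the first function would
formally have undefined behavior even though typical compilers produced
the same result.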

- David.Thompson1 at worldnet.att.net

Francis Glassborow

unread,
May 22, 2006, 2:42:56 PM5/22/06