[union] Are pointers to inherited structs valid?

Maciej Labanowicz

Jan 1, 2013, 6:45:48 AM

Hi,

Please analyze the following example:

/*--[beg:test.c]-------------------------------------------------*/
01:
02: #include <stdio.h> /* printf */
03: #include <stdlib.h> /* EXIT_SUCCESS */
04:
05: struct a_s { int x; };
06: struct b_s { struct a_s super; int y; };
07: struct c_s { struct b_s super; int z; };
08:
09: union common_u {
10:     struct a_s * ptr_a;
11:     struct b_s * ptr_b;
12:     struct c_s * ptr_c;
13: };
14:
15: int main(void)
16: {
17:     struct c_s c;
18:     union common_u common;
19:
20:     ((struct a_s *)(&c))->x = 5;
21:     ((struct b_s *)(&c))->y = 6;
22:     c.z = 7;
23:
24:     printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
25:
26:     common.ptr_c = &c;
27:     common.ptr_c->z += 10;
28:
29:     common.ptr_a->x += 20;
30:     common.ptr_b->y += 30;
31:
32:     printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
33:
34:     return EXIT_SUCCESS;
35: }
/*--[eof:test.c]-------------------------------------------------*/

/*--[beg:output]-------------------------------------------------*/
01: x=5,y=6,z=7
02: x=25,y=36,z=17
/*--[eof:output]-------------------------------------------------*/

These structs implement inheritance of members:

a_s
|
+b_s
|
+c_s

So the casts on lines 20 and 21 are valid in C.
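
The claim about lines 20 and 21 rests on the rule that a pointer to a structure, suitably converted, points to its initial member. The sketch below checks that on the structs from test.c; the function names are hypothetical and not part of the original program, and a passing check on one compiler only illustrates the rule.

```c
#include <stddef.h>  /* offsetof */

struct a_s { int x; };
struct b_s { struct a_s super; int y; };
struct c_s { struct b_s super; int z; };

/* Returns 1 when both casts of 'c' land on the nested first members. */
int casts_reach_first_members(struct c_s *c)
{
    struct b_s *as_b = (struct b_s *)c;  /* first member of struct c_s */
    struct a_s *as_a = (struct a_s *)c;  /* first member of that member */

    return as_b == &c->super && as_a == &c->super.super;
}

/* Returns 1 when no padding precedes either 'super' member. */
int supers_at_offset_zero(void)
{
    return offsetof(struct c_s, super) == 0
        && offsetof(struct b_s, super) == 0;
}
```

Both functions are expected to return 1, since C does not allow padding before the first member of a struct.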

'union common_u' contains pointers to all of those structs.

Line 26 assigns the address of 'c' (the leaf in the tree) to the
union member ptr_c.

So the 'common.ptr_c' pointer is valid (line 27 is correct).

The question is:
Are the rest of the union pointers valid (ANSI C/ISO C89)?
common.ptr_b and common.ptr_a (lines 29 and 30)

Best Regards

--
Maciek

Barry Schwarz

Jan 1, 2013, 8:59:22 PM

Assuming N1570 is still current in this area, look at footnote 95.

BTW, in the real world this code justifies terminating employment.

--
Remove del for email

Shao Miller

Jan 1, 2013, 9:06:52 PM

On 1/1/2013 06:45, Maciej Labanowicz wrote:
>[...]

> 04:
> 05: struct a_s { int x; };
> 06: struct b_s { struct a_s super; int y; };
> 07: struct c_s { struct b_s super; int z; };
> 08:
> 09: union common_u {
> 10: struct a_s * ptr_a;
> 11: struct b_s * ptr_b;
> 12: struct c_s * ptr_c;
> 13: };
> 14:
> 15: int main(void)
> 16: {
> 17: struct c_s c;
> 18: union common_u common;

> [...]

> 26: common.ptr_c = &c;
> 27: common.ptr_c->z += 10;
> 28:
> 29: common.ptr_a->x += 20;
> 30: common.ptr_b->y += 30;
> 31:
> 32: printf("x=%d,y=%d,z=%d\n", c.super.super.x, c.super.y, c.z);
> 33:
> 34: return EXIT_SUCCESS;
> 35: }

> [...]

>
>
> The question is:
> Are the rest of the union pointers valid (ANSI C/ISO C89)?
> common.ptr_b and common.ptr_a (lines 29 and 30)

You appear to be type-punning the value of the 'ptr_c' member as a
'struct a_s *' on line 29 and as a 'struct b_s *' on line 30.

In C89, we can see this:

"A pointer to void shall have the same representation and alignment
requirements as a pointer to a character type. Similarly, pointers to
qualified or unqualified versions of compatible types shall have the
same representation and alignment requirements. Pointers to other
types need not have the same representation or alignment requirements."

so my answer to your question would be "no". However in practice, it's
probably always going to work. In C99, we can see this:

"A pointer to void shall have the same representation and alignment
requirements as a pointer to a character type.39) Similarly, pointers to
qualified or unqualified versions of compatible types shall have the
same representation and alignment requirements. All pointers to
structure types shall have the same representation and alignment
requirements as each other. All pointers to union types shall have the
same representation and alignment requirements as each other. Pointers
to other types need not have the same representation or alignment
requirements.

39) The same representation and alignment requirements are meant to
imply interchangeability as arguments to functions, return values from
functions, and members of unions."

But the implementation's actual pointer representation could be
complicated and so there's still no guarantee. If you can dream up a
pointer representation, then you can dream up a counter-example to your
code's portability.
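
The size and alignment part of the C99 wording quoted above can at least be checked mechanically. The sketch below assumes a C11 compiler (for `_Static_assert` and `_Alignof`); `struct foo`, `struct bar` and the function name are placeholders. It says nothing about how the bits of the pointer are interpreted.

```c
struct foo;  /* incomplete types are enough to form the pointer types */
struct bar;

/* Compile-time checks (C11): these cover size and alignment only. */
_Static_assert(sizeof(struct foo *) == sizeof(struct bar *),
               "struct pointers differ in size");
_Static_assert(_Alignof(struct foo *) == _Alignof(struct bar *),
               "struct pointers differ in alignment");

/* The same size comparison as an ordinary function. */
int struct_pointers_same_size(void)
{
    return sizeof(struct foo *) == sizeof(struct bar *);
}
```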

- Shao Miller

Tim Rentsch

Jan 2, 2013, 1:18:28 PM

Barry Schwarz <schw...@dqel.com> writes:

> On Tue, 1 Jan 2013 03:45:48 -0800 (PST), Maciej Labanowicz

> <m.laba...@gmail.com> wrote: [condensed]

>>
>>05: struct a_s { int x; };
>>06: struct b_s { struct a_s super; int y; };
>>07: struct c_s { struct b_s super; int z; };
>>08:
>>09: union common_u {
>>10: struct a_s * ptr_a;
>>11: struct b_s * ptr_b;
>>12: struct c_s * ptr_c;
>>13: };
>>14:

>>18: union common_u common;

>>26: common.ptr_c = &c;
>>27: common.ptr_c->z += 10;
>>28:
>>29: common.ptr_a->x += 20;
>>30: common.ptr_b->y += 30;

>>So the 'common.ptr_c' pointer is valid (line 27 is correct).
>>
>>The question is:
>> Are the rest of the union pointers valid (ANSI C/ISO C89)?
>> common.ptr_b and common.ptr_a (lines 29 and 30)
>
> Assuming N1570 is still current in this area, look at footnote 95.

It isn't just that the union member access needs to get the right
bytes -- it is also important that the representations of the
different members agree. That agreement holds under C99 and C11,
but not under C89/C90.

Tim Rentsch

Jan 2, 2013, 1:34:53 PM

As a practical matter it should work. Strictly speaking it is
not guaranteed under C89/C90/C95, though it is under C99 and the
current standard, C11.

However, even though you can (most probably) get away with this
approach, code like this should raise a BIG RED FLAG whenever you
see it, especially if you are the one writing it. What you want
to do can easily be done in a way that's completely type safe
(ie, without using either casts or void *), as the printf() call
shows. Why use casting or type punning when not absolutely
necessary? Is there something else about what you're trying to
do that makes a cast-free approach unattractive? If there is,
you probably should ask about that, because it's likely a
different approach would reduce or eliminate that shortcoming,
and give an overall better result.
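
A minimal sketch of the cast-free approach described above, assuming the struct definitions from test.c. The function name `update_without_casts` is hypothetical; the member accesses mirror the ones already used in the printf() call.

```c
struct a_s { int x; };
struct b_s { struct a_s super; int y; };
struct c_s { struct b_s super; int z; };

/* Applies the updates from lines 27-30 of test.c without casts or a union. */
void update_without_casts(struct c_s *c)
{
    struct b_s *b = &c->super;        /* address of the embedded b_s */
    struct a_s *a = &c->super.super;  /* address of the embedded a_s */

    c->z += 10;
    a->x += 20;
    b->y += 30;
}
```

Starting from x=5, y=6, z=7, a call leaves x=25, y=36, z=17, the same values the original program prints, and the compiler checks every pointer type along the way.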

Tim Rentsch

Jan 2, 2013, 1:45:20 PM

> [in C99 and C11, pointers to struct have the same representation,
> but in C89/C90 this guarantee is not present.]

Good to have this pointed out - thank you for tracking it down.

> [under the C99 rules --]

> But the implementation's actual pointer representation could be
> complicated and so there's still no guarantee. If you can dream up a
> pointer representation, then you can dream up a counter-example to
> your code's portability.

The stipulation that all pointers to structs have the same
representation and alignment requirements means that the
type-punning union member access has to work. That's what
having the same representation means -- that the same object
representation (ie, the same bytes) will have the same value.
Any choice of representations for the two cases that doesn't
produce identical results here means the two representations
are not the same, ie, the implementation is not conforming
(under C99/C11 rules).

Shao Miller

Jan 2, 2013, 7:56:17 PM

On 1/2/2013 13:45, Tim Rentsch wrote:

> Shao Miller <sha0....@gmail.com> writes:
>> But the implementation's actual pointer representation could be
>> complicated and so there's still no guarantee. If you can dream up a
>> pointer representation, then you can dream up a counter-example to
>> your code's portability.
>
> The stipulation that all pointers to structs have the same
> representation and alignment requirements means that the
> type-punning union member access has to work. That's what
> having the same representation means -- that the same object
> representation (ie, the same bytes) will have the same value.
> Any choice of representations for the two cases that doesn't
> produce identical results here means the two representations
> are not the same, ie, the implementation is not conforming
> (under C99/C11 rules).
>

Here are three examples that I would consider to be counter-examples:

1. A 'struct any *' pointer representation that is a simple index.

This could provide a level of indirection into a table. The table
element could have type and bounds information, along with some other
form of address for the pointee. When the representation (a simple
index) is loaded into a 'struct bar *' instead of into a 'struct foo *',
a trap could be generated.

2. A 'struct any *' pointer representation that encodes bounds
information. While the original post "has this covered" because the
bounds of the original pointee encompass the bounds of the members
and sub-members, it's not safe in the general case. When the
representation is loaded into a 'struct bigger *' instead of a 'struct
smaller *', the bounds mismatch could generate a trap.

3. A 'struct any *' pointer representation that encodes type
information. Maybe for the sole reason of generating a trap when the
representation is loaded into an incompatible pointer type of object.

It seems clear to me that size, alignment, argument promotion (none) and
format of 'struct foo *' and 'struct bar *' must be the same, but I
don't yet understand how that ties into compatible types nor into
defined behaviour, since

"Certain object representations need not represent a value of the
object type. If the stored value of an object has such a representation
and is read by an lvalue expression that does not have character type,
the behavior is undefined. If such a representation is produced by a
side effect that modifies all or any part of the object by an lvalue
expression that does not have character type, the behavior is
undefined.41) Such a representation is called a trap representation."

Why can a valid 'struct foo *' value's representation represent a valid
'struct foo *' value but not a trap for a 'struct bar *'? For example,
it might be useful to trap a 'const struct baz *' representation read
into a 'struct baz *' object. A single bit in the representation would
be sufficient for that. The representation would be the same, wouldn't it?

- Shao Miller

Shao Miller

Jan 3, 2013, 8:28:46 AM

Example #1: "...interchangeability as arguments to functions..."

/* libbaz.h */

typedef void f_baz_callback(structptr_t);

extern void BazFunc(f_baz_callback * Callback, structptr_t StructPtr);

/* libbaz.c */

typedef struct any * structptr_t;
#include "libbaz.h"

void BazFunc(f_baz_callback * callback, structptr_t sptr) {
    /*
     * 'struct any' is an incomplete object type.
     * Trap representations are more limited than if it were
     * a complete object type.
     *
     * A trap representation for _any_ pointer type could
     * still be present. A trap representation for _any_
     * 'struct XXX *' could still be present.
     *
     * A trap representation based on bounds could still be present
     * if 'sptr' is non-null, but somehow indicates 0 bytes
     * of storage, or some other invalid value.
     *
     * A trap representation based on lifetime could still be
     * present. Same with 'const'-ness.
     *
     * etc.
     *
     * foo.c and bar.c have a different type for 'sptr', but
     * since the representation is the same, there's no problem.
     */
    callback(sptr);
}

/* foo.c */

typedef struct s_foo * structptr_t;
#include "libbaz.h"

struct s_foo {
    int i;
};

f_baz_callback foo_callback;
void foo_callback(structptr_t sptr) {
    sptr->i = 42;
}

void foo_func(void) {
    struct s_foo foo;

    BazFunc(foo_callback, &foo);
}

/* bar.c */

typedef struct s_bar * structptr_t;
#include "libbaz.h"

struct s_bar {
    double d;
};

f_baz_callback bar_callback;
void bar_callback(structptr_t sptr) {
    sptr->d = 3.14159;
}

void bar_func(void) {
    struct s_bar bar;

    BazFunc(bar_callback, &bar);
}

Example #2: "...and members of unions."

/* libnextgen.h version 1.0 */

struct apple;
struct orange;

union u_dyn_obj {
    struct apple * apple;
    struct orange * orange;
};

extern void NextGenFunc(union u_dyn_obj DynamicObject);

/* libnextgen.h version 2.0 */

struct apple;
struct orange;
struct dog;
struct cat;

union u_dyn_obj {
    struct apple * apple;
    struct orange * orange;
    struct dog * dog;
    struct cat * cat;
};

extern void NextGenFunc(union u_dyn_obj DynamicObject);

/* user.c */

#include "libnextgen.h"

/* Hypothetical definition, assumed so that an object can be declared */
struct apple {
    int seeds;
};

void UserFunc(void) {
    struct apple apple;
    union u_dyn_obj dyn_obj;

    /*
     * It doesn't matter which version of the header we
     * were built with, _nor_ which version of the library
     * is installed, because the representation (and thus
     * size) and alignment are always going to be the same.
     *
     * We only work with apples and oranges, but 2.0's
     * support for dogs and cats doesn't affect us.
     */
    dyn_obj.apple = &apple;
    NextGenFunc(dyn_obj);
}

Example #3: "... Such a representation is called a trap representation."

/* hmmm1.c */

#include <stdlib.h>
#include <stdio.h>

struct s_smaller {
    char arr[4];
};

struct s_bigger {
    char arr[sizeof (struct s_smaller)];
    double d;
};

int main(void) {
    void * storage;
    struct s_smaller * smaller;

    /* Allocate enough storage for an s_smaller */
    storage = calloc(1, sizeof (struct s_smaller));
    if (!storage)
        return 0;
    smaller = storage;

    /*
     * Problem #3.1: Although the representation is
     * the same for both types, the value cannot
     * point to an s_bigger due to insufficient storage.
     * There's enough storage for arr, but that's
     * irrelevant.
     */
    (*((struct s_bigger **) &smaller))->arr[0] = 'C';

    printf("Result: %s\n", (char *) storage);

    return 0;
}

Example #4: "... Such a representation is called a trap representation."

/* hmmm2.c */

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

struct s_smaller {
    char arr[4];
};

struct s_bigger {
    char arr[sizeof (struct s_smaller)];
    double d;
};

union u_of_ptrs {
    struct s_smaller * smaller;
    struct s_bigger * bigger;
};

void discard_provenance(
    union u_of_ptrs * left,
    union u_of_ptrs * right,
    union u_of_ptrs * combined
);

int main(void) {
    void * storage;
    union u_of_ptrs first, first_backup, second, third;

    /* Allocate enough storage for an s_bigger */
    storage = calloc(1, sizeof (struct s_bigger));
    if (!storage)
        return 0;

    /* Plenty of storage for an s_smaller */
    first.smaller = storage;

    /* Backup */
    memcpy(&first_backup, &first, sizeof first_backup);

    /* Free some storage */
    storage = realloc(storage, sizeof (struct s_smaller));
    if (!storage)
        return 0;

    /* Right amount of storage */
    second.smaller = storage;

    /* Compare the representations */
    if (memcmp(&first_backup, &second, sizeof first_backup))
        return 0;

    /* Discard any "provenance" for a later test */
    discard_provenance(&first_backup, &second, &third);

    /*
     * Problem #4.1: second.bigger cannot point to an
     * s_bigger, as there's insufficient storage.
     * There's storage enough for arr, but that's
     * irrelevant.
     */
    second.bigger->arr[0] = '1';

    /*
     * Problem #4.2: Same problem with first.bigger, even
     * though its "provenance" was from the earlier allocation.
     */
    first.bigger->arr[1] = '2';

    /*
     * Problem #4.3: Same problem with third.bigger, even
     * though its "provenance" has been discarded.
     */
    third.bigger->arr[2] = '3';

    printf("Result: %s\n", (char *) storage);

    return 0;
}

void discard_provenance(
    union u_of_ptrs * left,
    union u_of_ptrs * right,
    union u_of_ptrs * combined
) {
    unsigned char * lp = (void *) left;
    unsigned char * rp = (void *) right;
    unsigned char * cp = (void *) combined;
    unsigned char * end = (void *) (combined + 1);

    while (cp < end)
        *cp++ = *lp++ & *rp++;
}

I would certainly appreciate a C99-/C11-conforming implementation that
is able to catch the problems of examples #3 & #4. One way would be to
deem certain representations to be trap representations for one object
type but not for another, where the types are not compatible.

My interpretation of "same representation and alignment requirements"
for struct pointer types is along the lines of:

- If there are padding bits in one, there are padding bits at the
same positions in the other
- If there are parity bits in one, there are parity bits at the same
positions in the other
- If a segment is encoded in one, then it is encoded in the same way
in the other
- If type information is encoded in one, then it is encoded in the
same way in the other
- If bounds information is encoded in one, then it is encoded in the
same way in the other
- If lifetime/duration information is encoded in one, then it is
encoded in the same way in the other
- etc.

Since this interpretation supports the fair examples #1 & #2 as well as
the more contrived examples #3 & #4, I fail to understand the benefit of
adopting a more restrictive interpretation which seemingly prohibits the
problems of #3 and #4 from being caught; perhaps with trap
representations. But perhaps I've misunderstood.

- Shao Miller

Tim Rentsch

Jan 3, 2013, 6:31:11 PM

These ideas aren't consistent with how the Standard uses the
notion of having the same representation in other instances. For
example, an object of type (int) has the same representation and
alignment requirements as an object of type (const int). Yet it's
ridiculous to think that loading an (int) object through a pointer
of type (const int *) might cause a trap when accessing the object
just as a plain int wouldn't, despite the two types being distinct
and not compatible.
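
The 'int' versus 'const int' case can be written out as a small sketch. The two function names are hypothetical; the point is only that the same object is read once through a plain lvalue and once through a const-qualified one.

```c
/* Reads the object as a plain int. */
int load_plain(int *p)
{
    return *p;
}

/* Reads the same object through a const-qualified lvalue. */
int load_via_const(const int *p)
{
    return *p;
}
```

Passing the address of one 'int' object to both functions is expected to yield the same value from each.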

> It seems clear to me that size, alignment, argument promotion (none)
> and format of 'struct foo *' and 'struct bar *' must be the same, but
> I don't yet understand how that ties into compatible types nor into
> defined behaviour, since
>
> "Certain object representations need not represent a value of the
> object type. If the stored value of an object has such a
> representation and is read by an lvalue expression that does not have
> character type, the behavior is undefined. If such a representation is
> produced by a side effect that modifies all or any part of the object
> by an lvalue expression that does not have character type, the
> behavior is undefined.41) Such a representation is called a trap
> representation."
>
> Why can a valid 'struct foo *' value's representation represent a
> valid 'struct foo *' value but not a trap for a 'struct bar *'? For
> example, it might be useful to trap a 'const struct baz *'
> representation read into a 'struct baz *' object. A single bit in the
> representation would be sufficient for that. The representation would
> be the same, wouldn't it?

No. I expect you're thinking of "representation" as more or less
synonymous with "format", but representation means more than that.
The representation of a type is the mapping from the bits (ie, the
byte values of the object representation) to values in the type's
abstract value space, including trap values. If two types have
the same representation, that means the two mappings produce
corresponding values (ie, for each object representation) in the
two abstract value spaces. For C, corresponding values are what
would be produced by conversion between the two types in question.
In other words, if types A and B have the same representation,
then copying the bytes (eg, with memcpy()) from an 'A a;' into a
'B b;' must give the same results as 'b = (B) a;'. Any change in
behavior between the two cases means the two representations are
not the same. Accessing via type B using a union member access
works the same way that the memcpy() would.
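
The memcpy()-versus-conversion reading described above can be sketched for the struct pointers from the original post. The function name is hypothetical, and a passing check on one implementation only illustrates the argument; it does not settle what the Standard requires.

```c
#include <string.h>  /* memcpy */

struct a_s { int x; };
struct b_s { struct a_s super; int y; };
struct c_s { struct b_s super; int z; };

/*
 * Returns 1 when copying the bytes of a 'struct c_s *' into a
 * 'struct a_s *' gives the same pointer as converting with a cast.
 */
int copy_agrees_with_cast(struct c_s *pc)
{
    struct a_s *by_cast = (struct a_s *)pc;
    struct a_s *by_copy;

    if (sizeof by_copy != sizeof pc)
        return 0;  /* sizes differ, so the byte copy is not meaningful */

    memcpy(&by_copy, &pc, sizeof by_copy);
    return by_copy == by_cast;
}
```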

For pointers, there is the additional concern that the converted
or corresponding value be a non-trap value in the abstract value
space of the new pointer type. However, in the particular example
here (ie, in the original posting, even though since disappeared
in the subthread), we know the pointer conversions have to work
because of the way the particular structs being pointed to are
nested.

Shao Miller

Jan 3, 2013, 8:55:11 PM

Ok. I agree with your example. But 2 points:

- The representation of 'int' is discussed in much greater detail than
the representation of any pointer type. Pointer representations are
much more opaque and free for the implementation to decide upon.

- I don't think it makes practical sense to encode type information in
the padding bits of an 'int', but it certainly seems useful to encode
extra information in a pointer representation, since they are derived
types with abstract values.

Surely if, in

void somefunc(void) {
    unsigned char c;
    /* ... */
}

'c' is permitted to have a trap representation due to its "provenance,"
then it is especially convenient that pointer representations are
opaque, so "provenance" or other meta-data can be encoded directly. No?

>> It seems clear to me that size, alignment, argument promotion (none)
>> and format of 'struct foo *' and 'struct bar *' must be the same, but
>> I don't yet understand how that ties into compatible types nor into
>> defined behaviour, since
>>
>> "Certain object representations need not represent a value of the
>> object type. If the stored value of an object has such a
>> representation and is read by an lvalue expression that does not have
>> character type, the behavior is undefined. If such a representation is
>> produced by a side effect that modifies all or any part of the object
>> by an lvalue expression that does not have character type, the
>> behavior is undefined.41) Such a representation is called a trap
>> representation."
>>
>> Why can a valid 'struct foo *' value's representation represent a
>> valid 'struct foo *' value but not a trap for a 'struct bar *'? For
>> example, it might be useful to trap a 'const struct baz *'
>> representation read into a 'struct baz *' object. A single bit in the
>> representation would be sufficient for that. The representation would
>> be the same, wouldn't it?
>
> No. I expect you're thinking of "representation" as more or less
> synonymous with "format",

Yes, you are right about that.

> but representation means more than that.
> The representation of a type is the mapping from the bits (ie, the
> byte values of the object representation) to values in the type's
> abstract value space, including trap values. If two types have
> the same representation, that means the two mappings produce
> corresponding values (ie, for each object representation) in the
> two abstract value spaces. For C, corresponding values are what
> would be produced by conversion between the two types in question.
> In other words, if types A and B have the same representation,
> then copying the bytes (eg, with memcpy()) from an 'A a;' into a
> 'B b;' must give the same results as 'b = (B) a;'. Any change in
> behavior between the two cases means the two representations are
> not the same.

I'm struggling to reconcile that with C99's 3.17p1 and 6.2.6.1. 3.17p1:

"value
precise meaning of the contents of an object when interpreted as
having a specific type"

I'm missing the part where it's possible for the same object
representation to represent the same value for two incompatible types,
since the value depends on the type.

Regarding conversion, 6.3p2 has that

"Conversion of an operand value to a compatible type causes no change
to the value or the representation."

Why mention both of them instead of simply "representation," if there's
a one-to-one correspondence between representation and value, given
compatible type? (Let alone incompatible types with the same
representation.)

Regarding pointer conversion, 6.3.2.3p1 has that

"For any qualifier q, a pointer to a non-q-qualified type may be
converted to a pointer to the q-qualified version of the type; the
values stored in the original and converted pointers shall compare equal."

Doesn't this explicitly hint that a 'const int *' value's representation
is permitted to be a trap representation for an 'int *', but not the
other way around? It seems convenient that such meta-data can be
directly encoded into the pointer representation, since pointer
representation is so opaque.

There's also p7:

"A pointer to an object or incomplete type may be converted to a
pointer to a different object or incomplete type. If the resulting
pointer is not correctly aligned57) for the pointed-to type, the
behavior is undefined. Otherwise, when converted back again, the result
shall compare equal to the original pointer. ..."

Doesn't this explicitly hint that it's not the most portable idea to do
anything much with a converted pointer other than to eventually convert
it back before using it? If I understand you correctly, there's no
conversion happening, as the value is simply becoming one in a different
type's value space, so there's no problem with p7.
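
The round-trip guarantee quoted from p7 can be illustrated with the structs from the original post. This is only a sketch with a hypothetical function name; it shows the conversion there and back, not a reinterpretation of the stored bytes.

```c
struct a_s { int x; };
struct b_s { struct a_s super; int y; };
struct c_s { struct b_s super; int z; };

/* Returns 1 when converting to 'struct a_s *' and back compares equal. */
int round_trip_compares_equal(struct c_s *pc)
{
    struct a_s *pa = (struct a_s *)pc;    /* convert */
    struct c_s *back = (struct c_s *)pa;  /* convert back again */

    return back == pc;
}
```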

Regarding your equivalence between the 'memcpy' and the cast for two
types with the same representation, 6.5.4p4 has that

"Preceding an expression by a parenthesized type name converts the
value of the expression to the named type. This construction is called a
cast.89) A cast that specifies no conversion has no effect on the type
or value of an expression."

If '(B) a' is already the same value as 'a' due to the types having the
same representation, then there is no conversion, right? If that's the
case, then the type of '(B) a' should be 'A'. Like 3.17p1, type and
value are once again tied together, so it seems to me that incompatible
types can have incompatible values.

HOWEVER, you said _corresponding_values_. So I'd ask: May a value in
the value space for type 'A' not have a corresponding, but invalid value
in the value space for type 'B'? If it may, then I fail to understand
why the original post's code is well-defined in C99 and C11.

> Accessing via type B using a union member access
> works the same way that the memcpy() would.

I absolutely agree with your equivalence between 'memcpy' and union
members. Also: Re-interpreting the object representation with something
like:

A * ptr;
(*((B **) &ptr));

(where types 'A' and 'B' have the same representation.)

> For pointers, there is the additional concern that the converted
> or corresponding value be a non-trap value in the abstract value
> space of the new pointer type. However, in the particular example
> here (ie, in the original posting, even though since disappeared
> in the subthread), we know the pointer conversions have to work
> because of the way the particular structs being pointed to are
> nested.
>

Ah, that answers my last question, above. But there's a bit of a jump
in the logic that I can't grasp, and that's why the nesting of the
structures in the original example has anything at all to do with the
corresponding pointer value having to work. Yes, I agree that the
original example's bounds are covered because of the nesting, but I
don't understand why that's the only important subject.

To back up a bit from the original example, 'char *' and 'void *' have
the same representation. Would you say that in:

void reinterpret(void) {
    void * vp = &vp;
    vp = (*((char **) &vp)) + 1;
}

the expression-statement has Standard-defined behaviour? I'm worried
about this example because an implementation might wish to represent
"the stride" of the pointer arithmetic, just as "Multi-Dimensional Array
Simulator"[1] does. Implicit and explicit conversions (like the
promotions, casts, equality and ternary semantics, etc.) seem to offer
all the protection we need, while re-interpretation does not.

- Shao Miller

[1] http://www.iso-9899.info/wiki/Code_snippets

Shao Miller

Jan 3, 2013, 9:00:33 PM

On 1/3/2013 20:55, Shao Miller wrote:
>
> I absolutely agree with your equivalence between 'memcpy' and union
> members. Also: Re-interpreting the object representation with something
> like:
>
> A * ptr;
> (*((B **) &ptr));
>
> (where types 'A' and 'B' have the same representation.)
>

I meant where 'A *' and 'B *' have the same representation.

Shao Miller

Jan 4, 2013, 12:56:38 PM

(And alignment requirements.)

However, please allow me to retract this equivalence between type-punning
via union members and 'memcpy'. After reviewing some discussion with
Mr. Clive Feather, I'm no longer so sure. He points out that an
effective type is involved: we end up with a stored value that has that
effective type, accessed by an lvalue whose type is not permitted by
6.5p7.

- Shao Miller

Shao Miller

Jan 4, 2013, 12:59:39 PM

On 1/3/2013 20:55, Shao Miller wrote:

Since else-thread I'm retracting the union member type-punning
equivalence with this kind of raw re-interpretation, please allow me to
also retract this example and replace it with:

void reinterpret(void) {
    union {
        void * vp;
        char * cp;
    } u = { &u };
    u.cp = u.cp + 1;
}

> the expression-statement has Standard-defined behaviour? I'm worried
> about this example because an implementation might wish to represent
> "the stride" of the pointer arithmetic, just as "Multi-Dimensional Array
> Simulator"[1] does. Implicit and explicit conversions (like the
> promotions, casts, equality and ternary semantics, etc.) seem to offer
> all the protection we need, while re-interpretation does not.
>

> [1] http://www.iso-9899.info/wiki/Code_snippets

Tim Rentsch

Jan 6, 2013, 11:00:45 PM

That doesn't change the point I was making.

> - I don't think it makes practical sense to encode type
> information in the padding bits of an 'int', but it certainly
> seems useful to encode extra information in a pointer
> representation, since they are derived types with abstract
> values.

Even if that's true, it doesn't change what the Standard mandates.

> Surely if, in
>
> void somefunc(void) {
> unsigned char c;
> /* ... */
> }
>
> 'c' is permitted to have a trap representation due to its
> "provenance,"

It isn't. You are either mis-remembering or have misunderstood.

> then it is especially convenient that pointer
> representations are opaque, so "provenance" or other meta-data can be
> encoded directly. No?

Irrelevant. Such a statement might be an argument for changing
a future Standard, but it has no bearing on what is said
in the current Standard.

> [quoted paragraph snipped]

>
> I'm missing the part where it's possible for the same object
> representation to represent the same value for two incompatible
> types, since the value depends on the type.

I don't see why you are confused. There is no wording that
forbids it, and it's obviously possible, as 'int' and 'const int'
illustrate. On many machines 'int' and 'long' provide another
example. Or two of the three character types.

> Regarding conversion, 6.3p2 has that
>
> "Conversion of an operand value to a compatible type causes no
> change to the value or the representation."
>
> Why mention both of them instead of simply "representation," if
> there's a one-to-one correspondence between representation and
> value, given compatible type? (Let alone incompatible types
> with the same representation.)

Do you think the Standard includes a sentence saying compatible
types must have the same representation and alignment requirements?

Incidentally, there isn't a one-to-one correspondence between object
representations and values (necessarily, that is). The mapping is
_from_ object representations _to_ the abstract value space, but it
need not be one-to-one; also, the abstract value space includes
"trap values" which correspond to trap representations but are not
'values' as the Standard normally uses the term.

> Regarding pointer conversion, 6.3.2.3p1 has that
>
> "For any qualifier q, a pointer to a non-q-qualified type may be
> converted to a pointer to the q-qualified version of the type; the
> values stored in the original and converted pointers shall compare
> equal."
>
> Doesn't this explicitly hint that a 'const int *' value's
> representation is permitted to be a trap representation for an 'int

> *', but not the other way around? [snip]

No. Converting a valid 'const int *' to an 'int *' is well-defined
and must succeed.
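
A small sketch of that conversion, with a hypothetical function name. Writing through the result is only defined here because the object pointed to was not itself declared const.

```c
/* Converts a valid 'const int *' to 'int *' with an explicit cast. */
int *drop_const(const int *p)
{
    return (int *)p;
}
```

For an 'int v' that is not const, 'drop_const(&v)' is expected to compare equal to '&v'.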

> There's also p7:
>
> "A pointer to an object or incomplete type may be converted to a
> pointer to a different object or incomplete type. If the resulting
> pointer is not correctly aligned57) for the pointed-to type, the
> behavior is undefined. Otherwise, when converted back again, the
> result shall compare equal to the original pointer. ..."
>
> Doesn't this explicitly hint that it's not the most portable idea to
> do anything much with a converted pointer other than to eventually
> convert it back before using it?

No.

> If I understand you correctly, there's no conversion happening,
> as the value is simply becoming one in a different type's value
> space, so there's no problem with p7.

What I think you mean is there is no change to the object
representation (which I didn't say and which doesn't have to
be true). What I said was basically that the result must be the
same whether the object representation changes or not (in cases
where the two types involved have the same representation).

> Regarding your equivalence between the 'memcpy' and the cast for two
> types with the same representation, 6.5.4p4 has that
>
> "Preceding an expression by a parenthesized type name converts the
> value of the expression to the named type. This construction is called
> a cast.89) A cast that specifies no conversion has no effect on the
> type or value of an expression."
>
> If '(B) a' is already the same value as 'a' due to the types having
> the same representation, then there is no conversion, right?

Wrong. Casting always does a conversion, even if the conversion
doesn't change either the value or the object representation.
Assignment also always does a conversion, even if the types are
the same. Furthermore for the case we are discussing, namely two
pointer-to-structure types, if the referenced types are different
then the value spaces of the two pointer types are disjoint, so
it can't be the case that the two values are the same.

> If that's the case, then the type of '(B) a' should be 'A'.
> Like 3.17p1, type and value are once again tied together, so it
> seems to me that incompatible types can have incompatible
> values.

This sentence is gibberish.

> HOWEVER, you said _corresponding_values_. So I'd ask: May a
> value in the value space for type 'A' not have a corresponding,
> but invalid value in the value space for type 'B'? If it may,
> then I fail to understand why the original post's code is
> well-defined in C99 and C11.

I shouldn't have to explain this again. Converting the value
with a cast has to work, because of how the structs are nested.
Therefore reinterpreting the object representation using a union
member access has to work, because that's what "having the same
representation" means.
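
The cast half of that argument can be sketched as follows, reusing the struct definitions from the original post. The function name casts_agree is made up for this example. It relies only on two things: a pointer to a structure, suitably converted, points to its initial member, and a converted pointer compares equal to the original when converted back.

```c
#include <assert.h>

struct a_s { int x; };
struct b_s { struct a_s super; int y; };
struct c_s { struct b_s super; int z; };

/* Returns 1 when the casts land on the nested first members and
   converting back recovers the original pointer. */
static int casts_agree(struct c_s *pc)
{
    struct b_s *pb = (struct b_s *)pc;
    struct a_s *pa = (struct a_s *)pc;

    assert(pc != 0);
    return pb == &pc->super
        && pa == &pc->super.super
        && (struct c_s *)pa == pc;
}
```

This says nothing about the union route; it only shows the conversions that the nesting is claimed to guarantee.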

>> Accessing via type B using a union member access
>> works the same way that the memcpy() would.
>
> I absolutely agree with your equivalence between 'memcpy' and union
> members. Also: Re-interpreting the object representation with
> something like:
>
> A * ptr;
> (*((B **) &ptr));
>
> (where types 'A' and 'B' have the same representation.)

That doesn't work, as I think you pointed out subsequently,
because of effective type rules. Except for that, yes, same
idea.

>> For pointers, there is the additional concern that the converted
>> or corresponding value be a non-trap value in the abstract value
>> space of the new pointer type. However, in the particular example
>> here (ie, in the original posting, even though since disappeared
>> in the subthread), we know the pointer conversions have to work
>> because of the way the particular structs being pointed to are
>> nested.
>
> Ah, that answers my last question, above. But there's a bit of
> a jump in the logic that I can't grasp, and that's why the
> nesting of the structures in the original example has anything
> at all to do with the corresponding pointer value having to
> work. Yes, I agree that the original example's bounds are
> covered because of the nesting, but I don't understand why
> that's the only important subject.

There are two important facts: one, the struct values are
nested appropriately; and two, the pointers to those structs
have the same representation (and alignment requirements).
Therefore the type-punning union member access gets a set
of bits that are both interpreted correctly and valid for
the type in question.
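
A sketch of the comparison being described, with memcpy standing in for the union member access. The name copy_matches_cast is hypothetical. Whether the Standard requires the comparison to succeed is exactly what this subthread is arguing about, so treat it as an illustration of the claim rather than a proof of it. It assumes an implementation where pointers to structures share one representation in the sense discussed above.

```c
#include <assert.h>
#include <string.h>

struct a_s { int x; };
struct b_s { struct a_s super; int y; };
struct c_s { struct b_s super; int z; };

/* Returns 1 when reinterpreting the bytes of a 'struct c_s *' as a
   'struct a_s *' gives the same pointer as converting it with a cast. */
static int copy_matches_cast(struct c_s *pc)
{
    struct a_s *by_cast = (struct a_s *)pc;
    struct a_s *by_copy;

    assert(pc != 0);
    /* Copy the object representation; no conversion is performed here. */
    memcpy(&by_copy, &pc, sizeof by_copy);
    return by_copy == by_cast;
}
```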

> To back up a bit from the original example, 'char *' and 'void
> *' have the same representation. Would you say that in:
>
>
> void reinterpret(void) {
> void * vp = &vp;
> vp = (*((char **) &vp)) + 1;
> }
>

Again, there is a violation of effective type rules in this case,
but if the analogous thing were done using union member access
then yes it has to work.

> the expression-statement has Standard-defined behaviour? I'm
> worried about this example because an implementation might wish
> to represent "the stride" of the pointer arithmetic, just as
> "Multi-Dimensional Array Simulator"[1] does. Implicit and
> explicit conversions (like the promotions, casts, equality and
> ternary semantics, etc.) seem to offer all the protection we
> need, while re-interpretation does not.

You're confusing what you think might be a good idea with
what the Standard mandates. My comments are concerned only
with the latter.

Shao Miller

Jan 7, 2013, 12:47:46 AM

And here is the analogous thing, offered elsethread:

void reinterpret(void) {
    union {
        void * vp;
        char * cp;
    } u = { &u };
    u.cp = u.cp + 1;
    /* Hmm ^^^^ */
}

The 'u.cp' expression marked by the comment (having type 'char *') is an
lvalue whose type is not one of those listed by 6.5p7, but it attempts
to access the value of 'u.vp'. (Doesn't it?) This appears to yield
undefined behaviour, doesn't it? Or would you suggest that the 'u'
sub-expression (having the union type) is the lvalue for purposes of
6.5p7, and that the type of the containing expression 'u.cp' doesn't matter?

- Shao Miller

Tim Rentsch

Jan 7, 2013, 2:06:12 AM

Look harder. Think more. Write less.

Shao Miller

Jan 7, 2013, 4:03:32 AM

Please don't resort to this sort of personally-directed nonsense as
you've done before. If you don't have an answer, please simply say so.
If you really think I've missed something, it'd certainly be more
helpful to point it out instead of implying laziness or stupidity.

If you think I write too much, well, I think you write too little
Standard, and too much "Mr. T. Rentsch knows best." Unfortunately, that
doesn't work for me, as your knowledge isn't directly accessible to me.
I'm sorry if that makes our discussions difficult! If you choose to
help me to understand your valuable perspective, I'll be appreciative.

Just in case you're nit-picking an error in the code that hardly seems
relevant to the meat of the question, please allow me to offer the
corrected code:

void reinterpret(void) {
    union {
        void * vp;
        char * cp;
    } u;
    u.vp = &u;

    u.cp = u.cp + 1;
    /* Hmm ^^^^ */
}

int main(void) {
    reinterpret();
    return 0;
}

Otherwise, would anyone else please point out what I might've missed
about whether or not the above example results in undefined behaviour?
The "shall"[6.5p7] is outside of a constraint, so that'd seem to be
undefined behaviour if the lvalue under consideration is 'u.cp'. If the
lvalue is 'u', then its union type _is_ permitted by 6.5p7 (as
acknowledged in a previous post, above), but it'd be good to know
_which_ is the lvalue under consideration.

- Shao Miller

Shao Miller

Jan 7, 2013, 7:41:32 AM

On 1/6/2013 23:00, Tim Rentsch wrote:
> Shao Miller <sha0....@gmail.com> writes:
>> Surely if, in
>>
>> void somefunc(void) {
>> unsigned char c;
>> /* ... */
>> }
>>
>> 'c' is permitted to have a trap representation due to its
>> "provenance,"
>
> It isn't. You are either mis-remembering or have misunderstood.
>

Committee Discussion in Defect Report #260:

"In addition the C Standard does not prohibit an implementation from
tracking the provenance of the bit-pattern representing a value. An
indeterminate value happening to have a bit pattern that is identical to
a bit pattern representing a determinate value is not sufficient to
allow access to the indeterminate value free from undefined behavior."

That suggests to me that real implementation representatives discussed
it, and some of them must have argued that there is more to object
representation and value than a simple mapping. I suggest that there
are other meta-considerations (such as "indeterminate value"), some of
which are crucial to an implementation that wishes to have "enforceable
coding rules":

http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1663.pdf

'c' above is permitted to have a trap representation without even having
that fact coded into its object representation. If I've misunderstood,
then I apologize. If you have further knowledge of the status of DR
#260, then please share! :)

- Shao Miller

Shao Miller

Jan 7, 2013, 8:25:21 AM

On 1/7/2013 04:03, Shao Miller wrote:
>
> void reinterpret(void) {
> union {
> void * vp;
> char * cp;
> } u;
> u.vp = &u;
> u.cp = u.cp + 1;
> /* Hmm ^^^^ */
> }
>
> int main(void) {
> reinterpret();
> return 0;
> }
>
> Otherwise, would anyone else please point out what I might've missed
> about whether or not the above example results in undefined behaviour?
> The "shall"[6.5p7] is outside of a constraint, so that'd seem to be
> undefined behaviour if the lvalue under consideration is 'u.cp'. If the
> lvalue is 'u', then its union type _is_ permitted by 6.5p7 (as
> acknowledged in a previous post, above), but it'd be good to know
> _which_ is the lvalue under consideration.

Mr. Clive D. W. Feather very kindly gave his valuable time and shared in
agreement about this code.

6.5p7 makes this undefined behaviour, just as it does for the original
post's use of the two different union members, despite the two
pointer-to-structure types having the same representation and alignment
requirements.

The penultimate bullet of 6.5p7 regarding unions is so that the
following code is well-defined:

void reinterpret(void) {
    union {
        void * vp;
        char * cp;
    } u, v;
    u.vp = &u;

    /* Union lvalue on right accesses the stored value */
    v = u;
    (void) v;
}

int main(void) {
    reinterpret();
    return 0;
}

I'm glad that if I've lost some marbles, someone else lost the same ones. :)

- Shao Miller

Tim Rentsch

Jan 12, 2013, 5:52:54 PM

Shao Miller <sha0....@gmail.com> writes:

> On 1/6/2013 23:00, Tim Rentsch wrote:
>> Shao Miller <sha0....@gmail.com> writes:
>>> Surely if, in
>>>
>>> void somefunc(void) {
>>> unsigned char c;
>>> /* ... */
>>> }
>>>
>>> 'c' is permitted to have a trap representation due to its
>>> "provenance,"
>>
>> It isn't. You are either mis-remembering or have misunderstood.
>

> Committee Discussion in Defect Report #260: [snip]

The type unsigned char does not have trap representations. There
are no exceptions. Types that don't have trap representations
never have a trap representation.
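
One everyday consequence, sketched here with an invented helper name (copy_bytes): because every bit pattern of an unsigned char is a value, the bytes of any initialized object can be read and written one unsigned char at a time without ever meeting a trap representation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies 'n' bytes one unsigned char at a time. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i;

    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++)
        d[i] = s[i];   /* every byte read here is a valid unsigned char value */
}
```

Copying a double this way and comparing the copy with the source yields equality, since the whole object representation was transferred.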

In C11, accessing a variable like 'c' above before it has been
initialized is undefined behavior. But that is because C11
added (relative to, eg, N1256) a specific statement regarding
such cases, stating explicitly that the behavior is undefined;
it has nothing to do with provenance or trap representations.
Indeed, seeing that this proviso was added in C11 makes it
obvious that DR 260 doesn't apply to cases like the example
above, because otherwise there would be no reason to add it.

Tim Rentsch

Jan 13, 2013, 1:34:46 PM

Let me offer a longer comment explaining what I was trying to say
and why. I preface this with a disclaimer that none of what
follows is meant as a statement of fact but merely my perceptions
and opinions.

I think you have a genuine interest in learning and understanding
C and what the Standard says about the language, and a sincere
desire to participate in discussion in both main newsgroups for
that.

Unfortunately, how you express yourself gets in the way of doing
that. Based on your writing, you seem like someone who is a
careless reader, a lazy writer, and who tends to think with his
mouth more than with his brain. Upon choosing to write, you write
the first thing that pops into your head, wandering like a
meandering river until you arrive at some destination, perhaps
related to what prompted you to start writing in the first place,
and perhaps not. More than any other poster in clc/csc that I am
aware of, you post followups to your own comments, giving second
thoughts, third thoughts, afterthoughts, tangential thoughts, and
(of course) corrections. There isn't anything wrong with doing
any of these things, but doing so as often as you do gives the
impression that you don't think through what you want to say when
you first say -- that is, write -- it.

Just as important is the matter of _how_ you say what you want to
communicate. Many times in reading your writing I don't know what
point you're trying to make or what question you want answered.
Even worse, sometimes I'm not sure _you_ know. This discourages
me from trying to read what you are writing, because it takes so
much effort to try to read it. It seems like either you don't
understand how to express yourself clearly, or you aren't willing
to make the effort to do so.

Besides that, a lot of times you ask questions that it seems like
you could answer yourself if you just took the time to do so. An
example came up recently in comp.std.c where you asked about a
change in wording in a paragraph describing pointer conversions.
This question was easily answerable in only a few minutes either
by doing a text search or by looking in the index. And I don't
think this is an isolated example. It's the relative frequency
that matters -- everyone has a blind spot occasionally, but it
seems to occur more rarely for most people than it does for you.
This further reduces my motivation to try to read your comments
or put effort into crafting a reply.

The suggestions I gave earlier weren't meant as criticism or as a
complaint about your writing. It's true they were born largely
out of exasperation, but my intention was to offer helpful advice.
If you choose to disregard that advice, well that's up to you.
However, I don't feel any obligation to try to help someone who
not only ignores my attempts to be helpful but also asks in a way
that's easy for him but makes things harder for the people he is
asking. You want to reduce your confusion about this example? I
made suggestions that I thought would help you do that. You want
my help in addressing future confusions? Following, or even
clearly making an earnest effort of trying to follow, those same
suggestions is also the best way to do that. You want help but
don't want to change what you do in asking for it? In that case
you shouldn't expect me to try to help or to respond in some
particular way just because it happens to suit what you want.

Shao Miller

Jan 13, 2013, 3:19:05 PM

On 1/12/2013 17:52, Tim Rentsch wrote:
> Shao Miller <sha0....@gmail.com> writes:
>
>> On 1/6/2013 23:00, Tim Rentsch wrote:
>>> Shao Miller <sha0....@gmail.com> writes:
>>>> Surely if, in
>>>>
>>>> void somefunc(void) {
>>>> unsigned char c;
>>>> /* ... */
>>>> }
>>>>
>>>> 'c' is permitted to have a trap representation due to its
>>>> "provenance,"
>>>
>>> It isn't. You are either mis-remembering or have misunderstood.
>>
>> Committee Discussion in Defect Report #260:
>>

>> "In addition the C Standard does not prohibit an implementation from
>> tracking the provenance of the bit-pattern representing a value. An
>> indeterminate value happening to have a bit pattern that is identical to
>> a bit pattern representing a determinate value is not sufficient to
>> allow access to the indeterminate value free from undefined behavior."
>>
>> That suggests to me that real implementation representatives discussed
>> it, and some of them must have argued that there is more to object
>> representation and value than a simple mapping. I suggest that there
>> are other meta-considerations (such as "indeterminate value"), some of
>> which are crucial to an implementation that wishes to have "enforceable
>> coding rules":
>>
>> http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1663.pdf
>>
>> 'c' above is permitted to have a trap representation without even having
>> that fact coded into its object representation. If I've misunderstood,
>> then I apologize. If you have further knowledge of the status of DR
>> #260, then please share! :)
>

> The type unsigned char does not have trap representations. There
> are no exceptions. Types that don't have trap representations
> never have a trap representation.
>
> In C11, accessing a variable like 'c' above before it has been
> initialized is undefined behavior. But that is because C11
> added (relative to, eg, N1256) a specific statement regarding
> such cases, stating explicitly that the behavior is undefined;
> it has nothing to do with provenance or trap representations.
> Indeed, seeing that this proviso was added in C11 makes it
> obvious that DR 260 doesn't apply to cases like the example
> above, because otherwise there would be no reason to add it.
>

Now I think I understand what you are talking about, here. I think you
are discussing a trap representation as if it is associated with a type,
whereas I was discussing it as associated with an indeterminate value.

" 3.19.2
1 indeterminate value
either an unspecified value or a trap representation"

We could probably agree that the value of 'c' is indeterminate.

So you would say that in C11, 'c' has an unspecified value, which, once
read, leads to undefined behaviour, and that it does not have a trap
representation, which, once read, would not lead to undefined behaviour
because of the exemption for character type lvalues.

I would say that the value of 'c' happens to have a bit pattern that is
identical to a bit pattern representing an unspecified value, but that
reading it still leads to undefined behaviour (see committee discussion
above), just as a trap representation would for non-character types of
lvalues.

So if I'd typed "indeterminate value," it mightn't've been clear that I
was emphasizing the undefined behaviour of a read. If I'd typed
"unspecified value," same problem. So I chose to type "trap
representation." But to have avoided disagreement, I suppose I ought to
have typed something longer than that.

I don't know why you might think that DR #260 and DR #338 are so
different. Both cases involve the provenance of an object's value, and
how the mapping of object representation to value isn't the whole story.
DR #260 happened to come 6 years earlier and happens to explain it
pretty nicely, in my opinion. My point way above was that since pointer
representations are opaque (unlike the only other type of scalar), then
it is convenient that "more" of "the whole story" _can_ be encoded directly.

But if you say that my use of "trap representation" in the two previous
posts above is a misuse, I will not offer an argument against that. :)
Thank you for clarifying and for [albeit indirectly] referring to the
change brought about by DR #338. (I hope.)

--
- Shao Miller
--
"Thank you for the kind words; those are the kind of words I like to hear.

Cheerily," -- Richard Harter

Shao Miller

Jan 13, 2013, 4:00:26 PM

On 1/13/2013 13:34, Tim Rentsch wrote:
>
> [some thoughtful and genuine criticisms]
>

Just in case my private e-mail goes into your spam folder: I sincerely
thank you for your criticisms; certainly much food for thought. I have
no complaint about such criticisms except that they're off-topic, _here_. :)

Tim Rentsch

Jan 14, 2013, 3:04:10 PM

Whether a given object representation is a trap representation
is defined only in the context of a particular type.

The term "indeterminate value" is defined in terms of (among
other things) trap representations, not the other way around.

> We could probably agree that the value of 'c' is indeterminate.

That may be true, but it has no bearing on what I said about
trap representations.

> So you would say that in C11, 'c' has an unspecified value, which,
> once read, leads to undefined behaviour, and that it does not have a
> trap representation, which, once read, would not lead to undefined
> behaviour because of the exemption for character type lvalues.

No. The reading takes place only when the behavior is, or might be,
defined. In cases like this one the undefined behavior occurs
before there is any attempt at reading (alternatively, instead
of an attempt at reading). Whether c holds a legal value or
not, or any value at all, is completely immaterial.

> I would say that the value of 'c' happens to have a bit pattern that
> is identical to a bit pattern representing an unspecified value, but
> that reading it still leads to undefined behaviour (see committee
> discussion above), just as a trap representation would for
> non-character types of lvalues. [snip elaboration]

Please read 6.3.2.1 p2 again carefully. No reading takes place
(obviously not counting the possibility that anything could have
taken place because the behavior is undefined). What value c
has, or whether it has a value, or whether storage has even been
allocated for c, has no bearing either on what happens or how
the Standard describes what happens.

Tim Rentsch

Jan 14, 2013, 3:27:48 PM

Shao Miller <sha0....@gmail.com> writes:

> On 1/13/2013 13:34, Tim Rentsch wrote:
>>
>> [some thoughtful and genuine criticisms]
>
> Just in case my private e-mail goes into your spam folder: I
> sincerely thank you for your criticisms; certainly much food
> for thought.

You're welcome, although I didn't mean to criticize, only give my
own impressions. But in any case I hope you find them helpful.

> I have no complaint about such criticisms except that they're
> off-topic, _here_. :)

I usually don't think about whether something is "on topic" or
not. If I think it will on the average provide a benefit to
those people who are likely to read it, typically it gets
posted. Of course I'm not always right in my judgments in
that respect, but, oh well, nobody's perfect.

Shao Miller

Jan 14, 2013, 6:52:11 PM

Yes I think I grok your model (but could be mistaken)... If I
understand you correctly, type T has _exactly_ 2 to the power of (sizeof
(T) * CHAR_BIT) possible object representations. At some time S during
execution, those object representations can be partitioned thusly: Valid
values, trap representations. Furthermore, different object
representations can represent the same value.

Have I described your model correctly? Does S make any difference in
your model? At time S, is the partitioning the same for all objects
with type T, or can it be different for different objects?
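
For T = unsigned char the arithmetic in that model can be checked directly. This is only a sketch; both function names are invented for the example. Since unsigned char has no padding bits, the number of object representations and the number of values coincide, which leaves no room for a trap representation.

```c
#include <assert.h>
#include <limits.h>

/* Number of distinct object representations of T = unsigned char:
   2 to the power of (sizeof (T) * CHAR_BIT). */
static unsigned long uchar_representations(void)
{
    /* Assumption: CHAR_BIT is below 32, the minimum width of unsigned long,
       so the shift below cannot overflow. */
    assert(CHAR_BIT < 32);
    return 1UL << (sizeof (unsigned char) * CHAR_BIT);
}

/* Number of distinct values of unsigned char. */
static unsigned long uchar_values(void)
{
    return (unsigned long) UCHAR_MAX + 1UL;
}
```

For a type with padding bits the first count could exceed the second, and the surplus representations are where trap representations (or duplicate encodings of one value) could live.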

> The term "indeterminate value" is defined in terms of (among
> other things) trap representations, not the other way around.
>

Agreed.

>> We could probably agree that the value of 'c' is indeterminate.
>
> That may be true, but it has no bearing on what I said about
> trap representations.
>

I assume that you are referring to the equivalent of "An object with
type 'unsigned char' cannot possess an object representation that is a
trap representation for that type, as there are no trap representations
for 'unsigned char'"

>> So you would say that in C11, 'c' has an unspecified value, which,
>> once read, leads to undefined behaviour, and that it does not have a
>> trap representation, which, once read, would not lead to undefined
>> behaviour because of the exemption for character type lvalues.
>
> No. The reading takes place only when the behavior is, or might be,
> defined. In cases like this one the undefined behavior occurs
> before there is any attempt at reading (alternatively, instead
> of an attempt at reading). Whether c holds a legal value or
> not, or any value at all, is completely immaterial.
>
>> I would say that the value of 'c' happens to have a bit pattern that
>> is identical to a bit pattern representing an unspecified value, but
>> that reading it still leads to undefined behaviour (see committee
>> discussion above), just as a trap representation would for
>> non-character types of lvalues. [snip elaboration]
>
> Please read 6.3.2.1 p2 again carefully. No reading takes place
> (obviously not counting the possibility that anything could have
> taken place because the behavior is undefined). What value c
> has, or whether it has a value, or whether storage has even been
> allocated for c, has no bearing either on what happens or how
> the Standard describes what happens.
>

Ok, I've re-read it. I don't understand what you've just said in the
last sentence and in the sentence further above it "Whether c...". I'd
better return to "We could probably agree that the value of 'c' is
indeterminate". Is this true?

I'll assume that you're uninterested in my explanation of why I used the
term "trap representation", since you know it to be one thing and I
meant something else.

I'll assume that you're uninterested in discussing potential
similarities between DR #260 and DR #338, since you've already stated
that you do not believe DR #260 is relevant, and explained why.

Once I've understood what you mean by "trap representation," perhaps I
can adjust my "Surely if..." accordingly.

Thank you.

Tim Rentsch

Jan 16, 2013, 3:01:11 AM

Shao Miller <sha0....@gmail.com> writes:

> On 1/14/2013 15:04, Tim Rentsch wrote:
>> Shao Miller <sha0....@gmail.com> writes:
>>
>>> On 1/12/2013 17:52, Tim Rentsch wrote:
>>>> Shao Miller <sha0....@gmail.com> writes:
>>>>
>>>>> On 1/6/2013 23:00, Tim Rentsch wrote:
>>>>>> Shao Miller <sha0....@gmail.com> writes:
>>>>>>> Surely if, in
>>>>>>>
>>>>>>> void somefunc(void) {
>>>>>>> unsigned char c;
>>>>>>> /* ... */
>>>>>>> }
>>>>>>>
>> > > [snip]
>>>
>>> I would say that the value of 'c' happens to have a bit pattern that
>>> is identical to a bit pattern representing an unspecified value, but
>>> that reading it still leads to undefined behaviour (see committee
>>> discussion above), just as a trap representation would for
>>> non-character types of lvalues. [snip elaboration]
>>
>> Please read 6.3.2.1 p2 again carefully. No reading takes place
>> (obviously not counting the possibility that anything could have
>> taken place because the behavior is undefined). What value c
>> has, or whether it has a value, or whether storage has even been
>> allocated for c, has no bearing either on what happens or how
>> the Standard describes what happens.
>

> Ok, I've re-read it. I don't understand ... [snip]

Think, man, think! This isn't that hard of a problem.

Shao Miller

Jan 16, 2013, 5:43:52 AM

Too long; didn't read. :)

But seriously, I'd tried to break this part of the discussion into tiny
little pieces. My previous post had 4 questions. Three of them could
be answered with "yes" or "no" and the other also had two possible
answers. Is that so much to ask?

Instead, you type a one-liner that answers none of the questions,
doesn't demonstrate that agreement or understanding are even possible,
presents me to be stupid, causes me to respond with a disproportionate
amount of text in an attempt to pull some teeth, while at other times
you complain about the amount I type. That hardly seems fair.

I'll tell you what I think: I think you've misunderstood what I don't
understand about what you said. I didn't say that I don't understand
your point about the UB coming before an act of reading.

I don't know if you are trying to hint that this undefined behaviour is
at translation-time, or something. It can't always be.

unsigned char scary(int x) {
    unsigned char c;

    if (x == 42)
        c = '\0';
    return c;
}

What I didn't understand is:

- Why you'd type "Whether c holds a legal value or not, or any value
at all, is completely immaterial". 6.3.2.1p2 seems to indicate that
it's _not_ immaterial. If it's initialized or assigned-to, it
_certainly_ holds a valid value (short of any prior UB).

- Why you'd type "What value c has, or whether it has a value, or
whether storage has even been allocated for c, has no bearing either on
what happens or how the Standard describes what happens". 6.3.2.1p2
seems to indicate that it _does_ have a bearing on what happens. If
it's initialized or assigned-to, it _certainly_ holds a valid value and
has storage (short of any prior UB).

That is to say, if we can prove[2] it has a valid value, then we can
deduce that it was initialized or assigned-to, because that proof[2]
must involve such operations. Please remember that I was originally
talking about the possibility that 'c' could have a trap
representation[1]. If that[1] is false, then the proof[2] of this other
business _needn't_ involve such operations; there're only unspecified
values allowed. But you can't establish that trap representation[1] is
false by explaining how some other form of undefined behaviour is true,
so I didn't understand the relevance of your two statements.

It looked like you were making general statements, but perhaps you were
addressing _only_ the code example at the top, which _could_ be detected
at translation-time?

What about the rest of my previous post? Instead of giving a response
to each of the bits that might lead to some common ground, you've given
a cryptic response to one bit. It's like taking a sound-bite of a
politician saying "Uhhh" and promoting their stupidity in a campaign
against them, because stupid people frequently say "uhhh". If you'd
snipped after four more words, it'd be slightly different. But perhaps
this is where you snip in your mind, as well. "He doesn't understand
something, so he needs to think more!" :)

Geoff

Jan 16, 2013, 12:01:55 PM

On Wed, 16 Jan 2013 05:43:52 -0500, Shao Miller
<sha0....@gmail.com> wrote:

> unsigned char scary(int x) {
> unsigned char c;
>
> if (x == 42)
> c = '\0';
> return c;
> }
>

Eh? What is the value of c when x is not 42?
What is contained in object c prior to the 'if' statement?
What code is executed when x is not 42?

Tim Rentsch

Jan 16, 2013, 1:59:39 PM

You're confusing what you want with which aspects
I believe are worth addressing.

Shao Miller

Jan 16, 2013, 6:32:43 PM

In order, excluding the first question: "Indeterminate", "an
indeterminate value", "'return c;'".

But don't be fooled, this indeterminate value is not just any ordinary
indeterminate value, it is one that we can never know, since using the
lvalue in a context in which it'd normally result in a read causes
undefined behaviour. Depending on who you ask, it does or doesn't look
like a trap representation, but quacks just like one, but a little
earlier than a TR would, in C11.
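
For contrast, a sketch of a variant of 'scary' in which 'return c;' can never meet an uninitialized object. The name not_scary and the default value 1 are invented for illustration; nothing in the thread proposes them.

```c
#include <assert.h>

/* Like 'scary', but 'c' is initialized on every path, so the
   lvalue conversion in 'return c;' always has a stored value to use. */
static unsigned char not_scary(int x)
{
    unsigned char c = 1;   /* hypothetical default for x != 42 */

    if (x == 42)
        c = '\0';
    assert(c == 0 || c == 1);
    return c;
}
```

With the initializer in place, the question of what 'c' "contains" when x is not 42 has an ordinary answer.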

The C11 additional sentence to 6.3.2.1p2 was prompted by Defect Report
#338. The submitter suggested that "indeterminate value" be amended:

"either an unspecified value or a trap representation; or in the case
of an object of automatic storage duration whose address is never taken,
a value that behaves as if it were a trap representation, even for types
that have no trap representations in memory (including type unsigned char)"

In SC22WG14.11380, Mr. Douglas Gwyn suggests, "Trap rep. was an
unfortunate choice of name, having no necessary connection with
trapping; it was only meant to describe any bit configuration that would
not be a valid representation for the type."

In N1300.pdf, Mr. Clark Nelson suggested that besides a "variable"
having a valid value or a trap representation, another possible state
was "uninitialized", and that either of the other two states could still
apply.

Geoff

Jan 16, 2013, 8:32:01 PM

I think what they might have been driving at with "trap representation" was the
possibility that the environment (e.g., a debugger or a debugging runtime) could
initialize the contents to a determinate value. This would be analogous to
Microsoft's debugger putting 0xCC, 0xDD, 0xCF values into various portions of
the process memory space, allowing coders to "trap" uninitialized variables. I
think the proper word would be "detect".

I think the language of the specification would have to allow this facility but
not define it.

Microsoft documents their compiler trap values:
Value         Name                Description
------------  ------------------  --------------------------------------------
0xCD          Clean Memory        Allocated memory via malloc or new but never
                                  written by the application.

0xDD          Dead Memory         Memory that has been released with delete or
                                  free. Used to detect writing through
                                  dangling pointers.

0xED or 0xBD  Aligned Fence       'No man's land' for aligned allocations.
                                  Using a different value here than 0xFD
                                  allows the runtime to detect not only
                                  writing outside the allocation, but to also
                                  detect mixing alignment-specific
                                  allocation/deallocation routines with the
                                  regular ones.

0xFD          Fence Memory        Also known as "no man's land." This is used
                                  to wrap the allocated memory (surrounding it
                                  with a fence) and is used to detect indexing
                                  arrays out of bounds or other accesses
                                  (especially writes) past the end (or start)
                                  of an allocated block.

0xFD or 0xFE  Buffer slack        Used to fill slack space in some memory
                                  buffers (unused parts of `std::string` or
                                  the user buffer passed to `fread()`). 0xFD
                                  is used in VS 2005 (maybe some prior
                                  versions, too), 0xFE is used in VS 2008 and
                                  later.

0xCC                              When the code is compiled with the /GZ
                                  option, uninitialized variables are
                                  automatically assigned to this value (at
                                  byte level).

// the following magic values are done by the OS, not the C runtime:

0xAB          (Allocated Block?)  Memory allocated by LocalAlloc().

0xBAADF00D    Bad Food            Memory allocated by LocalAlloc() with
                                  LMEM_FIXED, but not yet written to.

0xFEEEFEEE                        OS fill heap memory, which was marked for
                                  usage, but wasn't allocated by HeapAlloc()
                                  or LocalAlloc(). Or that memory just has
                                  been freed by HeapFree().
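
For anyone who wants to recognise these patterns in a memory dump, here is a
minimal sketch that maps a 32-bit word to a label. The byte values come from
the table above; the function name, the labels, and the assumption that a
byte pattern such as 0xCD fills the whole word (0xCDCDCDCD) are illustrative
only and not part of any Microsoft API.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns a short label for a 32-bit word that consists entirely of one
 * of the fill patterns listed in the table, or NULL if none matches.
 * The helper and its labels are hypothetical; only the byte values are
 * taken from the table. */
const char *classify_fill(uint32_t word)
{
    switch (word) {
    case 0xCDCDCDCDu: return "clean heap memory (never written)";
    case 0xDDDDDDDDu: return "dead heap memory (already freed)";
    case 0xFDFDFDFDu: return "fence (no man's land)";
    case 0xCCCCCCCCu: return "uninitialized stack variable (/GZ)";
    case 0xABABABABu: return "OS fill: allocated block";
    case 0xBAADF00Du: return "OS fill: allocated, not yet written";
    case 0xFEEEFEEEu: return "OS fill: freed heap memory";
    default:          return NULL;
    }
}
```

A NULL result only means the word is not one of the listed patterns; a match
is a strong hint, not proof, since ordinary data can contain the same bytes.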

Shao Miller

Jan 16, 2013, 8:57:05 PM

While that is certainly consistent with my experiences (having worked
with Microsoft environments for over a decade) where there isn't always
the luxury of having padding bits, and so seems an intuitive notion of
"trap representation" in _practice_, it doesn't seem 100% consistent
with Mr. Douglas Gwyn's note nor with perhaps the strictest reading of
the Standard, as Mr. T. Rentsch points out upthread. Oh well, we can
call it something else in discussion, such as "trappable-unspecified
value", so the term isn't confused with a Standard-defined term.

Wonderful, wonderful summary! I thought null pointers having 0x0C as
the least-significant byte was "a thing," too, but now I can't remember
having seen that documented anywhere.

Keith Thompson

Jan 17, 2013, 10:55:18 AM

Shao Miller <sha0....@gmail.com> writes:
> On 1/16/2013 20:32, Geoff wrote:

[...]

>> Microsoft documents their compiler trap values:
>> Value Name Description
>> ------ -------- -------------------------
>> 0xCD Clean Memory Allocated memory via malloc or new but never
>> written by the application.
>>

[snip]

>> 0xFEEEFEEE OS fill heap memory, which was marked for usage,
>> but wasn't allocated by HeapAlloc() or LocalAlloc().
>> Or that memory just has been freed by HeapFree().
>
> Wonderful, wonderful summary! I thought null pointers having 0x0C as
> the least-significant byte was "a thing," too, but now I can't remember
> having seen that documented anywhere.

I'm fairly sure Microsoft uses all-bits-zero for null pointers. Most
implementations do the same thing, though of course the standard doesn't
require it.
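
Whether a given implementation stores null pointers as all-bits-zero can be
checked by looking at the bytes of a pointer object. A minimal sketch follows;
the function name is made up for illustration, and the result describes only
the implementation it runs on and only the type void *.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if a null pointer to void is stored as all-bits-zero on
 * this implementation, 0 otherwise. Comparing the bytes of an object
 * with memcmp is well defined. */
int null_is_all_bits_zero(void)
{
    void *null_ptr = NULL;                 /* a genuine null pointer   */
    unsigned char zeros[sizeof null_ptr];  /* same size, all bits zero */

    memset(zeros, 0, sizeof zeros);
    return memcmp(&null_ptr, zeros, sizeof null_ptr) == 0;
}
```

On mainstream desktop implementations this is expected to return 1, but a
conforming implementation may return 0.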

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Working, but not speaking, for JetHead Development, Inc.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Shao Miller

Jan 17, 2013, 4:47:53 PM

On 1/17/2013 10:55, Keith Thompson wrote:
> Shao Miller <sha0....@gmail.com> writes:
>> On 1/16/2013 20:32, Geoff wrote:
> [...]
>>> Microsoft documents their compiler trap values:
>>> Value Name Description
>>> ------ -------- -------------------------
>>> 0xCD Clean Memory Allocated memory via malloc or new but never
>>> written by the application.
>>>
> [snip]
>>> 0xFEEEFEEE OS fill heap memory, which was marked for usage,
>>> but wasn't allocated by HeapAlloc() or LocalAlloc().
>>> Or that memory just has been freed by HeapFree().
>>
>> Wonderful, wonderful summary! I thought null pointers having 0x0C as
>> the least-significant byte was "a thing," too, but now I can't remember
>> having seen that documented anywhere.
>
> I'm fairly sure Microsoft uses all-bits-zero for null pointers. Most
> implementations do the same thing, though of course the standard doesn't
> require it.
>

I think all-bits-zero is one null pointer value representation, but I
was talking about "trap representations" in practice (as opposed to a
discussion of those that depend on padding bits). In Windows NT
kernel-land, more often than not I see that when a null pointer is
trapped, it's actually _not_ all-bits-zero; differing in the LSB. The
debugger still calls it a null pointer.

Keith Thompson

Jan 18, 2013, 11:41:51 AM

A null pointer is not a trap representation; it's a perfectly valid
pointer value (that can't legally be dereferenced).

What exactly do you mean when you say "The debugger still calls it a
null pointer"? Does the debugger actually use the phrase "null pointer"
to refer to a pointer value that compares unequal to NULL?

Shao Miller

Jan 18, 2013, 3:53:17 PM

Agreed. And that's kind of what I was getting at... That this
representation is:

1. A valid null pointer representation (behaves as a null pointer in C
code; no undefined behaviour from using this pointer value unless
dereferencing)

2. Very consistently _not_ all-bits-zero, suggesting a practical use for
a debugger to trap (as opposed to a same-sized integer/object with
all-zeroes)

> What exactly do you mean when you say "The debugger still calls it a
> null pointer"? Does the debugger actually use the phrase "null pointer"
> to refer to a pointer value that compares unequal to NULL?
>

It absolutely does. (I see it more often than I'd like, some days. ;) )
I suspect that it's an instance of trap representations in practice,
as opposed to "trap representations" in C theory.

Keith Thompson

Jan 18, 2013, 8:38:07 PM

Shao Miller <sha0....@gmail.com> writes:
> On 1/18/2013 11:41, Keith Thompson wrote:
>> Shao Miller <sha0....@gmail.com> writes:

[...]

>>> I think all-bits-zero is one null pointer value representation, but I
>>> was talking about "trap representations" in practice (as opposed to a
>>> discussion of those that depend on padding bits). In Windows NT
>>> kernel-land, more often than not I see that when a null pointer is
>>> trapped, it's actually _not_ all-bits-zero; differing in the LSB. The
>>> debugger still calls it a null pointer.
>>
>> A null pointer is not a trap representation; it's a perfectly valid
>> pointer value (that can't legally be dereferenced).
>>
>
> Agreed. And that's kind of what I was getting at... That this
> representation is:
>
> 1. A valid null pointer representation (behaves as a null pointer in C
> code; no undefined behaviour from using this pointer value unless
> dereferencing)
>
> 2. Very consistently _not_ all-bits-zero, suggesting a practical use for
> a debugger to trap (as opposed to a same-sized integer/object with
> all-zeroes)

When you say it "behaves as a null pointer in C code", what exactly do
you mean by that?

Does it compare equal to NULL? If not, then it *doesn't* behave as a
null pointer in C code.

>> What exactly do you mean when you say "The debugger still calls it a
>> null pointer"? Does the debugger actually use the phrase "null pointer"
>> to refer to a pointer value that compares unequal to NULL?
>
> It absolutely does. (I see it more often than I'd like, some days. ;) )
> I suspect that it's an instance of trap representations in practice,
> as opposed to "trap representations" in C theory.

A lot of technical terms have specific meanings in C that may not apply
in other contexts. If a debugger refers to something other than a C
null pointer as a "null pointer", then (a) it's a pity that a tool that
deals with C source code uses a term in a manner that's inconsistent
with the C meaning of the term, but (b) it doesn't have much direct
bearing on C (i.e., on anything relevant to this newsgroup).

Can you point to some documentation that discusses this use of the term
"null pointer"?

glen herrmannsfeldt

Jan 18, 2013, 9:31:46 PM

Keith Thompson <ks...@mib.org> wrote:
> Shao Miller <sha0....@gmail.com> writes:

(snip)

>> 2. Very consistently _not_ all-bits-zero, suggesting a practical use for
>> a debugger to trap (as opposed to a same-sized integer/object with
>> all-zeroes)

> When you say it "behaves as a null pointer in C code", what exactly do
> you mean by that?

> Does it compare equal to NULL? If not, then it *doesn't* behave as a
> null pointer in C code.

I can imagine a system that masks off some bits before comparing
it to NULL. Not that I know any that actually do it.

(snip)

> A lot of technical terms have specific meanings in C that may not apply
> in other contexts. If a debugger refers to something other than a C
> null pointer as a "null pointer", then (a) it's a pity that a tool that
> deals with C source code uses a term in a manner that's inconsistent
> with the C meaning of the term, but (b) it doesn't have much direct
> bearing on C (i.e., on anything relevant to this newsgroup).

> Can you point to some documentation that discusses this use of the term
> "null pointer"?

-- glen

Shao Miller

Jan 18, 2013, 11:02:30 PM

Yeah, come to think of it, you're right: I guess it wouldn't've behaved
_exactly_ the same.

>>> What exactly do you mean when you say "The debugger still calls it a
>>> null pointer"? Does the debugger actually use the phrase "null pointer"
>>> to refer to a pointer value that compares unequal to NULL?
>>
>> It absolutely does. (I see it more often than I'd like, some days. ;) )
>> I suspect that it's an instance of trap representations in practice,
>> as opposed to "trap representations" in C theory.
>
> A lot of technical terms have specific meanings in C that may not apply
> in other contexts. If a debugger refers to something other than a C
> null pointer as a "null pointer", then (a) it's a pity that a tool that
> deals with C source code uses a term in a manner that's inconsistent
> with the C meaning of the term, but (b) it doesn't have much direct
> bearing on C (i.e., on anything relevant to this newsgroup).
>

Well I have to offer partial disagreement with (a). I think that the
practical usage of some terms finds Standard C in a good position to
consider adjusting its definitions. :) WinDbg can handle C++ and
assembly too, so the terms it uses seem more likely to be "as practiced"
rather than "as preached".

> Can you point to some documentation that discusses this use of the term
> "null pointer"?
>

On 1/16/2013 20:57, Shao Miller wrote:
> I thought null pointers having 0x0C as the least-significant byte was "a thing," too, but now I can't remember having seen that documented anywhere.

Sorry for the confusion. I was talking about a recollection, as shown
in the sentence, above. What I'm recalling is associated with
"NULL_CLASS_PTR_DEREFERENCE", but please see the sentence above,
regarding documentation.

I thought I remembered the LSB 0x0C pattern was used as "trap
representation" for "null pointers", but where neither of these terms
are quite what they are in C. If I ever come across such documentation,
I'll share it. Maybe Geoff has a clue?

Shao Miller

Jan 18, 2013, 11:15:49 PM

On 1/18/2013 21:31, glen herrmannsfeldt wrote:
> Keith Thompson <ks...@mib.org> wrote:
>> Shao Miller <sha0....@gmail.com> writes:
>
> (snip)
>>> 2. Very consistently _not_ all-bits-zero, suggesting a practical use for
>>> a debugger to trap (as opposed to a same-sized integer/object with
>>> all-zeroes)
>
>> When you say it "behaves as a null pointer in C code", what exactly do
>> you mean by that?
>
>> Does it compare equal to NULL? If not, then it *doesn't* behave as a
>> null pointer in C code.
>
> I can imagine a system that masks off some bits before comparing
> it to NULL. Not that I know any that actually do it.
>

Yes well that's along the lines that it might be. For example, if a
'free'd pointer becomes 0x0000000C, in C terminology, it'd be an
"indeterminate value." But, using C terminology again, it'd be an
"unspecified value" rather than a "trap representation", as the value
can be read and passed around. In C terminology, it would "point to no
object", just as a "null pointer" does. Upon dereferencing though,
Windows could tell from the representation just what the problem was:
Use of a 'free'd pointer. And yes, the higher bits might indicate that
it was a pointer that points to no object ("null class", perhaps we
could call it).

Philip Lantz

Jan 19, 2013, 4:28:22 AM

Keith Thompson wrote:

> Shao Miller writes:
> > Keith Thompson wrote:

> >> Shao Miller writes:
> >>> I think all-bits-zero is one null pointer value representation,
> >>> but I was talking about "trap representations" in practice (as
> >>> opposed to a discussion of those that depend on padding bits).
> >>> In Windows NT kernel-land, more often than not I see that when
> >>> a null pointer is trapped, it's actually _not_ all-bits-zero;
> >>> differing in the LSB. The debugger still calls it a null pointer.
> >>
> >> A null pointer is not a trap representation; it's a perfectly valid
> >> pointer value (that can't legally be dereferenced).
> >
> > Agreed. And that's kind of what I was getting at... That this
> > representation is:
> >
> > 1. A valid null pointer representation (behaves as a null pointer in C
> > code; no undefined behaviour from using this pointer value unless
> > dereferencing)
> >
> > 2. Very consistently _not_ all-bits-zero, suggesting a practical use for
> > a debugger to trap (as opposed to a same-sized integer/object with
> > all-zeroes)
>
> When you say it "behaves as a null pointer in C code", what exactly do
> you mean by that?
>
> Does it compare equal to NULL? If not, then it *doesn't* behave as a
> null pointer in C code.

I would guess that he's seeing something like the following:

    struct {
        int a, b, c, d;
    } *p = NULL;

    p->d = 0;

This traps in the debugger, and the debugger reports a "null pointer
dereference" at address 0x0000000c.
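
The 0x0000000c can be reproduced without touching a null pointer at all. The
sketch below assumes 32-bit members (int32_t stands in for the int of the
example above) and uses made-up names; it computes the address with plain
integer arithmetic, so it shows the arithmetic only and never forms or
dereferences a pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as the struct in the example above, but with a fixed-width
 * member type so the offsets do not depend on the size of int. */
struct four_ints {
    int32_t a, b, c, d;
};

/* Address the hardware would access for member d, given the numeric
 * address of the enclosing struct. Pure integer arithmetic, so no
 * pointer is ever formed or dereferenced. */
uintptr_t address_of_d(uintptr_t struct_address)
{
    return struct_address + offsetof(struct four_ints, d);
}
```

With a struct address of 0, address_of_d yields 0xC, which matches the
address the debugger reports.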

Shao Miller

Jan 19, 2013, 12:08:13 PM

Indeed! Or, perhaps more commonly:

    typedef struct { UINT32 A, B, C, D; } FOO, * PFOO;

    PUINT32 GetFooD(PFOO foo) {
        return &foo->D;
    }

(I don't particularly like pointer typedefs, but pretend they're
Microsoft's.)

Then the caller dereferences the returned pointer, or passes it around
until some other function does. That is, I find that it's not usually
quite as obvious as in your example, and sometimes due to opacity or
abstraction.
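
One way to keep a null argument from turning into a small non-null address is
to check before forming the member address. This is only a sketch: struct foo
and get_foo_d are portable stand-ins for the Windows-style typedefs above, not
anything from a Microsoft header.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable stand-in for the FOO structure used above. */
struct foo {
    uint32_t a, b, c, d;
};

/* Returns the address of member d, or a null pointer if foo is null,
 * so a null argument is passed along as null rather than as null
 * plus an offset. */
uint32_t *get_foo_d(struct foo *foo)
{
    if (foo == NULL) {
        return NULL;
    }
    return &foo->d;
}
```

A caller that later tests the result against NULL then gets the answer it
expects, at the cost of one comparison per call.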

(And thanks for the set-up for the function name; I'm hungry. :) )

Keith Thompson

Jan 19, 2013, 11:03:36 PM

Shao Miller <sha0....@gmail.com> writes:
> On 1/18/2013 21:31, glen herrmannsfeldt wrote:
>> Keith Thompson <ks...@mib.org> wrote:
>>> Shao Miller <sha0....@gmail.com> writes:
>>
>> (snip)
>>>> 2. Very consistently _not_ all-bits-zero, suggesting a practical use for
>>>> a debugger to trap (as opposed to a same-sized integer/object with
>>>> all-zeroes)
>>
>>> When you say it "behaves as a null pointer in C code", what exactly do
>>> you mean by that?
>>
>>> Does it compare equal to NULL? If not, then it *doesn't* behave as a
>>> null pointer in C code.
>>
>> I can imagine a system that masks off some bits before comparing
>> it to NULL. Not that I know any that actually do it.
>
> Yes well that's along the lines that it might be. For example, if a
> 'free'd pointer becomes 0x0000000C, in C terminology, it'd be an
> "indeterminate value."

[...]

How does a free'd pointer *become* anything?

For example:

int *p = malloc(sizeof *p);
// Assume malloc() succeeded
// Assume the representation of p, when viewed as
// bytes, is 0x12, 0x34, 0x56, 0x78
free(p);
// Any attempt to refer to the value of p has undefined behavior.
// But if you examine its representation, say by type-punning it
// as an array of unsigned char, it will still look like
// 0x12, 0x34, 0x56, 0x78

The argument to free() is passed *by value*, so free() can't modify the
contents of the pointer object. The argument needn't even be an lvalue;
free(p + 1 - 1) is perfectly valid.
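
The point about the representation can be checked by comparing the bytes of
the pointer object before and after the call. The sketch below uses a made-up
function name and assumes an ordinary implementation that does not rewrite
pointer objects behind the program's back. It inspects only the bytes of p
and never reads the value of p after free().

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the bytes of the pointer object are the same before and
 * after free(), 0 if they differ, and -1 if malloc() failed. Only the
 * representation is inspected; the pointer value is never read after
 * free(). */
int representation_survives_free(void)
{
    int *p = malloc(sizeof *p);
    unsigned char before[sizeof p];

    if (p == NULL) {
        return -1;
    }
    memcpy(before, &p, sizeof p);   /* snapshot the bytes of p        */
    free(p);                        /* receives a copy of p's value   */
    return memcmp(before, &p, sizeof p) == 0;
}
```

On the implementations I know of this returns 1; the standard's wording
about indeterminate values is the only reason to hedge.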

An implementation *could* do some sort of compiler magic, changing the
representation of a pointer object passed to free(), but (a) I don't
think it would be worth the effort, and (b) it's not clear that it would
even be conforming (the bytes making up the representation of the
pointer are objects in their own right, whose values don't change unless
you write to them). In any case, I'm not aware that Microsoft's
implementation does this -- nor am I aware that it implements "==" on
pointers so that anything other than all-bits-zero would compare equal
to NULL.

There are a lot of strange things an implementation *could* do, but the
topic at hand is what a particular implementation actually does.
Storing some specific byte sequence in uninitialized objects is a great
idea, but it doesn't have much to do with null pointers as C defines the
term.

Shao Miller

Jan 20, 2013, 2:19:43 AM

On 1/19/2013 23:03, Keith Thompson wrote:
> Shao Miller <sha0....@gmail.com> writes:
>> On 1/18/2013 21:31, glen herrmannsfeldt wrote:
>>> Keith Thompson <ks...@mib.org> wrote:
>>>> Shao Miller <sha0....@gmail.com> writes:
>>>
>>> (snip)
>>>>> 2. Very consistently _not_ all-bits-zero, suggesting a practical use for
>>>>> a debugger to trap (as opposed to a same-sized integer/object with
>>>>> all-zeroes)
>>>
>>>> When you say it "behaves as a null pointer in C code", what exactly do
>>>> you mean by that?
>>>
>>>> Does it compare equal to NULL? If not, then it *doesn't* behave as a
>>>> null pointer in C code.
>>>
>>> I can imagine a system that masks off some bits before comparing
>>> it to NULL. Not that I know any that actually do it.
>>
>> Yes well that's along the lines that it might be. For example, if a
>> 'free'd pointer becomes 0x0000000C, in C terminology, it'd be an
>> "indeterminate value."
> [...]
>
> How does a free'd pointer *become* anything?
>

The value of a pointer becomes indeterminate at the end of the
pointed-to object's lifetime. Defect Report #260 discusses this.

Windows "Checked" builds are more easily debugged than their "Free"
counterparts. I don't think a freed pointer's representation changing
is beyond the realm of either possible or useful. But "might," above,
means that this was just a guess, which now seems likely to be wrong,
thanks to Mr. Philip Lantz' more obvious explanation.

However, the masking of bits that Mr. Glen Herrmannsfeldt mentioned
still seems about right. Here is another guess: The "null class
pointer" that WinDbg talks about is some reasonably-sized window around
the unsigned value 0. Given a null pointer of all-zeroes, the
'CONTAINING_RECORD' macro could yield, let's say, 0xFFFFFFF0. Mr.
Lantz' example shows the other direction.
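
To make the 0xFFFFFFF0 guess concrete: a CONTAINING_RECORD-style macro
subtracts a member's offset from the member's address. The sketch below
simulates that subtraction with 32-bit unsigned integers rather than real
pointers. The function name is made up, and the offset of 16 is chosen only
because it produces the value guessed at above.

```c
#include <stdint.h>

/* Simulates, with 32-bit unsigned integers, the subtraction that a
 * CONTAINING_RECORD-style macro performs: member address minus member
 * offset. The result is reduced modulo 2^32 on conversion to the
 * return type, so it is well defined even when member_address is 0. */
uint32_t simulated_containing_record(uint32_t member_address,
                                     uint32_t member_offset)
{
    return member_address - member_offset;
}
```

A member address of 0 with an offset of 16 gives 0xFFFFFFF0, a value just
below the top of a 32-bit address space, which is the other side of the
"window around 0".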

> [...]

>
> There are a lot of strange things an implementation *could* do, but the
> topic at hand is what a particular implementation actually does.
> Storing some specific byte sequence in uninitialized objects is a great
> idea, but it doesn't have much to do with null pointers as C defines the
> term.
>

Which of these most closely resembles the pointer value 0x0000000C in
this Windows scenario?:

1. A trap representation for some pointer type, invoking UB when read
2. A valid value for some pointer type; pointing to an object
3. A null pointer; pointing to no object, invoking UB when dereferenced

If it doesn't compare equal to 'NULL', then it doesn't appear to match
any of these. But if we can fuzz our way from Standard theory to
real-world practice, we might note some similarities to #3.

Perhaps interestingly, Mr. Lantz' explanation actually means the
representation gives hints about which member of a structure was
involved in a bug. In Windows, many "object types" (not to be confused
with the C notions) are C structures beginning with a common initial
sequence. A "bad" pointer might reveal which member of this common
initial sequence was involved in a bug. :) This isn't the "pointer was
freed" guess of above, but still useful!

(By the way, I've asked in another forum about possible documentation
for NULL_CLASS_PTR_DEREFERENCE, although I don't know if you or anyone
else cares.)

Keith Thompson

Jan 20, 2013, 10:57:30 AM

Shao Miller <sha0....@gmail.com> writes:
> On 1/19/2013 23:03, Keith Thompson wrote:
[...]
>> There are a lot of strange things an implementation *could* do, but the
>> topic at hand is what a particular implementation actually does.
>> Storing some specific byte sequence in uninitialized objects is a great
>> idea, but it doesn't have much to do with null pointers as C defines the
>> term.
>
> Which of these most closely resembles the pointer value 0x0000000C in
> this Windows scenario?:
>
> 1. A trap representation for some pointer type, invoking UB when read
> 2. A valid value for some pointer type; pointing to an object
> 3. A null pointer; pointing to no object, invoking UB when dereferenced
>
> If it doesn't compare equal to 'NULL', then it doesn't appear to match
> any of these. But if we can fuzz our way from Standard theory to
> real-world practice, we might note some similarities to #3.

It could easily be #1. For example, if a type `struct foo` has a
member `bar` at offset 12 (0xC), and a `struct foo*` object `ptr`
that has a null pointer value, then `&(ptr->bar)` could easily have
a representation that looks like 0x0000000C.

[...]

> (By the way, I've asked in another forum about possible documentation
> for NULL_CLASS_PTR_DEREFERENCE, although I don't know if you or anyone
> else cares.)

The name NULL_CLASS_PTR_DEREFERENCE wouldn't necessarily refer to
a *null pointer*.

I don't think there's anything strange going on here. None of the
exotic possibilities permitted by the standard (null pointers with a
representation other than all-bits-zero, pointer objects changing
representation after being passed to free(), pointers with different
representations comparing equal) appears to be happening.

Shao Miller

Jan 20, 2013, 3:25:38 PM

On 1/20/2013 10:57, Keith Thompson wrote:
> Shao Miller <sha0....@gmail.com> writes:
>>

>> Which of these most closely resembles the pointer value 0x0000000C in
>> this Windows scenario?:
>>
>> 1. A trap representation for some pointer type, invoking UB when read
>> 2. A valid value for some pointer type; pointing to an object
>> 3. A null pointer; pointing to no object, invoking UB when dereferenced
>>
>> If it doesn't compare equal to 'NULL', then it doesn't appear to match
>> any of these. But if we can fuzz our way from Standard theory to
>> real-world practice, we might note some similarities to #3.
>
> It could easily be #1. For example, if a type `struct foo` has a
> member `bar` at offset 12 (0xC), and a `struct foo*` object `ptr`
> that has a null pointer value, then `&(ptr->bar)` could easily have
> a representation that looks like 0x0000000C.
>

It could be #1, but it isn't #1 in this Windows scenario. Using such a
pointer value doesn't yield undefined behaviour until it's dereferenced.
It'll still compare as unequal with valid pointer values, can still be
assigned to a pointer, etc.

> [...]
>
>> (By the way, I've asked in another forum about possible documentation
>> for NULL_CLASS_PTR_DEREFERENCE, although I don't know if you or anyone
>> else cares.)
>
> The name NULL_CLASS_PTR_DEREFERENCE wouldn't necessarily refer to
> a *null pointer*.
>

It depends on the definition. For the C definition, if it doesn't
compare equal to 'NULL', then it's probably not a null pointer, as you
pointed out earlier. For WinDbg users, it probably is. Hopefully I'll
find out just what Microsoft has to say about it.

> I don't think there's anything strange going on here. None of the
> exotic possibilities permitted by the standard (null pointers with a
> representation other than all-bits-zero, pointer objects changing
> representation after being passed to free(), pointers with different
> representations comparing equal) appears to be happening.
>

I agree with you about this; seems pretty straightforward. I had
wondered about the observed representation 0x0000000C (given Geoff's
post about trap representations[1]), but I'm sure Mr. Lantz' explanation
is right and that this is not such a case.

[1] (Not to be confused with the C definition, but simply meaning a
representation which the debugger can trap/detect.)

Ben Bacarisse

Jan 20, 2013, 6:58:39 PM

Shao Miller <sha0....@gmail.com> writes:

> On 1/20/2013 10:57, Keith Thompson wrote:
>> Shao Miller <sha0....@gmail.com> writes:
>>>
>>> Which of these most closely resembles the pointer value 0x0000000C in
>>> this Windows scenario?:
>>>
>>> 1. A trap representation for some pointer type, invoking UB when read
>>> 2. A valid value for some pointer type; pointing to an object
>>> 3. A null pointer; pointing to no object, invoking UB when dereferenced
>>>
>>> If it doesn't compare equal to 'NULL', then it doesn't appear to match
>>> any of these. But if we can fuzz our way from Standard theory to
>>> real-world practice, we might note some similarities to #3.
>>
>> It could easily be #1. For example, if a type `struct foo` has a
>> member `bar` at offset 12 (0xC), and a `struct foo*` object `ptr`
>> that has a null pointer value, then `&(ptr->bar)` could easily have
>> a representation that looks like 0x0000000C.
>>
>
> It could be #1, but it isn't #1 in this Windows scenario. Using such
> a pointer value doesn't yield undefined behaviour until it's
> dereferenced. It'll still compare as unequal with valid pointer
> values, can still be assigned to a pointer, etc.

All of that seems to be entirely consistent with #1.

Talking about whether something is UB in "this Windows scenario" looks
very odd to me because the term UB is defined by the language, not the
implementation. If the language says something is UB then it is and you
can't conclude from the behaviour of an implementation that it isn't.
All behaviours are consistent with UB.

<snip>
--
Ben.

Philip Lantz

Jan 20, 2013, 7:48:03 PM

Shao Miller wrote:
> Keith Thompson wrote:

> > Shao Miller writes:
> >>
> >> Which of these most closely resembles the pointer value 0x0000000C in
> >> this Windows scenario?:
> >>
> >> 1. A trap representation for some pointer type, invoking UB when read
> >> 2. A valid value for some pointer type; pointing to an object
> >> 3. A null pointer; pointing to no object, invoking UB when dereferenced
> >>
> >> If it doesn't compare equal to 'NULL', then it doesn't appear to match
> >> any of these. But if we can fuzz our way from Standard theory to
> >> real-world practice, we might note some similarities to #3.
> >
> > It could easily be #1. For example, if a type `struct foo` has a
> > member `bar` at offset 12 (0xC), and a `struct foo*` object `ptr`
> > that has a null pointer value, then `&(ptr->bar)` could easily have
> > a representation that looks like 0x0000000C.
>
> It could be #1, but it isn't #1 in this Windows scenario. Using such a
> pointer value doesn't yield undefined behaviour until it's dereferenced.
> It'll still compare as unequal with valid pointer values, can still be
> assigned to a pointer, etc.

The undefined behavior occurred when &ptr->bar was executed (with ptr
equal to NULL). You cannot refer to the language definition to make
sense of anything that happens after that.

Keith Thompson

Jan 20, 2013, 8:07:30 PM

Shao Miller <sha0....@gmail.com> writes:
> On 1/20/2013 10:57, Keith Thompson wrote:
>> Shao Miller <sha0....@gmail.com> writes:
>>>
>>> Which of these most closely resembles the pointer value 0x0000000C in
>>> this Windows scenario?:
>>>
>>> 1. A trap representation for some pointer type, invoking UB when read
>>> 2. A valid value for some pointer type; pointing to an object
>>> 3. A null pointer; pointing to no object, invoking UB when dereferenced
>>>
>>> If it doesn't compare equal to 'NULL', then it doesn't appear to match
>>> any of these. But if we can fuzz our way from Standard theory to
>>> real-world practice, we might note some similarities to #3.
>>
>> It could easily be #1. For example, if a type `struct foo` has a
>> member `bar` at offset 12 (0xC), and a `struct foo*` object `ptr`
>> that has a null pointer value, then `&(ptr->bar)` could easily have
>> a representation that looks like 0x0000000C.
>>
>
> It could be #1, but it isn't #1 in this Windows scenario.

I believe it is.

> Using such a
> pointer value doesn't yield undefined behaviour until it's dereferenced.

Yes, it does. Concretely:

    int main(void) {
        struct foo {
            char c[12];
            int i;
        };

        struct foo *fptr = NULL;

        int *iptr = &(fptr->i);

        return 0;
    }

The evaluation of the expression `&(fptr->i)` has undefined behavior.
If you disagree, please show me where the standard defines its
behavior.

If the observed behavior is that the representation 0x0000000C is stored
in `iptr`, that doesn't mean the behavior is defined.

> It'll still compare as unequal with valid pointer values, can still be
> assigned to a pointer, etc.

It happens to do so in the implementation we're discussing, and probably
in most implementations.

>> [...]
>>
>>> (By the way, I've asked in another forum about possible documentation
>>> for NULL_CLASS_PTR_DEREFERENCE, although I don't know if you or anyone
>>> else cares.)
>>
>> The name NULL_CLASS_PTR_DEREFERENCE wouldn't necessarily refer to
>> a *null pointer*.
>
> It depends on the definition. For the C definition, if it doesn't
> compare equal to 'NULL', then it's probably not a null pointer, as you
> pointed out earlier.

A pointer that doesn't compare equal to NULL *definitely* isn't a null
pointer, by the definition of pointer equality in N1570 6.5.9p6.
(Unless the program's behavior is undefined for other reasons, in which
case anything goes.)
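
In code, the only test that matters for the C meaning of the term is the
comparison itself. A minimal sketch with a made-up helper name:

```c
#include <stddef.h>

/* Returns 1 if the pointer is a null pointer in the C sense, i.e. it
 * compares equal to NULL; returns 0 otherwise. */
int is_null_pointer(const void *p)
{
    return p == NULL;
}
```

Any null pointer, however it was obtained, gives 1 here; a pointer to an
object gives 0. What the bits of the pointer look like plays no part in it.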

> For WinDbg users, it probably is. Hopefully I'll
> find out just what Microsoft has to say about it.

I don't believe so. As far as I can see, the only evidence you've
presented that this thing is a "null pointer" is the identifier
"NULL_CLASS_PTR_DEREFERENCE".

What I believe that name implies is that the pointer value whose
representation looks like 0x0000000C was most likely created as
the result of a null pointer. It doesn't imply that 0x0000000C
is itself a null pointer. Nobody other than you is saying that.
(Unless you can produce some Microsoft documentation that uses the
phrase "null pointer" -- with no other words between "null" and
"pointer" -- to refer to such a pointer value.)

Shao Miller

Jan 20, 2013, 9:30:04 PM

On 1/20/2013 18:58, Ben Bacarisse wrote:
> Shao Miller <sha0....@gmail.com> writes:
>
>> On 1/20/2013 10:57, Keith Thompson wrote:
>>> Shao Miller <sha0....@gmail.com> writes:
>>>>
>>>> Which of these most closely resembles the pointer value 0x0000000C in
>>>> this Windows scenario?:
>>>>
>>>> 1. A trap representation for some pointer type, invoking UB when read
>>>> 2. A valid value for some pointer type; pointing to an object
>>>> 3. A null pointer; pointing to no object, invoking UB when dereferenced
>>>>
>>>> If it doesn't compare equal to 'NULL', then it doesn't appear to match
>>>> any of these. But if we can fuzz our way from Standard theory to
>>>> real-world practice, we might note some similarities to #3.
>>>
>>> It could easily be #1. For example, if a type `struct foo` has a
>>> member `bar` at offset 12 (0xC), and a `struct foo*` object `ptr`
>>> that has a null pointer value, then `&(ptr->bar)` could easily have
>>> a representation that looks like 0x0000000C.
>>>
>>
>> It could be #1, but it isn't #1 in this Windows scenario. Using such
>> a pointer value doesn't yield undefined behaviour until it's
>> dereferenced. It'll still compare as unequal with valid pointer
>> values, can still be assigned to a pointer, etc.
>
> All of that seems to be entirely consistent with #1.
>

I made the mistake of saying "undefined behaviour" there, instead of
"unexpected behaviour." My point was that the unexpected only manifests
from a _dereference_, rather than a _read_.

> Talking about whether something is UB in "this Windows scenario" looks
> very odd to me because the term UB is defined by the language, not the
> implementation. If the language says something is UB then it is and you
> can't conclude from the behaviour of an implementation that it isn't.
>

> <snip>
>

Agreed, and it's supposed to be odd. This discussion is pretty useless
though, because no matter how many times I try to suggest considering "C
definitions" versus "practiced usage," nobody quite seems to; possibly
because the latter is uninteresting to them, or because this Windows
case is uninteresting to them, or because I make mistakes like the above
without qualifying each term as "not to be confused with the C
definition". I thought it would be an easy discussion; it isn't. My
mistake.

And agreed again; we all [ought to] know:

> All behaviours are consistent with UB.

Thankfully, 3 people have now made it abundantly clear that comparing C
terminology to subjects outside of C is not going to be fruitful. :)

Shao Miller

Jan 20, 2013, 9:33:47 PM

Agreed; my mistake. I should not have typed "undefined behaviour," but
"unexpected behaviour." I guess it wasn't obvious that I was trying to
discuss a bit "outside the box" from the C definitions.

As you point out:

    &ptr->bar
    &((*ptr).bar)
       ^^^^------ C UB

Ben Bacarisse

Jan 20, 2013, 9:53:22 PM

How far back? In how many posts must I substitute "unexpected
behaviour" for "undefined behaviour" in order to determine your meaning?
Have you said what "unexpected behaviour" is? Does it depend on who is
doing the expecting? Does any meaning survive after addressing these
questions?

>> Talking about whether something is UB in "this Windows scenario" looks
>> very odd to me because the term UB is defined by the language, not the
>> implementation. If the language says something is UB then it is and you
>> can't conclude from the behaviour of an implementation that it isn't.
>>
>> <snip>
>
> Agreed, and it's supposed to be odd. This discussion is pretty
> useless though, because no matter how many times I try to suggest
> considering "C definitions" versus "practiced usage," nobody quite
> seems to; possibly because the latter is uninteresting to them, or
> because this Windows case is uninteresting to them, or because I make
> mistakes like the above without qualifying each term as "not to be
> confused with the C definition". I thought it would be an easy
> discussion; it isn't. My mistake.

But you seem to make the same mistake when explicitly using C's terms
and without any of the "unexpected/undefined" madness above. E.g.:

| For example, if a free'd pointer becomes 0x0000000C, in C terminology,
| it'd be an "indeterminate value." But, using C terminology again,
| it'd be an "unspecified value" rather than a "trap representation", as
| the value can be read and passed around.

This suggests that you think it can't be a trap representation because it
can be read and passed around. A trap representation is nothing more
than a representation that does not denote a valid value for the type
in question. The effect of using it can be entirely benign.

> And agreed again; we all [ought to] know:
>
>> All behaviours are consistent with UB.
>
> Thankfully, 3 people have now made it abundantly clear that comparing
> C terminology to subjects outside of C is not going to be fruitful. :)

Are you not using C's terms in the quote above?

--
Ben.

Shao Miller

Jan 20, 2013, 10:20:24 PM

On 1/20/2013 20:07, Keith Thompson wrote:
> Shao Miller <sha0....@gmail.com> writes:
>> On 1/20/2013 10:57, Keith Thompson wrote:
>>> Shao Miller <sha0....@gmail.com> writes:
>>>>
>>>> Which of these most closely resembles the pointer value 0x0000000C in
>>>> this Windows scenario?:
>>>>
>>>> 1. A trap representation for some pointer type, invoking UB when read
>>>> 2. A valid value for some pointer type; pointing to an object
>>>> 3. A null pointer; pointing to no object, invoking UB when dereferenced
>>>>
>>>> If it doesn't compare equal to 'NULL', then it doesn't appear to match
>>>> any of these. But if we can fuzz our way from Standard theory to
>>>> real-world practice, we might note some similarities to #3.
>>>
>>> It could easily be #1. For example, if a type `struct foo` has a
>>> member `bar` at offset 12 (0xC), and a `struct foo*` object `ptr`
>>> that has a null pointer value, then `&(ptr->bar)` could easily have
>>> a representation that looks like 0x0000000C.
>>>
>>
>> It could be #1, but it isn't #1 in this Windows scenario.
>
> I believe it is.
>

I'm very glad you think so. I guess that means it resembles Geoff's
explanation of "trap representations" (not to be confused with the C
definition).

>> Using such a
>> pointer value doesn't yield undefined behaviour until it's dereferenced.
>
> Yes, it does. Concretely:
>
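> #include <stddef.h>  /* NULL */
>
> int main(void) {
>     struct foo {
>         char c[12];
>         int i;
>     };
>
>     struct foo *fptr = NULL;
>
>     /* Undefined behavior: fptr->i is evaluated while fptr is null. */
>     int *iptr = &(fptr->i);
>
>     return 0;
> }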
>
> The evaluation of the expression `&(fptr->i)` has undefined behavior.
> If you disagree, please show me where the standard defines its
> behavior.
>
> If the observed behavior is that the representation 0x0000000C is stored
> in `iptr`, that doesn't mean the behavior is defined.
>

My mistake. I should have either stated that I was talking about a
fuzzy, as-observed version of "undefined behaviour", or should have
called it "unexpected behaviour". You are obviously right.
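
As a side note on the offset itself: a minimal sketch, assuming the same `struct foo` layout as in the quoted program, of how the value 12 (0xC) can be obtained without evaluating anything through a null pointer. `offsetof` is the standard macro from `<stddef.h>`; the helper name `foo_i_offset` is invented for illustration, and the result is 12 only if `int` requires no more than 4-byte alignment.

```c
#include <stddef.h>  /* offsetof, size_t */

struct foo {
    char c[12];
    int i;
};

/* Hypothetical helper: byte offset of member i within struct foo.
   No pointer is dereferenced, so no undefined behaviour is involved. */
size_t foo_i_offset(void)
{
    return offsetof(struct foo, i);
}
```

This yields an integer, not a pointer, so the question of whether 0x0000000C is a valid pointer value never arises.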

>> It'll still compare as unequal with valid pointer values, can still be
>> assigned to a pointer, etc.
>
> It happens to do so in the implementation we're discussing, and probably
> in most implementations.
>

Sure.

>>> [...]
>>>
>>>> (By the way, I've asked in another forum about possible documentation
>>>> for NULL_CLASS_PTR_DEREFERENCE, although I don't know if you or anyone
>>>> else cares.)
>>>
>>> The name NULL_CLASS_PTR_DEREFERENCE wouldn't necessarily refer to
>>> a *null pointer*.
>>
>> It depends on the definition. For the C definition, if it doesn't
>> compare equal to 'NULL', then it's probably not a null pointer, as you
>> pointed out earlier.
>
> A pointer that doesn't compare equal to NULL *definitely* isn't a null
> pointer, by the definition of pointer equality in N1570 6.5.9p6.
> (Unless the program's behavior is undefined for other reasons, in which
> case anything goes.)
>

I don't quite understand your use of "definitely" when I had said "If it
doesn't compare equal to 'NULL', then it doesn't appear to match any of
these. But if we can fuzz our way from Standard theory to real-world
practice..." Was this unclear?

But anyway, sure. So you'd say that this dissimilarity weighs enough
that it's closer to #1, and that it's not particularly relevant or
interesting to ask which of the three is a close match, because of the
earlier undefined behaviour. Ok!

As I mentioned before, one of the difficulties for a _programmer_ is
that sometimes _they_get_ these pointers from OS functions, without
having obviously invoked C's UB. Because they can read, assign, and
pass these pointers without unexpected behaviour, it's not like a
signalling NaN, or an integer with improper parity.

>> For WinDbg users, it probably is. Hopefully I'll
>> find out just what Microsoft has to say about it.
>
> I don't believe so. As far as I can see, the only evidence you've
> presented that this thing is a "null pointer" is the identifier
> "NULL_CLASS_PTR_DEREFERENCE".
>
> What I believe that name implies is that the pointer value whose
> representation looks like 0x0000000C was most likely created as
> the result of a null pointer. It doesn't imply that 0x0000000C
> is itself a null pointer. Nobody other than you is saying that.
> (Unless you can produce some Microsoft documentation that uses the
> phrase "null pointer" -- with no other words between "null" and
> "pointer" -- to refer to such a pointer value.)
>

I don't know what you think I'm trying to prove. Are you pointing out
that when I said "WinDbg absolutely calls it a null pointer," I haven't
backed that up?

When programming for the Windows NT kernel, if a pointer is C's 'NULL'
or 0x0000000C, it crashes the computer when it is used for indirect
access, as neither refers to a valid C object. Reading, storing,
passing these values do not crash the computer. What is the practical
difference for an NT driver developer? That they don't compare as equal
with value but only with behaviour? Comparing with 'NULL' yields a
false negative relative to the goal of indirect access. No?

Shao Miller

Jan 21, 2013, 12:05:49 AM

Probably 12 posts back from this one I'm typing now; sorry.

> In how many posts must I substitute "unexpected
> behaviour" for "undefined behaviour" in order to determine your meaning?

5 of mine; sorry.

> Have you said what "unexpected behaviour" is?

Not explicitly, no. I thought it would be obvious that it was the
context of a Windows debugger notifying the programmer, since Geoff had
just discussed that.

> Does it depend on who is
> doing the expecting?

Not really. If the program operates as expected without the debugger
interrupting, that's probably commonly desired by different programmers.

> Does any meaning survive after addressing these
> questions?
>

Yes, I think so. Having seen these answers, do you? :)

>>> Talking about whether something is UB in "this Windows scenario" looks
>>> very odd to me because the term UB is defined by the language, not the
>>> implementation. If the language says something is UB then it is and you
>>> can't conclude from the behaviour of an implementation that it isn't.
>>>
>>> <snip>
>>
>> Agreed, and it's supposed to be odd. This discussion is pretty
>> useless though, because no matter how many times I try to suggest
>> considering "C definitions" versus "practiced usage," nobody quite
>> seems to; possibly because the latter is uninteresting to them, or
>> because this Windows case is uninteresting to them, or because I make
>> mistakes like the above without qualifying each term as "not to be
>> confused with the C definition". I thought it would be an easy
>> discussion; it isn't. My mistake.
>
> But you seem to make the same mistake when explicitly using C's terms
> and without any of the "unexpected/undefined" madness above. E.g.:
>
> | For example, if a free'd pointer becomes 0x0000000C, in C terminology,
> | it'd be an "indeterminate value." But, using C terminology again,
> | it'd be an "unspecified value" rather than a "trap representation", as
> | the value can be read and passed around.
>
> This suggests that you think it can't be a trap representation because it
> can be read and passed around. A trap representation is nothing more
> than a representation that does not denote a valid value for the type
> in question. The effect of using it can be entirely benign.
>

I'm sorry you interpreted it that way. Would it be best to surround
every paragraph of every response with context, to prevent
misunderstanding? In that quote, I thought it was obvious that this was
in the context of using a Microsoft debugger, where the behaviour would
agree with the semantics of an "unspecified value"; not warning the
programmer during a read of the value (which it _could_).

In the summary below, I hope it's clear that a recurring theme was to
establish 0x0000000C as some kind of a trap value, so the quote above
doesn't match with that without deciding that it needs additional context.

Here's a summary of how the conversation has gone, from my perspective.
All quotes are fictional, and names have been omitted:

Me: "If an uninitialized 'unsigned char' can have a trap representation,
then...[something]"

A: "Whatever it has isn't a trap representation, you idiot. It might
not even have a value."

Me: "Why wouldn't it have a value? Is it because it might not
successfully translate? Ok, what about in this code example where it
does translate?"

B: "What would you say it has?"

Me: "An [C] indeterminate value and a magic property \"uninitialized\".
Here's some history, since you asked what I would say..."

B: "Here are Microsoft's documented [non-C] trap values..."

Me: "Nice summary! Say, isn't there another one, 0x0000000C, as a
[non-C] trap for [non-C] null pointers? I don't remember..."

C: "Oh no, that wouldn't be a [C] null pointer."

Me: "Yes, well I was talking about B's [non-C] trap values. I see a
Microsoft debugger catch these things and call them [non-C] null pointers."

C: "A [C] null pointer isn't a [C] trap representation. Does this thing
behave like a [C] null pointer?"

Me (intended): "Well it only behaves badly upon dereferencing."

C: "If it's not a [C] null pointer, then it shouldn't be called one, and
it's off-topic."

D: "There could be multiple [C] null pointer representations."

Me (missing C's cue to drop it): "That could be. Maybe the bits have
something to do with useful debug-info, such as a 'free'd pointer."

C: "That seems unlikely, due to pass-by-value. We're still not talking
about [C] null pointers if it doesn't behave like a [C] null pointer."

Me: "Agreed; it doesn't match. So what's the closest thing,
behaviour-wise? It seems like a [C] null pointer, in some ways."

C: "It could be closest to a [C] trap representation. It seems pretty
straight-forward."

Me (intended): "Well for those who observe 0x0000000C in a Microsoft
debugger, the program behaves well until it's dereferenced, so it's not
closest to a [C] trap representation, but to a [C] null pointer."

You: "It could be closest to a [C] trap representation. And don't say
\"undefined behaviour\" if you don't mean it."

Me: "Oops. True."

>> And agreed again; we all [ought to] know:
>>
>>> All behaviours are consistent with UB.
>>
>> Thankfully, 3 people have now made it abundantly clear that comparing
>> C terminology to subjects outside of C is not going to be fruitful. :)
>
> Are you not using C's terms in the quote above?
>

Yes I was, but it only makes for confusion, so I'll try to avoid that.

Keith Thompson

Jan 21, 2013, 1:02:55 AM

Shao Miller <sha0....@gmail.com> writes:
> On 1/20/2013 20:07, Keith Thompson wrote:
>> Shao Miller <sha0....@gmail.com> writes:
>>> On 1/20/2013 10:57, Keith Thompson wrote:
>>>> Shao Miller <sha0....@gmail.com> writes:
>>>>>
>>>>> Which of these most closely resembles the pointer value 0x0000000C in
>>>>> this Windows scenario?:
>>>>>
>>>>> 1. A trap representation for some pointer type, invoking UB when read
>>>>> 2. A valid value for some pointer type; pointing to an object
>>>>> 3. A null pointer; pointing to no object, invoking UB when dereferenced
>>>>>
>>>>> If it doesn't compare equal to 'NULL', then it doesn't appear to match
>>>>> any of these. But if we can fuzz our way from Standard theory to
>>>>> real-world practice, we might note some similarities to #3.
>>>>
>>>> It could easily be #1. For example, if a type `struct foo` has a
>>>> member `bar` at offset 12 (0xC), and a `struct foo*` object `ptr`
>>>> that has a null pointer value, then `&(ptr->bar)` could easily have
>>>> a representation that looks like 0x0000000C.
>>>>
>>>
>>> It could be #1, but it isn't #1 in this Windows scenario.
>>
>> I believe it is.
>
> I'm very glad you think so. I guess that means it resembles Geoff's
> explanation of "trap representations" (not to be confused with the C
> definition).

I'm not going to go back in this thread to figure out what Geoff's
explanation of "trap representations" is supposed to be.

[...]

>>>>> (By the way, I've asked in another forum about possible documentation
>>>>> for NULL_CLASS_PTR_DEREFERENCE, although I don't know if you or anyone
>>>>> else cares.)
>>>>
>>>> The name NULL_CLASS_PTR_DEREFERENCE wouldn't necessarily refer to
>>>> a *null pointer*.
>>>
>>> It depends on the definition. For the C definition, if it doesn't
>>> compare equal to 'NULL', then it's probably not a null pointer, as you
>>> pointed out earlier.
>>
>> A pointer that doesn't compare equal to NULL *definitely* isn't a null
>> pointer, by the definition of pointer equality in N1570 6.5.9p6.
>> (Unless the program's behavior is undefined for other reasons, in which
>> case anything goes.)
>
> I don't quite understand your use of "definitely" when I had said "If it
> doesn't compare equal to 'NULL', then it doesn't appear to match any of
> these. But if we can fuzz our way from Standard theory to real-world
> practice..." Was this unclear?

There's nothing fuzzy about any of this. A pointer value is a null
pointer if and only if it compares equal to NULL.

> But anyway, sure. So you'd say that this dissimilarity weighs enough
> that it's closer to #1, and that it's not particularly relevant or
> interesting to ask which of the three is a close match, because of the
> earlier undefined behaviour. Ok!
>
> As I mentioned before, one of the difficulties for a _programmer_ is
> that sometimes _they_get_ these pointers from OS functions, without
> having obviously invoked C's UB. Because they can read, assign, and
> pass these pointers without unexpected behaviour, it's not like a
> signalling NaN, or an integer with improper parity.

The C standard does not define the behavior of OS functions. Therefore
a call to an OS function has undefined behavior.

>>> For WinDbg users, it probably is. Hopefully I'll
>>> find out just what Microsoft has to say about it.
>>
>> I don't believe so. As far as I can see, the only evidence you've
>> presented that this thing is a "null pointer" is the identifier
>> "NULL_CLASS_PTR_DEREFERENCE".
>>
>> What I believe that name implies is that the pointer value whose
>> representation looks like 0x0000000C was most likely created as
>> the result of a null pointer. It doesn't imply that 0x0000000C
>> is itself a null pointer. Nobody other than you is saying that.
>> (Unless you can produce some Microsoft documentation that uses the
>> phrase "null pointer" -- with no other words between "null" and
>> "pointer" -- to refer to such a pointer value.)
>
> I don't know what you think I'm trying to prove. Are you pointing out
> that when I said "WinDbg absolutely calls it a null pointer," I haven't
> backed that up?

I don't recall you using those exact words, but yes, if you're saying
that WinDbg calls it a null pointer, you haven't backed it up.

Are you in fact claiming that WinDbg calls this a "null pointer"? If
so, please prove it.

> When programming for the Windows NT kernel, if a pointer is C's 'NULL'
> or 0x0000000C, it crashes the computer when it is used for indirect
> access, as neither refers to a valid C object.

So there's some similarity of behavior between a null pointer
(0x00000000 for this implementation) and a particular non-null pointer
representation (0x0000000C).

> Reading, storing,
> passing these values do not crash the computer.

One of the myriad possible consequences of undefined behavior.

> What is the practical
> difference for an NT driver developer? That they don't compare as equal
> with value but only with behaviour? Comparing with 'NULL' yields a
> false negative relative to the goal of indirect access. No?

No.

Ignoring for the moment the fact that you can't create or access
that value without having invoked undefined behavior:

The expression `(void*)0x0000000C == NULL` yields 0 (false).
This is not a "false negative"; it's a perfectly correct result
indicating that `(void*)0x0000000C` is not a null pointer.

If you want to determine whether a given pointer value may be safely
dereferenced, comparing it to NULL is not the way to do that.
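
A minimal sketch of that comparison, assuming the implementation provides `uintptr_t` and that converting a small integer to `void *` does not itself trap (the conversion is implementation-defined, and the standard allows the result to be a trap representation). The helper name `converts_to_null` is made up for illustration.

```c
#include <stddef.h>  /* NULL */
#include <stdint.h>  /* uintptr_t (optional in the standard) */

/* Hypothetical helper: returns 1 if converting the integer to void *
   produces a null pointer on this implementation, 0 otherwise.
   The integer-to-pointer conversion is implementation-defined. */
int converts_to_null(uintptr_t bits)
{
    void *p = (void *)bits;
    return p == NULL;
}
```

On an implementation whose only null pointer representation is all-bits-zero, `converts_to_null(0x0000000C)` would be expected to return 0. Either way, the result says nothing about whether the pointer may be dereferenced.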

Please be clear: what exactly are you claiming?

glen herrmannsfeldt

Jan 21, 2013, 2:00:52 AM

Keith Thompson <ks...@mib.org> wrote:

(snip)

>> When programming for the Windows NT kernel, if a pointer is C's 'NULL'
>> or 0x0000000C, it crashes the computer when it is used for indirect
>> access, as neither refers to a valid C object.

> So there's some similarity of behavior between a null pointer
> (0x00000000 for this implementation) and a particular non-null pointer
> representation (0x0000000C).

(snip)

> The expression `(void*)0x0000000C == NULL` yields 0 (false).
> This is not a "false negative"; it's a perfectly correct result
> indicating that `(void*)0x0000000C` is not a null pointer.

I would try memcpy() of the 0x0000000C into a pointer variable,
and then compare it to NULL.

> If you want to determine whether a given pointer value may be safely
> dereferenced, comparing it to NULL is not the way to do that.

> Please be clear: what exactly are you claiming?

-- glen

James Kuyper

Jan 21, 2013, 8:17:19 AM

On 01/21/2013 02:00 AM, glen herrmannsfeldt wrote:
> Keith Thompson <ks...@mib.org> wrote:

...

>> So there's some similarity of behavior between a null pointer
>> (0x00000000 for this implementation) and a particular non-null pointer
>> representation (0x0000000C).
>
> (snip)
>
>> The expression `(void*)0x0000000C == NULL` yields 0 (false).
>> This is not a "false negative"; it's a perfectly correct result
>> indicating that `(void*)0x0000000C` is not a null pointer.
>
> I would try memcpy() of the 0x0000000C into a pointer variable,
> and then compare it to NULL.

How? NULL is the name of a macro which must expand into a null pointer
constant. It therefore cannot be an lvalue (at least in C - in C++ the
rules are different), and therefore cannot be compared using memcpy().

I suppose you could define two pointer variables, store 0xC into one,
and the expansion of NULL into the other. However, the result of storing
NULL into a pointer variable need not have a unique representation; it
could sometimes be represented by 0, and other times by 0xC. The results
could easily vary depending upon the type of the pointer variable. Less
likely, but still permitted by the standard, is the possibility that you
get a different null pointer representation each time the assignment is
executed.

More importantly, why would you do it that way? The standard says
nothing about what the results of that comparison should be, and the
results of that comparison have nothing to do with the defined behavior
of null pointers. The only testable behavior associated uniquely with
null pointers is that they all compare equal to each other, and unequal
to any pointer to an actual object or function.
Now, if two valid pointers of the same type have the same
representation, they must compare equal, but the converse is not true.
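
The difference between comparing values and comparing representations can be sketched as follows. This is an illustration only; the helper names `same_value` and `same_representation` are hypothetical, and both assume their arguments are valid pointers.

```c
#include <string.h>  /* memcmp */

/* Value comparison: this is the comparison the standard defines
   for pointers. */
int same_value(const int *a, const int *b)
{
    return a == b;
}

/* Representation comparison: byte-wise, over the pointer objects
   themselves. Identical bytes in two valid pointers of the same type
   imply equal values; differing bytes do not imply unequal values. */
int same_representation(const int *a, const int *b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}
```

Only `same_value` answers the question "is this a null pointer?" when one argument is a null pointer; `same_representation` answers a different question about bit patterns.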
--
James Kuyper

Dr Nick

Jan 21, 2013, 9:13:59 AM

James Kuyper <james...@verizon.net> writes:

> On 01/21/2013 02:00 AM, glen herrmannsfeldt wrote:
>> Keith Thompson <ks...@mib.org> wrote:
> ...
>>> So there's some similarity of behavior between a null pointer
>>> (0x00000000 for this implementation) and a particular non-null pointer
>>> representation (0x0000000C).
>>
>> (snip)
>>
>>> The expression `(void*)0x0000000C == NULL` yields 0 (false).
>>> This is not a "false negative"; it's a perfectly correct result
>>> indicating that `(void*)0x0000000C` is not a null pointer.
>>
>> I would try memcpy() of the 0x0000000C into a pointer variable,
>> and then compare it to NULL.
>
> How? NULL is the name of a macro which must expand into a null pointer
> constant. It therefore cannot be an lvalue (at least in C - in C++ the
> rules are different), and therefore cannot be compared using memcpy().

That's true. But then nothing can be compared using memcpy. Once
you've done the copy you can just use == to do the comparison.

> More importantly, why would you do it that way? The standard says
> nothing about what the results of that comparison should be, and the
> results of that comparison have nothing to do with the defined
> behavior of null pointers.

Well he was asking about a particular implementation as I read it. So
doing it this way would help to find out whether his hypothesis was
correct or not. That sounds quite a good reason to me.

> The only testable behavior associated uniquely with null pointers is
> that they all compare equal to each other, and unequal to any pointer
> to an actual object or function.

Which seems to suggest that several different bit patterns could all be
valid null pointers. So his question (which, my memory may be
failing, seems to be "I'm seeing 0x0000000C where I'd expect to see a
null pointer, I wonder if it's because there are several different null
pointers used for some compiler/implementation magic") doesn't sound
entirely bonkers.

Ben Bacarisse

Jan 21, 2013, 9:12:07 AM

James Kuyper <james...@verizon.net> writes:

> On 01/21/2013 02:00 AM, glen herrmannsfeldt wrote:
>> Keith Thompson <ks...@mib.org> wrote:
> ...
>>> So there's some similarity of behavior between a null pointer
>>> (0x00000000 for this implementation) and a particular non-null pointer
>>> representation (0x0000000C).
>>
>> (snip)
>>
>>> The expression `(void*)0x0000000C == NULL` yields 0 (false).
>>> This is not a "false negative"; it's a perfectly correct result
>>> indicating that `(void*)0x0000000C` is not a null pointer.
>>
>> I would try memcpy() of the 0x0000000C into a pointer variable,
>> and then compare it to NULL.
>
> How? NULL is the name of a macro which must expand into a null pointer
> constant. It therefore cannot be an lvalue (at least in C - in C++ the
> rules are different), and therefore cannot be compared using memcpy().

Presumably that last "memcpy" is a typo. You presumably meant "and
therefore cannot be compared using memcmp()" but why do you think that's
what glen herrmannsfeldt (sic) meant? He is suggesting using memcpy to
get the desired representation into a pointer object and then using ==
to see if it compares equal to a null pointer.

<snip>
--
Ben.

Keith Thompson

Jan 21, 2013, 11:20:21 AM

James Kuyper <james...@verizon.net> writes:
> On 01/21/2013 02:00 AM, glen herrmannsfeldt wrote:
>> Keith Thompson <ks...@mib.org> wrote:
> ...
>>> So there's some similarity of behavior between a null pointer
>>> (0x00000000 for this implementation) and a particular non-null pointer
>>> representation (0x0000000C).
>>
>> (snip)
>>
>>> The expression `(void*)0x0000000C == NULL` yields 0 (false).
>>> This is not a "false negative"; it's a perfectly correct result
>>> indicating that `(void*)0x0000000C` is not a null pointer.
>>
>> I would try memcpy() of the 0x0000000C into a pointer variable,
>> and then compare it to NULL.
>
> How? NULL is the name of a macro which must expand into a null pointer
> constant. It therefore cannot be an lvalue (at least in C - in C++ the
> rules are different), and therefore cannot be compared using memcpy().

I think you've misread what glen wrote (memcpy vs. memcmp?). I believe
he meant something like this:

    uintptr_t x = 0x0000000C;
    void *ptr;
    memcpy(&ptr, &x, sizeof ptr);
    if (ptr == NULL) { ... }

But I don't think glen's suggestion is particularly relevant to
the current discussion. It would be for an implementation in which
pointer-to-integer conversion did something other than just copying
the bits, but I don't believe the implementation we're dealing with
(Microsoft's) does that.
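
For reference, a self-contained sketch of the fragment above, wrapped in a hypothetical helper named `bytes_are_null_pointer`. It assumes `uintptr_t` exists and has the same size as `void *`, and that the copied bytes form a valid `void *` representation; if they do not, reading `ptr` is undefined and the result means nothing.

```c
#include <stddef.h>  /* NULL */
#include <stdint.h>  /* uintptr_t (optional in the standard) */
#include <string.h>  /* memcpy */

/* Hypothetical helper: copy the bytes of an integer into a pointer
   object and report whether that pointer compares equal to NULL.
   Returns 1 or 0 for the comparison, or -1 if the sizes differ. */
int bytes_are_null_pointer(uintptr_t x)
{
    void *ptr;

    /* Assumption: the integer and the pointer occupy the same
       number of bytes; bail out instead of copying a partial value. */
    if (sizeof x != sizeof ptr)
        return -1;

    memcpy(&ptr, &x, sizeof ptr);  /* copies the representation only */
    return ptr == NULL;            /* compares the resulting value   */
}
```

Unlike a cast, `memcpy` bypasses whatever the implementation's integer-to-pointer conversion does, which is the only case where this test would differ from `(void*)0x0000000C == NULL`.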

The standard permits all sorts of exotic pointer behavior:
null pointers with representations other than all-bits-zero,
pointer conversions that do something other than just copying
or reinterpreting the representation, equality operators that
do something other than a simple bit-by-bit comparison, etc.
And any of those things could lead to a pointer with the same
representation as the integer 0x0000000C being a null pointer.
As far as I know, Microsoft's implementation doesn't do any of
these things; a pointer value with an all-bits-zero representation
is the one and only null pointer for a given pointer type, a pointer
whose representation looks like 0x0000000C is not a null pointer, and
nothing in Microsoft's documentation refers to it as a null pointer.
(Code that dereferences a pointer with that representation is
reasonably assumed, by the debugger, to be the result of referring
to a member of a structure "pointed to" by a null pointer.)

Or perhaps I've misunderstood the point Shao Miller is trying
to make.

Ben Bacarisse

Jan 21, 2013, 11:31:38 AM

I don't recall Geoff saying that anything was unexpected. That's your
phrase that you admit you have not defined. I can't fathom what you
think is unexpected behaviour or what point that would make. Should I
be able to work it out?

Geoff's post looks simple and correct: the standard permits trap
representations that produce undefined behaviour so an implementation is
permitted to use special values to trigger interesting effects (either
in a debugger or elsewhere).

>> Does it depend on who is
>> doing the expecting?
>
> Not really. If the program operates as expected without the debugger
> interrupting, that's probably commonly desired by different
> programmers.
>
>> Does any meaning survive after addressing these
>> questions?
>
> Yes, I think so. Having seen these answers, do you? :)

No. I have no idea what your point is anymore.

No, I read in context and try to respond in context as well. I don't
think I missed the context.

> In that quote, I thought it was obvious that this
> was in the context of using a Microsoft debugger, where the behaviour
> would agree with the semantics of an "unspecified value"; not warning
> the programmer during a read of the value (which it _could_).

It could be an unspecified value (you are explicitly using C terms here)
if it is a valid pointer value. It may well be. Is it? It may equally
well not be. It could be either a trap representation or an unspecified
value but you seem to suggest that it is one not the other. You seem to
suggest that one possibility is more likely than the other for reasons
that are spurious. The best evidence that it might not be a trap
representation is that it's a valid pointer, but you give no evidence for
that -- quite the contrary in fact.

> In the summary below, I hope it's clear that a recurring theme was to
> establish 0x0000000C as some kind of a trap value, so the quote above
> doesn't match with that without deciding that it needs additional
> context.
>
> Here's a summary of how the conversation has gone, from my
> perspective. All quotes are fictional, and names have been omitted:
>
> Me: "If an uninitialized 'unsigned char' can have a trap
> representation, then...[something]"
>
> A: "Whatever it has isn't a trap representation, you idiot. It might
> not even have a value."
>
> Me: "Why wouldn't it have a value? Is it because it might not
> successfully translate? Ok, what about in this code example where it
> does translate?"
>
> B: "What would you say it has?"
>
> Me: "An [C] indeterminate value and a magic property
> \"uninitialized\". Here's some history, since you asked what I would
> say..."
>
> B: "Here are Microsoft's documented [non-C] trap values..."

If this was Geoff, he correctly called them "trap representations"
(rather than values) and he referred to "they" meaning the C committee.
He was talking about C trap representations, not "[non-C] trap values".

> Me: "Nice summary! Say, isn't there another one, 0x0000000C, as a
> [non-C] trap for [non-C] null pointers? I don't remember..."
>
> C: "Oh no, that wouldn't be a [C] null pointer."
>
> Me: "Yes, well I was talking about B's [non-C] trap values. I see a
> Microsoft debugger catch these things and call them [non-C] null
> pointers."

I think you missed Geoff's point.

Ben.

Tim Rentsch

Jan 21, 2013, 12:23:17 PM

Philip Lantz <p...@canterey.us> writes:

>> [snip]
>
> The undefined behavior occurred when &ptr->bar was executed (with ptr
> equal to NULL). [snip]

Actually just slightly before -- when ptr is a null pointer,
evaluating ptr->bar is already undefined behavior.

Geoff

Jan 21, 2013, 12:27:36 PM

On Thu, 17 Jan 2013 16:47:53 -0500, Shao Miller <sha0....@gmail.com> wrote:

>On 1/17/2013 10:55, Keith Thompson wrote:
>> Shao Miller <sha0....@gmail.com> writes:
>>> On 1/16/2013 20:32, Geoff wrote:
>> [...]
>>>> Microsoft documents their compiler trap values:
>>>> Value Name Description
>>>> ------ -------- -------------------------
>>>> 0xCD Clean Memory Allocated memory via malloc or new but never
>>>> written by the application.
>>>>
>> [snip]
>>>> 0xFEEEFEEE OS fill heap memory, which was marked for usage,
>>>> but wasn't allocated by HeapAlloc() or LocalAlloc().
>>>> Or that memory just has been freed by HeapFree().
>>>
>>> Wonderful, wonderful summary! I thought null pointers having 0x0C as
>>> the least-significant byte was "a thing," too, but now I can't remember
>>> having seen that documented anywhere.
>>
>> I'm fairly sure Microsoft uses all-bits-zero for null pointers. Most
>> implementations do the same thing, though of course the standard doesn't
>> require it.
>>
>
>I think all-bits-zero is one null pointer value representation, but I
>was talking about "trap representations" in practice (as opposed to a
>discussion of those that depend on padding bits). In Windows NT
>kernel-land, more often than not I see that when a null pointer is
>trapped, it's actually _not_ all-bits-zero; differing in the LSB. The
>debugger still calls it a null pointer.
>

I'm not sure that when Windows traps in the manner you speak of that it's a NULL
pointer exception. It's most likely an x86 protected mode exception being
reported due to dereference of a pointer outside the process allocated virtual
address space. This mechanism is outside the purview of the C standard.

Can you post a simple code example that exhibits the behavior you describe?

Shao Miller

Jan 21, 2013, 12:37:17 PM

On 1/21/2013 01:02, Keith Thompson wrote:
> Shao Miller <sha0....@gmail.com> writes:
>>
>> Was this unclear?
>
> There's nothing fuzzy about any of this. A pointer value is a null
> pointer if and only if it compares equal to NULL.
>

Ok, I miscommunicated, then. I agree with your notion of a null pointer.

>> What is the practical
>> difference for an NT driver developer? That they don't compare as equal
>> with value but only with behaviour? Comparing with 'NULL' yields a
>> false negative relative to the goal of indirect access. No?
>
> No.
>
> Ignoring for the moment the fact that you can't create or access
> that value without having invoked undefined behavior:
>

Minor nit-pick: Why not? One can modify the object representation.
Wouldn't we have to decide that it's a trap representation before
suggesting undefined behaviour?

> The expression `(void*)0x0000000C == NULL` yields 0 (false).
> This is not a "false negative"; it's a perfectly correct result
> indicating that `(void*)0x0000000C` is not a null pointer.
>

Which is why I wrote "relative to the goal of indirect access."

> If you want to determine whether a given pointer value may be safely
> dereferenced, comparing it to NULL is not the way to do that.
>

So true! Windows NT has 'MmIsAddressValid' and some other methods.
However, this isn't recommended. It seems pretty common in NT that
there are functions taking an "optional" pointer value. The cheapest
way to test would be 'if (!ptr)'. But if the caller passes 0x0000000C,
we most likely will crash.

> Please be clear: what exactly are you claiming?
>

Nothing much; sorry. Let me try to make some fun definitions, deriving
from the C ones (plus an omitted definition of "debugger"):

- Debug representation: An object representation which provides
information to a debugger. A debug representation can represent the
same possibilities as an indeterminate value (an unspecified value or a
trap representation).

- Nasty representation: A trap representation for a given type which,
when read by an lvalue expression with that type, causes the program to
terminate and provides an implementation-defined prompt for a debugging
opportunity with a debugger.

- VA32: Any pointer type with a representation that is 32 bits and
which has no trap representations.

- Unsafe pointer: Any pointer having a VA32 type and having a value
that does not refer to any object. If such a pointer value is used in
an attempt to access the stored value of a pointed-to object, the
behaviour is undefined.

- Lull pointer: An unsafe pointer having a value that compares
unequal with a null pointer and having a debug representation which,
when interpreted by a debugger, suggests (but does not guarantee) that a
recent operation expected a non-null pointer and was provided with a
null pointer. Such a pointer can lull a program into a false sense of
safety for future operations because it compares unequal with a null
pointer. Nevertheless, a lull pointer may be used for all purposes,
except that the note in the description for "unsafe pointer" still applies.

After Geoff's summary of what we might call "debug representations," I
thought I remembered 0x0000000C being another one. I wondered if it
might carry status information, in particular. Mr. Philip Lantz'
explanation makes it likely that it is what we might call a "lull
pointer". I shouldn't've argued with you that it wasn't a trap
representation; that was a mistake. All I meant was that it didn't
resemble what we might call a "nasty representation".

A lull pointer is an unsafe pointer. An unsafe pointer has a VA32 type.
A VA32 type has no trap representations. A nasty representation is a
trap representation. So: A lull pointer cannot have a nasty
representation, but still has a debug representation.

But who cares? 8-)}

Shao Miller

Jan 21, 2013, 12:46:06 PM

I thought it was obvious that the mechanism was via page fault and that
the bits of the attempted address were examined in order to determine
some useful information about the nature of a recent problem. All I was
wondering was if 0x0000000C had a particular meaning for debugging, like
your values above.

Mr. Philip Lantz suggested that the origin of such a thing is from a
pointer resulting from a computation involving a null pointer. (See
below.) Having read his post, this seems pretty obvious to be the
likely case, to me.

On 1/19/2013 04:28, Philip Lantz wrote:
> I would guess that he's seeing something like the following:
>
> struct {
> int a, b, c, d;
> } *p = NULL;
>
> p->d = 0;
>
> This traps in the debugger, and the debugger reports a "null pointer
> dereference" at address 0x0000000c.
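
The arithmetic behind that address can be sketched as follows. This is a minimal illustration, assuming a 4-byte int and no padding between members (typical for 32-bit Windows, but not guaranteed by the C standard); the tag four_ints and the function offset_of_d are names invented here. With a null pointer whose representation is all-bits-zero, an access to p->d would be attempted at address 0 plus this offset.

```c
#include <stddef.h>

/* Same members as the struct in the quoted example, given a tag
   here (hypothetical name) so that offsetof can refer to it. */
struct four_ints {
    int a, b, c, d;
};

/* Byte offset of member d from the start of the struct. */
size_t offset_of_d(void)
{
    return offsetof(struct four_ints, d);
}
```

Under the stated assumptions the result is 12, i.e. 0xC, which matches the address the debugger reports.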

Shao Miller

Jan 21, 2013, 2:07:59 PM

On 1/21/2013 11:31, Ben Bacarisse wrote:
> Shao Miller <sha0....@gmail.com> writes:
>>
>> Not explicitly, no. I thought it would be obvious that it was the
>> context of a Windows debugger notifying the programmer, since Geoff
>> had just discussed that.
>
> I don't recall Geoff saying that anything was unexpected. That's your
> phrase that you admit you have not defined. I can't fathom what you
> think is unexpected behaviour or what point that would make. Should I
> be able to work it out?
>

No, I'm explaining the nature of my mistake. "I thought it would be
obvious." <- Wrong.

> Geoff's post looks simple and correct: the standard permits trap
> representations that produce undefined behaviour so an implementation is
> permitted to use special values to trigger interesting effects (either
> in a debugger or elsewhere).
>

I agree that his post does look that way and that C does allow for that.
There's a subtle bit here, though: If we're discussing a program, once
that program invokes undefined behaviour, anything goes. However, if
I'm not mistaken, with Windows NT, if you have a pointer value which
does not point to an object, the program will continue to operate as per
the C semantics until such a time (if ever) as that pointer might be
used for indirect access.

So it is imprecise to say that such a value has a trap representation,
because the behaviour is still well-defined. Otherwise, the last
sentence of N1570's 6.5.3.2p4 is redundant:

"If an invalid value has been assigned to the pointer, the behavior
of the unary * operator is undefined.102)"

But anyway, a pointer value pointing to no object can still be "trapped"
by Windows during indirection and can still provide useful information
to a Windows debugger. It just does not precisely match C's trap
representation, for behavioural differences.

>
> No I read in context and try to respond in context as well. I don't
> think I missed the context.
>

Well then that's my mistake.

>> In that quote, I thought it was obvious that this
>> was in the context of using a Microsoft debugger, where the behaviour
>> would agree with the semantics of an "unspecified value"; not warning
>> the programmer during a read of the value (which it _could_).
>
> It could be an unspecified value (you are explicitly using C terms here)
> if it is a valid pointer value. It may well be. Is it? It may equally
> well not be. It could be either a trap representation or an unspecified
> value but you seem to suggest that it is one not the other. You seem to
> suggest that one possibility is more likely than the other for reasons
> that are spurious. The best evidence that it might not be a trap
> representation is that it's a valid pointer, but you give no evidence for
> that -- quite the contrary in fact.
>

I'd say that it is not a trap representation.

Take Windows NT's 'IRP' structure. It has a sub-member called
'Tail.Overlay.DriverContext', which is an array of 4 'void *'. This is
one of _very_few_ places where a driver can associate information with
an IRP, and is extremely valuable for that reason.

The implementation defines the results of casting an appropriately-sized
integer value to a 'void *', so such an integer can be "passed" via this
mechanism. We _certainly_ would not wish to believe that this results
in undefined behaviour, so we certainly would not wish to believe that
such a result is a trap representation.
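
The pattern described here can be sketched in C. This is an illustration only: struct fake_request is a made-up stand-in and does not reproduce the real IRP layout, and stash_integer / fetch_integer are hypothetical names. The integer-to-pointer conversion is implementation-defined (N1570 6.3.2.3p5), so the round trip is something the implementation promises, not the C standard.

```c
#include <stdint.h>

/* Hypothetical stand-in for a driver-private slot; NOT the real IRP. */
struct fake_request {
    void *driver_context[4];
};

/* Store an integer in a pointer-typed slot. */
void stash_integer(struct fake_request *req, uintptr_t value)
{
    req->driver_context[0] = (void *)value;  /* implementation-defined */
}

/* Convert the stored pointer back to an integer. The slot is never
   dereferenced; it is only converted. */
uintptr_t fetch_integer(const struct fake_request *req)
{
    return (uintptr_t)req->driver_context[0];
}
```

On the common flat-address implementations the value comes back unchanged; the sketch relies on that implementation behaviour and nothing more.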

>>
>> B: "Here are Microsoft's documented [non-C] trap values..."
>
> If this was Geoff, he correctly called them "trap representation"
> (rather than values) and he referred to "they" meaning the C committee.
> He was talking about C trap representations not "[non-C] trap values".
>

This was in regards to the lower part of his post where he specifically
says "Microsoft documents their compiler trap values".

>> Me: "Nice summary! Say, isn't there another one, 0x0000000C, as a
>> [non-C] trap for [non-C] null pointers? I don't remember..."
>>
>> C: "Oh no, that wouldn't be a [C] null pointer."
>>
>> Me: "Yes, well I was talking about B's [non-C] trap values. I see a
>> Microsoft debugger catch these things and call them [non-C] null
>> pointers."
>
> I think you missed Geoff's point.
>

I doubt it.

Keith Thompson

Jan 21, 2013, 3:17:03 PM

Shao Miller <sha0....@gmail.com> writes:
> On 1/21/2013 01:02, Keith Thompson wrote:
>> Shao Miller <sha0....@gmail.com> writes:
>>>
>>> Was this unclear?
>>
>> There's nothing fuzzy about any of this. A pointer value is a null
>> pointer if and only if it compares equal to NULL.
>>
>
> Ok, I miscommunicated, then. I agree with your notion of a null pointer.
>
>>> What is the practical
>>> difference for an NT driver developer? That they don't compare as equal
>>> with value but only with behaviour? Comparing with 'NULL' yields a
>>> false negative relative to the goal of indirect access. No?
>>
>> No.
>>
>> Ignoring for the moment the fact that you can't create or access
>> that value without having invoked undefined behavior:
>
> Minor nit-pick: Why not? One can modify the object representation.
> Wouldn't we have to decide that it's a trap representation before
> suggesting undefined behaviour?

Good point. Yes, you can modify the representation of a pointer object
without undefined behavior:

int *p;
*(uintptr_t*)&p = 0x0000000C;

Some, but not all, operations that result in such a pointer value have
undefined behavior.
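
A variant of the same idea that goes through memcpy, so the pointer object is only ever touched as bytes and no question arises about accessing an int * object through a uintptr_t lvalue. A minimal sketch, assuming uintptr_t and void * have the same size (true on common Windows targets, not required by the standard); store_pointer_bits and load_pointer_bits are hypothetical helper names.

```c
#include <stdint.h>
#include <string.h>

/* Overwrite the bytes of a pointer object with the bytes of 'bits'.
   Returns 1 on success, 0 if the two sizes differ. */
int store_pointer_bits(void **slot, uintptr_t bits)
{
    if (sizeof *slot != sizeof bits)
        return 0;
    memcpy(slot, &bits, sizeof bits);
    return 1;
}

/* Read the bytes back into an integer. The object is copied as raw
   bytes and is never read as a pointer value. */
uintptr_t load_pointer_bits(void *const *slot)
{
    uintptr_t bits = 0;
    if (sizeof *slot == sizeof bits)
        memcpy(&bits, slot, sizeof bits);
    return bits;
}
```

Storing and reloading the bytes this way is well defined; whether the stored pattern is a valid pointer value, an invalid one, or a trap representation only matters once the object is read as a pointer.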

>> The expression `(void*)0x0000000C == NULL` yields 0 (false).
>> This is not a "false negative"; it's a perfectly correct result
>> indicating that `(void*)0x0000000C` is not a null pointer.
>
> Which is why I wrote "relative to the goal of indirect access."

Comparing a pointer to NULL, if you don't know in advance that it's
either a null pointer or a valid pointer to some object, does not meet
"the goal of indirect access". The phrase "relative to the goal of
indirect access" is meaningless in this context, or just plain wrong.
You might as well talk about "comparing an integer to zero relative to
the goal of determining whether it's even".

>> If you want to determine whether a given pointer value may be safely
>> dereferenced, comparing it to NULL is not the way to do that.
>>
>
> So true! Windows NT has 'MmIsAddressValid' and some other methods.
> However, this isn't recommended. It seems pretty common in NT that
> there are functions taking an "optional" pointer value. The cheapest
> way to test would be 'if (!ptr)'. But if the caller passes 0x0000000C,
> we most likely will crash.
>
>> Please be clear: what exactly are you claiming?
>>
>
> Nothing much; sorry. Let me try to make some fun definitions, deriving
> from the C ones (plus an omitted definition of "debugger"):
>

[snip]

> - VA32: Any pointer type with a representation that is 32 bits and
> which has no trap representations.

What makes you think that 32-bit pointers on MS Windows have no
trap representations? `(void*)0x0000000C` almost certainly is a
trap representation for the implementation in question. (I'm using
the phrase "trap representation" as the C standard uses it; I lack
interest in any other meanings *unless* some documentation uses
that exact phrase with a different meaning.)

> - Unsafe pointer: Any pointer having a VA32 type and having a value
> that does not refer to any object. If such a pointer value is used in
> an attempt to access the stored value of a pointed-to object, the
> behaviour is undefined.

Such a pointer value is either a null pointer or a trap representation.

[...]

> But who cares? 8-)}

I no longer do. Whatever relevant claims you were making, you've now
quietly backed away from them. We could have avoided wasting a great
deal of time if you'd done so sooner.

glen herrmannsfeldt

Jan 21, 2013, 3:41:22 PM

Keith Thompson <ks...@mib.org> wrote:

(snip, someone wrote)

>>>> The expression `(void*)0x0000000C == NULL` yields 0 (false).
>>>> This is not a "false negative"; it's a perfectly correct result
>>>> indicating that `(void*)0x0000000C` is not a null pointer.

(then I wrote)

>>> I would try memcpy() of the 0x0000000C into a pointer variable,
>>> and then compare it to NULL.

>> How? NULL is the name of a macro which must expand into a null pointer
>> constant. It therefore cannot be an lvalue (at least in C - in C++ the
>> rules are different), and therefore cannot be compared using memcpy().

> I think you've misread what glen wrote (memcpy vs. memcmp?). I believe
> he meant something like this:

> uintptr_t x = 0x0000000C;
> void *ptr;
> memcpy(&ptr, &x, sizeof ptr);
> if (ptr == NULL) { ... }

Yes.

> But I don't think glen's suggestion is particularly relevant to
> the current discussion. It would be for an implementation in which
> pointer-to-integer conversion did something other than just copying
> the bits, but I don't believe the implementation we're dealing with
> (Microsoft's) does that.

Yes, but if you wanted to test that, even if you don't believe
it very likely, seems to me that would work.

I might also test 0x0C000000 in case someone got the endianness
wrong.
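
Wrapped up as a function, that test might look like the sketch below. It assumes uintptr_t and void * have the same size, and that reading a pointer holding such bits does not itself trap on the implementation at hand (the standard gives no such guarantee); bits_compare_equal_to_null is a name invented for the illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if a pointer object holding these bits compares equal to
   NULL, 0 if it does not, and -1 if the sizes differ and the test
   does not apply. */
int bits_compare_equal_to_null(uintptr_t bits)
{
    void *ptr;

    if (sizeof ptr != sizeof bits)
        return -1;
    memcpy(&ptr, &bits, sizeof ptr);
    return ptr == NULL;
}
```

Calling it with both 0x0000000C and 0x0C000000 covers the byte-order case as well.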

> The standard permits all sorts of exotic pointer behavior:
> null pointers with representations other than all-bits-zero,
> pointer conversions that do something other than just copying
> or reinterpreting the representation, equality operators that
> do something other than a simple bit-by-bit comparison, etc.
> And any of those things could lead to a pointer with the same
> representation as the integer 0x0000000C being a null pointer.

> As far as I know, Microsoft's implementation doesn't do any of
> these things; a pointer value with an all-bits-zero representation
> is the one and only null pointer for a given pointer type, a pointer
> whose representation looks like 0x0000000C is not a null pointer, and
> nothing in Microsoft's documentation refers to it as a null pointer.
> (Code that dereferences a pointer with that representation is
> reasonably assumed, by the debugger, to be the result of referring
> to a member of a structure "pointed to" by a null pointer.)

As far as I know, too, but then I never looked. Well, I haven't
used any MS compilers for long enough that I didn't have any
reason to look.

> Or perhaps I've misunderstood the point Shao Miller is trying
> to make.

Does seem that it could be useful for a (person using a) debugger
to know where an unexpected NULL pointer came from. The cost isn't
all that high, though you would probably have to do it consistently.
(Someone might link to a non-debug version of a library).

-- glen

glen herrmannsfeldt

Jan 21, 2013, 3:48:46 PM

Shao Miller <sha0....@gmail.com> wrote:

(snip, someone wrote)

>> Can you post a simple code example that exhibits the behavior you describe?

> I thought it was obvious that the mechanism was via page fault and that
> the bits of the attempted address were examined in order to determine
> some useful information about the nature of a recent problem. All I was
> wondering was if 0x0000000C had a particular meaning for debugging, like
> your values above.

If anyone ever used large model (48 bit pointers, with a 16 bit segment
selector and 32 bit offset) on 80386 and later processors, there are
many interesting things one could do.

> Mr. Philip Lantz suggested that the origin of such a thing is from a
> pointer resulting from a computation involving a null pointer. (See
> below.) Having read his post, this seems pretty obvious to be the
> likely case, to me.

I have no idea how MS does the page tables, though.

If the whole page with high bits zero is invalid, one might be able
to do some interesting things.

-- glen

Ben Bacarisse

Jan 21, 2013, 4:05:11 PM

Shao Miller <sha0....@gmail.com> writes:

> On 1/21/2013 11:31, Ben Bacarisse wrote:

<snip>

>> Geoff's post looks simple and correct: the standard permits trap
>> representations that produce undefined behaviour so an implementation is
>> permitted to use special values to trigger interesting effects (either
>> in a debugger or elsewhere).
>
> I agree that his post does look that way and that C does allow for
> that. There's a subtle bit here, though: If we're discussing a
> program, once that program invokes undefined behaviour, anything goes.
> However, if I'm not mistaken, with Windows NT, if you have a pointer
> value which does not point to an object, the program will continue to
> operate as per the C semantics until such a time (if ever) as that
> pointer might be used for indirect access.

Of course. The "C semantics" are exactly as you describe: "anything
goes". If such a program caused the machine to halt, that, too, would
be operating as per "C semantics".

> So it is imprecise to say that such a value has a trap representation,
> because the behaviour is still well-defined. Otherwise, the last
> sentence of N1570's 6.5.3.2p4 is redundant:
>
> "If an invalid value has been assigned to the pointer, the behavior
> of the unary * operator is undefined.102)"

Presumably you read the footnote so you must be aware of all the invalid
values that this clause covers.

A particular bit-pattern (0xC in this case, I think) can either
represent a valid value (for the pointer type in question), an invalid
value, or it can be a trap representation. These three possibilities
are, at a particular time in the program's execution, mutually
exclusive. The middle category, invalid pointer values, needs 6.5.3.2
p4.
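
One way to picture the middle category: footnote 102 names a null pointer among the invalid values for unary *, yet such a value can be stored, copied and compared with fully defined behaviour. A minimal sketch follows; the function name is invented here, and the one-past-the-end pointer is an extra illustration of a value that may be formed and compared but not dereferenced, which the paragraph above does not itself mention.

```c
#include <stddef.h>

/* Returns 1 when both pointers behave as the standard describes.
   Neither pointer is ever dereferenced. */
int non_dereferenceable_values_are_usable(void)
{
    int a[4] = {0, 1, 2, 3};
    int *end = a + 4;      /* one past the end: may be computed and compared */
    int *none = NULL;      /* null pointer: may be stored and compared */
    int *copy = none;      /* copying the value is also fine */

    return end != a && end - a == 4 && copy == NULL;
}
```

Applying unary * to either pointer is where the behaviour would become undefined.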

> But anyway, a pointer value pointing to no object can still be
> "trapped" by Windows during indirection and can still provide useful
> information to a Windows debugger. It just does not precisely match
> C's trap representation, for behavioural differences.

What's the difference? The behaviour you describe here matches what you
can expect from what C calls a trap representation.

>> No I read in context and try to respond in context as well. I don't
>> think I missed the context.
>>
>
> Well then that's my mistake.
>
>>> In that quote, I thought it was obvious that this
>>> was in the context of using a Microsoft debugger, where the behaviour
>>> would agree with the semantics of an "unspecified value"; not warning
>>> the programmer during a read of the value (which it _could_).
>>
>> It could be an unspecified value (you are explicitly using C terms here)
>> if it is a valid pointer value. It may well be. Is it? It may equally
>> well not be. It could be either a trap representation or an unspecified
>> value but you seem to suggest that it is one not the other. You seem to
>> suggest that one possibility is more likely than the other for reasons
>> that are spurious. The best evidence that it might not be a trap
>> representation is that it's a valid pointer, but you give no evidence for
>> that -- quite the contrary in fact.
>
> I'd say that it is not a trap representation.

But you don't say why. The "it" above presumably refers to what we've
been talking about -- that 0xC bit-pattern. Your only evidence that
it's not a trap representation seems to be that nothing "odd" happens
until you dereference it. That's entirely consistent with it being a
trap representation.

> Take Windows NT's 'IRP' structure. It has a sub-member called
> 'Tail.Overlay.DriverContext', which is an array of 4 'void *'. This is
> one of _very_few_ places where a driver can associate information with
> an IRP, and is extremely valuable for that reason.
>
> The implementation defines the results of casting an
> appropriately-sized integer value to a 'void *', so such an integer
> can be "passed" via this mechanism. We _certainly_ would not wish to
> believe that this results in undefined behaviour, so we certainly
> would not wish to believe that such a result is a trap representation.

Are you saying that something can't be a trap representation because
something other than C defines what happens when it's used? That's
exactly why, in part, C leaves so many things undefined (the behaviour
of trap representations being one such thing) so that implementations
are free to do useful things in such situations.

<snip>
--
Ben.

Keith Thompson

Jan 21, 2013, 4:32:50 PM

Shao Miller <sha0....@gmail.com> writes:
[...]

> I agree that his post does look that way and that C does allow for
> that. There's a subtle bit here, though: If we're discussing a
> program, once that program invokes undefined behaviour, anything goes.
> However, if I'm not mistaken, with Windows NT, if you have a pointer
> value which does not point to an object, the program will continue to
> operate as per the C semantics until such a time (if ever) as that
> pointer might be used for indirect access.

I (mostly) agree with that.

> So it is imprecise to say that such a value has a trap representation,
> because the behaviour is still well-defined. Otherwise, the last
> sentence of N1570's 6.5.3.2p4 is redundant:
>
> "If an invalid value has been assigned to the pointer, the behavior
> of the unary * operator is undefined.102)"

No. Accessing an object with a trap representation has undefined
behavior. What that means is that the behavior is not defined *by the C
standard*; see N1570 3.4.3, the definition of the phrase "undefined
behavior". Another entity can certainly define behavior for such
accesses.

For example, (void*)0x0000000C is very likely a trap representation
under Windows NT. The behavior of applying unary "*" to that value is
undefined, in the sense that the C standard does not define its
behavior. Windows NT can define the behavior if it likes; that doesn't
cause it to cease being a trap representation.

> But anyway, a pointer value pointing to no object can still be
> "trapped" by Windows during indirection and can still provide useful
> information to a Windows debugger. It just does not precisely match
> C's trap representation, for behavioural differences.

Yes, it does.

[...]

> Take Windows NT's 'IRP' structure. It has a sub-member called
> 'Tail.Overlay.DriverContext', which is an array of 4 'void *'. This is
> one of _very_few_ places where a driver can associate information with
> an IRP, and is extremely valuable for that reason.
>
> The implementation defines the results of casting an
> appropriately-sized integer value to a 'void *', so such an integer
> can be "passed" via this mechanism. We _certainly_ would not wish to
> believe that this results in undefined behaviour, so we certainly
> would not wish to believe that such a result is a trap representation.

I'm afraid that what you wish is irrelevant. It has undefined behavior.
A program that depends on the behavior of some construct whose behavior
is not defined by the C standard is not portable. There's nothing wrong
with that; sometimes you *need* to write non-portable code, code that
depends on guarantees made by a given environment but not by the
language standard.

Don't forget that "undefined behavior" does *not* mean "this will crash".
It means exactly what the C standard says it means in 3.4.3: "behavior,
upon use of a nonportable or erroneous program construct or of erroneous
data, for which this International Standard imposes no requirements".

A given construct causing Microsoft's debugger to consistently display a
"NULL_CLASS_PTR_DEREFERENCE" message is perfectly consistent with that
construct having undefined behavior as defined by the C standard. If
you don't understand that, you don't understand undefined behavior.

>>> B: "Here are Microsoft's documented [non-C] trap values..."
>>
>> If this was Geoff, he correctly called them "trap representation"
>> (rather than values) and he referred to "they" meaning the C committee.
>> He was talking about C trap representations not "[non-C] trap values".
>>
>
> This was in regards to the lower part of his post where he
> specifically says "Microsoft documents their compiler trap values".

Here's Geoff's article:

https://groups.google.com/group/comp.lang.c/msg/eeb8c91036919e10?dmode=source&output=gplain&noredirect

The "compiler trap values" that Microsoft documents are not the same
thing as C "trap representations" (that's probably why they have
a *different name*), and Geoff merely said that they're *analogous*
to C trap representations. For example, "Clean Memory" is filled
with 0xCD bytes, which can easily represent a valid value for some
type (certainly for unsigned char).
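
A minimal sketch of that last point: a buffer filled with 0xCD bytes, read back as unsigned char, simply yields the value 0xCD. The memset call only imitates what a debug allocator might do; read_filled_byte is a hypothetical name and no Microsoft API is involved.

```c
#include <string.h>

/* Fill a small buffer with the 0xCD pattern, then read one byte back
   as an ordinary unsigned char value. */
unsigned char read_filled_byte(void)
{
    unsigned char buf[8];

    memset(buf, 0xCD, sizeof buf);
    return buf[3];
}
```

Every byte of the buffer reads back as 0xCD (205), a perfectly valid unsigned char value, so the fill pattern is not a trap representation for that type.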

>>> Me: "Nice summary! Say, isn't there another one, 0x0000000C, as a
>>> [non-C] trap for [non-C] null pointers? I don't remember..."
>>>
>>> C: "Oh no, that wouldn't be a [C] null pointer."
>>>
>>> Me: "Yes, well I was talking about B's [non-C] trap values. I see a
>>> Microsoft debugger catch these things and call them [non-C] null
>>> pointers."
>>
>> I think you missed Geoff's point.
>
> I doubt it.

If you think Geoff was using the phrase "null pointer" or "trap
representation" to refer to anything other than a null pointer
or trap representation as defined by the C standard, then yes,
you missed Geoff's point.

"I see a Microsoft debugger catch these things and call them [non-C]
null pointers." -- I don't believe you have seen that.

Shao Miller

Jan 21, 2013, 4:33:07 PM

On 1/21/2013 15:17, Keith Thompson wrote:
> Shao Miller <sha0....@gmail.com> writes:
>> On 1/21/2013 01:02, Keith Thompson wrote:
>>> The expression `(void*)0x0000000C == NULL` yields 0 (false).
>>> This is not a "false negative"; it's a perfectly correct result
>>> indicating that `(void*)0x0000000C` is not a null pointer.
>>
>> Which is why I wrote "relative to the goal of indirect access."
>
> Comparing a pointer to NULL, if you don't know in advance that it's
> either a null pointer or a valid pointer to some object, does not meet
> "the goal of indirect access". The phrase "relative to the goal of
> indirect access" is meaningless in this context, or just plain wrong.
> You might as well talk about "comparing an integer to zero relative to
> the goal of determining whether it's even".
>

status_t some_func(input_t * input, output_t ** output) {
    output_t * new_item;
    status_t status;

    /* Do stuff. Populate 'new_item', 'status' */

    if (output) {
        /* Caller wants to refer to the result */
        *output = new_item;
    }
    return status;
}

Here, if 'output' does not compare equal to a null pointer, we'll
attempt indirection on it. If the caller passed an argument for
'output' which they believed to be one of {null pointer, pointer to an
object}, then our test here yields a false positive for the latter if
what they actually passed was the third possibility: A non-null pointer
to no object.
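
A compilable version of the snippet, for anyone who wants to experiment. The typedefs for status_t, input_t and output_t, the static the_item, and the doubling that stands in for "Do stuff" are all placeholders invented for this sketch. The null check separates null from non-null and nothing more; a non-null argument that points to no object would still reach the assignment.

```c
#include <stddef.h>

/* Placeholder types; the original snippet leaves them undeclared. */
typedef int status_t;
typedef struct { int request; } input_t;
typedef struct { int result; } output_t;

static output_t the_item;  /* stand-in for whatever "Do stuff" produces */

status_t some_func(input_t *input, output_t **output)
{
    output_t *new_item = &the_item;
    status_t status = 0;

    /* Placeholder work: populate 'new_item' and 'status'. */
    new_item->result = input ? input->request * 2 : 0;

    if (output) {
        /* Caller wants to refer to the result. */
        *output = new_item;
    }
    return status;
}
```

Passing NULL for output skips the store, and passing the address of a real output_t * object receives the result; those are the only two cases the test can tell apart.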

>> - VA32: Any pointer type with a representation that is 32 bits and
>> which has no trap representations.
>
> What makes you think that 32-bit pointers on MS Windows have no
> trap representations? `(void*)0x0000000C` almost certainly is a
> trap representation for the implementation in question. (I'm using
> the phrase "trap representation" as the C standard uses it; I lack
> interest in any other meanings *unless* some documentation uses
> that exact phrase with a different meaning.)
>

Well I didn't quite say that. I was referring to a particular subset of
32-bit pointers. I would certainly consider the representation
0x00000001 to be a trap representation for an 'int *' in 32-bit Windows.

For VA32 (which would correspond to 'void *', 'char *', etc.), the
reason I would think that is that I've had over a decade of use, I guess. I
could be wrong! But here's a starting-point, perhaps:

http://msdn.microsoft.com/en-us/library/k26sa92e.aspx

>> - Unsafe pointer: Any pointer having a VA32 type and having a value
>> that does not refer to any object. If such a pointer value is used in
>> an attempt to access the stored value of a pointed-to object, the
>> behaviour is undefined.
>
> Such a pointer value is either a null pointer or a trap representation.
>

Wait, are you saying that any pointer that does not match one of {null
pointer, points to an object} must necessarily be a trap representation?

>> But who cares? 8-)}
>
> I no longer do. Whatever relevant claims you were making, you've now
> quietly backed away from them.

I haven't intentionally backed away from any claims. I wish I knew what
claims you might be referring to.

If you're asking about "null pointer", I thought Mr. Philip Lantz
already answered that when he said: "This traps in the debugger, and the
debugger reports a "null pointer dereference" at address 0x0000000c."

> We could have avoided wasting a great
> deal of time if you'd done so sooner.
>

I really don't know what you're talking about. Here's my recollection
(quotes aren't actual quotes, but interpretations):

Me: "I thought null pointers having 0x0C as the least-significant byte
was \"a thing,\" too, but now I can't remember having seen that
documented anywhere."

You: "I'm fairly sure Microsoft uses all-bits-zero for null pointers."

Me: "Yes I didn't mean via the strict C definition, but via a practiced
usage of the term, with Microsoft."

You: "Null pointers and trap representations aren't the same thing."

Me: "I'm talking about a pointer value that behaves like a null pointer."

You: "Does it compare equal to 'NULL'?"

Me: "Good point; no, it doesn't. But it still doesn't point to any
object and only invokes undefined behaviour once used for indirection.
That makes it similar to a null pointer, in my view. What do you think?"

You: "It seems more like a trap representation."

Me: "(Well, it isn't, strictly speaking...)"

My last one shouldn't've followed so closely behind the question before
it, because the question was confusing. I don't know if there's a
problem other than confusion. I hope not.

Shao Miller

Jan 21, 2013, 5:13:53 PM

On 1/21/2013 16:05, Ben Bacarisse wrote:
> Shao Miller <sha0....@gmail.com> writes:
>
>> On 1/21/2013 11:31, Ben Bacarisse wrote:
> <snip>
>>> Geoff's post looks simple and correct: the standard permits trap
>>> representations that produce undefined behaviour so an implementation is
>>> permitted to use special values to trigger interesting effects (either
> in a debugger or elsewhere).
>>
>> I agree that his post does look that way and that C does allow for
>> that. There's a subtle bit here, though: If we're discussing a
>> program, once that program invokes undefined behaviour, anything goes.
>> However, if I'm not mistaken, with Windows NT, if you have a pointer
>> value which does not point to an object, the program will continue to
>> operate as per the C semantics until such a time (if ever) as that
>> pointer might be used for indirect access.
>
> Of course. The "C semantics" are exactly as you describe: "anything
> goes". If such a program caused the machine to halt, that, too, would
> be operating as per "C semantics".
>

No, it wouldn't. The C semantics do not include undefined behaviour.

"[...what a constraint violation is...] Undefined behavior is
otherwise indicated in this International Standard by the words
‘‘undefined behavior’’ or by the omission of any explicit definition of
behavior. There is no difference in emphasis among these three; they all
describe ‘‘behavior that is undefined’’."

I used the word "subtle." There is _no_chance_ that _anything_other_
than what _is_ described by the C Standard will happen. No undefined
behaviour. No trap representation.

>> So it is imprecise to say that such a value has a trap representation,
>> because the behaviour is still well-defined. Otherwise, the last
>> sentence of N1570's 6.5.3.2p4 is redundant:
>>
>> "If an invalid value has been assigned to the pointer, the behavior
>> of the unary * operator is undefined.102)"
>
> Presumably you read the footnote so you must be aware of all the invalid
> values that this clause covers.
>
> A particular bit-pattern (0xC in this case, I think) can either
> represent a valid value (for the pointer type in question), an invalid
> value, or it can be a trap representation. These three possibilities
> are, at a particular time in the program's execution, mutually
> exclusive. The middle category, invalid pointer values, needs 6.5.3.2
> p4.
>

You appear to be agreeing with me. Did you think I meant something else?

>> But anyway, a pointer value pointing to no object can still be
>> "trapped" by Windows during indirection and can still provide useful
>> information to a Windows debugger. It just does not precisely match
>> C's trap representation, for behavioural differences.
>
> What's the difference? The behaviour you describe here matches what you
> can expect from what C calls a trap representation.
>

The difference is that the behaviour is _defined_, instead of
_undefined_. Yes, the behaviour for both can match. No, the
expectation is different between them; you don't know what to expect
from undefined behaviour.

>>> It could be an unspecified value (you are explicitly using C terms here)
>>> if it is a valid pointer value. It may well be. Is it? It may equally
>>> well not be. It could be either a trap representation or an unspecified
>>> value but you seem to suggest that it is one not the other. You seem to
>>> suggest that one possibility is more likely than the other for reasons
>>> that are spurious. The best evidence that it might not be a trap
>>> representation is that it's a valid pointer, but you give no evidence for
>>> that -- quite the contrary in fact.
>>
>> I'd say that it is not a trap representation.
>
> But you don't say why. The "it" above presumably refers to what we've
> been talking about -- that 0xC bit-pattern. Your only evidence that
> it's not a trap representation seems to be that nothing "odd" happens
> until you dereference it. That's entirely consistent with it being a
> trap representation.
>

I say why right below. I didn't realize that you couldn't easily accept
this and that you required evidence.

>> Take Windows NT's 'IRP' structure. It has a sub-member called
>> 'Tail.Overlay.DriverContext', which is an array of 4 'void *'. This is
>> one of _very_few_ places where a driver can associate information with
>> an IRP, and is extremely valuable for that reason.
>>
>> The implementation defines the results of casting an
>> appropriately-sized integer value to a 'void *', so such an integer
>> can be "passed" via this mechanism. We _certainly_ would not wish to
>> believe that this results in undefined behaviour, so we certainly
>> would not wish to believe that such a result is a trap representation.
>
> Are you saying that something can't be a trap representation because
> something other than C defines what happens when it's used? That's
> exactly why, in part, C leaves so many things undefined (the behaviour
> of trap representations being one such thing) so that implementations
> are free to do useful things in such situations.
>

I'm not suggesting that at all. This is a case of Standard behaviour
plus implementation-defined behaviour. Since Keith asked for it, I dug
it up from Microsoft:

"an integral type can be converted to a pointer type according to the
following rules:

- If the integral type is the same size as the pointer type, the
conversion simply causes the integral value to be treated as a pointer
(an unsigned integer)."

Does that help in any way? Can the subject of the parentheses have a
trap representation if all 32 bits are value bits?

Shao Miller

Jan 21, 2013, 5:46:22 PM

On 1/21/2013 16:32, Keith Thompson wrote:
> Shao Miller <sha0....@gmail.com> writes:
>

>> So it is imprecise to say that such a value has a trap representation,
>> because the behaviour is still well-defined. Otherwise, the last
>> sentence of N1570's 6.5.3.2p4 is redundant:
>>
>> "If an invalid value has been assigned to the pointer, the behavior
>> of the unary * operator is undefined.102)"
>
> No. Accessing an object with a trap representation has undefined
> behavior.

I don't understand this as an answer for "it is imprecise to say that
such a value has a trap representation, because the behaviour is still
well-defined." I don't understand it as an answer for "Otherwise, the
last sentence of N1570's 6.5.3.2p4 is redundant." Did you think that I
didn't think that accessing a trap representation was undefined behaviour?

> What that means is that the behavior is not defined *by the C
> standard*; see [N1570] 3.4.3, the definition of the phrase "undefined
> behavior". Another entity can certainly define behavior for such
> accesses.
>

That is true and is also not applicable to the case under discussion.

> For example, (void*)0x0000000C is very likely a trap representation
> under Windows NT.

I don't know why you find that to be likely. Maybe I've just done too
much x86?

> The behavior of applying unary "*" to that value is
> undefined, in the sense that the C standard does not define its
> behavior. Windows NT can define the behavior if it likes; that doesn't
> cause it to cease being a trap representation.
>

I am thankful for your patience with the discussion. I think I
understand that what was missing in the discussion so far was
"implementation-defined", which _also_ allows Windows NT to define the
behaviour.

>> But anyway, a pointer value pointing to no object can still be
>> "trapped" by Windows during indirection and can still provide useful
>> information to a Windows debugger. It just does not precisely match
>> C's trap representation, for behavioural differences.
>
> Yes, it does.
>

No, it doesn't. :) The behaviour that is being discussed _is_ defined,
_until_ the indirection. It is not undefined before that point.

> I'm afraid that what you wish is irrelevant.

Give me a break. How unfortunate it would be for Microsoft to market
their compiler as "taking care of all your undefined behaviour needs!"
Better confidence is possible.

> It has undefined behavior.

Maybe you're right (I've been wrong!), but I don't see why you believe
that strongly enough to assert it.

> [...more about undefined behaviour...]

>> This was in regards to the lower part of his post where he
>> specifically says "Microsoft documents their compiler trap values".
>
> Here's Geoff's article:
>
> https://groups.google.com/group/comp.lang.c/msg/eeb8c91036919e10?dmode=source&output=gplain&noredirect
>
> The "compiler trap values" that Microsoft documents are not the same
> thing as C "trap representations" (that's probably why they have
> a *different name*), and Geoff merely said that they're *analogous*
> to C trap representations. For example, "Clean Memory" is filled
> with 0xCD bytes, which can easily represent a valid value for some
> type (certainly for unsigned char).
>

Did you think I meant something else? That is _precisely_ why I said it
was a "Wonderful, wonderful summary!" That is why I was talking about
this "analogous" thing, and not trap representations. I tried to
clarify that a few times, but only managed to confuse. Sorry about that.

>>>> Me: "Nice summary! Say, isn't there another one, 0x0000000C, as a
>>>> [non-C] trap for [non-C] null pointers? I don't remember..."
>>>>
>>>> C: "Oh no, that wouldn't be a [C] null pointer."
>>>>
>>>> Me: "Yes, well I was talking about B's [non-C] trap values. I see a
>>>> Microsoft debugger catch these things and call them [non-C] null
>>>> pointers."
>>>
>>> I think you missed Geoff's point.
>>
>> I doubt it.
>
> If you think Geoff was using the phrase "null pointer" or "trap
> representation" to refer to anything other than a null pointer
> or trap representation as defined by the C standard, then yes,
> you missed Geoff's point.
>

No, I don't think that. I have no idea why you might think that I think
that.

> "I see a Microsoft debugger catch these things and call them [non-C]
> null pointers." -- I don't believe you have seen that.
>

"These things" == the subject that Geoff had most recently discussed:
"[non-C] trap values".

Keith Thompson

Jan 21, 2013, 6:50:57 PM

Right, because, as I said, checking whether a pointer is null does not
reliably check whether it may be dereferenced. It's a false negative
for the question "May I safely dereference this pointer?". It's not a
false negative for the question "Is this a null pointer?"

The test yields a false negative because it's not a valid test for what
it's trying to test. (No such valid test is possible in portable C.)

>>> - VA32: Any pointer type with a representation that is 32 bits and
>>> which has no trap representations.
>>
>> What makes you think that 32-bit pointers on MS Windows have no
>> trap representations? `(void*)0x0000000C` almost certainly is a
>> trap representation for the implementation in question. (I'm using
>> the phrase "trap representation" as the C standard uses it; I lack
>> interest in any other meanings *unless* some documentation uses
>> that exact phrase with a different meaning.)
>
> Well I didn't quite say that. I was referring to a particular subset of
> 32-bit pointers. I would certainly consider the representation
> 0x00000001 to be a trap representation for an 'int *' in 32-bit Windows.

Yes, you did quite say that. You said the pointer type "has no trap
representations".

> For VA32 (which would correspond to 'void *', 'char *', etc.), the
> reason I would think that is that I've over a decade of use, I guess. I
> could be wrong! But here's a starting-point, perhaps:
>
> http://msdn.microsoft.com/en-us/library/k26sa92e.aspx

That looks a lot like the C Standard's requirements for pointer
conversions, with some extra information about how Microsoft's compiler
performs such conversions. Unlike the standard, it doesn't mention trap
representations.

>>> - Unsafe pointer: Any pointer having a VA32 type and having a value
>>> that does not refer to any object. If such a pointer value is used in
>>> an attempt to access the stored value of a pointed-to object, the
>>> behaviour is undefined.
>>
>> Such a pointer value is either a null pointer or a trap representation.
>>
>
> Wait, are you saying that any pointer that does not match one of {null
> pointer, points to an object} must necessarily be a trap representation?

I believe so, yes.

Given a pointer value that is neither a null pointer nor a pointer to an
object, what criteria would cause you to claim that it's *not* a trap
representation?

>>> But who cares? 8-)}
>>
>> I no longer do. Whatever relevant claims you were making, you've now
>> quietly backed away from them.
>
> I haven't intentionally backed away from any claims. I wish I knew what
> claims you might be referring to.
>
> If you're asking about "null pointer", I thought Mr. Philip Lantz
> already answered that when he said: "This traps in the debugger, and the
> debugger reports a "null pointer dereference" at address 0x0000000c."

I don't remember that particular statement. It's been a long thread.

    /* Let struct foo have a member m at offset 12 (0xc) */
    struct foo *ptr = NULL;
    do_something_with(ptr->m);

The evaluation of `ptr->m` has undefined behavior because the value of
`ptr` is a null pointer. It's likely that the generated code will take
the value stored in `ptr` (0x00000000), add the offset 12 to it
(yielding 0x0000000c), and then attempt to dereference the resulting
pointer value. If the program is being executed under the debugger,
this is likely to cause a trap. The debugger sees an attempt to
dereference address 0x0000000c and, quite reasonably, infers that it was
probably the result of accessing a member of a structure or class, at an
offset of 12 bytes, via a null pointer. The debugger may well have
other information available that makes that inference stronger.

(Conceivably a compiler could generate code to test the value of ptr
against 0x00000000 before attempting to add the offset to it; that could
catch the error slightly sooner and more directly, but at a considerable
performance cost.)

I believe you've been implying that this means that 0x0000000c is
a null pointer. It isn't. Seeing Philip's remark, quoted above,
it's a little clearer why you might have thought so.

[snip]

A pointer with the value 0x0000000c is not a null pointer, nor does it
point to any object. It is, I believe, a trap representation. In
certain contexts, the existence of such a pointer may imply that there's
been an attempt to dereference a null pointer; the null pointer itself
(in this particular implementation) has the representation 0x00000000.

Finally, a correction to something I may or may not have implied
earlier. Creating such a pointer, by evaluating `(void*)0x0000000c`,
does not itself have undefined behavior; 6.3.2.3p5 says that the
conversion may yield a trap representation, not that the conversion
itself has undefined behavior. Any attempt to *use* such a pointer
value does have undefined behavior.

    0x0000000c;
    /* obviously ok, no UB */

    (void*)0x0000000c;
    /* Creates a trap representation, then discards it; no UB */

    p = (void*)0x0000000c;
    /* Attempts to access a trap representation, UB */

    if ((void*)0x0000000c == NULL) ...
    /* UB */

It's very likely that the third statement will quietly store the
expected value in `p`, and that the last will evaluate the condition as
false; these are entirely consistent with the behavior being undefined.

Keith Thompson

Jan 21, 2013, 7:11:52 PM

Shao Miller <sha0....@gmail.com> writes:
[...]

> I'm not suggesting that at all. This is a case of Standard behaviour
> plus implementation-defined behaviour. Since Keith asked for it, I dug
> it up from Microsoft:
>
> "an integral type can be converted to a pointer type according to the
> following rules:
>
> - If the integral type is the same size as the pointer type, the
> conversion simply causes the integral value to be treated as a pointer
> (an unsigned integer)."

Microsoft's wording here implies that a pointer is an unsigned integer.

Microsoft is wrong. (It may be just sloppy wording.)

A pointer may well have the same *representation* as an unsigned
integer, and many of the same behaviors (which is probably what they
meant), but they are two distinct things.

> Does that help in any way? Can the subject of the parentheses have a
> trap representation if all 32 bits are value bits?

The standard's discussion of "value bits" applies only to integer types.
It doesn't say enough about how pointers are represented for the concept
of "value bits" to meaningfully apply to pointers.

Yes, the result of converting an integer to a pointer can be a trap
representation.

Geoff

Jan 21, 2013, 9:52:48 PM

On Mon, 21 Jan 2013 12:46:06 -0500, Shao Miller <sha0....@gmail.com> wrote:

>I thought it was obvious that the mechanism was via page fault and that
>the bits of the attempted address were examined in order to determine
>some useful information about the nature of a recent problem. All I was
>wondering was if 0x0000000C had a particular meaning for debugging, like
>your values above.
>
>Mr. Philip Lantz suggested that the origin of such a thing is from a
>pointer resulting from a computation involving a null pointer. (See
>below.) Having read his post, this seems pretty obvious to be the
>likely case, to me.
>
>On 1/19/2013 04:28, Philip Lantz wrote:
>> I would guess that he's seeing something like the following:
>>
>>     struct {
>>         int a, b, c, d;
>>     } *p = NULL;
>>
>>     p->d = 0;
>>
>> Generates: mov dword ptr ds:[0Ch],0
>>
>> This traps in the debugger, and the debugger reports a "null pointer
>> dereference" at address 0x0000000c.

The answer is simple. It is not a special representation or a trap value within
the context of C or even of the OS, it's a consequence of the construction of
the compiled code.

In the case above, 0x0000000c is the result of the addition of the base pointer
into the structure with the offset within the structure, in this case decimal 12
bytes. If you change p->d = 0; to p->a = 0; the result is an unhandled exception
yielding 0x00000000.

There is no significance to the value of that address, other than it is the
actual null (base) pointer plus the offset into the member (if any).

The access is actually trapped by the memory protection hardware within the x86
since the memory address being accessed is outside the process address space.

Tim Rentsch

Jan 21, 2013, 10:45:14 PM

Keith Thompson <ks...@mib.org> writes:

>> [snip]
>>
> Finally, a correction to something I may or may not have implied
> earlier. Creating such a pointer, by evaluating `(void*)0x0000000c`,
> does not itself have undefined behavior; 6.3.2.3p5 says that the
> conversion may yield a trap representation, not that the conversion
> itself has undefined behavior. Any attempt to *use* such a pointer
> value does have undefined behavior. [snip examples]

Strictly speaking the result of a conversion cannot be a trap
representation, because the result of a conversion is a value
and a trap representation is not a value but an object
representation. The Standard is being careless in how it
uses the term here.

Despite that, the intention seems clear, namely, that converting
any value where the result cannot be used as a pointer (for
example, for comparison to NULL) is undefined behavior,
regardless of whether the resulting value is used or discarded
(ie, converted to (void)). There isn't any point in being
allowed to produce a value but then not be able to do anything
with it, or even just store it. Furthermore this is how we
expect actual hardware would work -- it is trying to construct
a bogus address value that is likely to cause a trap, not doing
a store operation to put the bogus result in memory.

A quote from the C99 Rationale document might be illuminating
here:

Since pointers and integers are now considered incommensurate,
the only integer value that can be safely converted to a
pointer is a constant expression with the value 0.

Considering the context in which it was made, this statement
seems exactly on point.

glen herrmannsfeldt

Jan 21, 2013, 11:07:30 PM

Geoff <ge...@invalid.invalid> wrote:

(snip)

> In the case above, 0x0000000c is the result of the addition of the base pointer
> into the structure with the offset within the structure, in this case decimal 12
> bytes. If you change p->d = 0; to p->a = 0; the result is an unhandled exception
> yielding 0x00000000.

Reminds me of my first program using strchr().
(Or maybe the BSD index().)

After being used to the PL/I INDEX, and since I wanted the
position, not the pointer, I subtracted the first argument:

j=index(str,'x')-str;

Then when I had to test if it wasn't found,

if(j+str==NULL)

-- glen

Shao Miller

Jan 21, 2013, 11:15:12 PM

Mr. Philip Lantz' post makes that clear. Your post here makes it
abundantly clear. :)

> The access is actually trapped by the memory protection hardware within the x86
> since the memory address being accessed is outside the process address space.
>

(Outside of an accessible memory "pool", using Windows terminology.)
And in fact, even addresses that do reference into a memory pool can
cause traps ("paging," as I'm sure you know). And we wouldn't call
those trap representations either, I hope.

Tim Rentsch

Jan 21, 2013, 11:53:17 PM

Ben Bacarisse <ben.u...@bsb.me.uk> writes:

>> [snip]
>>
> A particular bit-pattern (0xC in this case, I think) can either
> represent a valid value (for the pointer type in question), an invalid
> value, or it can be a trap representation. These three possibilities
> are, at a particular time in the program's execution, mutually
> exclusive. The middle category, invalid pointer values, needs 6.5.3.2
> p4.

Actually there are four kinds of pointer values (with the
understanding that "value" here includes some that cannot
be used definedly):

1. unusable (any use is undefined behavior)
2. null pointers (can be compared for equality/inequality)
3. equality, pointer arithmetic, relational (eg <) comparison
4. like 3 but also can be dereferenced

Type 3 values are, eg, pointers one past the end of an array, or
non-null values returned from doing a malloc(0). Type 4 values
are regular pointers to objects.

The footnote makes it clear that the phrase 'invalid value' used
in 6.5.3.2 p4 means categories 1-3 above. However this meaning
of the phrase is meant to apply only to this section.

If a pointer value is stored, the resulting object representation
can be a trap representation only for values of type 1. (Of
course any attempt to store a value of type 1 could store anything
at all, including some representation of any of the above
categories.)

If an object is read as a pointer type, and the resulting value
is of type 1 (and assuming there wasn't anything else causing
undefined behavior), then the object representation of that
object must be (or have been) a trap representation (when
considered as the type used to do the access). This follows from
the definition of trap representation.

Ben Bacarisse

Jan 21, 2013, 11:59:53 PM

Shao Miller <sha0....@gmail.com> writes:
> On 1/21/2013 16:05, Ben Bacarisse wrote:
>> Shao Miller <sha0....@gmail.com> writes:
>>
>>> On 1/21/2013 11:31, Ben Bacarisse wrote:
>> <snip>
>>>> Geoff's post looks simple and correct: the standard permits trap
>>>> representations that produce undefined behaviour so an implementation is
>>>> permitted to use special values to trigger interesting effects (either
>>>> in a debugger or elsewhere).
>>>
>>> I agree that his post does look that way and that C does allow for
>>> that. There's a subtle bit here, though: If we're discussing a
>>> program, once that program invokes undefined behaviour, anything goes.
>>> However, if I'm not mistaken, with Windows NT, if you have a pointer
>>> value which does not point to an object, the program will continue to
>>> operate as per the C semantics until such a time (if ever) as that
>>> pointer might be used for indirect access.
>>
>> Of course. The "C semantics" are exactly as you describe: "anything
>> goes". If such a program caused the machine to halt, that, too, would
>> be operating as per "C semantics".
>>
>
> No, it wouldn't. The C semantics do not include undefined behaviour.

This is such an odd statement that a recap is probably needed. What do
you mean by "the C semantics"? And are we still talking about a program
that accesses a pointer whose bit-pattern looks like 0xC?

I use the phrase to mean what the C standard says about the meaning of a
C program. The standard says that such a pointer representation *might*
be a trap representation. No one is saying that it *is* one (well, I'm
not) but I have disagreed with the arguments you've put forward for
asserting that it is *not* one.

> "[...what a constraint violation is...] Undefined behavior is
> otherwise indicated in this International Standard by the words
> ‘‘undefined behavior’’ or by the omission of any explicit definition
> of behavior. There is no difference in emphasis among these three;
> they all describe ‘‘behavior that is undefined’’."
>
> I used the word "subtle." There is _no_chance_ that _anything_other_
> than what _is_ described by the C Standard will happen. No undefined
> behaviour. No trap representation.

Too subtle for me. What, then, does the C standard say will happen when
the representation in question is used as a pointer?

>>> So it is imprecise to say that such a value has a trap representation,
>>> because the behaviour is still well-defined. Otherwise, the last
>>> sentence of N1570's 6.5.3.2p4 is redundant:
>>>
>>> "If an invalid value has been assigned to the pointer, the behavior
>>> of the unary * operator is undefined.102)"
>>
>> Presumably you read the footnote so you must be aware of all the invalid
>> values that this clause covers.
>>
>> A particular bit-pattern (0xC in this case, I think) can either
>> represent a valid value (for the pointer type in question), an invalid
>> value, or it can be a trap representation. These three possibilities
>> are, at a particular time in the program's execution, mutually
>> exclusive. The middle category, invalid pointer values, needs 6.5.3.2
>> p4.
>
> You appear to be agreeing with me. Did you think I meant something
> else?

Yes, of course I did.

>>> But anyway, a pointer value pointing to no object can still be
>>> "trapped" by Windows during indirection and can still provide useful
>>> information to a Windows debugger. It just does not precisely match
>>> C's trap representation, for behavioural differences.
>>
>> What's the difference? The behaviour you describe here matches what you
>> can expect from what C calls a trap representation.
>
> The difference is that the behaviour is _defined_, instead of
> _undefined_. Yes, the behaviour for both can match. No, the
> expectation is different between them; you don't know what to expect
> from undefined behaviour.

I can't unpick this. If it's central to your point, then I don't
understand what your point is. If it isn't, maybe we can just put it to
one side.

>>>> It could be an unspecified value (you are explicitly using C terms here)
>>>> if it is a valid pointer value. It may well be. Is it? It may equally
>>>> well not be. It could be either a trap representation or an unspecified
>>>> value but you seem to suggest that it is one not the other. You seem to
>>>> suggest that one possibility is more likely than the other for reasons
>>>> that are spurious. The best evidence that it might not be a trap
>>>> representation is that it's a valid pointer, but you give no evidence for
>>>> that -- quite the contrary in fact.
>>>
>>> I'd say that it is not a trap representation.
>>
>> But you don't say why. The "it" above presumably refers to what we've
>> been talking about -- that 0xC bit-pattern. Your only evidence that
>> it's not a trap representation seems to be that nothing "odd" happens
>> until you dereference it. That's entirely consistent with it being a
>> trap representation.
>
> I say why right below. I didn't realize that you couldn't easily
> accept this and that you required evidence.

I don't object to the conclusion, I object to the argument. If, below,
you say why 0xC can't be a trap representation in this particular
implementation, then I'll happily accept it. What I did not accept is
what looked to me like your spurious arguments in support of this
claim.

>>> Take Windows NT's 'IRP' structure. It has a sub-member called
>>> 'Tail.Overlay.DriverContext', which is an array of 4 'void *'. This is
>>> one of _very_few_ places where a driver can associate information with
>>> an IRP, and is extremely valuable for that reason.
>>>
>>> The implementation defines the results of casting an
>>> appropriately-sized integer value to a 'void *', so such an integer
>>> can be "passed" via this mechanism. We _certainly_ would not wish to
>>> believe that this results in undefined behaviour, so we certainly
>>> would not wish to believe that such a result is a trap representation.
>>
>> Are you saying that something can't be a trap representation because
>> something other than C defines what happens when it's used? That's
>> exactly why, in part, C leaves so many things undefined (the behaviour
>> of trap representations being one such thing) so that implementations
>> are free to do useful things in such situations.
>
> I'm not suggesting that at all. This is a case of Standard behaviour
> plus implementation-defined behaviour. Since Keith asked for it, I
> dug it up from Microsoft:
>
> "an integral type can be converted to a pointer type according to
> the following rules:
>
> - If the integral type is the same size as the pointer type, the
> conversion simply causes the integral value to be treated as a pointer
> (an unsigned integer)."
>
> Does that help in any way? Can the subject of the parentheses have a
> trap representation if all 32 bits are value bits?

This does not say that a pointer object can not hold a trap
representation. The phrase "treated as a pointer" is just too vague to
be sure. Let's say that some bit pattern like 0xC is indeed a void *
trap representation. The conversion (void *)0xC "treats it as a
pointer" but does not involve a representation, so we can't say from
that phrase alone whether that bit-pattern would be a trap
representation if it were in a pointer object.

It seems likely that this text intends to say that all bit patterns
represent valid or invalid (i.e. non-trap) representations for all
pointer types, but it does not get round to actually saying it. And
since the behaviour of a trap representation is a free-for-all, there
is very little way to tell, other than an actual statement of that
fact.

--
Ben.

Shao Miller

Jan 22, 2013, 12:18:10 AM

On 1/21/2013 18:50, Keith Thompson wrote:
> Shao Miller <sha0....@gmail.com> writes:
>> On 1/21/2013 15:17, Keith Thompson wrote:
>>> Shao Miller <sha0....@gmail.com> writes:
>>>> - VA32: Any pointer type with a representation that is 32 bits and
>>>> which has no trap representations.
>>>
>>> What makes you think that 32-bit pointers on MS Windows have no
>>> trap representations? `(void*)0x0000000C` almost certainly is a
>>> trap representation for the implementation in question. (I'm using
>>> the phrase "trap representation" as the C standard uses it; I lack
>>> interest in any other meanings *unless* some documentation uses
>>> that exact phrase with a different meaning.)
>>
>> Well I didn't quite say that. I was referring to a particular subset of
>> 32-bit pointers. I would certainly consider the representation
>> 0x00000001 to be a trap representation for an 'int *' in 32-bit Windows.
>
> Yes, you did quite say that. You said the pointer type "has no trap
> representations".
>

I did _not_ say "that 32-bit pointers on MS Windows have no trap
representations". I said that a "VA32", an "unsafe pointer" and a "lull
pointer" do not have trap representations. Yes, I meant to imply that
there was some relevance to Windows. The relationship I would suggest
between a VA32 and Windows is described immediately below.

>> For VA32 (which would correspond to 'void *', 'char *', etc.), the
>> reason I would think that is that I've over a decade of use, I guess. I
>> could be wrong! But here's a starting-point, perhaps:
>>
>> http://msdn.microsoft.com/en-us/library/k26sa92e.aspx
>
> That looks a lot like the C Standard's requirements for pointer
> conversions, with some extra information about how Microsoft's compiler
> performs such conversions. Unlike the standard, it doesn't mention trap
> representations.
>

There is a good reason for this: It conforms to C90. There were no trap
representations in C90. Fortunately, developing for Windows doesn't
require any. What we do have are what's been available since the
dinosaurs roamed the data-centre: Bit patterns that carry useful
debugging information.

>> Wait, are you saying that any pointer that does not match one of {null
>> pointer, points to an object} must necessarily be a trap representation?
>
> I believe so, yes.
>
> Given a pointer value that is neither a null pointer nor a pointer to an
> object [or function], what criteria would cause you to claim that it's
> *not* a trap representation?
>

I think it has to work the other way around. I think the implementation
has to define it as a trap representation. I don't think we can infer
this property from anything that any Standard says. This could be where
our agreement diverges.

>> I haven't intentionally backed away from any claims. I wish I knew what
>> claims you might be referring to.
>>

> /* Let struct foo have a member m at offset 12 (0xc) */
> struct foo *ptr = NULL;
> do_something_with(ptr->m);
>
> The evaluation of `ptr->m` has undefined behavior because the value of
> `ptr` is a null pointer.

Agreed.

> It's likely that the generated code will take
> the value stored in `ptr` (0x00000000), add the offset 12 to it
> (yielding 0x0000000c), and then attempt to dereference the resulting
> pointer value. If the program is being executed under the debugger,
> this is likely to cause a trap. The debugger sees an attempt to
> dereference address 0x0000000c and, quite reasonably, infers that it was
> probably the result of accessing a member of a structure or class, at an
> offset of 12 bytes, via a null pointer. The debugger may well have
> other information available that makes that inference stronger.
>
> (Conceivably a compiler could generate code to test the value of ptr
> against 0x00000000 before attempting to add the offset to it; that could
> catch the error slightly sooner and more directly, but at a considerable
> performance cost.)
>
> I believe you've been implying that this means that 0x0000000c is
> a null pointer. It isn't. Seeing Philip's remark, quoted above,
> it's a little clearer why you might have thought so.
>

I did not mean to imply that. Glen Herrmannsfeldt suggested a scenario
where that could be true, but I responded by saying that his case was
"along the lines that might be" the case I was talking about, which is
exactly true: The Microsoft debugger distinguishes between an invalid
pointer and an invalid pointer in the "null class". This implies that
the bits are significant and Glen's masking comes into play. It does
_not_ imply that I'm saying 0x0000000C is a null pointer. If it
was confusing before, it should have been clear once I answered you
about it comparing as unequal with a null pointer. Sorry about the
confusion.

> [snip]

Shao Miller

Jan 22, 2013, 1:05:24 AM

On 1/21/2013 23:59, Ben Bacarisse wrote:
> Shao Miller <sha0....@gmail.com> writes:
>> On 1/21/2013 16:05, Ben Bacarisse wrote:
>>> Shao Miller <sha0....@gmail.com> writes:
>>>
>>>> On 1/21/2013 11:31, Ben Bacarisse wrote:
>>> <snip>
>>>>> Geoff's post looks simple and correct: the standard permits trap
>>>>> representations that produce undefined behaviour so an implementation is
>>>>> permitted to use special values to trigger interesting effects (either
>>>>> in a debugger or elsewhere).
>>>>
>>>> I agree that his post does look that way and that C does allow for
>>>> that. There's a subtle bit here, though: If we're discussing a
>>>> program, once that program invokes undefined behaviour, anything goes.
>>>> However, if I'm not mistaken, with Windows NT, if you have a pointer
>>>> value which does not point to an object, the program will continue to
>>>> operate as per the C semantics until such a time (if ever) as that
>>>> pointer might be used for indirect access.
>>>
>>> Of course. The "C semantics" are exactly as you describe: "anything
>>> goes". If such a program caused the machine to halt, that, too, would
>>> be operating as per "C semantics".
>>>
>>
>> No, it wouldn't. The C semantics do not include undefined behaviour.
>
> This is such an odd statement that a recap is probably needed. What do
> you mean by "the C semantics"? And are we still talking about a program
> that accesses a pointer whose bit-pattern looks like 0xC?
>

The C semantics are defined by the Standard. A strictly conforming
program's behaviour is predictable, given these.

Where we see a mention of implementation-defined subject matter, a C
program's behaviour is defined by both the C semantics as well as
implementation-specific definitions. The program's behaviour is
predictable, because the implementation documents their definitions.

Where we see a mention of undefined behaviour, a C program's behaviour
is not guaranteed to be defined by any known means. The program's
behaviour is unpredictable, _unless_ we happen to know it by means which
aren't referred to by the Standard. An implementation or other Standard
(such as POSIX) can certainly define the behaviour.

I'm trying to argue that the case for undefined behaviour does not apply.

> I use the phrase to mean what the C standard says about the meaning of a
> C program. The standard says that such a pointer representation *might*
> be a trap representation. No one is saying that it *is* one (well, I'm
> not) but I have disagreed with the arguments you've put forward for
> asserting that it is *not* one.
>

Ok. I understand that you didn't like my argument. My recent argument
about IRPs used the English "extremely valuable." This is supposed to
suggest that the English "value" might have some relevance to C's "valid
value": The implementation wishes to allow for something that is
valuable to be represented and worked-with in a well-defined manner. It
is roughly a thought experiment. Maybe that's not the argument that you
didn't enjoy.

>> "[...what a constraint violation is...] Undefined behavior is
>> otherwise indicated in this International Standard by the words
>> ‘‘undefined behavior’’ or by the omission of any explicit definition
>> of behavior. There is no difference in emphasis among these three;
>> they all describe ‘‘behavior that is undefined’’."
>>
>> I used the word "subtle." There is _no_chance_ that _anything_other_
>> than what _is_ described by the C Standard will happen. No undefined
>> behaviour. No trap representation.
>

> [...] What, then, does the C standard say will happen when
> the representation in question is used as a pointer?
>

The C Standard _plus_ the implementation say: It can be stored, read,
passed, discarded, converted, compared, its size determined, etc.
Pretty well anything that doesn't involve using the pointer for indirect
access.

>
> I can't unpick this. If it's central to your point, then I don't
> understand what your point is. If it isn't, maybe we can just put it to
> one side.
>

It is central. Please see my notes about predictability, above.

>> I say why right below. I didn't realize that you couldn't easily
>> accept this and that you required evidence.
>
> I don't object to the conclusion, I object to the argument. If, below,
> you say why 0xC can't be a trap representation in this particular
> implementation, then I'll happily accept it. What I did not accept is
> what looked to me like your spurious arguments in support of this
> claim.
>

Ok. I understand your objection. I didn't think it'd be objectionable.

>> Since Keith asked for it, I
>> dug it up from Microsoft:
>>
>> "an integral type can be converted to a pointer type according to
>> the following rules:
>>
>> - If the integral type is the same size as the pointer type, the
>> conversion simply causes the integral value to be treated as a pointer
>> (an unsigned integer)."
>>
>> Does that help in any way? Can the subject of the parentheses have a
>> trap representation if all 32 bits are value bits?
>
> This does not say that a pointer object can not hold a trap
> representation. The phrase "treated as a pointer" is just too vague to
> be sure. Let's say that some bit pattern like 0xC is indeed a void *
> trap representation. The conversion (void *)0xC "treats it as a
> pointer" but does not involve a representation, so we can't say from
> that phrase alone whether that bit-pattern would be a trap
> representation if it were in a pointer object.
>

There was no claim that this Microsoft quote states that a pointer
cannot hold a trap representation. Such a claim would be impossible,
since there are no trap representations in C90 (which Microsoft conforms
to). The quote is certainly a hint that Microsoft represents pointers
the same way as unsigned integers with the same size. Since we happen
to know (I hope "we" applies) that all bits are value bits for the
integer representation, it suggests to me that the number of pointer
values == the number of the corresponding unsigned integer type's values.

However, although C90 has no trap representations, I would still say
that a pointer-to-object whose representation addresses a misaligned
object is a trap representation. I couldn't prove such a statement, but
it makes sense, to me.

> It seems likely that this text intends to say that all bit patterns
> represent valid or invalid (i.e. non-trap) representations for all
> pointer types, but it does not get round to actually saying it. And
> since the behaviour of a trap representation is a free-for-all, there
> is very little way to tell, other than an actual statement of that
> fact.
>

The reason why it doesn't get around to saying it is discussed, above.
Else-thread:

On 1/22/2013 00:18, Shao Miller wrote:
> Fortunately, developing for Windows doesn't require any. What we do
> have are what's been available since the dinosaurs roamed the
> data-centre: Bit patterns that carry useful debugging information.

(Which is what Geoff was talking about, I surely hope.)

Philip Lantz

Jan 22, 2013, 3:58:42 AM

Tim Rentsch wrote:

I think there may be one more kind of pointer value, which is the kind
that caused this thread. It is a value that may appear in a pointer
object as a result of some prior undefined behavior. This is of course
outside the scope of the C standard--which is why it correctly doesn't
appear in your list--but it does occur in actual implementations.

It behaves much like your type 1, but I'm not sure it's identical; in
particular, its behavior can't be described by referring to the
standard. Before reading this thread, I would not have ever thought to
call it a trap representation, but it is pretty similar.

Ben Bacarisse

Jan 22, 2013, 9:53:26 AM

Shao Miller <sha0....@gmail.com> writes:
<all snipped>

I don't have anything more to add to this exchange. Commenting on your
comments will just add to the noise, I fear, without adding anything
new.

--
Ben.

Keith Thompson

Jan 22, 2013, 10:43:51 AM

Shao Miller <sha0....@gmail.com> writes:
> On 1/21/2013 16:32, Keith Thompson wrote:
>> Shao Miller <sha0....@gmail.com> writes:
>>> So it is imprecise to say that such a value has a trap representation,
>>> because the behaviour is still well-defined. Otherwise, the last
>>> sentence of N1570's 6.5.3.2p4 is redundant:
>>>
>>> "If an invalid value has been assigned to the pointer, the behavior
>>> of the unary * operator is undefined.102)"
>>
>> No. Accessing an object with a trap representation has undefined
>> behavior.
>
> I don't understand this as an answer for "it is imprecise to say that
> such a value has a trap representation, because the behaviour is still
> well-defined." I don't understand it as an answer for "Otherwise, the
> last sentence of N1570's 6.5.3.2p4 is redundant." Did you think that I
> didn't think that accessing a trap representation was undefined behaviour?

I *think* the phrase "such a value" refers to a pointer value such as
(void*)0x0000000C, in a Microsoft Windows 32-bit environment. If you're
talking about something else, the following probably won't make much
sense.

Let's try a concrete example. Given this code snippet:

int *ptr;
uintptr_t x = 0x0000000C;
memcpy(&ptr, &x, sizeof ptr);
int deref = *ptr;

I assert that evaluating `*ptr` has undefined behavior. I also assert,
though with slightly less confidence, that after the memcpy() call ptr
contains a trap representation.

Do you disagree? If so, can you cite where the C standard defines the
behavior of the dereference?

[snip]

>> "I see a Microsoft debugger catch these things and call them [non-C]
>> null pointers." -- I don't believe you have seen that.
>
> "These things" == the subject that Geoff had most recently discussed:
> "[non-C] trap values".

What you've seen, as I recall, is the Microsoft debugger, on an
attempt to dereference (on the machine code level) a pointer with
the representation 0x0000000C, printing an error message that
includes the identifier "NULL_CLASS_PTR_DEREFERENCE". That is not
the debugger calling 0x0000000C a null pointer. It's the debugger
inferring, from the attempt to dereference 0x0000000C, that there
was an attempt to dereference a C null pointer (0x00000000). Or am
I missing something?

Keith Thompson

Jan 22, 2013, 11:01:48 AM

Shao Miller <sha0....@gmail.com> writes:
> On 1/21/2013 18:50, Keith Thompson wrote:

[...]

>> I believe you've been implying that this means that 0x0000000c is
>> a null pointer. It isn't. Seeing Philip's remark, quoted above,
>> it's a little clearer why you might have thought so.
>
> I did not mean to imply that. Glen Herrmannsfeldt suggested a scenario
> where that could be true, but I responded by saying that his case was
> "along the lines that might be" the case I was talking about, which is
> exactly true: The Microsoft debugger distinguishes between an invalid
> pointer and an invalid pointer in the "null class". This implies that
> the bits are significant and Glen's masking comes into play. It does
> _not_ imply that I'm saying 0x0000000C is a null pointer. If it
> was confusing before, it should have been clear once I answered you
> about it comparing as unequal with a null pointer. Sorry about the
> confusion.

You're inferring that the debugger message "NULL_CLASS_PTR_DEREFERENCE"
refers to pointers in the "null class". I don't think that's what it
means.

Microsoft's development environment places a greater emphasis on C++
than on C. An attempted dereference of address 0x0000000C is likely
to result from an attempt to dereference a pointer to a class type
(in C++, structs are classes), and the debugger may have more
information that strengthens this inference.

class foo {
public:
    char x[12];
    int y;
};

foo* ptr = 0;
int kaboom = ptr->y;

(I just tried running a C++ program with the above code under MS Visual
C++ 2010 Express, and didn't get that message; instead, I got "Unhandled
exception at 0x009013a8 in null_class_ptr.exe: 0xC0000005: Access
violation reading location 0x0000000c.".)

Keith Thompson

Jan 22, 2013, 11:25:21 AM

Tim Rentsch <t...@alumni.caltech.edu> writes:
[...]

> Actually there are four kinds of pointer values (with the
> understanding that "value" here includes some that cannot
> be used definedly):
>
> 1. unusable (any use is undefined behavior)
> 2. null pointers (can be compared for equality/inequality)
> 3. equality, pointer arithmetic, relational (eg <) comparison
> 4. like 3 but also can be dereferenced
>
> Type 3 values are, eg, pointers one past the end of an array, or
> non-null values returned from doing a malloc(0). Type 4 values
> are regular pointers to objects.

[...]

And I believe that type 1 is exactly the set of trap representations,
but I haven't been able to prove it.

The definition of "trap representation" in 3.19.4 is a bit vague:

trap representation

an object representation that need not represent a value of the
object type

I wonder why it doesn't say "does not" rather than "need not".

Keith Thompson

Jan 22, 2013, 11:38:47 AM

Shao Miller <sha0....@gmail.com> writes:
[...]

> The C semantics are defined by the Standard. A strictly conforming
> program's behaviour is predictable, given these.
>
> Where we see a mention of implementation-defined subject matter, a C
> program's behaviour is defined by both the C semantics as well as
> implementation-specific definitions. The program's behaviour is
> predictable, because the implementation documents their definitions.
>
> Where we see a mention of undefined behaviour, a C program's behaviour
> is not guaranteed to be defined by any known means. The program's
> behaviour is unpredictable, _unless_ we happen to know it by means
> which aren't referred to by the Standard. An implementation or other
> Standard (such as POSIX) can certainly define the behaviour.

I agree with the above.

> I'm trying to argue that the case for undefined behaviour does not apply.

Does not apply *to what*?

Can you provide a small self-contained program that's relevant to
the point you're making?

If you're saying that there's some construct whose behavior you
say is implementation-defined, and I say is undefined, then please
point out the construct in question in your program.

Keith Thompson

Jan 22, 2013, 11:40:18 AM

How does "its behavior can't be described by referring to the standard"
differ from "any use is undefined behavior"? (Apart from the quibble
that pointer values don't have behavior; operations on them do.)

Shao Miller

Jan 22, 2013, 4:07:27 PM

I'd say that it is implementation-defined behaviour; the implementation
knows

All I've been trying to point out is that, for Windows NT, 'ptr' does
not contain a trap representation, so any use _other_ than indirection
(via '*', '->', or function-call '()', for example) is well-defined.
For indirection, it depends on if 'ptr' points to an 'int' object, as
always.

Any confusion could be a matter of my not picking the right choice of
words at the right times.

> I also assert,
> though with slightly less confidence, that after the memcpy() call ptr
> contains a trap representation.
> Do you disagree? If so, can you cite where the C standard defines the
> behavior of the dereference?
>

Yes, I disagree with the second part about 'ptr' containing a trap
representation, for Windows NT.

" 6.2.6.1 General

1 The representations of all types are unspecified except as stated
in this subclause.

2 Except for bit-fields, objects are composed of contiguous
sequences of one or more bytes, the number, order, and encoding of which
are either explicitly specified or implementation-defined."

we don't see 'int *' explicitly specified, thereafter. For 32-bit
Windows NT, it so happens that this type is represented the same way as
an unsigned integer with all value bits. It so happens that 0xC is a
multiple of 'sizeof (int)' and represents a suitable alignment for an 'int'.

> [snip]
>
>>> "I see a Microsoft debugger catch these things and call them [non-C]
>>> null pointers." -- I don't believe you have seen that.
>>
>> "These things" == the subject that Geoff had most recently discussed:
>> "[non-C] trap values".
>
> What you've seen, as I recall, is the Microsoft debugger, on an
> attempt to dereference (on the machine code level) a pointer with
> the representation 0x0000000C, printing an error message that
> includes the identifier "NULL_CLASS_PTR_DEREFERENCE". That is not
> the debugger calling 0x0000000C a null pointer. It's the debugger
> inferring, from the attempt to dereference 0x0000000C, that there
> was an attempt to dereference a C null pointer (0x00000000). Or am
> I missing something?
>

I don't think there needs to be an inference that there was a dereference
of a null pointer, but other than that, I think you're right. I really
wish I'd said "NULL_CLASS_PTR_DEREFERENCE" in the beginning, so it
wouldn't have been an issue of discussion. The trap values that Geoff
had discussed were not C trap representations, so I was likewise being
loose with "null pointers." I am filled with regret about that. As
we've already discussed, such a pointer value does not compare equal to
a C null pointer, so the debugger could hardly claim that it's a C null
pointer.

In order to understand what I mean by "loose" and why I might've done
so, for what it's worth, there are some people who don't use the C
definition for "null pointer" in non-pedantic discussion:

Scott Noone (well known at OSR Online):
http://social.msdn.microsoft.com/Forums/en-AU/windbg/thread/669a4137-f328-495f-a002-120fb6841542

Chad Bramwell:
http://www.altdevblogaday.com/2011/09/26/how-did-i-crash-in-that-func/

On 1/22/2013 11:01, Keith Thompson wrote:
> Shao Miller <sha0....@gmail.com> writes:

I get your point. I think another example would be a class method (or
whatever they're called) doing something with 'this->y'.

Off-topically, I'd suggest installing Debugging Tools for Windows for
messing around with Windows debugging; it's quite nifty! After an
error, '!analyze -v'.

On 1/22/2013 11:38, Keith Thompson wrote:
> Shao Miller <sha0....@gmail.com> writes:

> [...]
>> The C semantics are defined by the Standard. A strictly conforming
>> program's behaviour is predictable, given these.
>>
>> Where we see a mention of implementation-defined subject matter, a C
>> program's behaviour is defined by both the C semantics as well as
>> implementation-specific definitions. The program's behaviour is
>> predictable, because the implementation documents their definitions.
>>
>> Where we see a mention of undefined behaviour, a C program's behaviour
>> is not guaranteed to be defined by any known means. The program's
>> behaviour is unpredictable, _unless_ we happen to know it by means
>> which aren't referred to by the Standard. An implementation or other
>> Standard (such as POSIX) can certainly define the behaviour.
>
> I agree with the above.
>

I'm glad to read that. :)

>> I'm trying to argue that the case for undefined behaviour does not apply.
>
> Does not apply *to what*?
>

Considering a Windows pointer with representation 0xC, I meant with
regards to what Ben had asked about and what I had typed later in the
same post: "The C Standard _plus_ the implementation say: It can be
stored, read, passed, discarded, converted, compared, its size
determined, etc. Pretty well anything that doesn't involve using the
pointer for indirect access."

I would've added pointer arithmetic as allowed, but then I'd've had to
add a note about a complete object type. I would've added function
calls as not being allowed, but I hoped that would be understood by
"indirect access". An exhaustive list would be tedious.

> Can you provide a small self-contained program that's relevant to
> the point you're making?
>
> If you're saying that there's some construct whose behavior you
> say is implementation-defined, and I say is undefined, then please
> point out the construct in question in your program.
>

int main(void) {
    void * vp1 = (void *) 0xC;
    void * vp2 = vp1;
    return 0;
}

This is implementation-defined, not undefined. If the implementation
does not define the result of the cast to be a trap representation, or
better yet, defines that pointers are implemented as unsigned integers
with all value bits, then there is no undefined behaviour. This is the
case for Windows NT, as far as I'm aware.

If someone wants to suggest that 'vp1' and 'vp2' have trap
representations, I'd like to know why. If we can agree that their
representations can provide hints to a debugger, then I'll be glad to
read that.

Keith Thompson

Jan 22, 2013, 5:15:34 PM

I disagree, and I'll elaborate below.

> All I've been trying to point out is that, for Windows NT, 'ptr' does
> not contain a trap representation, so any use _other_ than indirection
> (via '*', '->', or function-call '()', for example) is well-defined.
> For indirection, it depends on if 'ptr' points to an 'int' object, as
> always.

Assume that there is no int object at address 0x0000000C. Do you
believe that the behavior of an access to ptr is well-defined *by
the C standard*?

A trap representation is a representation such that accessing an object
holding it has undefined behavior. Undefined behavior is behavior that
is not defined by the C standard, regardless of whether the
implementation chooses to define it.

[...]

> I don't think there needs to be an inference that there was a dereference
> of a null pointer, but other than that, I think you're right. I really
> wish I'd said "NULL_CLASS_PTR_DEREFERENCE" in the beginning, so it
> wouldn't have been an issue of discussion. The trap values that Geoff
> had discussed were not C trap representations, so I was likewise being
> loose with "null pointers." I am filled with regret about that. As
> we've already discussed, such a pointer value does not compare equal to
> a C null pointer, so the debugger could hardly claim that it's a C null
> pointer.

And in fact it doesn't claim either that 0x0000000C is a C null pointer, or
that it's any kind of null pointer.

> In order to understand what I mean by "loose" and why I might've done
> so, for what it's worth, there are some people who don't use the C
> definition for "null pointer" in non-pedantic discussion:
>
> Scott Noone (well known at OSR Online):
> http://social.msdn.microsoft.com/Forums/en-AU/windbg/thread/669a4137-f328-495f-a002-120fb6841542
>
> Chad Bramwell:
> http://www.altdevblogaday.com/2011/09/26/how-did-i-crash-in-that-func/

I disagree with your interpretation of both linked discussions. In both
cases, there was a null pointer dereference in C (or C++) code, which
resulted in an attempted dereference of a non-null but invalid pointer
(similar to 0x0000000C) in the generated machine code.

Machine code *implements* C semantics; it needn't precisely mirror them.

[...]

> Considering a Windows pointer with representation 0xC, I meant with
> regards to what Ben had asked about and what I had typed later in the
> same post: "The C Standard _plus_ the implementation say: It can be
> stored, read, passed, discarded, converted, compared, its size
> determined, etc. Pretty well anything that doesn't involve using the
> pointer for indirect access."

Sure, but the C standard doesn't permit any of those things. It's well
known that an implementation can define behavior that isn't defined by
the C standard. Such behavior is still "undefined behavior" as defined
by 3.4.3.

[...]

>> Can you provide a small self-contained program that's relevant to
>> the point you're making?
>>
>> If you're saying that there's some construct whose behavior you
>> say is implementation-defined, and I say is undefined, then please
>> point out the construct in question in your program.
>
> int main(void) {
>     void * vp1 = (void *) 0xC;
>     void * vp2 = vp1;
>     return 0;
> }
>
> This is implementation-defined, not undefined. If the implementation
> does not define the result of the cast to be a trap representation, or
> better yet, defines that pointers are implemented as unsigned integers
> with all value bits, then there is no undefined behaviour. This is the
> case for Windows NT, as far as I'm aware.

No, it's undefined. It may additionally be defined by the
implementation, but it's not implementation-defined.

The phrase "implementation-defined behavior", as defined by the C
standard, refers *only* to behavior that is explicitly referred to
by the standard as "implementation-defined". It doesn't just mean
"behavior that is defined by the implementation".

In your program, the behavior of the initialization:

void * vp2 = vp1;

is not defined by the C standard. For example, it could cause the
program to crash in a conforming implementation. It is therefore,
by definition, undefined behavior. If an implementation chooses
to define its behavior, that doesn't change any of the above --
and the implementation is not obligated to document its choice.

> If someone wants to suggest that 'vp1' and 'vp2' have trap
> representations, I'd like to know why. If we can agree that their
> representations can provide hints to a debugger, then I'll be glad to
> read that.

They are pointer values such that accessing them has undefined
behavior. They are neither null pointers, nor pointers to any
object, nor pointers just past the end of any object. It's not 100%
clear to me, from the standard's definition of "trap representation",
that they *must* be trap representations, but I believe that
they are.

Speculation: Perhaps what bothers you is the idea that "undefined
behavior" and "trap representation" imply "This is evil, don't
touch!!!". They don't. It's perfectly valid for an implementation
to define the behavior of something that has "undefined behavior" in
the standard, or to use a C trap representation for its own purposes.

Certainly their representation can provide hints to a debugger; we've
seen that demonstrated. That doesn't cause them not to be trap
representations.

Keith Thompson

Jan 22, 2013, 5:31:31 PM

Keith Thompson <ks...@mib.org> writes:

> Shao Miller <sha0....@gmail.com> writes:
[...]
>> Considering a Windows pointer with representation 0xC, I meant with
>> regards to what Ben had asked about and what I had typed later in the
>> same post: "The C Standard _plus_ the implementation say: It can be
>> stored, read, passed, discarded, converted, compared, its size
>> determined, etc. Pretty well anything that doesn't involve using the
>> pointer for indirect access."
>
> Sure, but the C standard doesn't permit any of those things. It's well
> known that an implementation can define behavior that isn't defined by
> the C standard. Such behavior is still "undefined behavior" as defined
> by 3.4.3.
[...]

Sorry, that was imprecise. What I should have said is that the C
standard doesn't define the behavior of any of those things.

Geoff

Jan 22, 2013, 8:51:04 PM

On Tue, 22 Jan 2013 07:43:51 -0800, Keith Thompson <ks...@mib.org> wrote:

>What you've seen, as I recall, is the Microsoft debugger, on an
>attempt to dereference (on the machine code level) a pointer with
>the representation 0x0000000C, printing an error message that
>includes the identifier "NULL_CLASS_PTR_DEREFERENCE". That is not
>the debugger calling 0x0000000C a null pointer. It's the debugger
>inferring, from the attempt to dereference 0x0000000C, that there
>was an attempt to dereference a C null pointer (0x00000000). Or am
>I missing something?

You're not missing anything. That is precisely what is happening.
The actual message is something like: "Unhandled exception at 0x01091000 in
trap.exe: 0xC0000005: Access violation writing location 0x0000000c."

Error 0xC0000005 is STATUS_ACCESS_VIOLATION, the memory could not be written or
read. The process attempted to access memory outside its memory pool.

Shao Miller

Jan 22, 2013, 9:32:44 PM

Assuming you really meant "access to ptr" and not "access to *ptr", my
answer is: No, then no.

The first access is during the 'memcpy'. The implementation defines the
representations of both 'x' and 'ptr'. If they do not have the same
size, there could be an out-of-bounds condition. If they do have the
same size, then this access is fine and we move on.

The second access is during the lvalue conversion that determines the
operand of the unary '*' operator. This second access has
implementation-defined behaviour, since the implementation defines the
mapping from object representations to values. Anything not defined (by
omission) or explicitly defined to be a trap representation would be a
trap representation. In that case, the lvalue conversion would yield
undefined behaviour. Otherwise, the value is a valid value and we get
past that lvalue conversion.

After that point, we can fret about the '*' operator. Whether or not
there's an object at address 0xC is not a consideration until this
point, so my understanding goes.

What I would say is an extremely relevant piece of the Standard has
accidentally been snipped:

" 6.2.6.1 General

1 The representations of all types are unspecified except as stated
in this subclause.

2 Except for bit-fields, objects are composed of contiguous
sequences of one or more bytes, the number, order, and encoding of which
are either explicitly specified or implementation-defined."

Until the indirection in the last line, your example above should have
no different expectation for undefined/defined behaviour when compared to:

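#include <string.h>
#include <stdio.h>

int main(void) {
    float f;
    unsigned int x = 0x4048F5C3; /* roughly 3.14 as an IEEE 754 single */

    /* Assumes sizeof f == sizeof x; copies the object representation only. */
    memcpy(&f, &x, sizeof x);
    printf("%f\n", f); /* reading f is the lvalue conversion in question */
    return 0;
}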

If you'd say that this is undefined behaviour rather than
implementation-defined behaviour, please do explain why.

> A trap representation is a representation such that accessing an object
> holding it has undefined behavior. Undefined behavior is behavior that
> is not defined by the C standard, regardless of whether the
> implementation chooses to define it.
>

Did you just say that implementation-defined behaviour is a subset of
undefined behaviour? I don't think you did, given that you discuss
"implementation-defined," down below.

I think you think that I'm arguing that there is undefined behaviour and
that it so happens to be defined by Microsoft, so I'm claiming it to be
"well-defined". I'm not! I'm saying that there are parts where the
Standard calls something "implementation-defined" instead of
"undefined", and this is one such instance.

I tried to explain this before with the three paragraphs in response to
Ben about "predictability." "Well-defined" to me means that either the
Standard defines it directly, or defines some of it and defines that an
implementation defines the rest of it. Undefined behaviour doesn't
match either of those, even if the implementation provides definitions.
Was that not clear?

> [...]
>
>> I don't think there needs to be an inference that there was a dereference
>> to a null pointer, but other than that, I think you're right. I really
>> wish I'd said "NULL_CLASS_PTR_DEREFERENCE" in the beginning, so it
>> wouldn't have been an issue of discussion. The trap values that Geoff
>> had discussed were not C trap representations, so I was likewise being
>> loose with "null pointers." I am filled with regret about that. As
>> we've already discussed, such a pointer value does not compare equal to
>> a C null pointer, so the debugger could hardly claim that it's a C null
>> pointer.
>
> And in fact it doesn't claim either that 0x0000000C is a C null pointer, or
> that it's any kind of null pointer.
>

I don't think it's intentional, but I'm going to share a result of this:
I've been contacted by another regular who has the impression that we've
been arguing for a long time about whether or not 0xC is a null pointer.

while (1) {
    You("It's not a null pointer.");
    Me("Well I didn't mean a C null pointer. Sorry.");
}

What do you mean by "any kind of null pointer"? After this discussion,
I didn't think it was remotely possible to believe that anything other
than the C definition could be spoken of. Heh.

There are at least 4 IDs that WinDbg uses for an invalid memory access:

1. BAD_PTR_DEREFERENCE
2. NULL_CLASS_PTR_DEREFERENCE
3. NULL_DEREFERENCE
4. STRING_DEREFERENCE

An exception can be analyzed (with '!analyze -v') and a "default bucket
ID" will be chosen by WinDbg. I seldom debug programs where it chooses
#1 or #4. I usually see it choose #2 or #3. A garbage pointer usually
just yields a default bucket ID "progtype_FAULT" (such as "DRIVER_FAULT"
or "APPLICATION_FAULT"), with no mention of "NULL" anywhere.

>> In order to understand what I mean by "loose" and why I might've done
>> so, for what it's worth, there are some people who don't use the C
>> definition for "null pointer" in non-pedantic discussion:
>>
>> Scott Noone (well known at OSR Online):
>> http://social.msdn.microsoft.com/Forums/en-AU/windbg/thread/669a4137-f328-495f-a002-120fb6841542
>>
>> Chad Bramwell:
>> http://www.altdevblogaday.com/2011/09/26/how-did-i-crash-in-that-func/
>
> I disagree with your interpretation of both linked discussions. In both
> cases, there was a null pointer dereference in C (or C++) code, which
> resulted in an attempted dereference of a non-null but invalid pointer
> (similar to 0x0000000C) in the generated machine code.
>
> Machine code *implements* C semantics; it needn't precisely mirror them.
>

I don't think you understood my interpretation.

Scott: "This is a NULL pointer dereference in NTFS."

Chad: "Once again we are dereferencing a NULL pointer and once again our
program is crashing..."

Chad: "...the real problem was that your pointer was NULL?"

Chad: "So there ya go. A whole lot of null dereferences..."

In none of the cases was 'NULL' dereferenced. In none of the cases was
"null pointer" typed. That is, people don't always say precisely what
they mean. I'll try to do better in the future.

> [...]
>
>> Considering a Windows pointer with representation 0xC, I meant with
>> regards to what Ben had asked about and what I had typed later in the
>> same post: "The C Standard _plus_ the implementation say: It can be
>> stored, read, passed, discarded, converted, compared, its size
>> determined, etc. Pretty well anything that doesn't involve using the
>> pointer for indirect access."
>
> Sure, but the C standard doesn't permit any of those things. It's well
> known that an implementation can define behavior that isn't defined by
> the C standard. Such behavior is still "undefined behavior" as defined
> by 3.4.3.
>

Discussed above. I'm claiming it's defined by the Standard to be
implementation-defined behaviour, not that it's
undefined-behaviour-relative-to-the-Standard and which happens to be
defined elsewhere. I could be wrong, as always, but I've yet to
understand why that might be.

> [...]
>
>>> Can you provide a small self-contained program that's relevant to
>>> the point you're making?
>>>
>>> If you're saying that there's some construct whose behavior you
>>> say is implementation-defined, and I say is undefined, then please
>>> point out the construct in question in your program.
>>
>> int main(void) {
>>     void * vp1 = (void *) 0xC;
>>     void * vp2 = vp1;
>>     return 0;
>> }
>>
>> This is implementation-defined, not undefined. If the implementation
>> does not define the result of the cast to be a trap representation, or
>> better yet, defines that pointers are implemented as unsigned integers
>> with all value bits, then there is no undefined behaviour. This is the
>> case for Windows NT, as far as I'm aware.
>
> No, it's undefined. It may additionally be defined by the
> implementation, but it's not implementation-defined.
>

I don't understand this perspective, given 6.3.2.3p5:

"An integer may be converted to any pointer type. Except as
previously specified, the result is implementation-defined, might not be
correctly aligned, might not point to an entity of the referenced type,
and might be a trap representation. [footnote 67]"

> The phrase "implementation-defined behavior", as defined by the C
> standard, refers *only* to behavior that is explicitly referred to
> by the standard as "implementation-defined". It doesn't just mean
> "behavior that is defined by the implementation".
>

Yes, and that's the stuff I'm talking about. (See above.)

> In your program, the behavior of the initialization:
>
> void * vp2 = vp1;
>
> is not defined by the C standard. For example, it could cause the
> program to crash in a conforming implementation. It is therefore,
> by definition, undefined behavior. If an implementation chooses
> to define its behavior, that doesn't change any of the above --
> and the implementation is not obligated to document its choice.
>

You're talking about undefined behaviour. I'm talking about
implementation-defined behaviour. This line cannot stand on its own...

We have to ask, "Was the lvalue conversion of 'vp1' defined?"

The answer is, "Yes, because it has a valid value."

Then we ask, "How do we know that?"

The answer is, "Because it was initialized with the valid value that was
the result of the cast."

Then we ask, "How do we know the result of the cast was a valid value?"

The answer is, "Because the result of the cast conversion is
implementation-defined, and Microsoft's implementation defined it to be
a valid value."

If we'd encountered any lack of definitions along the way, there'd be
undefined behaviour. Or, more naturally, we could work from the
beginning, too. :)

>> If someone wants to suggest that 'vp1' and 'vp2' have trap
>> representations, I'd like to know why. If we can agree that their
>> representations can provide hints to a debugger, then I'll be glad to
>> read that.
>
> They are pointer values such that accessing them has undefined
> behavior. They are neither null pointers, nor pointers to any
> object, nor pointers just past the end of any object. It's not 100%
> clear to me, from the standard's definition of "trap representation",
> that they *must* be trap representations, but I believe that
> they are.
>

Well I suppose it's hard to prove either way, given that it's C90. I
just figured that it is up to the implementation to define which
representations map to which of {value, non-value}. I'm about C99%
sure, so learning otherwise would be a valuable learning experience.

> Speculation: Perhaps what bothers you is the idea that "undefined
> behavior" and "trap representation" imply "This is evil, don't
> touch!!!". They don't. It's perfectly valid for an implementation
> to define the behavior of something that has "undefined behavior" in
> the standard, or to use a C trap representation for its own purposes.
>

Good guess, but nope. :)

> Certainly their representation can provide hints to a debugger; we've
> seen that demonstrated. That doesn't cause them not to be trap
> representations.
>

Agreed.

Tim Rentsch

Jan 23, 2013, 2:32:48 AM

Let me see if I can help untangle the description you're giving.

There are two different spaces of interest: values, and object
representations. What is stored in an object is an object
representation, not a value. Sometimes we talk about the "value
stored in an object", but that's just a shorthand for the value
represented by the object representation that the object holds.

Similarly, the result of evaluating an expression is a value, not
an object representation. It might be useful to think of mapping
between the two as a kind of "conversion" -- reading an object
converts an object representation to a value, and storing into an
object converts a value into an object representation.

In value space, the above four classes completely partition the
set of values for pointer types. A kind of value along the lines
you describe -- we might call it "semi-defined" -- would be a
subset of the Type 1 class, not a separate class.

In object representation space, we can consider the set of object
representations whose values will behave exactly in all the ways
the Standard requires (for the value type used to access the
object). These values are what might be called "well behaved",
and actually this set of values is what the Standard normally
means by 'value' (eg, in the phrase 'value of the object type').
For pointer types, these values correspond to types 2-4 above.

Any other object representation is one which when read (ie, by an
lvalue conversion) yields a "value" that might not behave in ways
the Standard requires. Any such object representation meets the
defining condition for trap representations: it need not represent
a (well-behaved) value of the object type. This set of object
representations is exactly the set of trap representations.

Of course, it should go without saying that the above assumes there
has been no previous undefined behavior. Whenever there has been
any previous undefined behavior, any assertion about how subsequent
actions will proceed might not hold up. There isn't any point in
talking about what might be true under such circumstances, because
anything at all _might_ be true, depending on how an implementation
chooses to exercise the unlimited license granted by the Standard
upon encountering undefined behavior.

Tim Rentsch

Jan 23, 2013, 2:41:38 AM

Keith Thompson <ks...@mib.org> writes:

> Tim Rentsch <t...@alumni.caltech.edu> writes:
> [...]
>> Actually there are four kinds of pointer values (with the
>> understanding that "value" here includes some that cannot
>> be used definedly):
>>
>> 1. unusable (any use is undefined behavior)
>> 2. null pointers (can be compared for equality/inequality)
>> 3. equality, pointer arithmetic, relational (eg <) comparison
>> 4. like 3 but also can be dereferenced
>>
>> Type 3 values are, eg, pointers one past the end of an array, or
>> non-null values returned from doing a malloc(0). Type 4 values
>> are regular pointers to objects.
> [...]
>
> And I believe that type 1 is exactly the set of trap representations,
> but I haven't been able to prove it.

My response to Philip Lantz gives a line of reasoning related
to this.

> The definition of "trap representation" in 3.19.4 is a bit vague:
>
> trap representation
>
> an object representation that need not represent a value of the
> object type
>
> I wonder why it doesn't say "does not" rather than "need not".

Because implementations may define (either explicitly or
implicitly) what happens for such representations so that they
do represent well-behaved values _in some cases_. Saying that
a TR need not represent a value of the object type leaves open
the possibility that under some circumstances it might be
okay. Basically, it lets the implementation eat its cake and
have it too.

Shao Miller

Jan 23, 2013, 5:17:16 PM

On 1/22/2013 21:32, Shao Miller wrote:
> On 1/22/2013 17:15, Keith Thompson wrote:
>> Shao Miller <sha0....@gmail.com> writes:
>>> I don't think there needs to be an inference that there was a dereference
>>> to a null pointer, but other than that, I think you're right. I really
>>> wish I'd said "NULL_CLASS_PTR_DEREFERENCE" in the beginning, so it
>>> wouldn't have been an issue of discussion. The trap values that Geoff
>>> had discussed were not C trap representations, so I was likewise being
>>> loose with "null pointers." I am filled with regret about that. As
>>> we've already discussed, such a pointer value does not compare equal to
>>> a C null pointer, so the debugger could hardly claim that it's a C null
>>> pointer.
>>

Scott Noone suggested that I won't find any documentation for
"NULL_CLASS_PTR_DEREFERENCE". D'oh well. :S