
Bounds Checking as Undefined Behaviour?


Shao Miller

Jul 28, 2010, 10:30:26 PM
Does the following code imply undefined behaviour?

int main(void) {
  union {
    char bar[1];
    char baz[2];
  } foo;

  foo.baz[0] = 'z';
  foo.baz[1] = 'z';
  foo.bar[0] = 'r';
  return foo.bar[1]; /* Undefined behaviour for O.O.B. */
}

I don't believe it does, since I believe that 'foo.bar' "decays" to a
'char *' whose bounds are checked, if at all, against the substrate of
'foo' versus a 'char[1]' within the object "space" of 'foo'.

I am not at all interested in arguing or debating about it. When in
doubt, treat as undefined, perhaps.

I'd simply be interested in what others have to say about it.

Ben Bacarisse brought up something [I believe is] related in another
thread and Richard Heathfield and Peter "Seebs" Seebach both
additionally shared interesting thoughts on the matter.

So what are your thoughts, if you please?

A good brain-teaser, Ben!

Ben Bacarisse

Jul 28, 2010, 11:08:00 PM
Shao Miller <sha0....@gmail.com> writes:
<snip>

> Ben Bacarisse brought up something [I believe is] related

No.

<snip>
> A good brain-teaser, Ben!

Not mine. Please leave me out of it.

--
Ben.

Shao Miller

Jul 28, 2010, 11:18:18 PM
On Jul 28, 11:08 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:

> Shao Miller <sha0.mil...@gmail.com> writes:
>
> <snip>
>
> > Ben Bacarisse brought up something [I believe is] related
>
> No.
>
> <snip>
>
> > A good brain-teaser, Ben!
>
> Not mine.  Please leave me out of it.
Hey, I misunderstood. Sorry, Ben! I'm still interested in reading
anyone's thoughts about the matter.

Eric Sosman

Jul 28, 2010, 11:43:35 PM
On 7/28/2010 10:30 PM, Shao Miller wrote:
> Does the following code imply undefined behaviour?
>
> int main(void) {
> union {
> char bar[1];
> char baz[2];
> } foo;
>
> foo.baz[0] = 'z';
> foo.baz[1] = 'z';
> foo.bar[0] = 'r';
> return foo.bar[1]; /* Undefined behaviour for O.O.B. */
> }

U.B. for O.O.B., as you say.

> I don't believe it does, since I believe that 'foo.bar' "decays" to a
> 'char *' whose bounds are checked, if at all, against the substrate of
> 'foo' versus a 'char[1]' within the object "space" of 'foo'.

No: An array has its own bounds, not imputed bounds from some
larger entity of which it might be part. Consider:

struct { char c[1]; double d; }
  *p = malloc(1000000); // assume success
p->c[999999] = 'x';
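
For contrast, a minimal sketch (not from the original posts; `union two` and both function names are made up for illustration) of two ways to reach the second byte of the original union without indexing past the end of `bar`: name the member whose declared bound covers element 1, or walk the whole object through an `unsigned char *`.

```c
union two {
    char bar[1];
    char baz[2];
};

/* Index the member whose declared bound covers element 1. */
static int second_byte_via_baz(void)
{
    union two foo;
    foo.bar[0] = 'r';
    foo.baz[1] = 'z';   /* last store is to baz, so baz[1] holds 'z' */
    return foo.baz[1];
}

/* Inspect the object representation of the whole union instead. */
static int second_byte_via_bytes(void)
{
    union two foo;
    const unsigned char *bytes = (const unsigned char *)&foo;
    foo.bar[0] = 'r';
    foo.baz[1] = 'z';
    return bytes[1];
}
```

Neither function relies on `bar` having more than one element, so the question of whose bounds apply does not arise.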

> I am not at all interested in arguing or debating about it. When in
> doubt, treat as undefined, perhaps.
>
> I'd simply be interested in what others have to say about it.

Okay: That's what I say. No debate, no argument, just opinion.

--
Eric Sosman
eso...@ieee-dot-org.invalid

Shao Miller

Jul 28, 2010, 11:51:22 PM
Thanks, Eric! Just to clarify: Was the intent of your example to
demonstrate a case where a bounds-checking implementation would
diagnose out-of-bounds or not? I think that's what you said above,
but would like to be sure.

Richard Heathfield

Jul 28, 2010, 11:53:56 PM
Shao Miller wrote:
> Does the following code imply undefined behaviour?
>
> int main(void) {
> union {
> char bar[1];
> char baz[2];
> } foo;
>
> foo.baz[0] = 'z';
> foo.baz[1] = 'z';
> foo.bar[0] = 'r';
> return foo.bar[1]; /* Undefined behaviour for O.O.B. */
> }
>
> I don't believe it does, since I believe that 'foo.bar' "decays" to a
> 'char *' whose bounds are checked, if at all, against the substrate of
> 'foo' versus a 'char[1]' within the object "space" of 'foo'.

It is not clear to me whether the behaviour is undefined, but I have two
observations to make, one wrt the Standard, and one that is more general.

Firstly:
6.2.6.1(7): "When a value is stored in a member of an object of union
type, the bytes of the object representation that do not correspond to
that member but do correspond to other members take unspecified values."

When you store a value, then, in foo.bar[0], the value of foo.baz[1]
becomes unspecified. Therefore, *at best*, the value (if such there be)
of foo.bar[1] is unspecified, so the implementation is free to make it
anything that can be represented in the type, and does not have to
document its behaviour. (This is not quite the same thing as undefined
behaviour, since the implementation is /not/ free to, say, repeal the
law of gravity as it can in principle do for ++i+i++.)

The second observation is this: the code makes no sense as written. No
matter what it is supposed to achieve, there is certainly a better way
of achieving it. Although clc loves to tilt at windmills, we do tend to
prefer windmills that are actually capable of grinding some genuine
flour. If you can come up with a reason why someone might want to write
the above code, people might take more interest in your question.

<snip>


--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within

Richard Heathfield

Jul 28, 2010, 11:56:55 PM
Eric Sosman wrote:

<snip>

> No: An array has its own bounds, not imputed bounds from some
> larger entity of which it might be part.

Another five seconds of thinking about it before posting, and I'd have
reached the same conclusion. Ah well!

Shao Miller

Jul 29, 2010, 12:01:37 AM
On Jul 28, 11:53 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> It is not clear to me whether the behaviour is undefined, but I have two
> observations to make, one wrt the Standard, and one that is more general.
>
> Firstly:
> 6.2.6.1(7): "When a value is stored in a member of an object of union
> type, the bytes of the object representation that do not correspond to
> that member but do correspond to other members take unspecified values."
>
> When you store a value, then, in foo.bar[0], the value of foo.baz[1]
> becomes unspecified. Therefore, *at best*, the value (if such there be)
> of foo.bar[1] is unspecified, so the implementation is free to make it
> anything that can be represented in the type, and does not have to
> document its behaviour. (This is not quite the same thing as undefined
> behaviour, since the implementation is /not/ free to, say, repeal the
> law of gravity as it can in principle do for ++i+i++.)
>
Thanks, Richard!

> The second observation is this: the code makes no sense as written. No
> matter what it is supposed to achieve, there is certainly a better way
> of achieving it. Although clc loves to tilt at windmills, we do tend to
> prefer windmills that are actually capable of grinding some genuine
> flour. If you can come up with a reason why someone might want to write
> the above code, people might take more interest in your question.
>

Point taken. This question of UB really had no purpose other than a
simple test case, so is not very interesting at all. :)

Shao Miller

Jul 29, 2010, 12:05:23 AM
On Jul 28, 11:56 pm, Richard Heathfield <r...@see.sig.invalid> wrote:
> Eric Sosman wrote:
>
> <snip>
>
> >     No: An array has its own bounds, not imputed bounds from some
> > larger entity of which it might be part.
>
> Another five seconds of thinking about it before posting, and I'd have
> reached the same conclusion. Ah well!
Forgiveness is yours. :) The question was in regards to out-of-
bounds, not unspecified values. Every response has its value (one
hopes).

Ian Collins

Jul 29, 2010, 12:21:51 AM
On 07/29/10 03:56 PM, Richard Heathfield wrote:
> Eric Sosman wrote:
>
> <snip>
>
>> No: An array has its own bounds, not imputed bounds from some
>> larger entity of which it might be part.
>
> Another five seconds of thinking about it before posting, and I'd have
> reached the same conclusion. Ah well!

Unless the array encompasses all available memory, it is always part of
some larger entity!

--
Ian Collins

Eric Sosman

Jul 29, 2010, 12:28:16 AM

My thesis is less stringent: All I'm saying is that the behavior
is undefined, regardless of whether there's explicit bounds checking.
For example, the compiler is entitled to assume that p->c and p->d do
not overlap, so that storing to an element of p->c does not invalidate
a p->d that may already have been cached in a register. No bounds
checking in sight, yet the outcome of the program is in doubt anyhow.
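
A minimal sketch of that point, with hypothetical names (`struct pair`, `d_unchanged_after_store`) that do not appear in the thread. Every store here stays inside `c`, so the second read of `p->d` must equal the first, and a compiler may simply reuse the value it already has. A store through `p->c[i]` with `i` out of range would fall outside what the Standard defines, and nothing would then oblige the implementation to reload `p->d`.

```c
#include <stddef.h>

struct pair {
    char c[1];
    double d;
};

/* Returns nonzero if p->d reads the same before and after the store. */
static int d_unchanged_after_store(struct pair *p, size_t i, char v)
{
    double before = p->d;       /* may be kept in a register */
    if (i < sizeof p->c)        /* only in-bounds stores are performed */
        p->c[i] = v;
    return before == p->d;      /* a compiler may fold this to 1 */
}
```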

--
Eric Sosman
eso...@ieee-dot-org.invalid

Eric Sosman

Jul 29, 2010, 12:41:27 AM

Yes (quite likely) from the machine's point of view, but not from
C's standpoint. Separate objects are -- well, separate, not embedded
in some kind of luminiferous aether. That's why it's U.B. to subtract
pointers to disparate objects: There's no "unifying substrate" in which
the address calculation can be defined. In principle, each object can
inhabit its very own private address space, incommensurate with the
spaces that contain other objects. (Granted, implementations usually
take shortcuts.)
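
As a small illustration (the function name `elements_between` is invented here), pointer subtraction has a defined result only when both operands point into the same array object, or one past its end:

```c
#include <stddef.h>

/* Defined only when both pointers point into (or one past) the same array. */
static ptrdiff_t elements_between(const int *first, const int *last)
{
    return last - first;
}
```

Calling it with `&a[2]` and `&a[7]` for some `int a[10]` is fine; calling it with the addresses of two unrelated `int` variables is the undefined case described above, even on machines where both happen to live in one flat address space.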

--
Eric Sosman
eso...@ieee-dot-org.invalid

Shao Miller

Jul 29, 2010, 1:00:55 AM
On Jul 29, 12:28 am, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
>      My thesis is less stringent: All I'm saying is that the behavior
> is undefined, regardless of whether there's explicit bounds checking.
> For example, the compiler is entitled to assume that p->c and p->d do
> not overlap, so that storing to an element of p->c does not invalidate
> a p->d that may already have been cached in a register.  No bounds
> checking in sight, yet the outcome of the program is in doubt anyhow.
Aha! So for the same reason, we cannot define the output to be a
return code of '3' below:

#include <stddef.h>

int main(void) {
  struct {
    char c[1];
    int v;
  } s1, s2;
  size_t i;

  s1.v = 3;
  s2.v = 5;
  for (i = 0; i < sizeof s1; i++) {
    s2.c[i] = s1.c[i];
  }
  return s2.v;
}

Because 's2.v' might be cached. Is that right?

Shao Miller

Jul 29, 2010, 1:11:23 AM
On Jul 29, 1:00 am, Shao Miller <sha0.mil...@gmail.com> wrote:
> On Jul 29, 12:28 am, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:>      My thesis is less stringent: All I'm saying is that the behavior
> > is undefined, regardless of whether there's explicit bounds checking.
> > For example, the compiler is entitled to assume that p->c and p->d do
> > not overlap, so that storing to an element of p->c does not invalidate
> > a p->d that may already have been cached in a register.  No bounds
> > checking in sight, yet the outcome of the program is in doubt anyhow.
>
> Aha!  So for the same reason, we cannot define the output to be a
> return code of '3' below:
>
... ... ...

>
> Because 's2.v' might be cached.  Is that right?

Or rather, this code, I should have used:

#include <stddef.h>

int main(void) {
struct {
char c[1];
int v;
} s1, s2;

int x;
size_t i;

s1.v = 3;
s2.v = 5;

x = s2.v;

Richard Heathfield

Jul 29, 2010, 1:13:45 AM

It's certainly true that s2.v might not be 3. Since the last write to s2
was through s2.c[], the value of s2.v is unspecified, for the same
reason as before (6.2.6.1(7)). "Unspecified" does not necessarily mean
the same as "3".

Shao Miller

Jul 29, 2010, 1:20:02 AM
On Jul 29, 1:13 am, Richard Heathfield <r...@see.sig.invalid> wrote:

> Shao Miller wrote:
> > Aha!  So for the same reason, we cannot define the output to be a
> > return code of '3' below:
>
> It's certainly true that s2.v might not be 3. Since the last write to s2
> was through s2.c[], the value of s2.v is unspecified, for the same
> reason as before (6.2.6.1(7)). "Unspecified" does not necessarily mean
> the same as "3".
Aha! I sensed the same reason here as your mention of using a
differently named member for setting values, but didn't wish to draw a
conclusion too quickly. Thanks for confirming! Is that why
'memcpy()' is used for copying, so that the implementation has a
chance to invalidate its caches? Or might we still have a cached
's2.v' after copying over '&s2' with 'memcpy()'?
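
A sketch of the two copies that stay within defined behaviour, using made-up names (`struct rec`, `copy_with_memcpy`, `copy_bytewise`). Both address the whole object rather than the one-element member `c`, so no array bound is exceeded. In the abstract machine the read of `s2.v` after either copy has to see the copied bytes; a stale cached value would only be permitted if the program could not tell the difference.

```c
#include <stddef.h>
#include <string.h>

struct rec {
    char c[1];
    int v;
};

/* Copy the whole object with memcpy, then read the int member. */
static int copy_with_memcpy(void)
{
    struct rec s1, s2;
    memset(&s1, 0, sizeof s1);   /* give every byte of s1 a known value */
    s1.v = 3;
    s2.v = 5;
    memcpy(&s2, &s1, sizeof s2);
    return s2.v;
}

/* Same copy, byte by byte, through pointers to the whole objects. */
static int copy_bytewise(void)
{
    struct rec s1, s2;
    const unsigned char *src = (const unsigned char *)&s1;
    unsigned char *dst = (unsigned char *)&s2;
    size_t i;

    memset(&s1, 0, sizeof s1);
    s1.v = 3;
    s2.v = 5;
    for (i = 0; i < sizeof s1; i++)
        dst[i] = src[i];
    return s2.v;
}
```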

Nick Keighley

Jul 29, 2010, 2:48:02 AM
On 29 July, 03:30, Shao Miller <sha0.mil...@gmail.com> wrote:

> I am not at all interested in arguing or debating about it.

go away then

Tim Rentsch

Jul 29, 2010, 2:50:03 AM
Richard Heathfield <r...@see.sig.invalid> writes:

Of course what you meant was that reading s1.c[1] is already
undefined behavior, so the store into s2.c[1] may not even
occur (but will also be undefined behavior if it does).

Also, 6.2.6.1p7 is about unions, and the example code uses only
structs.

Richard Heathfield

Jul 29, 2010, 3:10:20 AM

Actually, on reading the code again I don't know what I meant. I think I
was still thinking about unions. Or perhaps I wasn't thinking at all.
The latter seems more likely.

>
> Also, 6.2.6.1p7 is about unions, and the example code uses only
> structs.

Please stop trying to confuse me with the facts.

Kenny McCormack

Jul 29, 2010, 4:23:27 AM
In article <703e2e96-c286-4165...@j8g2000yqd.googlegroups.com>,

CLC at its finest!

--
> No, I haven't, that's why I'm asking questions. If you won't help me,
> why don't you just go find your lost manhood elsewhere.

CLC in a nutshell.

Shao Miller

Jul 29, 2010, 6:22:29 PM
On Jul 29, 2:50 am, Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
> Of course what you meant was that reading s1.c[1] is already
> undefined behavior, so the store into s2.c[1] may not even
> occur (but will also be undefined behavior if it does).
>
> ... ... ...
Thanks for that, Tim.

#include <string.h> /* memcpy() */

int main(void) {
  /* Pretty simple structure */
  struct {
    char c[1];
    int i;
    int j;
  } s1, s2;
  /* Copy iterator and dummy */
  int cpi;

  /* Assign some members */
  s1.i = 5;
  /* s1.j not assigned */
  s2.i = 0;
  /* s2.j not assigned */

  /* Attempt to copy s1 to s2. s2.i might be cached. #1 */
  for (cpi = s2.i; cpi < sizeof s1; cpi++)
    /* Why might reading s1.c[cpi] be undefined? */
    s2.c[cpi] = s1.c[cpi];
  /* Attempt to fetch s2.i again. Could it be cached from #1? */
  cpi = s2.i;

  /* Is memcpy() well-defined for copying s1 to s2? #2 */
  memcpy(&s2, &s1, sizeof s1);
  /* Could s2.i still be cached from #1? */
  cpi = s2.i;

  /* Attempt to copy s1 to s2 using approach #3 */
  for (cpi = 0; cpi < sizeof s1; cpi++)
    /* Use a char * to access each */
    ((char *)&s2)[cpi] = ((char *)&s1)[cpi];
  /* Could s2.i still be cached from #1? */
  cpi = s2.i;

  /* Attempt to copy s1 to s2 using approach #4 */
  for (cpi = 0; cpi < sizeof s1; cpi++)
    /* Use casts of the c member, which is at the start */
    ((char *)s2.c)[cpi] = ((char *)s1.c)[cpi];
  /* Could s2.i still be cached from #1? */
  cpi = s2.i;

  return cpi;
}

Shao Miller

Jul 29, 2010, 7:14:06 PM
Please forgive the fact that the code mixes the 'size_t' result from
'sizeof' with 'int'. That's not really the meat and potatoes of the
inline code-questions. :)

Shao Miller

Jul 30, 2010, 5:04:02 AM
On Jul 29, 4:23 am, gaze...@shell.xmission.com (Kenny McCormack)
wrote:
> In article <703e2e96-c286-4165-a66b-73e7bac0b...@j8g2000yqd.googlegroups.com>,

> Nick Keighley  <nick_keighley_nos...@hotmail.com> wrote:
>
> >On 29 July, 03:30, Shao Miller <sha0.mil...@gmail.com> wrote:
>
> >> I am not at all interested in arguing or debating about it.
>
> >go away then
>
> CLC at its finest!
>
Heheh, isn't that something.

I'm not interested in argument or debating it, because I agree with
Richard W. M. Jones and Paul H. J. Kelly in their paper[1]
that[2]:

"Casts can properly be used to change the type of the object to which
a pointer refers, but cannot be used to turn a pointer to one object
into a pointer to another. A corollary is that bounds checking is not
type checking: it does not prevent storage from being declared with
one data structure and used with another."

That is, by my understanding, that one must consider the entirety of
the object as the bounds; not some bounds based on type. A contiguous
region of storage valid for storing values.

If you treat all of memory (as per Ian Collins' quip) as a largest
object, then the bounds are the bounds of memory.

If you consider Eric Sosman's observation regarding "disparate
objects," we can imagine an object space with holes between objects.
These holes are rather like the unspecified padding in a 'struct'
object. So you imagine the bounds of objects in all of memory, and
you can recursively treat the bounds of sub-objects within a 'struct'
object similarly.

An array of an array of an array etc. of an object type has no holes.
Thus for the 2D array example originally from another thread
regarding:

int two_d_array[10][10];
two_d_array[0][20] = 5;

When 'two_d_array[0]' "decays" into an 'int *', some perceived bound
of '10' for the latter dimension implies a treatment of bounds based
on type, not based on the object.
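
Whatever one concludes about `two_d_array[0][20]`, there is an uncontroversial way to get the same addressing: declare one flat array and compute the index by hand. A minimal sketch, with `ROWS`, `COLS` and `cell` as names invented for this illustration:

```c
#include <stddef.h>

enum { ROWS = 10, COLS = 10 };

/* Map (row, col) onto a single int[ROWS * COLS] array. */
static int *cell(int *flat, size_t row, size_t col)
{
    return &flat[row * COLS + col];
}
```

With `int flat[ROWS * COLS];`, the expression `*cell(flat, 2, 0)` designates `flat[20]`, and every subscript stays within the one declared bound of 100 elements.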

This is why I offered the original post. The 'union' object likewise
has no holes between its start and the "sub-object" designated by
'foo.bar[1]'. Same thing as an array to me, so no undefined
behaviour, in my opinion.

With a 'struct' object, we might have holes (padding) between member
"sub-objects." It would appear that 'memcpy()' and otherwise treating
a 'struct' object as an array of 'char' allows for even such holes to
be perhaps temporarily "walked across." Quite fortunately, we are
protected by the restriction of character type, along with never
attempting to use such a hole as anything other than an "unspecified
value." Not undefined behaviour, but an unspecified value.

A 'struct' object should not span across memory segments or other
possible trap boundaries, so walking the holes in such an object is
also a little different than walking all of memory as an array of
'char', so the recursive treatment suggested above is not 100% the
same... It's safer on a 'struct' object.

Argument or debate are obviously valuable processes for understanding
a subject matter, but not my interest in this case.

[1] Jones, R. W. M. and Kelly, P. H. J. "Backwards-compatible bounds
checking for arrays and pointers in C programs". in ``Third
International Workshop on Automated Debugging'', M. Kamkar and D.
Byers, eds (Linkoping University Electronic Press), 1997. See also:
PostScript[3] and online proceedings[4]. Jones and Kelly provided
object bounds-checking for GCC in 1995.
[2] See [1], p. 2, section 2, paragraph 3.
[3] http://www.doc.ic.ac.uk/~phjk/Publications/BoundsCheckingForC.ps.gz
[4] http://www.ep.liu.se/ea/cis/1997/009/

Shao Miller

Aug 3, 2010, 5:13:58 AM
The undefined behaviour for the multi-dimensional array business is so
interesting. Which of the "Like this?" lines (if any) would allow for
the "Ok?" line to be well-behaved?

#include <stddef.h>
#include <stdlib.h>

#define UB_ALLOWED 1

int main(void) {
  typedef int arrten[10];
  typedef arrten arrarr[10];
  typedef int rawints[sizeof (arrarr)];
  void *vp;
  size_t s = sizeof (arrarr);
  arrarr *foo;
  int *ip;

  vp = malloc(s);
  if (!vp) return 1;

  foo = vp;
  ip = vp;

  (*foo)[1][0] = 2;     /* Ok */
  ip[10] = 3;           /* Ok */

  if ((*foo)[0] != ip) return 2;

  ip = (*foo)[0];       /* Ok. Make dirty */
  if ((*foo)[0] != ip) return 3;

#if UB_ALLOWED
  ip[10] = 5;           /* UB */
  (*foo)[0][10] = 7;    /* UB */
#endif

  /* How do we wash it off? */
  ip = vp;              /* Like this? */
  ip = (void *)vp;      /* Like this? */
  ip = (void *)foo;     /* Like this? */
  ip = *(rawints *)vp;  /* Like this? */

  ip[10] = 11;          /* Ok? */

  free(vp);
  return 0;
}

Please and thank-you. :)

Ben Bacarisse

Aug 3, 2010, 7:37:49 AM
Shao Miller <sha0....@gmail.com> writes:

> The undefined behaviour for the multi-dimensional array business is so
> interesting. Which of the "Like this?" lines (if any) would allow for
> the "Ok?" line to be well-behaved?

All of them.

> #include <stddef.h>
> #include <stdlib.h>
>
> #define UB_ALLOWED 1
>
> int main(void) {
> typedef int arrten[10];
> typedef arrten arrarr[10];
> typedef int rawints[sizeof (arrarr)];

You probably meant sizeof(arrarr)/sizeof(int) here, though your
expression is perfectly safe.
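
Spelled out as a sketch (the helper `rawints_elements` is hypothetical and only there to make the count checkable), the suggested typedef has as many elements as there are ints in the 2D array, rather than as many as there are bytes:

```c
#include <stddef.h>

typedef int arrten[10];
typedef arrten arrarr[10];

/* Element count, not byte count: 100 ints rather than 100 * sizeof (int). */
typedef int rawints[sizeof (arrarr) / sizeof (int)];

static size_t rawints_elements(void)
{
    return sizeof (rawints) / sizeof (int);
}
```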

> void *vp;
> size_t s = sizeof (arrarr);
> arrarr *foo;
> int *ip;
>
> vp = malloc(s);
> if (!vp) return 1;
>
> foo = vp;
> ip = vp;
>
> (*foo)[1][0] = 2; /* Ok */
> ip[10] = 3; /* Ok */
>
> if ((*foo)[0] != ip) return 2;
>
> ip = (*foo)[0]; /* Ok. Make dirty */
> if ((*foo)[0] != ip) return 3;
>
> #if UB_ALLOWED
> ip[10] = 5; /* UB */
> (*foo)[0][10] = 7; /* UB */
> #endif
>
> /* How do we wash it off? */
> ip = vp; /* Like this? */
> ip = (void *)vp; /* Like this? */

This cast is defined to have no effect at all.

> ip = (void *)foo; /* Like this? */

foo was converted from vp so converting it back to a void * is defined
to yield the original pointer again. The point being that only the
first and last of these are really different. The middle two are, by
definition, the same as the first.

> ip = *(rawints *)vp; /* Like this? */
>
> ip[10] = 11; /* Ok? */
>
> free(vp);
> return 0;
> }

--
Ben.

Shao Miller

Aug 3, 2010, 9:00:56 AM
On Aug 3, 7:37 am, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:

> Shao Miller <sha0.mil...@gmail.com> writes:
> > The undefined behaviour for the multi-dimensional array business is so
> > interesting.  Which of the "Like this?" lines (if any) would allow for
> > the "Ok?" line to be well-behaved?
>
> All of them.
>
> > #include <stddef.h>
> > #include <stdlib.h>
>
> > #define UB_ALLOWED 1
>
> > int main(void) {
> >   typedef int arrten[10];
> >   typedef arrten arrarr[10];
> >   typedef int rawints[sizeof (arrarr)];
>
> You probably meant sizeof(arrarr)/sizeof(int) here, though your
> expression is perfectly safe.
>
Absolutely! Just before falling asleep, I realized I'd forgotten
this. Thanks for catching and pointing it out. :)

>
> >   void *vp;
> >   size_t s = sizeof (arrarr);
> >   arrarr *foo;
> >   int *ip;
>
> >   vp = malloc(s);
> >   if (!vp) return 1;
>
> >   foo = vp;
> >   ip = vp;
>
> >   (*foo)[1][0] = 2;     /* Ok */
> >   ip[10] = 3;           /* Ok */
>
> >   if ((*foo)[0] != ip) return 2;
>
> >   ip = (*foo)[0];       /* Ok. Make dirty */
> >   if ((*foo)[0] != ip) return 3;
>
> > #if UB_ALLOWED
> >   ip[10] = 5;           /* UB */
> >   (*foo)[0][10] = 7;    /* UB */
> > #endif
>
> >   /* How do we wash it off? */
> >   ip = vp;              /* Like this? */
> >   ip = (void *)vp;      /* Like this? */
>
> This cast is defined to have no effect at all.
>

Ok. That's what I thought. :)

> >   ip = (void *)foo;     /* Like this? */
>
> foo was converted from vp so converting it back to a void * is defined
> to yield the original pointer again.  The point being that only the
> first and last of these are really different.  The middle two are, by
> definition, the same as the first.
>

Aha.

> >   ip = *(rawints *)vp;  /* Like this? */
>
> >   ip[10] = 11;          /* Ok? */
>
> >   free(vp);
> >   return 0;
> > }

Thanks, Ben!

Shao Miller

Aug 4, 2010, 4:29:16 PM
Well, I'd really appreciate it if anyone could report any C
implementation for which the following program asks for a report.

This program attempts to demonstrate that the definition of pointer
arithmetic makes the informative note in C99's Annex J tough to
rationalize.

Under J.2:

"An array subscript is out of range, even if an object is apparently
accessible with the given subscript (as in the lvalue expression
a[1][7] given the declaration int a[4][5]) (6.5.6)."
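
For the declaration in that note, the element that `a[1][7]` "apparently" reaches is `a[2][2]`. A minimal sketch of a helper that folds an overlong column index into an in-range (row, col) pair, so that every subscript actually applied stays within its own array; the name `cell_4x5` is made up for this illustration:

```c
#include <stddef.h>

/* Hypothetical helper: fold an overlong column index into (row, col). */
static int *cell_4x5(int (*a)[5], size_t row, size_t col)
{
    row += col / 5;
    col %= 5;
    return row < 4 ? &a[row][col] : NULL;   /* NULL when past the last row */
}
```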

Here is the program. Thanks!:

/**
* bounds.c
*
* Check if a C implementation might encode bounds
* information into its pointer representation.
*
* (C) Shao Miller, 2010. All rights reserved.
* Permission is granted to:
* - Copy the source code
* - Compile the source code into an executable program
* - Execute the resulting program
*
* Please report any interesting cases! Thank you! :)
*/

#include <stddef.h>
#include <stdlib.h>
#include <stdio.h>

static void please_report(void) {
printf("PLEASE REPORT this C implementation's name and version to:\n"
"Usenet: comp.lang.c: \"Bounds Checking as Undefined Behaviour?\"\n"
"Or:\n"
"http://groups.google.com/group/comp.lang.c/browse_thread/thread/"
"c4c847820e1f25f1\n\n");
return;
}

unsigned char *claim1(void) {
/* Claim #1: Initialized to 3 at program startup */
static unsigned char c = 3;
return &c;
}

static unsigned char claim2(unsigned char param) {
unsigned char *cp;
static int fill_set = 0;

cp = claim1();
if (fill_set)
goto claim;
check:
if (param > 2) {
--param;
--*cp;
goto check;
}
--*cp;
--*cp;
fill_set = 1;
claim:
/* Claim #2: If param is 3, then by claim #1, we return 0 */
return *cp;
}

static void claim3(unsigned char *area, size_t count) {
unsigned char fill;
fill = claim2(3);
while (count)
area[--count] = fill;
/* Claim #3: By claim #2, area will be filled with 0 */
return;
}

struct ptr_wrapper {
/* No padding before the first member */
int *ip;
/* Possible unspecified padding */
};

int main(void) {
struct ptr_wrapper s1, s2, final1, final2;
unsigned char s1_copy[sizeof (struct ptr_wrapper)];
unsigned char s2_copy[sizeof (struct ptr_wrapper)];
unsigned char mixer1[sizeof s1_copy];
unsigned char mixer2[sizeof s2_copy];
void *vp;
int (*inta)[10];
size_t sz = sizeof *inta;
unsigned char *copier;
int encoded_bounds;
int *oob_tester;

vp = malloc(sz);
if (!vp) {
printf("Out of memory. Sorry.\n");
return 1;
}
/**
* The allocated space might be an object.
* It might be an array object.
* What are its bounds at this moment?
* What is the type for the object(s)?
*/

s1.ip = vp;
/**
* Is pointer arithmetic with s1.ip well-defined? How many elements?
* Claim #4: We have not modified any values in the allocated space,
* so we haven't established an effective type for it yet.
*/

/* Copy s1 */
sz = sizeof s1;
copier = (unsigned char *)&s1;
while (sz) {
--sz;
s1_copy[sz] = copier[sz];
}

inta = vp;
s2.ip = *inta;
/**
* Is pointer arithmetic with s1.ip well-defined? How many elements?
* Claim #5: We have not modified any values in the allocated space,
* so we haven't established an effective type for it yet.
*/

/* Copy s2 */
sz = sizeof s2;
copier = (unsigned char *)&s2;
while (sz) {
--sz;
s2_copy[sz] = copier[sz];
}

/* Fill mixer1 */
claim3(mixer1, sizeof mixer1);
/* Claim #6: By claim #3, mixer1 is filled with 0 */

/* Fill mixer2 */
claim3(mixer2, sizeof mixer2);
/* Claim #7: By claim #3, mixer2 is filled with 0 */

/* Mix s1_copy into mixer1 */
sz = sizeof s1_copy;
copier = (unsigned char *)s1_copy;
while (sz) {
--sz;
mixer1[sz] |= copier[sz];
}
/* Claim #8: By claim #6 and the ORing above, mixer1 is a copy of s1 */

/* Mix s2_copy into mixer2 */
sz = sizeof s2_copy;
copier = (unsigned char *)s2_copy;
while (sz) {
--sz;
mixer2[sz] |= copier[sz];
}
/* Claim #9: By claim #7 and the ORing above, mixer2 is a copy of s2 */

/**
* Compare the pointer representations copied from s1 and s2.
* Since there is no padding before the pointer member, these are
* the first bytes in the sequence.
*/
sz = sizeof s1.ip;
encoded_bounds = 0;
while (sz) {
--sz;
if (mixer1[sz] != mixer2[sz])
encoded_bounds = 1;
}
if (encoded_bounds) {
printf("This C implementation may encode bounds in its pointer"
" representation!\n");
please_report();
/* Check for comparison equality */
if (s1.ip == s2.ip) {
printf("Furthermore, this C implementation compares\n"
"pointers with different bounds as being equal!\n");
please_report();
}
}

/* Copy mixer1 into final1 */
sz = sizeof mixer1;
copier = (unsigned char *)&final1;
while (sz) {
--sz;
copier[sz] = mixer1[sz];
}

/* Copy mixer2 into final2 */
sz = sizeof mixer2;
copier = (unsigned char *)&final2;
while (sz) {
--sz;
copier[sz] = mixer2[sz];
}

/**
* Have any bounds implied by the original pointers carried across
* into final1 and final2? How can a C implementation have passed
* them along?
*/

printf("Run-time test for this C implementation's bounds-checking.\n"
"If you do not see a message indicating success down below, then\n");
please_report();

/* Do we establish the int[10] effective type here? */
final2.ip[9] = 5;

/* Is the pointer arithmetic implied below well-defined? */
final2.ip[10] = 5;
/* Is the pointer arithmetic implied below well-defined? */
final1.ip[10] = 5;
/* Perhaps somehow, the implementation will have noted OOB? */

/* How about this reasoning? */
oob_tester = final2.ip;
oob_tester++;
/* If the first element is an int[1], now we point one-past. Do it again. */
oob_tester++;
/* But, perhaps there was an int[1] there, too. We point one-past it now. */
oob_tester++;
/* And so on */
oob_tester++;
oob_tester++;
oob_tester++;
oob_tester++;
oob_tester++;
oob_tester++; /* We are pointing at a ninth int element */
oob_tester++; /* We are pointing one-past a ninth int element */
*oob_tester = 5; /* Out-of-bounds? */
/* Put differently, */
oob_tester = final2.ip;
*((((((((((oob_tester + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) + 1) +
1) = 5;
/* Versus */
*(oob_tester + 10) = 5;

printf("Bounds-checking test succeeded.\n\n");

return 0;
}

Shao Miller

Aug 4, 2010, 4:46:01 PM
On Aug 4, 4:29 pm, Shao Miller <sha0.mil...@gmail.com> wrote:
> Well I'd really appreciate if anyone could report C implementations
> which the following program requests them to report.
> ... ... ...
Two corrections.

Under 'main':

size_t sz = sizeof *inta;

should have been:

size_t sz = sizeof *inta + sizeof (int);

And here:

/**
* Is pointer arithmetic with s1.ip well-defined? How many elements?
* Claim #5: We have not modified any values in the allocated space,
* so we haven't established an effective type for it yet.
*/

should have been:

/**
* Is pointer arithmetic with s2.ip well-defined? How many elements?
* Claim #5: We have not modified any values in the allocated space,
* so we haven't established an effective type for it yet.
*/

Sorry about that.

Shao Miller

Aug 6, 2010, 5:03:48 PM
No hits yet. Oh well... :(

If you'd care to, please join in and imagine some theoretical C
implementation which _purposefully_ diagnoses undefined behaviour at
translation-time and during execution at every chance it gets, and
exposes any assumptions you might have. Furthermore, imagine that it
performs the strictest of bounds-checking.

Let us refine the definition of "object" (3.14,p1) as having implicit
type 'char[N]', where 'N' is the size of the object, in 'char's.

Thus in:

int i;

'i' is an identifier and an lvalue designating some object whose size
is 'sizeof (int)'. The object thus has an implicit type of
'char[sizeof (int)]'. The bounds for the object are absolute.

And in:

void *vp;
vp = malloc(82);

If 'malloc' returns a valid pointer, it points to an object whose size
is 82 'char's. The object thus has an implicit type of 'char[82]'.
These bounds are absolute.

Now let us _establish_ the definition of "array object" (as used by
6.5.6,p8, for example) as any object whose size and alignment meet the
requirements of an array type with a known number of elements, known
either at translation-time or during execution.

Please let us consider a "fat pointer" whose representation might be
something like:

struct fat_ptr {
ptrdiff_t position;
_Byte_addr first;
_Byte_addr last;
size_t element_sz;
};

Let's pretend that '_Byte_addr' is some implementation-specific
address representation. Let's then pretend that all pointers use this
"fat" representation. So:

int *ip = &i;

should fill 'ip' with 'position' 0, the 'first' byte address, the
'last' byte address (which is 'sizeof i - 1' away), and 'sizeof i' as
'element_sz'.
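
To make the thought experiment concrete, here is a rough sketch of such a fat pointer in ordinary C. It is only an illustration under assumptions: `unsigned char *` stands in for the imaginary '_Byte_addr', and the helpers `fat_make`, `fat_add` and `fat_can_deref` are invented names, not anything a real implementation is known to provide.

```c
#include <stddef.h>

/* unsigned char * stands in for the post's hypothetical _Byte_addr. */
struct fat_ptr {
    ptrdiff_t position;       /* element index relative to first */
    unsigned char *first;     /* first byte of the region */
    unsigned char *last;      /* last byte of the region */
    size_t element_sz;        /* size of one element in bytes */
};

static struct fat_ptr fat_make(void *obj, size_t obj_sz, size_t element_sz)
{
    struct fat_ptr fp;
    fp.position = 0;
    fp.first = obj;
    fp.last = fp.first + obj_sz - 1;
    fp.element_sz = element_sz;
    return fp;
}

/* Pointer arithmetic only moves position; first and last stay put. */
static struct fat_ptr fat_add(struct fat_ptr fp, ptrdiff_t n)
{
    fp.position += n;
    return fp;
}

/* Nonzero when position designates an element that may be dereferenced. */
static int fat_can_deref(struct fat_ptr fp)
{
    size_t count = (size_t)(fp.last - fp.first + 1) / fp.element_sz;
    return fp.position >= 0 && (size_t)fp.position < count;
}
```

For `int i;`, `fat_make(&i, sizeof i, sizeof i)` yields a pointer that may be dereferenced at position 0 and becomes a one-past pointer after `fat_add(..., 1)`.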

But now suppose we cast:

ip = *((int(*)[5])ip);

This is perfectly well-defined. We might expect it to yield a
different "fat" pointer, where 'position' is (reset) to 0, 'first'
remains the same, 'element_sz' remains the same, but 'last' is changed
at our insistence that there are 5 'int's. :(

If we do:

ip += 2;

Our instincts might suggest undefined behaviour due to overflow, but
how so? If the bounds are encoded in the 'fat_ptr' alone, then it is
insufficient for our imaginary implementation to tell us about it.

Well we could take a "bounds-reduction" approach and suggest that a
cast checks that it only _reduces_ bounds or leaves them equal. But
this is not defined by C99. Also, C99 states (6.3.2.3,p7) that the
pointer can be converted back again and compare as equal to the
original. If comparison does not compare the 'last' member, that
would be fine.

One could suppose that the cast given above actually implies some
bounds which are determinable at translation-time, but that would mean
that 'memcpy'ing a pointer (or equivalent, as the last post's 'mixer'
logic entails) could easily discard bounds because the bounds are tied
to the original pointer, as far as the translation can determine.

One could add a couple more members to the "fat" pointer structure:

struct fat_ptr {
    ptrdiff_t position;
    _Byte_addr first;
    _Byte_addr last;
    size_t element_sz;
    _Byte_addr absolute_first;
    _Byte_addr absolute_last;
};

Here we track the bounds of the _substrate_ "object" as well as the
particular sub-object we are pointing into. We have two means for
pointing out-of-bounds, but casts would be well-established as being
verifiable at even run-time that their bounds end at the narrowest
region of 'first' and 'absolute_first' with 'last' and
'absolute_last'.
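
A rough sketch of how a cast might be checked against the substrate follows. Everything in it is an assumption made for illustration: 'byte_addr' stands in for '_Byte_addr', the structure is renamed 'fat_ptr2' to keep it apart from the earlier one, and 'fat_rebound' is a hypothetical name for whatever the imaginary implementation would do at a cast such as '(int(*)[5])ip'.

```c
#include <stddef.h>

typedef unsigned char *byte_addr;   /* assumption: stands in for '_Byte_addr' */

struct fat_ptr2 {
    ptrdiff_t position;
    byte_addr first;            /* current sub-object, first byte */
    byte_addr last;             /* current sub-object, last byte */
    size_t element_sz;
    byte_addr absolute_first;   /* substrate object, first byte */
    byte_addr absolute_last;    /* substrate object, last byte */
};

/* Hypothetical: re-bound 'fp' as if it were cast to point at an array
 * of 'count' elements of 'element_sz' bytes, starting at the current
 * position. The requested range is clamped to the substrate, so a
 * cast can never widen the bounds past 'absolute_last'.
 * Returns 1 if the request fit entirely, 0 if it had to be clamped.
 * Assumes 'count > 0' and that the current position lies within the
 * substrate. */
static int fat_rebound(struct fat_ptr2 *fp, size_t element_sz, size_t count)
{
    byte_addr start = fp->first + fp->position * (ptrdiff_t)fp->element_sz;
    size_t available = (size_t)(fp->absolute_last - start) + 1;
    size_t requested = element_sz * count;
    int fits = requested <= available;

    fp->first = start;
    fp->last = start + (fits ? requested : available) - 1;
    fp->position = 0;
    fp->element_sz = element_sz;
    return fits;
}
```

Under this scheme the earlier request for five 'int's on top of a single 'int i;' would be reported (return value 0) and the bounds would stay those of 'i', while a narrowing request such as 'sizeof (int)' elements of one byte each would be accepted.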

Thanks for reading. :)

Shao Miller

Aug 14, 2010, 4:24:32 PM
Shao Miller wrote:
> No hits yet. Oh well... :(
> ... ... ...
> Thanks for reading. :)

Please suppose I have:

static int arr[10];
/* 'arr' below is not the operand to 'sizeof' or '&' */
int *ip1 = arr;

Suppose in the second statement that 'arr' turns into an 'int *' when
evaluated[1]. Suppose the value includes bounds info. Then if we have:

int *ip2 = ip1;

There's no implicit nor explicit conversion[2], there's no change of
value[3], and the bounds info could persist, right? Then if we have:

char *cp = (char *)ip1;

What bounds, if any, could persist into the pointer value assigned to
'cp'? Could the bounds be 'sizeof *ip' elements[4] or could they be
'sizeof arr' elements[4] or are they undefined or unspecified, or
well-defined? Then if we have:

ip2 = (int *)cp;

'ip2' should compare equal to 'ip1'[4]. Does that mean via operators
such as '<', '==', etc. or does it mean via 'memcmp', for example[5]?
The conversion[4] details "the result", but neither of "the value" or
"the object representation". Is there a difference?

As a separate example:

union {
/* 0.1.2.3.4.5 */
/* X.X.X|X.X.X */
int foo[2][3];
/* X.X|X.X|X.X */
int bar[3][2];
} baz;
/* Ok?[4] What bounds might 'cp' be subject to? */
char *cp = (char *)&baz + 2 * sizeof (int);
/* Ok?[4] What bounds might 'ip' be subject to? */
int *ip = (int *)cp;

References from the "C99" C Standard draft with filename 'n1256.pdf':
[1] 6.3.2.1p3
[2] 6.3p1
[3] 6.5.4p4
[4] 6.3.2.3p7
[5] 6.2.6.1p8

Ben Bacarisse

Aug 14, 2010, 6:40:57 PM
Shao Miller <sha0....@gmail.com> writes:

> Shao Miller wrote:
<snip>


> Please suppose I have:
>
> static int arr[10];
> /* 'arr' below is not the operand to 'sizeof' or '&' */
> int *ip1 = arr;
>
> Suppose in the second statement that 'arr' turns into an 'int *' when
> evaluated[1]. Suppose the value includes bounds info. Then if we
> have:
>
> int *ip2 = ip1;
>
> There's no implicit nor explicit conversion[2], there's no change of
> value[3], and the bounds info could persist, right? Then if we have:
>
> char *cp = (char *)ip1;
>
> What bounds, if any, could persist into the pointer value assigned to
> 'cp'?

The most reasonable would be from cp to cp + sizeof arr inclusive. *
can be applied to all but the upper bound.
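
As an illustration of that range (a sketch only; 'copy_representation' is a made-up name), a byte-wise copy has to be able to walk every byte of the source object and form, but never dereference, the one-past-the-end pointer:

```c
#include <stddef.h>

/* Copy the object representation of 'src' into 'dst', one byte at a
 * time. 'end' is one past the last byte: it may be formed and
 * compared against, but '*end' is never evaluated. */
static void copy_representation(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const unsigned char *end = s + n;

    while (s < end)
        *d++ = *s++;
}
```

Calling 'copy_representation(other, arr, sizeof arr)' on two 'int[10]' arrays is the kind of use that needs the full 'sizeof arr' range.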

> Could the bounds be 'sizeof *ip' elements[4]

If they were this small and they were enforced, then the implementation
could not be conforming.

> or could they be
> 'sizeof arr' elements[4] or are they undefined or unspecified, or
> well-defined?

Since such bounds are outside of any standard, you get to say what is
defined and undefined. If you go on to say what the effect of violating
a bound is, you might get something that interferes with the C standard.
One way to avoid that is to have no bounds at all. Presumably you aim
to have the tightest possible bounds such that, say, a trap on stepping
outside of them does not contravene the C standard.

> Then if we have:
>
> ip2 = (int *)cp;
>
> 'ip2' should compare equal to 'ip1'[4]. Does that mean via operators
> such as '<', '==', etc.

It means == and only ==. The other operators like < and > happen to
work (in that they will return 0) but only because ip2 == ip1. Had one
or other been moved to point to some other array, then ip1 < ip2 would
not be defined.

> or does it mean via 'memcmp', for example[5]?

Pointers can have junk in the representation. Implementations where
ip1 == ip2 does not imply that memcmp(&ip1, &ip2, sizeof ip1) == 0 are
not uncommon.
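
To spell out the two kinds of comparison (a sketch; both function names are invented for the example):

```c
#include <string.h>

/* Value comparison with '=='. */
static int same_value(const int *a, const int *b)
{
    return a == b;
}

/* Representation comparison: looks at the bytes of the two pointer
 * objects. It may report a difference even when 'a == b' if the
 * implementation has more than one representation for the same
 * pointer value. */
static int same_representation(const int *a, const int *b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}
```

For 'ip1' and an 'ip2' obtained via '(int *)(char *)ip1', 'same_value' must report equality; 'same_representation' usually will too, but portable code should not rely on that.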

> The conversion[4] details "the result", but neither of "the value" or
> "the object representation". Is there a difference?

Yes. See above.

> As a separate example:
>
> union {
> /* 0.1.2.3.4.5 */
> /* X.X.X|X.X.X */
> int foo[2][3];
> /* X.X|X.X|X.X */
> int bar[3][2];
> } baz;
> /* Ok?[4] What bounds might 'cp' be subject to? */
> char *cp = (char *)&baz + 2 * sizeof (int);

cp must be permitted to range over the whole of the object baz.
I.e. cp[-2 * (int)sizeof(int)] and cp[4 * sizeof(int) - 1] must be
permitted and cp + 4 * sizeof(int) can be constructed but not
dereferenced.
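
A small check of those index limits, as a sketch. It assumes the union has no trailing padding, i.e. 'sizeof baz == 6 * sizeof (int)', and uses the invented names 'union baz_u' and 'check_baz_bounds'. Only addresses are computed; nothing is read through them.

```c
#include <stddef.h>

union baz_u {
    int foo[2][3];
    int bar[3][2];
};

/* Returns 1 if the lowest and highest permitted indexes from 'cp'
 * land on the first and last bytes of the union, and if the
 * one-past-the-end pointer sits immediately after the last byte. */
static int check_baz_bounds(union baz_u *baz)
{
    char *base = (char *)baz;
    char *cp = base + 2 * sizeof (int);
    char *lowest = &cp[-2 * (int)sizeof (int)];
    char *highest = &cp[4 * sizeof (int) - 1];
    char *one_past = cp + 4 * sizeof (int);

    return lowest == base
        && highest == base + sizeof *baz - 1
        && one_past == base + sizeof *baz;
}
```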

> /* Ok?[4] What bounds might 'ip' be subject to? */
> int *ip = (int *)cp;

The most logical would be, again, the whole of the baz object.
I.e. ip[-2] and ip[3] are permitted while ip + 4 can be constructed but
not dereferenced.

I've probably made some error in the actual bounds, but I hope the ideas
are clear enough.

These questions seem peculiar. Surely the bounds one might construct
for, say, (char *)&baz.foo[1][0] are more interesting?

<snip references>
--
Ben.

Shao Miller

Aug 15, 2010, 2:55:03 PM
Ben Bacarisse wrote:
> Shao Miller <sha0....@gmail.com> writes:
>
>> Shao Miller wrote:
> <snip>
>> Please suppose I have:
>>
>> static int arr[10];
>> /* 'arr' below is not the operand to 'sizeof' or '&' */
>> int *ip1 = arr;
>>
>> Suppose in the second statement that 'arr' turns into an 'int *' when
>> evaluated[1]. Suppose the value includes bounds info. Then if we
>> have:
>>
>> int *ip2 = ip1;
>>
>> There's no implicit nor explicit conversion[2], there's no change of
>> value[3], and the bounds info could persist, right? Then if we have:
>>
>> char *cp = (char *)ip1;
>>
>> What bounds, if any, could persist into the pointer value assigned to
>> 'cp'?
>
> The most reasonable would be from cp to cp + sizeof arr inclusive. *
> can be applied to all but the upper bound.
Ok. That seems sensible. The compiler knows from the declaration how
many contiguous bytes there are.

>
>> Could the bounds be 'sizeof *ip' elements[4]
>
> If there were this small and they were enforced, then the implementation
> could not be conforming.

Well, that's one of the questions. Would it be non-conforming because
there would be no way to copy the object representation of the whole of
'arr'?

>
>> or could they be
>> 'sizeof arr' elements[4] or are they undefined or unspecified, or
>> well-defined?
>
> Since such bounds are outside of any standard, so you get to say what is
> defined and undefined. If you go on to say what the effect of violating
> a bound is, you might get something that interferes with the C standard.
> One way to avoid that is to have no bounds at all. Presumably you aim
> to have the tightest possible bounds such that, say, a trap on stepping
> outside of them does not contravene the C standard.

Ok.

int tdarr[10][10];
/* 'tdarr[0]' is not operand to 'sizeof' or '&' */
int *ip = tdarr[0];
/* Bounds for 'cp' might be those of 'int[10]'? */
char *cp = (char *)ip;
/* Undefined behaviour? */
cp += 11 * sizeof (int);

>
>> Then if we have:
>>
>> ip2 = (int *)cp;
>>
>> 'ip2' should compare equal to 'ip1'[4]. Does that mean via operators
>> such as '<', '==', etc.
>
> It means == and only ==. The other operators like < and > happen to
> work (in that they will return 0) but only because ip2 == ip1. Had one
> or other been moved to point to some other array, then ip2 < ip2 would
> not be defined.

Ah yes. As in:

int tdarr[10][10];
int *ip1 = tdarr[0];
int *ip2 = tdarr[2];
/* Undefined behaviour? */
(void)(ip1 == ip2);

>
>> or does it mean via 'memcmp', for example[5]?
>
> Pointers can have junk in the representation. Implementations where
> ip1 == ip2 does not imply that memcmp(&ip1, &ip2, sizeof ip1) == 0 are
> not uncommon.

Ok. Since pointers are so opaque, I suppose they might even have random
bits, as long as it wouldn't affect conformance. "Result" must mean
"value" in this instance, then.

>
>> The conversion[4] details "the result", but neither of "the value" or
>> "the object representation". Is there a difference?
>
> Yes. See above.
>
>> As a separate example:
>>
>> union {
>> /* 0.1.2.3.4.5 */
>> /* X.X.X|X.X.X */
>> int foo[2][3];
>> /* X.X|X.X|X.X */
>> int bar[3][2];
>> } baz;
>> /* Ok?[4] What bounds might 'cp' be subject to? */
>> char *cp = (char *)&baz + 2 * sizeof (int);
>
> cp must be permitted to range over the whole of the object baz.
> I.e. cp[-2 * (int)sizeof(int)] and cp[4 * sizeof(int) - 1] must be
> permitted and cp + 4 * sizeof(int) can be constructed but not
> dereferenced.

Ok, sure.

>
>> /* Ok?[4] What bounds might 'ip' be subject to? */
>> int *ip = (int *)cp;
>
> The most logical would be, again, the whole of the baz object.
> I.e. ip[-2] and ip[3] are permitted while ip + 4 can be constructed but
> not dereferenced.

So 'ip' then is, for bounds similarity's sake, pointing into an
'int[6]'. We effectively have a one-dimensional array occupying the
same storage as the 'union'... Interesting.

>
> I've probably made some error in the actual bounds, but I hope the ideas
> are clear enough.
>
> These questions seem peculiar. Surely the bounds one might construct
> for, say, (char *)&baz.foo[1][0] are more interesting?

(char *)&baz.foo[1][0]
(char *)&(*((baz.foo) + (1)))[0]
(char *)&(*(((*((baz.foo) + (1)))) + (0)))
(char *)(((*((baz.foo) + (1)))) + (0))
(char *)(baz.foo[1] + 0)

Hmm... Yes, I see your point. The bounds for the resulting pointer
there are similar to the combined 'tdarr' and 'cp' sample above.
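
As a sanity check on that expansion, here is a sketch that evaluates all five forms and confirms they yield the same address. 'expansions_agree' and 'offset_of_foo_1_0' are names made up for the example, and 'baz' is declared at file scope purely for convenience.

```c
#include <stddef.h>

static union {
    int foo[2][3];
    int bar[3][2];
} baz;

/* Each line is one step of the expansion written out above. */
static int expansions_agree(void)
{
    char *e1 = (char *)&baz.foo[1][0];
    char *e2 = (char *)&(*((baz.foo) + (1)))[0];
    char *e3 = (char *)&(*(((*((baz.foo) + (1)))) + (0)));
    char *e4 = (char *)(((*((baz.foo) + (1)))) + (0));
    char *e5 = (char *)(baz.foo[1] + 0);

    return e1 == e2 && e2 == e3 && e3 == e4 && e4 == e5;
}

/* Byte offset of 'baz.foo[1][0]' from the start of 'baz'. */
static size_t offset_of_foo_1_0(void)
{
    return (size_t)((char *)&baz.foo[1][0] - (char *)&baz);
}
```

The offset comes out as three 'int's, which is where the second row of 'foo' begins; what bounds travel with the resulting pointer is, of course, the open question.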

Thanks, Mr. B. Bacarisse.

Ben Bacarisse

Aug 15, 2010, 6:53:35 PM
Shao Miller <sha0....@gmail.com> writes:

> Ben Bacarisse wrote:
>> Shao Miller <sha0....@gmail.com> writes:
>>
>>> Shao Miller wrote:
>> <snip>
>>> Please suppose I have:
>>>
>>> static int arr[10];
>>> /* 'arr' below is not the operand to 'sizeof' or '&' */
>>> int *ip1 = arr;

<snip>
>>> int *ip2 = ip1;
<snip>


>>> char *cp = (char *)ip1;
>>>

<snip>


>>> Could the bounds be 'sizeof *ip' elements[4]
>>
>> If there were this small and they were enforced, then the implementation
>> could not be conforming.
> Well, that's one of the questions. Would it be non-conforming because
> there would be no way to copy the object representation of the whole
> of 'arr'?

You can have absolutely any bounds you like. You get to define what
bounds are and what they mean. You haven't so no one else can comment.
If you mean hard-checked bounds that, say, stop the program when they
are broken then, having the bounds you suggest would make your system
non-conforming. I.e. it would not really be C anymore.

<snip>


> int tdarr[10][10];
> /* 'tdarr[0]' is not operand to 'sizeof' or '&'
> int *ip = tdarr[0];
> /* Bounds for 'cp' might be those of 'int[10]'? */
> char *cp = (char *)ip;
> /* Undefined behaviour? */
> cp += 11 * sizeof (int);

Assuming you mean hard-checked bounds, I think a case can be made for
either 10 ints or 100 ints. I'd plump for 10 of them, but I won't argue
the point. Basically because I don't care. Sorry, but that's just how
it is.

>>> Then if we have:
>>>
>>> ip2 = (int *)cp;
>>>
>>> 'ip2' should compare equal to 'ip1'[4]. Does that mean via operators
>>> such as '<', '==', etc.
>>
>> It means == and only ==. The other operators like < and > happen to
>> work (in that they will return 0) but only because ip2 == ip1. Had one
>> or other been moved to point to some other array, then ip2 < ip2 would
>> not be defined.
> Ah yes. As in:
>
> int tdarr[10][10];
> int *ip1 = tdarr[0];
> int *ip2 = tdarr[2];
> /* Undefined behaviour? */
> (void)(ip1 == ip2);

No. I don't see how what I said can lead to that conclusion. Both ip1
< ip2 and ip1 > ip2 are undefined, but ip1 == ip2 (and !=, of course) is
fine.

<snip>


>>> As a separate example:
>>>
>>> union {
>>> /* 0.1.2.3.4.5 */
>>> /* X.X.X|X.X.X */
>>> int foo[2][3];
>>> /* X.X|X.X|X.X */
>>> int bar[3][2];
>>> } baz;
>>> /* Ok?[4] What bounds might 'cp' be subject to? */
>>> char *cp = (char *)&baz + 2 * sizeof (int);
>>
>> cp must be permitted to range over the whole of the object baz.
>> I.e. cp[-2 * (int)sizeof(int)] and cp[4 * sizeof(int) - 1] must be
>> permitted and cp + 4 * sizeof(int) can be constructed but not
>> dereferenced.
> Ok, sure.
>
>>
>>> /* Ok?[4] What bounds might 'ip' be subject to? */
>>> int *ip = (int *)cp;
>>
>> The most logical would be, again, the whole of the baz object.
>> I.e. ip[-2] and ip[3] are permitted while ip + 4 can be constructed but
>> not dereferenced.
> So 'ip' then is, for bounds similarity's sake, pointing into an
> 'int[6]'. We effectively have a one-dimensional array occupying the
> same storage as the 'union'... Interesting.

As I keep saying, you get to choose. I think you are conflating your
proposed bounds (presumably you are designing a bound-checked C
implementation) with what the standard might say. The standard does not say
much about pointers converted like this (other than they must convert
back to the original). char pointers have a special dispensation, but
I think your new 'ip' could be defined to be useless if you were of a
mind to define it as such (again, other than converting back).

There is the practical question of what bounds you can reasonably
maintain. There may well be times when the most reasonable bounds you
can pull from a piece of code suggest that accesses might be permitted
that the standard does not define. That's just a limit on the bounds
checking design -- it does not mean that the standard should permit such
accesses.

<snip>
> Thanks, Mr. B. Bacarisse.

That joke (if it is a joke) is wearing thin, now. If it is not a joke,
then you have needlessly extrapolated one person's reply to encompass
all other posters here.

Do you have anyone around you trust to understand the tone of what you
and others write? If so, ask them to look at a few posts and to tell you
what they think. See if they can explain why people sometimes take your
posts in a way that seems to surprise you.

--
Ben.

Shao Miller

Aug 15, 2010, 9:11:18 PM
Yes I was wondering what a conforming implementation is within its
rights to determine about hard-checked bounds whose overflow is caught
and reported. Could an implementation argue that it is conforming by
treating the particular sub-object as an 'int[1]' which constrains what
'cp' can point to? As in, what rights regarding "array object" is an
implementation free to decide about?

>
> <snip>
>> int tdarr[10][10];
>> /* 'tdarr[0]' is not operand to 'sizeof' or '&'
>> int *ip = tdarr[0];
>> /* Bounds for 'cp' might be those of 'int[10]'? */
>> char *cp = (char *)ip;
>> /* Undefined behaviour? */
>> cp += 11 * sizeof (int);
>
> Assuming you mean hard-checked bounds, I think a case can be made for
> either 10 ints or 100 ints. I'd plump for 10 of them, but I won't argue
> the point. Basically because I don't care. Sorry, but that's just how
> it is.

Your preference is certainly consistent with one of the items of the
informative section J.2. Thus in:

int tdarr[10][1];


int *ip = tdarr[0];

char *cp = (char *)ip;

perhaps we could similarly reason that the declaration could be
interpreted as just cause for an implementation to constrain 'cp' to
pointing to the bytes within a single 'int'. Very well! No apologies
needed; your feedback helps!

>
>>>> Then if we have:
>>>>
>>>> ip2 = (int *)cp;
>>>>
>>>> 'ip2' should compare equal to 'ip1'[4]. Does that mean via operators
>>>> such as '<', '==', etc.
>>> It means == and only ==. The other operators like < and > happen to
>>> work (in that they will return 0) but only because ip2 == ip1. Had one
>>> or other been moved to point to some other array, then ip2 < ip2 would
>>> not be defined.
>> Ah yes. As in:
>>
>> int tdarr[10][10];
>> int *ip1 = tdarr[0];
>> int *ip2 = tdarr[2];
>> /* Undefined behaviour? */
>> (void)(ip1 == ip2);
>
> No. I don't see how what I said can lead to that conclusion. Both ip1
> < ip2 and ip1 > ip2 are undefined, but ip1 == ip2 (and !=, of course) is
> fine.
>

My mistake! I typed the wrong operator. I should have typed:

(void)(ip1 < ip2);

which you have already just stated would indeed be undefined. Thanks.

Aha. Agreed for the right to define as useless. Agreed for the
requirement to convert back to the value of the original. Great.

>
> There is the practical question of what bounds you can reasonably
> maintain. There may well be times when the most reasonable bounds you
> can pull from piece of code suggest that accesses might be permitted
> that the standard does not define. That's just a limit on the bounds
> checking design -- it does not mean that the standard should permit such
> accesses.

Yeahbut I'm trying to understand what access the Standard _does_ define.
What bounds it mandates versus what the implementation gets to choose
as extension. As in, "how do we determine the number of elements in an
array object for purposes of pointer arithmetic"?

>
> <snip>
>> Thanks, Mr. B. Bacarisse.
>
> That joke (it if is a joke) is wearing thin, now. If it is not a joke,
> then you have needlessly extrapolated one person's reply to encompass
> all other posters here.

It's not a joke. If you do not require its use, that's easily accepted
and consider it done. The idea was: Better polite than potentially
offensive.

>
> Do you have anyone around you trust to understand the tone of what you
> and others write? If so, ask them to look are few posts and to tell you
> what they think. See of they can explain why people sometimes take your
> posts in a way that seems to surprise you.
>

Again, one cannot please all of the people all of the time. One can
only try and succeed or try and fail. People often find and get what
they want ("Rorschach inkblot tests"). Though this personal
communications feedback is appreciated, there's no general solution, so
I'd rather keep discussion to the C. Your advice is certainly a good
check; the results best as private.

Shao Miller

Jun 8, 2011, 10:52:41 AM
Good day, folks!

With thanks to Wojtek Lerch for pointing out DR #260, and to Clive Feather for reporting his concerns:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm

I am wondering about whether or not the behaviour is defined for the noted line (or in general) in the code below:

/* Shao Miller, 2011 */
void * ptr_equ(const void * p, const void * q) {
    typedef unsigned char * bp;
    void * result;
    bp pi, qi, ri, re;

    pi = (bp)&p; qi = (bp)&q; ri = (bp)&result;
    re = ri + sizeof result;
    while (ri < re)
        /* Defect Report #260: "Provenance" */
        *ri++ = *pi++ & *qi++;
    pi = (bp)&p; qi = (bp)&q; ri = (bp)&result;
    while (ri < re)
        if (*ri != *pi++ || *ri++ != *qi++)
            return (void *)0;
    return result;
}

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int iaa[2][2] = {{1,2},{3,4}};
    int * ip;

    printf(
        "iaa: {{%d,%d},{%d,%d}}\n",
        iaa[0][0],
        iaa[0][1],
        iaa[1][0],
        iaa[1][1]
      );
    /* &iaa[0][2] == &iaa[1][0] ? */
    ip = ptr_equ(iaa[0] + 2, iaa[1]);
    if (!ip) {
        puts("Pointers have different object representation!");
        return EXIT_FAILURE;
      }
    puts("Pointers have the same object representation");
    /* Does the next line have undefined behaviour? */
    *ip = 5;
    printf(
        "iaa: {{%d,%d},{%d,%d}}\n",
        iaa[0][0],
        iaa[0][1],
        iaa[1][0],
        iaa[1][1]
      );
    return EXIT_SUCCESS;
}

Tim Rentsch

Jun 8, 2011, 3:37:01 PM
Shao Miller <sha0....@gmail.com> writes:

> With thanks to Wojtek Lerch for pointing out DR #260, and to Clive Feather for reporting his concerns:
>
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm
>
> I am wondering about whether or not the behaviour is defined

> [..snip..]

comp.std.c is a better venue for this question.

io_x

Jun 9, 2011, 2:48:44 AM

"Shao Miller" <sha0....@gmail.com> ha scritto nel messaggio
news:27fa82a4-d89c-4b75...@glegroupsg2000goo.googlegroups.com...
it is unreadable, for doing something that, i think, seems easy

Shao Miller

Jun 9, 2011, 9:42:09 AM
On 6/9/2011 02:48, io_x wrote:
> "Shao Miller"<sha0....@gmail.com> ha scritto nel messaggio
> news:27fa82a4-d89c-4b75...@glegroupsg2000goo.googlegroups.com...
>> Good day, folks!
>>
>> With thanks to Wojtek Lerch for pointing out DR #260, and to Clive Feather for
>> reporting his concerns:
>>
>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm
>>
>> I am wondering about whether or not the behaviour is defined for the noted
>> line (or in general) in the code below:
>>
>> [...code...]

>>
> it is unreadable, for i think doing something it seems easy
>

Very well. I've included a version below which is hopefully more
readable for you. I would expect that for many implementations, the
entire 'obj_ptr_equ' function could be optimized to:

(!memcmp(&ptr_one, &ptr_two, sizeof ptr_one) ? ptr_one : 0)

and for some other implementations, it could be optimized to:

((ptr_one == ptr_two) ? ptr_one : 0)

But anyway, here's the code:

/* Shao Miller, 2011 */

void * obj_ptr_equ(const void * ptr_one, const void * ptr_two) {
    typedef unsigned char * byte_ptr_t;
    void * result;
    /* Points into the object representation of 'ptr_one' */
    byte_ptr_t ptr_into_ptr_one;
    /* Points into the object representation of 'ptr_two' */
    byte_ptr_t ptr_into_ptr_two;
    /* Points into the object representation of 'result' */
    byte_ptr_t ptr_into_result;
    /* Points one past the last byte of 'result' */
    byte_ptr_t ptr_to_one_past_result;

    /*
     * Point to the first byte of:
     * - 'ptr_one'
     * - 'ptr_two'
     * - 'result'
     */
    ptr_into_ptr_one = (byte_ptr_t)&ptr_one;
    ptr_into_ptr_two = (byte_ptr_t)&ptr_two;
    ptr_into_result = (byte_ptr_t)&result;

    /* Note the address of one past the last byte of 'result' */
    ptr_to_one_past_result = ptr_into_result + sizeof result;

    /*
     * For each byte of 'result', make it the bitwise AND
     * of the corresponding bytes of 'ptr_one' and 'ptr_two'
     */
    while (ptr_into_result < ptr_to_one_past_result) {
        /* Defect Report #260: "Provenance" */
        *ptr_into_result = *ptr_into_ptr_one & *ptr_into_ptr_two;
        ptr_into_result++;
        ptr_into_ptr_one++;
        ptr_into_ptr_two++;
      }

    /*
     * Point to the first byte of:
     * - 'ptr_one'
     * - 'ptr_two'
     * - 'result'
     */
    ptr_into_ptr_one = (byte_ptr_t)&ptr_one;
    ptr_into_ptr_two = (byte_ptr_t)&ptr_two;
    ptr_into_result = (byte_ptr_t)&result;

    /*
     * For each byte of 'result', compare it with the bytes
     * of 'ptr_one' and 'ptr_two'. If we find a mis-match,
     * return a null pointer value to the caller
     */
    while (ptr_into_result < ptr_to_one_past_result) {
        if (*ptr_into_result != *ptr_into_ptr_one)
            return (void *)0;
        if (*ptr_into_result != *ptr_into_ptr_two)
            return (void *)0;
        ptr_into_result++;
        ptr_into_ptr_one++;
        ptr_into_ptr_two++;
      }

    /*
     * If we get here, the object representations for:
     * - 'ptr_one'
     * - 'ptr_two'
     * - 'result'
     * are the same. Return the bitwise AND result
     */
    return result;
}

#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int array_of_array_of_int[2][2] = {{1,2},{3,4}};
    int * ip;

    printf(
        "array_of_array_of_int: {{%d,%d},{%d,%d}}\n",
        array_of_array_of_int[0][0],
        array_of_array_of_int[0][1],
        array_of_array_of_int[1][0],
        array_of_array_of_int[1][1]
      );
    /* &array_of_array_of_int[0][2] == &array_of_array_of_int[1][0] ? */
    ip = obj_ptr_equ(
        &array_of_array_of_int[0][2],
        &array_of_array_of_int[1][0]
      );
    if (!ip) {
        puts("Pointers have different object representation!");
        return EXIT_FAILURE;
      }
    puts("Pointers have the same object representation");
    /* Does the next line have undefined behaviour? */
    *ip = 5;
    printf(
        "array_of_array_of_int: {{%d,%d},{%d,%d}}\n",
        array_of_array_of_int[0][0],
        array_of_array_of_int[0][1],
        array_of_array_of_int[1][0],
        array_of_array_of_int[1][1]
      );
    return EXIT_SUCCESS;
}
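
For comparison, the 'memcmp' form mentioned before the listing might be written out as below. This is only a sketch: 'obj_ptr_equ_memcmp' is a name invented here, and there is one difference worth noting. It hands back 'ptr_one' itself, so whatever provenance DR #260 attaches to 'ptr_one' presumably carries over, whereas the byte-wise AND version builds its result from the bytes of both arguments.

```c
#include <string.h>

/* Hypothetical 'memcmp'-based variant of 'obj_ptr_equ': returns
 * 'ptr_one' if both arguments have the same object representation,
 * and a null pointer otherwise. */
static void *obj_ptr_equ_memcmp(const void *ptr_one, const void *ptr_two)
{
    if (memcmp(&ptr_one, &ptr_two, sizeof ptr_one) != 0)
        return (void *)0;
    /* The cast drops 'const', matching the 'void *' result type. */
    return (void *)ptr_one;
}
```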

io_x

Jun 9, 2011, 12:35:14 PM

"Shao Miller" <sha0....@gmail.com> ha scritto nel messaggio
news:27fa82a4-d89c-4b75...@glegroupsg2000goo.googlegroups.com...

i don't know if your code has some UB; i think the
below has some less UBs than yours

the problem could be sizeof(void*)!=sizeof(int*)
because result is void* but ip is int*
but i'm not much sure of that

-------------------------------
#define u8 unsigned char
#define u32 unsigned

/* Shao Miller, 2011 */

u32* ptr_equ(u32* p, u32* q)
{u32 *result;
u8 *pi, *qi, *ri, *re;

pi=(u8*)&p; qi=(u8*)&q; ri=(u8*)&result;
re=ri + sizeof result;
while(ri<re)


*ri++ = *pi++ & *qi++;

pi=(u8*)&p; qi=(u8*)&q; ri=(u8*)&result;
/* if( result != q ) return 0 */
while(ri<re)
if(*ri!=*pi++ || *ri++!=*qi++)
return 0;
return result;
}

#include <stdio.h>
#include <stdlib.h>

int main(void)
{u32 iaa[2][2] = {{5,6},{7,8}};
u32 *ip;

printf("iaa: {{%u,%u},{%u,%u}}\n",
iaa[0][0],iaa[0][1],iaa[1][0],iaa[1][1]);


/* &iaa[0][2] == &iaa[1][0] ?
iaa[0][0], iaa[0][1], iaa[1][0], iaa[1][1]

io> iaa[0]+2 == iaa[1] i see it in the debugger
*/

ip=ptr_equ(iaa[0] + 2, iaa[1]);
if(!ip){puts("Pointers have different object representation!");


return EXIT_FAILURE;}
puts("Pointers have the same object representation");

/* Does the next line have undefined behaviour? */

/* io> here not */
*ip = 5;
printf("iaa: {{%u,%u},{%u,%u}}\n",
iaa[0][0],iaa[0][1],iaa[1][0],iaa[1][1]);
return EXIT_SUCCESS;
}

Shao Miller

Jun 9, 2011, 1:55:15 PM
On 6/9/2011 12:35, io_x wrote:
> "Shao Miller"<sha0....@gmail.com> ha scritto nel messaggio
> news:27fa82a4-d89c-4b75...@glegroupsg2000goo.googlegroups.com...
>> Good day, folks!
>>
>> With thanks to Wojtek Lerch for pointing out DR #260, and to Clive Feather for
>> reporting his concerns:
>>
>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm
>>
>> I am wondering about whether or not the behaviour is defined for the noted
>> line (or in general) in the code below:
>>
>> [...code...]

>
> i don't know if yuour code has some UB; i think the
> below has some less UBs of your
>

As in 0 points of undefined behaviour? As far as I know, there was only
one potential point for undefined behaviour: On the line marked "Does
the next line have undefined behaviour?" If your code below has "less,"
then I think that would mean zero points of undefined behaviour.

> the problem could be sizeof(void*)!=sizeof(int*)

In whose code? Yours (below) or mine? I don't see you using 'void *'
below. If you mean mine, 'ptr_equ' (renamed more accurately to
'obj_ptr_equ' in another post) takes two 'void *'s. It tests their
object representations, not the object representations of the original
'int *'-typed arguments. Size differences between the pointer types
shouldn't make any difference.

> because result is void* but ip is int*

If you mean that the result of 'ptr_equ' is assigned to 'ip' in my code,
yes, that conversion is defined. (As far as I know.) 'ip' can have a
completely different object representation.

> but i'm not much sure of that
>
> -------------------------------
> #define u8 unsigned char
> #define u32 unsigned
>

Do you use these macros because the numbers '8' and '32' are meaningful
for the implementations you frequently use? Or do you use '8' because
'CHAR_BIT' must be '8' at a minimum? Or some other reason?

> /* Shao Miller, 2011 */

No need to include the line above, since it's your code. ;)

> u32* ptr_equ(u32* p, u32* q)

This function takes and returns 'unsigned int *', instead of 'void *'.
If I want to use the function for 'struct foo *', that means I have to use:

foop=(struct foo *)ptr_equ((u32*)foo_ptr_one, (u32*)foo_ptr_two);

But the cast '(u32*)foo_ptr_one' is unsafe if the first member of
'struct foo' is not a 'u32'; the alignment requirements can be
different, for one example.
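
By contrast, with a 'void *' interface the caller needs no cast at all, because conversions between object pointers and 'void *' are implicit. A sketch, in which 'struct foo', 'ptr_equ_void' and 'same_foo' are all hypothetical, and the stand-in compares with '==' only to keep the example short:

```c
/* Hypothetical structure whose first member is not an 'unsigned'. */
struct foo {
    char tag;
    double payload;
};

/* Stand-in with the 'void *' signature; it compares values with '=='
 * rather than object representations, purely for brevity. */
static void *ptr_equ_void(const void *p, const void *q)
{
    return p == q ? (void *)p : (void *)0;
}

/* No '(u32*)' cast is needed: 'struct foo *' converts to 'void *'
 * implicitly on the way in, and back again on the way out. */
static struct foo *same_foo(struct foo *a, struct foo *b)
{
    return ptr_equ_void(a, b);
}
```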

> {u32 *result;
> u8 *pi, *qi, *ri, *re;
>
> pi=(u8*)&p; qi=(u8*)&q; ri=(u8*)&result;
> re=ri + sizeof result;
> while(ri<re)

> *ri++ = *pi++& *qi++;


> pi=(u8*)&p; qi=(u8*)&q; ri=(u8*)&result;
> /* if( result != q ) return 0 */

I know you've commented it out, but wish to point out that the code:

if( result != q) return 0;

is not safe. The object representations of two 'u32*' pointers have
been "merged" together with bitwise AND. But if they were unequal, the
resulting object representation might not be valid when interpreted as a
'u32*' pointer. 'if ( result' interprets it as a 'u32*' pointer value,
which might not be a valid value. Fortunately, you have it commented
out. :)

> while(ri<re)
> if(*ri!=*pi++ || *ri++!=*qi++)
> return 0;
> return result;
> }
>
> #include<stdio.h>
> #include<stdlib.h>
>
> int main(void)
> {u32 iaa[2][2] = {{5,6},{7,8}};
> u32 *ip;
>

Are they playing catch with the pointer star? (Just kidding. Hee hee hee.)

> printf("iaa: {{%u,%u},{%u,%u}}\n",
> iaa[0][0],iaa[0][1],iaa[1][0],iaa[1][1]);

> /*&iaa[0][2] ==&iaa[1][0] ?


> iaa[0][0], iaa[0][1], iaa[1][0], iaa[1][1]
> io> iaa[0]+2 == iaa[1] i see it in the debugger
> */
>
> ip=ptr_equ(iaa[0] + 2, iaa[1]);
> if(!ip){puts("Pointers have different object representation!");
> return EXIT_FAILURE;}
> puts("Pointers have the same object representation");
>
> /* Does the next line have undefined behaviour? */
> /* io> here not */
> *ip = 5;
> printf("iaa: {{%u,%u},{%u,%u}}\n",
> iaa[0][0],iaa[0][1],iaa[1][0],iaa[1][1]);
> return EXIT_SUCCESS;
> }

Your code explicitly examines the object representations of 'u32*'-typed
pointers, rather than of converted-to-'void *' pointers. Ok.

But here's a question for you: Does 'ip' point to one past the 'iaa[0]'
array, or does it point to the first element of the 'iaa[1]' array? If
the former, '*ip = 5;' is undefined behaviour (by the Standard). If the
latter, everything's fine. If both, then is the behaviour defined or
undefined? If neither, then what does it point to?

Thanks for the feed-back!

io_x

Jun 9, 2011, 3:00:39 PM

"Shao Miller" <sha0....@gmail.com> ha scritto nel messaggio
news:isr18q$efg$1...@dont-email.me...

> On 6/9/2011 12:35, io_x wrote:
>> "Shao Miller"<sha0....@gmail.com> ha scritto nel messaggio
>> news:27fa82a4-d89c-4b75...@glegroupsg2000goo.googlegroups.com...
>>> Good day, folks!
>>>
>>> With thanks to Wojtek Lerch for pointing out DR #260, and to Clive Feather
>>> for
>>> reporting his concerns:
>>>
>>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm
>>>
>>> I am wondering about whether or not the behaviour is defined for the noted
>>> line (or in general) in the code below:
>>>
>>> [...code...]
>>
>> i don't know if yuour code has some UB; i think the
>> below has some less UBs of your
>>
>
> As in 0 points of undefined behaviour? As far as I know, there was only one
> potential point for undefined behaviour: On the line marked "Does the next
> line have undefined behaviour?" If your code below has "less," then I think
> that would mean zero points of undefined behaviour.
>
>> the problem could be sizeof(void*)!=sizeof(int*)
>
> In whose code? Yours (below) or mine?

your code, has


re = ri + sizeof result;
while (ri < re)
/* Defect Report #260: "Provenance" */
*ri++ = *pi++ & *qi++;

where 'ri' point to a void*, 'pi' and 'qi' point to
one int* pointer
and the loop, loop for sizeof(result)==sizeof(void*)
times

if sizeof(void*)>sizeof(int*) then above 'pi', 'qi' read something outside the
int* pointer

> I don't see you using 'void *' below. If you mean mine, 'ptr_equ' (renamed
> more accurately to 'obj_ptr_equ' in another post) takes two 'void *'s. It
> tests their object representations, not the object representations of the
> original 'int *'-typed arguments. Size differences between the pointer types
> shouldn't make any difference.

yes

>> because result is void* but ip is int*
>
> If you mean that the result of 'ptr_equ' is assigned to 'ip' in my code, yes,
> that conversion is defined. (As far as I know.) 'ip' can have a completely
> different object representation.
>
>> but i'm not much sure of that
>>
>> -------------------------------
>> #define u8 unsigned char
>> #define u32 unsigned
>>
>
> Do you use these macros because the numbers '8' and '32' are meaningful for
> the implementations you frequently use? Or do you use '8' because 'CHAR_BIT'
> must be '8' at a minimum? Or some other reason?

I would rewrite it as
#include <stdint.h>

#define u32 uint32_t
#define u8 uint8_t


>> /* Shao Miller, 2011 */
>
> No need to include the line above, since it's your code. ;)
>
>> u32* ptr_equ(u32* p, u32* q)
>
> This function takes and returns 'unsigned int *', instead of 'void *'. If I
> want to use the function for 'struct foo *', that means I have to use:
>
> foop=(struct foo *)ptr_equ((u32*)foo_ptr_one, (u32*)foo_ptr_two);

OK, I'll try:


/* Shao Miller, 2011 */

u8* ptr_equ(u8* p, u8* q, u32 v)
{
    u8 *result;
    u8 *pi, *qi, *ri, *re;

    /* where v is sizeof(type) and -1 means error */
    if(v!=sizeof(u8*)) return -1;

    pi=(u8*)&p; qi=(u8*)&q; ri=(u8*)&result;
    re=ri + sizeof result;
    while(ri<re)
        *ri++ = *pi++ & *qi++;

    pi=(u8*)&p; qi=(u8*)&q; ri=(u8*)&result;
    while(ri<re)
        if(*ri!=*pi++ || *ri++!=*qi++)
            return 0;
    return result;
}

> But the cast '(u32*)foo_ptr_one' is unsafe if the first member of 'struct foo'
> is not a 'u32'; the alignment requirements can be different, for one example.

ok

>> {u32 *result;
>> u8 *pi, *qi, *ri, *re;
>>
>> pi=(u8*)&p; qi=(u8*)&q; ri=(u8*)&result;
>> re=ri + sizeof result;
>> while(ri<re)
>> *ri++ = *pi++& *qi++;
>> pi=(u8*)&p; qi=(u8*)&q; ri=(u8*)&result;
>> /* if( result != q ) return 0 */
>
> I know you've commented it out, but wish to point out that the code:
>
> if( result != q) return 0;
>
> is not safe. The object representations of two 'u32*' pointers have been
> "merged" together with bitwise AND. But if they were unequal, the resulting
> object representation might not be valid when interpreted as a 'u32*' pointer.
> 'if ( result' interprets it as a 'u32*' pointer value, which might not be a
> valid value. Fortunately, you have it commented out. :)

I don't understand it well.

Here 'ip' points to &(iaa[1][0]), aligned to u32 (the printed output says so).

> array, or does it point to the first element of the 'iaa[1]' array?

Yes, it points to iaa[1] (the debugger says so).

------------
iaa: {{5,6},{7,8}}


Pointers have the same object representation

iaa: {{5,6},{5,8}}
------------
So the element that *ip designates here is iaa[1][0].

Shao Miller

Jun 10, 2011, 12:32:32 AM

> *ri++ = *pi++& *qi++;

> where 'ri' points to a void*, and 'pi' and 'qi' each point to
> an int* pointer

'pi' does not point to an 'int *' and 'qi' does not point to an 'int *'.
'pi' points to a 'void *' and 'qi' points to a 'void *'.

When the function is called:

ip = obj_ptr_equ(&iaa[0][2], &iaa[1][0]);

'&iaa[0][2]' is an 'int *'-typed value like '1 + 1' is an 'int'-typed
value. But because the function is declared to have 'void *'
parameters, the 'int *'-typed arguments are converted to 'void *'-typed
values, then those 'void *'-typed values become the values of the 'void
*'-typed parameters, not the original 'int *' values.

('&iaa[0][2]' used for brevity, though the original code had 'iaa[0] + 2'.)
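
To illustrate the conversion, here is a minimal sketch. It is not the original 'obj_ptr_equ'; the helper name is hypothetical and it uses 'memcmp' in place of the byte loop. The point is only that the bytes inspected belong to the 'void *' parameters, so exactly 'sizeof (void *)' bytes are read whatever the size of an 'int *' might be.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 when the two void * parameters have
   identical object representations, else 0. */
int same_void_ptr_bytes(void *p, void *q)
{
    /* p and q are void * objects no matter what pointer type the caller's
       arguments had; the arguments were converted before the call. */
    assert(sizeof p == sizeof(void *));
    return memcmp(&p, &q, sizeof p) == 0;
}
```

A call such as 'same_void_ptr_bytes(&iaa[1][0], &iaa[1][0])' passes 'int *' arguments, but the function never looks at an 'int *' object.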

> and the loop runs sizeof(result)==sizeof(void*)
> times
>

Right. 'p', 'q', 'result' are all 'void *'-typed objects with the same
size.

> If sizeof(void*)>sizeof(int*), then 'pi' and 'qi' above read something outside
> the int* pointer
>

See above.

>> I don't see you using 'void *' below. If you mean mine, 'ptr_equ' (renamed
>> more accurately to 'obj_ptr_equ' in another post) takes two 'void *'s. It
>> tests their object representations, not the object representations of the
>> original 'int *'-typed arguments. Size differences between the pointer types
>> shouldn't make any difference.
>
> yes
>

Ok.

>>> because result is void* but ip is int*
>>
>> If you mean that the result of 'ptr_equ' is assigned to 'ip' in my code, yes,
>> that conversion is defined. (As far as I know.) 'ip' can have a completely
>> different object representation.
>>
>>> but i'm not much sure of that
>>>
>>> -------------------------------
>>> #define u8 unsigned char
>>> #define u32 unsigned
>>>
>>
>> Do you use these macros because the numbers '8' and '32' are meaningful for
>> the implementations you frequently use? Or do you use '8' because 'CHAR_BIT'
>> must be '8' at a minimum? Or some other reason?
>
> I would rewrite it as
> #include <stdint.h>
>
> #define u32 uint32_t
> #define u8 uint8_t
>
>

Ok, but to my knowledge, 'uint32_t' needn't be available for some C89
implementations. I was trying to understand why you were using specific
numbers like '8' and '32', but I guess it doesn't really matter. :)
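
If portability matters, something along these lines might cope with implementations that lack 'uint32_t'. It is only a sketch: the fallback tests and the 'check_widths' helper are my own assumptions, and the fallback types are "at least 32 bits" rather than exactly 32.

```c
#include <assert.h>
#include <limits.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#include <stdint.h>   /* C99 and later; uint32_t itself is still optional */
#endif

typedef unsigned char u8;        /* always available; CHAR_BIT is at least 8 */

#if defined(UINT32_MAX)
typedef uint32_t u32;            /* exactly 32 bits when it exists */
#elif UINT_MAX >= 0xFFFFFFFFu
typedef unsigned int u32;        /* at least 32 bits */
#else
typedef unsigned long u32;       /* unsigned long is at least 32 bits */
#endif

/* Checks the widths the typedefs are meant to provide. */
void check_widths(void)
{
    assert(sizeof(u8) == 1);
    assert(sizeof(u32) * CHAR_BIT >= 32);
}
```

Using 'typedef' rather than '#define' also keeps the names out of the preprocessor's way.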

>>> /* Shao Miller, 2011 */
>>
>> No need to include the line above, since it's your code. ;)
>>
>>> u32* ptr_equ(u32* p, u32* q)
>>
>> This function takes and returns 'unsigned int *', instead of 'void *'. If I
>> want to use the function for 'struct foo *', that means I have to use:
>>
>> foop=(struct foo *)ptr_equ((u32*)foo_ptr_one, (u32*)foo_ptr_two);
>
> OK, I'll try:
> /* Shao Miller, 2011 */

/* io_x, 2011 */

> u8* ptr_equ(u8* p, u8* q, u32 v)
> {u8 *result;
> u8 *pi, *qi, *ri, *re;
>
> /* where v is sizeof(type) and -1 means error */
> if(v!=sizeof(u8*)) return -1;
>
> pi=(u8*)&p; qi=(u8*)&q; ri=(u8*)&result;
> re=ri + sizeof result;
> while(ri<re)

> *ri++ = *pi++& *qi++;
> pi=(u8*)&p; qi=(u8*)&q; ri=(u8*)&result;

> while(ri<re)
> if(*ri!=*pi++ || *ri++!=*qi++)
> return 0;
> return result;
> }
>

Hmmm... 6.8.6.4p3 includes the sentence "If the expression has a type
different from the return type of the function in which it appears, the
value is converted as if by assignment to an object having the return
type of the function." So 'return -1' would be like assigning '-1' to a
'u8*' object. Unfortunately, I believe that's a constraint violation of
6.5.16.1p1.
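
One way around that, as a rough sketch only, is to return an 'int' status and hand the pointer back through an out-parameter. The name 'ptr_equ_status', the 'out' parameter and the use of 'memcmp' are all my own assumptions; the meaning of 'v' is taken from your comment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of ptr_equ, not io_x's original.
   Returns 1 and stores p in *out when p and q have identical object
   representations, 0 (storing a null pointer) when they differ, and -1
   when v is not the size of a void *. */
int ptr_equ_status(void *p, void *q, size_t v, void **out)
{
    assert(out != NULL);
    *out = NULL;
    if (v != sizeof(void *))
        return -1;              /* an int error code, not a pointer */
    if (memcmp(&p, &q, sizeof p) != 0)
        return 0;
    *out = p;
    return 1;
}
```

That keeps the error code and the pointer result in separate types, so no integer is ever converted to a pointer.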

If 'u8' is 'uint8_t' and if 'uint8_t' is exactly the type 'unsigned
char' (it might not be), the representation should be the same as for
'void *' because 6.2.5p27 includes "A pointer to void shall have the
same representation and alignment requirements as a pointer to a
character type."

>
>> But the cast '(u32*)foo_ptr_one' is unsafe if the first member of 'struct foo'
>> is not a 'u32'; the alignment requirements can be different, for one example.
>
> ok
>
>>> {u32 *result;
>>> u8 *pi, *qi, *ri, *re;
>>>
>>> pi=(u8*)&p; qi=(u8*)&q; ri=(u8*)&result;
>>> re=ri + sizeof result;
>>> while(ri<re)
>>> *ri++ = *pi++& *qi++;
>>> pi=(u8*)&p; qi=(u8*)&q; ri=(u8*)&result;
>>> /* if( result != q ) return 0 */
>>
>> I know you've commented it out, but wish to point out that the code:
>>
>> if( result != q) return 0;
>>
>> is not safe. The object representations of two 'u32*' pointers have been
>> "merged" together with bitwise AND. But if they were unequal, the resulting
>> object representation might not be valid when interpreted as a 'u32*' pointer.
>> 'if ( result' interprets it as a 'u32*' pointer value, which might not be a
>> valid value. Fortunately, you have it commented out. :)
>
> I don't understand it well.
>

Let's pretend we see the object representations in memory and that they
look like:

p : DD CC BB AA : 10101010101110111100110011011101
q : EE CC BB AA : 10101010101110111100110011101110
result : CC CC BB AA : 10101010101110111100110011001100

If you use:

if( result != q)

'result' is an lvalue which "is converted to the value stored in the
designated object"[6.3.2.1p2]. "CC CC BB AA" could be a trap
representation[6.2.6.1p5] and leads to undefined behaviour. So it's a
good idea that you had it commented out. :)
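
A safer check, I think, is to keep working on the bytes as 'unsigned char', which has no trap representations. Here is a minimal sketch using the pretend byte values from above; the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bitwise-ANDs n bytes of a and b into out. */
void and_bytes(const unsigned char *a, const unsigned char *b,
               unsigned char *out, size_t n)
{
    size_t i;

    assert(a != NULL && b != NULL && out != NULL);
    for (i = 0; i < n; ++i)
        out[i] = a[i] & b[i];
}

/* Returns 1 when the n bytes at a and b are identical, else 0.
   No pointer-typed lvalue is read, so no trap representation can be hit. */
int bytes_equal(const unsigned char *a, const unsigned char *b, size_t n)
{
    return memcmp(a, b, n) == 0;
}
```

With 'DD CC BB AA' and 'EE CC BB AA' as inputs, 'and_bytes' produces 'CC CC BB AA', and 'bytes_equal' reports that it matches neither input, all without interpreting those bytes as a pointer.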

> Here 'ip' points to &(iaa[1][0]), aligned to u32 (the printed output says so).


>
>> array, or does it point to the first element of the 'iaa[1]' array?
>
> Yes, it points to iaa[1] (the debugger says so).
>
> ------------
> iaa: {{5,6},{7,8}}
> Pointers have the same object representation
> iaa: {{5,6},{5,8}}
> ------------
> So the element that *ip designates here is iaa[1][0].
>

Ok. Thanks!

io_x

Jun 13, 2011, 1:27:02 AM

"Shao Miller" <sha0....@gmail.com> wrote in message
news:iss38j$mii$1...@dont-email.me...

> On 6/9/2011 2:00 PM, io_x wrote:
So to do one operation on pointers a and b, namely a==b,
you would write 14 lines with multiple instructions in them.
1) It should be easier to program one machine
with one model in mind than a set of machines.
2) The resulting code for programming one machine can be
many times more complex than the code for a set of machines
and still be readable.

