On Tue, 19 Mar 2002, Svante wrote:
> "Pai-Yi HSIAO" <hs...@ccr.jussieu.fr> wrote in message
> > On Tue, 19 Mar 2002, Svante wrote:
> > > double array[10][10];
> > > double *pd1 = &array[5][9], *pd2 = &array[6][0];
> > > pd1 = pd1 + 1;
> > > if (pd1 == pd2) printf("Conforming implementation.\n");
> > > *pd2 = 0.0; /* Strictly conforming */
> > > *pd1 = 1.0; /* Undefined behavior... */
> > >
> > > Basically then, we have two pointers, of the same type that compare
> > > equal. One can be dereferenced with defined behavior, the other cannot.
> >
> > In my opinion, "*pd1 = 1.0;" is defined
>
> Before pd1 = pd1 + 1, pd1 is a pointer to an element of an array object,
> namely the 10'th element of an array of 10 doubles. After the addition
> of 1, pd1 points one past the last element of that array, and may thus
> not be dereferenced, at least the way I read the standard.
In my opinion, 'pd1' is a pointer to double.
It's also a pointer to an element of an array BUT it contains no
information about the array boundary after assignment.
(Since array type variable is not a modifiable lvalue, the information
can not be propagated by assignment.)
> > but "*(&array[5][9]+1)=1.0;" is undefined.
> Sure it is today, but how can that be reasonably reconciled
> with:
>
> if (pd1 == (&array[5][9] + 1)) printf("Conforming.\n");
'pd1' has type double*.
'&array[5][9]+1' SHOULD have type "pointer to the element of 10 x
10 double array" and hence has type pointer to double (plus some
information about array boundaries).
So we can let them compare equal.
> And that's close to the core of my argument. How can it be, given
> your opinion that '*pd1 = 1.0;' above is defined?
"Dereferencing" (there is no such word in the standard) an array type
and dereferencing a pointer may use different mechanisms.
> indistinguishable pointers of the same type being de-referenced.
They are *not* exactly the same type while dereferencing
in a strict point of view.
> Yes, in the specific case, the implementation does know the exact
> type of the array, i.e. double[10][10], and of course can see the
> out of bounds condition. Granted. But in the general case...
This is one of the disadvantages of letting an array type work as a pointer.
Array and pointer are different.
> Once the address of an array element is assigned a pointer of
> appropriate type, all information about the source array is lost
> in the type of the pointer.
Yes.
> The standard talks about 'pointer operand'
> and 'result' 'points to elements of ... array object'. Typically,
> P in: "double *P = &array[5][9];" would be such a pointer operand
> in "P = P + 1;". According to the standard *(&array[5][9] + 1) is
> definitely undefined behavior, and as far as I can see, *(P + 1)
> must also be so, with the current wording.
I don't think so. :-)
>
> In the example above at "pd1 = pd1 + 1;", pd1 must reasonably be
> a pointer operand in this sense, and thus produce undefined behavior
> if dereferenced. The alternative is even more horrifying, in the
> previous paragraph in the standard, it says that if the pointer is
> to an object not part of an array, then it is to be treated as
> a pointer to an array with one element. If this interpretation
> is to be applied strictly to all pointers not directly derived from an
> array without intermediate storage in a T* object, then all
> normal pointer arithmetic falls apart.
Well, this is really a problem if the standard contains such words.
paiyi
> My argument is, that an implementation, cannot really reasonably
> do anything else in the general case than to allow the dereferencing
> in the expected way, unless heavy run-time overhead with extra type-
> info being carried by all pointers, is added.
>
> Undefined behavior once again, poses no requirement on the implementation
> to behave strangely. It just poses no requirements for it to call
> itself conforming.
>
> As the memory structure of arrays of arrays is clear, and the rules
> of pointer arithmetic too, including the equality above, I can only
> see advantages for the standard to not only legitimize, but actually
> require, the only sensible and reasonable way to implement it.
>
> C is traditionally not a language burdened with heavy run-time checks,
> and their associated penalties. This is the way of C. You want it
> some other way - go brew some Java ;-)
>
> Seriously, another argument is of course that, well by assigning pd1
> you have effectively made a cast - but no such cast is implied in the
> standard. There is no 'pointer promotion/demotion' or anything. It's just
> that we seem to have more than one flavor of double * (in this example).
>
> One flavor is dereferencable, and the other is not. Either we spend time
> and code in the implementation to tell them apart or we do not. It seems
> beneficial to allow the implementation to not differentiate when the
> pointers in question are part of a larger array object.
>
> (snip)
> > "*(&array[5][9]+1)" may be implemented as
> >
> > *( *(array+5) + 9 +1).
> >
> > The implementation clearly knows an arithmetics on a 10 by 10 array is
> > performed. It can do boundary checking while dereferencing it at each
> > dimension level.
>
> Certainly it can. The problem is that the standard speaks of a pointer
> to the last element of the array. It does not state how that pointer
> is derived. It may be derived as above, or it may be a pointer that
> has advanced through the larger array of greater rank containing the
> array in question.
>
> My argument is: It would cost nothing to make it defined with a slight
> rewording. Bounds-checking would still be possible in an implementation
> that wants this for array types - when the bounds are the complete
> array object! This would be more reasonable.
>
> <note-to-self>
> I wonder if anyone is going to read this? I think I will need to work
> up a short, and more succinct, statement as a real proposal for a
> standard change. Unless someone else has already done this? Pointers
> appreciated ;-)
> </note-to-self>
>
> (snip)
> --
> /Svante
>
> http://axcrypt.sourceforge.net
> Free AES Point'n'Click File Encryption for Windows 9x/ME/2K/XP
The expression "*pd1 = 1.0" above results in undefined behavior.
Similar problems have been raised many times since C90 was published. I
agree that it's hard to find a real implementation that issues a
diagnostic and stops compilation or execution of a program containing
the expression, but it's undefined behavior by the Standard.
You can find similar questions ("object questions") in Google
archive and old DRs.
Thanks.
--
Jun Woong (myco...@hanmail.net)
Dept. of Physics, Univ. of Seoul
Hsiao wrote :
> > In my opinion, 'pd1' is a pointer to double.
> > It's also a pointer to an element of an array BUT it contains no
> > information about the array boundary after assignment.
> > (Since array type variable is not a modifiable lvalue, the information
> > can not be propagated by assignment.)
> >
On 19 Mar 2002, Jun Woong wrote:
> The expression "*pd1 = 1.0" above results in undefined behavior.
> Similar problems raised many times since C90 was published.
Why?
An array type object cannot be a modifiable lvalue.
'pd1' is *ONLY* a pointer to double object.
'&array[5][9]', in addition to being a pointer to double, contains boundary
information.
While assigning &array[5][9] to pd1, the boundary information is lost.
Isn't it?
If C allowed declaring some pointer type lvalue like "pointer to an
element of 10x10 double array", I would agree that when assigning &array[5][9]
to such an lvalue, the boundary information of an array could be propagated.
paiyi
I agree that it appears to result in undefined behavior, i.e. the
standard has no requirements on that behavior. It doesn't require
unpredictable behavior either. It's perfectly ok for an implementation
to document the behavior in this situation, but _it's not required_.
My point is, that if the standard _did_ require the behavior that
a pointer will be valid within the bounds of the largest containing
array, then first of all a lot of formally non-portable programs would
become portable, no existing code would break, and only that implementation
which carries along extended type and bounds information at run-time
would break. I know of no such implementations for serious use, but I'm
sure there is at least one C interpreter out there which might break.
That price would seem to be slight in comparison with the gains in
defined portability, and the increased consistency.
To reiterate: There is no way an implementation today can behave otherwise
than proposed, except in a few cases where the out-of-bounds condition
is detected at compile time. Unless of course, said implementation carries
extended type info at run-time.
>To reiterate: There is no way an implementation today can behave otherwise
>than proposed, except in a few cases where the out-of-bounds condition
>is detected at compile time. Unless of course, said implementation carries
>extended type info at run-time.
This is not true. An implementation may generate code which assumes that
the out-of-bounds condition does not occur, without carrying any extended
type info at run-time, and without knowing at compile time whether or
not the out-of-bounds condition actually occurs.
For example, the pointer may be represented as the address of the
original array plus an offset, and the offset may be stored in a
register that doesn't have enough bits to hold the out-of-bounds index.
If at runtime the offset is too large for the register, it may just
wrap around.
But whether this actually occurs in current implementations is another
question...
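
A minimal sketch of the kind of representation described above, simulated with plain integers so that the sketch itself stays well-defined. Everything here is hypothetical: the names sim_ptr, sim_add and sim_flat_index, the row length of 16, and the 4-bit offset register are assumptions chosen only to make the wrap-around visible. No real implementation is claimed to work this way.

```c
#include <assert.h>
#include <stddef.h>

#define ROW_LEN 16u                 /* assumed row length for the simulation */
#define OFFSET_MASK (ROW_LEN - 1u)  /* hypothetical 4-bit offset register */

/* Simulated pointer: flat index of the row's first element plus a narrow offset. */
struct sim_ptr {
    size_t base;      /* flat index of the first element of the row */
    unsigned offset;  /* only the low 4 bits are ever kept */
};

/* Advance by n elements; the offset silently wraps, as a narrow register would. */
static struct sim_ptr sim_add(struct sim_ptr p, unsigned n)
{
    assert(p.offset <= OFFSET_MASK);
    p.offset = (p.offset + n) & OFFSET_MASK;
    return p;
}

/* Flat index that the simulated pointer actually designates. */
static size_t sim_flat_index(struct sim_ptr p)
{
    return p.base + p.offset;
}

/* Flat index that a programmer relying on contiguous layout would predict. */
static size_t expected_flat_index(size_t row, unsigned col)
{
    return row * ROW_LEN + col;
}
```

Starting at the last element of row 5 and adding 1, sim_flat_index() yields the start of row 5 again, while expected_flat_index(5, 16) yields the start of row 6. Under such a scheme the two pointers from the start of the thread would no longer designate the same element, without any bounds information being carried at run-time.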
--
Fergus Henderson <f...@cs.mu.oz.au> | "I have always known that the pursuit
The University of Melbourne | of excellence is a lethal habit"
WWW: <http://www.cs.mu.oz.au/~fjh> | -- the last words of T. S. Garp.
Huh..... You can not yet convince me.
According to the standard,
'pd1' *doesn't* inherit the bounds condition of an array type
because 'pd1' is merely a pointer type object.
A reasonable requirement for making the bounds condition of an array type
inheritable would be to make array type objects modifiable lvalues.
The expression "*(&array[5][9]+1)=1.0" is undefined because inside it
'array' has type 'double[10][10]'; at this moment N869 6.5.6 [#8] should
be applied.
> My point is, that if the standard _did_ require the behavior that
> a pointer will be valid within the bounds of the largest containing
> array, then first of all a lot of formally non-portable programs would
> become portable, no existing code would break, and only that implementation
> which carries along extended type and bounds information at run-time
> would break. I know of no such implementations for serious use, but I'm
> sure there is at least one C interpreter out there which might break.
This is a question of compromise: how strictly should we define an array object?
> That price would seem to be slight in comparison with the gains in
> defined portability, and the increased consistency.
> To reiterate: There is no way an implementation today can behave otherwise
> than proposed, except in a few cases where the out-of-bounds condition
> is detected at compile time. Unless of course, said implementation carries
> extended type info at run-time.
You cannot prohibit an abstract implementation from doing such bounds
checking at run-time.
paiyi
The C standard does not require that the boundary information be lost.
And, since the operation has undefined behavior, an implementation is
allowed to store that information in the pointer, and to use that
information to make the operation fail.
Specifically, a pointer could be equivalent to a structure containing a
machine address, plus upper and lower limits on how far that address may
be shifted. The & operator could initialize those limits based upon the
type of the object to which it is applied. For arrays, those limits
would correspond to the first element in the array, and one past the end
of the array; non-arrays would be treated as arrays with length 1.
Conversion of that pointer to a different type would not change those
limits, nor would pointer arithmetic. Dereferencing a pointer to an
aggregate type, and taking the address of a member of that aggregate,
would produce a pointer with new, more restrictive limits corresponding
to the type and location of that member. Pointer arithmetic that takes
the pointer outside its limits would be undefined behavior per 6.5.6p8,
and storing those limits in order to use them to make such arithmetic
fail is therefore legal. Similarly, dereferencing a pointer which has
been moved by pointer arithmetic to its upper limit would be undefined
behavior, per the last sentence of 6.5.6p8.
The key point is that &array would have upper and lower limits based
upon the size and location of 'array', as would &array[n], but
&array[n][m] would have limits based upon the size and location of
array[n].
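
The scheme described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the struct fat_ptr and the helpers fat_from_row, fat_add and fat_can_deref are hypothetical names, and the explicit checks stand in for whatever a bounds-checking implementation would do internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "fat" pointer: an address plus the limits it may move within. */
struct fat_ptr {
    double *addr;  /* current position */
    double *lo;    /* first element of the governing array */
    double *hi;    /* one past the last element of that array */
};

/* Models &array[n][m]: the limits come from the row array[n], not from 'array'. */
static struct fat_ptr fat_from_row(double *row, size_t row_len, size_t m)
{
    struct fat_ptr p;
    assert(m <= row_len);
    p.addr = row + m;
    p.lo = row;
    p.hi = row + row_len;
    return p;
}

/* Checked arithmetic: returns 0 and leaves *p alone if the move would leave [lo, hi]. */
static int fat_add(struct fat_ptr *p, ptrdiff_t n)
{
    ptrdiff_t pos = p->addr - p->lo;
    ptrdiff_t len = p->hi - p->lo;
    if (n < -pos || n > len - pos)
        return 0;
    p->addr += n;
    return 1;
}

/* Dereferencing is allowed only strictly inside the limits, never at 'hi'. */
static int fat_can_deref(const struct fat_ptr *p)
{
    return p->addr >= p->lo && p->addr < p->hi;
}
```

With double array[10][10], calling fat_from_row(array[5], 10, 9) models pd1 from the start of the thread. After fat_add(&pd1, 1) succeeds, fat_can_deref(&pd1) reports 0, even though the address is the same as that of array[6][0].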
You're right, it doesn't inherit them. It simply possesses those
restrictions without needing to inherit them. The standard simply says
"If the pointer operand points to an element of an array object,", it
doesn't say anything about whether or not the type of the pointer
encodes the length of that array object. Those restrictions are a
property of the pointer value, not of the pointer type, and can
therefore be stored as part of the pointer value. They survive
conversion to different types. The limit is with reference to any array
containing the objects pointed at; since it applies to all such
arrays, the one that matters is the most deeply nested array that
contains objects of the pointed-at type.
> > To reiterate: There is no way an implementation today can behave otherwise
> > than proposed, except in a few cases where the out-of-bounds condition
> > is detected at compile time. Unless of course, said implementation carries
> > extended type info at run-time.
>
> You cannot prohibit an abstract implementation from doing such bounds
> checking at run-time.
Why not? What would prevent the standard from saying that such an
implementation is not a conforming one? Of course, prohibiting such
implementations wouldn't make it impossible to create them, but that's
no different from any other requirement in the standard. The most the
standard can do is determine which implementations deserve the label
"conforming"; but it can use any internally consistent criteria that the
committee wants it to use for assigning that label.
Good point.
Certainly we can conceive of implementations where such a situation
occurs. But such an implementation will then have trouble handling
the pointer equality requirement, that two pointers pointing to the
same object, even if derived differently, shall compare equal. Not
impossible, but troublesome. This hypothetical implementation (or
does it exist?) will likely have trouble with lots of existing
code too.
Generating code utilizing such limited offset pointers as objects
seems to be truly non-trivial too. For indexed access yes, and I
really do advocate to keep the bounds 'checking' when accessing the
original array object. What I want, is when we derive a T* from
an array, I want that pointer to be freely movable in the entire
largest enclosing array.
Thus, what I would like to see, for a scalar type T:
T *p1, array[10][10];
for (p1 = &array[0][0]; p1 < &array[10][0]; p1++) {
*p1 = (T)0; /* well-defined */
}
array[5][10] = (T)1; /* undefined behavior */
p1 = &array[5][10]; /* here p1 is just a T* - no inherent bounds left */
*p1 = (T)2; /* well-defined */
The above is not my view on how things stand today, please note!
Although - I'm also willing to bet it actually works in all existing
conforming compiling implementations! If there is an implementation
where the underlying architecture causes problems with this I would
be most interested to know. If no such implementation exists, and there
is no compelling reasons to believe there is such a need, the expense
of standardizing the common, or even universal, behavior seems to me
reasonable and beneficial.
This preserves the lovely difference between arrays and pointers, and
breaks no existing code I'm willing to bet. Although it might make
an interpretative implementation non-compliant due to 'excessive' bounds-
checking.
Once again - remove the pointer equality requirement and the array
contiguousness requirement (when allocating arrays of arrays) or
allow access within the largest containing array object.
All of the above seems to be correct, and in concert with how I read
the standard. And that's the problem.
We can conceive of any number of esoteric and interesting implementations
such as the one above. But first of all - this is an example of what I
refer to as carrying extra information at run-time, which will have an
expense even if supported by the hardware architecture.
Second - If no such implementation exists, and the trends do not show such
things on the horizon, why should the standard prohibit efficiencies in
coding today, with reference to hypothetical implementations?
The standard should codify, clarify and unify existing and likely future
practices to enhance portability while retaining as much optimizability
and security as possible. It does in most cases. Much thought has been
put into not breaking legacy code while improving the language, even at
the expense of some minor uglinesses (_Bool, for example ;-).
Changing the behavior of pointer accesses to encompass the largest
containing array will, I believe, make lots of legacy code more portable,
break no existing code, and likely not require recoding of any existing
compiling implementations. It will also make it possible to write more
efficient portable matrix and multi-dimensional array code.
/Svante
Exactly. To behave differently than 'expected', an implementation needs
to encode more information in the pointer object. I think such an
implementation, should it come into existence, would not be popular
due to the expense of using that extra information. Possibly as a
teaching aid - but it would then restrict usage of a perfectly safe
optimization - to sometimes refer to arrays of arrays as a single
array of rank 1. That it is safe already, is due to the contiguousness
requirement and the pointer equality requirement.
>
> > > To reiterate: There is no way an implementation today can behave otherwise
> > > than proposed, except in a few cases where the out-of-bounds condition
> > > is detected at compile time. Unless of course, said implementation carries
> > > extended type info at run-time.
> >
> > You cannot prohibit an abstract implementation from doing such bounds
> > checking at run-time.
>
> Why not? What would prevent the standard from saying that such an
> implementation is not a conforming one? Of course, prohibiting such
> implementations wouldn't make it impossible to create them, but that's
> no different from any other requirement in the standard. The most the
> standard can do is determine which implementations deserve the label
> "conforming"; but it can use any internally consistent criteria that the
> committee wants it to use for assigning that label.
Indeed the standard can prohibit an implementation (abstract or concrete)
to do such bounds checking. That's what I want! By leaving it undefined
behavior, we're making lots of safe but efficient pointer manipulation
in arrays of rank > 1 undefined behavior. Unnecessarily so, in my mind,
as all the other parts are already in the standard.
No, a conforming implementation can propagate the information.
> If C allows to declare some pointer type lvalue like "pointer to an
> element of 10x10 double array", I agree when assigning &array[5][9] to
> such lvalue, the boundary information of an array can be propagated.
>
Of course, C doesn't allow declaring such a pointer type lvalue. But
it can't force all conforming implementations to remove the boundary
information.
The Committee's (official) interpretation is as follows (IIRC):
The possible pointer arithmetic for multi-dim arrays depends on
the TYPE and VALUE of the pointer. The fact that the subobjects reside
continuously in the address space does not guarantee that pointer
arithmetic makes a pointer point to a subobject beyond its type and
value.
To illustrate this, consider an example.
int a[3][4];
int *pi = &a[1][3];
In this case the type of pi is "pointer to int" and its value is to
point at a[1][3]. Even if a[2][0] follows a[1][3] in the address space
there are four int objects (a[1][0], a[1][1], a[1][2] and a[1][3]) on
which pi (pointer to int) can stride and inspect values, not the
entire array object (a).
As long as this is the official interpretation and there is no
practical reason to allow that pi can stride the entire object, a
conforming implementation can do anything when, say, pi is used as
pi[1] in the above example, so it's allowed to check the boundary
within which a pointer can move and be dereferenced by any means,
e.g., some information hidden in pointer values or run-time boundary
checking. [Note that, as an exception, unsigned char * is allowed to
inspect the values of the entire object.]
Of course, I've never seen a practical implementation that does this, and
I'm pretty sure that I never will.
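
The unsigned char exception mentioned above can be sketched as follows. This is a hedged illustration, not a statement of the committee's position: the helper name read_flat and the ROWS/COLS macros are assumptions, and the sketch takes a pointer to the whole array so that the byte pointer is derived from the complete object.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROWS 3
#define COLS 4

/* Reads the int at flat position k of the whole array through its bytes. */
static int read_flat(int (*whole)[ROWS][COLS], size_t k)
{
    /* A byte pointer derived from the complete object may walk all of it. */
    const unsigned char *bytes = (const unsigned char *)whole;
    int value;
    assert(k < (size_t)ROWS * COLS);
    memcpy(&value, bytes + k * sizeof value, sizeof value);
    return value;
}
```

For int a[3][4], read_flat(&a, 1 * COLS + 3 + 1) fetches the element that follows a[1][3] in memory, that is a[2][0], without ever forming pi + 1 from an int pointer into the row a[1].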
Yes. This is what and where I misunderstood.
The clause refers to *all* pointers regardless of their type.
The use of 'pd1' in this case is hence undefined according to it.
I stand corrected on my previous opinion.
I think this clause is badly written.
It seems to want to give a strict definition of array bounds and
at the same time make it apply to pointers.
[...]
> The most the standard can do is determine which implementations deserve
> the label "conforming"; but it can use any internally consistent
> criteria that the committee wants it to use for assigning that label.
:-)
paiyi
No, but the pointer comparison requirement guarantees that a pointer
obtained via pointer arithmetic into an array subobject that "happens" (sic)
to reside next to the original subobject compare equal. So we have
the case of two pointers required to compare equal, but only one
can be dereferenced without invoking undefined behavior. [I know that
"compare equal" does not necessarily imply "bitwise equality", but
few implementations, if any, will even have the possibility to produce
two pointer objects that compare equal without bitwise equality too.]
>
> To illustrate this, consider an example.
>
> int a[3][4];
> int *pi = &a[1][3];
>
> In this case the type of pi is "pointer to int" and its value is to
> point at a[1][3]. Even if a[2][0] follows a[1][3] in the address space
> there are four int objects (a[1][0], a[1][1], a[1][2] and a[1][3]) on
> which pi (pointer to int) can stride and inspect values, not the
> entire array object (a).
>
> As long as this is the official interpretation and there is no
> practical reason to allow that pi can stride the entire object, a
> conforming implementation can do anything when, say, pi is used as
> pi[1] in the above example, so it's allowed to check the boundary
> within which a pointer can move and be dereferenced by any means,
> e.g., some information hidden in pointer values or run-time boundary
> checking. [Note that, as an exception, unsigned char * is allowed to
> inspect the values of the entire object.]
>
> Of course, I've never seen a practical implementation that does this, and
> I'm pretty sure that I never will.
I fully agree with your interpretation of the standard.
I believe the committee's decision is wrong, and would like to gather
some evidence before possibly submitting a proposal for change in a
future revised edition.
Due to the contiguousness requirement for arrays, and the pointer
comparison identity requirement for pointers advancing beyond their
own array into another one, there is currently no real reason not
to change 'undefined behavior' into required behavior equal to what all
compiled implementations on current architectures are forced to do
unless they carry extra information forward at run-time.
Ya, it is true....
All the pointer types on this machine are implemented similarly to a 1-dim
*modifiable* array type.
> The & operator could initialize those limits based upon the
> type of the object to which it is applied. For arrays, those limits
> would correspond to the first element in the array, and one past the end
> of the array; non-arrays would be treated as arrays with length 1.
ya.
> Conversion of that pointer to a different type would not change those
> limits, nor would pointer arithmetic. Dereferencing a pointer to an
> aggregate type, and taking the address of a member of that aggregate,
> would produce a pointer with new, more restrictive limits corresponding
> to the type and location of that member.
Yes.
> Pointer arithmetic that takes the pointer outside its limits would be
> undefined behavior per 6.5.6p8, and storing those limits in order to use
> them to make such arithmetic fail is therefore legal.
Agree.
> Similarly, dereferencing a pointer which has been moved by pointer
> arithmetic to its upper limit would be undefined
> behavior, per the last sentence of 6.5.6p8.
Good.
> The key point is that &array would have upper and lower limits based
> upon the size and location of 'array', as would &array[n], but
> &array[n][m] would have limits based upon the size and location of
> array[n].
:-)
Thank you.
paiyi
Such an implementation would be a great improvement. Most of
Microsoft's packages would routinely crash instead of propagating
viruses. The old adage about getting it correct before worrying
about efficiency applies with a vengeance. Unfortunately to make
it reasonably efficient today requires something like enforcing
descriptors for every access, much like the old Burroughs machines
did at the hardware (actually microcode) level.
--
Chuck F (cbfal...@yahoo.com) (cbfal...@XXXXworldnet.att.net)
Available for consulting/temporary embedded and systems.
(Remove "XXXX" from reply address. yahoo works unmodified)
mailto:u...@ftc.gov (for spambots to harvest)
"CBFalconer" <cbfal...@yahoo.com> wrote in message news:3C98CB83...@yahoo.com...
Not really that bad - I don't advocate unlimited pointer accesses -
only those within the bounds of the largest containing array object.
In the case of 'buffer overruns' which normally is arrays of rank 1,
nothing would change in my proposal.
It is only the ability to portably address the contiguous array members
of arrays with rank > 1 as an array of lower rank that I'm after.
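
For comparison, a sketch of what seems to be uncontroversially portable under the current wording: declare the storage as a single array of rank 1 and compute the two-dimensional index by hand. The names flat_index and fill, and the ROWS/COLS values, are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 10
#define COLS 10

/* Maps (r, c) onto a one-dimensional array of ROWS * COLS elements. */
static size_t flat_index(size_t r, size_t c)
{
    assert(r < ROWS && c < COLS);
    return r * COLS + c;
}

/* Fills the whole matrix in one linear pass; every step stays inside one array. */
static void fill(double *m, double value)
{
    double *p;
    for (p = m; p < m + ROWS * COLS; p++)
        *p = value;
}
```

With double m[ROWS * COLS], the element after m[flat_index(5, 9)] is m[flat_index(6, 0)], and stepping a double * from one to the other never leaves the single array object. The cost is that the array[r][c] notation is lost, which is the convenience the proposal above would keep.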
I have a beginner's question:
May a two-dimensional array 'A' be implemented as a malloc()ed one?
int ** A;
int *tmp, i;
tmp = malloc( sizeof(**A) * row * col );
A = malloc( sizeof(*A ) * row );
for (i=0; i<row; i++){
A[i] = tmp + i*col;
}
The requirement (N869 6.5.9 [#6]) that two pointers compare equal when
one points to an object and the other to the subobject at its beginning
seems achievable by forcing "&A==(int*)&A[0]" to give the
answer "true".
Thank you.
paiyi
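
One alternative to the pointer-to-pointer layout above, sketched under the assumption that the column count is a compile-time constant: allocate the whole matrix as one block through a pointer to a row. The typedef row_t and the helpers alloc_matrix and starts_at_first_element are hypothetical names introduced for the illustration.

```c
#include <assert.h>
#include <stdlib.h>

#define COLS 10   /* assumed compile-time column count */

typedef int row_t[COLS];

/* Allocates 'rows' rows of COLS ints as one block; returns NULL on failure. */
static row_t *alloc_matrix(size_t rows)
{
    row_t *m = malloc(rows * sizeof *m);
    return m;
}

/* The block and its first element begin at the same address. */
static int starts_at_first_element(row_t *m)
{
    assert(m != NULL);
    return (void *)m == (void *)&m[0][0];
}
```

A caller would write row_t *A = alloc_matrix(row); use A[i][j] as with a declared array, and release everything with a single free(A). Here the comparison of the block with its first element is between an object and the subobject at its beginning, which is the case the quoted equality rule talks about.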
"Pai-Yi HSIAO" <hs...@ccr.jussieu.fr> wrote in message
news:Pine.A41.4.10.1020320...@moka.ccr.jussieu.fr...
> On Wed, 20 Mar 2002, Svante wrote:
> > Changing the behavior of pointer accesses to encompass the largest
> > containing array will, I believe, make lots of legacy code more portable,
> > break no existing code, and likely not require recoding of any existing
> > compiling implementations. It will also make it possible to write more
> > efficient portable matrix and multi-dimensional array code.
>
> I have a beginner's question;
>
> May a two dimension array 'A' be implemented as malloc()ed one?
You can simulate one, as you are doing below.
>
> int ** A;
> int *tmp, i;
>
> tmp = malloc( sizeof(**A) * row * col );
> A = malloc( sizeof(*A ) * row );
> for (i=0; i<row; i++){
> A[i] = tmp + i*col;
> }
>
> The requirement (N869 6.5.9 [#6]) for equal comparing of two pointers:
> one pointer to an object and the other to its subobject at beginning,
> seems to be able to achieve by forcing "&A==(int*)&A[0]" giving the
> answer "true".
Here I'm not sure I follow. "&A == (int *)&A[0]" does not compile. Nor
should they ever compare equal. They point to different objects, and
different areas in memory.
&A is an int ***, which points to a pointer pointing to an allocated
sequence of int *'s.
&A[0] is an int **, which points to an allocated sequence of int *'s.
For your amusement, consider:
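#include <stdio.h>
#include <stdlib.h>
#define ROWS 10
#define COLS 10
int main(void) {
    int ** A;
    int *tmp, i;
    int (*pA)[ROWS][COLS];
    /* A cast is not an lvalue, so assign pA first, then convert it. */
    pA = malloc(sizeof **A * ROWS * COLS );
    A = malloc(sizeof *A * ROWS );
    if (pA == NULL || A == NULL) return EXIT_FAILURE;
    tmp = (int *)pA;
    for (i=0; i < ROWS; i++){
        A[i] = tmp + i*COLS;
    }
    /* The out-of-range subscripts below are deliberate; they are the point. */
    if (&A[5][10] == &A[6][0]) {
        printf("Conforming.\n");
    }
    if (&(*pA)[5][10] == &A[6][0]) {
        printf("Conforming.\n");
    }
    if (&(*pA)[5][11] == &A[6][1]) {
        printf("Broken implementation?\n");
    }
    return 0;
}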
The last if is rather interesting. An excerpt from 6.5.9 #6: "Two pointers
compare equal if and only if both are null pointers, both are pointers to
the same object (including a pointer to an object and a subobject at its
beginning) or function, both are pointers to one past the last element of
the same array object, or one is a pointer to one past the end of one array
object and the other is a pointer to the start of a different array object
that happens to immediately follow the first array object in the address
space."
*pA is definitely an array object. (*pA)[5] is one too. &(*pA)[5][11] is
thus a pointer to an element two past the end of an array object. &A[6][1]
is strictly speaking not even a pointer to an array object, as it is just
an address in a chunk of malloc()'d memory. I'm not sure how to interpret
6.5.9 #6 in such a manner that these two pointers fall into any of the
"if and only if" categories.
The possible escape hatch is "both are pointers to the same object", but
then the rest of the section is completely superfluous. In my mind, they
are not really pointers to the same 'object', although they do point to
the same area of memory.
But I challenge you to find an implementation that does not produce
"Broken implementation?"...
I thought we fixed that in C99. pd1 points to a double, and so long
as arithmetic on it does not stray beyond the bounds of an actual
object array on which it is based (except for one past the last),
all should be well. Alignment is guaranteed (any padding is entirely
contained within each array element). The actual prohibition is on
something like:
double array[10][10];
double *pd = &array[6][12]; // not s.c.
But
double *pd = &array[6][6];
pd += 6;
is allowed.
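
As a worked check of the arithmetic in the second case, the following sketch computes where a pointer would land if elements are counted across the whole 10 x 10 array. It uses only integer arithmetic, so it says nothing about whether dereferencing is defined; the names pos and advance are assumptions made for the illustration.

```c
#include <assert.h>
#include <stddef.h>

#define COLS 10

struct pos { size_t row, col; };

/* Where &array[row][col] + n falls when elements are counted across all rows. */
static struct pos advance(size_t row, size_t col, size_t n)
{
    size_t flat;
    struct pos p;
    assert(col <= COLS);
    flat = row * COLS + col + n;   /* flat element number from &array[0][0] */
    p.row = flat / COLS;
    p.col = flat % COLS;
    return p;
}
```

advance(6, 6, 6) gives row 7, column 2, since 6 * 10 + 6 + 6 = 72. So "pd += 6" from &array[6][6] ends at the address of array[7][2], which lies inside 'array' but outside the row array[6].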
If that's the case, all is indeed well, but it's hard to infer that
from 6.5.6 #8 (Additive operators), 6.5.9 #6 (Equality operators) and
6.5.2.1 Array subscripting.
The text consistently speaks of 'element of array object'. The semantics
of array subscripting clearly makes the case that a multidimensional
array object consists of lower-dimensional array objects, etc., which leads
to your first case above.
The second case, would seem to be non-conforming, as *pd "points to an
element of an array consisting of 10 doubles". Unless of course, in
the text in 6.5.6 #8 one should interpret the wording "array object"
as the largest containing array or somesuch.
I really want the behavior you describe above, but I can't read that from
the standard. The wording is rather hard to interpret it would seem then.
Most other posters on the issue in this thread seem to interpret the
second case as undefined behavior, so it certainly seems to be easy
to misinterpret it, if the interpretation you propose is the official and
correct one intended by the committee (of which I think you are/were a
member).
My point is that your proposed interpretation is the only reasonable one
really, but I just can't read it from the standard! That's not what it
says, as far as I can see. Even if it is what it is intended to mean...
[snip]
> if (&A[5][10] == &A[6][0]) {
> printf("Conforming.\n");
> }
> if (&(*pA)[5][10] == &A[6][0]) {
> printf("Conforming.\n");
> }
> if (&(*pA)[5][11] == &A[6][1]) {
> printf("Broken implementation?\n");
> }
> }
Allow me to simplify this quite a bit:
#include <stdio.h>
int main(void) {
double arr[10][10];
printf("&arr[5][9]=%p, &arr[5][10] = %p, &arr[6][0]=%p\n",
(void *)&arr[5][9], (void *)&arr[5][10], (void *)&arr[6][0]);
return 0;
}
On GCC:
@ C:\cprog>gcc -ansi -pedantic -Wall -W -O2 svante.c
@ C:\cprog>a
@ &arr[5][9]=0x22fd5c, &arr[5][10] = 0x22fd64, &arr[6][0]=0x22fd64
These results are as expected, [5][10] is equal to [6][0].
But compare to the behaviour of Ch, which is an interpreter which checks
array bounds at run time:
@ C:/cprog> svante.c
@ WARNING: subscript value 10 greater than upper limit 9
@ at line 6 in file 'svante.c'
@ &arr[5][9]=00565CA8, &arr[5][10] = 00565CA8, &arr[6][0]=00565CB0
Note (1) the run-time warning generated, and (2) that [5][9] == [5][10],
neither of which equal [6][0]!
Is Ch's behaviour non-conforming?
--
Simon.
I'm not aware of a relevant wording change. 6.5.6p8 requires the two
pointers to point to elements of the same array object. If they point to
elements of that array, then the array has to be an array of the type of
elements that the pointers point at. There's at most one such array for
any given pointer.
> something like:
> double array[10][10];
> double *pd = &array[6][12]; // not s.c.
> But
> double *pd = &array[6][6];
> pd += 6;
> is allowed.
In this case, pd points at an element of array[6], but not at an element
of 'array' itself, since it points at the wrong type to point at an
element of 'array'.
Personally, I hope I'm wrong. I'm not in favor of the rule I've been
describing. However, it seems as clear from the current wording as it
ever was (which wasn't very clear).
Yes. This is important. The mere fact that two pointers compare equal
doesn't guarantee that they can be used to access the value of the
same object.
...
>
> >
> > To illustrate this, consider an example.
> >
> > int a[3][4];
> > int *pi = &a[1][3];
> >
> > In this case the type of pi is "pointer to int" and its value
> > points at a[1][3]. Even if a[2][0] follows a[1][3] in the address space
> > there are four int objects (a[1][0], a[1][1], a[1][2] and a[1][3]) over
> > which pi (pointer to int) can stride and inspect values, not the
> > entire array object (a).
> >
> > As long as this is the official interpretation and there is no
> > practical reason to allow that pi can stride the entire object, a
> > conforming implementation can do anything when, say, pi is used as
> > pi[1] in the above example, so it's allowed to check the boundary
> > within which a pointer can move and be dereferenced by any means,
> > e.g., some information hidden in pointer values or run-time boundary
> > checking. [Note that, as an exception, unsigned char * is allowed to
> > inspect the values of the entire object.]
> >
> > Of course, I've never seen a practical implementation which does it and
> > I'm pretty sure that I'll never see one in the future.
>
> I fully agree with your interpretation of the standard.
>
> I believe the committee's decision is wrong, and would like to gather
> some evidence before possibly submitting a proposal for change in a
> future revised edition.
>
But the Committee's interpretation does not disagree with the current
wording of the Standard, as James Kuyper said, and I also think that
the members of the Committee interpret it correctly. But
I also recognize that there is no practical and efficient
implementation that uses the overly restrictive interpretation. I can
say that code which, say, treats a two-dim array as a one-dim array
via pointer arithmetic would work well on most implementations, but
I can't say that it is therefore strictly conforming.
> Due to the contiguousness requirement for arrays, and the pointer
> comparison identity requirement for pointers advancing beyond their
> own array into another one, there is currently really no reason not
> to change 'undefined behavior' into required behavior
From observation of the Committee's decision on possible defect
reports, I learned that the Standard is not changed with only one
reason that there is no good reason to not change the Standard.
At present there seems to be no unavoidable reason to regard it as
a defect.
I think it *can* be conforming if this implementation conforms
to the Standard in all other places; for example, &arr[5][10]
must compare greater than &arr[5][9]. The Standard doesn't
require &arr[5][10] to compare equal to, or have the same address as,
&arr[6][0] when "printf"ed. The description
] ... one is a pointer to one past the end of one array object
] and the other is a pointer to the start of a different array
] object that happens to immediately follow the first array
] object in the address space.
is there because the rule containing it is an "if and only if" rule.
Hmm. Just what are you saying, it's not fully clear to me.. ;-(. Do
you agree with Douglas that pd += 6 is conforming, or not? It sounds
as if not, but...
Obviously. That's part of the problem.
Could you clarify the last statement please? Too many negations there
for me ;-)
Are you saying that as it can't be regarded as a definitive defect,
i.e. non-implementable or just plain wrong, the Committee will not
consider it? I.e. the Committee only considers true errors, not
any kind of improvement for any reason?
This result, even if indicative, is irrelevant. The visual representation
of the pointer is implementation-defined. But I'm sure that if you
do a comparison with ==, you will observe the same identity.
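A minimal sketch of such a comparison, using == and > on the pointers
themselves instead of looking at %p output. The function names and the
static array are made up for illustration. Only the "greater than"
result is something I would rely on; whether the equality result must
be 1 is exactly what this thread is arguing about, so it is only
reported, not assumed.

```c
#include <assert.h>
#include <stdio.h>

enum { ROWS = 10, COLS = 10 };

static double arr[ROWS][COLS];

/* One past the end of arr[5] compared against arr[5]'s last element. */
int past_end_is_greater(void)
{
    double *last = &arr[5][COLS - 1];
    double *past = last + 1;   /* may be formed and compared, not dereferenced */
    return past > last;
}

/* One past the end of arr[5] compared against the start of arr[6].
   Whether the standard requires 1 here is the point under dispute. */
int past_end_equals_next_row(void)
{
    double *past = &arr[5][COLS - 1] + 1;
    double *next = &arr[6][0];
    return past == next;
}

void report(void)
{
    assert(past_end_is_greater());
    printf("past > last:      %d\n", past_end_is_greater());
    printf("past == next row: %d\n", past_end_equals_next_row());
    /* %p expects a void *, hence the cast */
    printf("&arr[6][0] = %p\n", (void *)&arr[6][0]);
}
```

Calling report() prints both results; neither pointer is ever
dereferenced.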
>
>
> But compare to the behaviour of Ch, which is an interpreter which checks
> array bounds at run time:
> @ C:/cprog> svante.c
> @ WARNING: subscript value 10 greater than upper limit 9
> @ at line 6 in file 'svante.c'
> @ &arr[5][9]=00565CA8, &arr[5][10] = 00565CA8, &arr[6][0]=00565CB0
>
> Note (1) the run-time warning generated, and (2) that [5][9] == [5][10],
> neither of which equal [6][0]!
>
> Is Ch's behaviour non-conforming?
Yes, it would seem so. (With the above caveat about %p really being
irrelevant). &arr[5][10] is required to compare greater, and it is
fully defined to refer to that address, as long as you don't apply
unary '*' to it. Also &arr[5][10] is required to compare equal to
&arr[6][0].
What I'm wondering is whether an implementation can add some extra
padding at the end of an array type object.
For example, may it be allowed to allocate 16 bytes for an array
object "char A[10]"?
May it be allowed to allocate 16*16 bytes for the use of "char B[15][10]"?
In this case, the array respects the "contiguous" requirement BUT
we can not be sure &B[5][10] points to the same place of &B[6][0].
sizeof() applying to such type doesn't give the "real" space which the
array occupies.
paiyi
Could you show us the relevant revision of the wording in C99?
If there is no actual revision of the wording in C99, or no official
interpretation from the Committee, then this still results in undefined
behavior.
No. The Standard doesn't require that &arr[5][10] must compare equal
with &arr[6][0]. The Standard just says that if the result of the
equality operation whose operands are valid pointers is true, at
least one of enumerated cases in that subclause must hold. Even if
&arr[5][10] must compare greater than &arr[5][9], it can't be a
criterion to decide whether an implementation is conforming that
&arr[5][10] and &arr[6][0] compare equal.
I think the standard does require &arr[5][10] == &arr[6][0].
ISO/IEC 9899:1999
6.5.9 #6: "Two pointers compare equal if and only if [...] or one is
a pointer to one past the end of one array object and the other is a
pointer to the start of a different array object that happens to
immediately follow the first array object in the address space."
Together with the rules about contiguous allocation of array space,
I think it conclusively does require the above equality.
I would like to know why "one past the end of one array
object" appears so many times in the standard.
If the standard wants to restrict the usage of a pointer object in a
rigorous way, why not restrict it to "the end of an array object"?
(Here, every non-array object is regarded as an array object of length 1.)
paiyi
What I meant is, even if it's a true error and many people agree with
that position, the Committee would not decide to change the Standard
unless the current wording disagrees with the original intention of the
Committee, the change is worth considering a significant improvement, or
the wording invalidates existing significant and correct programs.
But in this case, I don't think it's a true error of the Standard,
even if it makes many programs using that trick "more" conforming.
This case (one is a pointer to ... happens to ...) is added because of
"if and only if".
> Together with the rules about contiguous allocation of array space,
> I think it conclusively does require the above equality.
If allocation of array space is contiguous and other requirements of
the Standard (e.g., &arr[5][10] > &arr[5][9]) hold, but &arr[5][10]
compares unequal to &arr[6][0], what makes this implementation
non-conforming? The Standard doesn't require that if Q is a pointer
to the last element of a sub-array of an array, then Q+1 must point
to the first element of the next sub-array.
Of course the example implementation that I described above is very
unreasonable and hard to find, but it still conforms to the Standard.
Such a restriction would make the following impossible:
int a[10], *ap;
for(ap = a; ap <= a + 9; ap++) *ap = 0;
because in the last iteration of the loop ap increments out of the array.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
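To make the point about the last iteration concrete, here is a small
sketch of that loop wrapped in a function (the function name and N are
my own). It returns how far ap ended up from the start of the array,
which is the one-past-the-end position that the current rule permits.

```c
#include <assert.h>
#include <stddef.h>

enum { N = 10 };

/* Zero-fill a[0..N-1] with the loop shown above and report where ap ends. */
ptrdiff_t fill_and_report_offset(void)
{
    int a[N], *ap;

    for (ap = a; ap <= a + (N - 1); ap++)
        *ap = 0;

    /* The final ap++ leaves ap at a + N: one past the last element.
       Forming and comparing that pointer is allowed; dereferencing is not. */
    assert(ap == a + N);
    return ap - a;
}
```

If pointers were restricted to "the end of an array object", the final
increment in this loop would already be out of bounds.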
[please forget my previous posting]
Hmm... This problem is connected with a question: should we treat
two contiguous subobjects of an object as related? If the answer to
this question is yes, &arr[5][10] and &arr[6][0] must compare equal.
But if they are unrelated objects, a conforming implementation can do
this:
Assume Q is a pointer to the last element of an array.
Q + 1 does not give Q the next address; instead the implementation
records somewhere some information indicating that the pointer points
past the last element.
That information makes Q+1 compare greater than Q.
If R is a pointer to the first element of the next sub-array, Q+1 and
R will compare unequal.
I think two sub-objects are treated as unrelated because the
Standard currently defines, say, the result of &arr[5][9] <
&arr[6][0] as undefined AFAICK; I've never thought about comparison
for order between two elements of different sub-arrays of the same
array, but I think I'm right considering what we are discussing.
If the result of &arr[5][9] < &arr[6][0] is well-defined, my
position is incorrect, and as you said, &arr[5][10] and &arr[6][0]
should compare equal.
Is &arr[5][9] < &arr[6][1] (or the same with any other relational
operator) defined?
If that's really the reason why the standard extends the usage of a pointer
to "one past the last element of an array object", why not also include
the usage of "one before the first element of an array object" at the same
time?
In that situation, the symmetric loop:
for(ap = &a[9]; ap >= a; ap--) *ap = 0;
would be defined.
paiyi
One reason might be to allow these idioms on strings
while(*q++) do_something_with(*q);
or
while(*p++ = *q++);
where the pointers end up one element beyond the null terminator.
--
Simon.
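As a sketch of the second idiom (the function names and the 4-byte
buffer are assumptions of mine): after the copy loop finishes, p sits
one element beyond the terminator it just wrote, which for an exactly
sized buffer is one past the end of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Copy src (including its terminator) to dst; return how far p advanced. */
size_t copy_string(char *dst, const char *src)
{
    char *p = dst;
    const char *q = src;

    assert(dst != NULL && src != NULL);
    while ((*p++ = *q++) != '\0')
        ;                       /* all the work happens in the condition */

    /* p now points one element beyond the copied terminator. */
    return (size_t)(p - dst);
}

size_t demo_copy_string(void)
{
    char buf[4];                /* exactly strlen("abc") + 1 */
    return copy_string(buf, "abc");
}
```

Here p ends at buf + 4, which is legal to form only because of the
one-past-the-end rule.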
Improvements can be made in the next revision of the standard. In between
revisions, they're just supposed to be fixing bugs.
--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
The purpose of that clause was to avoid breaking thousands of existing
programs. Counting up through an array is common, counting down isn't.
That might make memory allocation expensive in a number of cases.
Consider a processor where each array gets its own segment with
an access violation if pointer arithmetic takes a pointer out of
a segment. To allow a pointer to be one past the last element,
the system needs to allocate only one byte more to get valid
pointer arithmetic. To allow a pointer to be one before the first
element the system must allocate an additional complete element,
which can be huge.
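For what it's worth, counting down can be written so that no pointer
before the first element is ever formed. This is only a sketch, with
names of my own choosing: start at one past the end and decrement
before each store.

```c
#include <assert.h>
#include <stddef.h>

/* Zero a[0..n-1] from the top down without ever computing a - 1. */
size_t zero_backwards(int *a, size_t n)
{
    int *ap = a + n;            /* one past the end: allowed */
    size_t count = 0;

    assert(a != NULL);
    while (ap != a) {
        *--ap = 0;              /* decrement first, so ap never drops below a */
        count++;
    }
    return count;
}

size_t demo_zero_backwards(void)
{
    int a[10] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
    size_t n = zero_backwards(a, 10);
    size_t i;

    for (i = 0; i < 10; i++)
        if (a[i] != 0)
            return 0;           /* something was not cleared */
    return n;
}
```

The loop test uses != against the start of the array, so it needs no
"one before the first element" pointer and no extra storage in front of
the array.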
"Jun Woong" <myco...@hanmail.net> wrote in message news:94f0654c.02032...@posting.google.com...
> "Svante" <spl...@nospam.com> wrote in message news:<nUem8.33624$l93.6...@newsb.telia.net>...
> > "Jun Woong" <myco...@hanmail.net> wrote in message news:94f0654c.02032...@posting.google.com...
> [...]
> > >
> > > From observation of the Committee's decision on possible defect
> > > reports, I learned that the Standard is not changed with only one
> > > reason that there is no good reason to not change the Standard.
> > > At present there seems to be no unavoidable reason to regard it as
> > > a defect.
> >
> > Could you clarify the last statement please? Too many negations there
> > for me ;-)
> >
> > Are you saying that as it can't be regarded as a definitive defect,
> > i.e. non-implementable or just plain wrong, the Committee will not
> > consider it? I.e. the Committee only considers true errors, not
> > any kind of improvement for any reason?
>
> What I meant is, even if it's a true error and many people agree with
> that position, the Committee would not decide to change the Standard
> unless the current wording disagrees with the original intention of the
> Committee, the change is worth considering a significant improvement, or
> the wording invalidates existing significant and correct programs.
>
> But in this case, I don't think it's a true error of the Standard,
> even if it makes many programs using that trick "more" conforming.
>
Ok. My point is that I think the Standard's intention is to allow
for a pointer to roam freely in the largest containing array. But
that's not really what it says. So presumably there would be an
opening then.
Due to the contiguousness requirement I do think that Q+1 must point
to the beginning element of the next sub-array.
>
> Of course the example implementation that I described above is very
> unreasonable and hard to find, but it still conforms to the Standard.
Don't think so as I read it.
Because "one past the end" requires only a single addressable byte past
the end of the array. "One before the beginning" would require as many
addressable bytes before the beginning of the array as the size of an
element, which could be quite a lot.
-Larry Jones
The living dead don't NEED to solve word problems. -- Calvin
Does it violate the standard if an implementation puts some extra bytes
at the end of an array object?
In this case, array of array object has also contiguousness.
(Just think it as an array of a structure wrapping an array.)
Hasn't it?
paiyi
This is perfectly possible. But there is still a requirement related
to the result of sizeof; see below.
> May it be allowed to allocate 16*16 bytes for the use of "char B[15][10]"?
>
> In this case, the array respects the "contiguous" requirement BUT
> we can not be sure &B[5][10] points to the same place of &B[6][0].
It depends on where the padding bytes reside. Note that the
Standard requires that
sizeof(array) == number of elements * sizeof(element type).
If the padding bytes reside between sub-arrays and are *visible*,
the above requirement can't hold, so that's not allowed. But when
they reside at the end of the entire array object and the sizeof
operator gives the correct result, hiding the padding bytes, the
implementation can be conforming. (Note that the implementation
must do other things consistently with this policy, e.g., in dynamic
memory allocation.)
A conforming implementation can place the padding bytes anywhere
it wants provided that no strictly conforming program can find
them so they are *not visible*. Thus in this case the directly
related question is, as I wrote in other posting, whether sub-
arrays within an array are related objects, I think.
>
> sizeof() applying to such type doesn't give the "real" space which the
> array occupies.
Then, the implementation can't be conforming.
For one thing because, on existing implementations, the
consequences of actually dereferencing that pointer are much more
serious _more often_ than the consequences of dereferencing the
higher pointer. I would guess by about 2:1 in most cases. I
leave you to consider the (system dependent) why.
--
Chuck F (cbfal...@yahoo.com) (cbfal...@XXXXworldnet.att.net)
Yes it does. It breaks sizeof array / sizeof array[0]. Consider what
happens in a multidimensional array.
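A small sketch of that idiom applied at both levels of the B[15][10]
example from earlier in the thread (the function names are made up).
If a row carried visible trailing padding, the inner quotient would no
longer come out as 10.

```c
#include <assert.h>
#include <stddef.h>

static char B[15][10];

/* Number of rows: size of the whole array over the size of one row. */
size_t row_count(void) { return sizeof B / sizeof B[0]; }

/* Number of columns: size of one row over the size of one element. */
size_t col_count(void) { return sizeof B[0] / sizeof B[0][0]; }

/* The whole array is exactly rows * cols elements, with nothing extra. */
int sizes_are_consistent(void)
{
    size_t rows = row_count();
    size_t cols = col_count();

    assert(rows == 15 && cols == 10);
    return sizeof B == rows * cols * sizeof B[0][0];
}
```

This only shows what sizeof reports; it says nothing about bytes that
no conforming program could observe.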
> In this case, array of array object has also contiguousness.
> (Just think it as an array of a structure wrapping an array.)
> Hasn't it?
The standard differentiates between contiguous storage, and
sequential storage. Sequential is in increasing memory but with
possible padding. Contiguous is without padding.
>No. The Standard doesn't require that &arr[5][10] must compare equal
>with &arr[6][0]. The Standard just says that if the result of the
>equality operation whose operands are valid pointers is true, at
>least one of enumerated cases in that subclause must hold. Even if
>&arr[5][10] must compare greater than &arr[5][9], it can't be a
>criterion to decide whether an implementation is conforming that
>&arr[5][10] and &arr[6][0] compare equal.
The original question was not whether &arr[5][10] and &arr[6][0] are
equal, but whether he could use functions like memcpy() or fwrite()
on such an array. Independent of the question about bounds checking,
I believe that this is true. I also believe, though I am willing
to hear other views, that once cast to (void*) any bounds checking
must apply to the whole array and not a subarray.
So, if I have a
double arr[10][10];
can I fwrite((void*)arr,sizeof(double),10*10,out);
even on an implementation that would, through bounds checking,
disallow a reference to arr[5][10]?
-- glen
There are other consequences.
Currently, a compiler can assume that a[i][j] and a[i+1][k] are not the
same object. (To be more precise, it is possible that the pointers have
the same value if j has the maximum legal value and k is 0, but if you
access a[i][j] and a[i+1][k] then they are different objects or you have
invoked undefined behavior). The fact that they are different objects
allows optimisations that wouldn't be legal otherwise.
Another situation:
int f (int n)
{
char a[2][4];
int i;
a [1][0] = 2;
for (i = 0; i < n; ++i) a [0][i] = 0;
/* More code ... */
}
Many processors (for example PowerPC and Pentium) can execute the for
loop with one or two instructions. (n must be <= 4, otherwise undefined
behavior. Therefore storing four bytes of zeroes with a single
instruction is correct if the processor allows unaligned stores. )
I don't really have a problem with limited bounds assumptions/checking
for accesses to array objects made through the array object itself. But
everything in the standard seems to make it very hard to interpret the
following in any other way than the intended and expected one:
void f(int n) {
char a[2][4], *p, (*ap)[2][4];
int i; /* separate counter, so that the second loop still sees n */
for (p = &a[0][0], i = n; i; i--) {
*p++ = 0; /* should be defined for 0 <= n <= 8 */
}
for (ap = &a, i = n; i; i--) {
(*ap)[0][i-1] = 0; /* should be defined for 0 <= n <= 4 */
}
}
Not very useful, admittedly, but I wanted to mirror your example
below. The point is I would like it clearly stated that a plain
_pointer_ into an array object can traverse the area defined by
the largest enclosing array object. The standard seems to state
that 'p' points into the array object a[0], and is bounded by its
limits.
Other parts of the standard seem to state that all of the
subarrays, and their elements, must reside in contiguous storage
without padding, and there is thus no reason to limit defined
behavior of 'p' to the subarray it happens to originate from.
Maybe I've missed something in the standard?
>
> Another situation:
>
> int f (int n)
> {
> char a[2][4];
> int i;
>
> a [1][0] = 2;
> for (i = 0; i < n; ++i) a [0][i] = 0;
> /* More code ... */
> }
>
> Many processors (for example PowerPC and Pentium) can execute the for
> loop with one or two instructions. (n must be <= 4, otherwise undefined
> behavior. Therefore storing four bytes of zeroes with a single
> instruction is correct if the processor allows unaligned stores. )
Right - see above.
For what it's worth, I share that opinion. That is, if you
visualize multi-dimensional arrays being mapped row-major order
onto a linear range of memory space, which is normally indeed
the case.
Others paint a different picture, where each row is in its
own segment (Dik) or cells are otherwise row-wise constrained.
In that case you would have to normalize indices to wrap on
subarray's boundaries in such a way that in double p[10][10];
p[x][y] == p[x + y/10] [y%10]
The standard however does not require to keep array indices
normalized that way.
willem
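The normalization described above can be sketched as a helper that
folds an over-large column index into the row index before
subscripting, so that each subscript stays inside its own sub-array.
The helper name and the ROWS/COLS constants are assumptions made for
the example.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 10, COLS = 10 };

/* Address of the element that p[x][y] "means" once y is wrapped at COLS. */
double *normalized_element(double (*p)[COLS], size_t x, size_t y)
{
    size_t row = x + y / COLS;  /* carry whole rows out of the column index */
    size_t col = y % COLS;      /* remaining offset inside that row */

    assert(row < ROWS);         /* still inside the outer array */
    return &p[row][col];
}

int demo_normalized(void)
{
    static double p[ROWS][COLS];

    return normalized_element(p, 5, 10) == &p[6][0]
        && normalized_element(p, 6, 12) == &p[7][2];
}
```

With this, the "not s.c." expression array[6][12] from earlier in the
thread becomes an ordinary in-bounds access to row 7, column 2.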
> "Jun Woong" <myco...@hanmail.net> wrote in message news:94f0654c.02032...@posting.google.com...
> > "Svante" <spl...@nospam.com> wrote in message news:<5Zem8.33625$l93.6...@newsb.telia.net>...
> > [...]
> > > >
> > > > Is Ch's behaviour non-conforming?
> > >
> > > Yes, it would seem so. (With the above caveat about %p really
> > > being irrelevant). &arr[5][10] is required to compare greater, and
> > > it is fully defined to refer to that address, as long as you don't
> > > apply unary '*' to it. Also &arr[5][10] is required to compare
> > > > equal to &arr[6][0].
> >
> > > No. The Standard doesn't require that &arr[5][10] must compare equal
> > > with &arr[6][0]. The Standard just says that if the result of the
> > > equality operation whose operands are valid pointers is true, at
> > > least one of enumerated cases in that subclause must hold. Even if
> > > &arr[5][10] must compare greater than &arr[5][9], it can't be a
> > > criterion to decide whether an implementation is conforming that
> > > &arr[5][10] and &arr[6][0] compare equal.
>
> I think the standard does require &arr[5][10] == &arr[6][0].
>
> ISO/IEC 9899:1999
> 6.5.9 #6: "Two pointers compare equal if and only if [...] or one is
> a pointer to one past the end of one array object and the other is a
> pointer to the start of a different array object that happens to
> immediately follow the first array object in the address space."
>
> Together with the rules about contiguous allocation of array space,
> I think it conclusively does require the above equality.
What if arrays with large elements go downwards in memory, and
arrays with small elements go upwards? (I.e., arr[1] is at a
lower machine address than arr[0], but arr[0][1] is at a higher machine
address than arr[0][0].) This way, when using pointers to the elements of
arr[0], you increment to one past the end of the array arr[0] to get a
pointer that is beyond the memory used by the array. The array is still
in a contiguous piece of memory, isn't it? Is a conforming implementation
allowed to do that?
--
Niall: A woman who is yellow bounces brilliantly then bursts with henry
the fowler.
> The standard differentiates between contiguous storage, and
> sequential storage. Sequential is in increasing memory but with
> possible padding. Contiguous is without padding.
Does that mean that machine addresses must increase, or that pointers
within an object obtained via that object (rather than coincidentally)
must compare in that manner?
As a consequence of this, in the function
void f (int* p)
{
p [1] = 0;
/* More code... */
}
an optimising compiler can assume that the assignment to p [1] cannot
modify any simple variable of type int, for example "static int i; ".
There is good reason for undefined behaviour because it makes range
checking possible, it makes a huge range of optimisations possible, and
the only code that would change from undefined behaviour to defined
behaviour is obfuscated in the first place and should be changed anyway.
I would find a change in the C Standard that only supports dubious
programming practices unacceptable.
As an example where this obfuscation will hurt you: Try to port code
from C to a different language (for example C++ with overloaded
operator[], or Java, or Fortran), and it breaks. In C++, anything could
happen. In Java, you will get a range check exception. In Fortran,
behaviour is just as undefined as in C, but because the order of array
elements is different, behavior will be different from C.
All in all I believe that the change in the C Standard that you suggest
would not benefit anyone.
If the Standard required the suggested shenanigans to work, it
would outlaw some forms of bounds checking. Considering the plague
of buffer overrun "exploits," it seems to me enhanced bounds checking
ought to be encouraged rather than banned. The state of the art
(what I've seen of it, anyhow) makes effective bounds checking too
expensive for widespread use, so the price of outlawing it is indeed
"slight:" it's like forbidding the purchase of eight-kilogram diamonds.
But I live in hope that the state of the art will advance, that
someday we will all live in El Dorado where the streets are paved
with gold and diamonds are as dust -- and if that glorious day ever
arrives, it will be a shame if our bygone penury prevents us from
enjoying it.
> To reiterate: There is no way an implementation today can behave otherwise
> than proposed, except in a few cases where the out-of-bounds condition
> is detected at compile time. Unless of course, said implementation carries
> extended type info at run-time.
Doesn't this paragraph merely say "X || !X"? If so, I'm
forced to agree ...
It is much more dangerous. Take this code fragment as an example:
#include <stdio.h>
int main (void)
{
#define ARRAY_SIZE 100000000
typedef struct { int data [ARRAY_SIZE]; } huge_struct;
huge_struct my_struct;
huge_struct* p = &my_struct - 1;
if (p == NULL) printf ("p is NULL; this is unexpected.\n");
if (p > &my_struct) printf ("p is greater than &my_struct; this is unexpected\n");
return 0;
}
On many implementations one of the printf statements will be executed if
you replace ARRAY_SIZE with the right constant. There is much less
danger with &my_struct + 1, because that will point to just the next
byte after the last byte that belongs to my_struct.
I can't say for sure if it violates the standard, although I think it
does. But it will break lots of existing code that assumes that the size
of an array with n elements is n times the size of an element.
The standard doesn't care about machine addresses. Contiguity is
determined by pointer arithmetic, not addresses.
pd points at a double. pd+6 would also point at a double, if it were
legal. There is only one array of doubles that contains *pd; that is
array[6], and it only has 10 elements. 'array' itself is NOT an array of
doubles; it's an array of arrays. *pd is not an element of 'array', it's
only an element of array[6]. Since there is no double that pd+6 could
point at that would be an element of the same array as *pd, 6.5.6p8
leaves the behavior of such code undefined.
> ... Do
> you agree with Douglas that pd += 6 is conforming, or not? It sounds
> as if not, but...
I am disagreeing with him. He said that there's been a change in the
text that makes this legal. I'm unaware of any such change. That doesn't
mean that I'm saying he's wrong; he's got better sources of information
than I do, being on the committee. For instance, I've never had the time
to give the Technical Corrigendum as close a review as I'd like, while
he actually had to vote on the thing. However, being on the committee
doesn't mean that he's necessarily right, either.
Incidentally, what Doug said was "is allowed", not "is conforming". The
adjective "conforming", when applied to an implementation, says
something very useful about that implementation. However, when
applied to programs, the term "conforming" describes a uselessly broad
category, while the term "strictly conforming" describes an (almost)
uselessly narrow category.
I believe that "pd += 6" has undefined behavior. That would mean that
it's not strictly conforming. However, it is conforming code. Conforming
code is any code that can be accepted by a conforming compiler, and
there are very few conforming compilers out there that won't accept that
code. Keep in mind that the Ten Commandments are also conforming C code,
though the number of conforming compilers that would accept them is much
smaller.
Do you have a basis for that opinion? The relevant evidence would be
testimony from the committee members about what they intended when they
wrote it (note: the standard itself has no intentions).
It's plausible to me that this might have been deliberate, to support
runtime bounds checking of arrays. Such a feature would certainly
improve the reliability of code that had been debugged using it.
However, it's also plausible to me that this was an unintended feature
of the standard.
If no strictly conforming program can find the padding bytes, what
makes the implementation non-conforming? You are mapping the abstract
machine described by the Standard to the real one, which is a mistake.
>
> > In this case, array of array object has also contiguousness.
> > (Just think it as an array of a structure wrapping an array.)
> > Hasn't it?
>
> The standard differentiates between contiguous storage, and
> sequential storage. Sequential is in increasing memory but with
> possible padding. Contiguous is without padding.
Yes, of course. But if the padding bytes are not visible to any strictly
conforming program and the implementation follows the requirements
of the Standard for pointer arithmetic, sizeof, etc., a conforming
implementation can put the padding bytes wherever it wants.
Note that the contiguity is defined with pointer arithmetic defined by
the Standard as James Kuyper said.
The essential question, I think, is whether the result of a comparison
for order between two elements of different sub-arrays of the same
array is defined. In our original example, is &arr[5][9] < &arr[6][0]
defined?
If it's defined, it means that the two elements are related objects,
which requires &arr[5][10] == &arr[6][0]. If it's undefined, they
are unrelated, which means that there is no need to guarantee that
&arr[5][10] == &arr[6][0]. And my position is the latter. But I've
never thought about that comparison, so I want to know whether the
expression is defined or not.
Thanks.
As I said in other posting, I think it depends on whether they are
related objects or unrelated.
> I also believe, though I am willing
> to hear other views, that once cast to (void*) any bounds checking
> must apply to the whole array and not a subarray.
This view is officially rejected by the Committee. The official
interpretation of the Standard doesn't guarantee that behavior,
at least in C90; I'm waiting for Douglas's answer to my question
about his posting that says that behavior seems to be allowed
now in C99.
>
> So, if I have a
>
> double arr[10][10];
>
> can I fwrite((void*)arr,sizeof(double),10*10,out);
>
> even on an implementation that would, through bounds checking,
> disallow a reference to arr[5][10]?
The standard library functions need not be written in
strictly conforming C. And by definition, an unsigned char
pointer isn't affected by that bounds checking,
which makes many standard library functions easy to
implement in strictly conforming C.
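As an illustration of that last remark, here is a sketch of a
memcpy-like routine that touches the object only through unsigned char
pointers. byte_copy and the demo are hypothetical names of mine, not
how any particular library does it, and the sketch assumes (as stated
above) that byte-wise access to the whole object is permitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes from src to dst, one unsigned char at a time. */
void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);
    while (n-- > 0)
        *d++ = *s++;
    return dst;
}

int demo_byte_copy(void)
{
    double src[10][10], dst[10][10];
    size_t i, j;

    for (i = 0; i < 10; i++)
        for (j = 0; j < 10; j++)
            src[i][j] = (double)(i * 10 + j);

    /* The whole 10x10 object is copied as sizeof src bytes. */
    byte_copy(dst, src, sizeof src);
    return memcmp(dst, src, sizeof src) == 0;
}
```

The copy never indexes a double past the end of a row; it walks the
bytes of the complete array object.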
6.2.5
"- An array type describes a contiguously allocated nonempty set of objects
with a particular member object type, called the element type(36). Array
types are characterized by their element type and by the number of elements
in the array. ..."
Seems that if you define an array as "a contiguously allocated nonempty set
of objects", then having padding on the end of it would not match this
definition, as the padding would not be included in the set.
A non-normative example in 6.5.3.4 is 6., Example 1:
"Another use of the sizeof operator is to compute the number of elements in
an array:
sizeof array / sizeof array[0]"
Seems that this would support that the intent of the standard is that an
array be composed only of a contiguously allocated set of objects, without
allowance for padding on the end.
-Daniel
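The sizeof idiom quoted above can be applied at each level of a two-dimensional array. This is a small sketch; grid, ROWS, COLS and the function names are assumptions made for the example.

```c
#include <stddef.h>

#define ROWS 10
#define COLS 10

double grid[ROWS][COLS];

/* Number of rows: whole array divided by one row. */
size_t row_count(void)
{
    return sizeof grid / sizeof grid[0];
}

/* Elements per row: one row divided by one element. */
size_t col_count(void)
{
    return sizeof grid[0] / sizeof grid[0][0];
}

/* All elements: whole array divided by one element. */
size_t total_count(void)
{
    return sizeof grid / sizeof grid[0][0];
}
```

If total_count() equals ROWS * COLS, then sizeof grid leaves no room for padding that sizeof could see, which is the point being made about the intent of the standard.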
True. The word "increasing" above is only in the sense that the addresses
are exposed through pointer arithmetic and comparison.
"Tristan Wibberley" <mcai...@stud.umist.ac.uk> wrote in message
news:20020321221728.6...@stud.umist.ac.uk...
I'm not sure if it's possible to implement malloc() in this kind of
implementation. Interesting - is this pure academia, or is there
some architecture out there where this is relevant? I'm not trying
to reduce the weight of your argument, just curious.
Of course. I think I have not made myself clear enough.
>
> There is good reason for undefined behaviour because it makes range
> checking possible, it makes a huge range of optimisations possible, and
> the only code that would change from undefined behaviour to defined
> behaviour is obfuscated in the first place and should be changed anyway.
> I would find a change in the C Standard that only supports dubious
> programming practices unacceptable.
The above is not a dubious practice, it's common and portable
practice. What I want to be defined, is:
int ia[10][10];
f(&ia[0][0]);
Then, in f, I want all accesses to array elements of the entire array
ia[10][10] to be well defined. I can't see that it is today, only
accesses to ia[0][0] .. ia[0][9] (in f).
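One way to sidestep the question, sketched here under the assumption that the callee's signature can be changed: pass a pointer to the first row instead of a pointer to the first int. sum_all and COLS are hypothetical names, and this is not the interface asked for above.

```c
#include <stddef.h>

#define COLS 10

/* Takes a pointer to the first row plus the row count, so every access
   has the form rows[i][j] with i < nrows and j < COLS. */
long sum_all(int (*rows)[COLS], size_t nrows)
{
    long total = 0;
    size_t i, j;

    for (i = 0; i < nrows; i++)
        for (j = 0; j < COLS; j++)
            total += rows[i][j];
    return total;
}
```

The call becomes sum_all(ia, 10). The cost is that the column count is fixed in the parameter type, which is the flexibility that f(&ia[0][0]) was meant to provide.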
>
> As an example where this obfuscation will hurt you: Try to port code
> from C to a different language (for example C++ with overloaded
> operator[], or Java, or Fortran), and it breaks. In C++, anything could
> happen. In Java, you will get a range check exception. In Fortran,
> behaviour is just as undefined as in C, but because the order of array
> elements is different, behavior will be different from C.
This I do not understand.
>
> All in all I believe that the change in the C Standard that you suggest
> would not benefit anyone.
I'm not sure a change is required - it might just be a clarification
issue. It depends on what it actually says today, which it has been
shown is hard to determine.
"Eric Sosman" <Eric....@sun.com> wrote in message news:3C9A6356...@sun.com...
This is a difficult point to communicate... I don't want to outlaw all
bounds checking. I do want to be able to access an entire array of
arrays as a flat array. The typical buffer overrun is in a one-dimensional
array. No change there.
>
> > To reiterate: There is no way an implementation today can behave otherwise
> > than proposed, except in a few cases where the out-of-bounds condition
> > is detected at compile time. Unless of course, said implementation carries
> > extended type info at run-time.
>
> Doesn't this paragraph merely say "X || !X"? If so, I'm
> forced to agree ...
Probably, and that's the point. It seems that the standard makes
behavior undefined that, except for the case above, cannot be
anything but defined...
No testimony. Only circumstantial evidence from malloc(), array
contiguousness and some vagueness in the pointer arithmetic section.
>
> It's plausible to me that this might have been deliberate, to support
> runtime bounds checking of arrays. Such a feature would certainly
> improve the reliability of code that had been debugged using it.
> However, it's also plausible to me that this was an unintended feature
> of the standard.
Certainly both cases are plausible. I was just speculating (hoping?) that
the Committee had an intention, communicated through the standard. Obviously
the standard itself is not of many opinions... ;-)
It was not my intention to make that mistake, although I might have
been guilty in the sense that I tried to envision a real machine
with padding that would map onto the abstract machine and work
properly and failed. Maybe it's possible, but I just can't see how.
>
> >
> > > In this case, array of array object has also contiguousness.
> > > (Just think it as an array of a structure wrapping an array.)
> > > Hasn't it?
> >
> > The standard differentiates between contiguous storage, and
> > sequential storage. Sequential is in increasing memory but with
> > possible padding. Contiguous is without padding.
>
> Of course yes. But if the padding bytes are not visible to any strictly
> conforming program and the implementation follows the requirements
> of the Standard for pointer arithmetic, sizeof, etc., a conforming
> implementation may put the padding bytes where it wants.
>
> Note that the contiguity is defined with pointer arithmetic defined by
> the Standard as James Kuyper said.
Absolutely - I just can't see how an implementation can have padding
in arrays and still conform.
>
> The essential problem that I think is, if the result of comparison
> for order between two elements of different sub-arrays of the same
> array is defined. In our original example, &arr[5][9] < &arr[6][0]
> defined?
Now we're going in a circle I think... I believe it is, since
&arr[5] < &arr[6] by definition. It would seem to follow that
&arr[5][9] < &arr[6][0] then, and from that (assuming no extra
padding between subarrays ;-) &array[5][10] == &array[6][0], but
creative arguments have been raised elsewhere...
>
> If it's defined, it means that the two elements are related objects,
> which requires &arr[5][10] == &arr[6][0]. If it's undefined, they
> are unrelated, which means that there is no need to guarantee that
> &arr[5][10] == &arr[6][0]. And my position is the latter. But I've
> never thought about that comparison, so I want to know whether the
> expression is defined or not.
I think it is. See above.
I agree with &arr[5] < &arr[6], but ...
> It would seem to follow that
> &arr[5][9] < &arr[6][0] then, and from that (assuming no extra
> padding between subarrays ;-)
I don't think so. The Standard doesn't give a useful meaning to
behavior that treats two-dim arrays as one-dim arrays, so if the
Standard gave a meaning to comparing two elements of different
sub-arrays, it would be inconsistent.
> &array[5][10] == &array[6][0], but
> creative arguments have been raised elsewhere...
I'll deal with this problem as a separate thread.
The circumstantial evidence agrees with the position of the Committee,
not your opinion, I think. The interpretation of the Standard by the
Committee is reasonable enough.
If the Standard endorses this behavior, bounds checking can't
stop compilation of a program that does it for that reason alone. (Clause
4) Of course diagnostic messages are free to be issued.
> The typical buffer overrun is in a one-dimensional
> array.
The buffer overrun case is treated as undefined in the Standard. It's
not directly connected with this multi-dim array problem.
Don't think it need be, but it's certainly not worded for easy
comprehension...
>
> > &array[5][10] == &array[6][0], but
> > creative arguments have been raised elsewhere...
>
> I'll deal with this problem as a separate thread.
Good idea - it doesn't have very much to do with "Re: alignment of memory
in static arrays" does it? <Grin>
Where can I find these notes on the interpretation of the Standard by the
Committee for this case? It would be helpful... If that is clearly expressed,
my opinion certainly carries less weight :-(.
You can find many similar questions in WG14 official website:
http://anubis.dkuug.dk/JTC1/SC22/WG14
You need to find relevant items of C90 DRs; C90 DRs have numbers
smaller than 200, i.e., DR001-DR1xx.
The thing is, that if there is no way for a strictly conforming program
to find the bytes, then there's no practical significance to their
presence. As far as the C standard is concerned, those bytes aren't
there, if you can't reach them by (char*)&object+offset, for any value
of offset between 0 and sizeof(object)-1. This is necessary, because it
would be illegal for them to be accessible by such an expression. Keep
in mind that this applies to the array as a whole, as well as for each
of its elements.
And you would thereby prohibit any array bounds checking which doesn't
allow that.
You are right. So the padding bytes problem is not concerned with the
real problem, whether &arr[5][10] == &arr[6][0] or not. I believe the
answer to the original problem can be given by whether they (the two
elements) are related or unrelated objects, so I posted this question
as a separate thread.
Obviously, that's what it means to allow such accesses, I guess. I can see
no conflict though, i.e. it should be doable just the way it in fact is
done in, I think, all present compliant implementations. That is, it is
my thought that to allow it would only be to define what is already, in
all practical terms, universal practice.
The one-past-the-end pointer value arises naturally in
situations like
for (p = array; p < array + ELEMENT_COUNT; ++p)
Without the Standard's guarantee that it is legal to compute
such a value (though not, of course, to dereference it), this
kind of loop would be difficult to write. (If you know an
easy way to write this loop without using a one-past-the-end
value, by all means share it.)
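A minimal, complete version of that loop, written as a function so the one-past-the-end value is explicit. The name sum_elements is made up for the example.

```c
#include <stddef.h>

/* Sums count elements using the pointer-stepping idiom. */
long sum_elements(const int *array, size_t count)
{
    /* One past the last element: may be computed and compared,
       but is never dereferenced. */
    const int *end = array + count;
    const int *p;
    long total = 0;

    for (p = array; p < end; ++p)
        total += *p;
    return total;
}
```

The loop ends when p equals end, so the last value p takes is the one-past-the-end pointer itself. That is the value the Standard has to make legal for the idiom to work.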
> If the standard wants to restrict the usage of a pointer object in a
> rigorous way, why not restrict it up to "the end of an array object"?
> (Here, every non-array object is referred to as an array object of length 1.)
Other terminology could have been chosen, but might have
been ambiguous. For example, it's not obvious to me that "up
to the end of an array object" permits the formation of a pointer
value that compares `>' to all pointers to actual array elements,
which is what's desired for "natural" looping constructs.
Here you go:
for (i = 0; i < ELEMENT_COUNT; i++)
/* use array[i] instead of p in loop body */
-- Niklas Matthies
--
If all you have is a hammer, everything looks like a nail.
Unfortunately, C programmers are not generally willing to give up the
ability to step pointers instead of using array indexing. Never mind that
Fortran implementors figured out 40 years ago how to optimize it into the
same code.
And once the idiom became entrenched, it was politically necessary for the
C standard to support it. Never mind that it's probably the single feature
that results in most of the complexity and confusion in the standard for an
otherwise simple language.
--
Barry Margolin, bar...@genuity.net
Genuity, Woburn, MA
>Pai-Yi HSIAO <hs...@ccr.jussieu.fr> wrote in message news:<Pine.A41.4.10.1020320...@moka.ccr.jussieu.fr>...
>[...]
>>
>> On 19 Mar 2002, Jun Woong wrote:
>> > The expression "*pd1 = 1.0" above results in undefined behavior.
>> > Similar problems raised many times since C90 was published.
>>
>> Why?
>> Array type object can not be a modifiable lvalue.
>> 'pd1' is *ONLY* a pointer to double object.
>> '&array[5][9]' in addition to be a pointer to double contains boundary
>> information.
>>
>> While assigning &array[5][9] to pd1, the boundary information is lost.
>> Isn't it?
>>
>No, a conforming implementation can propagate the information.
How about if they are cast to (void*) do they still keep boundary
information?
If so, for the whole array or only the subarray?
-- glen
Yes, it's possible. See below.
>
> If so, for the whole array or only the subarray?
>
The important thing here is not whether a pointer of a certain type
is able to keep some boundary information within its encoding, or
whether the Standard requires a pointer converted to a certain type
to remove the possible boundary information.
The range in which a pointer moves with defined meaning is determined
by the value and the type of the pointer. The Standard never mentions
keeping or removing boundary condition. The range can be deduced from
the interpretation of the semantic for pointer arithmetic, and it is
ascertained by the Committee that it's the intended interpretation of
the Standard through DR questions. And I agree that the Committee's
interpretation is reasonable and correct. The interpretation
is explained in detail by James Kuyper in his posting in this thread.
In fact range checking is not allowed if it prevents a s.c. program
from working.
> The range can be deduced from
> the interpretation of the semantic for pointer arithmetic,
Only under some circumstances.
> and it is ascertained by the Committee that it's the intended
> interpretation of the Standard through DR questions.
C90 DRs on this matter do not apply to C99.
In preparing C99 we debated such issues and there was consensus
for not supporting such extraneous out-of-band information through
pointer conversions, but I'd have to study the final wording again
to determine to what extent it reflects our intent.
We are talking about what an s.c. program can do, which determines
whether the range checking is allowed or not. You reversed the
order of the problems.
> > The range can be deduced from
> > the interpretation of the semantic for pointer arithmetic,
>
> Only under some circumstances.
Yes, so my very old question entitled "object questions" has
had no answers until now. But from the C90 DRs I believe that the
Committee can answer all of them, and it must.
>
> > and it is ascertained by the Committee that it's the intended
> > interpretation of the Standard through DR questions.
>
> C90 DRs on this matter do not apply to C99.
Provided that you can explicitly show how C99 is changed in
order to allow the pointer arithmetic which was not allowed
in C90.
>
> In preparing C99 we debated such issues and there was consensus
> for not supporting such extraneous out-of-band information through
> pointer conversions,
The quiet consensus of the Committee members does not affect
the fact that it's not allowed by C90 and thus by C99, unless
the intention of the consensus is visible to the public
via the text of the Standard.
> but I'd have to study the final wording again
> to determine to what extent it reflects our intent.
I hope you show us the changes to reflect the consensus.
No, we were talking about range checking. As you already noted,
that is a concept outside the scope of the standard. Thus, bringing
it into the discussion loses focus, which was my point.
In C, there is a notion of an object, and there is also a notion of
type. Some expressions designate objects, and to identify what
object is designated, the type of the expression is consulted. The
set of rules concerning pointer construction are intended to ensure
that reasonable operations on objects can be performed in a s.c.
manner, without guaranteeing properties that are not necessary
for that purpose. The layout rules ensure that in a 5x5 array the
"inner" elements are contiguous across the end of rows; all padding
is contained entirely in each individual "inner" element. The whole
declared array object may be associated with an address space
(think segment) distinct from other unrelated objects, but within a
declared object (or a malloc()ed chunk) there is a uniform address
space. Therefore, pointers to unsigned char or "inner" elements
can be used to "walk through" the entire object (or chunk) so long
as the arithmetic rules are obeyed. On the other hand, "array[2][7]"
is an error because array[2] does not have 7 elements. &array[2][5]
would be allowed (in C99, not C90) by the one-past-the-end rule.
I would argue that we even intended for (int(*)[10])array+7 to be
allowed. I know that is not how we responded to a C90 DR (as I
recall I provided the words for the response), but at that time we
felt constrained to interpret the wording of the standard strictly as
written even in cases where it clearly didn't reflect what we intended
to say, so long as we thought we could live with what it clearly said.
Several such cases were revisited for C99 and are supposed to be
reflected in the current standard, or if not then we are more inclined
now (for DR purposes) to interpret the standard according to our
agreed intent and eventually fix wording infelicities via Technical
Corrigenda. If there is a *practical* problem involving this issue,
I hope a DR will be filed so we can deal with it officially. If on the
other hand this is just an academic quibble, we don't need a DR.
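Of the two "walk through" routes described above, the unsigned char one can be sketched without touching the disputed int * arithmetic at all. The helper below is hypothetical: it reads the k-th int of an object by copying its bytes out with memcpy().

```c
#include <stddef.h>
#include <string.h>

/* Reads the k-th int (in storage order) of an object that is size bytes
   long. Returns 1 and stores the value in *out on success, 0 if k is
   out of range. Hypothetical helper for illustration only. */
int flat_get(const void *obj, size_t size, size_t k, int *out)
{
    const unsigned char *bytes = obj;

    if (k >= size / sizeof *out)
        return 0;
    memcpy(out, bytes + k * sizeof *out, sizeof *out);
    return 1;
}
```

For int a[5][5], a call such as flat_get(&a, sizeof a, 7, &v) fetches the element that a[1][2] names. The explicit size argument is a design choice for the sketch: it keeps the range check in the program rather than relying on the implementation.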
It was I, I believe, who brought up the issue. It is a practical problem
in the sense that no one seems to be able to identify, positively and
without contest, the intent or meaning of the standard, conceivably leading to
avoidance of good, solid strictly conforming coding practices for the
wrong reasons. In at least one case, one interpretative implementation
also seems to be n.c. due to "excessive" bounds checking.
There were voices raised wanting to disallow any access to the underlying
array elements without being constrained by the original array type,
with the motivation that otherwise buffer overflows would be more likely
to happen. I do not agree with this...
It was my meaning that the contiguousness requirement for arrays and
thus arrays of arrays, along with the equality comparison requirement,
and the pointer arithmetic rules together with malloc() makes it
basically impossible for an implementation to not invoke the expected
behavior in a "walk through" of an entire array. This should be s.c.,
and produce no output:
#include <stdio.h>

int main(void) {
    int arr[10][10], (*pa1)[10], *pa2;
    int i = 0;

    pa1 = arr;
    for (pa2 = *pa1; pa2 < &arr[10][0]; pa2++) {
        *pa2 = i++;
    }
    for (i = 0; i < 100; i++) {
        if (arr[i/10][i%10] != i) {
            printf("Non-conforming implementation.\n");
        }
    }
    return 0;
}
There has been some controversy if the above is s.c. or not. I would
hope that it is... But it's hard to unambiguously read it from the
standard it seems. I have elsethread suggested some alternative wording
to the paragraph on pointer arithmetic to clarify it along these lines.
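For comparison, here is a sketch of the same fill written so that the element pointer is restarted for every row. This is not the program under dispute, only a variant that stays inside the strict reading; fill_by_rows and COLS are names invented for the example.

```c
#define COLS 10

/* Stores 0, 1, 2, ... in row-major order. The element pointer is
   re-derived from each row, so it never moves beyond one past the
   end of the row it came from. */
void fill_by_rows(int (*arr)[COLS], int nrows)
{
    int r, i = 0;
    int *p;

    for (r = 0; r < nrows; r++)
        for (p = arr[r]; p < arr[r] + COLS; p++)
            *p = i++;
}
```

After fill_by_rows(arr, 10), arr[i/10][i%10] holds i for every i below 100, the same result the single-pointer walk is hoping for.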
Why does the standard make such a one-past-the-end rule?
Once this rule is introduced, the result of any pointer arithmetic
is required to obey the boundary condition. In particular, the fact that any
pointer to a nonarray object is regarded as a pointer to the first element
of an array of length one reveals a strong intention on the Committee's
part to make pointer arithmetic rigorous.
If the range checking concept was outside the scope of the original
consideration, I can hardly believe the Committee ignored these obvious
side effects while settling the clauses.
If what is meant is that the propagation of the boundary condition from an
array to a pointer was not originally considered, a bigger conflict
arises from the loss of the boundary condition for the pointer.
In that situation, the object pointed to by the pointer is regarded as
the first element of an array of length 1. Any arithmetic that advances
this pointer by more than 1 invokes undefined behavior.
[...]
> If there is a *practical* problem involving this issue,
> I hope a DR will be filed so we can deal with it officially. If on the
> other hand this is just an academic quibble, we don't need a DR.
Academic as well as practical reasons are both good reasons to deal with
this problem. Writing unambiguously makes everything clear. Discussion
makes the world progress.
paiyi
What I'm interested in is not boundary checking, even if the
OP started this discussion because of boundary checking. This is
a question about allowed pointer arithmetic and objects.
> As you already noted,
> that is a concept outside the scope of the standard. Thus, bringing
> it into the discussion loses focus, which was my point.
Yes, boundary checking is outside the scope of the Standard, but
the same doesn't go for "allowed pointer arithmetic" question.
I am talking about it.
>
> In C, there is a notion of an object, and there is also a notion of
> type. Some expressions designate objects, and to identify what
> object is designated, the type of the expression is consulted. The
> set of rules concerning pointer construction are intended to ensure
> that reasonable operations on objects can be performed in a s.c.
> manner, without guaranteeing properties that are not necessary
> for that purpose. The layout rules ensure that in a 5x5 array the
> "inner" elements are contiguous across the end of rows; all padding
> is contained entirely in each individual "inner" element. The whole
> declared array object may be associated with an address space
> (think segment) distinct from other unrelated objects, but within a
> declared object (or a malloc()ed chunk) there is a uniform address
> space. Therefore, pointers to unsigned char or "inner" elements
> can be used to "walk through" the entire object (or chunk) so long
> as the arithmetic rules are obeyed. On the other hand, "array[2][7]"
> is an error because array[2] does not have 7 elements. &array[2][5]
> would be allowed (in C99, not C90) by the one-past-the-end rule.
> I would argue that we even intended for (int(*)[10])array+7 to be
> allowed.
I would never insist your interpretation is incorrect, but the
Committee's intention must be visible to the public with proper
wording of the Standard. If there is no change in the text of
the revised Standard, I believe that C90's intention must
still apply, provided that there is no "official"
interpretation provided in other forms like a response to a DR.
People can't read the intended or changed interpretation of the
Standard only in the Committee members' mind.
> I know that is not how we responded to a C90 DR (as I
> recall I provided the words for the response), but at that time we
> felt constrained to interpret the wording of the standard strictly as
> written even in cases where it clearly didn't reflect what we intended
> to say, so long as we thought we could live with what it clearly said.
> Several such cases were revisited for C99 and are supposed to be
> reflected in the current standard, or if not then we are more inclined
> now (for DR purposes) to interpret the standard according to our
> agreed intent and eventually fix wording infelicities via Technical
> Corrigenda. If there is a *practical* problem involving this issue,
> I hope a DR will be filed so we can deal with it officially. If on the
> other hand this is just an academic quibble, we don't need a DR.
Even if there is no "practical" problem, if the intention of
the Committee doesn't agree with the real wording of the
Standard, then it needs a DR. And AFAICK one of the purposes
of the DR system is to provide the authoritative
interpretation of the Standard (I remember that there were
many such questions in C90 DRs).
Is it so hard to share the real intention of the Committee
members with people who want and need to know it? The
Standard exists not only for the Committee members.
I hope, in this case, you submit a DR to overturn the (too
strict) response to the C90 DR that was partly provided
by you and to show the real intention "officially".
To support idioms such as
for ( p = array; p < &array[N]; ++p )
[In a strict reading of C89 that would have to be array+N.]
In order to do so it is only necessary for *one* byte past the end
of the array to have a valid address; note that this is *not* true
for down-counting:
for ( p = &array[N]; --p >= array; )
which would require as many bytes as the size of an array element
before the start of the array to have valid addresses. The latter
case is not only a problem for segmented architectures, but also
for almost any system where addresses have relatively small numeric
values, since subtracting too much can "wrap around" to a high
address and then the comparison will malfunction.
In other words, it is a requirement induced by practical
considerations; we wanted to support at least one convenient way
to program such loops that would be portable to all reasonable
target platforms. It had nothing to do with "language purity".
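A down-counting traversal can still be written without ever forming a pointer before the start of the array, by testing first and decrementing inside the body. This is a minimal sketch; reverse_copy is a hypothetical name.

```c
#include <stddef.h>

/* Copies count ints from src to dst in reverse order. The source
   pointer starts one past the end and is decremented only after the
   test has shown it is still above src, so it never drops below src. */
void reverse_copy(int *dst, const int *src, size_t count)
{
    const int *p = src + count;

    while (p != src) {
        --p;
        *dst++ = *p;
    }
}
```

The comparison here is != rather than >=, so the loop does not depend on how a pointer below the array would compare.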
> If the range checking concept is outside the scope of the original
> consideration, I can hardly believe the Committee ignored this obvious
> side effects while settling down the clauses.
I didn't understand your argument, including what you meant by
"obvious side effects". Treating every object of a particular
*type* as being in effect an array of that type of length 1 for
purposes of pointer arithmetic involving that type merely
unifies certain issues of pointer arithmetic; in particular it
doesn't conflict with the kind of "walking through" operations I
described previously.
Of course there are *numerous* wording changes between the two
versions of the standard. *None* of the C90-based DRs should be
taken as applying to C99.
I meant parts in question only.
> *None* of the C90-based DRs should be
> taken as applying to C99.
If the entire Committee agrees with your position, then we
must repeat most of the questions and discussions raised
in C90 DRs, many newsgroups, .... even if there is no
real change between C90 and C99 wording; of course, DRs
added in TCs are exceptions. I think it's very reasonable
and valuable to take official responses to C90 DRs as
applying to C99, and IIRC C99 revision guidelines also
mention this. Don't you agree?
Thanks.
--
Jun Woong (myco...@hammail.net)
Well, let's talk about portability instead of "language purity".
Why not disregard the "one-past-the-end rule" and leave the range
of pointer arithmetic implementation-specific?
(Here I mean giving no *official* boundary restriction to a pointer
to an element of an array. We only need a restriction when one wants
to "dereference" it.)
In this situation, "one-past-the-end", "ten-before-the-beginning", even
"thousands-past-the-end" are still defined if the pointer arithmetic
doesn't overflow. What we care about is whether the pointer points to a
legal object before "dereferencing" it.
> > If the range checking concept is outside the scope of the original
> > consideration, I can hardly believe the Committee ignored this obvious
> > side effects while settling down the clauses.
>
> I didn't understand your argument, including what you meant by
> "obvious side effects".
Here I mean that when the Committee made the "one-past-the-end" rule,
the range checking concept (which I called a "side effect of the
clauses") was automatically raised.
> Treating every object of a particular *type* as being in effect an array
> of that type of length 1 for purposes of pointer arithmetic involving
> that type merely unifies certain issues of pointer arithmetic;
> in particular it doesn't conflict with the kind of "walking through"
> operations I described previously.
I don't think so.
int A[10][10], *p;
After p has been assigned the value &A[3][3],
if the boundary condition of array A cannot be transferred to p,
the implementer has no information about where p points.
(although the programmer knows it points to an element of array A.)
At this moment, what the implementer can do is treat the object pointed to
by p as a nonarray (i.e. an array of length 1) for safety reasons.
In this respect, any arithmetic on p (including "walking through"
operations) is almost undefined by the standard.
paiyi
I would like to say the standard doesn't require a pointer pointing to
the lowest address byte of an object except this pointer is a pointer to
character type.
It means an implementation has the freedom to make "p = &array[N]" point
to the *highest* byte of the last element of array 'array'.
In this condition, the expression '++p' has more risk of overflow
than '--p' does when the size of the array element is very large.
paiyi
: I would like to say the standard doesn't require a pointer pointing to
: the lowest address byte of an object except this pointer is a pointer to
: character type.
Perhaps I need to join RH's reading comprehension class, but I don't
understand that sentence.
: It means an implementation has the freedom to make "p= &array[N]" points
: to the *highest* byte of the last element of array 'array'.
What if it's an array of bytes?
It would also require the addition of checking code on every pointer
arithmetic operation, e.g. for an increment
    if (__last_element_ptr(p))
        p = (__typeof(p))((unsigned char *)p + (sizeof *p - 1));
    else
        ++p;
A rather unlikely implementation I think.
: In this condition, the expression '++p' has more risk to overflow
: than '--p' does while the size of the array element is very large.
If p points one-past-the-end of an array, as above, then ++p invokes
undefined behaviour, and --p points p at the last element - no chance of
overflow at all. I'm not sure what you're getting at.
-- Mat.
As I said, we wanted to guarantee that the up-counting pointer loop
would work. Without the rule, implementations could jam objects
against the high end of mapped segments, causing invalid-address
faults or wrap-around in such loops.
> Here I mean while the Committee makes the "one-past-the-end" rule,
> the range checking concept (here I called "side effect of the
> clauses") is automatically posed.
I don't think so.
> ... After having assigned p the value &A[3][3],
> if the boundary condition of array A can not be transferred to p,
> the implementer has no information about where p points to.
The "boundary condition" is your own invention. It's not inherent
in the language nor in actual hardware. And in fact every C
implementation I have access to allows "walking through" operations
such as I described; there is no natural reason why they wouldn't.
I was talking about machine addresses, not C pointers. In fact the
implementation needs to support only one extra byte of valid addressing
past the end of an object. On a word-oriented architecture there will
usually be additional padding within the addressed word, but at least
the required extra address is only that of the very *next* word, no
matter how large the array element. This would not be true for the
down-counting loop; if the elements are large objects (e.g. structures)
then potentially very many additional words below the actual array would
have to have valid, safe addresses.