In C/C++ it is possible to form the address of one past the last
element of the array.
I can see that this is required by a common idiom to copy `\0'
terminated arrays of chars
while (*d++ = *s++);
What are the historical roots of this decision to allow the forming of
this off-by-one element?
--
Best Regards,
Mika
----------------------------------------------------------
mika.m...@shire.ntc.nokia.com phone: +358-0-511-23587
Nokia Networks, PO box 320, FIN-00045 NOKIA GROUP, Finland
----------------------------------------------------------
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
[ about comp.lang.c++.moderated. First time posters: do this! ]
Yes. It has always been valid in C and C++ to form (but not dereference)
the address of one past the last element of an array.
> I can see that this is required by a common idiom to copy `\0'
> terminated arrays of chars
>
> while (*d++ = *s++);
>
> What are the historical roots of this decision to allow the forming of
> this off-by-one element?
To make it more convenient to write a loop to scan an entire array.
--
Steve Clamage, stephen...@sun.com
You do not form an off by one element, what both C and C++ require is
that the address (pointer value) for one past the end should be 'valid'
in the sense that using it for comparisons etc. should not cause a
program failure. There are many cases where iterating through an array
is most easily done by checking against the address of a hypothetical
element one past the end of the actual ones. This has nothing to do
with the use of '\0' as a string terminator. '\0' is a real element of
such an array but is given a special significance by functions managing
C-style strings.
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
> I can see that this is required by a common idiom to copy `\0'
> terminated arrays of chars
>
> while (*d++ = *s++);
>
> What are the historical roots of this decision to allow the forming of
> this off-by-one element?
>
Otherwise, to enforce not to allow that off-by-any-number-of element
address would require the compiler to keep track of the sizes of all
arrays.
void take_address(char *ptr)
{
char *p = &ptr[13];
// but only use p when you're sure it's ok
}
int main()
{
char *p1 = new char[5];
char *p2 = new char[256];
take_address(p1);
take_address(p2);
}
--
Martin Fabian http://www.s2.chalmers.se/~fabian/
--
"Cheer up. It may never happen" (Edina Monsoon)
/* Remove NOSPAM from reply-to address to mail me */
Mika Moilanen wrote:
>
> Dear C++ers,
>
> In C/C++ it is possible to form the address of one past the last
> element of the array.
>
When using C++ standard algos it's used more often in standard
containers
with the ::begin() ... ::end() concept where end() points to one element
after the last. The advantage is that end() represents a better
abstraction for iterators than a null pointer would and it also allows
you
to scan arrays which are only part of a larger array ( ::end() need not
be
an invalid element).
Art
/*
Rabatin Investment Technology Ltd
http://www.rabatin.com
But to do so for anything other than one past the end is to expose your
code to undefined behaviour. An excellent example of undefined
behaviour exactly because the compiler cannot, in general, diagnose your
mistake. What both C and C++ require is that one past the end shall be
a pointer that does not, in itself, cause undefined behaviour (though
dereferencing it will)
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
Francis> In article <61on1qd...@sirion.ntc.nokia.com>, Mika Moilanen
Francis> <mika.m...@stybba.ntc.nokia.com> writes
>> What are the historical roots of this decision to allow the forming of
>> this off-by-one element?
Francis> This has nothing to do
Francis> with the use of '\0' as a string terminator. '\0' is a real element of
Francis> such an array but is given a special significance by functions managing
Francis> C-style strings.
A typo, should have been "What are the historical roots of this
decision to allow the forming of the address this off-by-one element?"
^^^^^^^^^^^
Copying of C-style strings was just an example.
--
Best Regards,
Mika
----------------------------------------------------------
mika.m...@shire.ntc.nokia.com phone: +358-0-511-23587
Nokia Networks, PO box 320, FIN-00045 NOKIA GROUP, Finland
----------------------------------------------------------
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
No it's not. This code has undefined behavior:
char a[10];
a+15;
If the result does not point to elements of the array or one past the last
element the behavior is undefined.
>> What are the historical roots of this decision to allow the forming of
>> this off-by-one element?
Maybe to allow things like:
char a[size];
for(char* b = a ; b != a + size ; b++) {
...
}
-- Dag Henriksson
> > Dear C++ers,
> > In C/C++ it is possible to form the address of one past the last
> > element of the array.
> It is possible to form the address of _any_number_of_elements_ past the
> last element of an array. C/C++ has no knowledge of the size of your
> array.
While the latter is (partially) true, the former is not. At the moment
C/C++ creates the array, it knows the size. Therefore, it can ensure
that there is an address higher then the address of the last element.
I.e. in a 16-bit address space without overflow it can ensure no array
ends at 0xFFFF. But there is no requirement in either standard that
demands it can't end at 0xFFFE, 0xFFFD, etc.
> > I can see that this is required by a common idiom to copy `\0'
> > terminated arrays of chars
> > while (*d++ = *s++);
> > What are the historical roots of this decision to allow the forming of
> > this off-by-one element?
> Otherwise, to enforce not to allow that off-by-any-number-of element
> address would require the compiler to keep track of the sizes of all
> arrays.
You miss the point - the compiler doesn't have to do anything to
produce undefined behavior, and besides, allowing indexing two past
the end requires just as much knowledge of the array size as allowing
indexing one, or even zero elements behind the end.
> void take_address(char *ptr)
> {
> char *p = &ptr[13];
> // but only use p when you're sure it's ok
> }
Undefined behavior, if ptr points to an array with less than 12
elements (or a single char). Even if not used, calculating p
might produce an overflow with undefined behavior.
> int main()
> {
> char *p1 = new char[5];
> char *p2 = new char[256];
> take_address(p1);
> take_address(p2);
> }
The first call produces undefined behavior, after which
nothing can be said about the behavior of the program.
If that line is left out, the program is correct (
shows only defined behavior).
Michiel Salters
Not strictly true. To cover "normal hardware" first...
IsItReally(char* p)
{
char* q = p + INT_MAX;
assert(p<q);
}
On many platforms the assert will fail. Had I used UINT_MAX, then
it would fail on many (other) platforms. In the wacky world of Intel
16-bit programming, this was something that you really had to
worry about, since 64K is not an outrageous size for an array.
Moving on the "odd hardware", I am led to believe that some CPUs
have separate register types for addresses and data, and if you
compute a virtual address that isn't mapped to a physical page
then the program faults. The "one past the end" rule allows us to
write "while (++q<p+n)" without risking this fault.
> void take_address(char *ptr)
> {
> char *p = &ptr[13];
> // but only use p when you're sure it's ok
> }
> int main()
> {
> char *p1 = new char[5];
> char *p2 = new char[256];
> take_address(p1);
> take_address(p2);
> }
I believe the standard allows this program to crash, although (to
quote Mr Fabian's sig line) "Cheer up. It may never happen".
A reasonable logical and good point. Indeed, I've always though that
it is only guaranteed to form the address of just the one-past the
last element.
--
Best Regards,
Mika
----------------------------------------------------------
mika.m...@shire.ntc.nokia.com phone: +358-0-511-23587
Nokia Networks, PO box 320, FIN-00045 NOKIA GROUP, Finland
----------------------------------------------------------
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
Mika is completely correct. It is always permissible to form an address one
past the last element of an array. However, I believe Martin has ventured
into "format your hard drive" territory as there is no guarantee as to what
forming an address more than one past the last element in an array will do.
[...]
--
Barry L. Wallis, Senior Systems Engineer
Science Applications International Corporation
4161 Campus Point Court
San Diego, CA 92121
_____
"In all labor there is profit, but mere talk leads to poverty."
- King Solomon
Mika Moilanen wrote:
>
> What are the historical roots of this decision to allow the forming of
> this off-by-one element?
>
Isn't the off-by-one a hark back to C days where for an array of (say)
100 elements the "for" loop to visit each in sequence would be:
for(i=0;i<100;++i)
instead of:
for(i=0;i<=99;++i)
hence the notion that one off the end was acceptable and this idiom
persisted since then?
Though the above example happily used indexes within the bounds of
the array, re-write it as pointers and you get to compute the address
of one of the end of the array.
IIRC Andrew Koening mentioned in a previous post that some architectures
would do some sort of segment check if you loaded an address into an
address register, even if you didn't actually visit that address.
Hence the need to ensure that one-off-the-end was a viable address.
Mungo
--
Mungo Henning - it's a daft name but it goes with the face...
mungoh@itacs.strath.ac.uk.http://www.itacs.strath.ac.uk/
I speak for me, not my employer.
More to the point, it allows you to deal with empty (sub)-arrays.
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
I imagine there was originally no restriction - certainly there isn't in
BCPL because that language doesn't have a special pointer type. Probably
early B and C were the same.
Several people have said why having no restriction causes problems on
certain hardware. I want to mention garbage collection, too. Consider:
char *p = new char[100];
p += 200;
collect_garbage();
p -= 200;
use( p );
Code like the above can cause problems because the array is apparently
unreachable when collect_garbage() is called. The "one past the end"
restriction means that garbage collectors don't have to worry about cases
like this. They have an easy test for array reachability. (Admittedly few
C++ implementations have garbage collection, but they are available.)
So we must have some restriction. The obvious rule of only allowing
pointers into the array and not one past the end, causes obvious problems
in practice. The most interesting case is the degenerate one - how do you
represent an empty range? Consider:
char *p = new char[0];
One-past-the-end is the only part of the array that 'p' can point to.
Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
bran...@cix.co.uk | And close your eyes with holy dread,
| For he on honey dew hath fed
http://www.bhresearch.co.uk/ | And drunk the milk of Paradise."
Mungo Henning wrote:
>
> Mika Moilanen wrote:
>
> >
> > What are the historical roots of this decision to allow the forming of
> > this off-by-one element?
> >
>
> Isn't the off-by-one a hark back to C days where for an array of (say)
> 100 elements the "for" loop to visit each in sequence would be:
> for(i=0;i<100;++i)
> instead of:
> for(i=0;i<=99;++i)
Actually those statements are completly equivalent!
[snip]
Heinz
The phrase "odd hardware" makes it sound like 1-s complement
arithmetic or 36-bit integers -- wierd things from the past that
you will never see. I'd therefore like to offer two thoughts.
Firstly, the Intel x86 is an example of such "odd hardware".
16-bit software for OS/2, Win3 and indeed Win95/98 must all
take a little care over "__huge" pointers.
Secondly, having dedicated "address registers" like this allows
the CPU to preload things like page tables and protection bits
before the memory access is made. Since this speeds up the CPU,
it is likely that at least some future architectures will do likewise.
But unsupported by the defintion of C or C++.
> Indeed, I've always though that
> it is only guaranteed to form the address of just the one-past the
> last element.
What you always thought is in fact what the C and C++ standards say.
--
Steve Clamage, stephen...@sun.com
>What are the historical roots of this decision to allow the forming of
>this off-by-one element?
Is the reason not a technical one?
Arrays are stored as a list in memory, so the elements of the array are
stored one after another. Pointer arithmetic is possible in C/C++. So
instead of a[3] you could use *(a+3). But then you could also add ANY
(integer) number, and the compiler can and should not stop you (or else p
arithmetic would only be possible for arrays).
And what about the one-before-the-start-element, &a[-1]?
--
Martin Fabian http://www.s2.chalmers.se/~fabian/
--
"Cheer up. It may never happen" (Edina Monsoon)
/* Remove NOSPAM from reply-to address to mail me */
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
Yes it is, but not the one you gave. The most important value of the
one-past-the-end (off-by-one is the name of a bug where you process one
too many elements) is that it allows the handling of empty (sub-arrays).
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
Yes it is, I'm just interested in the background of these decisions
which were once made.
Peter> then you could also add ANY
Peter> (integer) number, and the compiler can and should not stop you (or else p
Peter> arithmetic would only be possible for arrays).
As said earlier in this thread, the compiler isn't responsible for
tracking of the addresses particular pointer holds - the behaviour in
the case of forming other addresses than valid objects and one past
the last element of the array is simply undefined.
--
Best Regards,
Mika
----------------------------------------------------------
mika.m...@shire.ntc.nokia.com phone: +358-0-511-23587
Nokia Networks, PO box 320, FIN-00045 NOKIA GROUP, Finland
----------------------------------------------------------
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
Peter Balazs wrote:
>
> Hi!
>
> >What are the historical roots of this decision to allow the forming of
> >this off-by-one element?
>
> Is the reason not a technical one?
>
> Arrays are stored as a list in memory, so the elements of the array are
> stored one after another. Pointer arithmetic is possible in C/C++. So
> instead of a[3] you could use *(a+3). But then you could also add ANY
> (integer) number, and the compiler can and should not stop you (or else p
> arithmetic would only be possible for arrays).
The compiler doesn't stop you, but pointer arithmatic IS only defined
within arrays. There is *NO* assumption in C or C++ that the data
memory is one big linear space (and all of these requirements stem
from making well behaved implemenations on machine where the memory
is segmented). There is no guarantee that in this example:
int x, y, z;
int* xp = &x;
xp[1] yields anything deterministic.
In both cases, overflow or underflow in the address register.
--
Stan Brown, Oak Road Systems, Cleveland, Ohio, USA
http://www.mindspring.com/~brahms/
C++ FAQ Lite: http://www.cerfnet.com/~mpcline/c++-faq-lite/
the C++ standard: http://webstore.ansi.org/
more FAQs: http://www.mindspring.com/~brahms/faqget.htm
> > >What are the historical roots of this decision to allow the forming of
> > >this off-by-one element?
> > Is the reason not a technical one?
> > Arrays are stored as a list in memory, so the elements of the array are
> > stored one after another. Pointer arithmetic is possible in C/C++. So
> > instead of a[3] you could use *(a+3). But then you could also add ANY
> > (integer) number, and the compiler can and should not stop you (or else p
> > arithmetic would only be possible for arrays).
Pointer arithmentic _IS_ only possible for arrays!
> Exactly my point, but people keep telling me (us) that simply _forming_
> the address of anything past the one-off-element invokes undefined
> behavior. Still, I don't see how it could, it's all just arithmetics,
> add an offset to a pointer. As long as it's not used, what's the
> problem?
> And what about the one-before-the-start-element, &a[-1]?
> Martin Fabian http://www.s2.chalmers.se/~fabian/
What if the CPU you compile to has special hardware protection for memory?
I.e. When creating a pointer in an address register, the CPU does an
implicit check to see if the memory at that address is mapped. If not,
the program terminates.
A C/C++ compiler would translate a malloc call for N bytes into an OS
call for N+1 bytes, so &a to &a+N are all valid pointers, and can all
be loaded into the address register. But the moment a pointer of &a+N
is incremented, the program terminates (ungracefully, no detructors
called, etc.). The same result is obtained when &a[-1] is calculated.
All this doesn't prevent anybody from writing a fully compliant C++
compiler for this particular CPU, but non-standard code will fail.
You have it backwards. Any attempt to form an address that does not
point to an object (or into an object) results in undefined behaviour
(for good reason, for example on memory protected systems your program
is may core-dump). However both C and C++ require that an exception to
the general rule be made in the case of one-past-the-end.
Seriously there are many systems in which just loading an address that
does not belong to your program causes your program to fail.
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
The problem is that a pointer may be stored in a special kind of hardware
register with special support for indirection and bounds-checking. And the
bounds-checking policy may be to check when new values are assigned rather
than when they are indirected-through.
You can see this may be faster for pointers which are assigned-to once and
then indirected through many times. The check only happens once. The
assignment of one checked pointer to another may not need to repeat the
check.
Whatever. It sounds like a plausible CPU design. Whether it truly works
out better is for the hardware guys to decide. The C++ committee did not
want to preempt that expertise.
Dave Harris, Nottingham, UK | "Weave a circle round him thrice,
bran...@cix.co.uk | And close your eyes with holy dread,
| For he on honey dew hath fed
http://www.bhresearch.co.uk/ | And drunk the milk of Paradise."
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
>The compiler doesn't stop you, but pointer arithmatic IS only defined
>within arrays. There is *NO* assumption in C or C++ that the data
>memory is one big linear space (and all of these requirements stem
>from making well behaved implemenations on machine where the memory
>is segmented). There is no guarantee that in this example:
> int x, y, z;
> int* xp = &x;
>xp[1] yields anything deterministic.
You are quite right. However, there *is* a guarantee that xp+1
yields something deterministic, because a scalar is treated as
a one-element array, and you are permitted to form the address
of one past the end of an array.
There is no guarantee that either xp+2 or xp[2] yields anything
deterministic.
--
Andrew Koenig, a...@research.att.com, http://www.research.att.com/info/ark
I am not sure about this. Can someone give a quote from the standard
which states that a pointer can be treated as a one-element array?
Also, consider this case:
int* xp = 0;
Is xp + 1 guaranteed to be deterministic? I don't think so. There can be
systems where one past the null pointer is not a valid address and the
processor may trap on loading such pointers into address registers.
--
Biju Thomas
Biju Thomas wrote:
>
> Andrew Koenig wrote:
> >
> > In article <38848468...@sensor.com>, Ron Natalie <r...@sensor.com> wrote:
> >
> > > There is no guarantee that in this example:
> > > int x, y, z;
> > > int* xp = &x;
> > >xp[1] yields anything deterministic.
> >
> > You are quite right. However, there *is* a guarantee that xp+1
> > yields something deterministic, because a scalar is treated as
> > a one-element array, and you are permitted to form the address
> > of one past the end of an array.
> >
>
> I am not sure about this. Can someone give a quote from the standard
> which states that a pointer can be treated as a one-element array?
>
> Also, consider this case:
>
> int* xp = 0;
>
> Is xp + 1 guaranteed to be deterministic? I don't think so. There can be
> systems where one past the null pointer is not a valid address and the
> processor may trap on loading such pointers into address registers.
If you treat the pointer as a one-element array, it would be "&xp + 1".
"x + 1" builds the address one past the address pointed to!
In the example above, the poster assigns a valid address to xp before he
increases it.
Heinz
By definition, when xp is a pointer, xp[n] is exactly equivalent to
*(xp+n) (section 5.2.1).
>
> Also, consider this case:
>
> int* xp = 0;
>
> Is xp + 1 guaranteed to be deterministic? I don't think so.
Of course not. xp does not point to any object, so the result of xp+1
is undefined.
In the example above, xp points to an object, so xp+1 has a value
than can be compared to xp.
--
Steve Clamage, stephen...@sun.com
>From the 2nd Ed. of the C Programming Language, section A7.7:
"The provision for pointers just beyond the end of an array is new. It
legitimizes a common idiom for looping over the elements of an array."
Sent via Deja.com http://www.deja.com/
Before you buy.
Well what is one past the end of a non-existent object? There is a
sense in which a null pointer is one past the end already, indeed much
of its behaviour is similar to a one past the end pointer.
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
Francis Glassborow wrote:
>
> Well what is one past the end of a non-existent object? There is a
> sense in which a null pointer is one past the end already, indeed much
> of its behaviour is similar to a one past the end pointer.
If you really want something you can go one past the end of nothing,
you can allocate a zero element array (only dynamically):
int* p = new int[0];
p+1
would then be valid.
I have never considered this case but I am far from certain you are
correct. All that is required is that new will return a unique address,
I am unaware of any requirement that you will get a valid address by
incrementing the result. Can anyone quote chapter and verse from the
Standard to support Ron's view?
Francis Glassborow Journal Editor, Association of C & C++ Users
64 Southfield Rd
Oxford OX4 1PA +44(0)1865 246490
All opinions are mine and do not represent those of any organisation
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
My reasoning:
int* p = new int[x];
the last element is addressed by
p + (x - 1);
One past the last is
p + x;
Two past the end (invoking undefined behavior) is
p + x + 1;
Substituting 0 for x, one past the end is p + 0, two past the end
is p + 1.
Mike Ferrel
>> You are quite right. However, there *is* a guarantee that xp+1
>> yields something deterministic, because a scalar is treated as
>> a one-element array, and you are permitted to form the address
>> of one past the end of an array.
>I am not sure about this. Can someone give a quote from the standard
>which states that a pointer can be treated as a one-element array?
Subclause 5.7, paragraph 4:
For the purposes of these operators, a pointer to a nonarray
object behaves the same as a pointer to the first element of
an array of length one with the type of the object as its
element type.
>Also, consider this case:
> int* xp = 0;
>Is xp + 1 guaranteed to be deterministic? I don't think so.
No it isn't, because xp doesn't point to an object.
--
Andrew Koenig, a...@research.att.com, http://www.research.att.com/info/ark
[ Send an empty e-mail to c++-...@netlab.cs.rpi.edu for info ]
A good view is to see pointers and iterators not pointing to
objects, but to the borders between the objects. That is,
the array/container and its pointers/iterators look like
+---+---+---+---+ +---+---+---+---+ - +
| a | b | c | d | instead of | a | b | c | d | |
+---+---+---+---+ +---+---+---+---+ - +
^ ^ ^ ^
| | | |
begin end begin one-past-end
Now dereferencing a pointer/iterator means accessing the element
on the right. It's immediatly obvious that there's no element on
the right of the end pointer/iterator.
The range of elements described by the begin/end pair is given
by the elements in between. Again, it's obvious that if both are
the same, there's nothing in between, and therefore you get an
empty range in that case.
Finally, this viewpoint even makes the "displacement by one" of
the reverse_iterator obvious (i.e. the reverse iterator of an
iterator refers to the element directly _before_ the element
the original iterator refers to): reversing the range means
turning it around (think of a slide you look at from the other
side). By doing so, the element on the right side now is the
one which was originally on the left side.
Biju Thomas wrote:
> Andrew Koenig wrote:
> >
> > In article <38848468...@sensor.com>, Ron Natalie <r...@sensor.com>
wrote:
> >
> > > There is no guarantee that in this example:
> > > int x, y, z;
> > > int* xp = &x;
> > >xp[1] yields anything deterministic.
> >
> > You are quite right. However, there *is* a guarantee that xp+1
> > yields something deterministic, because a scalar is treated as
> > a one-element array, and you are permitted to form the address
> > of one past the end of an array.
> >
>
> I am not sure about this. Can someone give a quote from the standard
> which states that a pointer can be treated as a one-element array?
5.7p4
"For the purposes of these operators, a pointer to a nonarray object behaves
the same
as a pointer to the first element of an array of length one with the type of
the
object as its element type."
>
>
> Also, consider this case:
>
> int* xp = 0;
>
> Is xp + 1 guaranteed to be deterministic? I don't think so.
No. xp is not a pointer to a nonarray object. It's a null pointer.
-- Dag Henriksson