
Pointer to &array[-1] - legal?


KK6GM

Dec 5, 2011, 10:15:07 AM
I've run into some code that uses this idiom for looping through an
array

some_type *ptr = &some_type_array[0] - 1;

while (*(++ptr) != some_val)
...

What I'm wondering is whether forming the pointer (&some_type_array[0]
- 1) is legal. I realize that this array is never used to access
memory. BTW, in the subject line I used a -1 index for brevity. I'm
assuming both forms are either legal or illegal, and it is not the
case that one is legal and the other is not.
--
comp.lang.c.moderated - moderation address: cl...@plethora.net -- you must
have an appropriate newsgroups line in your header for your mail to be seen,
or the newsgroup name in square brackets in the subject line. Sorry.

Hans-Bernhard Bröker

Dec 10, 2011, 5:06:33 AM
On 05.12.2011 16:15, KK6GM wrote:
> I've run into some code that uses this idiom for looping through an
> array
>
> some_type *ptr = &some_type_array[0] - 1;
>
> while (*(++ptr) != some_val)

That's similar to an old foe: the code that "Numerical Recipes in C"
was (still is?) full of. Those guys were so hell-bent on sticking with
their FORTRAN style 1-based indexing even in C code that wild horses
couldn't pull them away.
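
For what it's worth, 1-based indexing can be had without ever forming a pointer before the array. A minimal sketch, assuming a hypothetical accessor at1 (my invention, not from Numerical Recipes or any code in this thread) that adjusts the index instead of offsetting the base pointer:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the i-th element of a
   1-based view of v, where valid indices are 1..n. The subtraction is
   done on the index, so the pointer never leaves the array. */
double *at1(double *v, size_t n, size_t i)
{
    assert(i >= 1 && i <= n);   /* precondition: index is in 1..n */
    return v + (i - 1);
}

/* Sums the elements at 1-based positions 1..n. */
double sum1(double *v, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 1; i <= n; i++)
        total += *at1(v, n, i);
    return total;
}
```

The cost is one subtraction per access, which a compiler can usually fold into the addressing, and the base pointer stays valid throughout.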

> What I'm wondering is whether forming the pointer (&some_type_array[0]
> - 1) is legal.

To the extent anything can be illegal in C, this is: it causes
undefined behaviour by triggering this, from C99 6.5.6p8:

> If both the pointer
> operand and the result point to elements of the same array object, or one past the last
> element of the array object, the evaluation shall not produce an overflow; otherwise, the
> behavior is undefined.

Barry Schwarz

Dec 10, 2011, 5:07:33 AM
On Mon, 5 Dec 2011 09:15:07 -0600 (CST), KK6GM
<mjs...@scriptoriumdesigns.com> wrote:

>I've run into some code that uses this idiom for looping through an
>array
>
>some_type *ptr = &some_type_array[0] - 1;
>
>while (*(++ptr) != some_val)
> ...
>
>What I'm wondering is whether forming the pointer (&some_type_array[0]
>- 1) is legal. I realize that this array is never used to access
>memory. BTW, in the subject line I used a -1 index for brevity. I'm
>assuming both forms are either legal or illegal, and it is not the
>case that one is legal and the other is not.

It is not legal in either case. The relevant statement from the
standard is in 6.5.6-8: "If both the pointer operand and the result
point to elements of the same array object, or one past the last
element of the array object, the evaluation shall not produce an
overflow; otherwise, the behavior is undefined."

--
Remove del for email

Francis Glassborow

Dec 10, 2011, 5:08:03 AM
On 06/12/2011 01:44, tenpointwo wrote:
> On Mon, 05 Dec 2011 09:15:07 -0600, KK6GM wrote:
>
>> I've run into some code that uses this idiom for looping through an
>> array
>>
>> some_type *ptr = &some_type_array[0] - 1;
>>
>> while (*(++ptr) != some_val)
>> ...
>>
>> What I'm wondering is whether forming the pointer (&some_type_array[0] -
>> 1) is legal. I realize that this array is never used to access memory.
>> BTW, in the subject line I used a -1 index for brevity. I'm assuming
>> both forms are either legal or illegal, and it is not the case that one
>> is legal and the other is not.
>
> Someone correct me if I'm wrong, I'm pretty new to C.
>
> First off, in your first line I think it could be
> some_type *ptr = some_type_array-1;
> since some_type_array is equivalent to &some_type_array[0], it's really
> up to you though.
>
> Secondly, I think it is a legal expression because &some_type_array[0] is
> just a pointer to the first element in the array, which is just an
> address. &some_type_array[i]-1 for some i would just give you the hex
> address of &some_type_array[i] - 1*sizeof(some_type).
> For instance if some_type=int, and &int_array[i]=0x00400564. Then
> &int_array[i-1]=0x00400560.
>
> So, to conclude, you can evaluate that expression, but you're not
> guaranteed what it would be since the program will just return the hex
> address 4 bytes before some_type_array[0] and treat it as a pointer to
> some_type.
>
> Have you tried compiling it?

A compiler will probably compile the code, but that misses the point.
Compilers frequently accept code whose behaviour is undefined and
generate something for it. If a compiler rejects code, you have an
error; if it accepts the code, all you know is that the code is
syntactically correct, not whether it is semantically so (i.e. will do
what you want).

Now just because you can compute an address does not make it a valid
address. It does not guarantee that the address is even within your
program's data area.

Taking or computing the address of an element before the start of an
array results in undefined behaviour. Even should it happen to work on
the hardware/OS you are using it is still something that you should
never do because one day it will stop working, perhaps catastrophically.



Jasen Betts

Dec 10, 2011, 5:08:33 AM
On 2011-12-05, KK6GM <mjs...@scriptoriumdesigns.com> wrote:
> I've run into some code that uses this idiom for looping through an
> array
>
> some_type *ptr = &some_type_array[0] - 1;

I'm fairly sure that's not legal. AIUI undefined behaviour.

eg: if the array elements are large and it has a low address array[-1]
could overflow past NULL (assuming NULL has the same representation
as 0). Integer overflow is undefined behaviour.


--
⚂⚃ 100% natural


CodeTrooper

Dec 10, 2011, 5:08:48 AM
On Dec 5, 8:15 pm, KK6GM <mjsi...@scriptoriumdesigns.com> wrote:
> I've run into some code that uses this idiom for looping through an
> array
>
> some_type *ptr = &some_type_array[0] - 1;
>
> while (*(++ptr) != some_val)
>   ...
>
> What I'm wondering is whether forming the pointer (&some_type_array[0]
> - 1) is legal.  I realize that this array is never used to access
> memory.  BTW, in the subject line I used a -1 index for brevity.  I'm
> assuming both forms are either legal or illegal, and it is not the
> case that one is legal and the other is not.

As far as I understand, both &some_type_array[0] - 1 and
some_type_array[-1] are syntactically legal. However, when using
negative indices, one should be sure that elements actually exist
before the one pointed to, to avoid run-time issues, especially when
using a syntax like some_type_array[-1].

The reasoning behind these being legal is that myArray[i] is
considered to be syntactic sugar for *(myArray + i), where i is not
required to be positive.
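
To make that concrete, here is a minimal sketch (the names mid and prev_of_mid are made up for illustration). A negative subscript is well defined when the pointer being subscripted points into the interior of an array, so that the result still lands on an element of the same array. Applied to a pointer to the first element, the same expression has nothing to land on.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the element just before the middle of a 5-element array.
   mid points at arr[2], so mid[-1] is *(mid - 1), i.e. arr[1], which
   is inside the array and therefore fine to compute and read. */
int prev_of_mid(const int arr[5])
{
    const int *mid;

    assert(arr != NULL);
    mid = arr + 2;
    return mid[-1];
}
```

Called on an array holding 10, 20, 30, 40, 50, this reads the second element.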

Kenneth Brody

Dec 10, 2011, 5:09:18 AM
On 12/5/2011 8:44 PM, tenpointwo wrote:
> On Mon, 05 Dec 2011 09:15:07 -0600, KK6GM wrote:
>
>> I've run into some code that uses this idiom for looping through an
>> array
>>
>> some_type *ptr = &some_type_array[0] - 1;
>>
>> while (*(++ptr) != some_val)
>> ...
>>
>> What I'm wondering is whether forming the pointer (&some_type_array[0] -
>> 1) is legal. I realize that this array is never used to access memory.
>> BTW, in the subject line I used a -1 index for brevity. I'm assuming
>> both forms are either legal or illegal, and it is not the case that one
>> is legal and the other is not.

Calculating the address outside of the array (except for the address
immediately following the array) invokes undefined behavior.

> Someone correct me if I'm wrong, I'm pretty new to C.
[...]
> Secondly, I think it is a legal expression because &some_type_array[0] is
> just a pointer to the first element in the array, which is just an
[...]

It's not valid. I have used systems where simply calculating &something[-1]
can cause the program to crash at runtime. (Specifically, an address
register would underflow, causing a hardware fault.)

> Have you tried compiling it?

Note that "I got the results I expected to get" is a perfectly valid
consequence of UB. However, there is no guarantee that a different system,
different compiler on the same system, or even the next release of the same
compiler (or the same release, but with different compile-time options) will
give the same result.

--
Kenneth Brody

Gordon Burditt

Dec 10, 2011, 5:07:03 AM
> I've run into some code that uses this idiom for looping through an
> array
>
> some_type *ptr = &some_type_array[0] - 1;
>
> while (*(++ptr) != some_val)
> ...
>
> What I'm wondering is whether forming the pointer (&some_type_array[0]
> - 1) is legal.

No. Chances are you'll get away with it on most machines, though.
Given an array declared as sometype array[MAX]; , you may form
&array[0] thru &array[MAX] inclusive. Referencing memory at
&array[MAX] is not allowed. Even forming &array[-1] or &array[MAX+1]
is not allowed.
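
A small sketch of that rule, using a hypothetical function count_matches and an arbitrary value for MAX: the one-past-the-end pointer may be formed and compared against, which is what makes the usual end-sentinel loop valid, but it is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>

#define MAX 4

/* Counts the elements of array[0..MAX-1] that are equal to val. */
size_t count_matches(const int array[MAX], int val)
{
    const int *end = &array[MAX];   /* one past the end: may be formed */
    const int *p;
    size_t n = 0;

    assert(array != NULL);
    for (p = &array[0]; p != end; p++)   /* *p is read only while p != end */
        if (*p == val)
            n++;
    return n;
}
```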

Under some circumstances, "overflow" on address calculations may
cause trouble (traps). So may address calculations on segmented
machines where "carries" or "borrows" into the segment register are
done inconsistently, because the compiler has no reason to believe
it can't get away with it.

I did run into a situation, though, with a loop like this:

struct whatever foo[MAX];
struct whatever *ptr;
/* ... */
for (ptr = &foo[MAX-1]; ptr >= &foo[0]; ptr--)
{
    /* ... use ptr->various fields ... */
}

This is (unexpectedly) an infinite loop on the machine I was using
it on. Pointers are compared as unsigned, and as it happened,
sizeof(struct whatever) was greater than the address of foo, because
there were few other global or static variables.

Therefore, after running the loop with ptr == &foo[0], ptr gets
"decremented" to a large positive number, so the loop keeps going.
Then when the body of the loop tries to access the array out of
range, it segfaults. Worse, adding code for debugging this added
more static variables before foo, increasing the address of foo,
causing the code to work as expected.
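
One way to write that backwards loop without ever stepping before foo[0] is to start at the one-past-the-end pointer and decrement before each use. A minimal sketch; the single value field in struct whatever and the function sum_backwards are stand-ins I made up, not the original code:

```c
#include <assert.h>
#include <stddef.h>

struct whatever { int value; };

/* Visits foo[n-1] down to foo[0] and returns the sum of the value
   fields. ptr starts at foo + n (one past the end, valid to form) and
   is decremented before each use, so it never goes below &foo[0]. */
long sum_backwards(const struct whatever *foo, size_t n)
{
    const struct whatever *ptr;
    long total = 0;

    assert(foo != NULL);
    ptr = foo + n;
    while (ptr != foo) {
        --ptr;
        total += ptr->value;
    }
    return total;
}
```

Because the termination test is an equality comparison against foo itself, it does not depend on how the implementation orders pointers that lie outside the array.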

James Kuyper

Dec 10, 2011, 5:04:36 AM
On 12/05/2011 10:15 AM, KK6GM wrote:
> I've run into some code that uses this idiom for looping through an
> array
>
> some_type *ptr = &some_type_array[0] - 1;
>
> while (*(++ptr) != some_val)
> ...
>
> What I'm wondering is whether forming the pointer (&some_type_array[0]
> - 1) is legal. I realize that this array is never used to access
> memory. BTW, in the subject line I used a -1 index for brevity. I'm
> assuming both forms are either legal or illegal, and it is not the
> case that one is legal and the other is not.

The two forms are equivalent, as is some_type_array-1, which is even
simpler. Using any one of the three renders the behavior of that program
undefined. The intended behavior seems to be roughly equivalent to:

for (some_type *ptr = some_type_array; *ptr != some_val; ptr++)
{
    /* loop body */
}

except for the more restricted scope of 'ptr'. That can trivially be
changed if the code needs to compile with C90, or if the value of ptr is
needed outside of the loop.
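
For instance, a C90-compatible variant that keeps ptr available after the loop might look like the sketch below. The function name find_val, the use of int for some_type, and the added length check are my assumptions; the loop above relies on some_val actually being present in the array.

```c
#include <assert.h>
#include <stddef.h>

/* Returns a pointer to the first element equal to some_val, or NULL
   if none of the n elements matches. */
int *find_val(int *some_type_array, size_t n, int some_val)
{
    int *ptr;                         /* declared up front for C90 */
    int *end;

    assert(some_type_array != NULL);
    end = some_type_array + n;        /* one past the end */
    for (ptr = some_type_array; ptr != end; ptr++)
        if (*ptr == some_val)
            return ptr;
    return NULL;
}
```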

Keith Thompson

Dec 10, 2011, 5:06:03 AM
KK6GM <mjs...@scriptoriumdesigns.com> writes:
> I've run into some code that uses this idiom for looping through an
> array
>
> some_type *ptr = &some_type_array[0] - 1;
>
> while (*(++ptr) != some_val)
> ...
>
> What I'm wondering is whether forming the pointer (&some_type_array[0]
> - 1) is legal. I realize that this array is never used to access
> memory. BTW, in the subject line I used a -1 index for brevity. I'm
> assuming both forms are either legal or illegal, and it is not the
> case that one is legal and the other is not.

The C standard doesn't use the word "legal", which introduces a bit of
subtlety.

The behavior of the expression (i.e., of computing the pointer value,
even without attempting to dereference it) is undefined. C99 6.5.6p8:

When an expression that has integer type is added to or
subtracted from a pointer, the result has the type of the
pointer operand.
[...]
If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined.

On the other hand, it's not "illegal" in the sense that a compiler must
reject it. It's perfectly legitimate for a compiler to accept the code
without complaint, and for it to do exactly what you expect it to do.
However, optimizing compilers may *assume* that your program's
behavior is defined, and perform transformations based on that
assumption. For example:

int arr[10];
int *p = arr;       /* points to arr[0] */
int *bad = p - 1;   /* UB */
if (bad < arr) {
    /* something */
}
else {
    /* something else */
}

A naive compiler will probably generate code that performs the
comparison and executes the "something" branch, but an optimizing
compiler might recognize that the value of "bad" is derived from
arr, and that it cannot possibly be less than the base address of
the array (in the absence of undefined behavior), resulting in code
that executes the "something else" branch.
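
If the goal is to detect an out-of-range position, a sketch like the following does the test on integers before any pointer is formed, so there is no undefined pointer arithmetic for the optimizer to reason about. elem_or_null is a hypothetical helper, not something from the standard library:

```c
#include <assert.h>
#include <stddef.h>

/* Returns &arr[idx] if idx is within 0..n-1, otherwise NULL.
   The range check happens on the integer index, so an invalid pointer
   such as arr - 1 is never computed. */
int *elem_or_null(int *arr, size_t n, long idx)
{
    assert(arr != NULL);
    if (idx < 0 || (unsigned long)idx >= n)
        return NULL;
    return arr + idx;
}
```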

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Will write code for food.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Francis Glassborow

Dec 10, 2011, 5:05:48 AM
On 05/12/2011 15:15, KK6GM wrote:
> I've run into some code that uses this idiom for looping through an
> array
>
> some_type *ptr = &some_type_array[0] - 1;
>
> while (*(++ptr) != some_val)
> ...
>
> What I'm wondering is whether forming the pointer (&some_type_array[0]
> - 1) is legal. I realize that this array is never used to access
> memory. BTW, in the subject line I used a -1 index for brevity. I'm
> assuming both forms are either legal or illegal, and it is not the
> case that one is legal and the other is not.

The code attempts to form an address one before the start of an array.
Such an attempt results in undefined behaviour. The address one beyond
the end of an array is easy to support: all it needs, at worst, is for
the implementation to reserve one padding byte after the end of the
array. Supporting an address one before the beginning would require
padding the size of an entire object. That is not reasonable, so it is
not supported.

Francis

Keith Thompson

Dec 11, 2011, 12:59:52 AM
Jasen Betts <ja...@xnet.co.nz> writes:
> On 2011-12-05, KK6GM <mjs...@scriptoriumdesigns.com> wrote:
>> I've run into some code that uses this idiom for looping through an
>> array
>>
>> some_type *ptr = &some_type_array[0] - 1;
>
> I'm fairly sure that's not legal. AIUI undefined behaviour.
>
> eg: if the array elements are large and it has a low address array[-1]
> could overflow past NULL (assuming NULL has the same representation
> as 0). Integer overflow is undefined behaviour.

Yes, the behavior of integer overflow is undefined (for signed
types), but that's not relevant here. It's a *pointer* that
potentially overflows (or underflows), and it could misbehave
whether it "overflows past NULL" or not.

Pointers are not numbers, and C does not require a monolithic
address space. Even if the address space is monolithic, it
could be a patchwork of regions that are read-write, read-only,
write-only, execute-only, or nonexistent from the point of view
of the currently running program. Computing an address before the
beginning of an object can fail because it crosses a boundary between
two such regions, or because the compiler inserts explicit checks,
or because the hardware performs implicit checks, or for any other
reason. The standard refrains from defining the behavior so that
implementations are not constrained to a particular memory model.


Keith Thompson

Dec 11, 2011, 1:00:07 AM
Francis Glassborow <francis.g...@btinternet.com> writes:
> On 06/12/2011 01:44, tenpointwo wrote:
[...]
>> Secondly, I think it is a legal expression because &some_type_array[0] is
>> just a pointer to the first element in the array, which is just an
>> address. &some_type_array[i]-1 for some i would just give you the hex
>> address of &some_type_array[i] - 1*sizeof(some_type).
>> For instance if some_type=int, and &int_array[i]=0x00400564. Then
>> &int_array[i-1]=0x00400560.
[...]
> Now just because you can compute an address does not make it a valid
> address. It does not guarantee that the address is even within your
> program's data area.
>
> Taking or computing the address of an element before the start of an
> array results in undefined behaviour. Even should it happen to work on
> the hardware/OS you are using it is still something that you should
> never do because one day it will stop working, perhaps catastrophically.

I can't see tenpointwo's article for some reason, so I'm piggybacking
onto Francis's followup.

The use of the phrase "hex address" suggests a misconception about
what addresses are in C. Addresses (pointer values) are not stored
in hexadecimal; they're stored as addresses, which may or may not
have some straightforward mapping to (either signed or unsigned)
integers. If you use printf("%p", (void*)ptr) to print a pointer
value, you'll get a hexadecimal representation on many (but not all)
systems, but that's just a convenience for human readability.

All stored values in C are composed of bits, and hexadecimal is a
convenient way of presenting those bits in human-readable form.

Addresses are not numbers. They're represented the same way as
integers on a lot of systems, but the C standard doesn't require
or imply that. On some systems, an address is a composite entity
consisting of, for example, a part that specifies a memory segment
and another part that indicates an offset within that segment.
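
A short sketch of working with pointers without treating them as integers: the distance between two pointers into the same array is obtained by subtracting them, which yields a count of elements (a ptrdiff_t), and a pointer is printed by converting it to void * and using %p. The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Number of elements between two pointers into the same array (or one
   past its end). No conversion to an integer type is involved. */
ptrdiff_t elems_between(const int *first, const int *last)
{
    assert(first != NULL && last != NULL);
    return last - first;
}

/* Prints a pointer in the implementation-defined %p format. */
void show_pointer(int *p)
{
    printf("%p\n", (void *)p);
}
```

The result of elems_between is in units of elements, not bytes, and it is only defined when both pointers refer to the same array object.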
