Making Fatal Hidden Assumptions

CBFalconer

Mar 6, 2006, 6:05:49 PM

We often find hidden, and totally unnecessary, assumptions being
made in code. The following leans heavily on one particular
example, which happens to be in C. However similar things can (and
do) occur in any language.

These assumptions are generally made because of familiarity with
the language. As a non-code example, consider the idea that the
faulty code is written by blackguards bent on fouling the
language. The term blackguards is not in favor these days, and for
good reason. However, the older you are, the more likely you are
to have used it since childhood, and to use it again, barring
specific thought on the subject. The same type of thing applies to
writing code.

I hope, with this little monograph, to encourage people to examine
some hidden assumptions they are making in their code. As ever, in
dealing with C, the reference standard is the ISO C standard.
Versions can be found in text and pdf format, by searching for N869
and N1124. [1] The latter does not have a text version, but is
more up-to-date.

We will always have innocent appearing code with these kinds of
assumptions built-in. However it would be wise to annotate such
code to make the assumptions explicit, which can avoid a great deal
of agony when the code is reused under other systems.

In the following example, the code is as downloaded from the
referenced URL, and the comments are entirely mine, including the
'every 5' line-number references.

/* Making fatal hidden assumptions */
/* Paul Hsieh's version of strlen.
http://www.azillionmonkeys.com/qed/asmexample.html

Some sneaky hidden assumptions here:
1. p = s - 1 is valid. Not guaranteed. Careless coding.
2. cast (int) p is meaningful. Not guaranteed.
3. Use of 2's complement arithmetic.
4. ints have no trap representations or hidden bits.
5. 4 == sizeof(int) && 8 == CHAR_BIT.
6. size_t is actually int.
7. sizeof(int) is a power of 2.
8. int alignment depends on a zeroed bit field.

Since strlen is normally supplied by the system, the system
designer can guarantee all but item 1. Otherwise this is
not portable. Item 1 can probably be beaten by suitable
code reorganization to avoid the initial p = s - 1. This
is a serious bug which, for example, can cause segfaults
on many systems. It is most likely to foul when (int)s
has the value 0, and is meaningful.

He fails to make the valid assumption: 1 == sizeof(char).
*/

#define hasNulByte(x) ((x - 0x01010101) & ~x & 0x80808080)
#define SW (sizeof (int) / sizeof (char))

int xstrlen (const char *s) {
    const char *p; /* 5 */
    int d;

    p = s - 1;
    do {
        p++; /* 10 */
        if ((((int) p) & (SW - 1)) == 0) {
            do {
                d = *((int *) p);
                p += SW;
            } while (!hasNulByte (d)); /* 15 */
            p -= SW;
        }
    } while (*p != 0);
    return p - s;
} /* 20 */

Let us start with line 1! The constants appear to require that
sizeof(int) be 4, and that CHAR_BIT be precisely 8. I haven't
really looked too closely, and it is possible that the ~x term
allows for larger sizeof(int), but nothing allows for larger
CHAR_BIT. A further hidden assumption is that there are no trap
values in the representation of an int. Its functioning is
doubtful when sizeof(int) is less than 4. At the least it will
force promotion to long, which will seriously affect the speed.

This is an ingenious and speedy way of detecting a zero byte within
an int, provided the preconditions are met. There is nothing wrong
with it, PROVIDED we know when it is valid.
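The trick is easy to try in isolation. What follows is a minimal sketch, assuming 8-bit bytes packed four to a 32-bit word; the name `has_nul_byte32` is hypothetical. It deliberately uses `uint32_t` rather than `int`: unsigned subtraction wraps modulo 2^32, so the signed-overflow question (with `int`, `x - 0x01010101` overflows for values of `x` near `INT_MIN`) and the two's complement question both go away.

```c
#include <stdint.h>

/* Nonzero if and only if at least one of the four 8-bit bytes of x is zero.
   x - 0x01010101 turns a zero byte into one with its high bit set (0xFF,
   or 0xFE after a borrow); & ~x then discards bytes whose high bit was
   already set in x, and the final mask keeps only each byte's high bit. */
static uint32_t has_nul_byte32(uint32_t x)
{
    return (x - 0x01010101u) & ~x & 0x80808080u;
}
```

The result says only *whether* a word holds a zero byte; working out *which* byte it is falls to the byte-by-byte outer loop, as in the original.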

In line 2 we have the confusing use of sizeof(char), which is 1 by
definition. This just serves to obscure the fact that SW is
actually sizeof(int) later. No hidden assumptions have been made
here, but the usage helps to conceal later assumptions.

In line 4, since this is intended to replace the system's strlen()
function, it would seem advantageous to use the appropriate
signature for the function. In particular strlen returns a size_t,
not an int. size_t is always unsigned.
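For comparison, here is a sketch of a plain byte-at-a-time version with the standard signature (the name `portable_strlen` is hypothetical). It never forms `s - 1`, never converts a pointer to an integer, and returns the non-negative pointer difference as a `size_t`, so it relies only on what the standard guarantees. It is slower than a word-at-a-time version, which is exactly the trade-off under discussion.

```c
#include <assert.h>
#include <stddef.h>

/* Same contract as strlen: s must point to a null-terminated string. */
size_t portable_strlen(const char *s)
{
    const char *p = s;          /* start at s itself, not s - 1 */

    assert(s != NULL);
    while (*p != '\0')          /* p never moves past the terminator */
        p++;
    return (size_t)(p - s);     /* p - s is a non-negative ptrdiff_t */
}
```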

In line 8 we come to a biggie. The standard gives no meaning to
computing a pointer below (before the start of) an object. The only real
purpose of this statement is to compensate for the initial
increment in line 10. This can be avoided by rearrangement of the
code, which will then let the routine function where the
assumptions are valid. This is the only real error in the code
that I see.

In line 11 we have several hidden assumptions. The first is that
the cast of a pointer to an int is valid. This is never
guaranteed. A pointer can be much larger than an int, and may have
all sorts of non-integer-like information embedded, such as a segment
id. If sizeof(int) is less than 4 the validity of this is even
less likely.

Then we come to the purpose of the statement, which is to discover
if the pointer is suitably aligned for an int. It does this by
bit-anding with SW-1, which is the concealed sizeof(int)-1. This
won't be very useful if sizeof(int) is, say, 3 or any other
non-power-of-two. In addition, it assumes that an aligned pointer
will have those bits zero. While this last is very likely in
today's systems, it is still an assumption. The system designer is
entitled to assume this, but user code is not.

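Line 13 again relies on an unwarranted cast, this time of the char
pointer to an int pointer, which is then dereferenced. That depends on
the line 11 test really having established int alignment, and on the
bytes being readable through an int lvalue, which the standard's
effective-type (aliasing) rules do not permit for a char array. The
value read then feeds the already suspicious macro hasNulByte in
line 15.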

If all these assumptions are correct, line 19 finally calculates a
pointer difference (which is valid, and of the signed type ptrdiff_t;
here it is never negative). It then does a concealed conversion of
this into an int, which gives an implementation-defined result (or
raises an implementation-defined signal) if the value exceeds what
will fit into an int. This one is also unnecessary, since it is
trivial to define the return type as size_t and guarantee success.

I haven't even mentioned the assumption of 2's complement
arithmetic, which I believe to be embedded in the hasNulByte
macro. I haven't bothered to think this out.
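Pulling these points together, one possible rearrangement is sketched below. It is not Paul Hsieh's code and not a drop-in library strlen; the name `word_strlen` is hypothetical. It starts at `s` rather than `s - 1`, returns `size_t`, uses unsigned 32-bit arithmetic, tests alignment with `%` on a `uintptr_t` (which, unlike masking with `SW - 1`, does not need a power-of-two size), and loads each word with `memcpy` instead of dereferencing an `int *`. Its remaining assumptions are stated in the comments: 8-bit `char`, an (optional) `uintptr_t` whose value reflects alignment, and, above all, that reading the whole aligned word containing the terminator is harmless even when some of its bytes lie beyond the string's array. C does not guarantee that last one, which is why code like this belongs in an implementation rather than in portable user code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if CHAR_BIT != 8
#error "word_strlen assumes 8-bit chars"
#endif

/* Nonzero iff some 8-bit byte of the 32-bit value x is zero. */
#define HAS_NUL_BYTE32(x) (((x) - 0x01010101u) & ~(x) & 0x80808080u)

size_t word_strlen(const char *s)
{
    const char *p = s;              /* no s - 1 is ever formed */
    uint32_t w;

    assert(s != NULL);

    /* Byte at a time until p is aligned for a uint32_t, so that no word
       read straddles an alignment (and, on typical systems, page) boundary.
       ASSUMPTION: (uintptr_t)p % 4 reflects p's alignment (the mapping is
       implementation-defined), and uintptr_t exists (it is optional). */
    while ((uintptr_t)p % sizeof w != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Word at a time.
       ASSUMPTION (not guaranteed by C): reading the whole aligned word that
       holds the terminator is harmless, even if some of its bytes lie past
       the end of the string's array. */
    for (;;) {
        memcpy(&w, p, sizeof w);    /* no (int *) cast, no aliasing problem */
        if (HAS_NUL_BYTE32(w))
            break;
        p += sizeof w;
    }

    /* A zero byte is somewhere in this word: find it byte by byte. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

For example, with `char buf[32] = "hello, world";`, `word_strlen(buf)` returns 12, and every word it reads lies inside `buf`.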

Would you believe that so many hidden assumptions can be embedded
in such innocent looking code? The sneaky thing is that the code
appears trivially correct at first glance. This is the stuff that
Heisenbugs are made of. Yet use of such code is fairly safe if we
are aware of those hidden assumptions.

I have cross-posted this without setting follow-ups, because I
believe that discussion will be valid in all the newsgroups posted.

[1] The draft C standards can be found at:
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/>

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

ena8...@yahoo.com

Mar 7, 2006, 3:10:31 AM

CBFalconer wrote (in part):

> In the following example, the code is as downloaded from the
> referenced URL, and the comments are entirely mine, including the
> 'every 5' line-number references.
>
> /* Making fatal hidden assumptions */
> /* Paul Hsieh's version of strlen.
> http://www.azillionmonkeys.com/qed/asmexample.html
>
> Some sneaky hidden assumptions here:
> 1. p = s - 1 is valid. Not guaranteed. Careless coding.
> 2. cast (int) p is meaningful. Not guaranteed.
> 3. Use of 2's complement arithmetic.
> 4. ints have no trap representations or hidden bits.
> 5. 4 == sizeof(int) && 8 == CHAR_BIT.
> 6. size_t is actually int.
> 7. sizeof(int) is a power of 2.
> 8. int alignment depends on a zeroed bit field.

> ...

None of these objections is warranted in the original context,
where the code is given as transliteration of some x86 assembly
language. In effect, the author is offering a function that
might be used as part of a C implementation, where none of the
usual portability considerations need apply. Your commentary
really should have included some statement along those lines.

James Dow Allen

Mar 7, 2006, 3:34:00 AM

CBFalconer wrote:
> /* Making fatal hidden assumptions */
> /* Paul Hsiehs version of strlen.

I haven't read your entire treatise, and I surely wouldn't want to
defend Mr. Hsieh (whose code, he's led us to understand, was written
by his dog), but I do have some comments.

> #define SW (sizeof (int) / sizeof (char))

The fact that sizeof(char)==1 doesn't make this line bad *if* the
programmer feels that viewing SW as chars per int is important.
I don't defend this particular silliness, but using an expression for
a simple constant is often good if it makes the constant
self-documenting.

> Some sneaky hidden assumptions here:
> 1. p = s - 1 is valid. Not guaranteed. Careless coding.

Mr. Hsieh immediately does p++ and his code will be correct if then
p == s. I don't question Chuck's argument, or whether the C standard
allows the C compiler to trash the hard disk when it sees p=s-1,
but I'm sincerely curious whether anyone knows of an *actual*
environment where p == s will ever be false after (p = s-1; p++).

Many of us fell in love with C because, in practice, it is so much
simpler and more deterministic than many languages. Many of the
discussions in comp.lang.c seem like they'd be better in a new newsgroup:
comp.lang.i'd_rather_be_a_lawyer

:-) :-)

James Dow Allen

David Brown

Mar 7, 2006, 3:49:35 AM

CBFalconer wrote:
> We often find hidden, and totally unnecessary, assumptions being
> made in code. The following leans heavily on one particular
> example, which happens to be in C. However similar things can (and
> do) occur in any language.
>
> These assumptions are generally made because of familiarity with
> the language. As a non-code example, consider the idea that the
> faulty code is written by blackguards bent on fouling the
> language. The term blackguards is not in favor these days, and for
> good reason. However, the older you are, the more likely you are
> to have used it since childhood, and to use it again, barring
> specific thought on the subject. The same type of thing applies to
> writing code.
>
> I hope, with this little monograph, to encourage people to examine
> some hidden assumptions they are making in their code. As ever, in
> dealing with C, the reference standard is the ISO C standard.
> Versions can be found in text and pdf format, by searching for N869
> and N1124. [1] The latter does not have a text version, but is
> more up-to-date.
>

Getting people to think about their code is no bad idea. I've added a
couple of comments below...

> We will always have innocent appearing code with these kinds of
> assumptions built-in. However it would be wise to annotate such
> code to make the assumptions explicit, which can avoid a great deal
> of agony when the code is reused under other systems.
>
> In the following example, the code is as downloaded from the
> referenced URL, and the comments are entirely mine, including the
> 'every 5' line-number references.
>
> /* Making fatal hidden assumptions */
> /* Paul Hsieh's version of strlen.
> http://www.azillionmonkeys.com/qed/asmexample.html
>
> Some sneaky hidden assumptions here:
> 1. p = s - 1 is valid. Not guaranteed. Careless coding.

Not guaranteed in what way? You are not guaranteed that p will be a
valid pointer, but you don't require it to be a valid pointer - all that
is required is that "p = s - 1" followed by "p++" leaves p equal to s.
I'm not good enough at the laws of C to tell you if this is valid, but
the laws of real life implementation say that it *is* valid on standard
2's complement CPUs, *assuming* you do not have any sort of saturated
arithmetic.

In other words, the assumption is valid on most systems, but may be
risky on some DSPs or on dinosaurs.

> 2. cast (int) p is meaningful. Not guaranteed.

In what way could it not be meaningful here? In particular, the code is
correct even in a 32-bit int, 64-bit pointer environment.

> 3. Use of 2's complement arithmetic.

Again, this is valid on everything bar a few dinosaurs.

> 4. ints have no trap representations or hidden bits.

Ditto.

> 5. 4 == sizeof(int) && 8 == CHAR_BIT.

This is clearly an assumption that is not valid in general (although
perfectly acceptable in the context of this webpage, which assumes an
x86 in 32-bit mode).

> 6. size_t is actually int.

No, the assumption is that the ratio of two size_t items is compatible
with int. This may or may not be valid according to the laws of C.

> 7. sizeof(int) is a power of 2.

This is implied by (5), so it's not a new assumption.

> 8. int alignment depends on a zeroed bit field.
>

I'm not quite sure what you mean here, but I think this is implied by
(3) and (4).

> Since strlen is normally supplied by the system, the system
> designer can guarantee all but item 1. Otherwise this is
> not portable. Item 1 can probably be beaten by suitable
> code reorganization to avoid the initial p = s - 1. This
> is a serious bug which, for example, can cause segfaults
> on many systems. It is most likely to foul when (int)s
> has the value 0, and is meaningful.

That's incorrect. No machine (that I can imagine) would segfault on
p = s - 1. It might segfault if p is then dereferenced, but it never is
(until it has been incremented again).

>
> He fails to make the valid assumption: 1 == sizeof(char).
> */
>

Since strlen is supplied by the system, it is reasonable to make
assumptions about the system when writing it. It is *impossible* to
write good C code for a low-level routine like this without making
assumptions. The nearest you can get to a portable solution is a
horrible mess involving pre-processor directives to generate different
code depending on the target architecture. *All* the assumptions made
here are valid in that context - the code is good.

As a general point, however, it is important to comment such routines
with their assumptions, and possibly use pre-processor directives to
give an error if the assumptions are not valid. In particular, this
code assumes a 32-bit int, 8-bit char model, and an otherwise standard CPU.
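One way to act on that advice, sketched under the assumption that the routine needs exactly 8-bit `char` and a 4-byte, 32-bit `unsigned int` (as the `hasNulByte` constants suggest), is to turn the assumptions into build-time checks. The `#if` tests work in C89; `_Static_assert` needs C11 and is there because the preprocessor cannot evaluate `sizeof`.

```c
#include <limits.h>

/* Stop the build, rather than misbehave at run time, if the platform
   does not match what the word-at-a-time code assumes. */
#if CHAR_BIT != 8
#error "this strlen assumes 8-bit chars"
#endif

#if UINT_MAX != 0xFFFFFFFF
#error "this strlen assumes a 32-bit unsigned int"
#endif

/* C11: the preprocessor cannot see sizeof, so check the object size here. */
_Static_assert(sizeof(unsigned int) == 4,
               "this strlen assumes sizeof(unsigned int) == 4");
```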

>
> Would you believe that so many hidden assumptions can be embedded
> in such innocent looking code? The sneaky thing is that the code
> appears trivially correct at first glance. This is the stuff that
> Heisenbugs are made of. Yet use of such code is fairly safe if we
> are aware of those hidden assumptions.
>
> I have cross-posted this without setting follow-ups, because I
> believe that discussion will be valid in all the newsgroups posted.
>
> [1] The draft C standards can be found at:
> <http://www.open-std.org/jtc1/sc22/wg14/www/docs/>
>

I'm following this in comp.arch.embedded, where we see a lot of
different target architectures, and where efficient code is perhaps more
important than in the world of "big" computers (at least, a higher
proportion of "big" computer programmers seem to forget about
efficiency...). There are two things about such target-specific
assumptions, in embedded programming - you should make such assumptions,
for the sake of good code, and you must know what they are, for the sake
of correct and re-usable code.

Flash Gordon

Mar 7, 2006, 4:18:25 AM

ena8...@yahoo.com wrote:
> CBFalconer wrote (in part):
>> In the following example, the code is as downloaded from the
>> referenced URL, and the comments are entirely mine, including the
>> 'every 5' line-number references.
>>
>> /* Making fatal hidden assumptions */
>> /* Paul Hsieh's version of strlen.
>> http://www.azillionmonkeys.com/qed/asmexample.html
>>
>> Some sneaky hidden assumptions here:
>> 1. p = s - 1 is valid. Not guaranteed. Careless coding.
>> 2. cast (int) p is meaningful. Not guaranteed.

Mentioned by Paul.

>> 3. Use of 2's complement arithmetic.
>> 4. ints have no trap representations or hidden bits.
>> 5. 4 == sizeof(int) && 8 == CHAR_BIT.

Paul says it would need to be taken care of.

>> 6. size_t is actually int.
>> 7. sizeof(int) is a power of 2.
>> 8. int alignment depends on a zeroed bit field.
>> ...
>
> None of these objections is warranted in the original context,
> where the code is given as transliteration of some x86 assembly
> language. In effect, the author is offering a function that
> might be used as part of a C implementation, where none of the
> usual portability considerations need apply. Your commentary
> really should have included some statement along those lines.

Much as I dislike some of what Paul posts, on that site Paul explicitly
states in the text that the code is non-portable and mentions at least a
couple of issues.

Personally, I would not expect a function documented as being
non-portable and optimised for a specific processor to specify every
single assumption.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

pete

Mar 7, 2006, 5:31:09 AM

David Brown wrote:
>
> CBFalconer wrote:

> > http://www.azillionmonkeys.com/qed/asmexample.html
> >
> > Some sneaky hidden assumptions here:
> > 1. p = s - 1 is valid. Not guaranteed. Careless coding.
>
> Not guaranteed in what way? You are not guaranteed that p will be a
> valid pointer, but you don't require it to be a valid pointer
> - all that
> is required is that "p = s - 1" followed by "p++" leaves p equal to s.
> I'm not good enough at the laws of C to tell you if this is valid,

Merely subtracting 1 from s renders the entire code undefined.
You're "off the map" as far as the laws of C are concerned.
On comp.lang.c,
we're mostly interested in what the laws of C *do* say
is guaranteed to work.

--
pete

pete

Mar 7, 2006, 5:36:14 AM

James Dow Allen wrote:

>
> CBFalconer wrote:

> > Some sneaky hidden assumptions here:
> > 1. p = s - 1 is valid. Not guaranteed. Careless coding.
>
> Mr. Hsieh immediately does p++ and his code will be correct if then
> p == s. I don't question Chuck's argument, or whether the C standard
> allows the C compiler to trash the hard disk when it sees p=s-1,
> but I'm sincerely curious whether anyone knows of an *actual*
> environment where p == s will ever be false after (p = s-1; p++).
>
> Many of us fell in love with C because, in practice, it is so much
> simpler and more deterministic than many languages.
> Many of the discussions in
> comp.lang.c seem like they'd be better in a new newsgroup:
> comp.lang.i'd_rather_be_a_lawyer

If you don't want to know
whether or not (s - 1) causes undefined behavior,
then don't let it bother you.
You can write code any way you want.
But, if you want to discuss whether or not
(s - 1) causes undefined behavior, it does,
and this is the place to find out about it.

--
pete

Richard Heathfield

Mar 7, 2006, 6:11:10 AM

James Dow Allen said:

>
> CBFalconer wrote:
>
>> #define SW (sizeof (int) / sizeof (char))
>
> The fact that sizeof(char)==1 doesn't make this line bad *if* the
> programmer feels that viewing SW as chars per int is important.
> I don't defend this particular silliness, but using an expression for
> a simple constant is often good if it makes the constant
> self-documenting.

Specifically, there is a good argument for char *p = malloc(n * sizeof *p).
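The point of that form is that the element size is taken from the pointer itself rather than from a repeated type name, so the allocation stays correct if the declaration changes; for `char` the factor is 1, but the habit is the same. A small sketch with a hypothetical helper `copy_ints`:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'ed copy of the n ints at src, or
   NULL if the allocation fails (malloc(0) may also return NULL).
   Assumes n * sizeof *p does not overflow. The caller frees the result. */
int *copy_ints(const int *src, size_t n)
{
    int *p = malloc(n * sizeof *p);   /* element size taken from p itself */

    if (p != NULL)
        memcpy(p, src, n * sizeof *p);
    return p;
}
```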

>
>> Some sneaky hidden assumptions here:
>> 1. p = s - 1 is valid. Not guaranteed. Careless coding.
>
> Mr. Hsieh immediately does p++ and his code will be correct if then
> p == s.

Not so, since forming an invalid pointer invokes undefined behaviour.

> I don't question Chuck's argument, or whether the C standard
> allows the C compiler to trash the hard disk when it sees p=s-1,

It does.

> but I'm sincerely curious whether anyone knows of an *actual*
> environment where p == s will ever be false after (p = s-1; p++).

Since it won't compile, the question is academic. But assuming you meant to
have a test in there somewhere, my personal answer would be: no, I don't
know of such an environment off the top of my head, but there are some
weird environments out there (the comp.lang.c FAQ lists a few), and it
would certainly not surprise me to learn of such an environment. In any
case, such an environment could become mainstream next week, or next year,
or next decade. Carefully-written code that observes the rules will
continue to work in such environments.

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Gerry Quinn

Mar 7, 2006, 7:26:28 AM

In article <440CC04D...@yahoo.com>, cbfal...@yahoo.com says...

> These assumptions are generally made because of familiarity with
> the language. As a non-code example, consider the idea that the
> faulty code is written by blackguards bent on fouling the
> language. The term blackguards is not in favor these days, and for
> good reason.

About as good a reason as the term niggardly, as far as I can tell.
Perhaps the words are appropriate in a post relating to fatal
assumptions.

- Gerry Quinn

Ben Bacarisse

Mar 7, 2006, 10:38:18 AM

It seems to me ironic that, in a discussion about hidden assumptions, the
truth of this remark requires a hidden assumption about how the function
is called. Unless I am missing something big, p = s - 1 is fine unless s
points to the first element of an array (or worse)[1]. One can imagine
situations such as "string pool" implementations where all strings are
part of the same array where p = s - 1 is well defined. (You'd have to
take care with the first string: char pool[BIG_NUMBER]; char *pool_start =
pool + 1;).
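Ben's string-pool case can be sketched concretely. Assuming a single static pool (the names `POOL_SIZE`, `pool`, `pool_used` and `pool_add` are hypothetical), every string is stored inside one `char` array whose first byte is never handed out, so for any string `s` returned by `pool_add`, `s - 1` points into the same array and is well defined. This only illustrates Ben's point; it does not make `p = s - 1` safe in a general-purpose strlen, which cannot know where its argument came from.

```c
#include <stddef.h>
#include <string.h>

#define POOL_SIZE 256

static char pool[POOL_SIZE];
static size_t pool_used = 1;      /* pool[0] is a spare byte, never handed out */

/* Copies str into the pool and returns a pointer to the copy,
   or NULL if there is no room. */
const char *pool_add(const char *str)
{
    size_t len = strlen(str) + 1; /* include the terminator */
    char *dst;

    if (len > POOL_SIZE - pool_used)
        return NULL;
    dst = pool + pool_used;
    memcpy(dst, str, len);
    pool_used += len;
    return dst;                   /* dst - 1 is still inside pool */
}
```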

[1] Since this is a language law discussion, we need to take the
definition from the standard where T *s is deemed to point to an array of
size one if it points to an object of type T. By "worse" I mean that s
does not point into (or just past) an array at all.

--
Ben.

Paul Burke

Mar 7, 2006, 11:45:06 AM

Ben Bacarisse wrote:

> p = s - 1 is fine unless s
> points to the first element of an array (or worse)[1].

My simple mind must be missing something big here. If for pointer p,
(p-1) is deprecated because it's not guaranteed that it points to
anything sensible, why is p++ OK? There's no boundary checking in C
(unless you put it in).

Quote from p98 of K&R 2nd (ANSI) edition: "If pa points to a particular
element of an array, then by definition pa+1 points to the next element,
pa+i points to i elements after pa, and pa-i points to i elements before."

Paul Burke

Arthur J. O'Dwyer

Mar 7, 2006, 12:28:37 PM

On Tue, 7 Mar 2006, Paul Burke wrote:
> Ben Bacarisse wrote:
>>
>> p = s - 1 is fine unless s
>> points to the first element of an array (or worse)[1].
>
> My simple mind must be missing something big here. If for pointer p, (p-1)
> is deprecated because it's not guaranteed that it points to anything
> sensible, why is p++ OK? There's no boundary checking in C (unless you
> put it in).

(BTW, "deprecated," in the context of programming-language standards,
means something that once was okay but is not recommended anymore. (p-1)
isn't like that.) Read on for your answer.

> Quote from p98 of K&R 2nd (ANSI) edition: "If pa points to a particular
> element of an array, then by definition pa+1 points to the next element,
> pa+i points to i elements after pa, and pa-i points to i elements before."

K&R answers your question. If pa points to some element of an array,
then pa-1 points to the /previous element/. But what's the "previous
element" relative to the first element in the array? It doesn't exist.
So we have undefined behavior.
The expression pa+1 is similar, but with one special case. If pa points
to the last element in the array, you might expect that pa+1 would be
undefined; but actually the C standard specifically allows you to evaluate
pa+1 in that case. Dereferencing that pointer, or incrementing it /again/,
however, invoke undefined behavior.

Basically: A C pointer must always point to something. "The
negative-oneth element of array a" is not "something."

int a[10];
int *pa = a+3;
pa-3; /* fine; points to a[0] */
pa+6; /* fine; points to a[9] */
pa+7; /* fine; points "after" a[9] */
pa+8; /* undefined behavior */
pa-4; /* undefined behavior */
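In practice, `p = s - 1` tends to appear when code pre-increments or walks downward. A backward walk can avoid it by starting one past the end (which is allowed) and decrementing only after checking that the pointer has not yet reached the start. A minimal sketch, with a hypothetical `sum_backwards`:

```c
#include <stddef.h>

/* Hypothetical example: sums a[0..n-1], walking from the end to the start.
   p starts one past the end (a valid pointer value) and is decremented
   only while it is still above a, so a - 1 is never computed. */
long sum_backwards(const int *a, size_t n)
{
    const int *p = a + n;
    long total = 0;

    while (p != a) {
        --p;                    /* p now points at a real element */
        total += *p;
    }
    return total;
}
```

The same shape works with indices: `for (size_t i = n; i-- > 0; )` visits n-1 down to 0 without ever needing index -1.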

HTH,
-Arthur

Ben Pfaff

Mar 7, 2006, 12:21:21 PM

Paul Burke <pa...@scazon.com> writes:

> My simple mind must be missing something big here. If for pointer p,
> (p-1) is deprecated because it's not guaranteed that it points to
> anything sensible, why is p++ OK? There's no boundary checking in C
> (unless you put it in).

You're missing what the standard says about it (C99 6.5.6, paragraph 8):

8 When an expression that has integer type is added to or
subtracted from a pointer, the result has the type of the
pointer operand. If the pointer operand points to an element
of an array object, and the array is large enough, the
result points to an element offset from the original element
such that the difference of the subscripts of the resulting
and original array elements equals the integer expression.
In other words, if the expression P points to the i-th
element of an array object, the expressions (P)+N
(equivalently, N+(P)) and (P)-N (where N has the value n)
point to, respectively, the i+n-th and i-n-th elements of
the array object, provided they exist. Moreover, if the
expression P points to the last element of an array object,
the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the
last element of an array object, the expression (Q)-1 points
to the last element of the array object. If both the pointer
operand and the result point to elements of the same array
object, or one past the last element of the array object,
the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the
last element of the array object, it shall not be used as
the operand of a unary * operator that is evaluated.

(Wow, that's a mouthful.)
--
"To get the best out of this book, I strongly recommend that you read it."
--Richard Heathfield

Eric Sosman

Mar 7, 2006, 12:29:11 PM

Paul Burke wrote On 03/07/06 11:45,:

There's a special dispensation that allows you to
calculate a pointer value that points to the imaginary
element "one position past the end" of an array. (You
cannot, of course, actually try to access the imaginary
element -- but you can form the pointer value, compare
it to other pointer values, and so on.)

There is no such special dispensation for an element
"one position before the beginning."

The Rationale says the Committee considered defining
the effects at both ends (bilateral dispensations?), but
rejected it for efficiency reasons. Consider an array of
large elements -- structs of 32KB size, say. A system
that actually performed hardware checking of pointer values
could accommodate the one-past-the-end rule by allocating
just one extra byte after the end of the array, a byte that
the special pointer value could point at without setting off
the hardware's alarms. But one-before-the-beginning would
require an extra 32KB, just to hold data that could never
be used ...

--
Eric....@sun.com

Ben Bacarisse

Mar 7, 2006, 12:40:11 PM

On Tue, 07 Mar 2006 16:45:06 +0000, Paul Burke wrote:

> Ben Bacarisse wrote:
>
>> p = s - 1 is fine unless s
>> points to the first element of an array (or worse)[1].
>
> My simple mind must be missing something big here. If for pointer p, (p-1)
> is deprecated because it's not guaranteed that it points to anything
> sensible, why is p++ OK? There's no boundary checking in C (unless you put
> it in).

As your (snipped) quote from K&R illustrates, pointer arithmetic is
defined only "within" arrays. "Within", includes a pointer that points
just after the last element. This is very handy, since pointer
expressions like end - start are a common idiom. When a pointer, p,
points to the first element of an array, p - 1 is not defined (indeed it
may not be representable given a devious enough, but conforming,
implementation). p - 1 is not deprecated at all (so far as I know). It
is either perfectly valid or entirely undefined. p + 1 is well-defined if
p points to any array element (including the last). p + 2 is not defined
if p points to the last element of an array.
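The one-past-the-end dispensation is what makes half-open ranges work in C. In this sketch (the function `count_char` is hypothetical), `end` may be formed and compared against but is never dereferenced; the loop stops as soon as `p` reaches it.

```c
#include <stddef.h>

/* Hypothetical example: counts occurrences of c in the half-open range
   [start, end). end may point one past the last element of an array;
   it is compared against but never dereferenced. */
size_t count_char(const char *start, const char *end, char c)
{
    size_t count = 0;
    const char *p;

    for (p = start; p != end; p++)
        if (*p == c)
            count++;
    return count;
}
```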

For the purpose of arithmetic, pointers to single data objects are treated
like pointers to arrays of size 1. So we have:

int x, a[2];

int *p = &x + 1; /* defined */
int *q = &x - 1; /* undefined */
int *r = &x + 2; /* undefined */

int *s = a; /* defined */
int *t = a - 1; /* undefined */
int *u = a + 1; /* defined */
int *v = a + 2; /* defined */
int *w = a + 3; /* undefined */

I hope I have not missed what you were talking about.

--
Ben.

Peter Harrison

Mar 7, 2006, 1:11:24 PM

"Ben Pfaff" <b...@cs.stanford.edu> wrote in message
news:87acc2j...@benpfaff.org...

> Paul Burke <pa...@scazon.com> writes:
>
>> My simple mind must be missing something big here. If for pointer p,
>> (p-1) is deprecated because it's not guaranteed that it points to
>> anything sensible, why is p++ OK? There's no boundary checking in C
>> (unless you put it in).
>

It seems I too have a simple mind. I read the recent replies to this and
found myself not sure I am better off.

This is what I think I understand:

int x;
int *p;
int *q;

p = &x; /* is OK */
p = &x + 1; /* is OK even though we have no idea what p points to */
p = &x + 6; /* is undefined - does this mean that p may not be the */
/* address six locations beyond x? */
/* or just that we don't know what is there? */
p = &x - 1; /* as previous */

But the poster was comparing with p++, so...

p = &x; /* so far so good */
p++; /* still ok (?) but we don't know what is there */
p++; /* is this now undefined? */

I guess _my_ question is - in this context does 'undefined' mean just that
we cannot say anything about what the pointer points to or that we cannot
say anything about the value of the pointer. So for example:

p = &x;
q = &x;
p = p+8;
q = q+8;

should p and q have the same value, or is that undefined?

Pete Harrison

Walter Roberson

Mar 7, 2006, 1:26:52 PM

In article <Pine.LNX.4.60-041....@unix43.andrew.cmu.edu>,
Arthur J. O'Dwyer <ajon...@andrew.cmu.edu> wrote:
> K&R answers your question. If pa points to some element of an array,
>then pa-1 points to the /previous element/. But what's the "previous
>element" relative to the first element in the array? It doesn't exist.
>So we have undefined behavior.
> The expression pa+1 is similar, but with one special case. If pa points
>to the last element in the array, you might expect that pa+1 would be
>undefined; but actually the C standard specifically allows you to evaluate
>pa+1 in that case. Dereferencing that pointer, or incrementing it /again/,
>however, invoke undefined behavior.

Right.

These limitations may most easily be understood with
reference to the VAX "descriptor" model of pointers, in which a pointer
did not refer -directly- to the target memory, but instead referred to
a -description- of that memory, including sizes and basic types
and current offset. In that scheme, with pa a pointer, pa-1 involves
internal work to produce the descriptor with the appropriate sizes and
offsets -- and at the time the relative offset was calculated, it would
be compared to the known bounds, and an exception could occur *then*
[when the pointer was built] rather than at the time the pointer was used.

Other circumstances where it could make a difference include cases in
which pa points to a block of memory at the very beginning of a memory
segment (on a segmented architecture). The calculation of the value of
pa-1 could then proceed in several ways:

- by exception (because the system notes the attempt to point before
the segment beginning)

- by wrapping the relative segment offset field to its maximum value
(which might or might not trigger odd behaviours)

- by wrapping the relative segment offset field to its maximum value -and-
decrementing the field that holds the address register number that
holds the base virtual memory (thus effectively pointing into
a completely -different- block of memory, which might or might not
trigger odd behaviours)

- by noticing that the segment descriptor is not suitable for the
pointer and building a new segment descriptor that covers the extended
range (which might use up scarce segment descriptors unnecessarily)

- by exception (because the system notes that the new memory
address is not one that the user has access rights to)

There are undoubtedly other situations, but the point remains that
creating a pointer to "before" an object is not certain to work,
even if that pointer is never dereferenced.
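To make the second case above concrete, here is a toy model, assuming a hypothetical machine whose pointers are a 16-bit segment selector plus a 16-bit offset and whose pointer arithmetic touches only the offset. It does not describe the VAX or any real segmented CPU; the names seg_ptr, seg_add and seg_less are invented for illustration. It only shows how computing pa-1 at offset 0 can yield a "pointer" that compares greater than pa.

```c
#include <stdint.h>

/* Hypothetical segmented pointer: a segment selector plus a 16-bit
   offset within that segment. Illustrative only. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} seg_ptr;

/* Pointer arithmetic on this toy machine adjusts only the offset,
   which wraps modulo 2^16; the segment field is left untouched. */
seg_ptr seg_add(seg_ptr p, int32_t delta)
{
    p.offset = (uint16_t)((int64_t)p.offset + delta);
    return p;
}

/* Relational comparison within one segment looks only at the offset. */
int seg_less(seg_ptr a, seg_ptr b)
{
    return a.offset < b.offset;
}
```

With base = {7, 0}, seg_add(base, -1) has offset 0xFFFF, so seg_less(before, base) is false even though "before" was computed as base minus one.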
--
Okay, buzzwords only. Two syllables, tops. -- Laurie Anderson

Robin Haigh

Mar 7, 2006, 1:56:34 PM

"Peter Harrison" <pe...@cannock.ac.uk> wrote in message
news:dukicm$qhl$1...@south.jnrs.ja.net...

Basically you're supposed to assume that the arithmetic can't be done if the
result (or an intermediate value) goes out of range. As if it were a
trapped overflow. The behaviour is undefined as soon as you evaluate the
r.h.s, so you don't even know that the assignment to p or q actually takes
place.
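One way to respect that rule, sketched here as a hypothetical helper (checked_advance is not a standard function), is to do the range check on integer offsets and form the new pointer only when it lands in [base, base + n], the range C permits for an n-element array (base + n being the one-past-the-end value). It assumes p already lies in that range.

```c
#include <stddef.h>

/* Advance p by delta elements within base[0..n-1]. Returns NULL instead
   of forming a pointer outside [base, base + n]. Assumes p itself is
   already within that range. */
const int *checked_advance(const int *base, size_t n,
                           const int *p, ptrdiff_t delta)
{
    size_t cur = (size_t)(p - base);      /* defined: p is within the array */
    if (delta < 0) {
        /* magnitude of delta, computed without overflow even for PTRDIFF_MIN */
        size_t back = (size_t)(-(delta + 1)) + 1;
        if (back > cur)
            return NULL;                  /* would point before base */
        return base + (cur - back);
    }
    if ((size_t)delta > n - cur)
        return NULL;                      /* would point past base + n */
    return base + (cur + (size_t)delta);
}
```

For an int a[4], checked_advance(a, 4, a, 4) yields the one-past-the-end pointer a + 4, while a delta of -1 or 5 yields NULL without ever computing the out-of-range address.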

--
RSH

Keith Thompson

Mar 7, 2006, 2:34:02 PM

"Peter Harrison" <pe...@cannock.ac.uk> writes:
[...]

> This is what I think I understand:
>
> int x;
> int *p;
> int *q;
>
> p = &x; /* is OK */
> p = &x + 1; /* is OK even though we have no idea what p points to */
> p = &x + 6; /* is undefined - does this mean that p may not be the */
> /* address six locations beyond x? */
> /* or just that we don't know what is there? */

[...]

> p = &x - 1; /* as previous */
>
> But the poster was comparing with p++, so...
>
> p = &x; /* so far so good */
> p++; /* still ok (?) but we don't know what is there */
> p++; /* is this now undefined? */
>
> I guess _my_ question is - in this context does 'undefined' mean just that
> we cannot say anything about what the pointer points to or that we cannot
> say anything about the value of the pointer. So for example:
>
> p = &x;
> q = &x;
> p = p+8;
> q = q+8;
>
> Should p and q have the same value, or is that undefined?

"Undefined" means far more than that. It's not an undefined *value*,
it's undefined *behavior*, which the standard defines as:

behavior, upon use of a nonportable or erroneous program construct
or of erroneous data, for which this International Standard
imposes no requirements

NOTE Possible undefined behavior ranges from ignoring the
situation completely with unpredictable results, to behaving
during translation or program execution in a documented manner
characteristic of the environment (with or without the issuance of
a diagnostic message), to terminating a translation or execution
(with the issuance of a diagnostic message).

Once undefined behavior has occurred, it's meaningless to talk about
the value of p. If your program or your computer crashes, p doesn't
have a value; if demons start flying out of your nose (in accordance
with the standard joke around here), you'll have more important things
to worry about.
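For the single-int case in Peter's question, the defined operations can be exercised directly. C99 section 6.5.6 says a pointer to an object that is not an array element behaves like a pointer to the first element of a one-element array, so &x + 1 is a valid one-past-the-end pointer that may be computed and compared but not dereferenced. The undefined cases (&x + 6, &x - 1) appear below only as comments. The function name is illustrative.

```c
#include <stddef.h>

int one_past_end_demo(void)
{
    int x = 42;
    int *p = &x;          /* points to x */
    int *end = &x + 1;    /* defined: one past the end of a 1-element "array" */
    /* int *far = &x + 6;     undefined: beyond one past the end */
    /* int *before = &x - 1;  undefined: before the start */
    /* *end = 0;              undefined: dereferencing one past the end */
    ptrdiff_t dist = end - p;   /* defined: both refer to the same object */
    return p < end && dist == 1;
}
```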

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
We must do something. This is something. Therefore, we must do this.

Andrew Reilly

Mar 7, 2006, 3:57:39 PM

It's precisely this sort of tomfoolery on the part of the C standards
committee that has brought the language into such ill-repute in recent
years. It's practically unworkable now, compared to how it was in (say)
the immediately post-ANSI-fication years.

The code in question could trivially have p replaced by (p+p_index)
everywhere, where p_index is an int, and all of the arithmetic currently
effected on p is instead effected on p_index. That is, p would be set to s,
p_index set to -1, and p_index++ would appear as the first statement inside
the outer do loop.

So having the standard make the semantic equivalent "undefined" only
serves to make the standard itself ever more pointless.
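A minimal sketch of the index-based rewrite described above, assuming a plain byte-at-a-time loop rather than Hsieh's actual word-at-a-time strlen (which is not reproduced here). The integer idx starts at -1, mirroring p = s - 1, but s[idx] is only evaluated after the first increment, so no pointer before s is ever formed. The function name is hypothetical.

```c
#include <stddef.h>

size_t strlen_by_index(const char *s)
{
    ptrdiff_t idx = -1;          /* plays the role of p = s - 1 */
    do {
        idx++;                   /* plays the role of p++ */
    } while (s[idx] != '\0');    /* only evaluated with idx >= 0 */
    return (size_t)idx;
}
```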

Bah, humbug. Think I'll go back to assembly language, where pointers do
what you tell them to, and don't complain about it to their lawyers.

--
Andrew

Andrew Reilly

Mar 7, 2006, 4:05:47 PM

On Tue, 07 Mar 2006 13:28:37 -0500, Arthur J. O'Dwyer wrote:
> K&R answers your question. If pa points to some element of an array,
> then pa-1 points to the /previous element/. But what's the "previous
> element" relative to the first element in the array? It doesn't exist. So
> we have undefined behavior.

Only because the standard says so. Didn't have to be that way. There are
plenty of logically correct algorithms that could exist that involve
pointers that point somewhere outside of a[0..N]. As long as there's no
de-referencing, no harm, no foul. (Consider the simple case of iterating
through an array at a non-unit stride, using the normal p < s + N
termination condition. The loop finishes with p > s + N and the standard
says "pow, you're dead", when the semantically identical code written with
integer indexes has no legal problems.)
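For comparison, a sketch of the integer-index form of the non-unit-stride loop mentioned above (function name hypothetical). The final value of i may exceed n, but it is only an integer; a[i] is evaluated only while i < n, so no out-of-range pointer is formed. It assumes stride > 0 and that n + stride does not wrap around SIZE_MAX.

```c
#include <stddef.h>

/* Sum every stride-th element of a[0..n-1]. */
long sum_every_nth(const int *a, size_t n, size_t stride)
{
    long total = 0;
    for (size_t i = 0; i < n; i += stride)
        total += a[i];
    return total;
}
```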

> The expression pa+1 is similar, but with one special case. If pa points
> to the last element in the array, you might expect that pa+1 would be
> undefined; but actually the C standard specifically allows you to
> evaluate pa+1 in that case. Dereferencing that pointer, or incrementing
> it /again/, however, invoke undefined behavior.
>
> Basically: A C pointer must always point to something. "The
> negative-oneth element of array a" is not "something."

Only because the standard says so. The standard is stupid, in that
respect.

--
Andrew

Andrew Reilly

Mar 7, 2006, 4:14:37 PM

On Tue, 07 Mar 2006 13:29:11 -0500, Eric Sosman wrote:
> The Rationale says the Committee considered defining
> the effects at both ends (bilateral dispensations?), but rejected it for
> efficiency reasons. Consider an array of large elements -- structs of
> 32KB size, say. A system that actually performed hardware checking of
> pointer values could accommodate the one-past-the-end rule by allocating
> just one extra byte after the end of the array, a byte that the special
> pointer value could point at without setting off the hardware's alarms.
> But one-before-the-beginning would require an extra 32KB, just to hold
> data that could never be used ...

So the standards body broke decades of practice and perfectly safe and
reasonable code to support a *hypothetical* implementation that was so
stupid that it checked pointer values, rather than pointer *use*?
Amazing.

--
Andrew

Ben Pfaff

Mar 7, 2006, 4:10:38 PM

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:

> It's precisely this sort of tomfoolery on the part of the C standards
> committee that has brought the language into such ill-repute in recent
> years. It's practically unworkable now, compared to how it was in (say)
> the immediately post-ANSI-fication years.

The semantics you're complaining about (one-past-the-end) didn't
change in important ways from C89 to C99. I don't know why you'd
think the C99 semantics are unreasonable if you didn't think the
C89 semantics were unreasonable.
--
int main(void){char p[]="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz.\
\n",*q="kl BIcNBFr.NKEzjwCIxNJC";int i=sizeof p/2;char *strchr();int putchar(\
);while(*q){i+=strchr(p,*q++)-p;if(i>=(int)sizeof p)i-=sizeof p-1;putchar(p[i]\
);}return 0;}

Keith Thompson

Mar 7, 2006, 4:26:57 PM

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
> On Tue, 07 Mar 2006 13:28:37 -0500, Arthur J. O'Dwyer wrote:
>> K&R answers your question. If pa points to some element of an array,
>> then pa-1 points to the /previous element/. But what's the "previous
>> element" relative to the first element in the array? It doesn't exist. So
>> we have undefined behavior.
>
> Only because the standard says so. Didn't have to be that way. There are
> plenty of logically correct algorithms that could exist that involve
> pointers that point somewhere outside of a[0..N]. As long as there's no
> de-referencing, no harm, no foul. (Consider the simple case of iterating
> through an array at a non-unit stride, using the normal p < s + N
> termination condition. The loop finishes with p > s + N and the standard
> says "pow, you're dead", when the semantically identical code written with
> integer indexes has no legal problems.)

The standard is specifically designed to allow for architectures where
constructing an invalid pointer value can cause a trap even if the
pointer is not dereferenced.

For example, given:

int n;
int *ptr1 = &n; /* ptr1 points to n */
int *ptr2 = ptr1 - 1; /* ptr2 points *before* n in memory; this
invokes undefined behavior. */

Suppose some CPU uses special instructions and registers for pointer
values. Suppose n happens to be allocated at the very beginning of
a memory segment. Just constructing the value ptr1-1 could cause a
trap of some sort. Or it could quietly yield a value such that
ptr2+1 != ptr1.

By saying that this is undefined behavior, the C standard isn't
forbidding you to do it; it's just refusing to tell you how it
behaves. If you're using an implementation that guarantees that this
will work the way you want it to, and if you're not concerned about
portability to other implementations, there's nothing stopping you
from doing it.

On the other hand, if the standard had defined the behavior of this
construct, it would have required all implementations to support it.
On a system that does strong checking of all pointer values, a C
compiler might have to generate inefficient code to meet the
standard's requirements.

Ben Pfaff

Mar 7, 2006, 4:37:27 PM

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:

> So the standards body broke decades of practice and perfectly safe and
> reasonable code to support a *hypothetical* implementation that was so
> stupid that it checked pointer values, rather than pointer *use*?

You can still write your code to make whatever assumptions you
like. You just can't assume that it will work portably. If, for
example, you are writing code for a particular embedded
architecture with a given compiler, then it may be reasonable to
make assumptions beyond those granted by the standard.

In other words, the standard provides minimum guarantees. Your
implementation may provide stronger ones.
--
"It would be a much better example of undefined behavior
if the behavior were undefined."
--Michael Rubenstein

Andrew Reilly

Mar 7, 2006, 4:41:32 PM

On Tue, 07 Mar 2006 22:26:57 +0000, Keith Thompson wrote:

> Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
>> On Tue, 07 Mar 2006 13:28:37 -0500, Arthur J. O'Dwyer wrote:
>>> K&R answers your question. If pa points to some element of an array,
>>> then pa-1 points to the /previous element/. But what's the "previous
>>> element" relative to the first element in the array? It doesn't exist.
>>> So we have undefined behavior.
>>
>> Only because the standard says so. Didn't have to be that way. There
>> are plenty of logically correct algorithms that could exist that involve
>> pointers that point somewhere outside of a[0..N]. As long as there's no
>> de-referencing, no harm, no foul. (Consider the simple case of
>> iterating through an array at a non-unit stride, using the normal p < s
>> + N termination condition. The loop finishes with p > s + N and the
>> standard says "pow, you're dead", when the semantically identical code
>> written with integer indexes has no legal problems.)
>
> The standard is specifically designed to allow for architectures where
> constructing an invalid pointer value can cause a trap even if the pointer
> is not dereferenced.

And are there any? Any in common use? Any where the equivalent (well
defined) pointer+offset code would be slower?

I'll need that list to know which suppliers to avoid...

> For example, given:
>
> int n;
> int *ptr1 = &n; /* ptr1 points to n */
> int *ptr2 = ptr1 - 1; /* ptr2 points *before* n in memory; this
> invokes undefined behavior. */
>
> Suppose some CPU uses special instructions and registers for pointer
> values. Suppose n happens to be allocated at the very beginning of a
> memory segment. Just constructing the value ptr1-1 could cause a trap
> of some sort. Or it could quietly yield a value such that ptr2+1 !=
> ptr1.

Suppose the computer uses tribits.

Standards are meant to codify common practice. If you want a language
that only has object references and array indices, there are plenty of
those to choose from.

> By saying that this is undefined behavior, the C standard isn't
> forbidding you to do it; it's just refusing to tell you how it behaves.

And that helps who?

> If you're using an implementation that guarantees that this will work
> the way you want it to, and if you're not concerned about portability to
> other implementations, there's nothing stopping you from doing it.

Which implementations?

> On the other hand, if the standard had defined the behavior of this
> construct, it would have required all implementations to support it. On
> a system that does strong checking of all pointer values, a C compiler
> might have to generate inefficient code to meet the standard's
> requirements.

That would be a *good* thing. Checking any earlier than at reference time
breaks what it is about C that makes it C.

OK, I've made enough of a fool of myself already. I'll go and have that
second cup of coffee for the morning, before I start going on about having
the standard support non-2's complement integers, or machines that have no
arithmetic right shifts...

--
Andrew

Jordan Abel

Mar 7, 2006, 4:46:14 PM

On 2006-03-07, Andrew Reilly <andrew-...@areilly.bpc-users.org> wrote:
> OK, I've made enough of a fool of myself already. I'll go and have that
> second cup of coffee for the morning, before I start going on about having
> the standard support non-2's complement integers, or machines that have no
> arithmetic right shifts...

Arithmetic right shift isn't particularly useful even on some machines that
do have it, notably two's-complement ones.

Al Balmer

Mar 7, 2006, 4:52:59 PM

On Wed, 08 Mar 2006 07:57:39 +1100, Andrew Reilly
<andrew-...@areilly.bpc-users.org> wrote:

>It's precisely this sort of tomfoolery on the part of the C standards
>committee that has brought the language into such ill-repute in recent
>years. It's practically unworkable now, compared to how it was in (say)
>the immediately post-ANSI-fication years.

Well, shucks, I manage to make it work pretty well most every day.
Does that mean I'm in ill-repute too?

--
Al Balmer
Sun City, AZ

Al Balmer

Mar 7, 2006, 4:59:37 PM

Decades of use? This isn't a new rule.

An implementation might choose, for valid reasons, to prefetch the
data that pointer is pointing to. If it's in a segment not allocated
...

Ben Pfaff

Mar 7, 2006, 4:54:30 PM

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:

> On Tue, 07 Mar 2006 22:26:57 +0000, Keith Thompson wrote:
>
>> The standard is specifically designed to allow for architectures where
>> constructing an invalid pointer value can cause a trap even if the pointer
>> is not dereferenced.
>
> And are there any? Any in common use?

x86 in non-flat protected mode would be one example. Attempting
to load an invalid value into a segment register causes a fault.
--
"Given that computing power increases exponentially with time,
algorithms with exponential or better O-notations
are actually linear with a large constant."
--Mike Lee

Andrew Reilly

Mar 7, 2006, 5:31:05 PM

On Tue, 07 Mar 2006 14:54:30 -0800, Ben Pfaff wrote:

> Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
>
>> On Tue, 07 Mar 2006 22:26:57 +0000, Keith Thompson wrote:
>>
>>> The standard is specifically designed to allow for architectures where
>>> constructing an invalid pointer value can cause a trap even if the
>>> pointer is not dereferenced.
>>
>> And are there any? Any in common use?
>
> x86 in non-flat protected mode would be one example. Attempting to load
> an invalid value into a segment register causes a fault.

I wasn't aware that address arithmetic generally operated on the segment
register in that environment, rather than on the "pointer" register used within
the segment. I haven't coded in that environment myself, so I have no
direct experience to call on. My understanding was that the architecture
was intrinsically a segment+offset mechanism, so having the compiler
produce the obvious code in the offset value (i.e., -1) would not incur
the performance penalty that has been mentioned. (Indeed, it's loading
the segment register that causes the performance penalty, I believe.)

--
Andrew

Andrew Reilly

Mar 7, 2006, 5:39:33 PM

On Tue, 07 Mar 2006 15:59:37 -0700, Al Balmer wrote:
> An implementation might choose, for valid reasons, to prefetch the data
> that pointer is pointing to. If it's in a segment not allocated ...

Hypothetical hardware that traps on *speculative* loads isn't broken by
design? I'd love to see the initialization sequences, or the task
switching code that has to make sure that all pointer values are valid
before they're loaded. No, scratch that. I've got better things to do.

Cheers,

--
Andrew

Keith Thompson

Mar 7, 2006, 5:48:49 PM

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
> On Tue, 07 Mar 2006 22:26:57 +0000, Keith Thompson wrote:
>> Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
>>> On Tue, 07 Mar 2006 13:28:37 -0500, Arthur J. O'Dwyer wrote:
>>>> K&R answers your question. If pa points to some element of an array,
>>>> then pa-1 points to the /previous element/. But what's the "previous
>>>> element" relative to the first element in the array? It doesn't exist.
>>>> So we have undefined behavior.
>>>
>>> Only because the standard says so. Didn't have to be that way. There
>>> are plenty of logically correct algorithms that could exist that involve
>>> pointers that point somewhere outside of a[0..N]. As long as there's no
>>> de-referencing, no harm, no foul. (Consider the simple case of
>>> iterating through an array at a non-unit stride, using the normal p < s
>>> + N termination condition. The loop finishes with p > s + N and the
>>> standard says "pow, you're dead", when the semantically identical code
>>> written with integer indexes has no legal problems.)
>>
>> The standard is specifically designed to allow for architectures where
>> constructing an invalid pointer value can cause a trap even if the pointer
>> is not dereferenced.
>
> And are there any? Any in common use? Any where the equivalent (well
> defined) pointer+offset code would be slower?

I really don't know, but the idea of allowing errors to be caught as
early as possible seems like a good one.

[...]

> Suppose the computer uses tribits.

Do you mean trinary digits rather than binary digits? The C standard
requires binary representation for integers.

> Standards are meant to codify common practice. If you want a language
> that only has object references and array indices, there are plenty of
> those to choose from.
>
>> By saying that this is undefined behavior, the C standard isn't
>> forbidding you to do it; it's just refusing to tell you how it behaves.
>
> And that helps who?

It (potentially) helps implementers to generate the most efficient
possible code, and it helps programmers to know what's actually
guaranteed to work across all possible platforms with conforming C
implementations.

[...]

> OK, I've made enough of a fool of myself already. I'll go and have that
> second cup of coffee for the morning, before I start going on about having
> the standard support non-2's complement integers, or machines that have no
> arithmetic right shifts...

C99 allows signed integers to be represented in 2's-complement,
1's-complement, or signed-magnitude (I think I mispunctuated at least
one of those).

C has been implemented on machines that don't support floating-point,
or even multiplication and division, in hardware. The compiler just
has to do whatever is necessary to meet the standard's requirements.

Ben Pfaff

Mar 7, 2006, 5:41:56 PM

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:

> On Tue, 07 Mar 2006 14:54:30 -0800, Ben Pfaff wrote:
>
>> Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
>>
>>> On Tue, 07 Mar 2006 22:26:57 +0000, Keith Thompson wrote:
>>>
>>>> The standard is specifically designed to allow for architectures where
>>>> constructing an invalid pointer value can cause a trap even if the
>>>> pointer is not dereferenced.
>>>
>>> And are there any? Any in common use?
>>
>> x86 in non-flat protected mode would be one example. Attempting to load
>> an invalid value into a segment register causes a fault.
>
> I wasn't aware that address arithmetic generally operated on the segment
> register in that environment, rather than on the "pointer" register used within
> the segment. [...]

Address arithmetic might not, but the standard doesn't disallow
it. Other uses of invalid pointers, e.g. comparing a pointer
into a freed memory block against some other pointer, seem more
likely to do so.
--
"Give me a couple of years and a large research grant,
and I'll give you a receipt." --Richard Heathfield

Keith Thompson

Mar 7, 2006, 5:57:38 PM

Consider the alternative.

#define LEN 100
#define INC 5000
int arr[LEN];
int *ptr = arr;
/* A */
ptr += 2*INC;
ptr -= INC;
ptr -= INC;
/* B */
ptr -= INC;
ptr -= INC;
ptr += 2*INC;
/* C */

What you're suggesting, I think, is that ptr==arr should be true at
points A, B, and C, for any(?) values of LEN and INC. It happens to
work out that way sometimes (at least in the test program I just
tried), but I can easily imagine a system where guaranteeing this
would place an undue burden on the compiler.
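If the intermediate excursions are really what's wanted, a defined way to get the round trip, sketched here with a hypothetical function name, is to apply the same moves to an integer offset and form arr + off only once the offset is back within [0, LEN]. This gives the A/B/C property for the offset itself without asking the compiler to make out-of-range pointers behave.

```c
#include <stddef.h>

#define LEN 100
#define INC 5000

/* Same moves as the pointer version, applied to an integer offset.
   The pointer is formed only when the offset is within [0, LEN]. */
int offset_round_trip(void)
{
    static int arr[LEN];
    long off = 0;              /* point A: arr + 0 */
    off += 2L * INC;
    off -= INC;
    off -= INC;                /* point B: back to 0 */
    off -= INC;
    off -= INC;
    off += 2L * INC;           /* point C: back to 0 */
    if (off < 0 || off > LEN)
        return 0;              /* out of range: never form the pointer */
    int *ptr = arr + off;
    return ptr == arr;
}
```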

Al Balmer

Mar 7, 2006, 6:16:10 PM

It doesn't have to make sure. It's free to segfault. You write funny
code, you pay the penalty (or your customers do.) Modern hardware does
a lot of speculation. It can preload or even precompute both branches
of a conditional, for example.

Eric Sosman

Mar 7, 2006, 6:22:57 PM

Andrew Reilly wrote On 03/07/06 16:41,:

> On Tue, 07 Mar 2006 22:26:57 +0000, Keith Thompson wrote:
>>
>>The standard is specifically designed to allow for architectures where
>>constructing an invalid pointer value can cause a trap even if the pointer
>>is not dereferenced.
>
> And are there any? Any in common use? Any where the equivalent (well
> defined) pointer+offset code would be slower?

I've been told that IBM AS/400 bollixes the bogus
arithmetic, at least under some circumstances. A friend
told of fixing code that did something like

if (buffer_pos + amount_needed > buffer_limit) {
... enlarge the buffer ...
}
memcpy (buffer_pos, from_somewhere, amount_needed);
buffer_pos += amount_needed;

This looks innocuous to devotees of flat address spaces
(Flat-Earthers?), but it didn't work on AS/400. If the
sum `buffer_pos + amount_needed' went past the end of the
buffer, the result was some kind of NaP ("not a pointer")
and the comparison didn't kick in. Result: the code never
discovered that the buffer needed enlarging, and merrily
tried to memcpy() into a too-small area ...

I have no personal experience of the AS/400, and I may
have misremembered some of what my friend related. Would
anybody with AS/400 knowledge care to comment?
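A common portable rewrite of the check in the AS/400 story compares sizes instead of pointers: buffer_limit - buffer_pos is always defined when both point into (or one past) the same buffer, so no pointer past the end is ever formed. The sketch below assumes a hypothetical buffer struct and function names; the doubling growth policy is an arbitrary choice, not something from the original code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; all names are illustrative. */
typedef struct {
    char *start;   /* first allocated byte */
    char *pos;     /* next free byte */
    char *limit;   /* one past the last allocated byte */
} buffer;

int buffer_init(buffer *b, size_t cap)
{
    if (cap == 0)
        cap = 1;
    b->start = malloc(cap);
    if (b->start == NULL)
        return -1;
    b->pos = b->start;
    b->limit = b->start + cap;
    return 0;
}

/* Append n bytes. The test compares n with the space left, so
   pos + n is never computed unless it is known to fit. */
int buffer_append(buffer *b, const void *src, size_t n)
{
    if (n > (size_t)(b->limit - b->pos)) {
        size_t used = (size_t)(b->pos - b->start);
        size_t cap = (size_t)(b->limit - b->start);
        while (cap - used < n) {
            if (cap > SIZE_MAX / 2)
                return -1;            /* cannot grow further */
            cap *= 2;
        }
        char *p = realloc(b->start, cap);
        if (p == NULL)
            return -1;
        b->start = p;
        b->pos = p + used;
        b->limit = p + cap;
    }
    memcpy(b->pos, src, n);
    b->pos += n;
    return 0;
}
```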

--
Eric....@sun.com

Randy Howard

Mar 7, 2006, 6:27:25 PM

Andrew Reilly wrote
(in article
<pan.2006.03.07....@areilly.bpc-users.org>):

This is a lot of whining about a specific problem that can
easily be remedied just by changing the loop construction. The
whole debate is pretty pointless in that context, unless you
have some religious reason to insist upon the method in the
original.

--
Randy Howard (2reply remove FOOBAR)
"The power of accurate observation is called cynicism by those
who have not got it." - George Bernard Shaw

Robin Haigh

Mar 7, 2006, 6:56:03 PM

"Andrew Reilly" <andrew-...@areilly.bpc-users.org> wrote in message
news:pan.2006.03.07....@areilly.bpc-users.org...

It's not always equivalent. The trouble starts with

char a[8];
char *p;

for ( p = a+1 ; p < a+8 ; p += 2 ) {}

intending that the loop terminates on p == a+9 (since it skips a+8). But
how do we know that a+9 > a+8 ? If the array is right at the top of some
kind of segment, the arithmetic might have wrapped round.

To support predictable pointer comparisons out of range, the compiler would
have to allocate space with a safe buffer zone. Complications are setting
in.

Ints have the nice property that 0 is in the middle and we know how much
headroom we've got either side. So it's easy for the compiler to make the
int version work (leaving it to the programmer to take responsibility for
avoiding overflow, which is no big deal).

Pointers don't have that property. The compiler can't take sole
responsibility for avoiding overflow irrespective of what the programmer
does. If the programmer wants to go out of range and is at the same time
responsible for avoiding overflow, then he has to start worrying about
whereabouts his object is and what headroom he's got.
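The char a[8] loop above can keep its pointer cursor and still stay within the rules if it checks the remaining distance before stepping, so p never goes past a + n. A sketch, with a hypothetical function name and assuming stride > 0; it returns the number of elements visited.

```c
#include <stddef.h>

size_t visit_strided(const char *a, size_t n, size_t start, size_t stride)
{
    size_t visited = 0;
    if (start >= n)
        return 0;
    const char *end = a + n;          /* one past the end: allowed */
    const char *p = a + start;
    for (;;) {
        visited++;                    /* ... work on *p would go here ... */
        if ((size_t)(end - p) <= stride)
            break;                    /* p + stride would reach or pass end */
        p += stride;
    }
    return visited;
}
```

For the example above, visit_strided(a, 8, 1, 2) visits a[1], a[3], a[5] and a[7], then stops without ever computing a+9.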

>
> Bah, humbug. Think I'll go back to assembly language, where pointers do
> what you tell them to, and don't complain about it to their lawyers.

Can't see how assembly programmers avoid the same kind of issue. I can see
how they could ignore it. The code will work most of the time.

--
RSH

Andrew Reilly

Mar 7, 2006, 8:08:59 PM

On Tue, 07 Mar 2006 23:57:38 +0000, Keith Thompson wrote:
> Consider the alternative.
>
> #define LEN 100
> #define INC 5000
> int arr[LEN];
> int *ptr = arr;
> /* A */
> ptr += 2*INC;
> ptr -= INC;
> ptr -= INC;
> /* B */
> ptr -= INC;
> ptr -= INC;
> ptr += 2*INC;
> /* C */
>
> What you're suggesting, I think, is that ptr==arr should be true at points
> A, B, and C, for any(?) values of LEN and INC. It happens to work out
> that way sometimes (at least in the test program I just tried), but I can
> easily imagine a system where guaranteeing this would place an undue
> burden on the compiler.

That *is* what I'm suggesting. In fact, I'm suggesting that p += a; p -= a;
should leave p as it was originally for any int a and pointer p. To my
mind, and I've been using C for more than 20 years, that is the very
essence of the nature of C. It's what makes pointer-as-cursor algorithms
make sense. Throw it away, and you might as well restrict yourself to
coding p[a], and then you've got Fortran, Pascal or Java.

Just because hardware can be imagined (or even built) that doesn't match
the conventional processor model that C most naturally fits *shouldn't* be
an argument to dilute or mess around with the C spec. Just use a
different language on those processors, or put up with some inefficiency
or compiler switches. Pascal has always been a pretty nice fit for many
hardware-pointer-checked machines. Such hardware isn't even a good
argument in this case though, since the obvious implementation will
involve base+offset compound pointers anyway, and mucking around with the
offset (as an integer) should neither trap nor cause a performance issue.

I've coded for years on Motorola 56000-series DSPs, and they don't look
anything like the conventional processor that C knows about: you've got
two separate data memory spaces and a third for program memory, pointers
aren't integers, words are 24-bits long and that's the smallest
addressable unit, and so on. Nevertheless, there have been at least two C
compilers for the thing, and they've both
produced *awful* code, and that's OK: they were never used for
performance-critical code. That was always done in assembler. There are
lots of processors (particularly DSPs) that are worse. I know of one that
doesn't have pointers as such at all. That's OK too. There isn't a C
compiler for that.

C is useful, though, and there's a lot of code written in it, so it's no
surprise that most of the more recent DSP designs actually do fit nicely
into the C conventional machine model. And (p + n) - n works in the
obvious fashion for those, too.

Cheers,

--
Andrew

Dik T. Winter

Mar 7, 2006, 8:01:09 PM

In article <440d491f$1...@news.wineasy.se> David Brown <da...@westcontrol.removethisbit.com> writes:
> CBFalconer wrote:
...
> > Some sneaky hidden assumptions here:
> > 1. p = s - 1 is valid. Not guaranteed. Careless coding.
>
> Not guaranteed in what way? You are not guaranteed that p will be a
> valid pointer, but you don't require it to be a valid pointer - all that
> is required is that "p = s - 1" followed by "p++" leaves p equal to s.

But the standard allows "p = s - 1" to trap when an invalid pointer is
generated. And this can indeed be the case on segmented architectures.
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Andrew Reilly

Mar 7, 2006, 8:18:37 PM

I haven't used an AS/400 myself, either, but this is almost certainly the
sort of perfectly reasonable code that the standard has arranged to be
undefined, precisely so that it can be said that there's a C compiler for
that system.

Given the hardware behaviour, it would have been vastly preferable for the
compiler to handle pointers as base+offset pairs, so that the specialness
of the hardware pointers didn't interfere with the logic of the program.

Since most coding for AS/400s was (is still?) done in COBOL and PL/1,
both of which are perfectly suited to the hardware's two-dimensional
memory, any performance degradation would hardly have been noticed. (And
since AS/400s are actually Power processors with a JIT over the top now,
there would likely not be a performance problem from doing it "right"
anyway.) But no, your friend had to go and modify good code, and risk
introducing bugs in the process.

--
Andrew

Rod Pemberton

Mar 7, 2006, 8:18:28 PM

"Gerry Quinn" <ger...@DELETETHISindigo.ie> wrote in message
news:MPG.1e778c4f6...@news1.eircom.net...
> In article <440CC04D...@yahoo.com>, cbfal...@yahoo.com says...
>
> > These assumptions are generally made because of familiarity with
> > the language. As a non-code example, consider the idea that the
> > faulty code is written by blackguards bent on foulling the
> > language. The term blackguards is not in favor these days, and for
> > good reason.
>
> About as good a reason as the term niggardly, as far as I can tell.
> Perhaps the words are appropriate in a post relating to fatal
> assumptions.

I didn't know what he meant either. Not being racist (at least I hope not),
I went GIYF'ing. I think it might be a reference to some Dungeons and
Dragons persona or something. Unfortunately, he'd need to clarify...

Rod Pemberton

Dik T. Winter

Mar 7, 2006, 8:15:27 PM

In article <pan.2006.03.07....@areilly.bpc-users.org> Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
...

> OK, I've made enough of a fool of myself already. I'll go and have that
> second cup of coffee for the morning, before I start going on about having
> the standard support non-2's complement integers, or machines that have no
> arithmetic right shifts...

In the time of the first standard, the Cray-1 was still quite important,
and it has no arithmetic right shift. When K&R designed C there was a
large number of machines that did not use 2's complement integers.

Dik T. Winter

Mar 7, 2006, 8:10:34 PM

In article <pan.2006.03.07....@areilly.bpc-users.org> Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
...

> It's precicely this sort of tomfoolery on the part of the C standards
> committee that has brought the language into such ill-repute in recent
> years. It's practically unworkable now, compared to how it was in (say)
> the immediately post-ANSI-fication years.

I do not understand this. The very same restriction on pointer arithmetic
was already in the very first ANSI C standard.

Keith Thompson

Mar 7, 2006, 8:36:59 PM

[snip]

There's (at least) one more property I forgot to mention. Given:

#define LEN 100
#define INC 5000 /* modify both of these as you like */
int arr[LEN];
int *ptr1 = arr;
int *ptr2 = ptr1 + INC;
/* D */

would you also require that, at point D, ptr2 > ptr1? (If pointer
arithmetic wraps around, this might not be the case even if adding and
subtracting as above always gets you back to the original address.)

And you think that having the standard guarantee this behavior is
worth the cost of making it much more difficult to implement C on
systems where the underlying machine addresses don't meet this
property, yes?

If so, that's a consistent point of view, but I disagree with it.

I'll also mention that none of this stuff has changed significantly
between C90 (the 1990 ISO C standard, equivalent to the original ANSI
standard of 1989) and C99 (the 1999 ISO C standard).

In fact, I just checked my copy of K&R1 (published in 1978). I can't
copy-and-paste from dead trees, so there may be some typos in the
following. This is from Appendix A, the C Reference Manual, section
7.4, Additive operators:

A pointer to an object in an array and a value of any integral
type may be added. [...] The result is a pointer of the same
type as the original pointer, and which points to another object
in the same array, appropriately offset from the original object.

[...]

[... likewise for subtracting an integer from a pointer ...]

If two pointers to objects of the same type are subtracted, the
result is converted [...] to an int representing the number of
objects separating the pointed-to objects. This conversion will
in general give unexpected results unless the pointers point to
objects in the same array, since pointers, even to objects of the
same type, do not necessarily differ by a multiple of the
object-length.

The last quoted paragraph isn't quite as strong as what the current
standard says, since it bases the undefinedness of pointer subtraction
beyond the bounds of an object on alignment, but it covers the same
idea.

The C Reference Manual from May 1975,
<http://cm.bell-labs.com/cm/cs/who/dmr/cman.pdf>, has the same wording
about pointer subtraction, but not about pointer+integer addition.

So if you think that the requirements you advocate are "the very
essence of the nature of C", I'm afraid you're at least 28 years too
late to do anything about it.

David Holland

Mar 7, 2006, 8:35:09 PM

On 2006-03-07, James Dow Allen <jdall...@yahoo.com> wrote:
> [...] but I'm sincerely curious whether anyone knows of an *actual*
> environment where p == s will ever be false after (p = s-1; p++).

The problem is that evaluating s-1 might cause an underflow and a
trap, and then you won't even reach the comparison. You don't
necessarily have to dereference an invalid pointer to get a trap.

You might hit this behavior on any segmented architecture (e.g.,
80286, or 80386+ with segments on) and you are virtually guaranteed to
hit it on any architecture with fine-grained segmentation. comp.std.c
periodically reminisces about the old Burroughs architecture, and
it's always possible something like it might come back sometime.

You will also see this behavior in any worthwhile bounds-checking
implementation.

> Many of the discussions in comp.lang.c seem like they'd be better
> in a new newsgroup:
> comp.lang.i'd_rather_be_a_lawyer
>
> :-) :-)

Yes, well, that's what comp.lang.c is about...

--
- David A. Holland
(the above address works if unscrambled but isn't checked often)

Andrew Reilly

Mar 7, 2006, 9:10:42 PM

On Wed, 08 Mar 2006 00:56:03 +0000, Robin Haigh wrote:
> It's not always equivalent. The trouble starts with
>
> char a[8];
> char *p;
>
> for ( p = a+1 ; p < a+8 ; p += 2 ) {}
>
> intending that the loop terminates on p == a+9 (since it skips a+8). But
> how do we know that a+9 > a+8 ? If the array is right at the top of some
> kind of segment, the arithmetic might have wrapped round.

a+9 > a+8 because a + 9 - (a + 8) == 1, which is > 0. Doesn't matter if
the signed or unsigned pointer value wrapped around in an intermediate
term. On many machines that's how the comparison is done anyway. You're
suggesting that having the compiler ensure that a+8 doesn't wrap around
wrt a is OK, but a+9 is too hard. I don't buy it.

> To support predictable pointer comparisons out of range, the compiler
> would have to allocate space with a safe buffer zone. Complications are
> setting in.

Only if you put them there. (The real problem is objects larger than half
the address space, where a valid pointer difference computation produces a
ptrdiff value that is out of range for a signed integer.)

> Ints have the nice property that 0 is in the middle and we know how much
> headroom we've got either side. So it's easy for the compiler to make
> the int version work (leaving it to the programmer to take
> responsibility for avoiding overflow, which is no big deal).

Unsigned ints have the nice property that (a + 1) - 1 == a for all a, even
if a + 1 == 0. Overflow is generally no big deal in any case. (Other
than the object larger than half the address space issue.)

> Pointers don't have that property. The compiler can't take sole
> responsibility for avoiding overflow irrespective of what the programmer
> does. If the programmer wants to go out of range and is at the same
> time responsible for avoiding overflow, then he has to start worrying
> about whereabouts his object is and what headroom he's got.

The compiler can't necessarily avoid overflow, but it *can* arrange for
pointer comparisons to work properly.

>> Bah, humbug. Think I'll go back to assembly language, where pointers
>> do what you tell them to, and don't complain about it to their lawyers.
>
> Can't see how assembly programmers avoid the same kind of issue. I can
> see how they could ignore it. The code will work most of the time.

Seems like it will work at least as well as the usual unit-stride
algorithm and idiom.

--
Andrew

Robin Haigh

Mar 7, 2006, 9:20:45 PM

"Andrew Reilly" <andrew-...@areilly.bpc-users.org> wrote in message
news:pan.2006.03.07....@areilly.bpc-users.org...

> On Tue, 07 Mar 2006 13:28:37 -0500, Arthur J. O'Dwyer wrote:
> > K&R answers your question. If pa points to some element of an array,
> > then pa-1 points to the /previous element/. But what's the "previous
> > element" relative to the first element in the array? It doesn't exist. So
> > we have undefined behavior.
>
> Only because the standard says so. Didn't have to be that way. There are
> plenty of logically correct algorithms that could exist that involve
> pointers that point somewhere outside of a[0..N]. As long as there's no
> de-referencing, no harm, no foul. (Consider the simple case of iterating
> through an array at a non-unit stride, using the normal p < s + N
> termination condition. The loop finishes with p > s + N and the standard
> says "pow, you're dead"

and if the arithmetic happens to wrap round after s + N, you really are dead
too.

It doesn't have to be about weird architectures and traps. No
implementation can provide an unlimited range for pointer arithmetic without
some kind of overflow behaviour, such as a wrap round. Granted a wrap-round
needn't affect addition and subtraction, but it will affect comparisons.

Every allocated object comes with a limited range for pointer comparisons to
satisfy p-1<p<p+1. Not because the standard says so, but because the
implementation can't avoid it.

The kind of code you're talking about tends to make careless assumptions
about the valid range with no justification at all. Just because we've all
been doing it for years (I don't mind pleading guilty) doesn't make it
right. Such code is broken, and it doesn't need a standard to say so.

Fortunately some people have learnt a bit since the good old days, and they
work to higher standards now.

--
RSH

John Temples

Mar 7, 2006, 9:32:53 PM

On 2006-03-08, Andrew Reilly <andrew-...@areilly.bpc-users.org> wrote:
>> if (buffer_pos + amount_needed > buffer_limit) {
>> ... enlarge the buffer ...
>> }

> I haven't used an AS/400 myself, either, but this is almost certainly the
> sort of perfectly reasonable code that the standard has arranged to be
> undefined, precicely so that it can be said that there's a C compiler for
> that system.

There are lots of embedded systems with 8- and 16-bit pointers. With
the right value of buffer_pos, it wouldn't take a very large value of
amount_needed for that addition to wrap and give you an incorrect
comparison.

--
John W. Temples, III

Keith Thompson

Mar 7, 2006, 9:33:10 PM

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
> On Wed, 08 Mar 2006 00:56:03 +0000, Robin Haigh wrote:
>> It's not always equivalent. The trouble starts with
>>
>> char a[8];
>> char *p;
>>
>> for ( p = a+1 ; p < a+8 ; p += 2 ) {}
>>
>> intending that the loop terminates on p == a+9 (since it skips a+8). But
>> how do we know that a+9 > a+8 ? If the array is right at the top of some
>> kind of segment, the arithmetic might have wrapped round.
>
> a+9 > a+8 because a + 9 - (a + 8) == 1, which is > 0. Doesn't matter if
> the signed or unsigned pointer value wrapped around in an intermediate
> term. On many machines that's how the comparison is done anyway. You're
> suggesting that having the compiler ensure that a+8 doesn't wrap around
> wrt a is OK, but a+9 is too hard. I don't buy it.

How would you guarantee that a+(i+1) > a+i for all arbitrary values of
i? It's easy enough to do this when the addition doesn't go beyond
the end of the array (plus the case where it points just past the end
of the array), but if you want to support arbitrary arithmetic beyond
the array bounds, it's going to take some extra work, all for the sake
of guaranteeing properties that have *never* been guaranteed by the C
language. (Don't confuse what happens to have always worked for you
with what's actually guaranteed by the language itself.)

[...]

>> Ints have the nice property that 0 is in the middle and we know how much
>> headroom we've got either side. So it's easy for the compiler to make
>> the int version work (leaving it to the programmer to take
>> responsibility for avoiding overflow, which is no big deal).
>
> Unsigned ints have the nice property that (a + 1) - 1 == a for all a, even
> if a + 1 == 0. Overflow is generally no big deal in any case. (Other
> than the object larger than half the address space issue.)

But unsigned ints *don't* have the property that a + 1 > a for all a.

CBFalconer

Mar 7, 2006, 7:10:10 PM

Ben Pfaff wrote:
> Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
>
>> So the standards body broke decades of practice and perfectly safe
>> and reasonable code to support a *hypothetical* implementation
>> that was so stupid that it checked pointer values, rather than
>> pointer *use*?
>
> You can still write your code to make whatever assumptions you
> like. You just can't assume that it will work portably. If,
> for example, you are writing code for a particular embedded
> architecture with a given compiler, then it may be reasonable
> to make assumptions beyond those granted by the standard.

Which was the point of my little article in the first place. If,
for one reason or another, you need to make assumptions, document
what they are. To do this you first need to be able to recognize
those assumptions. You may easily find you could avoid making the
assumption in the first place without penalty, as in this
particular "p = p - 1;" case.

Why is everybody concentrating on the one real error in the example
code, and not on the hidden assumptions? Errors are correctable;
assumptions need to be recognized.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>

msg

Mar 8, 2006, 1:14:36 AM

> OK, I've made enough of a fool of myself already. I'll go and have that
> second cup of coffee for the morning, before I start going on about having
> the standard support non-2's complement integers, or machines that have no
> arithmetic right shifts...

I get queasy reading the rants against 1's complement architectures; I
wish Seymour Cray were still around to address this.

Michael Grigoni
Cybertheque Museum

Michael Mair

Mar 8, 2006, 1:20:32 AM

Andrew Reilly wrote:

> On Tue, 07 Mar 2006 13:28:37 -0500, Arthur J. O'Dwyer wrote:
>
>> K&R answers your question. If pa points to some element of an array,
>>then pa-1 points to the /previous element/. But what's the "previous
>>element" relative to the first element in the array? It doesn't exist. So
>>we have undefined behavior.
>
> Only because the standard says so. Didn't have to be that way. There are
> plenty of logically correct algorithms that could exist that involve
> pointers that point somewhere outside of a[0..N]. As long as there's no
> de-referencing, no harm, no foul. (Consider the simple case of iterating
> through an array at a non-unit stride, using the normal p < s + N
> termination condition. The loop finishes with p > s + N and the standard
> says "pow, you're dead", when the semantically identical code written with
> integer indexes has no legal problems.

I have encountered situations where
    free(p);
    ....
    if (p == q)
leads to the platform's equivalent of the much beloved
"segmentation fault". Your theory means that this should
have worked. Assigning NULL or a valid address to p after
freeing avoids the error.

<OT>Incidentally, in gnu.gcc.help there is a discussion about
much the same situation in C++ where someone gets in trouble
for delete a; .... if (a == b) ...
Happens only for multiple inheritance and only for gcc.
Thread starts at <Eg0Pf.110735$sa3.34921@pd7tw1no>
</OT>

[snip!]

Cheers
Michael
--
E-Mail: Mine is an /at/ gmx /dot/ de address.

msg

Mar 8, 2006, 1:25:39 AM

>
> Since most coding for AS/400s were (is still?) done in COBOL and PL/1

Most coding was in flavors of RPG.

> since AS/400s are actually Power processors with a JIT over the top now

There is a large installed base of CISC (non-PowerPC) AS/400s; they use
a flat memory model reminiscent of the Multics design.

Michael Grigoni
Cybertheque Museum

Andrew Reilly

Mar 8, 2006, 1:30:01 AM

On Wed, 08 Mar 2006 03:33:10 +0000, Keith Thompson wrote:
> But unsigned ints *don't* have the property that a + 1 > a for all a.

My last comment on the thread, hopefully:

No, they don't, but when you're doing operations on pointer derivations
that are all in some sense "within the same object", even if hanging
outside it, (i.e., by dint of being created by adding integers to a single
initial pointer), then the loop termination condition is, in a very real
sense, a ptrdiff_t, and *should* be computed that way. The difference can
be both positive and negative.

The unsigned comparison a + n > a fails for some values of a, but the
ptrdiff_t (signed) comparison a + n - a > 0 is indeed true for all a and
n > 0, so that's what should be used. And it *is* what is used on most
processors that do comparison by subtraction (even if that's wrapped in a
non-destructive cmp).

I actually agree completely with the piece of K&R that you posted a few
posts ago, where it was pointed out that pointer arithmetic only makes
sense for pointers within the same object (array). Since C doesn't tell
you that the pointer that your function has been given isn't somewhere in
the middle of a real array, my aesthetic sense is that conceptually,
arrays (as pointers, within functions) extend infinitely (or at least to
the range of int) in *both* directions, as far as pointer arithmetic
within a function is concerned. Actually accessing values outside of the
bounds of the real array that has been allocated somewhere obviously
contravenes the "same object" doctrine, and it's up to the logic of the
caller and the callee to avoid that.

Now it has been amply explained that my conception of how pointer
arithmetic ought to work is not the way the standard describes, even
though it *is* the way I have experienced it in all of the C
implementations that it has obviously been my good fortune to encounter.
I consider that to be a pity, and obviously some of my code wouldn't
survive a translation to a Burroughs or AS/400 machine (or perhaps even to
some operating environments on 80286es). Oh, well. I can live with that.
It's not really the sort of code that I'd expect to find there, and I
don't expect to encounter such constraints in the future, but I *will* be
more careful, and will keep my eyes more open.

Thanks to all,

--
Andrew

CBFalconer

Mar 8, 2006, 1:03:52 AM

Randy Howard wrote:
> Andrew Reilly wrote
>> On Tue, 07 Mar 2006 15:59:37 -0700, Al Balmer wrote:
>
>>> An implementation might choose, for valid reasons, to prefetch
>>> the data that pointer is pointing to. If it's in a segment not
>>> allocated ...
>>
>> Hypothetical hardware that traps on *speculative* loads isn't
>> broken by design? I'd love to see the initialization sequences,
>> or the task switching code that has to make sure that all pointer
>> values are valid before they're loaded. No, scratch that. I've
>> got better things to do.
>
> This is a lot of whining about a specific problem that can
> easily be remedied just by changing the loop construction. The
> whole debate is pretty pointless in that context, unless you
> have some religious reason to insist upon the method in the
> original.

Er, didn't I point that fix out in the original article? That was
the only error in the original sample code; all other problems can
be tied to assumptions, which may be valid on any given piece of
machinery. The point is to avoid making such assumptions, which
requires recognizing their existence in the first place.

Ben Bacarisse

Mar 8, 2006, 2:23:15 AM

On Tue, 07 Mar 2006 19:10:10 -0500, CBFalconer wrote:

> Why is everybody concentrating on the one real error in the example code,
> and not on the hidden assumptions. Errors are correctible, assumptions
> need to be recognized.

Well I, for one, commented on the hidden assumption that must be made for
what you call "the one real error" to actually be an error -- but it was
not recognised! ;-)

[At the top of your original post you did not, in fact, claim this was an
error, but you call it a "real error" later on.]

I feel that your points would have been better made using other examples.
The context of the code made me read the C as little more than pseudo-code
with the added advantage that a C compiler might, with a following wind,
produce something like the assembler version (which in turn has its own
assumptions but you were not talking about that).

I found Eric Sosman's "if (buffer + space_required > buffer_end) ..."
example more convincing, because I have seen that in programs that are
intended to be portable -- I am pretty sure I have written such things
myself in my younger days. Have you other more general examples of
dangerous assumptions that can sneak into code? A list of the "top 10
things you might be assuming" would be very interesting.

--
Ben.

Randy Howard

Mar 8, 2006, 2:25:36 AM

CBFalconer wrote
(in article <440E73C8...@yahoo.com>):

> Randy Howard wrote:
>> This is a lot of whining about a specific problem that can
>> easily be remedied just by changing the loop construction. The
>> whole debate is pretty pointless in that context, unless you
>> have some religious reason to insist upon the method in the
>> original.
>
> Er, didn't I point that fix out in the original article?

Yes, which is precisely why I'm surprised at the ensuing debate
over the original version, as my comments should reflect.

Randy Howard

Mar 8, 2006, 2:28:30 AM

Rod Pemberton wrote
(in article <dulbd8$alql$1...@news3.infoave.net>):

Oh come on. Doesn't anyone own a dictionary anymore, or have a
vocabulary which isn't found solely on digg, slashdot or MTV?

blackguard:
A thoroughly unprincipled person; a scoundrel.
A foul-mouthed person.

Does everything have to become a racism experiment?

Jordan Abel

Mar 8, 2006, 3:08:48 AM

On 2006-03-08, Randy Howard <randy...@FOOverizonBAR.net> wrote:
> blackguard:
> A thoroughly unprincipled person; a scoundrel.
> A foul-mouthed person.

Sure, that's what it _means_, but...

> Does everything have to become a racism experiment?

the question is one of etymology.

Keith Thompson

Mar 8, 2006, 3:17:39 AM

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
> On Wed, 08 Mar 2006 03:33:10 +0000, Keith Thompson wrote:
>> But unsigned ints *don't* have the property that a + 1 > a for all a.
>
> My last comment on the thread, hopefully:
>
> No, they don't, but when you're doing operations on pointer derivations
> that are all in some sense "within the same object", even if hanging
> outside it, (i.e., by dint of being created by adding integers to a single
> initial pointer), then the loop termination condition is, in a very real
> sense, a ptrdif_t, and *should* be computed that way. The difference can
> be both positive and negative.

Um, I always thought that "within" and "outside" were two different
things.

Paul Burke

Mar 8, 2006, 3:22:18 AM

Arthur J. O'Dwyer wrote:

> If pa points to some element of an array,
> then pa-1 points to the /previous element/. But what's the "previous
> element" relative to the first element in the array? It doesn't exist.
> So we have undefined behavior.

> The expression pa+1 is similar, but with one special case. If pa
> points to the last element in the array, you might expect that pa+1
> would be undefined; but actually the C standard specifically allows you
> to evaluate pa+1 in that case. Dereferencing that pointer, or
> incrementing it /again/, however, invoke undefined behavior.

This is pure theology. The simple fact is that you can't GUARANTEE that
p++, or p--, or for that matter p itself, points to anything in
particular, unless you know something about p. And if you know about p,
you are OK. What's your problem?

Paul Burke

Andrew Reilly

Mar 8, 2006, 3:35:18 AM

On Wed, 08 Mar 2006 09:17:39 +0000, Keith Thompson wrote:

> Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
>> On Wed, 08 Mar 2006 03:33:10 +0000, Keith Thompson wrote:
>>> But unsigned ints *don't* have the property that a + 1 > a for all a.
>>
>> My last comment on the thread, hopefully:
>>
>> No, they don't, but when you're doing operations on pointer derivations
>> that are all in some sense "within the same object", even if hanging
>> outside it, (i.e., by dint of being created by adding integers to a single
>> initial pointer), then the loop termination condition is, in a very real
>> sense, a ptrdif_t, and *should* be computed that way. The difference can
>> be both positive and negative.
>
> Um, I always thought that "within" and "outside" were two different
> things.

Surely the camel's nose is already through the gate, on that one, with the
explicit allowance of "one element after"? How does that fit with all of
the conniptions expressed here about things that fall over dead if a
pointer even looks at an address that isn't part of the object? One out,
all out.

--
Andrew

Richard Bos

Mar 8, 2006, 4:16:20 AM

Ben Bacarisse <ben.u...@bsb.me.uk> wrote:

> On Tue, 07 Mar 2006 10:31:09 +0000, pete wrote:
>
> > David Brown wrote:
> >>
> >> CBFalconer wrote:
> >
> >> > http://www.azillionmonkeys.com/qed/asmexample.html
> >> >
> >> > Some sneaky hidden assumptions here:
> >> > 1. p = s - 1 is valid. Not guaranteed. Careless coding.
> >>
> >> Not guaranteed in what way? You are not guaranteed that p will be a
> >> valid pointer, but you don't require it to be a valid pointer - all that
> >> is required is that "p = s - 1" followed by "p++" leaves p equal to s.
> >> I'm not good enough at the laws of C to tell you if this is valid,
> >
> > Merely subtracting 1 from s, renders the entire code undefined. You're
> > "off the map" as far as the laws of C are concerned. On comp.lang.c,
> > we're mostly interested in what the laws of C *do* say is guaranteed to
> > work.
>
> It seems to me ironic that, in a discussion about hidden assumptions, the
> truth of this remark requires a hidden assumption about how the function
> is called. Unless I am missing something big, p = s - 1 is fine unless s
> points to the first element of an array (or worse)[1].

It's an implementation of strlen(). One must expect it to be called with
any pointer to a valid string - and those are usually pointers to the
first byte of a memory block.

Richard

Flash Gordon

Mar 8, 2006, 5:09:58 AM

As previously stated, that only requires using one extra byte or, in the
worst case of a HW word pointer, one extra word.
--
Flash Gordon, living in interesting times.
Web site - http://home.flash-gordon.me.uk/
comp.lang.c posting guidelines and intro:
http://clc-wiki.net/wiki/Intro_to_clc

Gerry Quinn

Mar 8, 2006, 5:34:22 AM

In article <0001HW.C033E3BE...@news.verizon.net>,
randy...@FOOverizonBAR.net says...

> Rod Pemberton wrote
> (in article <dulbd8$alql$1...@news3.infoave.net>):
> > "Gerry Quinn" <ger...@DELETETHISindigo.ie> wrote in message
> > news:MPG.1e778c4f6...@news1.eircom.net...
> >> In article <440CC04D...@yahoo.com>, cbfal...@yahoo.com says...
> >>
> >>> These assumptions are generally made because of familiarity with
> >>> the language. As a non-code example, consider the idea that the
> >>> faulty code is written by blackguards bent on foulling the
> >>> language. The term blackguards is not in favor these days, and for
> >>> good reason.
> >>
> >> About as good a reason as the term niggardly, as far as I can tell.
> >> Perhaps the words are appropriate in a post relating to fatal
> >> assumptions.
> >
> > I didn't know what he meant either. Not being racist (at least I hope not),
> > I went GIYF'ing. I think it might be a reference to some Dungeons and
> > Dragons persona or something. Unfortunately, he'd need to clarify...
>
> Oh come on. Doesn't anyone own a dictionary anymore, or have a
> vocabulary which isn't found solely on digg, slashdot or MTV?
>
> blackguard:
> A thoroughly unprincipled person; a scoundrel.
> A foul-mouthed person.

Yes - and what 'good reason' is there for not using the term?

> Does everything have to become a racism experiment?

That was my point - the expression, like many, has no clear etymology,
but there doesn't seem to have been any racial connection. Even if
there had been, I'm not sure this is a strong reason for not using it
(there's got to be a statute of limitations somewhere), but at least it
would be some sort of rationale.

Of course there are those who object to every figure in which the
adjective 'black' has negative connotations.

- Gerry Quinn

Keith Thompson

Mar 8, 2006, 5:46:42 AM

Are you quite sure that you know what the word "theology" means?

What Arthur wrote above is entirely correct. (Remember that undefined
behavior includes the possibility, but not the guarantee, of the code
doing exactly what you expect it to do, whatever that might be.)

What's your problem?

Richard Heathfield

Mar 8, 2006, 7:39:18 AM

Jordan Abel said:

OE blaec, and OFr garter (the latter from OHGer warten; OE weardian)

--
Richard Heathfield
"Usenet is a strange place" - dmr 29/7/1999
http://www.cpax.org.uk
email: rjh at above domain (but drop the www, obviously)

Richard Heathfield

Mar 8, 2006, 7:40:25 AM

Keith Thompson said:

> Um, I always thought that "within" and "outside" were two different
> things.

Ask Jack to lend you his bottle. You'll soon change your mind.

Rod Pemberton

Mar 8, 2006, 7:41:23 AM

"Gerry Quinn" <ger...@DELETETHISindigo.ie> wrote in message
news:MPG.1e78c38d...@news1.eircom.net...

> Of course there are those who object to every figure in which the
> adjective 'black' has negative connotations.

True. Mostly black people. This is a _true_ story. A number of years ago,
I was hired by a company which had a large number of black employees. I was
in the smallest minority as a white person. On the second day, I realized
there were no pens or pencils in the desk. Each floor in the building had
its own "supply manager." So, I walked over to the supply manager, a black
female, and asked for a blue pen. She politely said she didn't have a blue
pen. So, I asked for a black pen. To which she stood up and yelled at the
top of her lungs: "WHY DO YOU WANT A FUCKING BLACK PEN? DON'T YOU WANT A
GODDAMN WHITE PEN?" Of course, I was in shock, and a bit stunned since it
seemed that I had just been set up. The entire floor of predominantly
black people were standing up and staring at me and I didn't see them
laughing. So, I politely replied: "Yes, as long as you have some black
paper." She grunted, looked away, and angrily handed me a blue pen... Of
course, the word "black" was never used there for anything even if it was
explicitly black.

Rod Pemberton

Al Balmer

Mar 8, 2006, 11:42:18 AM

?
What do you imagine the etymology to be?

--
Al Balmer
Sun City, AZ

Vladimir S. Oka

Mar 8, 2006, 11:52:13 AM

FWIW, from <http://www.wordorigins.org/wordorb.htm>:

====
Blackguard

The exact etymology of this term for a villain is a bit uncertain. What
is known is that it is literally from black guard; it is English in
origin; and it dates to at least 1532.

The two earliest senses (it is impossible to tell which one came first)
are:

* the lowest servants in a household (often those in charge of the
scullery), or the servants and camp followers of an army.
* attendants or guards, either dressed in black, of low character,
or attending a criminal.

The OED2 doesn't dismiss the possibility that there may literally have
been a company of soldiers at Westminster called the Black Guard, but
no direct evidence of this exists.

The earliest known citation (1532) uses the term blake garde to refer
to torch bearers at a funeral. A 1535 cite refers to the Black Guard of
the King's kitchen, a scullery reference. The second sense of a guard
of attendants appears in 1563 in reference to a retinue of Dominican
friars--who would be in black robes.

The sense of the vagabond or criminal class doesn't appear until the
1680s. And the modern sense of a scoundrel dates to the 1730s.
====

Nothing racist there...

--
BR, Vladimir

Christopher Barber

Mar 8, 2006, 12:25:33 PM

to cbfal...@maineline.net

CBFalconer wrote:
> We often find hidden, and totally unnecessary, assumptions being
> made in code. The following leans heavily on one particular
> example, which happens to be in C. However similar things can (and
> do) occur in any language.
>
> These assumptions are generally made because of familiarity with
> the language. As a non-code example, consider the idea that the
> faulty code is written by blackguards bent on foulling the
> language. The term blackguards is not in favor these days, and for
> good reason. However, the older you are, the more likely you are
> to have used it since childhood, and to use it again, barring
> specific thought on the subject. The same type of thing applies to
> writing code.
>
> I hope, with this little monograph, to encourage people to examine
> some hidden assumptions they are making in their code. As ever, in
> dealing with C, the reference standard is the ISO C standard.
> Versions can be found in text and pdf format, by searching for N869
> and N1124. [1] The latter does not have a text version, but is
> more up-to-date.
>
> We will always have innocent appearing code with these kinds of
> assumptions built-in. However it would be wise to annotate such
> code to make the assumptions explicit, which can avoid a great deal
> of agony when the code is reused under other systems.
>
> In the following example, the code is as downloaded from the
> referenced URL, and the comments are entirely mine, including the
> 'every 5' linenumber references.
>
> /* Making fatal hidden assumptions */
> /* Paul Hsiehs version of strlen.
> http://www.azillionmonkeys.com/qed/asmexample.html
>
> Some sneaky hidden assumptions here:
> 1. p = s - 1 is valid. Not guaranteed. Careless coding.
> 2. cast (int) p is meaningful. Not guaranteed.
> 3. Use of 2's complement arithmetic.
> 4. ints have no trap representations or hidden bits.
> 5. 4 == sizeof(int) && 8 == CHAR_BIT.
> 6. size_t is actually int.
> 7. sizeof(int) is a power of 2.
> 8. int alignment depends on a zeroed bit field.
>
> Since strlen is normally supplied by the system, the system
> designer can guarantee all but item 1. Otherwise this is
> not portable. Item 1 can probably be beaten by suitable
> code reorganization to avoid the initial p = s - 1. This
> is a serious bug which, for example, can cause segfaults
> on many systems. It is most likely to foul when (int)s
> has the value 0, and is meaningful.
>
> He fails to make the valid assumption: 1 == sizeof(char).
> */
>
> #define hasNulByte(x) ((x - 0x01010101) & ~x & 0x80808080)
> #define SW (sizeof (int) / sizeof (char))
>
> int xstrlen (const char *s) {
> const char *p; /* 5 */
> int d;
>
> p = s - 1;
> do {
> p++; /* 10 */
> if ((((int) p) & (SW - 1)) == 0) {
> do {
> d = *((int *) p);
> p += SW;
> } while (!hasNulByte (d)); /* 15 */
> p -= SW;
> }
> } while (*p != 0);
> return p - s;
> } /* 20 */
>
> Let us start with line 1! The constants appear to require that
> sizeof(int) be 4, and that CHAR_BIT be precisely 8. I haven't
> really looked too closely, and it is possible that the ~x term
> allows for larger sizeof(int), but nothing allows for larger
> CHAR_BIT. A further hidden assumption is that there are no trap
> values in the representation of an int. Its functioning is
> doubtful when sizeof(int) is less that 4. At the least it will
> force promotion to long, which will seriously affect the speed.
>
> This is an ingenious and speedy way of detecting a zero byte within
> an int, provided the preconditions are met. There is nothing wrong
> with it, PROVIDED we know when it is valid.
>
> In line 2 we have the confusing use of sizeof(char), which is 1 by
> definition. This just serves to obscure the fact that SW is
> actually sizeof(int) later. No hidden assumptions have been made
> here, but the usage helps to conceal later assumptions.
>
> Line 4. Since this is intended to replace the systems strlen()
> function, it would seem advantageous to use the appropriate
> signature for the function. In particular strlen returns a size_t,
> not an int. size_t is always unsigned.
>
> In line 8 we come to a biggie. The standard specifically does not
> guarantee the action of a pointer below an object. The only real
> purpose of this statement is to compensate for the initial
> increment in line 10. This can be avoided by rearrangement of the
> code, which will then let the routine function where the
> assumptions are valid. This is the only real error in the code
> that I see.
>
> In line 11 we have several hidden assumptions. The first is that
> the cast of a pointer to an int is valid. This is never
> guaranteed. A pointer can be much larger than an int, and may have
> all sorts of non-integer like information embedded, such as segment
> id. If sizeof(int) is less than 4 the validity of this is even
> less likely.
>
> Then we come to the purpose of the statement, which is to discover
> if the pointer is suitably aligned for an int. It does this by
> bit-anding with SW-1, which is the concealed sizeof(int)-1. This
> won't be very useful if sizeof(int) is, say, 3 or any other
> non-poweroftwo. In addition, it assumes that an aligned pointer
> will have those bits zero. While this last is very likely in
> todays systems, it is still an assumption. The system designer is
> entitled to assume this, but user code is not.
>
> Line 13 again uses the unwarranted cast of a pointer to an int.
> This enables the use of the already suspicious macro hasNulByte in
> line 15.
>
> If all these assumptions are correct, line 19 finally calculates a
> pointer difference (which is valid, and of type size_t or ssize_t,
> but will always fit into a size_t). It then does a concealed cast
> of this into an int, which could cause undefined or implementation
> defined behaviour if the value exceeds what will fit into an int.
> This one is also unnecessary, since it is trivial to define the
> return type as size_t and guarantee success.
>
> I haven't even mentioned the assumption of 2's complement
> arithmetic, which I believe to be embedded in the hasNulByte
> macro. I haven't bothered to think this out.
>
> Would you believe that so many hidden assumptions can be embedded
> in such innocent looking code? The sneaky thing is that the code
> appears trivially correct at first glance. This is the stuff that
> Heisenbugs are made of. Yet use of such code is fairly safe if we
> are aware of those hidden assumptions.

I guess I will have to keep all this in mind the next time I copy C
code off of a web page devoted to x86 assembly hacks and try to
get it to run on a machine with 24-bit ones-complement integers.

;-)

- C

Al Balmer

Mar 8, 2006, 12:40:34 PM

On 8 Mar 2006 08:52:13 -0800, "Vladimir S. Oka"
<nov...@btopenworld.com> wrote:

>
>Al Balmer wrote:
>> On 8 Mar 2006 08:08:48 GMT, Jordan Abel <rand...@gmail.com> wrote:
>>
>> >On 2006-03-08, Randy Howard <randy...@FOOverizonBAR.net> wrote:
>> >> blackguard:
>> >> A thoroughly unprincipled person; a scoundrel.
>> >> A foul-mouthed person.
>> >
>> >Sure, that's what it _means_, but...
>> >
>> >> Does everything have to become a racism experiment?
>> >
>> >the question is one of etymology.
>>
>> ?
>> What do you imagine the etymology to be?
>
>FWIW, from <http://www.wordorigins.org/wordorb.htm>:

Actually, I wasn't asking that. I wondered what Jordan was imagining
it to be.

Vladimir S. Oka

Mar 8, 2006, 2:37:23 PM

Al Balmer wrote:

> On 8 Mar 2006 08:52:13 -0800, "Vladimir S. Oka"
> <nov...@btopenworld.com> wrote:
>
>>
>>Al Balmer wrote:
>>> On 8 Mar 2006 08:08:48 GMT, Jordan Abel <rand...@gmail.com> wrote:
>>>
>>> >On 2006-03-08, Randy Howard <randy...@FOOverizonBAR.net> wrote:
>>> >> blackguard:
>>> >> A thoroughly unprincipled person; a scoundrel.
>>> >> A foul-mouthed person.
>>> >
>>> >Sure, that's what it _means_, but...
>>> >
>>> >> Does everything have to become a racism experiment?
>>> >
>>> >the question is one of etymology.
>>>
>>> ?
>>> What do you imagine the etymology to be?
>>
>>FWIW, from <http://www.wordorigins.org/wordorb.htm>:
>
> Actually, I wasn't asking that. I wondered what Jordan was imagining
> it to be.
>

Ah, sorry. I didn't read the lot carefully enough.

--
BR, Vladimir

There was a young lady named Mandel
Who caused quite a neighborhood scandal
By coming out bare
On the main village square
And frigging herself with a candle.

Andrey Tarasevich

Mar 8, 2006, 2:52:52 PM

James Dow Allen wrote:
> Mr. Hsieh immediately does p++ and his code will be correct if then
> p == s. I don't question Chuck's argument, or whether the C standard
> allows the C compiler to trash the hard disk when it sees p=s-1,
> but I'm sincerely curious whether anyone knows of an *actual* environment
> where p == s will ever be false after (p = s-1; p++).
> ...

There are actual environments where 's - 1' alone is enough to cause a
crash. In fact, any non-flat memory model environment (i.e. environment
with 'segment:offset' pointers) would be a good candidate. The modern
x86 will normally crash, unless the implementation takes specific steps
to avoid it.

--
Best regards,
Andrey Tarasevich

Andrey Tarasevich

Mar 8, 2006, 2:58:51 PM

CBFalconer wrote:
> ...

> int xstrlen (const char *s) {
> const char *p; /* 5 */
> int d;
>
> p = s - 1;
> do {
> p++; /* 10 */
> if ((((int) p) & (SW - 1)) == 0) {
> do {
> d = *((int *) p);
> p += SW;
> } while (!hasNulByte (d)); /* 15 */
> p -= SW;
> }
> } while (*p != 0);
> return p - s;
> } /* 20 */

> ...

> Line 13 again uses the unwarranted cast of a pointer to an int.
> This enables the use of the already suspicious macro hasNulByte in
> line 15.

> ...

This is not exactly correct. Line 13 uses a cast of a 'char*' pointer to
an 'int*' pointer, not to an 'int'. This is relatively OK, especially
compared to the "less predictable" pointer->int casts.

After that the char array memory pointed by the resultant 'int*' pointer
is reinterpreted as an 'int' object. The validity of this is covered by
the previous assumptions.
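A minimal sketch, not from the thread, of how the word read on line 13 can avoid both the `int *` cast and any alignment requirement: copy the bytes with `memcpy` into an `unsigned int`. The helper name `load_word` is hypothetical. This removes only the alignment and type-punning concerns; every byte copied must still lie inside the array being scanned.

```c
#include <string.h>

/* Hypothetical helper: read sizeof(unsigned) bytes starting at p into an
   unsigned int. memcpy has no alignment requirement, so p may point at any
   byte, provided all sizeof(unsigned) bytes belong to the same object. */
static unsigned load_word(const unsigned char *p)
{
    unsigned w;
    memcpy(&w, p, sizeof w);
    return w;
}
```

Because `memcpy` copies the object representation byte for byte, the result is the same value a suitably aligned `unsigned` at that address would have held.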

Andrey Tarasevich

Mar 8, 2006, 3:14:39 PM

Andrew Reilly wrote:
> ...
> Bah, humbug. Think I'll go back to assembly language, where pointers do
> what you tell them to, and don't complain about it to their lawyers.
> ...

Incorrect. It is not about "lawyers", it is about actual _crashes_. The
reason why 's - 1' itself can (and will) crash on certain platforms is
the same as the one that will make it crash in exactly the same way in
"assembly language" on such platforms.

Trying to implement the same code in assembly language on such a
platform would specifically force you to work around the potential
crash, sacrificing efficiency for safety. In other words, you'd be
forced to use different techniques for doing 's - 1' in contexts where
it might underflow and in contexts where it definitely will not underflow.

C language, on the other hand, doesn't offer two different '-' operators
for these two specific situations. Instead C language outlaws (in
essence) pointer underflows. This is a perfectly reasonable approach for
a higher level language.

Clark S. Cox III

Mar 8, 2006, 3:12:01 PM

On 2006-03-07 13:11:24 -0500, "Peter Harrison" <pe...@cannock.ac.uk> said:

>
> "Ben Pfaff" <b...@cs.stanford.edu> wrote in message
> news:87acc2j...@benpfaff.org...
>> Paul Burke <pa...@scazon.com> writes:
>>
>>> My simple mind must be missing something big here. If for pointer p,
>>> (p-1) is deprecated because it's not guaranteed that it points to
>>> anything sensible, why is p++ OK? There's no boundary checking in C
>>> (unless you put it in).
>>
>
> It seems I too have a simple mind. I read the recent replies to this
> and found myself not sure I am better off.
>
> This is what I think I understand:
>
> int x;
> int *p;
> int *q;
>
> p = &x; /* is OK */

Correct, p now points to x

> p = &x + 1; /* is OK even though we have no idea what p points to */

Basically. p now points "one past the end" of x. You're allowed to
compare p to (&x, &x + 1 or NULL), as well as subtract one from p, but
you're not allowed to dereference p.

> p = &x + 6; /* is undefined - does this mean that p may not be the */
> /* address six locations beyond x? */
> /* or just that we don't know what is there? */

No, undefined means that the program could have just crashed here, and
never got to the point of assigning *anything* (indeterminate or not)
to p.

> p = &x - 1; /* as previous */

After the (&x + 6), all bets are off.

> But the poster was comparing with p++, so...
>
> p = &x; /* so far so good */
> p++; /* still ok (?) but we dont know what is there */

Correct, p is now a "one past the end", just as with (p = &x + 1).

> p++; /* is this now undefined? */

Yes, it is undefined. The program may have just crashed at this point.

> I guess _my_ question is - in this context does 'undefined' mean just
> that we cannot say anything about what the pointer points to or that we
> cannot say anything about the value of the pointer.

No, 'undefined' means that the program could do anything at all.
Undefined means that there is no defined behavior whatsoever as far as
the standard is concerned.

> So for example:
>
> p = &x;
> q = &x;
> p = p+8;
> q = q+8;
>
> should p and q have the same value or is that undefined.

p and q may not even have values.

--
Clark S. Cox, III
clar...@gmail.com
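The rules Clark describes fit in a short sketch (not from the thread; the function name `one_past_end_ok` is hypothetical). For pointer arithmetic, C99 6.5.6 treats a lone object such as `x` as an array of one element, so only `&x` and `&x + 1` may be formed; the function uses nothing else.

```c
/* Returns 1 if every check passes. Only operations the standard defines
   are used: forming &x + 1, comparing it, and stepping back from it. */
int one_past_end_ok(void)
{
    int x = 42;
    int *p = &x;          /* points at x */
    int *end = &x + 1;    /* one past the end: may be formed and compared */

    p++;                  /* same value as &x + 1; still defined */
    if (p != end)
        return 0;
    p--;                  /* back to x: defined, since p was one past the end */
    if (p != &x || *p != 42)
        return 0;
    /* Deliberately absent, because each is undefined: dereferencing end,
       incrementing p again once it equals end, decrementing a pointer to x,
       and forming &x + 6. */
    return 1;
}
```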

Chris Torek

Mar 8, 2006, 3:35:24 PM

>Keith Thompson said:
>> Um, I always thought that "within" and "outside" were two different
>> things.

In article <dumjbp$ppp$4...@nwrdmz03.dmz.ncs.ea.ibs-infra.bt.com>

Richard Heathfield <inv...@invalid.invalid> wrote:
>Ask Jack to lend you his bottle. You'll soon change your mind.

To clarify a bit ...

A mathematician named Klein
Thought the Moebius band was divine
Said he, "If you glue
The edges of two
You'll get a weird bottle like mine!"

:-)

(A Moebius band has only one side. It is a two-dimensional object
that exists only in a 3-dimensional [or higher] space. A Klein
bottle can only be made in a 4-dimensional [or higher] space, and
is a 3-D object with only one side. The concept can be carried on
indefinitely, but a Klein bottle is hard enough to contemplate
already.)
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.

Jordan Abel

Mar 8, 2006, 4:23:34 PM

On 2006-03-08, Vladimir S. Oka <nov...@btopenworld.com> wrote:
> Al Balmer wrote:
>
>> On 8 Mar 2006 08:52:13 -0800, "Vladimir S. Oka"
>> <nov...@btopenworld.com> wrote:
>>
>>>
>>>Al Balmer wrote:
>>>> On 8 Mar 2006 08:08:48 GMT, Jordan Abel <rand...@gmail.com> wrote:
>>>>
>>>> >On 2006-03-08, Randy Howard <randy...@FOOverizonBAR.net> wrote:
>>>> >> blackguard:
>>>> >> A thoroughly unprincipled person; a scoundrel.
>>>> >> A foul-mouthed person.
>>>> >
>>>> >Sure, that's what it _means_, but...
>>>> >
>>>> >> Does everything have to become a racism experiment?
>>>> >
>>>> >the question is one of etymology.
>>>>
>>>> ?
>>>> What do you imagine the etymology to be?
>>>
>>>FWIW, from <http://www.wordorigins.org/wordorb.htm>:
>>
>> Actually, I wasn't asking that. I wondered what Jordan was imagining
>> it to be.
>>
>
> Ah, sorry. I didn't read the lot carefully enough.

I don't imagine it to be anything. I suspect others do, and that's why
there is a potential for accusations of racism.

Paul Keinanen

Mar 8, 2006, 4:26:50 PM

Exactly which x86 mode are you referring to ?

16-bit real mode, virtual-86 mode, or some 32-bit mode (which are, after all,
segmented modes with all segment registers holding the same value)?

If s is stored in 16 bit mode in ES:DX with DX=0, then p=s-1 would
need to decrement ES by one and store 000F in DX. Why would reloading
ES cause any traps, since no actual memory reference is attempted ?
Doing p++ would most likely just increment DX by one to 0010, thus
ES:DX would point to s again, which is a legal address, but with a
different internal representation.

IIRC some 32 bit addressing mode would trap if one tried to load the
segment register, but again, how could the caller generate such
constructs as s = ES:0 at least from user mode. In practice s = ES:0
could only be set by a kernel mode routine calling a user mode
routine, so this is really an issue only with main() parameters.

Paul
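Paul's arithmetic can be modelled in a few lines of C. This sketch covers only the normalising decrement he describes, not what any particular compiler emits for far or huge pointers. The type `segoff` and the functions `linear`, `dec` and `inc` are hypothetical. It shows the same linear address being reached through a different segment:offset pair.

```c
#include <stdint.h>

/* Hypothetical model of a real-mode x86 segment:offset pointer. */
struct segoff { uint16_t seg, off; };

/* Real-mode linear address: segment * 16 + offset. */
static uint32_t linear(struct segoff a)
{
    return ((uint32_t)a.seg << 4) + a.off;
}

/* Decrement as described in the post: when the offset is already 0,
   borrow from the segment and set the offset to 0x000F. */
static struct segoff dec(struct segoff a)
{
    if (a.off == 0) {
        a.seg = (uint16_t)(a.seg - 1);
        a.off = 0x000F;
    } else {
        a.off = (uint16_t)(a.off - 1);
    }
    return a;
}

/* Increment touches only the offset, as in the post (000F -> 0010). */
static struct segoff inc(struct segoff a)
{
    a.off = (uint16_t)(a.off + 1);
    return a;
}
```

Starting from `{0x1234, 0}`, `inc(dec(s))` yields `{0x1233, 0x0010}`: the same linear address as `s`, but a different bit pattern, so a pointer comparison done on raw bits could disagree with one done on linear addresses.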

CBFalconer

Mar 8, 2006, 4:06:16 PM

This illustrates the fact that usenet threads are uncontrollable.
I wrote the original to draw attention to hidden assumptions, and
it has immediately degenerated into thrashing about the one real
error in the sample code. I could have corrected and eliminated
that error by a slight code rework, but then I would have modified
Mr. Hsieh's code. There were at least seven further assumptions,
most of which were necessary for the purposes of the code, but
strictly limited its applicability.

My aim was to get people to recognize and document such hidden
assumptions, rather than leaving them lying there to create sneaky
bugs in apparently portable code.

--
"If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers." - Keith Thompson
More details at: <http://cfaj.freeshell.org/google/>
Also see <http://www.safalra.com/special/googlegroupsreply/>
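The following is a sketch of the kind of rework described above. It is neither CBFalconer's code nor Mr. Hsieh's, and the name `xstrlen_fixed` is hypothetical. `p` starts at `s` and each byte is tested before `p` advances, so `s - 1` is never formed, and the result is returned as `size_t` rather than `int`. The word-at-a-time fast path is left out on purpose: reading a whole word can run past the terminating zero and off the end of the array, which ISO C also leaves undefined.

```c
#include <stddef.h>

/* Byte-at-a-time strlen with the loop rearranged so that p never points
   before s. p moves past a byte only after that byte is known to be
   non-zero, so it never goes beyond the terminating '\0'. */
size_t xstrlen_fixed(const char *s)
{
    const char *p = s;

    while (*p != '\0')
        p++;
    return (size_t)(p - s);   /* p - s is a non-negative ptrdiff_t here */
}
```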

Andrew Reilly

Mar 8, 2006, 5:13:24 PM

On Wed, 08 Mar 2006 13:14:39 -0800, Andrey Tarasevich wrote:

> Andrew Reilly wrote:
>> ...
>> Bah, humbug. Think I'll go back to assembly language, where pointers do
>> what you tell them to, and don't complain about it to their lawyers.
>> ...
>
> Incorrect. It is not about "lawyers", it is about actual _crashes_. The
> reason why 's - 1' itself can (an will) crash on certain platforms is
> the same as the one that will make it crash in exactly the same way in
> "assembly language" on such platforms.

No, my point was that the language lawyers have taken a perfectly
appealing and generally applicable abstraction, and outlawed certain
obvious constructions on the flimsy grounds that it was easier to
pervert the abstraction than to support it on some uncommon (or indeed
hypothetical) hardware.

> Trying to implement the same code in assembly language on such a
> platform would specifically force you to work around the potential
> crash, sacrificing efficiency for safety. In other words, you'd be
> forced to use different techniques for doing 's - 1' in contexts where
> it might underflow and in contexts where it definitely will not
> underflow.

The assembly language version of the algorithm would *not* crash, because
the assembly language of the perverted platform on which that was a
possibility would require a construction (probably using an explicit
integer array index, rather than pointer manipulation) that would cause
exactly zero inefficiency or impairment of safety. (Because the index is
only *used* in-bounds.)

> C language, on the other hand, doesn't offer two different '-' operators
> to for these two specific situations. Instead C language outlaws (in
> essence) pointer underflows.

No it doesn't. The C language allows "inner" pointers to be passed to
functions, with no other way for the function to tell whether s - 1 is
legal or illegal in any particular call context. It is therefore clear
what the abstraction of pointer arithmetic implies. That some platforms
(may) have a problem with this is not the language's fault. It's just a
bit harder to support C on them. That's OK. There are plenty of other
languages that don't allow that construct at all (or even have pointers as
such), and they were clearly the targets in mind for the people who
developed such hardware. The standard authors erred. It should have been
incumbent on implementers on odd platforms to support the full power of
the language or not at all, rather than for all other (C-like) platforms
to carry the oddness around with them, in their code. However, it's clear
that's a very old mistake, and no-one's going to back away from it now.

> This is a perfectly reasonable approach for a higher level language.

C is not a higher-level language. It's a universal assembler. Pick
another one.

--
Andrew

Christian Bau

Mar 8, 2006, 5:53:23 PM

In article <pan.2006.03.07....@areilly.bpc-users.org>,
Andrew Reilly <andrew-...@areilly.bpc-users.org> wrote:

> On Tue, 07 Mar 2006 13:28:37 -0500, Arthur J. O'Dwyer wrote:
> > K&R answers your question. If pa points to some element of an array,
> > then pa-1 points to the /previous element/. But what's the "previous
> > element" relative to the first element in the array? It doesn't exist. So
> > we have undefined behavior.
>
> Only because the standard says so. Didn't have to be that way. There are
> plenty of logically correct algorithms that could exist that involve
> pointers that point somewhere outside of a[0..N]. As long as there's no
> de-referencing, no harm, no foul. (Consider the simple case of iterating
> through an array at a non-unit stride, using the normal p < s + N
> termination condition. The loop finishes with p > s + N and the standard
> says "pow, you're dead", when the semantically identical code written with
> integer indexes has no legal problems.

Consider a typical implementation with 32 bit pointers and objects that
can be close to 2 GByte in size.

typedef struct { char a [2000000000]; } giantobject;

giantobject anobject;

giantobject* p = &anobject;
giantobject* q = &anobject - 1;
giantobject* r = &anobject + 1;
giantobject* s = &anobject + 2;

It would be very hard to implement this in a way that both q and s would
be valid; for example, it would be very hard to achieve that q < p, p <
r and r < s are all true. If q and s cannot be both valid, and there
isn't much reason why one should be valid and the other shouldn't, then
neither can be used in a program with any useful guarantees by the
standard.
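The non-unit-stride loop in the quoted text can be written so that it never forms a pointer beyond one past the end: check the remaining distance before stepping. The sketch below is not from the thread, and the function name `sum_strided` is hypothetical.

```c
#include <stddef.h>

/* Sums a[start], a[start + stride], ... for indices below n.
   p is advanced only when p + stride is still inside a[0..n-1],
   so no pointer past a + n is ever computed. */
long sum_strided(const int *a, size_t n, size_t start, size_t stride)
{
    const int *p, *end;
    long total = 0;

    if (start >= n || stride == 0)
        return 0;
    p = a + start;
    end = a + n;                       /* one past the end: allowed */
    for (;;) {
        total += *p;
        if ((size_t)(end - p) <= stride)
            break;                     /* next step would reach or pass end */
        p += stride;
    }
    return total;
}
```

This costs one comparison per step, the same as the `p < s + N` test it replaces. The only difference is that it asks "is there room for another step?" before taking the step rather than after.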

CBFalconer

Mar 8, 2006, 5:26:35 PM

"Clark S. Cox III" wrote:

> "Peter Harrison" <pe...@cannock.ac.uk> said:
>>> Paul Burke <pa...@scazon.com> writes:
>>>
>>>> My simple mind must be missing something big here. If for
>>>> pointer p, (p-1) is deprecated because it's not guaranteed that
>>>> it points to anything sensible, why is p++ OK? There's no
>>>> boundary checking in C (unless you put it in).
>>>

It's not deprecated, it's illegal. Once you have involved UB all
bets are off. Without the p-1 the p++ statements are fine, as long
as they don't advance the pointer more than one past the end of the
object.

>>
>> It seems I too have a simple mind. I read the recent replies to this
>> and found myself not sure I am better off.
>>
>> This is what I think I understand:
>>
>> int x;
>> int *p;
>> int *q;
>>
>> p = &x; /* is OK */
>
> Correct, p now points to x

and a statement --p or p-- would be illegal. However p++ would be
legal. But *(++p) would be illegal, because it dereferences past
the confines of the object x.

Christian Bau

Mar 8, 2006, 6:00:34 PM

In article <pan.2006.03.07....@areilly.bpc-users.org>,
Andrew Reilly <andrew-...@areilly.bpc-users.org> wrote:

> On Tue, 07 Mar 2006 22:26:57 +0000, Keith Thompson wrote:
>
> > Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
> >> On Tue, 07 Mar 2006 13:28:37 -0500, Arthur J. O'Dwyer wrote:
> >>> K&R answers your question. If pa points to some element of an array,
> >>> then pa-1 points to the /previous element/. But what's the "previous
> >>> element" relative to the first element in the array? It doesn't exist.
> >>> So we have undefined behavior.
> >>
> >> Only because the standard says so. Didn't have to be that way. There
> >> are plenty of logically correct algorithms that could exist that involve
> >> pointers that point somewhere outside of a[0..N]. As long as there's no
> >> de-referencing, no harm, no foul. (Consider the simple case of
> >> iterating through an array at a non-unit stride, using the normal p < s
> >> + N termination condition. The loop finishes with p > s + N and the
> >> standard says "pow, you're dead", when the semantically identical code
> >> written with integer indexes has no legal problems.
> >
> > The standard is specifically designed to allow for architectures where
> > constructing an invalid pointer value can cause a trap even if the pointer
> > is not dereferenced.
>
> And are there any? Any in common use? Any where the equivalent (well
> defined) pointer+offset code would be slower?

Question: If the C Standard guarantees that for any array a, &a [-1]
should be valid, should it also guarantee that &a [-1] != NULL and that
&a [-1] < &a [0] and &a [-1] < &a [0]?

In that case, what happens when I create an array with a single element
that is an enormously large struct?

Christian Bau

Mar 8, 2006, 6:15:17 PM

In article <pan.2006.03.08....@areilly.bpc-users.org>,
Andrew Reilly <andrew-...@areilly.bpc-users.org> wrote:

> On Wed, 08 Mar 2006 00:56:03 +0000, Robin Haigh wrote:
> > It's not always equivalent. The trouble starts with
> >
> > char a[8];
> > char *p;
> >
> > for ( p = a+1 ; p < a+8 ; p += 2 ) {}
> >
> > intending that the loop terminates on p == a+9 (since it skips a+8). But
> > how do we know that a+9 > a+8 ? If the array is right at the top of some
> > kind of segment, the arithmetic might have wrapped round.
>
> a+9 > a+8 because a + 9 - (a + 8) == 1, which is > 0. Doesn't matter if
> the signed or unsigned pointer value wrapped around in an intermediate
> term. On many machines that's how the comparison is done anyway. You're
> suggesting that having the compiler ensure that a+8 doesn't wrap around
> wrt a is OK, but a+9 is too hard. I don't buy it.

I just tried the following program (CodeWarrior 10 on MacOS X):

#include <stdio.h>

#define SIZE (50*1000000L)
typedef struct {
char a [SIZE];
} bigstruct;

static bigstruct bigarray [8];

int main(void)
{
printf("%lx\n", (unsigned long) &bigarray [0]);
printf("%lx\n", (unsigned long) &bigarray [9]);
printf("%lx\n", (unsigned long) &bigarray [-1]);

if (&bigarray [-1] < & bigarray [0])
printf ("Everything is fine\n");
else
printf ("The C Standard is right: &bigarray [-1] is broken\n");

return 0;
}

The output is:

2008ce0
1cd30160
ff059c60
The C Standard is right: &bigarray [-1] is broken
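The printed values are exactly what 32-bit modular address arithmetic predicts. The sketch below is not from the thread. It treats addresses as `uint32_t`, which assumes the flat 32-bit address space of the machine above, and it reproduces all three numbers from the base address and the 50,000,000-byte element size. The names `BASE`, `ELEM` and `addr_of` are hypothetical.

```c
#include <stdint.h>

#define BASE 0x2008ce0u      /* &bigarray[0] as printed above */
#define ELEM 50000000u       /* sizeof(bigstruct), i.e. SIZE */

/* Address of bigarray[i] when addresses are 32-bit values that wrap
   modulo 2^32. Unsigned arithmetic in C wraps in exactly this way. */
static uint32_t addr_of(long i)
{
    return (uint32_t)(BASE + (uint32_t)i * ELEM);
}
```

`addr_of(9)` gives 0x1cd30160 and `addr_of(-1)` gives 0xff059c60, matching the output. Because the address of element -1 wraps to the top of the address space, it compares above `addr_of(0)` as an unsigned value. This is consistent with the program taking the "broken" branch.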

Al Balmer

Mar 8, 2006, 7:04:11 PM

On 8 Mar 2006 20:35:24 GMT, Chris Torek <nos...@torek.net> wrote:

>>Keith Thompson said:
>>> Um, I always thought that "within" and "outside" were two different
>>> things.
>
>In article <dumjbp$ppp$4...@nwrdmz03.dmz.ncs.ea.ibs-infra.bt.com>
>Richard Heathfield <inv...@invalid.invalid> wrote:
>>Ask Jack to lend you his bottle. You'll soon change your mind.
>
>To clarify a bit ...
>
> A mathematician named Klein
> Thought the Moebius band was divine
> Said he, "If you glue
> The edges of two
> You'll get a weird bottle like mine!"
>
>:-)
>
>(A Moebius band has only one side. It is a two-dimensional object
>that exists only in a 3-dimensional [or higher] space. A Klein
>bottle can only be made in a 4-dimensional [or higher] space, and
>is a 3-D object with only one side. The concept can be carried on
>indefinitely, but a Klein bottle is hard enough to contemplate
>already.)

But that was Felix. Who's Jack?

Al Balmer

Mar 8, 2006, 7:07:45 PM

On Thu, 09 Mar 2006 09:13:24 +1100, Andrew Reilly
<andrew-...@areilly.bpc-users.org> wrote:

>C is not a higher-level language. It's a universal assembler. Pick
>another one.

Nice parrot. I think the original author of that phrase meant it as a
joke.

I spent 25 years writing assembler. C is a higher-level language.

Keith Thompson

Mar 8, 2006, 7:58:30 PM

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
[...]

> C is not a higher-level language.

It's higher-level than some, lower than others. I'd call it a
medium-level language.

> It's a universal assembler.

Not in any meaningful sense of the word "assembler".

Dik T. Winter

Mar 8, 2006, 8:21:57 PM

In article <120udfp...@news.supernews.com> Andrey Tarasevich <andreyta...@hotmail.com> writes:
...
This is the first time I have seen this code, but:

> > const char *p; /* 5 */

...

> > if ((((int) p) & (SW - 1)) == 0) {

...
This will not result in the desired answer on the Cray 1.
On the Cray 1 a byte pointer has the word address (64 bit words)
in the lower 48 bits and a byte offset in the upper 16 bits.
So this code actually tests whether the *word* address is even.
And so the code will fail to give the correct answer in the
following case:
char f[] = "0123456789";
int i;
f[1] = 0;
i = strlen(f + 2);
when f starts at an even word address it will give the answer 1
instead of the correct 8.

> > d = *((int *) p);

Note that here the byte offset in the pointer is ignored, so d is
loaded with the whole word that contains the characters
"0\000234567".

Again a hidden assumption, I think. (It is exactly this hidden
assumption that made porting a particular program to the Cray 1
extremely difficult. The assumption was that in a word pointer
the lowest bit was 0, and that bit was used for administrative
purposes.)
--
dik t. winter, cwi, kruislaan 413, 1098 sj amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/
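The pointer layout Dik describes can be modelled with a 64-bit integer to show why the alignment test on line 11 is fooled. This is a hypothetical sketch: `cray_byte_ptr` and `looks_aligned` are illustrative names, and SW = 8 assumes a 64-bit `int` with 8-bit `char`.

```c
#include <stdint.h>

/* Hypothetical model of the Cray 1 byte pointer described above:
   word address in the low 48 bits, byte offset in the upper 16 bits. */
static uint64_t cray_byte_ptr(uint64_t word, unsigned byte)
{
    return ((uint64_t)byte << 48) | (word & 0xFFFFFFFFFFFFull);
}

/* The test from line 11 of xstrlen, applied to the raw pointer bits,
   assuming SW == 8 (64-bit int, 8-bit char). */
static int looks_aligned(uint64_t ptrbits)
{
    return (ptrbits & (8 - 1)) == 0;
}
```

A pointer to byte 2 of an aligned word passes the test just as byte 0 does, because the low bits being tested belong to the word address and say nothing about the byte offset.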

Dik T. Winter

Mar 8, 2006, 8:28:28 PM

In article <120stck...@corp.supernews.com> msg <msg@_cybertheque.org_> writes:
>
> > OK, I've made enough of a fool of myself already. I'll go and have that
> > second cup of coffee for the morning, before I start going on about having
> > the standard support non-2's complement integers, or machines that have no
> > arithmetic right shifts...
>
> I get queasy reading the rants against 1's complement architectures; I
> wish Seymour Cray were still around to address this.

There are quite a few niceties indeed. Negation of a number is really
simple, just a logical operation, and there are others. This means
simpler hardware for the basic operations on signed objects, except for
the carry. It was only when I encountered the PDP that I saw the first
2's complement machine.

On the other hand, when Seymour Cray started his own company, those
machines were 2's complement. And he shifted from 60- to 64-bit
words, but still retained octal notation (he did not like hexadecimal
at all).
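Dik's point about negation can be illustrated by modelling 16-bit ones' complement words as `uint16_t` bit patterns. This is a sketch, not a description of any specific machine, and the function names are hypothetical. Negation is a plain bitwise NOT. Addition needs an end-around carry, which is the extra carry handling he mentions. `oc_neg(0)` yields 0xFFFF, the ones' complement negative zero.

```c
#include <stdint.h>

/* 16-bit ones' complement, modelled on uint16_t bit patterns. */
static uint16_t oc_neg(uint16_t x)
{
    return (uint16_t)~x;              /* negation is just bitwise NOT */
}

static uint16_t oc_add(uint16_t a, uint16_t b)
{
    uint32_t sum = (uint32_t)a + b;
    if (sum > 0xFFFFu)                /* a carry out of the top bit...    */
        sum = (sum & 0xFFFFu) + 1;    /* ...is added back at the bottom   */
    return (uint16_t)sum;
}

/* Two's complement, for comparison: NOT followed by an increment. */
static uint16_t tc_neg(uint16_t x)
{
    return (uint16_t)(~x + 1);
}
```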

Andrew Reilly

Mar 8, 2006, 8:39:11 PM

On Wed, 08 Mar 2006 18:26:35 -0500, CBFalconer wrote:

> "Clark S. Cox III" wrote:
>> "Peter Harrison" <pe...@cannock.ac.uk> said:
>>>> Paul Burke <pa...@scazon.com> writes:
>>>>
>>>>> My simple mind must be missing something big here. If for
>>>>> pointer p, (p-1) is deprecated because it's not guaranteed that
>>>>> it points to anything sensible, why is p++ OK? There's no
>>>>> boundary checking in C (unless you put it in).
>>>>
>
> It's not deprecated, it's illegal. Once you have involved UB all
> bets are off. Without the p-1 the p++ statements are fine, as long
> as they don't advance the pointer more than one past the end of the
> object.

It's no more "illegal" than any of the other undefined behaviour that you
pointed out in that code snippet. There aren't different classes of
undefined behaviour, are there?

I reckon I'll just go with the undefined flow, in the interests of
efficient, clean code on the architectures that I target. I'll make sure
that I supply a document specifying how the compilers must behave for all
of the undefined behaviours that I'm relying on, OK? I have no interest
in trying to make my code work on architectures for which they don't hold.

Of course, that list will pretty much just describe the usual flat-memory,
2's complement machine that is actually used in almost all circumstances
in the present day, anyway. Anyone using anything else already knows that
they're in a world of trouble and that all bets are off.

--
Andrew
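One way to make such a document enforceable, in the spirit of the original post's advice to make assumptions explicit, is to have the preprocessor check them. A port to an unsuitable machine then fails at compile time instead of misbehaving silently. This is a sketch, not from the thread, and the checks shown are examples: which ones a module needs depends on its code. The `INT_MIN` test is a conservative proxy. It can only hold on a two's complement machine, but a two's complement implementation that treats that value as a trap representation would also fail it.

```c
#include <limits.h>

/* Assumptions this (hypothetical) module relies on, checked at compile time. */
#if CHAR_BIT != 8
#error "assumes 8-bit bytes"
#endif

#if UINT_MAX != 0xFFFFFFFFu
#error "assumes a 32-bit unsigned int"
#endif

#if INT_MIN != -INT_MAX - 1
#error "assumes two's complement int"
#endif

/* Compiled only if every check above passed. */
int assumptions_hold(void)
{
    return 1;
}
```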

Dik T. Winter

Mar 8, 2006, 8:44:02 PM

Now responding to the basic article:

In article <440CC04D...@yahoo.com> cbfal...@maineline.net writes:
...

> #define hasNulByte(x) ((x - 0x01010101) & ~x & 0x80808080)

> Let us start with line 1! The constants appear to require that
> sizeof(int) be 4, and that CHAR_BIT be precisely 8. I haven't
> really looked too closely, and it is possible that the ~x term
> allows for larger sizeof(int), but nothing allows for larger
> CHAR_BIT.

It does not allow for larger sizeof(int) (as it does not allow for
other values of CHAR_BIT). When sizeof(int) > 4 it will only show
whether there is a zero byte in the low order four bytes. When
sizeof(int) < 4 it will give false positives. Both constants have
to be changed when sizeof(int) != 4. Moreover, it will not work on
1's complement or sign-magnitude machines. Using unsigned here is
most appropriate.
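Following that suggestion, here is a sketch of an unsigned variant whose constants are derived from `<limits.h>` instead of being hard-coded. The names `ONES`, `HIGHS` and `has_nul_byte` are hypothetical, and it assumes `unsigned int` has no padding bits (so its width is a multiple of `CHAR_BIT`). Because unsigned arithmetic wraps modulo 2^N, it also drops the 2's complement assumption and the signed-overflow risk in `x - 0x01010101`.

```c
#include <limits.h>

/* Hypothetical portable variant of hasNulByte.
   Assumes unsigned int has no padding bits. */
#define ONES  (UINT_MAX / UCHAR_MAX)        /* 0x01 in every byte */
#define HIGHS (ONES * (UCHAR_MAX / 2 + 1))  /* 0x80 in every byte */

/* Nonzero iff at least one byte of x is zero. Bytes above the lowest
   zero byte may report spurious high bits because of the borrow, but
   the overall result is exact. */
static int has_nul_byte(unsigned x)
{
    return ((x - ONES) & ~x & HIGHS) != 0;
}
```

With 32-bit `unsigned` and 8-bit chars, `ONES` and `HIGHS` evaluate to 0x01010101 and 0x80808080, the original constants.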

> if ((((int) p) & (SW - 1)) == 0) {

> Then we come to the purpose of the statement, which is to discover
> if the pointer is suitably aligned for an int. It does this by
> bit-anding with SW-1, which is the concealed sizeof(int)-1. This
> won't be very useful if sizeof(int) is, say, 3 or any other
> non-power-of-two. In addition, it assumes that an aligned pointer
> will have those bits zero. While this last is very likely in
> today's systems, it is still an assumption. The system designer is
> entitled to assume this, but user code is not.

It is false on the Cray 1 and its derivatives. See another article
by me where I show that it may give wrong answers.
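If an alignment test cannot be avoided, the following sketch is somewhat less fragile, though still not portable. It assumes the optional `uintptr_t` exists and that converting a pointer to it preserves alignment in the integer's value. That conversion is implementation-defined, and on word-addressed machines like the Cray it may not hold. It does remove one assumption: testing with `%` instead of masking with `align - 1` works even when the alignment is not a power of two. `is_aligned` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper. Implementation-defined: assumes uintptr_t exists
   and that (uintptr_t)p reflects a byte address. The % test does not
   require align to be a power of two; align must be nonzero. */
static int is_aligned(const void *p, size_t align)
{
    return (uintptr_t)p % align == 0;
}
```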

Andrew Reilly

Mar 8, 2006, 9:03:17 PM

On Wed, 08 Mar 2006 23:53:23 +0000, Christian Bau wrote:
>> Only because the standard says so. Didn't have to be that way. There are
>> plenty of logically correct algorithms that could exist that involve
>> pointers that point somewhere outside of a[0..N]. As long as there's no
>> de-referencing, no harm, no foul. (Consider the simple case of iterating
>> through an array at a non-unit stride, using the normal p < s + N
>> termination condition. The loop finishes with p > s + N and the standard
>> says "pow, you're dead", when the semantically identical code written with
>> integer indexes has no legal problems.)
>
> Consider a typical implementation with 32 bit pointers and objects that
> can be close to 2 GByte in size.

Yeah, my world-view doesn't allow individual objects to occupy half the
address space or more. I'm comfortable with that restriction, but I can
accept that there may be others that aren't. They're wrong, of course :-)

> typedef struct { char a [2000000000]; } giantobject;
>
> giantobject anobject;
>
> giantobject* p = &anobject;
> giantobject* q = &anobject - 1;
> giantobject* r = &anobject + 1;
> giantobject* s = &anobject + 2;
>
> It would be very hard to implement this in a way that both q and s would
> be valid; for example, it would be very hard to achieve that q < p, p <
> r and r < s are all true. If q and s cannot be both valid, and there
> isn't much reason why one should be valid and the other shouldn't, then
> neither can be used in a program with any useful guarantees by the
> standard.

Yes, very hard indeed. Partition your object or use a machine with bigger
addresses. Doesn't seem like a good enough reason to me to break a very
useful abstraction.

Posit: you've got N bits to play with, both for addresses and integers.
You need to be able to form a ptrdiff_t, which is a signed quantity, to
compute d = &anobject.a[i] - &anobject.a[j], for any indices i,j within the
range of the array. The range of signed quantities is just less than half
that of unsigned. That range must therefore define how large any
individual object can be. I.e., half of your address space. Neat, huh?

Yeah, yeah, for any complicated problem there's an answer that is simple,
neat and wrong.

--
Andrew
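The strided-loop case from the quoted text earlier in this post can be written so that the pointer version never overshoots. The sketch below is illustrative: `sum_strided_index` and `sum_strided_ptr` are hypothetical names, and both assume `stride >= 1`. Instead of stepping past `s + N` and then testing `p < s + N`, the pointer form checks the remaining distance before it steps.

```c
#include <stddef.h>

/* Index form: i may end up greater than n after the last step. That is
   harmless for an integer (barring wraparound of size_t). */
static long sum_strided_index(const int *a, size_t n, size_t stride)
{
    long sum = 0;
    for (size_t i = 0; i < n; i += stride)
        sum += a[i];
    return sum;
}

/* Pointer form: p never moves beyond a + n, because the remaining
   distance is checked before each step. */
static long sum_strided_ptr(const int *a, size_t n, size_t stride)
{
    long sum = 0;
    const int *p = a;
    const int *end = a + n;

    while (p < end) {
        sum += *p;
        if ((size_t)(end - p) <= stride)  /* next step would reach or pass end */
            break;
        p += stride;
    }
    return sum;
}
```

The extra test costs one subtraction per iteration; whether that is an acceptable price for staying within defined behavior is exactly the point under dispute here.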

Keith Thompson

Mar 8, 2006, 9:15:37 PM

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
> On Wed, 08 Mar 2006 18:26:35 -0500, CBFalconer wrote:
>> "Clark S. Cox III" wrote:
>>> "Peter Harrison" <pe...@cannock.ac.uk> said:
>>>>> Paul Burke <pa...@scazon.com> writes:
>>>>>
>>>>>> My simple mind must be missing something big here. If for
>>>>>> pointer p, (p-1) is deprecated because it's not guaranteed that
>>>>>> it points to anything sensible, why is p++ OK? There's no
>>>>>> boundary checking in C (unless you put it in).
>>
>> It's not deprecated, it's illegal. Once you have invoked UB, all
>> bets are off. Without the p-1 the p++ statements are fine, as long
>> as they don't advance the pointer more than one past the end of the
>> object.
>
> It's no more "illegal" than any of the other undefined behaviour that you
> pointed out in that code snippet. There aren't different classes of
> undefined behaviour, are there?

Right, "illegal" probably isn't the best word to describe undefined
behavior. An implementation is required to diagnose syntax errors and
constraint violations; it's specifically *not* required to diagnose
undefined behavior (though it's allowed to do so).

> I reckon I'll just go with the undefined flow, in the interests of
> efficient, clean code on the architectures that I target. I'll make sure
> that I supply a document specifying how the compilers must behave for all
> of the undefined behaviours that I'm relying on, OK? I have no interest
> in trying to make my code work on architectures for which they don't hold.

Ok, you can do that if you like. If you can manage to avoid undefined
behavior altogether, your code is likely to work on *any* system with
a conforming C implementation; if not, it may break when ported to
some exotic system.

For example, code that makes certain seemingly reasonable assumptions
about pointer representations will fail on Cray vector systems. I've
run into such code myself; the corrected code was actually simpler and
cleaner.

If you write code that depends on undefined behavior, *and* there's a
real advantage in doing so on some particular set of platforms, *and*
you don't mind that your code could fail on other platforms, then
that's a perfectly legitimate choice. (If you post such code here in
comp.lang.c, you can expect us to point out the undefined behavior;
some of us might possibly be overly enthusiastic in pointing it out.)

> Of course, that list will pretty much just describe the usual flat-memory,
> 2's complement machine that is actually used in almost all circumstances
> in the present day, anyway. Anyone using anything else already knows that
> they're in a world of trouble and that all bets are off.

All bets don't *need* to be off if you're able to stick to what the C
standard actually guarantees.

Andrew Reilly

Mar 8, 2006, 9:21:36 PM

On Thu, 09 Mar 2006 00:00:34 +0000, Christian Bau wrote:
> Question: If the C Standard guarantees that for any array a, &a [-1]
> should be valid, should it also guarantee that &a [-1] != NULL

Probably, since NULL has been given the guarantee that it's unique in some
sense. In an embedded environment, or assembly language, the construct
could of course produce NULL (for whatever value you pick for NULL), and
NULL would not be special. I don't know that insisting on the existence of
a unique and special NULL pointer value is one of the standard's crowning
achievements, either. It's convenient for lots of things, but it's just
not the way simple hardware works, particularly at the limits.

> and that
> &a [-1] < &a [0]

Sure, in the ptrdiff sense that I mentioned before.
I.e., (a - 1) - (a + 0) < 0 (indeed, identically -1)

> In that case, what happens when I create an array with a single element
> that is an enormously large struct?

Go nuts. If your address space is larger than your integer range (as is
the case for I32LP64 machines), your compiler might have to make sure that
it performs the difference calculation with sufficient precision.

I still feel comfortable about this failing to work for objects larger
than half the address space, or even for objects larger than the range of
an int. That's, IMO, a much less uncomfortable restriction than the one
that the standard seems to have stipulated, which is that the simple and
obvious pointer arithmetic that you've used in your examples works in some
situations and doesn't work in others. (Remember: it's all good if those
array references are in a function that was itself passed &foo[n], for
some n >= 1, as its argument.)
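The parenthetical case can be illustrated with a small sketch (`prev_plus_current` is a hypothetical name). The expression `p[-1]` is well defined here only because the caller passes a pointer to an element with index at least 1, so `p - 1` still points into the same array.

```c
/* Hypothetical helper. Precondition: p points to an element with
   index >= 1 of some array, so p - 1 is still inside that array. */
static int prev_plus_current(const int *p)
{
    return p[-1] + p[0];
}
```

For example, with `int foo[3] = {1, 2, 3};`, calling it on `&foo[1]` yields 3 and on `&foo[2]` yields 5, while calling it on `&foo[0]` would be undefined.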

Cheers,

--
Andrew

Keith Thompson

Mar 8, 2006, 9:32:22 PM

Andrew Reilly <andrew-...@areilly.bpc-users.org> writes:
> On Wed, 08 Mar 2006 23:53:23 +0000, Christian Bau wrote:

[...]

>> Consider a typical implementation with 32 bit pointers and objects that
>> can be close to 2 GByte in size.
>
> Yeah, my world-view doesn't allow individual objects to occupy half the
> address space or more. I'm comfortable with that restriction, but I can
> accept that there may be others that aren't. They're wrong, of course :-)

I can easily imagine a program that needs to manipulate a very large
data set (for a scientific simulation, perhaps). For a data set that
won't fit into memory all at once, loading as much of it as possible
can significantly improve performance.

>> typedef struct { char a [2000000000]; } giantobject;
>>
>> giantobject anobject;
>>
>> giantobject* p = &anobject;
>> giantobject* q = &anobject - 1;
>> giantobject* r = &anobject + 1;
>> giantobject* s = &anobject + 2;
>>
>> It would be very hard to implement this in a way that both q and s would
>> be valid; for example, it would be very hard to achieve that q < p, p <
>> r and r < s are all true. If q and s cannot be both valid, and there
>> isn't much reason why one should be valid and the other shouldn't, then
>> neither can be used in a program with any useful guarantees by the
>> standard.
>
> Yes, very hard indeed. Partition your object or use a machine with bigger
> addresses. Doesn't seem like a good enough reason to me to break a very
> useful abstraction.

Your "very useful abstraction" is not something that has *ever* been
guaranteed by any C standard or reference manual.

> Posit: you've got N bits to play with, both for addresses and integers.
> You need to be able to form a ptrdiff_t, which is a signed quantity, to
> compute d = &anobject.a[i] - &anobject.a[j], for any indices i,j within the
> range of the array. The range of signed quantities is just less than half
> that of unsigned. That range must therefore define how large any
> individual object can be. I.e., half of your address space. Neat, huh?

The standard explicitly allows for the possibility that pointer
subtraction within a single object might overflow (if so, it invokes
undefined behavior). Or, given that C99 requires an integer type of at
least 64 bits (long long), making ptrdiff_t larger should avoid the problem for any
current systems (I don't expect to see full 64-bit address spaces for
a long time).
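When an object might be larger than `PTRDIFF_MAX` bytes, one way to stay within defined behavior is to check the index difference in unsigned arithmetic before subtracting pointers. This is a conservative sketch; `index_diff_fits_ptrdiff` is a hypothetical name. It uses `PTRDIFF_MAX` as the bound for both signs and compares in `uintmax_t`, so it does not assume any particular relationship between `PTRDIFF_MAX` and `SIZE_MAX`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if &a[i] - &a[j] for one array a is
   guaranteed not to overflow ptrdiff_t. Conservative for negative
   results, where PTRDIFF_MIN might allow one more value. */
static int index_diff_fits_ptrdiff(size_t i, size_t j)
{
    uintmax_t mag = (i >= j) ? (uintmax_t)(i - j) : (uintmax_t)(j - i);
    return mag <= (uintmax_t)PTRDIFF_MAX;
}
```

If the check fails, the distance can still be used as a `size_t` index difference; only the pointer subtraction is unsafe.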

The standard is full of compromises. Not everyone likes all of them.