
Decreasing order of address within main


karthikbalaguru

Sep 21, 2009, 1:27:47 PM
Hi,
I was playing around with the address-of operator and I
noticed a pattern in it.

#include <stdio.h>
int main(void)
{
    int i = 10, j = 20;
    int diff;
    diff = &j - &i;
    printf("address of diff - %u \naddress of j - %u\naddress of i - %u \n", &diff, &j, &i);
    printf("sizeof diff - %d \nsize of j - %d\nsize of i - %d \n", sizeof(diff), sizeof(j), sizeof(i));
}

Output of First run
-----------------------------
address of diff - 1375424
address of j - 1375436
address of i - 1375448
sizeof diff - 4
size of j - 4
size of i - 4

Output of Second run
------------------------------
address of diff - 1637408
address of j - 1637420
address of i - 1637432
sizeof diff - 4
size of j - 4
size of i - 4

Were you able to notice it?
In both of the above outputs, 'diff' has the lowest address, next is
'j', and finally 'i'. The difference between the addresses of these
three variables is 12.

It would be interesting to know the reason for the difference of 12
between these variables.
Also, how does the variable 'diff' always get the lowest address?
Does it mean that the first element within a function will always
get the highest possible address during that particular time and
the subsequent variables will have their addresses less than that?

Does the standard say anything along these lines?

Any ideas ?

Thx in advans,
Karthik Balaguru

Nobody

Sep 21, 2009, 2:02:33 PM
On Mon, 21 Sep 2009 10:27:47 -0700, karthikbalaguru wrote:

> I was playing around with the address-of operator and I
> noticed a pattern in it.
>
> #include <stdio.h>
> int main(void)
> {
>     int i = 10, j = 20;
>     int diff;
>     diff = &j - &i;
>     printf("address of diff - %u \naddress of j - %u\naddress of i - %u \n", &diff, &j, &i);
>     printf("sizeof diff - %d \nsize of j - %d\nsize of i - %d \n", sizeof(diff), sizeof(j), sizeof(i));
> }

> In both of the above outputs, 'diff' has the lowest address, next is
> 'j', and finally 'i'. The difference between the addresses of these
> three variables is 12.
>
> It would be interesting to know the reason for the difference of 12
> between these variables.

On my system, the difference is 4. I suspect that your compiler has
some form of buffer-overrun checking enabled, causing it to insert
a "canary" between each variable.

> Also, how does the variable 'diff' always get the lowest address?
> Does it mean that the first element within a function will always
> get the highest possible address during that particular time and
> the subsequent variables will have their addresses less than that?
>
> Does the standard say anything along these lines?

The standard says absolutely nothing about how the compiler lays out the
stack, or even whether it uses a stack. All of this is an implementation
detail.

If you enable optimisation, the compiler will often store local variables
in registers, or even completely eliminate many local variables
(obviously, it can't do either of these if you take their address).

jameskuyper

Sep 21, 2009, 2:12:53 PM
karthikbalaguru wrote:
> Hi,
> I was playing around with the address-of operator and I
> noticed a pattern in it.
>
> #include <stdio.h>
> int main(void)
> {
>     int i = 10, j = 20;
>     int diff;
>     diff = &j - &i;

"When two pointers are subtracted, both shall point to elements of the
same array object, or one past the last element of the array object;
the result is the difference of the subscripts of the two array
elements." (6.5.6p9)

For purposes of that section of code, 'i' can be treated as a one-
dimensional array of int. If 'j' happens to be allocated a position
immediately after the position allocated for 'i', then a pointer to
'j' could also be a pointer one past the end of that array. However,
the standard guarantees nothing about where 'j' and 'i' are stored
relative to each other. Therefore, in general the expression you use
to initialize 'diff' may violate a "shall" which appears outside of a
"Constraints" section - the behavior is undefined.

>     printf("address of diff - %u \naddress of j - %u\naddress of i - %u \n", &diff, &j, &i);
>     printf("sizeof diff - %d \nsize of j - %d\nsize of i - %d \n", sizeof(diff), sizeof(j), sizeof(i));
> }

...


> Does the standard say anything along these lines?

No, not at all, except to say that the evaluation of &j - &i may have
undefined behavior. You shouldn't write code that depends upon any of
the patterns you've noticed, unless it's acceptable for that code to
be highly non-portable.

Seebs

Sep 21, 2009, 2:30:39 PM
On 2009-09-21, karthikbalaguru <karthikb...@gmail.com> wrote:
> I was playing around with the address-of operator and I
> noticed a pattern in it.

Okay.

> #include <stdio.h>
> int main(void)
> {
>     int i = 10, j = 20;
>     int diff;
>     diff = &j - &i;

This is undefined behavior. There is no guarantee that it will produce
meaningful results.

> It would be interesting to know the reason for the difference of 12
> between these variables.

That happens to be where the compiler put them this time. Or maybe
it just wanted to print 12 -- there's no guarantee that there is a
meaningful result from subtracting a pointer into one object from a
pointer into another object.

> Also, how does the variable 'diff' always get the lowest address?

That happened to be what happened on this machine.

> Does it mean that the first element within a function will always
> get the highest possible address during that particular time and
> the subsequent variables will have their addresses less than that?

No.

> Does the standard say anything along these lines?

No.

Except to say that it is undefined behavior to even try to answer the
question.

> Any ideas ?

From the standpoint of the C standard, i, j, and diff are three separate
objects which may or may not be in even the same *kind* of physical storage.
It is not necessarily the case that comparing or subtracting those
pointers would be reasonable; there could be systems on which this program
would dump core or crash when it tried to calculate "diff".

The short answer is: Except in extremely unusual circumstances, you can
never need to know, and if you think you know you're almost certainly wrong.
In the absence of your explicit address-taking, it's quite possible that
the variables wouldn't even *have* addresses in the resulting code -- they
might never make it to actual memory storage. There's no reason they should
have to.

-s
--
Copyright 2009, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Keith Thompson

Sep 21, 2009, 2:35:21 PM
karthikbalaguru <karthikb...@gmail.com> writes:
> I was playing around with the address-of operator and I
> noticed a pattern in it.
>
> #include <stdio.h>
> int main(void)
> {
>     int i = 10, j = 20;
>     int diff;
>     diff = &j - &i;
>     printf("address of diff - %u \naddress of j - %u\naddress of i - %u \n", &diff, &j, &i);
>     printf("sizeof diff - %d \nsize of j - %d\nsize of i - %d \n", sizeof(diff), sizeof(j), sizeof(i));
> }
[...]

Others have covered the core problems with your program; I'll handle
the tedious nitpicking. 8-)}

The result of subtracting one pointer value from another is of type
ptrdiff_t, not int. Since ptrdiff_t is an integer type, this:

int diff;
diff = &j - &i;

is valid (apart from the fact that the behavior of the subtraction
itself is undefined), but it would probably be better to do this:

ptrdiff_t diff;
diff = &j - &i; /* still UB, but no conversion is needed */

Your printf formats are all wrong. The correct format for printing an
address value (pointer value) is "%p", and it requires an argument of
type void*. (You can probably get away with char*, but it's easier to
use void* consistently.)

In your second printf, you're using "%d" to print values of type
size_t. This is likely to work in practice if int and size_t happen
to be the same size on your implementation, and if the values don't
exceed INT_MAX, but it's definitely not the best way to do it.
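
(Where a C99 library is available, the "z" length modifier avoids the
cast entirely - a minimal sketch:)

#include <stdio.h>
int main(void)
{
    size_t n = sizeof(int);
    printf("sizeof(int) - %zu\n", n);  /* C99's "%zu" matches size_t */
    return 0;
}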

Here's a modified version of your program. I've also split the
printfs into multiple calls, one per output line. Using a single
format string for multiple output lines isn't incorrect, but very long
format strings can be split by Usenet software.

#include <stdio.h>
#include <stddef.h>
int main(void)
{
    int i = 10;
    int j = 20;
    ptrdiff_t diff = &j - &i;

    printf("address of diff - %p\n", (void*)&diff);
    printf("address of j - %p\n", (void*)&j);
    printf("address of i - %p\n", (void*)&i);

    printf("sizeof diff - %lu\n", (unsigned long)sizeof diff);
    printf("sizeof j - %lu\n", (unsigned long)sizeof j);
    printf("sizeof i - %lu\n", (unsigned long)sizeof i);

    return 0;
}

Note that the behavior of the subtraction is still undefined.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

karthikbalaguru

Sep 21, 2009, 3:30:23 PM
On Sep 21, 11:12 pm, jameskuyper <jameskuy...@verizon.net> wrote:
> karthikbalaguru wrote:
> > Hi,
> > I was playing around with the address-of operator and I
> > noticed a pattern in it.
>
> > #include <stdio.h>
> > int main(void)
> > {
> >     int i = 10, j = 20;
> >     int diff;
> >     diff = &j - &i;
>
> "When two pointers are subtracted, both shall point to elements of the
> same array object, or one past the last element of the array object;
> the result is the difference of the subscripts of the two array
> elements." (6.5.6p9)

I checked this with the code below.

check_1
---------
#include <stdio.h>
int main(void)
{
    int arr[] = {1, 2, 3, 4, 5};
    int i, *iptr;
    iptr = &arr[4] - 4;
    for (i = 0; i <= 4; i++)
    {
        printf("%d \n", *iptr);
        iptr++;
    }
}
output
------
1
2
3
4
5

check_2
--------
#include <stdio.h>
int main(void)
{
    int arr[] = {1, 2, 3, 4, 5};
    int i, *iptr;
    iptr = &arr[4] - &arr[0];
    for (i = 0; i <= 4; i++)
    {
        printf("%d \n", *iptr);
        iptr++;
    }
}
output
-------
crash


From check_2, I think even if I try subtraction
between the addresses of the elements within the same array,
it gives undefined behaviour.
But, from check_1, it confirms that subtraction of a constant
number from the address is defined. But not between two
addresses of the same array.

Correct me if the above is wrong.

>
> For purposes of that section of code, 'i' can be treated as a one-
> dimensional array of int. If 'j' happens to be allocated a position
> immediately after the position allocated for 'i', then a pointer to
> 'j' could also be a pointer one past the end of that array. However,
> the standard guarantees nothing about  where 'j' and 'i' are stored
> relative to each other. Therefore, in general the expression you use
> to initialize 'diff' may violate a "shall" which appears outside of a
> "Constraints" section - the behavior is undefined.
>
>
>
> >     printf("address of diff - %u \naddress of j - %u\naddress of i - %u \n", &diff, &j, &i);
> >     printf("sizeof diff - %d \nsize of j - %d\nsize of i - %d \n", sizeof(diff), sizeof(j), sizeof(i));
> > }
> ...
> > Does the standard talk anything on these lines ?
>
> No, not at all, except to say that the evaluation of &j - &i may have
> undefined behavior. You shouldn't write code that depends upon any of
> the patterns you've noticed, unless it's acceptable for that code to
> be highly non-portable.

Thx in advans,
Karthik Balaguru

jacob navia

Sep 21, 2009, 3:48:42 PM
karthikbalaguru wrote:

This is very easy to explain. The compiler is assigning 3 integers
of size 4 in decreasing address order.

> Also, how does the variable 'diff' always get the lowest address?

Probably because the compiler assigns them one after the other in their
declaration order. It sees first "i", then "j", then "diff", so it
assigns an address to i, then to j, then to diff. In the lcc-win
compiler each local variable is assigned an address as soon as it
is seen. Other compilers are more sophisticated and try to put
variables with disjoint lifetimes at the same address, or try to
align them to fit machine requirements. How the specific address
is settled can be a complicated process.

> Does it mean that the first element within a function will always
> get the highest possible address during that particular time and
> the subsequent variables will have their addresses less than that?
>

Maybe. This depends on the compiler and the compilation options.
For instance, if optimization is requested, all 3 variables could end
up in registers and not be assigned any address at all.

Or some of them could be in registers and others in memory; all
combinations are possible.

> Does the standard say anything along these lines?
>

Not really. Most compilers do something similar to what you see,
but some of them (gcc for instance) have much more complicated schemes,
especially when they see a floating-point variable, etc.

Keith Thompson

Sep 21, 2009, 3:55:41 PM
karthikbalaguru <karthikb...@gmail.com> writes:
[snip]

> check_2
> --------
> #include <stdio.h>
> int main(void)
> {
>     int arr[] = {1, 2, 3, 4, 5};
>     int i, *iptr;
>     iptr = &arr[4] - &arr[0];
>     for (i = 0; i <= 4; i++)
>     {
>         printf("%d \n", *iptr);
>         iptr++;
>     }
> }
> output
> -------
> crash
>
>
> From check_2, I think even if I try subtraction
> between the addresses of the elements within the same array,
> it gives undefined behaviour.
> But, from check_1, it confirms that subtraction of a constant
> number from the address is defined. But not between two
> addresses of the same array.
>
> Correct me if the above is wrong.

Ok, it's wrong.

My guess is that you're using gcc, and that you decided to ignore the
"warning: assignment makes pointer from integer without a cast"
message.

Don't ignore warning messages. This one is particularly serious; if
it were up to me, it would be a fatal error, not just a warning.

The result of subtracting two pointer values is a signed integer
(specifically a ptrdiff_t, defined in <stddef.h>). You assign the
result of such a subtraction to a pointer object, iptr. The result is
garbage, and anything that happens after that (or before it, or during
it) is essentially meaningless.

What's probably happening is that iptr is being assigned the value 4,
converted from ptrdiff_t to int*, resulting in a pointer to address
0x00000004. You then try to dereference that pointer, which you're
not allowed to do.

But in fact anything can happen, including the compiler rejecting your
program.

IMHO gcc is not doing you any favors by allowing the code to compile.
The solution is to pay more attention to warnings.
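
For reference, a corrected check_2 that keeps the difference and the
pointer in separate, correctly typed objects (a sketch assuming C99's
<stddef.h> and the "%td" format):

#include <stdio.h>
#include <stddef.h>
int main(void)
{
    int arr[] = {1, 2, 3, 4, 5};
    ptrdiff_t diff = &arr[4] - &arr[0];  /* an integer (4), not a pointer */
    int *iptr = &arr[0];                 /* the pointer stays a pointer */
    int i;

    printf("diff = %td\n", diff);
    for (i = 0; i <= 4; i++)
    {
        printf("%d \n", *iptr);
        iptr++;
    }
    return 0;
}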

jameskuyper

Sep 21, 2009, 4:01:10 PM
karthikbalaguru wrote:
> On Sep 21, 11:12 pm, jameskuyper <jameskuy...@verizon.net> wrote:
...

> > "When two pointers are subtracted, both shall point to elements of the
> > same array object, or one past the last element of the array object;
> > the result is the difference of the subscripts of the two array
> > elements." (6.5.6p9)
>
> I checked this with the code below.
>
> check_1
> ---------
> #include <stdio.h>
> int main(void)
> {
>     int arr[] = {1, 2, 3, 4, 5};
>     int i, *iptr;
>     iptr = &arr[4] - 4;

This is subtraction of an integer from a pointer, something quite
different from subtraction of two pointers. More about that later.

>     for (i = 0; i <= 4; i++)
>     {
>         printf("%d \n", *iptr);
>         iptr++;
>     }
> }
> output
> ------
> 1
> 2
> 3
> 4
> 5
>
> check_2
> --------
> #include <stdio.h>
> int main(void)
> {
>     int arr[] = {1, 2, 3, 4, 5};
>     int i, *iptr;
>     iptr = &arr[4] - &arr[0];

The difference between two pointers has the type ptrdiff_t, which is
an integer type. You've declared iptr to be a pointer to an int.
Therefore, the assignment statement is a constraint violation
(6.5.16.1p1). Any conforming compiler must give you a diagnostic. Did
yours? Did you ignore it? You shouldn't have.

You could eliminate that diagnostic by putting in a cast to convert
the int to a pointer. However, "Except as previously specified, the
result is implementation-defined, might not be correctly aligned,
might not point to an entity of the referenced type, and might be a
trap representation." If any of those "might be"s come true, the
behavior of your program would still be undefined, particularly if
you try to dereference that pointer.


>     for (i = 0; i <= 4; i++)
>     {
>         printf("%d \n", *iptr);

Having set iptr to point at some completely arbitrary location, and
not necessarily even a valid one, you then try to use iptr to retrieve
the value of the int object stored in that location. What did you
expect to happen, and why?

>         iptr++;

Here you take a pointer to an unknown location, not necessarily a
valid one, and then you try to increment it. Still more undefined
behavior (as if any more was needed).

> }
> }
> output
> -------
> crash
>
>
> From check_2, I think even if I try subtraction
> between the addresses of the elements within the same array,
> it gives undefined behaviour.

No, the subtraction is one of the few things that is correct in that
program.

> But, from check_1, it confirms that subtraction of a constant
> number from the address is defined.

No, it does not. Test runs are inherently incapable of proving that
the behavior is defined. Undefined behavior includes, as one
possibility, that the code does precisely what you expect it to do,
even though the standard gives you no justified cause to expect it to
do that thing. The only way to be sure whether the behavior is
undefined is to read and understand the code, and compare it with what
the standard says.

In fact, in this case, the subtraction does have defined behavior,
because subtraction of an integer value from a pointer yields a
pointer; if the original pointer points at the fourth element of
whatever array it points at, or any higher element, the result of
subtracting 4 from that pointer is well defined. The result is a
pointer that does, in fact, point at the location you expect.


> ... But not between two
> addresses of the same array.

If you had stored the result in a ptrdiff_t object, and used an
appropriate format specifier to print the value of that object, there
would have been no problem.

Kaz Kylheku

Sep 21, 2009, 4:11:12 PM
On 2009-09-21, karthikbalaguru <karthikb...@gmail.com> wrote:
> Does the standard say anything along these lines?

The standard says that the addresses of objects declared in a block, when
considered in their lexical order, are always decreasing.

Don't listen to all these other people. Believe me; I'm the one who is right!

And remember, next time someone asks you, just tell them you read it on Usenet.

Whatever you do, don't actually read the standard yourself.

Second-hand information is best.

Richard Tobin

Sep 21, 2009, 4:30:21 PM
In article <200910022...@gmail.com>,
Kaz Kylheku <kkyl...@gmail.com> wrote:

>And remember, next time someone asks you, just tell them you read it
>on Usenet. Whatever you do, don't actually read the standard
>yourself. Second-hand information is best.

So there we have it: anything that isn't specified in the C standard
is off-topic here, and you shouldn't ask about what's in the standard
either.

Your question was in fact a perfectly reasonable one, and the answer
(as others have said) is that the standard doesn't specify the order
of addresses, and doesn't even let you subtract addresses unless
they point into the same object, though in practice it will work
on any system where processes have a flat address space.

-- Richard
--
Please remember to mention me / in tapes you leave behind.

Seebs

Sep 21, 2009, 4:16:57 PM
On 2009-09-21, karthikbalaguru <karthikb...@gmail.com> wrote:
> On Sep 21, 11:12 pm, jameskuyper <jameskuy...@verizon.net> wrote:
>> "When two pointers are subtracted, both shall point to elements of the
>> same array object, or one past the last element of the array object;
>> the result is the difference of the subscripts of the two array
>> elements." (6.5.6p9)

> I checked this with the code below.

No you didn't.

> But, from check_1, it confirms that subtraction of a constant
> number from the address is defined.

No, it doesn't.

> Correct me if the above is wrong.

You can never confirm that something is "defined" by running code.

See, "defined" doesn't mean "it happened to work that way once". It
means "we are guaranteed that this will always work, or at least that
failure to do so is clearly a bug in the compiler."

Imagine that you live in apartment #323.

You could make the claim: "All apartment numbers are defined to be
palindromes." You write a little program to check it, you plug in 323,
it confirms: The number is a palindrome.

But you haven't actually checked what you'd *need* to check, which is
*every possible number*.

To make the claim that subtraction is "defined", you must not only
run your program on every compiler, for every kind of computer, with every
set of options. You must also do it on every compiler that will exist
in the future, for machines that haven't even been designed yet.

... Or you could just use the language *definition* to tell you what is or
is not *defined*. Note the relationship there; "defined" is a function of
the language specification, not of any specific real-world implementation.

The reason this matters is that you may someday want to use a different
compiler, and you may encounter a system where the subtraction crashes in
all cases, for instance.

Kenneth Brody

Sep 21, 2009, 4:37:12 PM

Well, to be fair, not everyone has access to the "final" version of the
Standard. From what I understand, you need to pay for that, though "near
final" draft versions are available for free. Also, not everyone
understands the "legalese" of the text.

Unfortunately, it's tough to show definitively that the Standard doesn't say
something. For that, you have to take someone's answer on faith. If it
does say something, even if it says "it's undefined", people here often
answer with C&V.

--
Kenneth Brody

Keith Thompson

Sep 21, 2009, 5:02:23 PM

Was there any particular reason you felt the need to give a sarcastic
answer to a reasonable question?

Keith Thompson

Sep 21, 2009, 5:22:26 PM
Kenneth Brody <kenb...@spamcop.net> writes:
[...]

> Well, to be fair, not everyone has access to the "final" version of
> the Standard. From what I understand, you need to pay for that,
> though "near final" draft versions are available for free. Also, not
> everyone understands the "legalese" of the text.
[...]

The C99 standard itself costs money (something like $30 US for a PDF
copy). But
<http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf>
is free; it includes the full C99 standard with all the changes
specified in the three Technical Corrigenda folded in. For most
purposes, I consider it better than the C99 standard itself. (For
some purposes, C99 plus copies of the three Technical Corrigenda might
be better, since n1256 is marginally less official.)

jameskuyper

Sep 21, 2009, 5:57:34 PM
Kenneth Brody wrote:
...

> Well, to be fair, not everyone has access to the "final" version of the
> Standard. From what I understand, you need to pay for that, though "near
> final" draft versions are available for free.

More to the point, n1256.pdf is available for free. It's an unofficial
committee draft which contains the final version of the standard, plus
all of the modifications to the final version that were called for by
the three technical corrigenda that have been approved. For my
purposes, that makes n1256.pdf more useful than the final version
would be.

> Also, not everyone
> understands the "legalese" of the text.

That, on the other hand, is a very serious issue, and not easily dealt with.

Mark

Sep 21, 2009, 8:27:56 PM
Keith Thompson wrote:
> The C99 standard itself costs money (something like $30 US for a PDF
> copy). But
> <http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf>
> is free; it includes the full C99 standard with all the changes

Then what do people pay for if they can get the standard for free?

--
Mark

luserXtrog

Sep 22, 2009, 2:40:06 AM
On Sep 21, 4:02 pm, Keith Thompson <ks...@mib.org> wrote:
> Kaz Kylheku <kkylh...@gmail.com> writes:

> > On 2009-09-21, karthikbalaguru <karthikbalagur...@gmail.com> wrote:
> >> Does the standard say anything along these lines?
>
> > The standard says that the addresses of objects declared in a block, when
> > considered in their lexical order, are always decreasing.
>
> > Don't listen to all these other people. Believe me; I'm the one who
> > is right!
>
> > And remember, next time someone asks you, just tell them you read it
> > on Usenet.
>
> > Whatever you do, don't actually read the standard yourself.
>
> > Second-hand information is best.
>
> Was there any particular reason you felt the need to give a sarcastic
> answer to a reasonable question?
>

Indeed, sarcasm must always have a rigorous targeted purpose.
One must agonize over the construction of the rationalizing
paraphernalia. A twisted presentation depends upon clear facts.

--
:|

Nick Keighley

Sep 22, 2009, 4:08:25 AM
On 21 Sep, 21:11, Kaz Kylheku <kkylh...@gmail.com> wrote:

> On 2009-09-21, karthikbalaguru <karthikbalagur...@gmail.com> wrote:

> > Does the standard say anything along these lines?

[ie. the relationship between the addresses of variables]


> The standard says that the addresses of objects declared in a block, when
> considered in their lexical order, are always decreasing.
>
> Don't listen to all these other people. Believe me; I'm the one who is right!
>
> And remember, next time someone asks you, just tell them you read it on Usenet.
>
> Whatever you do, don't actually read the standard yourself.
>
> Second-hand information is best.

For the benefit of the OP: Kaz is pulling your leg (he is not serious).

Phil Carmody

Sep 22, 2009, 4:26:37 AM
ric...@cogsci.ed.ac.uk (Richard Tobin) writes:
> In article <200910022...@gmail.com>,
> Kaz Kylheku <kkyl...@gmail.com> wrote:
>
>>And remember, next time someone asks you, just tell them you read it
>>on Usenet. Whatever you do, don't actually read the standard
>>yourself. Second-hand information is best.
>
> So there we have it: anything that isn't specified in the C standard
> is off-topic here, and you shouldn't ask about what's in the standard
> eiher.

Not sure how that follows from the quoted paragraph.

> Your question was in fact a perfectly reasonable one, and the answer
> (as others have said) is that the standard doesn't specify the order
> of addresses, and doesn't even let you subtract addresses unless
> they point into the same object, though in practice it will work
> on any system where processes have a flat address space.

Any sufficiently aggressive optimiser won't even bother setting
any variable assigned such an undefined value. Nor will it set any
future values dependent on that variable.

I'm not sure if there are any sufficiently aggressive optimisers
out there, but I'm not prepared to sloppily write UB in order to
find out.

Phil
--
Any true emperor never needs to wear clothes. -- Devany on r.a.s.f1

James Kuyper

Sep 22, 2009, 6:44:24 AM

I can think of three possible reasons.

The most common reason is probably just because they're unaware of the
existence or nature of n1256.pdf.

They might pay for the official standard as a way of contributing to the
standardization effort.

The C99 standard and the three TCs have all been carefully reviewed and
officially approved. n1256.pdf has not gone through that same process,
so it might be somewhat less reliable than it would have been if it had.

At the top of every page of n1256 except the first, it says "Septermber
7, 2007", and as of 2009-09-21, according to Lawrence Jones, the only
defects that had been reported between those two dates have been:

1) The typo in "Septermber 7, 2007".

2) "the predefined macro __STDC_MB_MIGHT_NEQ_WC__ should appear in
6.10.8p2 (optional predefined macros) rather than p1 (required
predefined macros)."

Kenny McCormack

Sep 22, 2009, 10:46:15 AM
In article <h98not$l80$2...@pc-news.cogsci.ed.ac.uk>,
Richard Tobin <ric...@cogsci.ed.ac.uk> wrote:
>In article <200910022...@gmail.com>,
>Kaz Kylheku <kkyl...@gmail.com> wrote:
>
>>And remember, next time someone asks you, just tell them you read it
>>on Usenet. Whatever you do, don't actually read the standard
>>yourself. Second-hand information is best.

That's the gist of what I've been saying all along.

Lemma: Most newsgroups have a general ethos that questions that are
covered (i.e., answered) in the FAQs or other generally available
material are inappropriate for posting. I.e., the response to "what does
'i = i++' do?" is "Read the FAQ! (Don't bother us!)". While this
condemnation is not precisely that the question is "off topic", the
effect is the same - i.e., that the question is "inappropriate".

Therefore, when you combine the above lemma with the strict ban on
anything *not* in the C standard, you come to the (obvious to anyone
with a lick of sense) conclusion that nothing is acceptable here.

Notes:
1) Clearly, I am including the C standard documents as among the
"generally available material" (that everyone is assumed to have
access to and to have read cover-to-cover before posting here - even
though most of the posters [*] to this group have probably never
even heard of it).

2) Yes, there is a small window for so-called "language lawyering" -
that is, where people who really have no lives argue about tiny
minutiae in the standards documents - that no sensible person or
working programmer is like to care about. At best, this accounts
for about 5% of the volume of postings here.


[*] Measured by actual numbers of posters, not by volume of postings
(of course...!)

Mark McIntyre

Sep 22, 2009, 6:07:09 PM

Firstly, it's my understanding that n1256 is the final draft, not the
edited final version.

Secondly, I can get all sorts of stuff for free which I choose to pay
for in order to support the authors and/or the service they render.

Keith Thompson

Sep 22, 2009, 6:42:28 PM
Mark McIntyre <markmc...@TROUSERSspamcop.net> writes:
> Mark wrote:
>> Keith Thompson wrote:
>>> The C99 standard itself costs money (something like $30 US for a PDF
>>> copy). But
>>> <http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf>
>>> is free; it includes the full C99 standard with all the changes
>>
>> Then what do people pay for if they can get the standard for free?
>
> Firstly its my understanding that n1256 is the final draft, not the
> edited final version.

I suppose that depends on what you mean by those terms.

The current official C standard, as I understand it, consists of
C99 plus the three Technical Corrigenda. There is currently no
*official* single document that is the C standard. n1256 is an
attempt (and a darned good one with two very minor exceptions)
at creating what that single official document would look like if
it existed.

The term "draft" implies an early version of something that will
become official. In that sense, I don't think n1256 is really a
"draft".

On the other hand, it does say "Committee Draft" at the top of
each page (right before the "Septermber"), so perhaps I'm missing
something.

> Secondly, I can get all sorts of stuff for free which I choose to pay
> for in order to support the authors and/or the service they render.

Sure, but my understanding is that the $18 I paid for the C99
standard, or the $30 I'd pay if I bought it today, or the $???
I'd have to pay for a hard copy, doesn't go to the people who did
the actual work of writing the standard.

Nobody

Sep 22, 2009, 6:56:56 PM
On Tue, 22 Sep 2009 11:26:37 +0300, Phil Carmody wrote:

> Any sufficiently aggressive optimiser won't even bother setting
> any variable set to such an undefined value. Nor will it set any
> future values dependent on that variable.
>
> I'm not sure if there are any sufficiently aggressive optimisers
> out there, but not prepared to sloppily write UB in order to find
> out.

Consider the following code (paraphrasing a bug which was recently
discovered in the linux kernel):

1 int x = p->x;
2 if (!p) return;
...

Some versions of gcc are sufficiently aggressive that they optimise line
2 out of existence.

The rationale is that because p->x had already been evaluated at line 1,
p being null leads to undefined behaviour. If p is not null, the return on
line 2 won't occur, but if p is null, the compiler is free to return, or
not return, or do whatever else it feels like doing.

At first, I was a bit confused as to what kind of optimisation strategy
would do this. My guess is that it uses a form of /reductio ad absurdum/,
(i.e. "if x implies UB, x is false") when making deductions about the
possible values an expression can have.
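
The conventional repair, for reference, is simply to test the pointer
before the first dereference (a hypothetical sketch, not the actual
kernel code):

struct s { int x; };

int f(struct s *p)
{
    if (!p)          /* test first... */
        return -1;
    return p->x;     /* ...then dereference: no UB, so the compiler
                        has no license to delete the test above */
}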

Seebs

Sep 22, 2009, 8:15:30 PM
On 2009-09-22, Nobody <nob...@nowhere.com> wrote:
> Consider the following code (paraphrasing a bug which was recently
> discovered in the linux kernel):
>
> 1 int x = p->x;
> 2 if (!p) return;
> ...
>
> Some versions of gcc are sufficiently aggressive that they optimise line
> 2 out of existence.

Yes.

Ran into at least one of these in a very nasty bit of kernel internals
which relied on actually performing an apparently-irrelevant test for
null. Caused hangs on exactly one architecture.

Phil Carmody

Sep 23, 2009, 2:16:23 AM
Seebs <usenet...@seebs.net> writes:
> On 2009-09-22, Nobody <nob...@nowhere.com> wrote:
>> Consider the following code (paraphrasing a bug which was recently
>> discovered in the linux kernel):
>>
>> 1 int x = p->x;
>> 2 if (!p) return;
>> ...
>>
>> Some versions of gcc are sufficiently aggressive that they optimise line
>> 2 out of existence.
>
> Yes.
>
> Ran into at least one of these in a very nasty bit of kernal internals
> which relied on actually performing an apparently-irrelevant test for
> null. Caused hangs on exactly one architecture.

More subtle is when the null value appears during linked list
traversal, and you simply 'pre-calculate' something dependent
on that pointer in order to simplify the code.
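
A sketch of that list-traversal variant (hypothetical names; the point
is only the ordering of the dereference and the test):

#include <stddef.h>

struct node { struct node *next; int val; };

int sum_next(struct node *list)
{
    int total = 0;
    struct node *n;
    for (n = list; n != NULL; n = n->next) {
        int nv = n->next->val;   /* "pre-calculated": UB on the last
                                    node, where n->next is NULL */
        if (n->next == NULL)     /* a compiler may infer n->next != NULL
                                    from the line above and delete this */
            break;
        total += nv;
    }
    return total;
}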

Kenneth Brody

Sep 23, 2009, 9:29:46 AM
Keith Thompson wrote:
> Kenneth Brody <kenb...@spamcop.net> writes:
> [...]
>> Well, to be fair, not everyone has access to the "final" version of
>> the Standard. From what I understand, you need to pay for that,
>> though "near final" draft versions are available for free. Also, not
>> everyone understands the "legalese" of the text.
> [...]
>
> The C99 standard itself costs money (something like $30 US for a PDF
> copy). But
> <http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf>
> is free; it includes the full C99 standard with all the changes
> specified in the three Technical Corrigenda folded in. For most
> purposes, I consider it better than the C99 standard itself. (For
> some purposes, C99 plus copies of the three Technical Corrigenda might
> be better, since n1256 is marginally less official.)

Thanks for the link. I currently have n1124, which I assume is superseded
by n1256?

--
Kenneth Brody

Processor-Dev1l

Sep 23, 2009, 10:54:16 AM
On Sep 21, 10:16 pm, Seebs <usenet-nos...@seebs.net> wrote:
> Copyright 2009, all wrongs reversed. Peter Seebach / usenet-nos...@seebs.net
> http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
> http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

Well, I think the position of variables in memory is not caused by UB
but by CPU itself (if it uses memory in big or little endian).
x86 uses a scheme where the less significant bit is higher in memory,
so it is reversed.
If int is set to 4 bytes, then a sequence of variables will have
addresses 0, -4, -8, -12, etc.

Stephen Sprunk

Sep 23, 2009, 12:24:38 PM
Processor-Dev1l wrote:
> Well, I think the position of variables in memory is not caused by UB
> but by CPU itself (if it uses memory in big or little endian).
> x86 uses a scheme where the less significant bit is higher in memory,
> so it is reversed.
> If int is set to 4 bytes, then a sequence of variables will have
> addresses 0, -4, -8, -12, etc.

As far as Standard C is concerned, it's UB to even _try to find_ this
information.

Other standards, such as your platform's ABI, might define the behavior;
for instance, x86 systems put "auto" variables in a contiguous,
downward-growing stack, but not all systems do, and any code that relies
on this (or any other) behavior is inherently non-portable. It's left
undefined in Standard C for a reason.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Isaac Jaffe

Stephen Sprunk

Sep 23, 2009, 12:32:11 PM
Nobody wrote:
> Consider the following code (paraphrasing a bug which was recently
> discovered in the linux kernel):
>
> 1 int x = p->x;
> 2 if (!p) return;
> ...
>
> Some versions of gcc are sufficiently aggressive that they optimise line
> 2 out of existence.
>
> The rationale is that because p->x had already been evaluated at line 1,
> p being null leads to undefined behaviour. If p is not null, the return on
> line 2 won't occur, but if p is null, the compiler is free to return, or
> not return, or do whatever else it feels like doing.
>
> At first, I was a bit confused as to what kind of optimisation strategy
> would do this. My guess is that it uses a form of /reductio ad absurdum/,
> (i.e. "if x implies UB, x is false") when making deductions about the
> possible values an expression can have.

GCC has a feature that tracks whether it's possible for a pointer to be
null; if you dereference a pointer, GCC then sets the "notnull"
attribute on it and any future checks for a null pointer are optimized
away. If the code branches after a check for null, the branch taken in
the not-null condition will also have the attribute set until the two
branches merge again. The programmer can also set the attribute
manually if desired, though I can't think of any scenario where that'd
be safe and useful.

I assume that this optimization is to remove redundant tests/branches
and therefore improve performance; presumably it wouldn't be there if it
didn't help in at least some cases.
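
A minimal sketch of the kind of redundant test this optimization
removes (hypothetical code):

#include <stddef.h>

int get(int *p)
{
    int v = *p;      /* after this, gcc tracks p as not-null */
    if (p == NULL)   /* provably unreachable under that assumption, */
        return 0;    /* so the test and branch can be discarded */
    return v;
}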

Keith Thompson

Sep 23, 2009, 12:35:00 PM

Yes. n1124 incorporates TC1 and TC2. n1256 incorporates TC1, TC2,
and TC3 (plus the creative spelling of "Septermber").

(And thank you for spelling "superseded" correctly!)

Keith Thompson

Sep 23, 2009, 12:47:16 PM
Processor-Dev1l <process...@gmail.com> writes:
[...]

> Well, I think the position of variables in memory is not caused by UB
> but by CPU itself (if it uses memory in big or little endian).
> x86 uses way when less significant bit is higher in memory so it is
> reversed.
> If int is set to 4B then sequence of variables will have addresses
> 0,-4,-8,-12, etc.

Undefined behavior doesn't "cause" anything. It just means that
the behavior is undefined. It gives the implementation permission
to do quite literally anything. Whatever actual behavior happens
to occur is the result of -- well, of whatever caused it. But it's
outside the scope of the C language and standard.

If I fail to tell you what to do, and you go off and do X, I didn't
cause you to do X.

Incidentally, when you post a followup, please snip any quoted
text that isn't relevant to your followup. In particular, don't
quote signatures. Keep just enough quoted text so your followup
makes sense on its own to someone who didn't necessarily see the
parent article. See this followup for an example.

Morris Keesan

Sep 23, 2009, 1:03:28 PM
On Wed, 23 Sep 2009 12:24:38 -0400, Stephen Sprunk <ste...@sprunk.org>
wrote:

> Processor-Dev1l wrote:
>> Well, I think the position of variables in memory is not caused by UB
>> but by CPU itself (if it uses memory in big or little endian).
>> x86 uses a scheme where the less significant bit is higher in memory,
>> so it is reversed.
>> If int is set to 4 bytes, then a sequence of variables will have
>> addresses 0, -4, -8, -12, etc.
>
> As far as Standard C is concerned, it's UB to even _try to find_ this
> information.

Surely not.

printf("%p %p %p\n", (void *)&a, (void *)&b, (void *)&c);

doesn't invoke any undefined behaviour as far as I can tell.
The standard doesn't specify what values will be printed, and
the way those values will be represented as printing characters is
implementation-defined, but there's no UB there.

Similarly, this code

#include <stdint.h>

...

intptr_t aptr, bptr; /* or uintptr_t */
aptr = (intptr_t)(void *)&a;
bptr = (intptr_t)(void *)&b;

printf("a has a %s address than b\n", (aptr < bptr) ? "lower" : "higher");

allows one to try to find the information. There's no guarantee that
intptr_t or uintptr_t is available, but the worst that can happen there
is failure to compile, not UB. And even if the type exists, that doesn't
mean that the values of aptr and bptr will correspond in any expected
way to numerical memory addresses, but again, no UB.
--
Morris Keesan -- mke...@post.harvard.edu

Seebs

Sep 23, 2009, 1:32:13 PM
On 2009-09-23, Processor-Dev1l <process...@gmail.com> wrote:
> Well, I think the position of variables in memory is not caused by UB
> but by CPU itself (if it uses memory in big or little endian).

This is totally wrong.

No one was arguing that the position of variables in memory was caused by
undefined behavior; rather, only that your attempt to figure out the
differences between those positions invoked undefined behavior.

> x86 uses a scheme where the less significant bit is higher in memory,
> so it is reversed.

Doesn't affect location of variables in memory at all.

> If int is set to 4 bytes, then a sequence of variables will have
> addresses 0, -4, -8, -12, etc.

Except they won't always. They might be stored out of order. They might
be stored in totally different regions of memory, such that comparisons
between them yield nonsense.

To understand C, you have to learn that you *don't need to know*. And that
the answer can vary wildly. There is nothing prohibiting a system where
the relative addresses of the variables might be different between one call
and another to the same function. (And indeed, I can describe a plausible
real-world example...*)

-s
[*] Left as an exercise for the reader, for now.
--

Kenneth Brody

Sep 23, 2009, 1:36:14 PM
Keith Thompson wrote:
> Kenneth Brody <kenb...@spamcop.net> writes:
>> Keith Thompson wrote:
[...]
>>> <http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf>
[...]

>> Thanks for the link. I currently have n1124, which I assume is
>> superseded by n1256?
>
> Yes. n1124 incorporates TC1 and TC2. n1256 incorporates TC1, TC2,
> and TC3 (plus the creative spelling of "Septermber").

Hmm... My n1124 says "May 6, 2005", and neither "Septermber" nor
"September" appears anywhere in it. (Or was it TC3 that had that typo?)

> (And thank you for spelling "superseded" correctly!)

You're quite welcome. I strive to take pride in my spelling and grammar.
(Well, most of the time, anyway.)

--
Kenneth Brody

Keith Thompson

Sep 23, 2009, 2:27:29 PM
Kenneth Brody <kenb...@spamcop.net> writes:
> Keith Thompson wrote:
>> Kenneth Brody <kenb...@spamcop.net> writes:
>>> Keith Thompson wrote:
> [...]
>>>> <http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf>
> [...]
>>> Thanks for the link. I currently have n1124, which I assume is
>>> superseded by n1256?
>>
>> Yes. n1124 incorporates TC1 and TC2. n1256 incorporates TC1, TC2,
>> and TC3 (plus the creative spelling of "Septermber").
>
> Hmm... My n1124 says "May 6, 2005", and neither "Septermber" nor
> "September" appears anywhere in it. (Or was it TC3 that had that
> typo?)

It's n1256 that has

ISO/IEC 9899:TC3 Committee Draft Septermber 7, 2007 WG14/N1256

at the top of almost every page (except that it doesn't appear on page
1, and the order of the fields alternates on even and odd pages).
Neither n1124 nor TC3 has that error. (TC3 is a 10-page document
listing just the changes.)

[...]

Stephen Sprunk

Sep 23, 2009, 3:50:02 PM
Morris Keesan wrote:
> On Wed, 23 Sep 2009 12:24:38 -0400, Stephen Sprunk <ste...@sprunk.org>
> wrote:
>> Processor-Dev1l wrote:
>>> Well, I think the position of variables in memory is not caused by UB
>>> but by CPU itself (if it uses memory in big or little endian).
>>> x86 uses a scheme where the less significant bit is higher in memory,
>>> so it is reversed.
>>> If int is set to 4 bytes, then a sequence of variables will have
>>> addresses 0, -4, -8, -12, etc.
>>
>> As far as Standard C is concerned, it's UB to even _try to find_ this
>> information.
>
> Surely not.
>
> printf("%p %p %p\n", (void *)&a, (void *)&b, (void *)&c);
>
> doesn't invoke any undefined behaviour as far as I can tell.
> The standard doesn't specify what values will be printed, and
> the way those values will be represented as printing characters is
> implementation-defined, but there's no UB there.

That's arguable. Technically it's not UB, but in effect you're causing
the same UB as exhibited below, just performed by a human instead of the
computer.

> Similarly, this code
> #include <stdint.h>
>
> ...
>
> intptr_t aptr, bptr; /* or uintptr_t */
> aptr = (intptr_t)(void *)&a;
> bptr = (intptr_t)(void *)&b;
>
> printf("a has a %s address than b\n", (aptr < bptr) ? "lower" :
> "higher"));
>
> allows one to try to find the information. There's no guarantee that
> intptr_t or uintptr_t is available, but the worst that can happen there
> is failure to compile, not UB.

Using a relative comparison operator on pointers that do not point into
the same object is UB. Only testing for (in)equality is defined in that
case.

> And even if the type exists, that doesn't mean that the values of aptr
> and bptr will correspond in any expected way to numerical memory
> addresses, but again, no UB.

Many, many implementations (AFAIK all the ones with a flat address
space) define this, but not the C Standard itself.

Consider a segmented architecture, such as the AS/400 or x86 real mode,
where each object may be in a different segment; relative comparisons
between segments are meaningless, which is _why_ those operations had to
be left undefined.
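
A sketch of the distinction, with hypothetical variables: equality
comparison of pointers to distinct objects is defined; relational
comparison is not.

#include <stdio.h>
int main(void)
{
    int a, b;
    int *pa = &a, *pb = &b;

    if (pa == pb)    /* defined for any valid pointers; false here */
        puts("same object");
    /* (pa < pb) would be undefined: pa and pb point into different
       objects, so '<' between them has no defined meaning */
    return 0;
}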

Keith Thompson

Sep 23, 2009, 4:39:44 PM
Stephen Sprunk <ste...@sprunk.org> writes:
> Morris Keesan wrote:
[...]

>> Similarly, this code
>> #include <stdint.h>
>>
>> ...
>>
>> intptr_t aptr, bptr; /* or uintptr_t */
>> aptr = (intptr_t)(void *)&a;
>> bptr = (intptr_t)(void *)&b;
>>
>> printf("a has a %s address than b\n", (aptr < bptr) ? "lower" :
>> "higher"));
>>
>> allows one to try to find the information. There's no guarantee that
>> intptr_t or uintptr_t is available, but the worst that can happen there
>> is failure to compile, not UB.
>
> Using a relative comparison operator on pointers that do not point into
> the same object is UB. Only testing for (in)equality is defined in that
> case.
>
>> And even if the type exists, that doesn't mean that the values of aptr
>> and bptr will correspond in any expected way to numerical memory
>> addresses, but again, no UB.
>
> Many, many implementations (AFAIK all the ones with a flat address
> space) define this, but not the C Standard itself.
>
> Consider a segmented architecture, such as the AS/400 or x86 real mode,
> where each object may be in a different segment; relative comparisons
> between segments is meaningless, which is _why_ those operations had to
> be left undefined.

Even for flat-address-space implementations, the standard (optionally)
provides both intptr_t, a signed type, and uintptr_t, an unsigned
type, with no indication of which is more suitable. Addresses
corresponding to the intptr_t values -1 and 0 might be adjacent, or
they might be at opposite ends of the address space; likewise
for UINTPTR_MAX and 0.
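
(For completeness: where <stdint.h> provides the optional intptr_t and
uintptr_t, C99's <inttypes.h> also provides matching format macros - a
minimal sketch; what the printed numbers mean is still
implementation-defined:)

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
int main(void)
{
    int a = 1, b = 2;
    intptr_t  sa = (intptr_t)(void *)&a;
    uintptr_t ub = (uintptr_t)(void *)&b;
    printf("a at %" PRIdPTR ", b at %" PRIuPTR "\n", sa, ub);
    return 0;
}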

Phil Carmody

Sep 23, 2009, 11:52:50 PM
Stephen Sprunk <ste...@sprunk.org> writes:
> Morris Keesan wrote:
>> On Wed, 23 Sep 2009 12:24:38 -0400, Stephen Sprunk <ste...@sprunk.org>
>> wrote:
>>> Processor-Dev1l wrote:
>>>> Well, I think the position of variables in memory is not caused by UB
>>>> but by CPU itself (if it uses memory in big or little endian).
>>>> x86 uses a scheme where the less significant bit is higher in memory,
>>>> so it is reversed.
>>>> If int is set to 4 bytes, then a sequence of variables will have
>>>> addresses 0, -4, -8, -12, etc.
>>>
>>> As far as Standard C is concerned, it's UB to even _try to find_ this
>>> information.
>>
>> Surely not.
>>
>> printf("%p %p %p\n", (void *)&a, (void *)&b, (void *)&c);
>>
>> doesn't invoke any undefined behaviour as far as I can tell.
>> The standard doesn't specify what values will be printed, and
>> the way those values will be represented as printing characters is
>> implementation-defined, but there's no UB there.
>
> That's arguable. Technically it's not UB, but in effect you're causing
> the same UB as exhibited below, just performed by a human instead of the
> computer.

How can something which is not UB cause UB? Care to point to somewhere
in the standard which permits that?

>> Similarly, this code
>> #include <stdint.h>
>>
>> ...
>>
>> intptr_t aptr, bptr; /* or uintptr_t */
>> aptr = (intptr_t)(void *)&a;
>> bptr = (intptr_t)(void *)&b;
>>
>> printf("a has a %s address than b\n", (aptr < bptr) ? "lower" :
>> "higher"));
>>
>> allows one to try to find the information. There's no guarantee that
>> intptr_t or uintptr_t is available, but the worst that can happen there
>> is failure to compile, not UB.
>
> Using a relative comparison operator on pointers that do not point into
> the same object is UB. Only testing for (in)equality is defined in that
> case.

Straw man - what pointers? I see a comparison of integer types, viz
integer types capable of holding object pointers.

Phil Carmody

Sep 24, 2009, 12:07:19 AM
Seebs <usenet...@seebs.net> writes:
> On 2009-09-23, Processor-Dev1l <process...@gmail.com> wrote:
[SNIP jibbering]

>
> To understand C, you have to learn that you *don't need to know*. And that
> the answer can vary wildly. There is nothing prohibiting a system where
> the relative addresses of the variables might be different between one call
> and another to the same function. (And indeed, I can describe a plausible
> real-world example...*)
>
> -s
> [*] Left as an exercise for the reader, for now.

I was about to say "no way!", but I think that with the joys of inlining
and as-if, it becomes quite easy.

//... (assumes <string.h> and <stdbool.h> are included)
inline void copy(struct thing *p,
                 struct thing *q,
                 bool direction)
{
    void *pv = p, *qv = q;
    if (direction) { memcpy(pv, qv, sizeof(struct thing)); }
    else { memcpy(qv, pv, sizeof(struct thing)); }
}

I imagine that copy(x,y,0) and copy(y,x,1) could cause the values
representing pv and qv to be in different relative locations in
memory on register-sparse systems.

One doesn't even need inlining for that, simply a cooperative-
enough optimiser.

Nick Keighley

Sep 24, 2009, 4:10:03 AM

it is usual not to quote sigs (the bit after "-- ")


> Well, I think the position of variables in memory is not caused by UB

You are correct: the "position" of variables in memory is not caused by
UB.
But then he didn't say that. He said it is undefined behaviour to
subtract (or compare) two pointers that do not point to the same
object.

so

int i, j;
long diff = &i - &j;

is Undefined Behaviour (even if long is big enough to hold the result
of a pointer subtraction). In a sense, &i and &j are not even in the
same address space.

> but by CPU itself (if it uses memory in big or little endian).

I don't think you understand what endianness is. It has nothing
to do with the way addresses are allocated to variables.

> x86 uses a scheme where the less significant bit is higher in memory,
> so it is reversed.
> If int is set to 4 bytes, then a sequence of variables will have
> addresses 0, -4, -8, -12, etc.

nonsense, I'm afraid
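
To illustrate what endianness actually is - byte order within a single
object, not the placement of separate variables - a minimal sketch
(assuming a 32-bit unsigned int):

#include <stdio.h>
int main(void)
{
    unsigned int v = 0x01020304;
    unsigned char *p = (unsigned char *)&v;  /* byte-wise inspection is fine */
    size_t k;

    /* little-endian prints "04 03 02 01", big-endian "01 02 03 04" */
    for (k = 0; k < sizeof v; k++)
        printf("%02x ", p[k]);
    putchar('\n');
    return 0;
}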

Nick Keighley

Sep 24, 2009, 4:17:37 AM
On 24 Sep, 04:52, Phil Carmody <thefatphil_demun...@yahoo.co.uk>
wrote:

> Stephen Sprunk <step...@sprunk.org> writes:
> > Morris Keesan wrote:
> >> On Wed, 23 Sep 2009 12:24:38 -0400, Stephen Sprunk <step...@sprunk.org>
> >> wrote:
> >>> Processor-Dev1l wrote:

> >>>> Well, I think the position of variables in memory is not caused by UB
> >>>> but by CPU itself (if it uses memory in big or little endian).
> >>>> x86 uses a scheme where the less significant bit is higher in memory,
> >>>> so it is reversed.
> >>>> If int is set to 4 bytes, then a sequence of variables will have
> >>>> addresses 0, -4, -8, -12, etc.
>
> >>> As far as Standard C is concerned, it's UB to even _try to find_ this
> >>> information.
>
> >> Surely not.
>
> >>    printf("%p %p %p\n", (void *)&a, (void *)&b, (void *)&c);
>
> >> doesn't invoke any undefined behaviour as far as I can tell.

the output might be

":red-segment: :blue-segment: :beige-segment:"

> >> The standard doesn't specify what values will be printed, and
> >> the way those values will be represented as printing characters is
> >> implementation-defined, but there's no UB there.
>
> > That's arguable.  Technically it's not UB, but in effect you're causing
> > the same UB as exhibited below, just performed by a human instead of the
> > computer.
>
> How can something which is not UB cause UB? Care to point to somewhere
> in the standard which permits that?

You're comparing pointers to different objects, which is UB.
I like the idea that my mind can exhibit undefined behaviour...
Have I reformatted my hard drive just by thinking about this stuff?
:-)


> >> Similarly, this code
> >>     #include <stdint.h>
>
> >>     ...
>
> >>     intptr_t aptr, bptr; /* or uintptr_t */
> >>     aptr = (intptr_t)(void *)&a;
> >>     bptr = (intptr_t)(void *)&b;

Hmm, well, that's unspecified behaviour, though we know aptr
and bptr will end up with valid integers.

> >>     printf("a has a %s address than b\n", (aptr < bptr) ? "lower" : "higher");
>
> >> allows one to try to find the information.  There's no guarantee that
> >> intptr_t or uintptr_t is available, but the worst that can happen there
> >> is failure to compile, not UB.
>
> > Using a relative comparison operator on pointers that do not point into
> > the same object is UB.  Only testing for (in)equality is defined in that
> > case.
>
> Straw man - what pointers? I see a comparison of integer types, viz
> integer types capable of holding object pointers.

Interesting. The compiler would have to remember that they had been
pointers.

Richard Tobin

Sep 24, 2009, 5:21:10 AM
In article <j4sum.73097$nQ6....@newsfe07.iad>,
Stephen Sprunk <ste...@sprunk.org> wrote:

>GCC has a feature that tracks whether it's possible for a pointer to be
>null; if you dereference a pointer, GCC then sets the "notnull"
>attribute on it and any future checks for a null pointer are optimized
>away.

>[...]


>I assume that this optimization is to remove redundant tests/branches
>and therefore improve performance; presumably it wouldn't be there if it
>didn't help in at least some cases.

As I've said before, I wish it would tell you when it's doing
this, as it traditionally has with simpler optimisations such as
always-true comparisons. Being able to remove a chunk of code
can be a sign of a mistake by the programmer, and just removing
it often makes the results of the error even more obscure.

-- Richard
--
Please remember to mention me / in tapes you leave behind.

Dik T. Winter

Sep 24, 2009, 10:50:06 AM
In article <QZuum.213126$0e4.1...@newsfe19.iad> Stephen Sprunk <ste...@sprunk.org> writes:
> Morris Keesan wrote:
...

> > Similarly, this code
> > #include <stdint.h>
> >
> > ...
> >
> > intptr_t aptr, bptr; /* or uintptr_t */
> > aptr = (intptr_t)(void *)&a;
> > bptr = (intptr_t)(void *)&b;
> >
> > printf("a has a %s address than b\n", (aptr < bptr) ? "lower" :
> > "higher"));
...

> Using a relative comparison operator on pointers that do not point into
> the same object is UB.

Read again. There is no comparison of pointers.
--
dik t. winter, cwi, science park 123, 1098 xg amsterdam, nederland, +31205924131
home: bovenover 215, 1025 jn amsterdam, nederland; http://www.cwi.nl/~dik/

Kenneth Brody

Sep 24, 2009, 11:04:32 AM
Dik T. Winter wrote:
> In article <QZuum.213126$0e4.1...@newsfe19.iad> Stephen Sprunk <ste...@sprunk.org> writes:
> > Morris Keesan wrote:
> ....

> > > Similarly, this code
> > > #include <stdint.h>
> > >
> > > ...
> > >
> > > intptr_t aptr, bptr; /* or uintptr_t */
> > > aptr = (intptr_t)(void *)&a;
> > > bptr = (intptr_t)(void *)&b;
> > >
> > > printf("a has a %s address than b\n", (aptr < bptr) ? "lower" :
> > > "higher");
> ....

> > Using a relative comparison operator on pointers that do not point into
> > the same object is UB.
>
> Read again. There is no comparison of pointers.

I would have to agree that there is no UB. However, I would also have to
say that the result of comparing aptr to bptr is "meaningless".

First, you are using the signed "intptr_t", meaning that b could be in
"higher" memory, yet bptr be "lower" because aptr is positive and bptr is
negative.

Ignoring that, and assuming you changed to uintptr_t, it's still
meaningless, because it is not necessary (no matter how likely on the
platforms most people come across) for such comparisons to come out that
way. The only guarantee is that "void* --> [u]intptr_t --> void*" will
compare equal to the original "void*".

Consider, for example, a segmented architecture. In such an architecture,
comparing segment values for anything other than equality is meaningless.
Is something in segment 1234 really "higher" than one in segment 1233?
(Especially if one considers real-mode X86 architecture, with overlapping
segments.) Or, perhaps for efficiency, the segment ends up in the low-order
"word" of [u]intptr_t, while the offset is in the high-order "word".


So, while I agree that I don't see any UB, it's still meaningless.
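
To make the segment-packing case concrete, here is a minimal sketch (my
own illustration with an invented encoding, not anything a real compiler
is claimed to do): pack a 16-bit segment into the low half of a 32-bit
integer and the offset into the high half, and the integer ordering
disagrees with the linear-address ordering.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical encoding: segment in the LOW half, offset in the
       HIGH half, as in the "perhaps for efficiency" case above. */
    static uint32_t pack(uint16_t segment, uint16_t offset)
    {
        return ((uint32_t)offset << 16) | segment;
    }

    int main(void)
    {
        /* Imagine linear address = segment * 65536 + offset. */
        uint32_t a = pack(0x1234, 0x0000);  /* linear 0x12340000 */
        uint32_t b = pack(0x1233, 0xFFFF);  /* linear 0x1233FFFF, below a */

        /* The packed integers order the other way round, so '<' on
           them says nothing about which address is really "higher". */
        printf("packed: b %s a\n", (b > a) ? ">" : "<=");
        return 0;
    }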


--
Kenneth Brody

Keith Thompson

unread,
Sep 24, 2009, 12:00:27 PM9/24/09
to
Kenneth Brody <kenb...@spamcop.net> writes:
[...]
> I would have to agree that there is no UB. However, I would also have
> to say that the result of comparing aptr to pbtr is "meaningless".
>
> First, you are using the signed "intptr_t", meaning that b could be in
> "higher" memory, yet bptr be "lower" because aptr is positive and bptr
> is negative.
>
> Ignoring that, and assuming you changed to uintptr_t,
[...]

Why do you assume that intptr_t provides a less meaningful mapping of
addresses than uintptr_t?

Imagine a system where addresses are treated as signed. You could
even have an object covering a range of addresses from, say, -10
to +10. (A null pointer would have to have a representation other
than all-bits-zero.)

I think most systems treat addresses as unsigned (assuming that
they're numerical at all), but I wouldn't be surprised if some treat
them as signed. On the other hand, some may just avoid having
anything cross the mid-range boundary, so addresses can be considered
either signed or unsigned.

Dik T. Winter

unread,
Sep 25, 2009, 8:40:01 AM9/25/09
to
In article <ln4oqs1...@nuthaus.mib.org> Keith Thompson <ks...@mib.org> writes:
...

> Imagine a system where addresses are treated as signed. You could
> even have an object covering a range of addresses from, say, -10
> to +10. (A null pointer would have to have a representation other
> than all-bits-zero.)

Isn't the ARM a machine where some addresses were thought to be negative?

Nobody

unread,
Sep 25, 2009, 3:57:08 PM9/25/09
to
On Fri, 25 Sep 2009 12:40:01 +0000, Dik T. Winter wrote:

> In article <ln4oqs1...@nuthaus.mib.org> Keith Thompson <ks...@mib.org> writes:
> ...
> > Imagine a system where addresses are treated as signed. You could
> > even have an object covering a range of addresses from, say, -10
> > to +10. (A null pointer would have to have a representation other
> > than all-bits-zero.)
>
> Isn't the ARM a machine where some addresses were thought to be negative?

At the CPU level, data is neither signed nor unsigned. It's typically the
operations which treat their operands as signed or unsigned.

With two's complement, addition, subtraction and multiplication (but not
division) behave identically for signed or unsigned values. The main
difference is in comparisons.

A signed comparison subtracts two values then checks whether the overflow
flag is set, while an unsigned comparison would check the carry flag
instead.

Apart from division, the only common instruction which has signed and
unsigned variants is a right shift. An arithmetic (signed) right shift
duplicates the topmost bit (i.e. the sign bit) while a logical (unsigned)
shift fills with zeros.
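
As a concrete C model of that difference (my illustration, not part of
the original post), the same two 16-bit patterns order differently
depending on which interpretation you pick:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint16_t ua = 0x7FFF, ub = 0x8000;
        int16_t sa = (int16_t)ua;   /* 32767 */
        int16_t sb = (int16_t)ub;   /* -32768 on two's complement;
                                       strictly the conversion is
                                       implementation-defined */

        printf("unsigned view: a %s b\n", (ua < ub) ? "<" : ">=");
        printf("signed view:   a %s b\n", (sa < sb) ? "<" : ">=");
        return 0;
    }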

Keith Thompson

unread,
Sep 25, 2009, 5:01:54 PM9/25/09
to

Ok, but the issue is addresses.

Suppose a machine has, say, an auto-increment addressing mode (an
idea that goes back at least to the PDP-11), which is useful for
stepping through arrays. Thus something like:

*ptr++ = 0;

might be a single instruction. Assuming for concreteness and
simplicity that addresses are 16 bits, what happens at the machine
level when ptr==0x7FFF? What happens when ptr==0xFFFF? Can a
single object cover a range of addresses that includes 0x7FFF and
0x8000? What about 0xFFFF and 0x0000 (or, equivalently, -1 and 0)?
What instructions are used to compare addresses?

I don't know the answers to any of those questions for any specific
architecture, but certain sets of answers would imply that addresses
are signed, and certain other sets of answers would imply that
they're unsigned.

And yet other sets of answers might imply that the answer is
indeterminate; either signed or unsigned comparison could work
equally well if no object can span certain address boundaries.

This is approaching the edge of clc topicality, if it hasn't
already crossed it.

Eric Sosman

unread,
Sep 25, 2009, 5:12:31 PM9/25/09
to
Dik T. Winter wrote:
> In article <ln4oqs1...@nuthaus.mib.org> Keith Thompson <ks...@mib.org> writes:
> ...
> > Imagine a system where addresses are treated as signed. You could
> > even have an object covering a range of addresses from, say, -10
> > to +10. (A null pointer would have to have a representation other
> > than all-bits-zero.)
>
> Isn't the ARM a machine where some addresses were thought to be negative?

<topicality level="minimal">

Long ago I used a machine that treated all its CPU registers
as signed magnitude numbers, and did arithmetic accordingly.
Addresses were notionally unsigned; the machine just grabbed the
right number of low-order bits from the appropriate register and
ignored the rest, including the sign bit.

The fun part was that "all CPU registers" included the program
counter, and that "increment" meant "add one." I wasted a fair
amount of time trying to concoct a sequence of instructions that
would execute normally until encountering one that set the PC's sign
bit, then run again in reverse as the PC "incremented" to successively
lower addresses ...

</topicality>

--
Eric....@sun.com

Ben Pfaff

unread,
Sep 25, 2009, 5:28:02 PM9/25/09
to
Keith Thompson <ks...@mib.org> writes:

> Imagine a system where addresses are treated as signed. You could
> even have an object covering a range of addresses from, say, -10
> to +10. (A null pointer would have to have a representation other
> than all-bits-zero.)

x86-64 treats addresses as signed numbers. Usually, user
processes occupy positive addresses and the kernel occupies
negative addresses. I don't think that objects are allowed to
cross 0.
--
Ben Pfaff
http://benpfaff.org

Phil Carmody

unread,
Sep 26, 2009, 3:47:58 AM9/26/09
to
Nobody <nob...@nowhere.com> writes:
> On Fri, 25 Sep 2009 12:40:01 +0000, Dik T. Winter wrote:
>
>> In article <ln4oqs1...@nuthaus.mib.org> Keith Thompson <ks...@mib.org> writes:
>> ...
>> > Imagine a system where addresses are treated as signed. You could
>> > even have an object covering a range of addresses from, say, -10
>> > to +10. (A null pointer would have to have a representation other
>> > than all-bits-zero.)
>>
>> Isn't the ARM a machine where some addresses were thought to be negative?
>
> At the CPU level, data is neither signed nor unsigned. It's typically the
> operations which treat their operands as signed or unsigned.
>
> With two's complement, addition, subtraction and multiplication (but not
> division) behave identically for signed or unsigned values.

Full- (or double-, depending on your PoV) width multiplies are different
too. ff*ff = 0001 or fe01.

> The main difference is in comparisons.
>
> A signed comparison subtracts two values then checks whether the overflow
> flag is set, while an unsigned comparison would check the carry flag
> instead.
>
> Apart from division, the only common instruction which has signed and
> unsigned variants is a right shift.

And multiply.
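
The ff*ff case is easy to check in C by widening before multiplying (a
quick sketch of my own, assuming two's complement for the signed case):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t a = 0xFF, b = 0xFF;

        /* Widen first, then multiply: the full 16-bit products differ,
           while the low byte agrees either way. */
        uint16_t up = (uint16_t)((uint16_t)a * b);       /* 0xFE01 */
        int16_t  sp = (int16_t)((int8_t)a * (int8_t)b);  /* 0x0001: -1 * -1;
                                                            (int8_t)0xFF is
                                                            implementation-
                                                            defined, -1 on
                                                            two's complement */

        printf("unsigned: %04X\n", (unsigned)up);
        printf("signed:   %04X\n", (unsigned)(uint16_t)sp);
        printf("low byte: %02X vs %02X\n",
               (unsigned)(up & 0xFF), (unsigned)(sp & 0xFF));
        return 0;
    }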

Phil Carmody

unread,
Sep 26, 2009, 3:59:22 AM9/26/09
to

If all low-topicality stuff were as fun as that, I'd be campaigning
for less topicality!

If C were gcc, you'd actually be bang on topic, with the perfect
counter-example for the not-Frequently-Asked-purely-as-C-isn't-gcc
Question which would no doubt appear! (Explanation in headers.)

Morris Keesan

unread,
Sep 26, 2009, 9:20:03 PM9/26/09
to

Indeed. But there's no "undefined behavior" there (as defined by
the C standard: "behavior, upon use of a nonportable or erroneous
program construct or of erroneous data, for which this International
Standard imposes no requirements"). The standard clearly requires
the values of the three pointers to be converted "to a sequence of
printing characters, in an implementation-defined manner."

>
>> >> The standard doesn't specify what values will be printed, and
>> >> the way those values will be represented as printing characters is
>> >> implementation-defined, but there's no UB there.
>>
>> > That's arguable.  Technically it's not UB, but in effect you're
>> > causing the same UB as exhibited below, just performed by a human
>> > instead of the computer.

No, you're causing implementation-defined behavior, which is a totally
different thing.

>>
>> How can something which is not UB cause UB? Care to point to somewhere
>> in the standard which permits that?
>
> you're comparing pointers to different objects which is UB.
> I like the idea that my mind can exhibit undefined behaviour...
> Have I reformatted my hard drive just by thinking about this stuff?
> :-)

In my quoted code, there's no pointer comparison. The code below
compares two integers.

>
>
>> >> Similarly, this code
>> >>     #include <stdint.h>
>>
>> >>     ...
>>
>> >>     intptr_t aptr, bptr; /* or uintptr_t */
>> >>     aptr = (intptr_t)(void *)&a;
>> >>     bptr = (intptr_t)(void *)&b;
>
> hmm. well that's unspecified behaviour. Though we know aptr
> and bptr will end up with valid integers.
>
>> >>     printf("a has a %s address than b\n", (aptr < bptr) ? "lower" :
>> >> "higher");
>>
>> >> allows one to try to find the information.  There's no guarantee that
>> >> intptr_t or uintptr_t is available, but the worst that can happen
>> >> there is failure to compile, not UB.
>>
>> > Using a relative comparison operator on pointers that do not point
>> > into the same object is UB.  Only testing for (in)equality is defined
>> > in that case.
>>
>> Straw man - what pointers? I see a comparison of integer types, viz
>> integer types capable of holding object pointers.
>
> Interesting. The compiler would have to remember that they had been
> pointers.

No. The compiler doesn't have to remember anything. The compiler
just has to generate code which converts the pointers to integers
(in the platforms I've used, where there is a suitable integer type,
this has always been a direct bit-for-bit copy), and then generate
code which compares those two integers.

My point, in this not-very-useful code sample, was not to suggest any
portable or particularly meaningful code. I was simply arguing with the
claim that it's undefined behavior "to even _try to find_ this
information."
(i.e. the relative addresses of unrelated variables).
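
For reference, a self-contained version of the sketch being argued over
might look like this (assuming the implementation provides the optional
intptr_t; what it prints is implementation-specific, as discussed):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int a = 1, b = 2;
        intptr_t aptr = (intptr_t)(void *)&a;
        intptr_t bptr = (intptr_t)(void *)&b;

        /* This compares integers, not pointers, so there is no UB;
           but the answer is only as meaningful as the implementation's
           pointer-to-integer mapping. */
        printf("a has a %s address than b\n",
               (aptr < bptr) ? "lower" : "higher");
        return 0;
    }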

Nobody

unread,
Sep 26, 2009, 11:59:10 PM9/26/09
to
On Fri, 25 Sep 2009 14:01:54 -0700, Keith Thompson wrote:

> Ok, but the issue is addresses.
>
> Suppose a machine has, say, an auto-increment addressing mode (an
> idea that goes back at least to the PDP-11), which is useful for
> stepping through arrays. Thus something like:
>
> *ptr++ = 0;
>
> might be a single instruction. Assuming for concreteness and
> simplicitly that addresses are 16 bits, what happens on the machine
> level when ptr==0x7FFF?? What happens when ptr==0xFFFF?

"comparison between pointer and integer" == UB ;)

Seriously, the first case will result in ptr==0x8000 (or, if you prefer,
ptr==-0x8000; they're the same thing as far as the CPU is concerned),
while the second case will result in ptr==0x0000.

Simply using the representation "0xFFFF" for a 16-bit value is treating
the values as unsigned. A 16-bit signed integer cannot be 0xFFFF; that
bit pattern would be called -0x0001.

> Can a
> single object cover a range of addresses that includes 0x7FFF and
> 0x8000? What about 0xFFFF and 0x0000 (or, equivalently, -1 and 0)?
> What instructions are used to compare addresses?

Whichever ones the compiler writer decides to use. C only defines pointer
comparison for elements of a common array. At the CPU level, you have the
same options for comparing pointers as for comparing anything else.

I can't think of a situation where the CPU considers addresses as either
"signed" or "unsigned"; they are just "words".

At the C level, on any platform with two's-complement arithmetic, most
integer operations use the same machine instructions regardless of whether
the values are signed or unsigned. The signedness only becomes relevant
for division, right shift, and comparisons.

At the machine level, there is no signed or unsigned, just words.
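
The wrap behaviour described above is easy to model with 16-bit unsigned
arithmetic (a toy model only; doing this with real pointers would of
course be UB):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint16_t addr = 0x7FFF;
        addr = (uint16_t)(addr + 1);   /* 0x8000 (or -0x8000: same bits) */
        printf("0x7FFF + 1 -> 0x%04X\n", (unsigned)addr);

        addr = 0xFFFF;
        addr = (uint16_t)(addr + 1);   /* wraps to 0x0000 */
        printf("0xFFFF + 1 -> 0x%04X\n", (unsigned)addr);
        return 0;
    }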

Nobody

unread,
Sep 27, 2009, 12:07:42 AM9/27/09
to
On Sat, 26 Sep 2009 10:47:58 +0300, Phil Carmody wrote:

>> With two's complement, addition, subtraction and multiplication (but not
>> division) behave identically for signed or unsigned values.
>
> Full- (or double-, depending on your PoV) width multiplies are different
> too. ff*ff = 0001 or fe01.

My PoV is "double-".

In C, int * int -> int, long * long -> long, and so on.

Once the types have been promoted, it makes no difference as to their
signedness. OTOH, the promotion is affected by the signedness.

>> Apart from division, the only common instruction which has signed and
>> unsigned variants is a right shift.
>
> And multiply.

True for x86's double-width multiply, but how many architectures have that
feature?

Nobody

unread,
Sep 27, 2009, 12:14:29 AM9/27/09
to
On Fri, 25 Sep 2009 20:57:08 +0100, Nobody wrote:

>> Isn't the ARM a machine where some addresses were thought to be negative?
>
> At the CPU level, data is neither signed nor unsigned. It's typically the
> operations which treat their operands as signed or unsigned.

Since posting this, it has occurred to me that there's one case where a
CPU might treat values as signed: if the PC (IP) doesn't use a full word,
it's possible that operations which copy the PC as data might use the
topmost valid bit to fill the unused bits (i.e. sign extension).

I don't think that the ARM does this, though.

Keith Thompson

unread,
Sep 27, 2009, 1:43:57 AM9/27/09
to
Nobody <nob...@nowhere.com> writes:
[...]

> I can't think of a situation where the CPU considers addresses as either
> "signed" or "unsigned"; they are just "words".
[...]

Assuming, as before, 16-bit addresses, if a single 32-byte object
can cover the range of addresses from 0x7FF0 to 0x800F, then
addresses are being treated as unsigned. Similarly, if a single
32-byte object can cover the range of addresses from -16 to +15,
then addresses are being treated as signed (and a null pointer
is not all-bits-zero). If both are possible then it's a rather
odd architecture. If neither situation can occur, then it probably
doesn't matter whether addresses are considered signed or unsigned.

Phil Carmody

unread,
Sep 27, 2009, 7:30:32 AM9/27/09
to
Nobody <nob...@nowhere.com> writes:
> On Sat, 26 Sep 2009 10:47:58 +0300, Phil Carmody wrote:

[UNSNIP - "At the CPU level ... "]

>>> With two's complement, addition, subtraction and multiplication (but not
>>> division) behave identically for signed or unsigned values.
>>
>> Full- (or double-, depending on your PoV) width multiplies are different
>> too. ff*ff = 0001 or fe01.
>
> My PoV is "double-".
>
> In C

Nah, doesn't wash. We were at the CPU level, if you remember.

>, int * int -> int, long * long -> long, and so on.
>
> Once the types have been promoted, it makes no difference as to their
> signedness. OTOH, the promotion is affected by the signedness.
>
>>> Apart from division, the only common instruction which has signed and
>>> unsigned variants is a right shift.
>>
>> And multiply.
>
> True for x86's double-width multiply, but how many architectures have that
> feature?

Well, the first processor I used that had a multiply instruction had both
signed and unsigned. The architecture I've used the most since then also
had this pair. The two other architectures I've used extensively in that
time also have them. Of the two other architectures I've used but not
extensively programmed for, one didn't have a multiply at all, the other
had both types. Only one architecture I've used that has a multiply
instruction at all fails to have the pair.

So that's 5/6 in my experience (plus 3 architectures without a multiply
at all).

Nobody

unread,
Sep 27, 2009, 1:57:48 PM9/27/09
to
On Sat, 26 Sep 2009 22:43:57 -0700, Keith Thompson wrote:

> Nobody <nob...@nowhere.com> writes:
> [...]
>> I can't think of a situation where the CPU considers addresses as either
>> "signed" or "unsigned"; they are just "words".
> [...]
>
> Assuming, as before, 16-bit addresses, if a single 32-byte object
> can cover the range of addresses from 0x7FF0 to 0x800F, then
> addresses are being treated as unsigned. Similarly, if a single
> 32-byte object can cover the range of addresses from -16 to +15,
> then addresses are being treated as signed

No, it just means that they wrap.

> (and a null pointer is not all-bits-zero).

"null pointer" is a C concept; it doesn't mean anything to the CPU.

> If both are possible then it's a rather odd architecture.

Actually, I think that most 16-bit CPUs will happily read a 16-bit value
from both 0x7FFF-0x8000 and 0xFFFF-0x0000. Most of them don't have any
alignment constraints and don't care about addresses wrapping.

The fact that all 2^16 addresses are valid doesn't preclude using 0x0000
(or any other value) as the null pointer. The implementation just needs to
ensure that it doesn't use that address for any allocation; as
dereferencing a null-pointer is UB, it doesn't have to explicitly check
for such.

Richard Bos

unread,
Sep 28, 2009, 6:27:10 AM9/28/09
to
Mark McIntyre <markmc...@TROUSERSspamcop.net> wrote:

> Mark wrote:
> > Keith Thompson wrote:
> >> The C99 standard itself costs money (something like $30 US for a PDF
> >> copy). But
> >> <http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf>
> >> is free; it includes the full C99 standard with all the changes
> >
> > Then what do people pay for if they can get the standard for free?
>
> Firstly it's my understanding that n1256 is the final draft, not the
> edited final version.

Yes, but for ordinary programmers, the differences between the two are
so small that they might as well not exist. However, it may be relevant
for legal reasons. Someone may be willing to pay money just so their
lawyers can say that they have a copy of the _official_ Standard.

Richard

James Kuyper

unread,
Sep 28, 2009, 7:22:54 AM9/28/09
to
Richard Bos wrote:
> Mark McIntyre <markmc...@TROUSERSspamcop.net> wrote:
>
>> Mark wrote:
>>> Keith Thompson wrote:
>>>> The C99 standard itself costs money (something like $30 US for a PDF
>>>> copy). But
>>>> <http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf>
>>>> is free; it includes the full C99 standard with all the changes
>>> Then what do people pay for if they can get the standard for free?
>> Firstly it's my understanding that n1256 is the final draft, not the
>> edited final version.
>
> Yes,

No. They started editing from the final officially approved C99
standard, applying all three officially approved Technical Corrigenda.

> ... but for ordinary programmers, the differences between the two are


> so small that they might as well not exist. However, it may be relevant
> for legal reasons. Someone may be willing to pay money just so their
> lawyers can say that they have a copy of the _official_ Standard.

To get the official standard, you need not only the C99 standard itself,
but also all three officially approved Technical Corrigenda; n1256.pdf
is less official than that set of four documents, but is a lot more
convenient for actual use (and much cheaper, too).

Keith Thompson

unread,
Sep 28, 2009, 9:02:05 AM9/28/09
to
James Kuyper <james...@verizon.net> writes:
> Richard Bos wrote:
>> Mark McIntyre <markmc...@TROUSERSspamcop.net> wrote:
>>
>>> Mark wrote:
>>>> Keith Thompson wrote:
>>>>> The C99 standard itself costs money (something like $30 US for a PDF
>>>>> copy). But
>>>>> <http://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf>
>>>>> is free; it includes the full C99 standard with all the changes
>>>> Then what do people pay for if they can get the standard for free?
>>> Firstly it's my understanding that n1256 is the final draft, not the
>>> edited final version.
>>
>> Yes,
>
> No. They started editing from the final officially approved C99
> standard, applying all three officially approved Technical corrigenda.

Whether Mark McIntyre's description is correct depends on how you
parse "edited final version". n1256 is a draft (at least that's what
it calls itself); it is not a final version.

>> ... but for ordinary programmers, the differences between the two are
>> so small that they might as well not exist. However, it may be relevant
>> for legal reasons. Someone may be willing to pay money just so their
>> lawyers can say that they have a copy of the _official_ Standard.
>
> To get the official standard, you need not only the C99 standard
> itself, but also all three officially approved Technical Corrigenda;
> n1256.pdf is less official than that set of four documents, but is a
> lot more convenient for actual use (and much cheaper, too).

Note also that the three TCs are available at no charge from ansi.org.

Stephen Sprunk

unread,
Sep 28, 2009, 11:16:54 AM9/28/09
to

They're not, but not for exactly that reason. No object can occupy the
first page of memory (0 to +4095), to trap null pointer dereferences;
additionally, no object can exist in both user space and kernel space,
which also covers the wrap from positive to negative.

Most x86 systems could be viewed as having signed pointers as well, with
the same division and rules. Notable exceptions are a special mode in
Win32 that allows the user/kernel division to be 3GB/1GB and one Red Hat
Linux variant that makes it 4GB+4GB (minus some trampolines and bounce
buffers). Those oddities have mostly gone away, though, now that people
can simply use x64 and get as much space as they need.

S

--
Stephen Sprunk "Stupid people surround themselves with smart
CCIE #3723 people. Smart people surround themselves with
K5SSS smart people who disagree with them." --Isaac Jaffe

Stephen Sprunk

unread,
Sep 28, 2009, 11:32:28 AM9/28/09
to

One counter-example comes to mind:

inline void foo(void *x) {
    if (!x) return;
    /* do something that dereferences x */
}

void bar(void *x) {
    if (!x) return;
    /* do something that dereferences x */
    foo(x);
    /* do something more that dereferences x */
}

This is not a bug; foo() needs to be protected against dumb callers, of
which there might be many. However, I would expect foo()'s test to be
optimized away when it is inlined into smart callers, e.g. bar(), because
there it's redundant. I would be annoyed if I saw a warning for that.

Dik T. Winter

unread,
Sep 29, 2009, 7:48:17 AM9/29/09
to
In article <pan.2009.09.25...@nowhere.com> Nobody <nob...@nowhere.com> writes:
> On Fri, 25 Sep 2009 12:40:01 +0000, Dik T. Winter wrote:
> > In article <ln4oqs1...@nuthaus.mib.org> Keith Thompson <ks...@mib.org> writes:
...
> > > Imagine a system where addresses are treated as signed. You could
> > > even have an object covering a range of addresses from, say, -10
> > > to +10. (A null pointer would have to have a representation other
> > > than all-bits-zero.)
> >
> > Isn't the ARM a machine where some addresses were thought to be negative?
>
> At the CPU level, data is neither signed nor unsigned. It's typically the
> operations which treat their operands as signed or unsigned.

Right. But if I remember correctly, on the ARM memory always started at
an address 'below' 0 and continued to an address 'above' 0.

Chris Dollin

unread,
Sep 29, 2009, 8:19:07 AM9/29/09
to
Dik T. Winter wrote:

> Right. But if I remember correctly, on the ARM memory always started at
> an address 'below' 0 and continued to an address 'above' 0.

I think that's the Transputer, not the ARM.

--
"These are the last remaining days." - IQ, /Sacred Sound/

Hewlett-Packard Limited registered no: 690597
registered office: Cain Road, Bracknell, Berks RG12 1HN, England

Tim Rentsch

unread,
Sep 30, 2009, 3:10:29 PM9/30/09
to
Keith Thompson <ks...@mib.org> writes:

> Nobody <nob...@nowhere.com> writes:
> [...]
>> I can't think of a situation where the CPU considers addresses as either
>> "signed" or "unsigned"; they are just "words".
> [...]
>
> Assuming, as before, 16-bit addresses, if a single 32-byte object
> can cover the range of addresses from 0x7FF0 to 0x800F, then
> addresses are being treated as unsigned. Similarly, if a single
> 32-byte object can cover the range of addresses from -16 to +15,
> then addresses are being treated as signed (and a null pointer
> is not all-bits-zero). If both are possible then it's a rather

> odd architecture. ["If neither" snipped]

ISTM that the "signed-ness" of pointers is determined by how
comparisons work. If (char*)0x7FF0 > (char*)0x800F (and assuming
that the conversion just changes the type and not any of the
bits), then we'd probably call those pointers "signed";
similarly (char*)0xFFF0 > (char*)0x0010 would mean "unsigned".

/However/, it's easy to get both indications in a larger address
space. Suppose we have pointers, ints and unsigned ints all
having 32 bits[*], and pointer comparison is defined

p1 <= p2 IFF (unsigned) p2 - (unsigned) p1 < 0x80000000

This definition of comparison allows objects to be more than two
billion bytes, and they can straddle both 0x0 and 0x80000000
(well, not both at once, but either one by different objects).
Such an architecture would have neither "signed" nor "unsigned"
pointers; the address space is homogeneous and isotropic,
as my physics friends used to say.

[*] In this mythical architecture, CHAR_BIT is 11 bits,
pointers, ints and unsigned ints all have size 3 (with
ints and unsigned ints having one padding bit),
the type (uintptr_t) is (unsigned long), which has a width
of 33 bits, and a null pointer has the bit set which is
a padding bit in int/unsigned int.
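
A sketch of that comparison rule in ordinary 32-bit unsigned arithmetic
(the function name is mine; this only models the mythical machine, not
any actual implementation):

    #include <stdint.h>

    /* p1 <= p2 iff the forward distance from p1 to p2, computed with
       32-bit wraparound, is less than half the address space. */
    static int le(uint32_t p1, uint32_t p2)
    {
        return (uint32_t)(p2 - p1) < 0x80000000u;
    }

    /* e.g. le(0xFFFFFFF0u, 0x10u) is 1: an object straddling address 0
       still orders consistently, matching the "isotropic" description. */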

Tim Rentsch

unread,
Sep 30, 2009, 3:34:04 PM9/30/09
to
Phil Carmody <thefatphi...@yahoo.co.uk> writes:

> instruction at all fails to have the pair. [snip summary]

The point is that to supply C semantics the architecture
needs to provide only one multiply instruction (if 2's
complement representation is used). Presumably the
architectures you mentioned either weren't using
2's complement, or had separate instructions to
set overflow/carry flags differently (or for some
other multiple-precision arithmetic capability).

Phil Carmody

unread,
Sep 30, 2009, 5:01:03 PM9/30/09
to
Tim Rentsch <t...@alumni.caltech.edu> writes:
> Keith Thompson <ks...@mib.org> writes:
>> Nobody <nob...@nowhere.com> writes:
>> [...]
>>> I can't think of a situation where the CPU considers addresses as either
>>> "signed" or "unsigned"; they are just "words".
>> [...]
>>
>> Assuming, as before, 16-bit addresses, if a single 32-byte object
>> can cover the range of addresses from 0x7FF0 to 0x800F, then
>> addresses are being treated as unsigned. Similarly, if a single
>> 32-byte object can cover the range of addresses from -16 to +15,
>> then addresses are being treated as signed (and a null pointer
>> is not all-bits-zero). If both are possible then it's a rather
>> odd architecture. ["If neither" snipped]
>
> ISTM that the "signed-ness" of pointers is determined by how
> comparisons work. If (char*)0x7FF0 > (char*)0x800F (and assuming
> that the conversion just changes the type and not any of the
> bits), then we'd probably call those pointers "signed";
> similarly (char*)0xFFF0 > (char*)0x0010 would mean "unsigned".

I seem to remember that the standard does use the word "overflow"
regarding pointers without defining precisely what it means.
That might complicate matters, as one might view either of the
looks-like-a-wrap-but-isn't cases as an overflow, and the C standard
counter you.

> /However/, it's easy to get both indications in a larger address
> space. Suppose we have pointers, ints and unsigned ints all
> having 32 bits[*], and pointer comparison is defined
>
> p1 <= p2 IFF (unsigned) p2 - (unsigned) p1 < 0x80000000
>
> This definition of comparison allows objects to be more than two
> billion bytes, and they can straddle both 0x0 and 0x80000000
> (well, not both at once, but either one by different objects).
> Such an architecture would have neither "signed" nor "unsigned"
> pointers; the address space is homogeneous and isotropic,
> as my physics friends used to say.

It's not so far from imaginable.

Phil Carmody

unread,
Sep 30, 2009, 5:05:22 PM9/30/09
to

Close. Your presumptions are true for 4 of the 5 archs.

Keith Thompson

unread,
Sep 30, 2009, 5:26:03 PM9/30/09
to
Tim Rentsch <t...@alumni.caltech.edu> writes:
> Keith Thompson <ks...@mib.org> writes:
>> Nobody <nob...@nowhere.com> writes:
>> [...]
>>> I can't think of a situation where the CPU considers addresses as either
>>> "signed" or "unsigned"; they are just "words".
>> [...]
>>
>> Assuming, as before, 16-bit addresses, if a single 32-byte object
>> can cover the range of addresses from 0x7FF0 to 0x800F, then
>> addresses are being treated as unsigned. Similarly, if a single
>> 32-byte object can cover the range of addresses from -16 to +15,
>> then addresses are being treated as signed (and a null pointer
>> is not all-bits-zero). If both are possible then it's a rather
>> odd architecture. ["If neither" snipped]
>
> ISTM that the "signed-ness" of pointers is determined by how
> comparisons work. If (char*)0x7FF0 > (char*)0x800F (and assuming
> that the conversion just changes the type and not any of the
> bits), then we'd probably call those pointers "signed";
> similarly (char*)0xFFF0 > (char*)0x0010 would mean "unsigned".

I tend to agree, but strictly speaking it might still not be
meaningful.

If no object can span certain address boundaries, then no relational
operator (< <= > >=) on pointers *with defined behavior* can be
affected by whether the comparison is done using a signed or an
unsigned integer comparison. A conforming C implementation could
randomly choose one or the other.

The following program:

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
int main(void)
{
    void *p0;
    void *p1;
    if (sizeof (void*) == sizeof (int)) {
        p0 = (void*)INT_MAX;
        p1 = (void*)INT_MIN;
    }
    else if (sizeof (void*) == sizeof (long)) {
        p0 = (void*)LONG_MAX;
        p1 = (void*)LONG_MIN;
    }
    else {
        fputs("Neither int nor long is the same size as void*\n", stderr);
        exit(EXIT_FAILURE);
    }
    printf("p0 = %p\n", p0);
    printf("p1 = %p\n", p1);
    if (p0 < p1) puts("p0 < p1");
    if (p0 <= p1) puts("p0 <= p1");
    if (p0 == p1) puts("p0 == p1");
    if (p0 != p1) puts("p0 != p1");
    if (p0 >= p1) puts("p0 >= p1");
    if (p0 > p1) puts("p0 > p1");
    return 0;
}

exhibits about 42 metric tons of undefined behavior, but its output,
if any, might be moderately interesting. (On my system it implies
that the compiler treats addresses as unsigned.)

Of course the whole idea of addresses being either signed or unsigned
is completely unsupported by the C standard.

Tim Rentsch

unread,
Sep 30, 2009, 6:36:16 PM9/30/09
to
Phil Carmody <thefatphi...@yahoo.co.uk> writes:

> ric...@cogsci.ed.ac.uk (Richard Tobin) writes:
>> In article <200910022...@gmail.com>,
>> Kaz Kylheku <kkyl...@gmail.com> wrote:
>>
>>>And remember, next time someone asks you, just tell them you read it
>>>on Usenet. Whatever you do, don't actually read the standard
>>>yourself. Second-hand information is best.
>>
>> So there we have it: anything that isn't specified in the C standard
>> is off-topic here, and you shouldn't ask about what's in the standard
>> either.
>
> Not sure how that follows from the quoted paragraph.
>
>> Your question was in fact a perfectly reasonable one, and the answer
>> (as others have said) is that the standard doesn't specify the order
>> of addresses, and doesn't even let you subtract addresses unless
>> they point into the same object, though in practice it will work
>> on any system where processes have a flat address space.
>
> Any sufficiently aggressive optimiser won't even bother setting
> any variable set to such an undefined value. Nor will it set any
> future values dependent on that variable.

Nonsense.

Any sufficiently aggressive optimizer can choose to make such
optimizations /if/ they're consistent with other decisions made
by the implementation, but not every implementation will make
decisions that allow such optimizations. Just because the
Standard declares a certain behavior as undefined doesn't mean an
implementation will choose not to define it. The optimizer is
subject to the whims of the implementation, not the other way
around.

Tim Rentsch

unread,
Sep 30, 2009, 7:00:15 PM9/30/09
to
Nobody <nob...@nowhere.com> writes:

> On Tue, 22 Sep 2009 11:26:37 +0300, Phil Carmody wrote:
>
>> Any sufficiently aggressive optimiser won't even bother setting
>> any variable set to such an undefined value. Nor will it set any
>> future values dependent on that variable.
>>

>> I'm not sure if there are any sufficiently aggressive optimisers
>> out there, but not prepared to sloppily write UB in order to find
>> out.
>
> Consider the following code (paraphrasing a bug which was recently
> discovered in the linux kernel):
>
> 1 int x = p->x;
> 2 if (!p) return;
> ...
>
> Some versions of gcc are sufficiently aggressive that they optimise line
> 2 out of existence.
>
> The rationale is that because p->x had already been evaluated at line 1,
> p being null leads to undefined behaviour. If p is not null, the return on
> line 2 won't occur, but if p is null, the compiler is free to return, or
> not return, or do whatever else it feels like doing.

Another choice, equally valid and for some purposes better, would
be to re-write these two statements as

if(!p) return;
int x = p->x;

In this particular case it seems better to steer in the direction
of greater safety (as this second form of re-writing does).
Personally, I'd rather get a warning than either "optimization".
But if I have to choose between one of the above approaches,
normally I'd choose Dr. Jekyll over Mr. Hyde.

Tim Rentsch

unread,
Sep 30, 2009, 7:04:32 PM9/30/09
to
ric...@cogsci.ed.ac.uk (Richard Tobin) writes:

> In article <j4sum.73097$nQ6....@newsfe07.iad>,
> Stephen Sprunk <ste...@sprunk.org> wrote:
>
>>GCC has a feature that tracks whether it's possible for a pointer to be
>>null; if you dereference a pointer, GCC then sets the "notnull"
>>attribute on it and any future checks for a null pointer are optimized
>>away.
>>[...]
>>I assume that this optimization is to remove redundant tests/branches
>>and therefore improve performance; presumably it wouldn't be there if it
>>didn't help in at least some cases.
>
> As I've said before, I wish it would tell you when it's doing
> this, as it traditionally has with simpler optimisations such as
> always-true comparisons. Being able to remove a chunk of code
> can be a sign of a mistake by the programmer, and just removing
> it often makes the results of the error even more obscure.

I second this motion, at least that there ought to be
a flag to set to ask for the warning. And enabling
the optimization should set the warning flag ON by
default.

Keith Thompson

unread,
Sep 30, 2009, 7:11:49 PM9/30/09
to
Tim Rentsch <t...@alumni.caltech.edu> writes:
> Nobody <nob...@nowhere.com> writes:
[...]
>> Consider the following code (paraphrasing a bug which was recently
>> discovered in the linux kernel):
>>
>> 1 int x = p->x;
>> 2 if (!p) return;
>> ...
>>
>> Some versions of gcc are sufficiently aggressive that they optimise line
>> 2 out of existence.
>>
>> The rationale is that because p->x had already been evaluated at line 1,
>> p being null leads to undefined behaviour. If p is not null, the return on
>> line 2 won't occur, but if p is null, the compiler is free to return, or
>> not return, or do whatever else it feels like doing.
>
> Another choice, equally valid and for some purposes better, would
> be to re-write these two statements as
>
> if(!p) return;
> int x = p->x;
[...]

I'm not sure whether the Linux kernel is normally compiled with
options that cause gcc to permit mixed declarations and statements.
If not, it would have to be:

int x;
if (!p) return;
x = p->x;

Tim Rentsch

unread,
Sep 30, 2009, 7:16:25 PM9/30/09
to
Stephen Sprunk <ste...@sprunk.org> writes:

Ahh, that's a good example.

Surely though any compiler smart enough to do the optimization in
the first place can be made smart enough to distinguish between
the annoying cases and the non-annoying cases. Also, if the
compiler is smart enough to take out the later test, it ought
to be able to transform

int x = p->x;
if(!p) return;

into

ASSERT(p!=0);
int x = p->x;

(for some appropriate form of ASSERT), and provide that
as an option for situations when subsequent code blocks
would be removed.
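
One possible shape for such an ASSERT (my sketch, not an existing gcc
facility) traps loudly instead of silently dropping the dependent code:

    #include <stdio.h>
    #include <stdlib.h>

    /* Make the optimizer's assumption explicit: fail noisily if it
       doesn't hold, rather than quietly removing the later test. */
    #define ASSERT(cond)                                            \
        do {                                                        \
            if (!(cond)) {                                          \
                fprintf(stderr, "assumption failed: %s\n", #cond);  \
                abort();                                            \
            }                                                       \
        } while (0)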

Nobody

unread,
Sep 30, 2009, 9:22:06 PM9/30/09
to
On Wed, 30 Sep 2009 12:34:04 -0700, Tim Rentsch wrote:

>>>>> Apart from division, the only common instruction which has signed and
>>>>> unsigned variants is a right shift.
>>>>
>>>> And multiply.
>>>
>>> True for x86's double-width multiply, but how many architectures have that
>>> feature?
>>
>> Well, the first processor I used that had a multiply instruction had both
>> signed and unsigned. The architecture I've used the most since then also
>> had this pair. The two other architectures I've used extensively in that
>> time also have them. Of the two other architectures I've used but not
>> extensively programmed for, one didn't have a muliply at all, the other
>> had both types. Only one architecture I've used that has a multiply
>> instruction at all fails to have the pair. [snip summary]
>
> The point is that to supply C semantics the architecture
> needs to provide only one multiply instruction (if 2's
> complement representation is used). Presumably the
> architectures you mentioned either weren't using
> 2's complement, or had separate instructions to
> set overflow/carry flags differently (or for some
> other multiple-precision arithmetic capability).

The point is that the CPU can provide features which go beyond C
semantics, e.g. double-width multiply (i.e. multiplying two 32-bit values
produces a 64-bit value).

If the CPU uses two's complement and only offers a C-style multiply
(where the result is truncated to the width of the operands) there is no
difference between signed and unsigned.

Phil Carmody

unread,
Oct 1, 2009, 3:48:56 AM10/1/09
to
Tim Rentsch <t...@alumni.caltech.edu> writes:
> Phil Carmody <thefatphi...@yahoo.co.uk> writes:
>> ric...@cogsci.ed.ac.uk (Richard Tobin) writes:
>>> In article <200910022...@gmail.com>,
>>> Kaz Kylheku <kkyl...@gmail.com> wrote:
>>>
>>>>And remember, next time someone asks you, just tell them you read it
>>>>on Usenet. Whatever you do, don't actually read the standard
>>>>yourself. Second-hand information is best.
>>>
>>> So there we have it: anything that isn't specified in the C standard
>>> is off-topic here, and you shouldn't ask about what's in the standard
>>> either.
>>
>> Not sure how that follows from the quoted paragraph.
>>
>>> Your question was in fact a perfectly reasonable one, and the answer
>>> (as others have said) is that the standard doesn't specify the order
>>> of addresses, and doesn't even let you subtract addresses unless
>>> they point into the same object, though in practice it will work
>>> on any system where processes have a flat address space.
>>
>> Any sufficiently aggressive optimiser won't even bother setting
>> any variable set to such an undefined value. Nor will it set any
>> future values dependent on that variable.
>
> Nonsense.

I realised later that that's an introit to warn me about what follows.

> Any sufficiently aggressive optimizer can choose to make such
> optimizations /if/ they're consistent with other decisions made
> by the implementation, but not every implementation will make
> decisions that allow such optimizations.

So I say it can conditionally do something (on condition that it's
aggressive enough), and you say it can conditionally do something
(on condition that it knows what it's doing). Right.

> Just because the
> Standard declares a certain behavior as undefined doesn't mean an
> implementation will choose not to define it.

Then it's not sufficiently aggressive. And the direction of your logic
is entirely unsuitable for addressing the point I raise.

> The optimizer is
> subject to the whims of the implementation, not the other way
> around.

The optimiser is part of the implementation.

Phil Carmody

unread,
Oct 1, 2009, 3:50:41 AM10/1/09
to
Keith Thompson <ks...@mib.org> writes:
> I'm not sure whether the Linux kernel is normally compiled with
> options that cause gcc to permit mixed declarations and statements.
> If not, it would have to be:
>
> int x;
> if (!p) return;
> x = p->x;

It gibbers about C90, and then continues to compile it. Given that
there are _tons_ of non-C90 things in the linux kernel that it
doesn't complain about, I've never understood it singling that one
out for the combination of warning and ignoring.

Tim Rentsch

unread,
Oct 2, 2009, 3:05:33 PM10/2/09
to
Keith Thompson <ks...@mib.org> writes:

Sorry, my meaning wasn't clear enough. I wasn't talking
about whether the source code is written in C99, only trying
to indicate the transformation that could be applied. That
transformation is easier to express in C99 than C90, but I
wasn't talking about changing the actual program source --
only about what the revised semantics (under the proposed
"optimization") would be.

Tim Rentsch

unread,
Oct 2, 2009, 3:29:17 PM10/2/09
to
Phil Carmody <thefatphi...@yahoo.co.uk> writes:

Based on the first sentence in that paragraph, it sounds like
you're saying that any optimizer that's evolved to the point that
it makes such optimizations will make such optimizations. Are
you doing anything more than giving an implicit definition for
"sufficiently aggressive"? Or does "sufficiently agressive"
have an independent meaning so that your statement isn't just
a tautology? If so, what is it? Or do you really mean "any
sufficiently aggressive /implementation/ will ...."?


>> The optimizer is
>> subject to the whims of the implementation, not the other way
>> around.
>
> The optimiser is part of the implementation.

First, it's perfectly possible to write optimizers
that are independent of any particular implementation,
as I'm sure you must know.

Second, even ignoring that, your comment doesn't invalidate
my point. An implementation is defined by the choices
it makes about which representations to use, how various
behaviors are defined, etc. These choices constitute
the "interface" of the implementation". The optimizer
is part of the "implementation" of the implementation.
The optimizer used in an implementation is limited by the
choices made in defining how the implementation behaves;
otherwise, its optimizations aren't valid for that
implementation.

Tim Rentsch

unread,
Oct 2, 2009, 3:43:32 PM10/2/09
to
Phil Carmody <thefatphi...@yahoo.co.uk> writes:

Probably what you're remembering is this sentence in 6.5.6p8:

    If both the pointer operand and the result point to elements
    of the same array object, or one past the last element of
    the array object, the evaluation shall not produce an
    overflow; otherwise, the behavior is undefined.

It's talking about adding a pointer and an integer, and
all it's saying is that adding a pointer and an integer
has to work provided the integer is within the range
of acceptable values for the pointer and array in question.
IOW overflow is meant in the "exceptional condition"
sense, not in the sense of what boundaries might be
crossed.
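
Concretely, the rule permits exactly this much and no more (a standard
illustration, not taken from the thread):

    int main(void)
    {
        int a[10];
        int *p = a + 10;    /* OK: points one past the last element */
        /* int *q = a + 11;    UB: the evaluation "overflows" (6.5.6p8) */
        /* int *r = a - 1;     UB: points before the start of a        */
        return (p == a + 10) ? 0 : 1;   /* comparing p is fine too */
    }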

Tim Rentsch

unread,
Oct 2, 2009, 4:07:33 PM10/2/09
to
Keith Thompson <ks...@mib.org> writes:

I believe in fact that your statement is true even without
the "if no object can span certain address boundaries," clause.
Talking about "signedness" of pointers normally means if the
pointer bits are interpreted integers do pointer comparisons work
the same as signed comparisons or as unsigned comparisons.

Of course, a program like this might get different results
if the pointer assignments were written

p0 = (void*)(unsigned int)INT_MAX;
p1 = (void*)(unsigned int)INT_MIN;

etc. It might be better (and adding to the gross tonnage
of undefined behavior) to use an approach like this:

p0 = *(void**)(int[1]){INT_MAX};
p1 = *(void**)(int[1]){INT_MIN};

Tim Rentsch

unread,
Oct 2, 2009, 4:09:58 PM10/2/09
to
Phil Carmody <thefatphi...@yahoo.co.uk> writes:

I thought it was implied that I wasn't including the
case where the processor had no multiply instructions
at all.

Phil Carmody

unread,
Oct 2, 2009, 7:38:44 PM10/2/09
to

Jeebus. I never thought that _you_ would be >this< close to being
stuffed in my killfile. Please learn to count before posting to
usenet again.

Tim Rentsch

unread,
Oct 3, 2009, 6:13:55 AM10/3/09
to
Phil Carmody <thefatphi...@yahoo.co.uk> writes:

By all means, please add me to your killfile if that's what
you think is appropriate. To save you some trouble, let me
add that if you're hoping I'll get better at understanding
your rather elliptical commentary, it's probably better not
to wait for that.

Dik T. Winter

unread,
Oct 6, 2009, 7:53:42 AM10/6/09
to
In article <h9stps$b4p$1...@news-pa1.hpl.hp.com> Chris Dollin <chris....@hp.com> writes:
> Dik T. Winter wrote:
>
> > Right. But if I remember correctly, on the ARM memory always started at
> > an address 'below' 0 and continued to an address 'above' 0.
>
> I think that's the Transputer, not the ARM.

I think you are right.

Dik T. Winter

unread,
Oct 6, 2009, 8:07:39 AM10/6/09
to
In article <87ljjtg...@kilospaz.fatphil.org> Phil Carmody <thefatphi...@yahoo.co.uk> writes:
> > Phil Carmody <thefatphi...@yahoo.co.uk> writes:
> >>> Phil Carmody <thefatphi...@yahoo.co.uk> writes:
...

> >>>> Well, the first processor I used that had a multiply instruction had
> >>>> both signed and unsigned. The architecture I've used the most since
> >>>> then also had this pair. The two other architectures I've used
> >>>> extensively in that time also have them. Of the two other
> >>>> architectures I've used but not extensively programmed for, one didn't
> >>>> have a multiply at all, the other had both types. Only one architecture
> >>>> I've used that has a multiply instruction at all fails to have the
> >>>> pair.
...

> >> Close. Your presumptions are true for 4 of the 5 archs.
...

> Jeebus. I never thought that _you_ would be >this< close to being
> stuffed in my killfile. Please learn to count before posting to
> usenet again.

Let's count:
#1 had both variants
#2 also had this pair
#3 and #4 also have them
#5 had no multiply at all
#6 had both types
#7 has multiply but only one type

So I arrive at 5 out of 6.

Stephen Sprunk

unread,
Oct 7, 2009, 12:11:13 PM10/7/09
to
[ My apologies for the flood of delayed responses; my old news server
was silently eating all my posts for the last week or so.]

Keith Thompson wrote:


> Tim Rentsch <t...@alumni.caltech.edu> writes:
>> Another choice, equally valid and for some purposes better, would
>> be to re-write these two statements as
>>
>> if(!p) return;
>> int x = p->x;
>

> I'm not sure whether the Linux kernel is normally compiled with
> options that cause gcc to permit mixed declarations and statements.
> If not, it would have to be:
>
> int x;
> if (!p) return;
> x = p->x;

That is correct. GCC's default mode, which is normally used to compile
the kernel, is "GNU89", i.e. C89 plus various extensions. Mixing
statements and declarations is one of the few useful changes in C99 that
wasn't already supported as an extension, so GCC warns about it, though
it will still produce correct output as if it were in C99 or GNU99 mode.

The OSS community has learned through experience that compiler warnings
usually indicate bugs or portability problems, so code is "fixed" to get
rid of them even if there is no actual bug. Given the above, that means
no mixing statements and declarations in the Linux kernel.

S

--
Stephen Sprunk "God does not play dice." --Albert Einstein
CCIE #3723 "God is an inveterate gambler, and He throws the
K5SSS dice at every possible opportunity." --Stephen Hawking

Stephen Sprunk

unread,
Oct 7, 2009, 12:11:25 PM10/7/09
to
[ My apologies for the flood of delayed responses; my old news server
was silently eating all my posts for the last week or so.]

Tim Rentsch wrote:
> Any sufficiently aggressive optimizer can choose to make such
> optimizations /if/ they're consistent with other decisions made
> by the implementation, but not every implementation will make
> decisions that allow such optimizations. Just because the
> Standard declares a certain behavior as undefined doesn't mean an
> implementation will choose not to define it. The optimizer is
> subject to the whims of the implementation, not the other way
> around.

At least in the case of Linux, the decisions made by (or "the whims of")
the GCC team are a fundamental part of defining that implementation.
The ABI is an output from that process, i.e. it's just documentation of
what GCC et al do, not an input to that process...

Tim Rentsch

unread,
Oct 8, 2009, 4:18:24 AM10/8/09
to
Stephen Sprunk <ste...@sprunk.org> writes:

> [ My apologies for the flood of delayed responses; my old news server
> was silently eating all my posts for the last week or so.]
>
> Tim Rentsch wrote:
>> Any sufficiently aggressive optimizer can choose to make such
>> optimizations /if/ they're consistent with other decisions made
>> by the implementation, but not every implementation will make
>> decisions that allow such optimizations. Just because the
>> Standard declares a certain behavior as undefined doesn't mean an
>> implementation will choose not to define it. The optimizer is
>> subject to the whims of the implementation, not the other way
>> around.
>
> At least in the case of Linux, the decisions made by (or "the whims of")
> the GCC team are a fundamental part of defining that implementation.
> The ABI is an output from that process, i.e. it's just documentation of
> what GCC et al do, not an input to that process...

Linux is not a C implementation. GCC plus an appropriate library
is a C implementation. It happens that one of these pairings (of
gcc plus a library) runs on top of linux, but linux is not any
part of the C implementation, it's just the substrate on which the
C implementation does its work.

As for gcc -- I understand that some implementations make
behavioral decisions based on what's convenient for the optimizer,
and gcc may be one of those. Even so, the decisions belong to the
implementation, not the optimizer; if some fancy optimizer were
transplanted into a different implementation, any changes in
implementation-chosen behavior would require adjustments to the
optimizer -- otherwise it's not the same implementation.
