help: gcc compilation difference

new

Feb 23, 2010, 10:02:51 PM

Hi C Experts,

I have the following program:

poitr.c
----------------------------------
#include<stdio.h>
#include<malloc.h>

int main()
{
char *s;
char p[] = "abcda";
s = malloc(sizeof(char) *256);
s[p[0]]++;
s[p[2]]++;
return 0;
}
-----------------------------------
When I compiled the above program with the command " gcc poitr.c" the
compilation was successful.
But when I compiled with "gcc -Wall -o poitr poitr.c" compiler has
thrown below warning messages.
-------------------------------------------------------
warning: array subscript has type 'char'
-------------------------------------------------------
Questions:
1/ what is the difference between these two compilations?
2/ Which one I need to use?
3/ Why the warnings have been hidden with the first compilation
4/ Can you please explain the operation of s[p[0]]++.
------------------------------------------------------
OS used: Fedora10

Appreciate your help.

Thanks a lot.

OrganizedChaos

Feb 23, 2010, 10:13:02 PM

On 2/23/2010 9:02 PM, new wrote:
> Hi C Experts,
>
> I have the following program:

<snip>

> Questions:
> 1/ what is the difference between these two compilations?
> 2/ Which one I need to use?
> 3/ Why the warnings have been hidden with the first compilation
> 4/ Can you please explain the operation of s[p[0]]++.
> ------------------------------------------------------
> OS used: Fedora10
>
> Appreciate your help.
>
> Thanks a lot.

The "-Wall" option enables all warnings. It is just showing a warning
that does not prevent the compiler from completing successfully. Since
this option was not enabled in the first compilation, the warning was
not displayed. As far as I can tell, there are no differences in the
end result.

Hope this helps

Richard Heathfield

Feb 23, 2010, 10:24:10 PM

new wrote:
> Hi C Experts,
>
> I have the following program:
>
> poitr.c
> ----------------------------------
> #include<stdio.h>
> #include<malloc.h>

Better:

#include <stdio.h>
#include <stdlib.h>

The C language definition does not have anything to say about malloc.h,
which is an extension provided by some but by no means all implementations.

>
> int main()
> {
> char *s;
> char p[] = "abcda";
> s = malloc(sizeof(char) *256);

Or just: s = malloc(256); That's because sizeof(char) is guaranteed to be 1.

> s[p[0]]++;
> s[p[2]]++;
> return 0;
> }
> -----------------------------------
> When I compiled the above program with the command " gcc poitr.c" the
> compilation was successful.
> But when I compiled with "gcc -Wall -o poitr poitr.c" compiler has
> thrown below warning messages.
> -------------------------------------------------------
> warning: array subscript has type 'char'
> -------------------------------------------------------
> Questions:
> 1/ what is the difference between these two compilations?

-Wall tells gcc to be a tiny bit picky about the code.

> 2/ Which one I need to use?

-Wall is rather less than the bare minimum warning level you need for
portable C programming with gcc. At the very least, I would use:

-W -Wall -ansi -pedantic

and if you're keen to get every last warning out, there are a bunch more
flags you could use. I don't have gcc right here (it's actually about
two yards above my head at present), so I can't give you my usual list,
but -W -Wall -ansi -pedantic is a good starting-point.

> 3/ Why the warnings have been hidden with the first compilation

You didn't ask for those warnings.

> 4/ Can you please explain the operation of s[p[0]]++.

It's undefined, since s[p[0]]'s value is indeterminate. But let's write
a slightly different program:

#include <stdio.h>
#include <stdlib.h>

int main()
{
    char p[] = "abcda";
    char *s = malloc(256); /* [Note 1] */
    if(s != NULL) /* [Note 2] */
    {
        s[p[0]] = 0; /* [Note 3] */
        s[p[2]] = 0;
        s[p[0]]++; /* [Note 4] */
        s[p[2]]++;
        free(s); /* [Note 5] */
    }
    return 0;
}

Note 1: 256 isn't guaranteed to get you enough for the whole character
set. In practice, this is not likely to be a problem, at least not on
Fedora! But if you wanted to be portable, you could ask malloc for
sufficient storage for UCHAR_MAX + 1 characters. UCHAR_MAX is defined in
<limits.h> - on your system, it's almost certainly 255, so no big deal,
but not all systems are like your system.

Note 2: whenever you ask for space from the system, it's a good idea to
check whether the request succeeded. It *can* fail, and sometimes
requests *do* fail.

Note 3: here, we give s[p[0]] a known value, 0 in this case. Without
that known value, the program is meaningless. Now, there's a potential
danger here, but I'll save that for Note 4.

Note 4: C requires all standardised members of the basic execution
character set to have non-negative code points. Since 'a' is one of
those standardised members, it's okay to talk about s[p[0]], because
p[0], which is 'a', will have a non-negative (and in this case
definitely positive) value. But the reason gcc is warning you about
s[p[0]]++ is that, by the time it gets to this line, it's probably
forgotten all about p[0] being 'a'. All it's worried about is that
you're using a char for an index, and some chars (non-standard members
of the execution character set) /can/ have negative values on some
systems, including yours - and, because this fact is easily forgotten,
gcc thinks it's worth printing a warning just in case. That's because,
if p[0] were to have a negative value, you'd be accessing a part of
memory that you don't own.

Note 5: once you're done with dynamically allocated data, remember to
free it up. That's just basic good housekeeping.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
"Usenet is a strange place" - dmr 29 July 1999
Sig line vacant - apply within

Richard Heathfield

Feb 23, 2010, 10:25:16 PM

OrganizedChaos wrote:
<snip>

> The "-Wall" option enables all warnings.

(Except for all the ones it doesn't enable.)

<snip>

Keith Thompson

Feb 23, 2010, 10:27:41 PM

new <luvr...@gmail.com> writes:
> Hi C Experts,
>
> I have the following program:
>
> poitr.c
> ----------------------------------
> #include<stdio.h>
> #include<malloc.h>
>
> int main()
> {
> char *s;
> char p[] = "abcda";
> s = malloc(sizeof(char) *256);
> s[p[0]]++;
> s[p[2]]++;
> return 0;
> }

Really? Where did you get it?

> -----------------------------------
> When I compiled the above program with the command " gcc poitr.c" the
> compilation was successful.
> But when I compiled with "gcc -Wall -o poitr poitr.c" compiler has
> thrown below warning messages.
> -------------------------------------------------------
> warning: array subscript has type 'char'
> -------------------------------------------------------
> Questions:
> 1/ what is the difference between these two compilations?
> 2/ Which one I need to use?
> 3/ Why the warnings have been hidden with the first compilation
> 4/ Can you please explain the operation of s[p[0]]++.
> ------------------------------------------------------
> OS used: Fedora10

The difference is the "-Wall" option; consult your gcc documentation
for details. (You asked for more warnings; you got more warnings.)

In general, using an expression of type char as an array index can be
dangerous, since plain char may be either signed or unsigned, and a
negative array index will, in most circumstances, attempt to access
memory outside the array.

In this particular case, it's not a problem, since the values of p[0]
('a') and p[2] ('c') happen to be members of the basic execution
character set, and are therefore guaranteed to be non-negative.

There are several other problems with the program, including, but not
necessarily limited to, the following:

<malloc.h> is non-standard; use <stdlib.h> instead.

"int main()" should be "int main(void)" (this isn't likely to be a
real problem).

The result of malloc() is not checked.

The elements of the allocated array are uninitialized.

It's conceivable, but vanishingly unlikely, that the values of 'a' and
'c' could exceed 255, causing the indexing operations to go past the
end of the array (this is only possible if CHAR_BIT > 8, and even then
it won't happen with any character encoding I've ever heard of).

Nothing is done with the results of the computations; the entire
program could legitimately be optimized down to:

int main(void) { return 0; }

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

Richard

Feb 24, 2010, 12:46:39 AM

OrganizedChaos <no...@none.net> writes:

> On 2/23/2010 9:02 PM, new wrote:
>> Hi C Experts,
>>
>> I have the following program:
>
> <snip>
>
>> Questions:
>> 1/ what is the difference between these two compilations?
>> 2/ Which one I need to use?
>> 3/ Why the warnings have been hidden with the first compilation
>> 4/ Can you please explain the operation of s[p[0]]++.
>> ------------------------------------------------------
>> OS used: Fedora10
>>
>> Appreciate your help.
>>
>> Thanks a lot.
>
> The "-Wall" option enables all warnings.

No, it doesn't. It enables more than enough for most people, however.

paul

Feb 24, 2010, 2:54:44 AM

"Keith Thompson" <ks...@mib.org> wrote in message
news:lnocjf2...@nuthaus.mib.org...

> new <luvr...@gmail.com> writes:
>> Hi C Experts,
>>
>> I have the following program:
>>
>> poitr.c
>> ----------------------------------
>> #include<stdio.h>
>> #include<malloc.h>
>>
>> int main()
>> {
>> char *s;
>> char p[] = "abcda";
>> s = malloc(sizeof(char) *256);
>> s[p[0]]++;
>> s[p[2]]++;
>> return 0;
>> }
>
> Really? Where did you get it?

<snip>

>
> Nothing is done with the results of the computations; the entire
> program could legitimately be optimized down to:
>
> int main(void) { return 0; }
>

Surely this optimisation will behave differently from the
above code since it will not call malloc?

How/why does the compiler know/assume that the call
can be optimised away?

Paul.

Seebs

Feb 24, 2010, 2:58:46 AM

--
Copyright 2010, all wrongs reversed. Peter Seebach / usenet...@seebs.net
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!

santosh

Feb 24, 2010, 3:09:19 AM

paul <no@email> writes:

>
> "Keith Thompson" <ks...@mib.org> wrote in message
> news:lnocjf2...@nuthaus.mib.org...
>> new <luvr...@gmail.com> writes:
>>> Hi C Experts,
>>>
>>> I have the following program:
>>>
>>> poitr.c
>>> ----------------------------------
>>> #include<stdio.h>
>>> #include<malloc.h>
>>>
>>> int main()
>>> {
>>> char *s;
>>> char p[] = "abcda";
>>> s = malloc(sizeof(char) *256);
>>> s[p[0]]++;
>>> s[p[2]]++;
>>> return 0;
>>> }
>>
>> Really? Where did you get it?
> <snip>
>>
>> Nothing is done with the results of the computations; the entire
>> program could legitimately be optimized down to:
>>
>> int main(void) { return 0; }
>>
>
> Surely this optimisation will behave differently from the
> above code since it will not call malloc?

The net result after both programs have run would be the same
(assuming the host can reclaim the malloc'ed memory which was not
freed).

> How/why does the compiler know/assume that the call
> can be optimised away?

It can't and it won't, at least for a few more decades. But Keith can,
and did :-) I guess he's making a point to the OP that the program as
posted seems pointless.

Keith Thompson

Feb 24, 2010, 3:33:04 AM

"paul" <no@email> writes:
> "Keith Thompson" <ks...@mib.org> wrote in message
> news:lnocjf2...@nuthaus.mib.org...
>> new <luvr...@gmail.com> writes:

[...]

>>> #include<stdio.h>
>>> #include<malloc.h>
>>>
>>> int main()
>>> {
>>> char *s;
>>> char p[] = "abcda";
>>> s = malloc(sizeof(char) *256);
>>> s[p[0]]++;
>>> s[p[2]]++;
>>> return 0;
>>> }

[...]

>>
>> Nothing is done with the results of the computations; the entire
>> program could legitimately be optimized down to:
>>
>> int main(void) { return 0; }
>
> Surely this optimisation will behave differently from the
> above code since it will not call malloc?

Calling malloc is not part of the program's behavior, which is defined
by the standard as "external appearance or action".

> How/why does the compiler know/assume that the call
> can be optimised away?

Because malloc is part of the standard library, the implementation
is free to assume that it behaves as the standard specifies.
If the call succeeds, then the program continues to execute and
produces no output. If the call fails, then the behavior of the
following statements is undefined -- and one possible behavior
is continuing to execute and producing no output. If the program
calls a different function with the same name (say, one declared
in the non-standard header <malloc.h> and perhaps implemented in
some non-standard library), then again, the behavior is undefined,
and the implementation is free to assume that it will produce
no output. (If <malloc.h> defines "malloc" as a macro that does
something other than calling malloc, then this doesn't apply,
but I implicitly assumed that that wasn't the case.)

If the call were to some external function that's not part of the C
standard library, the compiler wouldn't be free to perform this kind
of optimization unless it happened to know what the function does;
for example, an implementation might perform some optimizations at
link time.

santosh

Feb 24, 2010, 4:11:00 AM

Seebs <usenet...@seebs.net> writes:

Seems as if your follow-up was "optimised away." ;-)

Noob

Feb 24, 2010, 5:00:43 AM

Richard Heathfield wrote:

> -Wall tells gcc to be a tiny bit picky about the code.

For reference:

<quote>

-Wall enables all the warnings about constructions that some users
consider questionable, and that are easy to avoid (or modify to prevent
the warning), even in conjunction with macros.

Note that some warning flags are not implied by -Wall. Some of them warn
about constructions that users generally do not consider questionable,
but which occasionally you might wish to check for; others warn about
constructions that are necessary or hard to avoid in some cases, and
there is no simple way to modify the code to suppress the warning. Some
of them are enabled by -Wextra but many of them must be enabled
individually.

</quote>

http://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html

> -Wall is rather less than the bare minimum warning level you need for
> portable C programming with gcc. At the very least, I would use:
>
> -W -Wall -ansi -pedantic

(NB: -W is the older name)

<quote>

-Wextra enables some extra warning flags that are not enabled by -Wall.
(This option used to be called -W. The older name is still supported,
but the newer name is more descriptive.)

</quote>

-ansi is equivalent to -std=c89,
other useful values are c99 and iso9899:199409

http://gcc.gnu.org/onlinedocs/gcc/C-Dialect-Options.html
http://gcc.gnu.org/onlinedocs/gcc/Standards.html

Regards.

new

Feb 24, 2010, 9:57:21 AM

Richard,Keith and all thanks a ton for your replies.
I have one more question.
In the code if I add the following line:
------------------------------------
z = s[p[0]]++; // say z is declared as int
------------------------------------
what would be the value of z and how it is evaluated?

Thanks a lot in advance.

santosh

Feb 24, 2010, 10:24:58 AM

new <luvr...@gmail.com> writes:

Since you don't initialise the s array, its members could contain
any garbage value, normally the value that was last written to the
memory location. So by extension, we cannot say z contains any useful
value after your assignment. In standard C parlance, its value is
indeterminate, and reading an indeterminate object, as in the RHS of
your assignment statement, invokes undefined behaviour, again in the
standard's terminology. IOW, it assigns to z a garbage value at the
very least, and could cause much worse behaviour at worst.

I think others have already explained how the array indexing is
evaluated and its problems.

Ben Bacarisse

Feb 24, 2010, 10:57:13 AM

santosh <santo...@gmail.com> writes:

> new <luvr...@gmail.com> writes:
>
>> Richard,Keith and all thanks a ton for your replies.
>> I have one more question.
>> In the code if I add the following line:
>> ------------------------------------
>> z = s[p[0]]++; // say z is declared as int
>> ------------------------------------
>> what would be the value of z and how it is evaluated?
>>
>> Thanks a lot in advance.
>
> Since you don't initialise the s array, its members could contain
> any garbage value, normally the value that was last written to the
> memory location. So by extension, we cannot say z contains any useful
> value after your assignment. In standard C parlance, its value is
> indeterminate, and reading an indeterminate object, as in the RHS of
> your assignment statement, invokes undefined behaviour, again in the
> standard's terminology.

Is this really true? I don't think it is, at least not in the general
way that is often presented here. An indeterminate value is either a
valid value of the type or it is a trap representation. Thus, on a
system with no trap representations for objects interpreted as having
type T, accessing an indeterminate value of type T must simply be
unspecified.

Now, in this case, s was of type char. 6.2.6.1 p5 which defines and
discusses trap representations states that:

Certain object representations need not represent a value of the
object type. If the stored value of an object has such a
representation and is read by an lvalue expression that does not
have character type, the behavior is undefined. If such a
representation is produced by a side effect that modifies all or any
part of the object by an lvalue expression that does not have
character type, the behavior is undefined. Such a representation is
called a trap representation.

I read that as forbidding UB when a trap representation is accessed
via an lvalue expression of type char which is the case here, is it
not?

Life would be simpler if such accesses were always undefined, but I
don't think that is how C is currently defined.

<snip>
--
Ben.

Richard Heathfield

Feb 24, 2010, 11:08:30 AM

Assuming p[0] has the value 'a', and assuming s['a'] has the value 0,
after the statement has been executed s['a'] will have the value 1, and
so will z. Which precise element of s is described by s['a'] depends on
the character set encoding on your implementation. For example, in ASCII
you'd be looking at s[97] if I remember rightly, whereas in EBCDIC you'd
be looking at s[129].

santosh

Feb 24, 2010, 11:19:19 AM

Richard Heathfield <r...@see.sig.invalid> writes:

> new wrote:
>> Richard,Keith and all thanks a ton for your replies.
>> I have one more question.
>> In the code if I add the following line:
>> ------------------------------------
>> z = s[p[0]]++; // say z is declared as int
>> ------------------------------------
>> what would be the value of z and how it is evaluated?
>
>
> Assuming p[0] has the value 'a', and assuming s['a'] has the value
> 0, after the statement has been executed s['a'] will have the value
> 1, and so will z. Which precise element of s is described by s['a']
> depends on the character set encoding on your implementation. For
> example, in ASCII you'd be looking at s[97] if I remember rightly,
> whereas in EBCDIC you'd be looking at s[129].

z will have the value 1 after the assignment? Won't it be zero, since
it's a post-increment?

Richard Heathfield

Feb 24, 2010, 11:26:46 AM

santosh wrote:
<snip>

>
> z will have the value 1 after the assignment? Won't it be zero, since
> it's a post-increment?

Um, is it that time already? I have to go...

Keith Thompson

Feb 24, 2010, 6:27:05 PM

r...@zedat.fu-berlin.de (Stefan Ram) writes:

> Keith Thompson <ks...@mib.org> writes:
>>is continuing to execute and producing no output. If the program
>>calls a different function with the same name (say, one declared
>>in the non-standard header <malloc.h> and perhaps implemented in
>>some non-standard library), then again, the behavior is undefined,
>

> Couldn't this be implementation-defined in a freestanding
> implementation (last sentence of #1 of 5.1.2.1 of ISO/IEC
> 9899:1999 (E))?

Yes. I was assuming a hosted implementation, which was strongly (but
not absolutely) implied by the "#include<stdio.h>" in the OP's code.
I should have made that assumption explicit, especially since
I was being painfully pedantic anyway.

Kenneth Brody

Feb 25, 2010, 11:05:52 AM

On 2/24/2010 3:09 AM, santosh wrote:
> paul<no@email> writes:
[...]

>> How/why does the compiler know/assume that the call
>> can be optimised away?
>
> It can't and it won't, at least for a few more decades. But Keith can,
> and did :-) I guess he's making a point to the OP that the program as
> posted seems pointless.

Actually, I take it as a "good thing" from the OP. He followed the
oft-given advice to trim the code down to the minimum needed to demonstrate
the problem. I would assume that this was probably stripped down from an
assignment to count the number of occurrences of letters in a given string.
There was no need for the "extraneous" code of setting things up or
displaying the results. Just the "s[p[0]]++" and "s[p[2]]++" lines.

--
Kenneth Brody

Tim Rentsch

Mar 2, 2010, 7:23:24 PM

Ben Bacarisse <ben.u...@bsb.me.uk> writes:

I believe that's a misreading. What the passage says is that if the
access type is a non-character type then the behavior is undefined.
It does not say that if the access type is a character type then the
behavior is defined. Access through a character type interprets the
stored value (ie, the representation) according to the type used to do
the read; if the access type is (char) or (signed char) and the
representation read is a trap representation for that type, it's still
undefined behavior, because there's no (Standard-)defined way to
produce a value from a trap representation. Or if you think there
is, what section in the Standard defines it?

Ben Bacarisse

Mar 2, 2010, 8:19:11 PM

Tim Rentsch <t...@x-alumni2.alumni.caltech.edu> writes:

OK, that's reasonable (and was how I first read it): access via a
character type is undefined or defined depending on whether the byte
is or is not a trap representation for the character type used.

What, then, is the effect of the second sentence of the quote? It
must be to add a further blanket undefined for all accesses to one
type's trap representations when accessed via another type. I.e. that
given

union { int si; unsigned ui; } u;

access to u.ui is undefined when u.si holds a trap representation even
when unsigned int has no trap representations of its own.

If that is right there are two things that puzzled me and caused me to
over-think the clause in question. First, it seems odd to give signed
char this odd half-way position and, second, it seems at odds with the
explanation of unions in 6.5.2.3 p3. At the very least the footnote
should surely be expanded to cover the case where some other union
member is a trap representation.

Neither of these is an argument for my reading. They are there to
explain why I thought the way I did.

--
Ben.

Peter Nilsson

Mar 2, 2010, 8:30:57 PM

Tim Rentsch <t...@x-alumni2.alumni.caltech.edu> wrote:

> Ben Bacarisse <ben.use...@bsb.me.uk> writes:
> > 6.2.6.1 p5 which defines and discusses trap representations
> > states that:
> >
> > Certain object representations need not represent a value
> > of the object type. If the stored value of an object has
> > such a representation and is read by an lvalue expression
> > that does not have character type, the behavior is
> > undefined. If such a representation is produced by a side
> > effect that modifies all or any part of the object by an
> > lvalue expression that does not have character type, the
> > behavior is undefined. Such a representation is called a
> > trap representation.
> >
> > I read that as forbidding UB when a trap representation is
> > accessed via an lvalue expression of type char which is the
> > case here, is it not?
>
> I believe that's a misreading.

If it misreads the intent, it's because the intent is not
clear. ;)

> What the passage says is that if the access type is a non-
> character type then the behavior is undefined. It does not say
> that if the access type is a character type then the behavior
> is defined. Access through a character type interprets the
> stored value (ie, the representation) according to the type
> used to do the read; if the access type is (char) or (signed
> char) and the representation read is a trap representation for
> that type, it's still undefined behavior, because

To put it another way, the last time this discussion came up,
the majority view was that trap representations are possible for
all types except unsigned char (and unsigned bit-fields).
Access to trap representations for non character types is
explicitly undefined. Access to trap representations for signed
character types is _implicitly_ undefined due to a lack of
specification!

Note that "[It is implementation-defined] whether the value
with sign bit 1 and all value bits zero (for the first two),
or with sign bit and all value bits 1 (for ones’ complement),
is a trap representation or a normal value." does not
exclude application to signed character types.

Thus, signed character types can have trap representations.
Whether they can be accessed is a separate issue.

The question remains, why does 6.2.6.1p5 _explicitly_ exclude
character types?

> there's no (Standard-)defined way to
> produce a value from a trap representation.

What's the standard way to produce a value from a non trap
representation for an integer type? Why wouldn't that apply?

--
Peter

Tim Rentsch

Mar 4, 2010, 12:02:47 PM

Peter Nilsson <ai...@acay.com.au> writes:

> Tim Rentsch <t...@x-alumni2.alumni.caltech.edu> wrote:
>> Ben Bacarisse <ben.use...@bsb.me.uk> writes:
>> > 6.2.6.1 p5 which defines and discusses trap representations
>> > states that:
>> >
>> > Certain object representations need not represent a value
>> > of the object type. If the stored value of an object has
>> > such a representation and is read by an lvalue expression
>> > that does not have character type, the behavior is
>> > undefined. If such a representation is produced by a side
>> > effect that modifies all or any part of the object by an
>> > lvalue expression that does not have character type, the
>> > behavior is undefined. Such a representation is called a
>> > trap representation.
>> >
>> > I read that as forbidding UB when a trap representation is
>> > accessed via an lvalue expression of type char which is the
>> > case here, is it not?
>>
>> I believe that's a misreading.
>
> If it misreads the intent, it's because the intent is not
> clear. ;)

I would not presume to argue that point. :)

>> What the passage says is that if the access type is a non-
>> character type then the behavior is undefined. It does not say
>> that if the access type is a character type then the behavior
>> is defined. Access through a character type interprets the
>> stored value (ie, the representation) according to the type
>> used to do the read; if the access type is (char) or (signed
>> char) and the representation read is a trap representation for
>> that type, it's still undefined behavior, because
>
> To put it another way, the last time this discussion came up,
> the majority view was that trap representations are possible for
> all types except unsigned char (and unsigned bit-fields).

Of course you mean all scalar types -- structs and unions are
exempt.

> Access to trap representations for non character types is
> explicitly undefined. Access to trap representations for signed
> character types is _implicitly_ undefined due to a lack of
> specification!

Right.

> Note that "[It is implementation-defined] whether the value
> with sign bit 1 and all value bits zero (for the first two),
> or with sign bit and all value bits 1 (for ones' complement),
> is a trap representation or a normal value." does not
> exclude application to signed character types.
>
> Thus, signed character types can have trap representations.

Yes, even in implementations that use 2's complement.

> Whether they can be accessed is a separate issue.
>
> The question remains, why does 6.2.6.1p5 _explicitly_ exclude
> character types?

Probably because in most implementations the character
types don't have trap representations, and therefore they
shouldn't be included in a blanket statement of undefined
behavior. Also character types (notably "plain" char) are
typically used to get around representation issues; the
combination of how most implementations are and what was
(and is?) common usage probably accounts for the exception
being worded as it is. (Of course I'm only speculating...)

>> there's no (Standard-)defined way to
>> produce a value from a trap representation.
>
> What's the standard way to produce a value from a non trap
> representation for an integer type? Why wouldn't that apply?

This mapping is supplied by the required documentation giving the
implementation-defined information for representation of types
(plus the relevant sections of 6.2.6). That documentation
also defines which representations are trap representations,
ie, which representations correspond to "no value".

Tim Rentsch

Mar 4, 2010, 6:48:40 PM

Ben Bacarisse <ben.u...@bsb.me.uk> writes:

Yes, that's perfectly understandable. Here are some ideas in
response to the implicit questions in your penultimate paragraph.

First, about unions. Suppose we have a union containing just a
signed integer member, eg,

union { signed int si; } siu;

where both sizeof siu == 4 and sizeof siu.si == 4 are true.

In such a case there are in fact two distinct objects, even though
they happen to occupy exactly the same area of memory -- there is
'siu', and 'siu.si'. We know these are different because the
object designated by 'siu' can never be a trap representation,
(because it's a union, which are never trap representations) even
though 'siu.si' holds a trap representation.

(Editorial side note: the language the Standard uses relating to
the term "object" in various places is among the poorest sets of
phrasings the Standard employs. At some point I might write
something more about that, but right now I'd like to gloss over
those problems.)

Similarly, in the example union mentioned above

union { int si; unsigned ui; } u;

there actually are three distinct objects -- u, u.si, and u.ui.
That at least two of these three occupy exactly the same bytes of
memory doesn't alter the number, since unions are described as
_overlapping_ objects.

What this means is that 'u.ui' and '*(unsigned*)&u.si', because
they are accessing different objects, are allowed to behave
differently.

Now for the second question -- why is (char), in the guise of
(signed char), different? Or why are types besides character
types distinguished? Here is my speculation. The character
types are different because, ever since the early days of C, the
type 'char' has been used to access memory "free form", and the
Standard didn't want to change that. The interesting question
is, why give blanket undefined behavior to all the other types?
Here is where the speculation goes a little deeper. I conjecture
that an implementation might want to use trap representations to
indicate "not yet initialized" values, doing this automatically
without being told, and furthermore that it knows this. In such
a case, we might want

unsigned u;
int i;
u = *(unsigned *) &i;

to be able to trap, because the variable 'i' hasn't been given an
explicit initial value. If the trap-representation-ness of
something depended just on what type is used for access, that
would prevent this form of error detection if some types had no
trap representations.

Even though the last part is pure speculation on my part, this
explanation seems like a plausible enough motivation for the funny
wording in 6.2.6.1#5. At least, for me it is plausible enough that
my mental model can tolerate the seeming inconsistencies with
other areas of the Standard in this regard. So I offer it up here
in case it may be of help to other folks.

Ben Bacarisse

Mar 4, 2010, 10:47:03 PM

Tim Rentsch <t...@x-alumni2.alumni.caltech.edu> writes:

I agree with what you've written but want to raise a detail so
forgive me for snipping so much of a helpful reply...

> First, about unions. Suppose we have a union containing just a
> signed integer member, eg,
>
> union { signed int si; } siu;
>
> where both sizeof siu == 4 and sizeof siu.si == 4 are true.
>
> In such a case there are in fact two distinct objects, even though
> they happen to occupy exactly the same area of memory -- there is
> 'siu', and 'siu.si'. We know these are different because the
> object designated by 'siu' can never be a trap representation,
> (because it's a union, which are never trap representations) even
> though 'siu.si' holds a trap representation.
>
> (Editorial side note: the language the Standard uses relating to
> the term "object" in various places is among the poorest sets of
> phrasings the Standard employs. At some point I might write
> something more about that, but right now I'd like to gloss over
> those problems.)
>
> Similarly, in the example union mentioned above
>
> union { int si; unsigned ui; } u;
>
> there actually are three distinct objects -- u, u.si, and u.ui.
> That at least two of these three occupy exactly the same bytes of
> memory doesn't alter the number, since unions are described as
> _overlapping_ objects.

This is obviously the intent from the wording about unions but there
is a problem with the == operator. 6.5.9 p6 reads:

Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (including a pointer to an
object and a subobject at its beginning) or function, both are
pointers to one past the last element of the same array object, or
one is a pointer to one past the end of one array object and the
other is a pointer to the start of a different array object that
happens to immediately follow the first array object in the address
space.

So unless we stretch the meaning of the parenthetical remark, we would
have to conclude that (void *)&u.si == (void *)&u.ui must be false
since these two are not the same object.

Of course, both u.si and u.ui are subobjects at the same object's
beginning, but that case is not explicitly covered.

<snip>
--
Ben.

Tim Rentsch

Mar 5, 2010, 9:34:08 AM

Ben Bacarisse <ben.u...@bsb.me.uk> writes:

This comment illustrates one of my complaints with how the
term "object" is used in the Standard. Sometimes (as in the
cited paragraph) it means just a region of storage and nothing
more than that. Other times it means a kind of association
between an identifier and a region of storage, almost but
not quite what "variable" means in most programming languages.

I absolutely agree with your comment about the cited paragraph,
but I attribute the problem more to sloppy use of language
for the term "object" than to any deeper problem
in the C language requirements. In other words the problem
is with how the specifications are written, not with what
I believe to be what the specifications are meant to express.

lawrenc...@siemens.com

Mar 5, 2010, 5:32:23 PM

Ben Bacarisse <ben.u...@bsb.me.uk> wrote:
>
> This is obviously the intent from the wording about unions but there
> is a problem with the == operator. 6.5.9 p6 reads:
>
> Two pointers compare equal if and only if both are null pointers,
> both are pointers to the same object (including a pointer to an
> object and a subobject at its beginning) or function, both are
> pointers to one past the last element of the same array object, or
> one is a pointer to one past the end of one array object and the
> other is a pointer to the start of a different array object that
> happens to immediately follow the first array object in the address
> space.
>
> So unless we stretch the meaning of the parenthetical remark, we would
> have to conclude that (void *)&u.si == (void *)&u.ui must be false
> since these two are not the same object.

I don't think that's stretching the parenthetical remark at all, I think
it's a natural consequence of it. If an object and a subobject at its
beginning are considered to be the same object, then two different
subobjects at the beginning of an object must be considered to be the
same object as well.
--
Larry Jones

Is it too much to ask for an occasional token gesture of appreciation?!
-- Calvin

Ben Bacarisse

Mar 5, 2010, 7:42:29 PM

lawrenc...@siemens.com writes:

6.2.5 p20 says they are different objects. This is the dichotomy I
was highlighting. I think the intent is that they are different
objects with the same address.

Obviously, in any sane implementation if A == B and A == C then B == C
but in the case of the two pointers in question, I can't quite argue
that from the text. The text I quoted clearly makes == reflexive and
symmetric (for the operands that it describes) but the parenthetical
remark seems to stop it being transitive in all cases.

--
Ben.

Phil Carmody

Mar 6, 2010, 5:57:21 PM

Ben Bacarisse <ben.u...@bsb.me.uk> writes:
> Tim Rentsch <t...@x-alumni2.alumni.caltech.edu> writes:

...

>> Similarly, in the example union mentioned above
>>
>> union { int si; unsigned ui; } u;
>>
>> there actually are three distinct objects -- u, u.si, and u.ui.
>> That at least two of these three occupy exactly the same bytes of
>> memory doesn't alter the number, since unions are described as
>> _overlapping_ objects.
>
> This is obviously the intent from the wording about unions but there
> is a problem with the == operator. 6.5.9 p6 reads:
>
> Two pointers compare equal if and only if both are null pointers,
> both are pointers to the same object (including a pointer to an
> object and a subobject at its beginning) or function, both are
> pointers to one past the last element of the same array object, or
> one is a pointer to one past the end of one array object and the
> other is a pointer to the start of a different array object that
> happens to immediately follow the first array object in the address
> space.
>
> So unless we stretch the meaning of the parenthetical remark, we would
> have to conclude that (void *)&u.si == (void *)&u.ui must be false
> since these two are not the same object.
>
> Of course, both u.si and u.ui are subobjects at the same object's
> beginning, but that case is not explicitly covered.

It looks like 6.5.9 p6 has issues with void* too, surely?

Phil
--
I find the easiest thing to do is to k/f myself and just troll away
-- David Melville on r.a.s.f1

Tim Rentsch

Mar 22, 2010, 4:07:47 PM

Phil Carmody <thefatphi...@yahoo.co.uk> writes:

No real difference from the non- (void*) case. A (void*) pointer
(with a valid address presumably) points to an object, we just don't
know how big the object is. As long as one understands "object" in
6.5.9 p6 to mean just a region of storage, and not an association
with any identifier (which I believe is the interpretation expected
here), two (void*) pointers for the same beginning address point
either to the same region of storage or one region of storage that's
a subregion of the other (and starting at its beginning). Consider
analogous cases with pointers to arrays or incomplete structure
types:

int (*pa0)[], (*pa1)[];
struct foo *f0, *f1;

... initialize all the variables ...

return pa0 == pa1 && f0 == f1;

All sizes unknown but the pointers still point at objects,
as do pointers of type (void*).

Phil Carmody

Mar 22, 2010, 7:25:47 PM

Tim Rentsch <t...@x-alumni2.alumni.caltech.edu> writes:
> Phil Carmody <thefatphi...@yahoo.co.uk> writes:
>> Ben Bacarisse <ben.u...@bsb.me.uk> writes:
>>> a problem with the == operator. 6.5.9 p6 reads:
>>>
>>> Two pointers compare equal if and only if both are null pointers,

>>> both are pointers to the same object [...]

>> It looks like 6.5.9 p6 has issues with void* too, surely?
>
> No real difference from the non- (void*) case. A (void*) pointer
> (with a valid address presumably) points to an object, we just don't
> know how big the object is.

Yes, I can see that now. I was reading the clause too literally with
the pointers each being a "pointer to an object", and thus of type
"pointer to object", and thus complete types. The intended
interpretation is indeed closer to "both are pointers, pointing
to the same object".