C99, strict aliasing

Mike

Jul 20, 2010, 2:50:53 PM
Is the following legal in C99?
Does it violate strict aliasing?

int func(int arg)
{
    union {
        int i;
        char c[sizeof(int)];
    } u;
    u.i = arg;
    scribble(u.c);
    return u.i;
}
I've found hints that it is,
but nothing I'm sure of.

Datesfat Chicks

Jul 20, 2010, 3:20:04 PM
"Mike" <henn...@web.cs.ndsu.nodak.edu> wrote in message
news:af003706-49e2-42d1...@f6g2000yqa.googlegroups.com...

My opinion is uninformed, but I don't believe this is any sort of violation.

I think the strict aliasing rule says that the compiler may assume that two
different pointer types don't point to the same memory.

I think the code you provided doesn't break any rules, and I think a
compiler knows that u.i and u.c[] are potentially the same memory.

If you want to break rules, do it this way:

union {
    int i;
    char c[sizeof(int)];
} u;

char *get_char_address(void)
{
    return &u.c[1];
}

int *get_int_address(void)
{
    return &u.i;
}

int func(void)
{
    char *cp;
    int *ip;

    cp = get_char_address();
    ip = get_int_address();

    /* At this point the compiler may not know these point to the same
       memory area. */
    (*ip)++;    /* At this point, the compiler may have buffered the
                   integer in a CPU register. */
    *cp = 0;    /* The compiler may not realize you just changed the
                   underlying integer. */

    return *ip; /* The compiler may use the buffered integer rather than
                   reloading from memory. */
}

Datesfat.

Mike

Jul 20, 2010, 3:30:19 PM
On Jul 20, 2:20 pm, "Datesfat Chicks" <datesfat.chi...@gmail.com>
wrote:
> "Mike" <henne...@web.cs.ndsu.nodak.edu> wrote in message

>
> news:af003706-49e2-42d1...@f6g2000yqa.googlegroups.com...
>
> > Is the following legal in C99?
> > Does it violate strict aliasing?
>
> > int func(int arg)
> > {
> > union {
> >    int i;
> >    char c[sizeof(int)];
> > } u;
> > u.i=arg;
> > scribble(u.c);
> > return u.i;
> > }
> > I've found hints that it is,
> > but nothing I'm sure of.
>
> My opinion is uninformed, but I don't believe this is any sort of violation.
>
> I think the strict aliasing rule says that the compiler may assume that two
> different pointer types don't point to the same memory.
>
> I think the code you provided doesn't break any rules, and I think a
> compiler knows that u.i and u.c[] are potentially the same memory.

That might just mean that the compiler could detect the error.
If the type of u.c were float and not some kind of char or int,
func would definitely be in the realm of unspecified behavior.

Datesfat Chicks

Jul 20, 2010, 3:46:19 PM
"Mike" <henn...@web.cs.ndsu.nodak.edu> wrote in message
news:3964b771-3837-4a17...@g35g2000yqa.googlegroups.com...

Well, we are potentially talking about different types of unspecified
behavior (and I'm sure an expert will chime in here).

The first form of unspecified behavior ... I think that the strict aliasing
rule is designed to allow the compiler to optimize more effectively in
certain situations. If you break the rule, you may end up with logically
incorrect behavior due to compiler optimizations because the compiler
doesn't have the ability to determine that you can monkey with the same
memory using two pointers to different types. The behavior may be different
on different platforms, different compiler versions, etc.

The second form of unspecified behavior ... using a union to do things that
have an undefined result because of data representation. There are two
issues there: (a) whether the compiler can tell you are messing with the
same memory (and I believe it can), and (b) what the result will be. I
believe you are covered on (a), because the compiler will know what you're
messing with. But (b) would often be platform dependent, and could
hypothetically be different with different versions or compiler options that
affect data representation.

Unspecified behavior, yes ... but two types of it I believe.

Datesfat.

Mike

Jul 20, 2010, 4:49:17 PM
On Jul 20, 2:46 pm, "Datesfat Chicks" <datesfat.chi...@gmail.com> wrote:

Unspecified I could live with.
chars have a special dispensation.
Unless that dispensation extends to my code,
we are in the realm of undefined behavior.
My previous efforts to get an answer turned up mentions of said dispensation,
but none made clear what one could do with it.
Changing char to float definitely produces undefined behavior:


int func(int arg)
{
    union {
        int i;
        float c[sizeof(int)];
    } u;
    u.i = arg;
    scribble(u.c);
    return u.i;
}

It's exactly the sort of thing one is not supposed to do with unions.
chars have a special dispensation with regard to accessing variables
of other types.
Where it applies, the result is unspecified instead of undefined.
What I don't know is whether it applies to my first func.

Ben Bacarisse

Jul 20, 2010, 6:33:06 PM
Mike <henn...@web.cs.ndsu.nodak.edu> writes:

It's fine, at least as far as the part that seems to be bothering you is
concerned (the aliasing). On weird machines, scribble(u.c) can produce
a trap representation in u.i, but presumably you are doing this
*because* you want to fiddle with the representation and you know what
you are doing. It might be worth saying what your top-level goal is:
i.e. to what problem is the above union a solution? There may be other
well-known solutions.

Elsewhere in this thread (with so much quoted text that I stopped
reading it) I saw you bring up the issues of using an array of floats
instead of chars. Maybe you could say, here, why you want to do that
since it seems like quite another kind of use for a union. Again,
aliasing is not so much the problem as messing about with complex
representations.

--
Ben.

Mike

Jul 20, 2010, 7:11:15 PM
On Jul 20, 5:33 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:

> Mike <henne...@web.cs.ndsu.nodak.edu> writes:
> > Is the following legal in C99?
> > Does it violate strict aliasing?
>
> > int func(int arg)
> > {
> > union {
> >     int i;
> >     char c[sizeof(int)];
> > } u;
> > u.i=arg;
> > scribble(u.c);
> > return u.i;
> > }
> > I've found hints that it is,
> > but nothing I'm sure of.
>
> It's fine, at least as far as the part that seems to be bothering you is
> concerned (the aliasing).  One weird machines, scribble(u.c) can produce
> a trap representation in u.i, but presumably you are doing this

Are trap ints really allowed in C?

> *because* you want to fiddle with the representation and you know what
> you are doing.  It might be worth saying what your top-level goal is:
> i.e. to what problem is the above union a solution?  There may be other
> well-known solutions.

I know of another solution, but avr-gcc produces awful code for it.
The code I've given is fairly standard procedure for manipulating the
bytes of an int.
It pretty much always works.
I'm trying to discover what C99 actually requires of such code.
If it's undefined behavior, updating a compiler could cause it to fail
without warning.
If it's unspecified behavior, then if it works once,
I would expect it to continue to work until a change in the
representation of an int.
As a bonus, the compiler wouldn't complain about it.
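
Mike does not say which other solution he means. As an assumption for illustration only, one common portable way to change a single byte of an integer value uses shifts and masks and never touches the object representation. The function name set_byte and the 8-bit byte width are assumptions, not something from the thread:

```c
/* Hypothetical helper: replace byte number `index` (0 = least significant)
   of `value` with `byte`. Assumes 8-bit bytes and that `index` is smaller
   than sizeof(unsigned int). */
unsigned int set_byte(unsigned int value, unsigned int index, unsigned char byte)
{
    unsigned int shift = index * 8u;

    value &= ~(0xFFu << shift);            /* clear the target byte   */
    value |= (unsigned int)byte << shift;  /* insert the new contents */
    return value;
}
```

Because this works on values rather than on storage, the result does not depend on byte order, and neither aliasing nor trap representations come into it.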

> Elsewhere in this thread (with so much quoted text that I stopped
> reading it) I saw you bring up the issues of using an array of floats
> instead of chars.  Maybe you could say, here, why you want to do that
> since it seems like quite another kind of use for a union.  Again,
> aliasing is not so much the problem as messing about with complex
> representations.

I don't want to use floats.
It was just an example for which I knew the answer and demonstrated my
reason for concern.
In the float example, if scribble changed u.c,
the compiler would be allowed to return 0 or drown my brother-in-law
in nasal demons.

In some situations, chars are a special case,
but I'm unclear on what situations those are.

Ben Bacarisse

Jul 20, 2010, 10:16:44 PM
Mike <henn...@web.cs.ndsu.nodak.edu> writes:

> On Jul 20, 5:33 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>> Mike <henne...@web.cs.ndsu.nodak.edu> writes:
>> > Is the following legal in C99?
>> > Does it violate strict aliasing?
>>
>> > int func(int arg)
>> > {
>> > union {
>> >     int i;
>> >     char c[sizeof(int)];
>> > } u;
>> > u.i=arg;
>> > scribble(u.c);
>> > return u.i;
>> > }
>> > I've found hints that it is,
>> > but nothing I'm sure of.
>>
>> It's fine, at least as far as the part that seems to be bothering you is
>> concerned (the aliasing).  One weird machines, scribble(u.c) can produce
>> a trap representation in u.i, but presumably you are doing this
>
> Are trap ints really allowed in C?

Yes. int can have padding bits and some settings of these could be trap
representations. But there is another possible trap representation with
no padding bits: sign bit one and all others zero for 2's complement
and sign and magnitude representations; or all bits one for 1's
complement systems. This does not have to be a trap, but it is
permitted, presumably because it is/was one on some machines.
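
Whether an implementation has padding bits at all can be checked with a short sketch like the one below. The function name is made up for illustration. It counts the value bits of unsigned int by shifting UINT_MAX and subtracts that from the storage width. It only reports how many padding bits exist, not whether any setting of them is a trap representation:

```c
#include <limits.h>

/* Hypothetical helper: number of padding bits in unsigned int.
   Mainstream desktop and embedded targets are expected to return 0. */
int unsigned_int_padding_bits(void)
{
    unsigned int max = UINT_MAX;
    int value_bits = 0;

    while (max != 0) {      /* one iteration per value bit */
        value_bits++;
        max >>= 1;
    }
    return (int)(sizeof(unsigned int) * CHAR_BIT) - value_bits;
}
```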

>> *because* you want to fiddle with the representation and you know what
>> you are doing.  It might be worth saying what your top-level goal is:
>> i.e. to what problem is the above union a solution?  There may be other
>> well-known solutions.
>
> I know of another solution, but avr-gcc produces awful code for it.
> The code I've given is fairly standard procedure for manipulating the
> bytes of an int.
> It pretty much always works.
> I'm trying to discover what C99 actually requires of such code.

It does what you think it does, and C99 says that it does that. Any
problems would come from peculiar architectures such as ints with trap
representations or ones in which char has padding bits that don't
correspond to padding bits in the int. I don't know of any -- I am
just speculating.

Using unsigned char is safer since it can't have any padding bits.

By the way, you don't need the union:

int i = arg;
scribble((unsigned char *)&i);
return i;

will also work.
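
A fuller sketch of this no-union version follows. The thread never shows the body of scribble, so the one here is a hypothetical stand-in that sets every byte of the int to zero:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for scribble(): zeroes sizeof(int) bytes
   starting at `bytes`. The real function is not shown in the thread. */
static void scribble(unsigned char *bytes)
{
    assert(bytes != NULL);
    for (size_t k = 0; k < sizeof(int); k++)
        bytes[k] = 0;
}

int func(int arg)
{
    int i = arg;

    /* Accessing the bytes of i through unsigned char * is one of the
       accesses the aliasing rules permit (C99 6.5 p7). */
    scribble((unsigned char *)&i);
    return i;   /* i is read again after the call */
}
```

With this particular scribble, func returns 0 for any argument, since an all-bits-zero int represents the value zero.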

<snip>


> In some situations, chars are a special case,
> but I'm unclear on what situations those are.

Yes. I don't fancy trying to list them all, but for your purposes
special permission is granted to access an object as an array of (signed
or unsigned) char -- which is why the direct method with no union is
often used.

--
Ben.

Tim Rentsch

Jul 21, 2010, 2:00:30 AM
Mike <henn...@web.cs.ndsu.nodak.edu> writes:

It's legal. I believe it would still be legal even if the type of
'c' were a non-character type (e.g., double). In both cases the
result depends on implementation-defined information (in particular
the representations of types involved), and so could result in UB
because of a trap representation, but other than that there is no
undefined behavior here.

Unfortunately it's not clear (at least not to me) whether this
example violates GCC strict aliasing. The gcc man page contains
some examples, but it's hard to say which side of the line this
example is supposed to fall on. Here 'c' being an array of
character type may very well make a difference. It would be
nice if the GCC people provided unambiguous documentation on
this (which in fact they may very well do, but I'm not aware
of it if so).

Tim Rentsch

Jul 21, 2010, 2:42:37 AM
Mike <henn...@web.cs.ndsu.nodak.edu> writes:

The data representation issues are implementation-defined, not
unspecified.

> char's have a special dispensation.
> Unless that dispensation extends to my code,
> we are in the realm of undefined behavior.
> My previous efforts to get an answer turned up mentions of said
> dispensation,
> but none made clear what one could do with it.

It's always legal to access any object using a character type.
Since the bytes of 'u.i' overlap the region pointed to by 'u.c',
the compiler must assume that 'u.i' might be changed by calling
'scribble(u.c)', unless it can prove differently.
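
As a minimal sketch of that character-type access, the function below reads the bytes of two ints through unsigned char pointers and compares them. The function name is made up for illustration:

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if a and b have identical object
   representations, 0 otherwise. Reading any object through
   unsigned char * is always permitted. */
int same_representation(int a, int b)
{
    const unsigned char *pa = (const unsigned char *)&a;
    const unsigned char *pb = (const unsigned char *)&b;

    for (size_t k = 0; k < sizeof(int); k++) {
        if (pa[k] != pb[k])
            return 0;
    }
    return 1;
}
```

On an implementation with padding bits in int, two equal values could in principle differ in representation; on implementations without them, the function returns 1 exactly when a == b.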


> Changing char to float definitely produces undefined behavior:
> int func(int arg)
> {
> union {
> int i;
> float c[sizeof(int)];
> } u;
> u.i=arg;
> scribble(u.c);
> return u.i;
> }
> It's exactly the sort of thing one is supposed not to do with unions.

> [snip]

These statements are at odds with how the Standard talks about
unions. It's allowed to store into one union member and read
from another, even if the types are different and neither type is
a character type. Exactly how far this provision is meant to
extend isn't made clear, but as a categorical statement it's
wrong to say the Standard doesn't allow type punning (which is
what you're describing) using unions -- the Standard mentions
union type punning explicitly, in 6.5.2.3 p3 (in a footnote):

If the member used to access the contents of a union object is
not the same as the member last used to store a value in the
object, the appropriate part of the object representation of
the value is reinterpreted as an object representation in the
new type as described in 6.2.6 (a process sometimes called
"type punning"). This might be a trap representation.

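A minimal sketch of the union type punning that footnote describes is shown below. It assumes that float and uint32_t have the same size and that float uses the IEEE 754 single-precision format. Both hold on most current platforms, but the Standard guarantees neither. The function name is made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the object representation of a float
   reinterpreted as a 32-bit unsigned integer. */
uint32_t float_bits(float f)
{
    union {
        float    f;
        uint32_t u;
    } pun;

    /* Assumption made by this sketch, checked at run time. */
    assert(sizeof(float) == sizeof(uint32_t));

    pun.f = f;      /* store through one member                */
    return pun.u;   /* read the same bytes through the other   */
}
```

Reading through uint32_t avoids the trap-representation caveat, because the exact-width integer types have no padding bits.
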
Tim Rentsch

Jul 21, 2010, 2:47:17 AM
Mike <henn...@web.cs.ndsu.nodak.edu> writes:

> On Jul 20, 5:33 pm, Ben Bacarisse <ben.use...@bsb.me.uk> wrote:
>> Mike <henne...@web.cs.ndsu.nodak.edu> writes:
>> > Is the following legal in C99?
>> > Does it violate strict aliasing?
>>
>> > int func(int arg)
>> > {
>> > union {
>> > int i;
>> > char c[sizeof(int)];
>> > } u;
>> > u.i=arg;
>> > scribble(u.c);
>> > return u.i;
>> > }
>> > I've found hints that it is,
>> > but nothing I'm sure of.
>

> [... question about making u.c be float instead of char ...]


>
> I don't want to use floats.
> It was just an example for which I knew the answer and demonstrated my
> reason for concern.
> In the float example, if scribble changed u.c,
> the compiler would be allowed to return 0 or drown my brother-in-law
> in nasal demons.

Which section(s) of the Standard lead you to think that?

Mike

Jul 21, 2010, 11:15:51 AM
On Jul 21, 1:42 am, Tim Rentsch <t...@alumni.caltech.edu> wrote:

> Mike <henne...@web.cs.ndsu.nodak.edu> writes:
> > Changing char to float definitely produces undefined behavior:
> > int func(int arg)
> > {
> > union {
> >     int i;
> >     float c[sizeof(int)];
> > } u;
> > u.i=arg;
> > scribble(u.c);
> > return u.i;
> > }
> > It's exactly the sort of thing one is supposed not to do with unions.
> > [snip]
>
> These statements are at odds with how the Standard talks about
> unions.  ...

'Tis what I read about the C89 standard, and I had not read of a
change.
I was wrong about the C89 standard also.
C89 also specifies implementation-defined behavior
and has fewer restrictions on the implementation.

> ... It's allowed to store into one union member and read


> from another, even if the types are different and neither type is
> a character type.  Exactly how far this provision is meant to
> extend isn't made clear, but as a categorical statement it's
> wrong to say the Standard doesn't allow type punning (which is
> what you're describing) using unions -- the Standard mentions
> union type punning explicitly, in 6.5.2.3 p3 (in a footnote):
>
>     If the member used to access the contents of a union object is
>     not the same as the member last used to store a value in the
>     object, the appropriate part of the object representation of
>     the value is reinterpreted as an object representation in the
>     new type as described in 6.2.6 (a process sometimes called
>     "type punning").  This might be a trap representation.

Really good to know.
