Standard-compliant bit-casting

zr

Jan 5, 2010, 1:37:41 AM
Hi,

How can a value of type A be bit-cast to a value of type B? By
bit-casting I mean that both values will have the same machine bit
representation. Let's assume that A and B both have the same size in
bits.

Here are a few methods which I can think of:
1)
A a;
B b;
b = *static_cast<B*>(&a)

2)
Using a union

3)
memcpy(&b, &a, sizeof(A));

Which are standard compliant? Is there another way?

TIA

Rolf Magnus

Jan 5, 2010, 2:43:07 AM
zr wrote:

> Hi,
>
> How can a value of type A be bit-casted to value of type B? By bit-
> casting i mean that both values will have the same machine bit
> representation.

reinterpret_cast.

> Let's assume that A and B both have the same size in bits.
>
> Here are a few methods which i can think of:
> 1)
> A a;
> B b;
> b = *static_cast<B*>(&a)
>
> 2)
> Using a union
>
> 3)
> memcpy(&b, &a, sizeof(A));
>
> Which are standard compliant?

2) is explicitly marked as invoking undefined behaviour. You are not allowed
to read any union member other than the one you have last written to.

I think the other ones are pretty much the same. I'm not sure if it's
undefined or only unspecified, but it's definitely compiler-specific. Of
course, you have to ensure that your A value is not a trap representation
for B. And A and B should be POD types.

> Is there another way?

b = reinterpret_cast<B&>(a);

Joshua Maurice

Jan 5, 2010, 4:14:43 AM
On Jan 4, 10:37 pm, zr <zvir...@gmail.com> wrote:
> Hi,
>
> How can a value of type A be bit-casted to value of type B? By bit-
> casting i mean that both values will have the same machine bit
> representation. Let's assume that A and B both have the same size in
> bits.
>
> Here are a few methods which i can think of:
> 1)
> A a;
> B b;
> b = *static_cast<B*>(&a)

static_cast is not what you want. It may do conversions.

> 2)
> Using a union

Although technically undefined behavior according to the standard's
intent and a particular reading of the standard, (nearly?) all C and
C++ compilers support type punning through a union as an extension. I
would strongly suggest having the union in scope when accessing any of
its members, though, due to a defect in the C and C++ standards.

(The defect is: separate compilation units + allowances to do strict
aliasing + unions = contradiction. If you define a union in one
translation unit, and let pointers to its members go to another
translation unit, then that other translation unit has no way to know
if those pointers alias or not. They might because they might both
point to the same union, but the strict aliasing rules are there for
the compiler to assume they do not. Ergo: bug in the C and C++
standards.)

> 3)
> memcpy(&b, &a, sizeof(A));

I think this is the most standard compliant way of doing it.
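
A minimal sketch of the memcpy approach, wrapped in a helper. The name bit_cast_copy is made up for illustration, the static_assert assumes a C++11 compiler, and both types are assumed to be PODs of the same size:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the thread): copies the object representation
// of `from` into a new object of type To. Both types are assumed to be PODs;
// the equal-size requirement is checked at compile time.
template <typename To, typename From>
To bit_cast_copy(const From& from)
{
    static_assert(sizeof(To) == sizeof(From), "A and B must have the same size");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

// Usage sketch: round-trip a float through an integer of the same size.
// Assumes float is 32 bits wide, which the static_assert above enforces.
inline void demo_bit_cast_copy()
{
    const float f = 1.0f;
    const std::uint32_t bits = bit_cast_copy<std::uint32_t>(f);
    assert(bit_cast_copy<float>(bits) == f);  // copying back restores the value
}
```

What the integer actually contains depends on the platform's floating point format, so only the round trip is checked here.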

> Which are standard compliant? Is there another way?

Note that the C++ standard strongly suggests, but does not spell out
literally anywhere, an allowance to read from or write to any POD
type through a char pointer or unsigned char pointer (obtained with
reinterpret_cast, or with static_cast through a void pointer). memcpy
of POD types seems to be more strongly allowed, whereas the char
pointer and unsigned char pointer approach is not quite so guaranteed.
(Also, if you write a trap representation, you're on your own.)
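
A small sketch of the unsigned char route, reading only. The helper name read_bytes is hypothetical, and the type is assumed to be a POD:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: copies the bytes of a POD object out one at a time
// through an unsigned char pointer.
template <typename T>
void read_bytes(const T& object, unsigned char (&out)[sizeof(T)])
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&object);
    for (std::size_t i = 0; i < sizeof(T); ++i)
        out[i] = p[i];
}

// Usage sketch: the byte-wise read should agree with what memcpy produces.
// The order of the bytes is platform specific, so it is not checked here.
inline void demo_read_bytes()
{
    const int value = 0x01020304;
    unsigned char bytes[sizeof(int)];
    read_bytes(value, bytes);

    unsigned char expected[sizeof(int)];
    std::memcpy(expected, &value, sizeof(int));
    assert(std::memcmp(bytes, expected, sizeof(int)) == 0);
}
```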

Also note that this is a very black art, as the standard is not very
clear about it. All of this is entirely implementation dependent, and
thus not portable. Also, you'd better know what you're doing.

SG

Jan 5, 2010, 4:30:59 AM
Rolf Magnus wrote:
> 2) is explicitly marked as invoking undefined behaviour. You are not allowed
> to read any union member other than the one you have last written to.

Can you quote the standard on that one? Is it implied by some of the
other rules? There is 3.10/15 which seems relevant. But I didn't find
anything else specific to unions except that at most one member of a
union can be "active". According to 3.10/15 you seem to exclude some
valid uses with your statement.

Cheers,
SG

Joshua Maurice

Jan 5, 2010, 4:46:14 AM

The reading I always had was that the C++ standard seemed to hint in
that very passage on strict aliasing that you could type pun through a
union, but through discussions I've learned that the intent of the C
standard was that such type punning is not allowed, or so random
people X say. The standard itself is pretty vague on the subject, and
the C++ standard itself almost seems to allow such things according to
3.10/15.

SG

Jan 5, 2010, 6:50:08 PM
On 5 Jan., 07:37, zr <zvir...@gmail.com> wrote:
>
> How can a value of type A be bit-casted to value of type B? By bit-
> casting i mean that both values will have the same machine bit
> representation.

FYI: In C++ standard terminology we have "object representation" and
"value representation" where the latter is a subset (in bits) of the
former. The value representation is the set of bits that determines
the value. The object representation may include padding bits.
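
To make the distinction concrete for integer types, here is a rough sketch. The helper names are made up, and "value bits" is taken to mean what std::numeric_limits reports, which is only an approximation of the standard's term:

```cpp
#include <cassert>
#include <climits>
#include <limits>

// Bits in the object representation of T: sizeof(T) * CHAR_BIT.
template <typename T>
int object_bits()
{
    return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// Bits that contribute to the value of an integer type T.
// numeric_limits::digits excludes the sign bit, so it is added back.
template <typename T>
int value_bits()
{
    return std::numeric_limits<T>::digits
         + (std::numeric_limits<T>::is_signed ? 1 : 0);
}

inline void demo_representation_bits()
{
    // unsigned char has no padding bits: both counts are CHAR_BIT.
    assert(value_bits<unsigned char>() == object_bits<unsigned char>());
    // bool carries one bit of value but occupies at least one whole byte.
    assert(value_bits<bool>() == 1);
    assert(object_bits<bool>() >= CHAR_BIT);
    // For other integer types the value bits never exceed the object bits.
    assert(value_bits<int>() <= object_bits<int>());
}
```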

> Let's assume that A and B both have the same size in
> bits.

OK.

> Here are a few methods which i can think of:
> 1)
> A a;
> B b;
> b = *static_cast<B*>(&a)

In case the types A and B are compatible with respect to 3.10/15
(which I quote below) you can do this with a reinterpret_cast:

b = reinterpret_cast<B&>(a);

(You don't need pointers here). In case B is a base class of A, you
don't need a reinterpret_cast, of course. In case A and B are not
"compatible" (w.r.t. 3.10/15) but both are PODs (plain old data
structures) you still have the option to use memcpy.
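
As an illustration of the "compatible" case, signed and unsigned versions of the same integer type are on the 3.10/15 list. A sketch with a hypothetical helper name:

```cpp
#include <cassert>

// Hypothetical helper: reads an int through an lvalue of the corresponding
// unsigned type, one of the accesses that the 3.10/15 list permits.
inline unsigned int view_as_unsigned(int& value)
{
    return reinterpret_cast<unsigned int&>(value);
}

inline void demo_view_as_unsigned()
{
    int a = 42;
    // Non-negative values have the same representation in both types.
    assert(view_as_unsigned(a) == 42u);
}
```

For negative values the result depends on how the implementation represents signed integers, so the sketch only checks a non-negative one.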

> 2)
> Using a union

In practice it may work with your compiler. But the standard doesn't
seem to be really clear on that one. My understanding is that if the
reinterpret_cast thing "works" (in the sense that it's guaranteed by
3.10/15) then the union version should also work. But others keep
telling us that the standard's intent is to restrict read access to
the only union member that is "active" (the last one that has been
written to).

C++ standard, 3.10/15:

"If a program attempts to access the stored value of an object
through an lvalue of other than one of the following types the
behaviour is undefined:
- the dynamic type of the object,
- a cv-qualified version of the dynamic type of the object,
- a type similar (as defined in 4.4) to the dynamic type of the
  object,
- a type that is the signed or unsigned type corresponding to
  the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to
  a cv-qualified version of the dynamic type of the object,
- an aggregate or union type that includes one of the
  aforementioned types among its members (including, recursively,
  a member of a subaggregate or contained union),
- a type that is a (possibly cv-qualified) base class type of
  the dynamic type of the object,
- a char or unsigned char type."

> 3)
> memcpy(&b, &a, sizeof(A));

Should be okay if A and B are PODs and the object representation of a
is a valid object representation for the type B. Otherwise, it's
probably undefined behaviour (or at least implementation-defined, not
sure).

If you provide more details about your problem we could probably give
a better answer. Otherwise there are a lot of cases to consider.

Cheers,
SG

James Kanze

Jan 6, 2010, 2:54:19 PM
On Jan 5, 7:43 am, Rolf Magnus <ramag...@t-online.de> wrote:

> zr wrote:
> > How can a value of type A be bit-casted to value of type B?
> > By bit- casting i mean that both values will have the same
> > machine bit representation.

> reinterpret_cast.

> > Let's assume that A and B both have the same size in bits.

> > Here are a few methods which i can think of:
> > 1)
> > A a;
> > B b;
> > b = *static_cast<B*>(&a)

Which shouldn't compile, of course.

> > 2)
> > Using a union

> > 3)
> > memcpy(&b, &a, sizeof(A));

> > Which are standard compliant?

> 2) is explicitly marked as invoking undefined behaviour. You
> are not allowed to read any union member other than the one
> you have last written to.

> I think the other ones are pretty much the same. I'm not sure
> if it's undefined or only unspecified, but it's definitely
> compiler-specific. Of course, you have to ensure that your A
> value is not a trap representation for B. And A and B should
> be POD types.

Anything you can do to make the bits of one type be interpreted
as another type has to be undefined behavior, since the
standard can't define what might happen. (Interpreting the bits
of a long as if they were a double might result in a trapping
NaN, for example.) In the end, you're necessarily playing with
implementation defined behavior at best in such cases.

Having said that, the authors of the standard also realized that
there is (generally very low level) code where such games are
necessary. That's why they provided reinterpret_cast.

Note that you still have to be very, very careful, however,
because if you end up with two pointers of different types,
unless one of the types is a character type, the compiler is
free to assume that they cannot be aliases to the same memory.
As long as the reinterpret_cast is freely visible, from a QoI
point of view, at least, you should be safe (but I think g++ may
have problems in this regard), but beyond that, all bets are
off.

--
James Kanze

James Kanze

Jan 6, 2010, 4:27:00 PM
On Jan 5, 9:30 am, SG <s.gesem...@gmail.com> wrote:
> Rolf Magnus wrote:
> > 2) is explicitly marked as invoking undefined behaviour. You
> > are not allowed to read any union member other than the one
> > you have last written to.

> Can you quote the standard on that one? Is it implied by some
> of the other rules?

It's clearer in the C standard (which I don't have accessible
here), but even in C++, "In a union, at most one of the data
members can be active at any time, that is, the value of at most
one of the data members can be stored in a union at any time."
That pretty much indicates that in fact, the union has the type
of its "active" member, and accessing it through any other
member is undefined behavior. (IIRC, the C standard says this
explicitly.)

IMHO, it's an issue that the standards committee should address
(although maybe the C standards committee, rather than the C++,
since C and C++ should really be compatible in this regard).
Historically, I think the union was the preferred solution for
type punning, and from a compiler writer's point of view, it
should be the preferred solution. The C committee explicitly
forbade this use of unions, but didn't make it really
clear that casting pointers should work. The C++ committee
introduced reinterpret_cast, doubtlessly to support specific
uses of C casts (that one didn't want to accidentally get), with
a very vague suggestion that reinterpret_cast should be used for
this, but without explicitly offering the necessary guarantees.
So in fact, you're very much at the mercy of the implementers:
g++, for example, does guarantee the use of unions in such
cases, and takes all the liberties which the standard allows for
reinterpret_cast.

--
James Kanze

Joshua Maurice

Jan 6, 2010, 4:34:59 PM

A lot of my understanding of these issues comes from
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html

It's my understanding from the above document and various other
sources that reinterpret_cast does not make the strict aliasing
problem go away, so I will have to disagree with your assessment,
James. reinterpret_cast works to convert between values of different
types, so you could convert a float* to an int*, but it does nothing
to alleviate the rule that accessing an int object through a
float& lvalue is undefined behavior.

James, when he mentions that the reinterpret_cast should be "in
scope", reminds me greatly of how I caution people about using unions
to type pun. (See my earlier post else-thread.) Specifically, a union
tells the compiler that its member types may alias for the scope of
the union. Note that this is not the intent of "union", merely an
extension supported by basically all compilers. However, it's my
operating assumption that, for most / all compilers, a
reinterpret_cast was not intended to, and will not in practice, tell
the compiler that in its "scope" that the unrelated pointer types may
alias. Your suggestion to the contrary is the first that I've heard.
Is this just a guess, or do you have any experience to back this up?
Admittedly, I haven't done any tests either.

Also, I know g++ actually uses the strict aliasing allowance to
optimize, but the Visual Studio compiler does not. I know nothing
about other compilers in this regard. I've heard that Visual Studio
does not because it would break too much Windows code and code written
for Windows. (The gcc people recognized this as well for their own
situation and provided the -fno-strict-aliasing option.)

Robbie Hatley

Mar 10, 2010, 7:23:15 PM

"zr" wrote:

> How can a value of type A be bit-casted to value of type B?

> By bit-casting i mean that both values will have the same
> machine bit representation.

That's called "type casting", or just "casting".
As much as possible, it should be avoided.

> Let's assume that A and B both have the same size in
> bits.
>
> Here are a few methods which i can think of:
> 1)
> A a;
> B b;
> b = *static_cast<B*>(&a)


Needs reinterpret_cast, actually.


> 2)
> Using a union
>
> 3)
> memcpy(&b, &a, sizeof(A));
>
> Which are standard compliant?

I think they are all "standard compliant" in that
they're valid C++.

The "static_cast" / "reinterpret_cast" method is the
preferred way for C++, of the methods you list. (The
other methods are leftovers from C.)

However, the C++ standard cannot guarantee that code which
uses such casts will actually perform as you intend,
because the standard has no way of knowing which machines
you will be executing your code on, or how those machines
represent objects in memory.

Hence any code that makes assumptions about how objects
are represented in memory will always have the following
flaws:

1. Fragile. (The code may break at any time due to
compiler updates, OS changes, CPU changes, or for
other reasons.)

2. Unclear. (Maintenance programmers will have a hard time
understanding what you are doing, and may make changes
which seem harmless to them, but end up breaking your
program.)

3. Not portable. (Ask yourself what would happen if you port
your code to a machine which uses a little-endian
representation for type A, but a big-endian representation
for type B? The values will get screwed up. Or what if
you port the code to a machine where type A uses 17 bits
but type B uses 37 bits? Your object b will now contain
20 extra bits of garbage.)

That being said, casting does come in handy sometimes.
Eg, I use the following in a program of mine to get the
0-255 numerical value of a character:

// Put character 'H' in variable A:
char A = 'H';

// Explicit cast to unsigned char, followed by implicit cast to int:
int Value = static_cast<unsigned char>(A);

// Print the decimal ASCII code for character 'H':
std::cout << Value << std::endl;

But it does make assumptions about how types char and
unsigned char are stored in memory. If those assumptions
ever become invalid, the code will break.

--
Cheers,
Robbie Hatley
lonewolf at well dot com
www dot well dot com slant tilde lonewolf slant


Joshua Maurice

Mar 10, 2010, 8:49:29 PM
On Mar 10, 4:23 pm, "Robbie Hatley"

Thread resurrection!

And no. You are not correct on many points.

reinterpret_cast was not added to the language to support type
punning. Do not use it to access an object through an lvalue of an
incorrect type. reinterpret_cast was meant to be an improvement over
some usage of C-style casts, like the other 3 named casts.
Specifically, the new casts are not context dependent in their effect,
unlike the C-style cast, and they are more easily grep-able, unlike
the C-style cast. reinterpret_cast was just intended to clean up usage
of the C-style cast, not allow new usages such as bypassing the strict
aliasing rule. Repeating to emphasize: C-style casts and
reinterpret_casts will not bypass the strict aliasing rule.

You can type pun using std::memcpy between POD types. The standard is
quite clear that this should produce the expected results.

Then we have char and unsigned char. The intent of the standard seems
to be to allow reading or writing any object through a char lvalue or
unsigned char lvalue. However, it's not explicitly stated as allowed.

Finally we have type punning through unions. While explicitly not
supported by the standard(s), it's supported as a compiler extension
by basically every C and C++ compiler. (Confirmation anyone?)

When doing any low level bit hackery, however, I would strongly
suggest looking at the generated assembly to confirm expectations.
Compiler bugs (and incorrect expectations) tend to be a little more
prevalent when doing such things.

Might I again suggest reading:
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
This is my authoritative link on the subject, and will remain so until
someone provides me good evidence that the article is incorrect.
Anyone doing low level hackery such as type punning should read this
article and understand the nuances of the strict aliasing rule.

Also, there are plenty of good reasons to do type punning. For
example, games. I'm pretty sure that using a portable UTF8 text format
for ints for their network packets would result in rather unacceptable
performance. Yes, it carries the problem that a newer compiler, OS,
etc., may render the communication not compatible, but in certain
contexts this is an acceptable drawback. (Next I'll be hearing that a
first person shooter should send its data over the network in XML
format and should utilize a full XML parser on the receiver side. /
sigh)

Robbie Hatley

Mar 11, 2010, 5:00:03 AM
"Joshua Maurice" wrote:

> Thread resurrection!

Oops! I had my columns sorted by "Subject" rather than by
"Date", so I thought I was replying to most-recent thread,
but it was actually 2 months old.

I don't have the time or inclination to reply to all of
your (somewhat lengthy) reply, but I'll reply to a few
points that caught my interest.

> ... reinterpret_cast was not added to the language to
> support type punning.

Interesting. So, what's it for? I can't see any way to
use it that wouldn't be some form of "type punning".

> ... Finally we have type punning through unions. While explicitly
> not supported by the standard(s), it's supported as a compiler
> extension by basically every C and C++ compiler ...

Yah, I've seen that kind of union type spoofing used in
production code at work. Tricky, messy, ugly, dangerous.
And I see now that it apparently violates the standard
as well.

> ... You can type pun using std::memcpy between POD types ...

Yes, memcpy works well for that sort of thing.
I've used it at work for copying stuff coming in from
(or heading out to) a serial communications port.
Basically:

Output: struct => memcpy => char array => serial port
Input: serial port => char array => memcpy => struct
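
A rough sketch of that round trip. The Reading struct and the function names are invented for illustration, the serial port itself is left out, and both ends are assumed to use the same compiler and platform (same layout, padding and byte order):

```cpp
#include <cassert>
#include <cstring>

// Hypothetical POD message; the field names are illustrative only.
struct Reading
{
    unsigned short sensor_id;
    int value;
};

// struct => memcpy => char array (the array would then go to the port).
inline void pack(const Reading& in, char (&buffer)[sizeof(Reading)])
{
    std::memcpy(buffer, &in, sizeof(Reading));
}

// char array => memcpy => struct (the array would have come from the port).
inline Reading unpack(const char (&buffer)[sizeof(Reading)])
{
    Reading out;
    std::memcpy(&out, buffer, sizeof(Reading));
    return out;
}

inline void demo_round_trip()
{
    const Reading sent = { 7, -1234 };
    char wire[sizeof(Reading)];
    pack(sent, wire);
    const Reading received = unpack(wire);
    assert(received.sensor_id == sent.sensor_id);
    assert(received.value == sent.value);
}
```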

> ... When doing any low level bit hackery, however, I would
> strongly suggest looking at the generated assembly to confirm
> expectations ...

I don't speak assembly myself. But if a piece of
representation-reinterpretation-based code works during testing,
then it works. I don't see how examining assembly code is going
to tell you more. The catch is, even if it works NOW, there is
never any guarantee that it will KEEP ON working, or work if ported.

Interesting and informative, but I'd have to learn assembly
to fully understand everything in it.

> ... there are plenty of good reasons to do type punning ...

And yet it's usually dangerous, no matter how good the reasons
for doing it.


Joshua Maurice

Mar 11, 2010, 4:04:32 PM
On Mar 11, 2:00 am, "Robbie Hatley"
<see.my.signat...@for.my.contact.info> wrote:

> "Joshua Maurice" wrote:
> > ... reinterpret_cast was not added to the language to
> > support type punning.
>
> Interesting.  So, what's it for?  I can't see any way to
> use it that wouldn't be some form of "type punning".

Again. The C++ standard writers considered C-style casts "bad" and
"ugly", and rightfully so.

The first problem is the C-style cast is context dependent.
A* a = (B*)b;
Depending on if A and B are complete types at the point of the cast,
the cast will do different things. If they're incomplete types, it's a
reinterpret_cast, always. This will generally break when multiple
inheritance or virtual inheritance is involved because casting in such
a type hierarchy actually can change the bit value of the pointer,
change the offset into the object, but reinterpret_cast will never do
that, so you'll have an A* pointing to the wrong offset, the wrong
virtual function pointer, etc. I've hit this in production code, where
someone did a C-style cast with MI, but it broke because the types
were just forward declared. By providing 4 different named casts, we
can avoid this potential problem. Generally, you'll want a static_cast
or dynamic_cast, and both of these will fail to compile when the types
are incomplete.
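
A small sketch of that failure mode, with made-up type and function names. The standard leaves the meaning of the C-style cast unspecified when the types are incomplete; the comments describe what compilers typically do, and the offsets are what common ABIs produce rather than a guarantee:

```cpp
#include <cassert>

// Only forward declarations are visible at this point.
struct Left;
struct Right;
struct Both;

// With incomplete types the compiler cannot know about the inheritance,
// so in practice this C-style cast leaves the address unchanged.
inline Right* cast_while_incomplete(Both* p)
{
    return (Right*)p;
}

struct Left  { int left_value; };
struct Right { int right_value; };
struct Both : Left, Right { };

// With complete types static_cast adjusts the address to the Right subobject.
inline Right* cast_while_complete(Both* p)
{
    return static_cast<Right*>(p);
}

inline void demo_multiple_inheritance_cast()
{
    Both b;
    b.left_value = 1;
    b.right_value = 2;
    // The adjusted pointer really refers to the Right part of b.
    assert(cast_while_complete(&b)->right_value == 2);
    assert(static_cast<void*>(cast_while_complete(&b)) ==
           static_cast<void*>(&b.right_value));
}
```

Dereferencing the result of cast_while_incomplete would read the wrong subobject, which is why the sketch never does so.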

The next problem is again one of vagueness. The C-style cast can
1- cast between unrelated types ala reinterpret_cast
2- do implicit casts, like casting to a base class
3- downcast, like a static_cast
4- (And it can also cast to an inaccessible base class, but the 4
named casts cannot)
By separating these different functions into different named casts,
the code becomes clearer as the intent is more easily ascertained, and
there's less chance of mistakes due to what's in scope, if the types
are complete types, etc.

Finally, it's relatively hard to grep for C-style casts, but it's
quite easy to grep for the 4 named casts. It also makes the code
clearer IMHO that a cast is going on with a quick glance. Casts should
be rare, and they should stand out more than the C-style cast stands
out.

Joshua Maurice

Mar 11, 2010, 5:58:04 PM
On Mar 11, 1:04 pm, Joshua Maurice <joshuamaur...@gmail.com> wrote:
>   A* a = (B*)b;

Err. I meant
A* a = (A*) b;
where b is some other type.

James Kanze

Mar 11, 2010, 7:21:34 PM
On Mar 11, 12:23 am, "Robbie Hatley"

<see.my.signat...@for.my.contact.info> wrote:
> "zr" wrote:
> > How can a value of type A be bit-casted to value of type B?
> > By bit-casting i mean that both values will have the same
> > machine bit representation.

> That's called "type casting", or just "casting". As much as
> possible, it should be avoided.

Actually, it's called type punning. In C++ (and in C), a "cast"
is an explicit type conversion. Any explicit type conversion:
int to double, for example, or even just removing const.

But you're right that type punning should be avoided in general.
It has its place in some very low level software, but unless
you're implementing a garbage collector, or something along
those lines, you probably shouldn't be using it.

> > Let's assume that A and B both have the same size in
> > bits.

> > Here are a few methods which i can think of:
> > 1)
> > A a;
> > B b;
> > b = *static_cast<B*>(&a)

> Needs reinterpret_cast, actually.

Yes. reinterpret_cast is the cast for type punning in C++.

> > 2)
> > Using a union

Involves undefined behavior.

> > 3)
> > memcpy(&b, &a, sizeof(A));

> > Which are standard compliant?

> I think they are all "standard compliant" in that they're
> valid C++.

> The "static_cast" / "reinterpret_cast" method is the preferred
> way for C++, of the methods you list. (The other methods are
> leftovers from C.)

> However, the C++ standard cannot guarantee that code which
> uses such casts will actually perform as you intend, because
> the standard has no way of knowing which machines you will be
> executing your code on, or how those machines represent
> objects in memory.

The C++ standard also allows the compiler to assume that
pointers to different types never point to the same object (with
an exception for pointers to character types), which means that
even reinterpret_cast can be dangerous if the compiler is
aggressively optimizing. The safest way is the memcpy, because
it involves two different objects. Used correctly, however,
reinterpret_cast should be fairly safe.

[...]


> That being said, casting does come in handy sometimes.

Most of the more useful conversions are implicit, but it's still
relatively frequent to use things like:

int a;
int b;
double percent = (double)a / (double)b * 100.0;
(I'd write that last line:
double percent = 100.0 * a / b;
and let the implicit type promotions do the job, but it's not
always that simple.)

Of course, that's not a bitwise conversion; not type punning.

> Eg, I use the following in a program of mine to get the
> 0-255 numerical value of a character:

> // Put character 'H' in variable A:
> char A = 'H';

> // Explicit cast to unsigned char, followed by implicit cast to int:
> int Value = static_cast<unsigned char>(A);

> // Print the decimal ASCII code for character 'H':
> std::cout << Value << endl;

> But it does make assumptions about how types char and unsigned
> char are stored in memory. If those assumptions ever become
> invalid, the code will break.

What assumptions? Except for the actual encoding of the letter
'H', the code above is fully defined.

It is, in fact, one of the more frequent uses of casts: you
can't portably call any of the functions declared in <ctype.h>
with a char; you have to explicitly convert the char to unsigned
char first.
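
For example, a sketch of an upper-casing helper (the name to_upper is made up) that converts each char to unsigned char before handing it to std::toupper:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Upper-cases a string. Each char is converted to unsigned char before the
// <cctype> call, because passing a negative char value is undefined behaviour.
inline std::string to_upper(std::string text)
{
    for (std::string::size_type i = 0; i < text.size(); ++i)
        text[i] = static_cast<char>(
            std::toupper(static_cast<unsigned char>(text[i])));
    return text;
}

inline void demo_to_upper()
{
    // Assumes the default "C" locale.
    assert(to_upper("Hello") == "HELLO");
}
```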

--
James Kanze

James Kanze

Mar 11, 2010, 7:29:57 PM
On Mar 11, 1:49 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> reinterpret_cast was not added to the language to support type
> punning.

Why was it added to the language, then?

[...]


> Finally we have type punning through unions. While explicitly
> not supported by the standard(s), it's supported as a compiler
> extension by basically every C and C++ compiler. (Confirmation
> anyone?)

The only compiler I've seen that documents it as being supported
is g++ (but I've not really looked at all of the documentation).
And even with g++, it depends on the context---there are cases
where it will fail.

From a QoI point of view: if the union or the reinterpret_cast
is visible, I would expect the code to give the expected
results. Any accesses elsewhere, and all bets are off, e.g.:

    #include <iostream>

    int f(int* pi, double* pd)
    {
        int retval = *pi;
        *pd = 3.14159;
        return retval;
    }

    int
    main()
    {
        union U { int i; double d; } x;
        x.i = 42;
        std::cout << f(&x.i, &x.d) << std::endl;
        return 0;
    }

I would not count on this code outputting 42, regardless of any
guarantees the compiler might give.

[...]


> Also, there are plenty of good reasons to do type punning. For
> example, games. I'm pretty sure that using a portable UTF8
> text format for ints for their network packets would result in
> rather unacceptable performance.

So you use any one of a number of binary formats. You still
don't need (nor want) type punning to implement them.
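
A sketch of what that can look like for a 32 bit unsigned integer, using shifts only. The function names are invented; the byte order is most significant first, which is also what XDR uses for integers:

```cpp
#include <cassert>
#include <cstdint>

// Writes a 32-bit value as four bytes, most significant byte first.
// Only arithmetic on the value is used, so the result does not depend on
// the host's byte order and no object is accessed through another type.
inline void put_u32_be(std::uint32_t value, unsigned char* out)
{
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFFu);
    out[3] = static_cast<unsigned char>(value & 0xFFu);
}

// Reads four bytes, most significant first, back into a 32-bit value.
inline std::uint32_t get_u32_be(const unsigned char* in)
{
    return (static_cast<std::uint32_t>(in[0]) << 24)
         | (static_cast<std::uint32_t>(in[1]) << 16)
         | (static_cast<std::uint32_t>(in[2]) << 8)
         |  static_cast<std::uint32_t>(in[3]);
}

inline void demo_u32_be()
{
    unsigned char buffer[4];
    put_u32_be(0x01020304u, buffer);
    assert(buffer[0] == 0x01 && buffer[3] == 0x04);
    assert(get_u32_be(buffer) == 0x01020304u);
}
```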

> Yes, it carries the problem that a newer compiler, OS, etc.,
> may render the communication not compatible, but in certain
> contexts this is an acceptable drawback. (Next I'll be hearing
> that a first person shooter should send its data over the
> network in XML format and should utilize a full XML parser on
> the receiver side. / sigh)

What's wrong with XDR?

--
James Kanze


Joshua Maurice

Mar 11, 2010, 8:09:05 PM
On Mar 11, 4:29 pm, James Kanze <james.ka...@gmail.com> wrote:
> On Mar 11, 1:49 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
>
> > reinterpret_cast was not added to the language to support type
> > punning.
>
> Why was it added to the language, then?

See my previous post else-thread for my understanding.


> > Finally we have type punning through unions. While explicitly
> > not supported by the standard(s), it's supported as a compiler
> > extension by basically every C and C++ compiler. (Confirmation
> > anyone?)
>
> The only compiler I've seen that documents it as being supported
> is g++ (but I've not really looked at all of the documentation).
> And even with g++, it depends on the context---there are cases
> where it will fail.
>
> From a QoI point of view: if the union or the reinterpret_cast
> is visible, I would expect the code to give the expected
> results.  Any accesses elsewhere, and all bets are off, e.g.:
>
>     int f(int* pi, double* pd)
>     {
>         int retval = *pi;
>         *pd = 3.14159;
>         return retval;
>     }
>
>     int
>     main()
>     {
>         union U { int i; double d; } x;
>         x.i = 42;
>         std::cout << f(&x.i, &x.d) << std::endl;
>         return 0;
>     }
>
> I would not count on this code outputting 42, regardless of any
> guarantees the compiler might give.

//Start code for foo.cpp
#include <iostream>
using namespace std;

int main()
{
    cout << sizeof(int) << " " << sizeof(short) << endl;
    {
        int x = 1;
        short* s = reinterpret_cast<short*>(&x);
        s[0] = 2;
        s[1] = 3;
        cout << x << endl;
    }
    {
        int x = 1;
        union { int u_int; short u_short_array[2]; };
        u_int = x;
        u_short_array[0] = 2;
        u_short_array[1] = 3;
        x = u_int;
        cout << x << endl;
    }
}
//End code

//Start prompt copy
bash-3.2$ g++ --version
g++ (GCC) 4.1.2 20080704 (Red Hat 4.1.2-44)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

bash-3.2$ g++ -O3 foo.cpp -Wall
foo.cpp: In function ‘int main()’:
foo.cpp:9: warning: dereferencing type-punned pointer will break strict-aliasing rules
bash-3.2$ ./a.out
4 2
1
196610
bash-3.2$
//end prompt copy

So, when the union is in scope, gcc "does the right thing", and when
reinterpret_cast is in scope, gcc does not "do the right thing". Now,
I don't know any other compilers offhand which optimize with the
strict aliasing allowance besides newer gcc versions, but I would
suggest you revise your understanding of the QoI implications of
reinterpret_cast. My previous argument succinctly: C-style casts were
never intended to get around strict aliasing. reinterpret_cast was
never intended to be more powerful than C-style casts. (The named
casts were each intended to fulfill a specific role of the several
roles of C-style casts to remove potential ambiguity to the code
writer and readers.) Thus reinterpret_cast was never intended to get
around strict aliasing.

Perhaps I am wrong about the original intent. However, I'm at least
right on the questions of fact, at least if we count gcc as a good
example.


> > Also, there are plenty of good reasons to do type punning. For
> > example, games. I'm pretty sure that using a portable UTF8
> > text format for ints for their network packets would result in
> > rather unacceptable performance.
>
> So you use any one of a number of binary formats.  You still
> don't need (nor want) type punning to implement them.

You would need to type pun somewhere as the OS network calls probably
only work in terms of char pointers. So either the game code is type
punning, or the network library on top of the OS is type punning, or
the device driver is type punning (or written in assembly). Someone is
probably type punning in C or C++.

Joshua Maurice

Mar 11, 2010, 8:26:08 PM
Also, I started re-reading
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html
just now. It appears as though it's not perfectly accurate either. It
ignores the allowance that you can cast between POD types with common
leading parts and access the common leading parts as expected. The
article's very point about structs Foo and Bar is actually inaccurate,
though I support the article on its intent to say "No. It doesn't work
naively. The compiler is allowed to do non-obvious and surprising things
because of the strict aliasing rule."
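
A minimal sketch of the common-leading-part allowance, in the form I am sure the standard guarantees it (two POD structs inside a union). The struct definitions and the names FooOrBar and read_tag_via_bar are my own, for illustration only; they are not the article's code.

```cpp
// Two POD structs that share a common initial sequence (the leading int).
struct Foo { int tag; int a; };
struct Bar { int tag; double b; };

union FooOrBar { Foo foo; Bar bar; };

// Store a Foo in the union, then inspect the common leading member
// through the Bar member; the standard permits exactly this much.
int read_tag_via_bar(int tag)
{
    FooOrBar u = { { tag, 0 } };  // initializes the first member, foo
    return u.bar.tag;
}
```

Reading u.bar.b here would be a different matter, since b is not part of the common initial sequence.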

James Kanze

Mar 12, 2010, 4:56:54 AM
On Mar 12, 1:09 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> On Mar 11, 4:29 pm, James Kanze <james.ka...@gmail.com> wrote:

> > On Mar 11, 1:49 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:

> > > reinterpret_cast was not added to the language to support type
> > > punning.

> > Why was it added to the language, then?

> See my previous post else-thread for my understanding.

I didn't see any real explanation, other than to provide a new
style cast for certain C style casts.

> > > Finally we have type punning through unions. While
> > > explicitly not supported by the standard(s), it's
> > > supported as a compiler extension by basically every C and
> > > C++ compiler. (Confirmation anyone?)

> > The only compiler I've seen that documents it as being
> > supported is g++ (but I've not really looked at all of the
> > documentation). And even with g++, it depends on the
> > context---there are cases where it will fail.

> > From a QoI point of view: if the union or the
> > reinterpret_cast is visible, I would expect the code to give
> > the expected results. Any accesses elsewhere, and all bets
> > are off, e.g.:

> > int f(int* pi, double* pd)
> > {
> > int retval = *pi;
> > *pd = 3.14159;
> > return retval;
> > }

> > int
> > main()
> > {
> > union U { int i; double d; } x;
> > x.i = 42;
> > std::cout << f(&x.i, &x.d) << std::endl;
> > return 0;
> > }

> > I would not count on this code outputting 42, regardless of
> > any guarantees the compiler might give.

Note that my statement above is based on how compiler optimizers
work. The motivation behind the anti-aliasing rule (e.g. that
two pointers to different types cannot refer to the same object)
is to allow certain optimizations. Optimizations which are
important in some code. But this only works if the above is not
guaranteed to work.
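
To make the optimization concrete, here is a sketch of the kind of function that benefits. The name read_store_read is invented for the example, and the comment describes what an optimizer is permitted to do, not what any particular compiler does.

```cpp
// With pi and pd pointing at distinct objects (the only case the
// aliasing rule expects), the compiler may keep *pi in a register
// across the store through pd instead of reloading it.
int read_store_read(int* pi, double* pd)
{
    int first = *pi;
    *pd = 3.14159;
    int second = *pi;   // may be folded into 'first' by the optimizer
    return first + second;
}
```

Called with an int and a separate double, the result is the same whether or not the second load is elided, which is exactly why the compiler wants to be allowed to elide it.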

As the C++ standard is currently written, the above *is*
guaranteed to work. I don't think that this was the intent,
however. C has (or had) similar rules. I know that the issue
was discussed in the C committee, but I don't know the exact
status of the proposed resolution.

> //End code

Which is, from a QoI point of view, an error.

In fact, I don't know whether this is an error in the coding,
a problem in the way that the optimizer works which would make
it difficult (and perhaps not worth it) to fix, or simply a bit
of stubbornness on the part of g++ developers. It does mean that
reinterpret_cast is useless, which is certainly not the intent
of the committee.

The issue isn't simple. Historically (pre-ISO C), the union was
the preferred solution, at least from what I understood. For
whatever reasons, the ISO C committee (or at least the parts of
it which formalized the wording in this regard) decided to
make type punning via a union undefined behavior, which means
that only casting remains. The C++ committee simply followed
the C committee in this respect---I'm 100% sure that the intent
of the C++ committee is that a reinterpret_cast, when legal,
behave exactly the same as a cast in C.

In addition, there is a note (non-normative) in the C++ standard
(§5.2.10/3) concerning the mapping done by reinterpret_cast: "it
is intended to be unsurprising to those who know the addressing
structure of the underlying machine". Although this note is
directly attached to the pointer to integer conversion, in the
absence of any other indications, it seems reasonable to me to
apply it to the other uses of reinterpret_cast as well.

In any case, the current standard is very unclear with regards
to type punning---with the exception of character types. And I
don't think that this has changed in the more recent drafts---in
a very real sense, I think that it is more a C problem; that the
C++ committee should simply wait, and adopt whatever the C
committee finally decides.

> Now, I don't know any other compilers offhand which optimize
> with the strict aliasing allowance besides newer gcc versions,

I don't know of any that don't. It's a common optimization. It
was present in Microsoft C 1.0, for example (in which your union
example would break).

> but I would suggest you revise your understanding of the QoI
> implications of reinterpret_cast.

Why? My understanding of the QoI implications are based on the
actual words in the standard, and the various discussions that
I've followed in the standardization committees.

> My previous argument succinctly: C-style casts were never
> intended to get around strict aliasing.

Yes and no. An optimizer is expected to use the knowledge at
its disposal. If it can see that there is aliasing, whether
from a union or a cast, it should take it into account.

That is, by the way, the direction the C committee was going the
last time I looked.

> reinterpret_cast was never intended to be more powerful than
> C-style casts.

Certainly not. But since the C style cast should behave as
expected, when visible, so should the reinterpret_cast.

> (The named casts were each intended to fulfill a specific role
> of the several roles of C-style casts to remove potential
> ambiguity to the code writer and readers.) Thus
> reinterpret_cast was never intended to get around strict
> aliasing.

You seem to be misunderstanding the motivation behind the strict
aliasing rule. It is to allow the compiler to assume no
aliasing in cases where it otherwise couldn't. There was never
any intent to allow the compiler to totally ignore aliasing that
it can clearly see.

> Perhaps I am wrong about the original intent. However, I'm at
> least right on the questions of fact, at least if we count gcc
> as a good example.

I don't think you can count any single compiler as a
"reference".

> > > Also, there are plenty of good reasons to do type punning.
> > > For example, games. I'm pretty sure that using a portable
> > > UTF8 text format for ints for their network packets would
> > > result in rather unacceptable performance.

> > So you use any one of a number of binary formats. You still
> > don't need (nor want) type punning to implement them.

> You would need to type pun somewhere as the OS network calls
> probably only work in terms of char pointers.

No. You do need to convert value types, but that's all.

For integral types, there's really no need for any type punning
whatsoever. In the case of floating point, the issue is more
complex, since the code necessary to portably convert a string
of bytes of a given format into a floating point value is
relatively complex, and more expensive than type punning a
uint64_t to a double, in the case where you know that the
external format and the internal format are the same (e.g.
IEEE).
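
For the integral case, a sketch of what "convert value types" can look like. The function names put_u32_be and get_u32_be are hypothetical, and big-endian is just one possible choice of external format.

```cpp
#include <cstdint>

// Write a 32-bit value as four bytes, most significant byte first.
// Only shifts and masks are used, so the host byte order never matters.
void put_u32_be(unsigned char* out, std::uint32_t value)
{
    out[0] = static_cast<unsigned char>((value >> 24) & 0xFF);
    out[1] = static_cast<unsigned char>((value >> 16) & 0xFF);
    out[2] = static_cast<unsigned char>((value >> 8) & 0xFF);
    out[3] = static_cast<unsigned char>(value & 0xFF);
}

// Rebuild the 32-bit value from four bytes, most significant byte first.
std::uint32_t get_u32_be(const unsigned char* in)
{
    return (static_cast<std::uint32_t>(in[0]) << 24)
         | (static_cast<std::uint32_t>(in[1]) << 16)
         | (static_cast<std::uint32_t>(in[2]) << 8)
         |  static_cast<std::uint32_t>(in[3]);
}
```

The buffer of unsigned char can be handed to the OS network calls directly, so no pointer to one type is ever dereferenced as another.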

> So either the game code is type punning, or the network
> library on top of the OS is type punning, or the device driver
> is type punning (or written in assembly). Someone is probably
> type punning in C or C++.

I've written a lot of network code in which there was no type
punning. I have an implementation of an xdrstream which does no
type punning, even for floating point. For integral types, it's
about as fast as implementations which do type pun (but it is
far more portable); for floating point, it's measurably slower
(but has the advantage that it works regardless of the machine
floating point format), but not nearly as much as I initially
expected.

--
James Kanze

James Kanze

Mar 12, 2010, 5:18:54 AM
On Mar 12, 1:26 am, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> Also, I started re-reading
> http://cellperformance.beyond3d.com/articles/2006/06/understanding-st...

> just now. It appears as though it's not perfectly accurate either.

And how. Or rather, it seems to be discussing in detail what
g++ actually does, rather than anything based on the standard.
(Note that there's also a problem with terminology. There is a
statement "Pointers to aggregate or union types with different
tags do not alias", but the example doesn't have any tags, and
the pointers in it are, in fact, allowed to alias in C, because
in C, the structs Foo and Bar are the same type.)

> It ignores the allowance that you can cast between POD types
> with common leading parts and access the common leading parts
> as expected. The article's very point about structs Foo and
> Bar is actually inaccurate, though I support the article on
> its intent to say "No. It doesn't work naively. The compiler
> is allowed to do non-obvious and surprising things because of the
> strict aliasing rule."

That is, of course, the crux of the matter. If you have a
function:
void f(int* pi, double* pd)
The compiler will assume that pi and pd don't refer to the same
element. The standard clearly intends to give this guarantee,
although there are a very few cases where it in fact doesn't.
And the guarantee is important for optimizing purposes. Beyond
that, the standard is far from clear: from a QoI point of view,
I would expect the compiler to recognize visible aliasing, and
take it into account; if nothing else, not doing so is being
intentionally perverse. From discussions in the C committee,
prior to the formalizing of C90, I conclude that the *intent* is
1) that a checking compiler is allowed to somehow "discriminate"
unions, and detect cases where the accessed entry is not the
last written (modulo the few cases where this is guaranteed to
work), and 2) that the intent is that type punning be done by
casting. This conclusion is, of course, based on my memory and
my interpretation of discussions which occurred a long time ago.
But practically speaking, support for pointer casts in C doesn't
make sense otherwise.

Practically speaking, from a QoI point of view: if the compiler
sees a reinterpret_cast (or a pointer cast in C), it should be
clear that the programmer is doing something tricky at a very
low level, and that there *is* aliasing. Not taking that into
account is simply perverse. From a practical point of view,
too, unless the compiler is generating extensive debugging code
and actually discriminating unions, in order to detect errors,
the compiler should also make unions work as expected (but the
earliest versions of Microsoft C didn't); even in a debugging
compiler, I would expect some sort of option or pragma to allow
this common and traditional, albeit illegal, use of unions.

--
James Kanze

Joshua Maurice

Mar 12, 2010, 7:19:22 AM

It's an error if you believe that those are the desired semantics.


> In fact, I don't know whether this is an error in the coding,
> a problem in the way that the optimizer works which would make
> it difficult (and perhaps not worth it) to fix, or simply a bit
> of stubbornness on the part of g++ developers.  It does mean that
> reinterpret_cast is useless, which is certainly not the intent
> of the committee.

I would argue that it's the gcc team's stubbornness to follow the
standard as written. I cannot speak to the intent of the committee,
nor can most users of C++. However, we can speak to what the standard
clearly says. That said, it is somewhat silly to provide type punning
when the union is in scope but not allow type punning when a cast is
in scope. I think it makes a little more sense if we say that they're
simply following current practice, and this is how most other
compilers do it. (Again, confirmation or evidence to the contrary
anyone?)


> The issue isn't simple.  Historically (pre-ISO C), the union was
> the preferred solution, at least from what I understood.  For
> whatever reasons, the ISO C committee (or at least the parts of
> it which formalized the wording in this regard) decided to
> make type punning via a union undefined behavior, which means
> that only casting remains.  The C++ committee simply followed
> the C committee in this respect---I'm 100% sure that the intent
> of the C++ committee is that a reinterpret_cast, when legal,
> behave exactly the same as a cast in C.

Repeating for emphasis:

> I'm 100% sure that the intent
> of the C++ committee is that a reinterpret_cast, when legal,
> behave exactly the same as a cast in C.

I agree with that. I cannot speak to the intent of the committee(s) as
you can, but I can speak to what they wrote, and the standard is quite
clear that the C-style cast does not get around the strict aliasing
rule, and thus reinterpret_cast does not get around the strict
aliasing rule.


> In addition, there is a note (non-normative) in the C++ standard
> (§5.2.10/3) concerning the mapping done by reinterpret_cast: "it
> is intended to be unsurprising to those who know the addressing
> structure of the underlying machine".  Although this note is
> directly attached to the pointer to integer conversion, in the
> absence of any other indications, it seems reasonable to me to
> apply it to the other uses of reinterpret_cast as well.

There is no such absence in the C++ standard. It is very clear that
accessing an object through an lvalue of a sufficiently different type
is undefined behavior (except for the char and unsigned char exception, and
the common leading part of POD exception).

The section you cite, including the non-normative note, is a very narrow
exception which states that a reinterpret_cast on a pointer will
produce an rvalue whose value should not be surprising to those who
know the addressing structure of the underlying machine. This in no
way is an exception to the strict aliasing rule. Instead, in this
context reinterpret_cast takes one value of a certain type and casts
that value to another type. It does not tell the compiler that two
different pointers alias or in any way affect the strict aliasing
rule.


> In any case, the current standard is very unclear with regards
> to type punning---with the exception of character types.  And I
> don't think that this has changed in the more recent drafts---in
> a very real sense, I think that it is more a C problem; that the
> C++ committee should simply wait, and adopt whatever the C
> committee finally decides.
>
> > Now, I don't know any other compilers offhand which optimize
> > with the strict aliasing allowance besides newer gcc versions,
>
> I don't know of any that don't.  It's a common optimization.  It
> was present in Microsoft C 1.0, for example (in which your union
> example would break).

Really? I was under the impression that basically no Microsoft
compiler actually optimized with the strict aliasing allowance, that
too much windows code would break if it did by default. Very simple
testing like that above seems to show that the Microsoft compiler does
not.


> > but I would suggest you revise your understanding of the QoI
> > implications of reinterpret_cast.
>
> Why?  My understanding of the QoI implications are based on the
> actual words in the standard, and the various discussions that
> I've followed in the standardization committees.
>
> > My previous argument succinctly: C-style casts were never
> > intended to get around strict aliasing.
>
> Yes and no.  An optimizer is expected to use the knowledge at
> its disposal.  If it can see that there is aliasing, whether
> from a union or a cast, it should take it into account.
>
> That is, by the way, the direction the C committee was going the
> last time I looked.
>
> > reinterpret_cast was never intended to be more powerful than
> > C-style casts.
>
> Certainly not. But since the C style cast should behave as
> expected, when visible, so should the reinterpret_cast.

I cannot speak to your private discussions with the committees. It's
just that's not what's in the current standards.


> > (The named casts were each intended to fulfill a specific role
> > of the several roles of C-style casts to remove potential
> > ambiguity to the code writer and readers.) Thus
> > reinterpret_cast was never intended to get around strict
> > aliasing.
>
> You seem to be misunderstanding the motivation behind the strict
> aliasing rule.  It is to allow the compiler to assume no
> aliasing in cases where it otherwise couldn't.  There was never
> any intent to allow the compiler to totally ignore aliasing that
> it can clearly see.

Perhaps I was too strong. However, I don't think it's right to be so
dismissive of that position. It is a reasonable one. Many times I hear
"The compiler should just be smart enough", but many times this is not
the case, for various reasons, such as too hard to implement, or the
semantics would be too vague or not well defined, or it would be bad
style and confusing to the coders. I think all kinds of "type punning
but only in certain scopes [unions and casts]" qualify.

Joshua Maurice

Mar 12, 2010, 7:23:51 AM
On Mar 12, 2:18 am, James Kanze <james.ka...@gmail.com> wrote:
> Practically speaking, from a QoI point of view: if the compiler
> sees a reinterpret_cast (or a pointer cast in C), it should be
> clear that the programmer is doing something tricky at a very
> low level, and that there *is* aliasing.  Not taking that into
> account is simply perverse.  From a practical point of view,
> too, unless the compiler is generating extensive debugging code
> and actually discriminating unions, in order to detect errors,
> the compiler should also make unions work as expected (but the
> earliest versions of Microsoft C didn't); even in a debugging
> compiler, I would expect some sort of option or pragma to allow
> this common and traditional, albeit illegal, use of unions.

I'll just make a quick note that not all reinterpret_casts are trying
to alias pointers. A reinterpret_cast from a pointer to int, or int to
pointer, does not "try" to make any two differently typed pointers
alias. It just converts one value of a type to a value of the other
type.

Also, if they did intend for C-style casts to be the correct way
to type pun, why include rules expressly disallowing it, and not make
any exceptions for when the cast is in scope? Again, intent, and
intent specifically contrary to what's written.

James Kanze

Mar 13, 2010, 5:30:16 PM
On Mar 12, 12:23 pm, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> On Mar 12, 2:18 am, James Kanze <james.ka...@gmail.com> wrote:

[...]


> I'll just make a quick note that not all reinterpret_casts are
> trying to alias pointers. A reinterpret_cast from a pointer to
> int, or int to pointer, does not "try" to make any two
> differently typed pointers alias. It just converts one value
> of a type to a value of the other type.

Yes. That's another use of reinterpret_cast. It's also
"undefined behavior", but from a QoI point of view, it should
behave in some sort of rational manner.

> Also, if they did intend for C-style casts to be the
> correct way to type pun, why include rules expressly
> disallowing it, and not make any exceptions for when the cast
> is in scope? Again, intent, and intent specifically contrary
> to what's written.

The standard doesn't disallow it; it says it's undefined
behavior. There are two motivations for undefined behavior in
the standard: the first is that the standard doesn't expect
anyone to write such code, and so places no constraints on the
compiler if they do; the second is that any reasonable behavior
will depend on the implementation, in ways the standard cannot
foresee or delimit, and undefined behavior leaves the
implementation free to implement whatever is reasonable on that
platform, supposing that there is something reasonable. Thus,
for example, the standard doesn't want to get into the issues of
what happens if you reinterpret_cast an unsigned long* to a
double*, then access memory through the double*, since that
would mean considering things like trapping NaN's and such. So
it says "undefined behavior", and leaves it up to the
implementation to do something appropriate for that
implementation.
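
For what it's worth, the same bits can be moved without ever dereferencing a double*. The sketch below uses memcpy and makes its platform assumptions explicit; the function names are hypothetical, and it assumes a 64-bit IEEE 754 double. Whether a given bit pattern is a signalling NaN remains the platform's business.

```cpp
#include <cstdint>
#include <cstring>
#include <limits>

// Refuse to compile where the assumptions of this sketch do not hold.
static_assert(std::numeric_limits<double>::is_iec559,
              "this sketch assumes IEEE 754 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit double");

// Reinterpret a 64-bit pattern as a double by copying its bytes.
double double_from_bits(std::uint64_t bits)
{
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// The inverse: obtain the bit pattern of a double.
std::uint64_t bits_from_double(double value)
{
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

The behaviour that the standard leaves open with the pointer cast is here reduced to two compile-time checks.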

Which is a somewhat different issue than the aliasing issue:
there is a definite intention to allow the anti-aliasing rules
to be used for optimization. IMHO, the intent is clear: if the
compiler cannot see aliasing otherwise, it should be free to
assume that pointers to different types (modulo explicit
exceptions like pointers to character types, or the initial
identical sequences of structs) do not alias.
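
The character-type exception, at least, is uncontroversial. A small sketch (the function name count_nonzero_bytes is made up for the example):

```cpp
#include <cstddef>

// Count the non-zero bytes in the object representation of an int.
// Reading any object through an unsigned char lvalue is one of the
// explicit exceptions to the aliasing rule.
std::size_t count_nonzero_bytes(const int& value)
{
    const unsigned char* bytes =
        reinterpret_cast<const unsigned char*>(&value);
    std::size_t count = 0;
    for (std::size_t i = 0; i < sizeof value; ++i) {
        if (bytes[i] != 0) {
            ++count;
        }
    }
    return count;
}
```

Here the reinterpret_cast is visible and the access is allowed by the letter of the standard, so no QoI argument is needed.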

I'll admit that I'm reading a lot into the standard: part of
that is based on discussions I followed during the normalization
of C, and part is based on what I consider common sense: if it
is clear that there is aliasing, the compiler shouldn't assume
that there isn't, simply because of some arbitrary and perhaps
misinterpreted rule. Still, it seems like a reasonable set of
expectations to me.

--
James Kanze

James Kanze

Mar 13, 2010, 6:08:19 PM
On Mar 12, 12:19 pm, Joshua Maurice <joshuamaur...@gmail.com> wrote:
> On Mar 12, 1:56 am, James Kanze <james.ka...@gmail.com> wrote:

[...]


> > > So, when the union is in scope, gcc "does the right thing",
> > > and when reinterpret_cast is in scope, gcc does not "do the
> > > right thing".

> > Which is, from a QoI point of view, an error.

> It's an error if you believe that those are the desired semantics.

Error may not be the best word in this case, since it implies
something unintentional. My argument is, precisely, that from a
QoI point of view, the desired semantics are the only ones which
make sense.

> > In fact, I don't know whether this is an error in the coding,
> > a problem in the way that the optimizer works which would make
> > it difficult (and perhaps not worth it) to fix, or simply a bit
> > of stubbornness on the part of g++ developers. It does mean that
> > reinterpret_cast is useless, which is certainly not the intent
> > of the committee.

> I would argue that it's the gcc team's stubbornness to follow the
> standard as written.

I suspect that you're right. Which is, IMHO, an error from a
QoI point of view: the standard gives implementations a lot of
leeway, but from a QoI point of view, some common sense is to be
expected.

> I cannot speak to the intent of the committee, nor can most
> users of C++. However, we can speak to what the standard
> clearly says. That said, it is somewhat silly to provide type
> punning when the union is in scope but not allow type punning
> when a cast is in scope.

Exactly. The standard makes both undefined behavior. In the
case of unions, this is intentional (if I recall and interpret
correctly discussions I followed during the standardization of
C); the goal is to allow hidden discriminators. Again, if I
recall and interpret such discussions correctly, the motivation
for undefined behavior in the case of reinterpret_cast (or a
pointer cast in C) is that the "reasonable" behaviors in the
case of dereferencing such a cast are in fact impossible to
specify in a portable way, but that the committee expected the
implementations to do what is reasonable for whatever the
platform was.

Keeping in mind in all of this that the committee also wanted to
allow the compiler to deduce as much as possible with regards to
anti-aliasing. We have a definite conflict in the goals here,
and the standard is, regretfully, not really as clear as we'd
like with regards to how to resolve this conflict.

> I think it makes a little more sense if we say that they're
> simply following current practice, and this is how most other
> compilers do it. (Again, confirmation or evidence to the
> contrary anyone?)

As I mentioned, Microsoft C 1.0 ignored any aliasing due to
unions, but did respect that resulting from casts.

That g++ does provide additional guarantees for unions is, IMHO,
a positive point, since in fact, before the C standard, that was
the traditional solution, and it is still widespread.

> > The issue isn't simple. Historically (pre-ISO C), the union was
> > the preferred solution, at least from what I understood. For
> > whatever reasons, the ISO C committee (or at least the parts of
> > it which formalized the wording in this regard) decided to
> > make type punning via a union undefined behavior, which means
> > that only casting remains. The C++ committee simply followed
> > the C committee in this respect---I'm 100% sure that the intent
> > of the C++ committee is that a reinterpret_cast, when legal,
> > behave exactly the same as a cast in C.

> Repeating for emphasis:

> > I'm 100% sure that the intent of the C++ committee is that a
> > reinterpret_cast, when legal, behave exactly the same as a
> > cast in C.

> I agree with that. I cannot speak to the intent of the
> committee(s) as you can, but I can speak to what they wrote,
> and the standard is quite clear that the C-style cast does not
> get around the strict aliasing rule, and thus reinterpret_cast
> does not get around the strict aliasing rule.

See above. There are two motivations for undefined behavior.

> > In addition, there is a note (non-normative) in the C++ standard
> > (§5.2.10/3) concerning the mapping done by reinterpret_cast: "it
> > is intended to be unsurprising to those who know the addressing
> > structure of the underlying machine". Although this note is
> > directly attached to the pointer to integer conversion, in the
> > absence of any other indications, it seems reasonable to me to
> > apply it to the other uses of reinterpret_cast as well.

> There is no such absence in the C++ standard. It is very clear that
> accessing an object through an lvalue of a sufficiently different type
> is undefined behavior (except for the char and unsigned exception, and
> the common leading part of POD exception).

It is also clear that dereferencing the result of an arbitrary
int, converted to a pointer, is undefined behavior. I think it
reasonable to apply the same text to both: "it is intended that
the results be unsurprising to those who know the addressing
structure of the underlying machine."

Note that while these words are only used for the conversions
between integral types and pointers in the standard, it is easy
to extend them (for most implementations, anyway) by using an
intermediate cast:
    double d;
    short* ps = reinterpret_cast<short*>(
                    reinterpret_cast<long long>(&d));
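
The part of that path which the standard does pin down is the round trip back to the original pointer type. A sketch, assuming the implementation provides std::uintptr_t (it is an optional type), with a function name of my own choosing:

```cpp
#include <cstdint>

// Convert a pointer to an integer and back to the same pointer type.
// The standard guarantees the round trip yields the original pointer
// when the integer type is large enough to hold it.
bool round_trips(double* p)
{
    std::uintptr_t as_integer = reinterpret_cast<std::uintptr_t>(p);
    double* back = reinterpret_cast<double*>(as_integer);
    return back == p;
}
```

Converting the integer to a different pointer type, as in the short* case, is where "unsurprising to those who know the addressing structure" has to carry the weight.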

This is with regards to the cast, and its immediate use. I do
think that the intent was that in a function:


void f(int* pi, double* pd)

, the compiler should be allowed to assume no aliasing between
pi and pd (even though it is possible to construct cases where
the standard doesn't allow this).

> The section you cite, including the non-normative note, is a very
> narrow exception which states that a reinterpret_cast on a
> pointer will produce an rvalue whose value should not be
> surprising to those who know the addressing structure of the
> underlying machine.

Exactly. Any use of reinterpret_cast must be considered
unportable (except to and from character type pointers).

> This in no way is an exception to the strict aliasing rule.
> Instead, in this context reinterpret_cast takes one value of a
> certain type and casts that value to another type. It does not
> tell the compiler that two different pointers alias or in any
> way affect the strict aliasing rule.

No. And any use of the resulting pointers is undefined behavior
according to the standard. There's no disagreement there. The
question is what the motivation for this undefined behavior is,
why it is there, and what we should expect from a QoI point of
view.

> > In any case, the current standard is very unclear with
> > regards to type punning---with the exception of character
> > types. And I don't think that this has changed in the more
> > recent drafts---in a very real sense, I think that it is
> > more a C problem; that the C++ committee should simply wait,
> > and adopt whatever the C committee finally decides.

> > > Now, I don't know any other compilers offhand which
> > > optimize with the strict aliasing allowance besides newer
> > > gcc versions,

> > I don't know of any that don't. It's a common optimization.
> > It was present in Microsoft C 1.0, for example (in which
> > your union example would break).

> Really? I was under the impression that basically no Microsoft
> compiler actually optimized with the strict aliasing
> allowance,

Current compilers certainly don't do the same optimizations that
Microsoft C 1.0 did. In some cases, they do a lot more, but in
a few special cases, they do less. (There is no common code
from 1.0 in the current compilers.) However...

> that too much windows code would break if it did by default.
> Very simple testing like that above seems to show that the
> Microsoft compiler does not.

I've not done too much testing with regards to how the Microsoft
compiler does aliasing analysis, but the fact that it does
optimize better than g++ (for Windows platforms) leads me to
think that it does use the anti-aliasing rule. Or that the
anti-aliasing rule doesn't really buy much in practice, and
should perhaps be dropped. (I've not tried special test cases,
but from what little I've looked at, I suspect that most of the
aliasing which causes problems for optimization involves
pointers of the same type, so ignoring the anti-aliasing rule
actually has no impact with regards to optimizing. In which
case, from a QoI point of view, the compiler should ignore it,
and suppose that all pointers may alias.)
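
A sketch of the same-type case I have in mind; the function name add_twice is invented for the example. The anti-aliasing rule is of no help here, because both pointers refer to int.

```cpp
// dst and src have the same pointee type, so the compiler must assume
// they may refer to the same int.
void add_twice(int* dst, const int* src)
{
    *dst += *src;
    *dst += *src;   // *src must be reloaded: the first store may have changed it
}
```

With distinct objects this adds twice the value of *src to *dst; with dst == src it quadruples the value, so the compiler cannot simply load *src once and reuse it.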

> > > but I would suggest you revise your understanding of the
> > > QoI implications of reinterpret_cast.

> > Why? My understanding of the QoI implications are based on the
> > actual words in the standard, and the various discussions that
> > I've followed in the standardization committees.

> > > My previous argument succinctly: C-style casts were never
> > > intended to get around strict aliasing.

> > Yes and no. An optimizer is expected to use the knowledge at
> > its disposal. If it can see that there is aliasing, whether
> > from a union or a cast, it should take it into account.

> > That is, by the way, the direction the C committee was going the
> > last time I looked.

> > > reinterpret_cast was never intended to be more powerful than
> > > C-style casts.

> > Certainly not. But since the C style cast should behave as
> > expected, when visible, so should the reinterpret_cast.

> I cannot speak to your private discussions with the
> committees. It's just that's not what's in the current
> standards.

Just for the record, it's not private discussions, in the sense
of just me and the committee. Any member of the committee can
view them.

As for the current standards, the "anti-aliasing" rule doesn't
allow any optimizations, because of the following case:

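#include <iostream>

// Reads *pi, then stores through pd; an optimizer that assumes
// int* and double* never alias may reorder the two.
int f(int* pi, double* pd)
{
    int retval = *pi;
    *pd = 3.14159;
    return retval;
}

int main()
{
    union { int i; double d; } u;
    u.i = 42;
    // pi and pd both point into the same union object.
    std::cout << f(&u.i, &u.d) << std::endl;
    return 0;
}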

According to the strict words of the current standard, this is
guaranteed to output 42. In practice, if the compiler applies
the anti-aliasing rule in f, it may assign *pd before actually
reading *pi, which will result in a wrong output.

The current wording of the standard guarantees this. IMHO, this
is not the intent, and the discussions in the C committee lead
me to believe that there will be a clarification in this
respect. In the meantime, however, we are left speculating with
regards to the intent of the standard.

But regardless of the words in the standard, common sense says
that if I explicitly say that there is aliasing (and that is
what reinterpret_cast says), then the compiler shouldn't assume
that there isn't.

> > > (The named casts were each intended to fulfill a specific role
> > > of the several roles of C-style casts to remove potential
> > > ambiguity to the code writer and readers.) Thus
> > > reinterpret_cast was never intended to get around strict
> > > aliasing.

> > You seem to be misunderstanding the motivation behind the strict
> > aliasing rule. It is to allow the compiler to assume no
> > aliasing in cases where it otherwise couldn't. There was never
> > any intent to allow the compiler to totally ignore aliasing that
> > it can clearly see.

> Perhaps I was too strong. However, I don't think it's right to
> be so dismissive of that position. It is a reasonable one.
> Many times I hear "The compiler should just be smart enough",
> but many times this is not the case, for various reasons, such
> as too hard to implement, or the semantics would be too vague
> or not well defined, or it would be bad style and confusing to
> the coders. I think all kinds of "type punning but only in
> certain scopes [unions and casts]" qualify.

The issue is, obviously, not a simple one for compiler
implementers. There is a definite motivation to generate the
fastest code possible, and a number of "undefined behavior" in
the standard are present precisely to allow the compiler
implementer the most freedom possible to do so. My argument is
simply that reinterpret_cast is, or should be, a red flag: the
programmer is effectively telling the compiler that he knows
something that the compiler doesn't. And that the compiler
should respond in consequence, and not ignore what the
programmer is telling it. (And IMHO, reinterpret_cast should be
rare enough that even turning off all optimization in a function
that uses it shouldn't matter.)

--
James Kanze
