Is type-punning with unions legal?


Andrzej Krzemieński

Jul 14, 2015, 5:19:59 AM
to std-dis...@isocpp.org
Hi,
I am unable to figure out what the Standard has to say about type-punning with unions.

I have the following union:
union String
{
  char as_array[sizeof(int)];
  int as_int;
};

I intend to initialize member as_array, but later access member as_int. The goal is to perform a sort of reinterpret cast by accessing memory through a different member. The question is: what does the Standard have to say about it?

Is this undefined behavior? If so, can you point me to the relevant sections?

Or is this part of the standard underspecified? In that case, does someone know what the intention is?

Regards,
&rzej

Fabio Fracassi

Jul 14, 2015, 5:42:21 AM
to std-dis...@isocpp.org
If I remember the last discussion (on the undefined behavior list) about this correctly, it is intentionally undefined behavior.
I also interpret §[class.union]/1 as forbidding it, although it is not very explicit about it. The first sentence reads:
"In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time."
Which I interpret as forbidding accessing a union's member that is not the currently active one. The following note supports this reading as it explicitly defines an exception to this rule:
" [ Note: One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence (9.2), and if an object of this standard-layout union type contains one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of standard-layout struct members; see 9.2. — end note ]"

so it is legal to do
struct A { int u; int v; };
struct B { int x; char y; };
union U {
    A a;
    B b;
};

U legal;
legal.a = A{1, 1};
read(legal.b.x);

Both read(legal.b.y) and your example would be illegal.
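
A minimal, compilable sketch of the common-initial-sequence guarantee described in the note, for anyone who wants to try it. The names U, read_common_prefix and common_prefix_demo are hypothetical and only serve the illustration; the structs mirror the A/B pair above.

```cpp
#include <cassert>

// Two standard-layout structs whose first members (both int) form a
// common initial sequence. The second members (int vs. char) do not.
struct A { int u; int v; };
struct B { int x; char y; };

// Hypothetical union name, chosen for this sketch only.
union U {
    A a;
    B b;
};

// Stores through member a, then inspects the first member of b.
// Only b.x is covered by the guarantee; reading b.y here would not be.
inline int read_common_prefix(int value) {
    U u;
    u.a = A{value, 0};   // a becomes the active member
    return u.b.x;        // inspects the common initial sequence
}

// Usage sketch.
inline void common_prefix_demo() {
    assert(read_common_prefix(1) == 1);
}
```

The guarantee stops at the first pair of members whose types differ, which is why only the leading int can be inspected through the inactive member.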

IIRC the only way to do legal type punning is using memcpy.

Best
Fabio

Johannes Schaub

Jul 14, 2015, 5:46:12 AM
to std-dis...@isocpp.org


On 14.07.2015 at 11:42, "Fabio Fracassi" <f.fra...@gmx.net> wrote:
>
>
>
> On 14.07.2015 11:19, Andrzej Krzemieński wrote:
>>
>> Hi,
>> I am unable to figure out what the Standard has to say about type-punning with unions.
>>
>> I have the following union:
>> union String
>> {
>>   char as_array[sizeof(int)];
>>   int as_int;
>> };
>>
>> I intend to initialize member as_array, but later access member as_int. The goal is to perform a sort of reinterpret cast by accessing memory through a different member. The question is: what does the Standard have to say about it?
>
>
>> Is this undefined behavior? If so, can you point me to the relevant sections?
>>
>> Or is this part of the standard underspecified? In that case, does someone know what the intention is?
>>
>
> If I remember the last discussion (on the undefined behavior list) about this correctly, it is intentionally undefined behavior.
> I also interpret §[class.union]/1 as forbidding it, although it is not very explicit about it. The first sentence reads:
> "In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time."
> Which I interpret as forbidding accessing a union's member that is not the currently active one.

If so, this would not need to be banned in constant expressions directly; it would already be excluded there by being undefined behavior (and reading a common initial sequence is intended to be allowed there as well).

Richard?

David Krauss

Jul 14, 2015, 5:47:39 AM
to std-dis...@isocpp.org

On 2015–07–14, at 5:42 PM, Fabio Fracassi <f.fra...@gmx.net> wrote:

IIRC the only way to do legal type punning is using memcpy.

You can pun to character types with reinterpret_cast. Other types require something like memcpy, which works because it treats each of the two objects as a character sequence. Anything that behaves that way is sufficient, though, including std::copy<char const*, char*>.
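
A minimal sketch of the two directions mentioned here, assuming the original poster's char-array-to-int case. The function names chars_to_int and byte_of are hypothetical; only std::memcpy and reinterpret_cast to a character type come from the discussion.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Puns chars to int by copying bytes into a real int object.
// Precondition (assumed): chars points to at least sizeof(int) chars.
inline int chars_to_int(const char* chars) {
    int result;
    std::memcpy(&result, chars, sizeof(int));
    return result;
}

// Puns in the other direction: inspects one byte of an int in place.
// Accessing an object through unsigned char is allowed by the aliasing rule.
inline unsigned char byte_of(const int& value, std::size_t index) {
    assert(index < sizeof(int));
    return reinterpret_cast<const unsigned char*>(&value)[index];
}
```

For example, after `int i = chars_to_int(buf);`, each `byte_of(i, k)` equals `static_cast<unsigned char>(buf[k])`. The numeric value of i depends on the platform's byte order, so the sketch makes no claim about it.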

Johannes Schaub

Jul 14, 2015, 5:50:45 AM
to std-dis...@isocpp.org

For this to work, aliasing rules need to be transitive or am I missing something?

David Krauss

Jul 14, 2015, 5:57:17 AM
to std-dis...@isocpp.org

On 2015–07–14, at 5:50 PM, 'Johannes Schaub' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:

> You can pun to character types with reinterpret_cast. Other types require something like memcpy, which works because it treats each of the two objects as a character sequence. Anything that behaves that way is sufficient, though, including std::copy<char const*, char*>.
>

For this to work, aliasing rules need to be transitive or am I missing something?


Nothing is transitive. The bytes of object A are copied into storage buffer B. Now B can be treated as another object of some other POD type, provided that the object representation is valid. Upon the first access, though, that permanently becomes the type of B.

All I’m saying is, std::memcpy isn’t specially blessed. The standard only specifies what you can do with object representations. It doesn’t prescribe specific library functions.
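
To illustrate that std::memcpy is not special, here is a sketch that does the same byte copy with std::copy over char pointers. The helper name copy_bytes is hypothetical, and the destination is an already existing object of the target type, which sidesteps the question of when a raw buffer becomes an object.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>

// Copies the object representation of `from` into a fresh To object,
// treating both objects as sequences of char.
template <typename To, typename From>
To copy_bytes(const From& from) {
    static_assert(sizeof(To) == sizeof(From), "object sizes must match");
    static_assert(std::is_trivially_copyable<To>::value &&
                  std::is_trivially_copyable<From>::value,
                  "both types must be trivially copyable");
    To to;
    const char* first = reinterpret_cast<const char*>(&from);
    std::copy(first, first + sizeof(From), reinterpret_cast<char*>(&to));
    return to;
}

// Usage sketch: a non-negative int has the same representation as the
// corresponding unsigned value.
inline void copy_bytes_demo() {
    assert(copy_bytes<unsigned>(42) == 42u);
}
```

Whether the resulting value is meaningful still depends on the copied bytes being a valid object representation for the target type.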

Johannes Schaub

Jul 14, 2015, 6:05:50 AM
to std-dis...@isocpp.org


On 14.07.2015 at 11:57, "David Krauss" <pot...@gmail.com> wrote:
>
>
>> On 2015–07–14, at 5:50 PM, 'Johannes Schaub' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:
>>
>> > You can pun to character types with reinterpret_cast. Other types require something like memcpy, which works because it treats each of the two objects as a character sequence. Anything that behaves that way is sufficient, though, including std::copy<char const*, char*>.
>> >
>>
>> For this to work, aliasing rules need to be transitive or am I missing something?
>
>
> Nothing is transitive. The bytes of object A are copied into storage buffer B. Now B can be treated as another object of some other POD type, provided that the object representation is valid. Upon the first access, though, that permanently becomes the type of B.
>

I guess this only works because any simple object has sizeof(T) char objects in its object representation.

Otherwise the copy relies entirely on the aliasing rule. And since you ask for copying char objects, the copy function can decide to read the object T using a type compatible with char.

For that not to be UB, you would need a transitive aliasing blessing from the compatible type to T, linked via char.

David Krauss

Jul 14, 2015, 6:09:18 AM
to std-dis...@isocpp.org

On 2015–07–14, at 5:57 PM, David Krauss <pot...@gmail.com> wrote:

Nothing is transitive. The bytes of object A are copied into storage buffer B.

Just to be clear, the union member of type char[N] is not suitable as such a storage buffer. A sanitizer would be free to complain about the non-active member being used. I don’t suppose a real-world C++ compiler would break with common C programming practice, though, regardless of undefined behavior.

Really, what value is added by the union as opposed to a reinterpret_cast? At the moment I can’t recall the C rules for union punning, but as for C compatibility, you could define an inline function to perform the cast.


On 2015–07–14, at 6:05 PM, 'Johannes Schaub' via ISO C++ Standard - Discussion <std-dis...@isocpp.org> wrote:

For that not to be UB, you would need a transitive aliasing blessing from the compatible type to T, linked via char.

Yes, the character types char and unsigned char are specified by the aliasing rule to be valid ways of accessing any object representation. See §3.10/10.8.

Andrzej Krzemieński

Jul 14, 2015, 8:00:00 AM
to std-dis...@isocpp.org, pot...@mac.com

Does this provision work only one way (from type T to char[])? Or is it also possible to reinterpret_cast from char[]?

It looks like a "safe" use of reinterpret_cast is to temporarily cast value v (of type T) to some other type U, and later back to T. But is there any legal way in the Standard to observe a (properly aligned) array of characters as int? (I ask because comparing integers for equality appears to be faster than performing memcmp on the same region.)

For instance, is it legal to use the same union in combination with reinterpret_cast:

union String
{
  char as_array[sizeof(int)];
  int as_int;
};

String s;
memcpy(&s.as_array, "12345678", sizeof(int));
int & i = reinterpret_cast<int&>(s.as_array);
read(i);

?

David Krauss

Jul 14, 2015, 10:59:22 PM
to Andrzej Krzemieński, std-dis...@isocpp.org
On 2015–07–14, at 8:00 PM, Andrzej Krzemieński <akrz...@gmail.com> wrote:

On Tuesday, 14 July 2015 at 11:47:39 UTC+2, David Krauss wrote:

You can pun to character types with reinterpret_cast. Other types require something like memcpy, which works because it treats each of the two objects as a character sequence. Anything that behaves that way is sufficient, though, including std::copy<char const*, char*>.

Does this provision work only one way (from type T to char[])? Or is it also possible to reinterpret_cast from char[]?

Yes, sort-of. This is an ill-specified corner of the standard and a focus of the undefined behavior study group. For example, std::aligned_storage::type only works if its object representation is not committed to any type. To be sure, given the choice, it’s clearer to call operator new(N) instead of e.g. new char[N]. And all bets are off once the machine evaluates an lvalue-to-rvalue conversion or assignment expression with non-char type, or a class member access expression.

It looks like a "safe" use of reinterpret_cast is to temporarily cast value v (of type T) to some other type U, and later back to T. But is there any legal way in the Standard to observe a (properly aligned) array of characters as int? (I ask because comparing integers for equality appears to be faster than performing memcmp on the same region.)

For instance, is it legal to use the same union in combination with reinterpret_cast:

union String
{
  char as_array[sizeof(int)];
  int as_int;
};

String s;
memcpy(&s.as_array, "12345678", sizeof(int));
int & i = reinterpret_cast<int&>(s.as_array);
read(i);

This looks OK. Although as_int is never accessed, it guarantees that the union is correctly aligned for int. It would be easier to use std::aligned_storage, though.
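
A rough sketch of the raw-storage alternative, under stated assumptions. It uses an alignas(int) byte buffer in place of std::aligned_storage (which C++23 deprecates), and it adds a placement new so that an int object definitely exists in the buffer before it is read. Both choices, and the names load_int and same_prefix, are assumptions of this sketch and not something the thread prescribes.

```cpp
#include <cassert>
#include <cstring>
#include <new>

// Loads sizeof(int) chars into an int that lives in raw, int-aligned storage.
// Precondition (assumed): chars points to at least sizeof(int) chars.
inline int load_int(const char* chars) {
    assert(chars != nullptr);
    alignas(int) unsigned char storage[sizeof(int)];
    // Begin the lifetime of an int inside the buffer.
    int* p = ::new (static_cast<void*>(storage)) int;
    std::memcpy(p, chars, sizeof(int));  // fill its object representation
    return *p;
}

// Compares the first sizeof(int) chars of two buffers as a single int.
inline bool same_prefix(const char* a, const char* b) {
    return load_int(a) == load_int(b);
}
```

Calling `same_prefix("12345678", "12345678")` compares the leading sizeof(int) characters with one integer comparison, which is the use case raised in the question.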
