standard attribute for unions to pun types?


Evan Teran

Mar 4, 2016, 2:58:21 PM
to ISO C++ Standard - Future Proposals
So it's not uncommon to see people write code like this:

union T {
    uint8_t bytes[sizeof(uint32_t)];
    uint32_t u32;
};

T x;
x.u32 = 0x11223344;
x.bytes[0] = 0xff; // undefined behavior :-(


Of course, this is met by those who are thrilled that it works, and by just as many others who are determined to avoid undefined behavior at all costs.
Now that C++ has a proper attribute syntax, has anyone considered an attribute that says "I want to pun this type; compiler, please don't get too clever here"?

Something as simple as this:

[[type-pun]] union T {
    uint8_t bytes[sizeof(uint32_t)];
    uint32_t u32;
};

T x;
x.u32 = 0x11223344;
x.bytes[0] = 0xff; // compiler won't get too clever with optimizations, will do the obvious thing
                   // and source code analyzers will know that the type punning is deliberate.

The meaning being that the compiler can expect the user to write to any field, and read from any other field and it should do the obvious thing and return the underlying bit-pattern interpreted as the appropriate type. Something which many compilers do anyway.

Of course, by default we'd get the current rules about only having one active member at a time and all that goodness, allowing optimizers to assume that you won't do that and do clever things. And I recognize that there would still be concerns about endianness and such, but at least we've moved the issue from undefined behavior into "system specific" behavior, which low-level developers tend to be comfortable with.

Thoughts?

Thiago Macieira

Mar 4, 2016, 7:12:00 PM
to std-pr...@isocpp.org
On Friday, 4 March 2016 11:58:21 PST Evan Teran wrote:
> Now that C++ has a proper attribute syntax. Has anyone considered having an
> attribute which says "I want to pun this type, compiler please don't get
> too clever here".
>
> Something as simple as this:
>
> [[type-pun]] union T {
> uint8_t bytes[sizeof(uint32_t)];
> uint32_t u32;
> };

Considering that this kind of type punning is an extension to the standard, we
don't need a standardised attribute name either. C11 has wording about type
punning (footnote 95 in section 6.5.2.3), but no such wording exists in C++.
The standard only allows accessing a member that is not active if it's
type-compatible with the active member or is part of a common initial
sequence with the active member.
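
The common-initial-sequence allowance mentioned above can be illustrated with a small sketch. The type and function names here (`IntMsg`, `FloatMsg`, `Msg`, `tag_of`) are hypothetical and only serve the illustration; the point is that both structs are standard-layout and begin with a member of the same type, so inspecting `tag` through either member is permitted.

```cpp
#include <cstdint>

// Two standard-layout structs that share a common initial sequence:
// each starts with a std::uint32_t member.
struct IntMsg {
    std::uint32_t tag;
    int value;
};

struct FloatMsg {
    std::uint32_t tag;
    float value;
};

union Msg {
    IntMsg i;
    FloatMsg f;
};

// Reads the tag through the FloatMsg member. This is permitted even when
// the active member is `i`, because `tag` lies in the common initial
// sequence of the two structs.
std::uint32_t tag_of(const Msg& m) {
    return m.f.tag;
}
```

Reading `m.f.value` while `i` is active would not be covered by this rule; only the shared leading members are.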

I think you can use [[gnu::may_alias]].

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Nicol Bolas

Mar 4, 2016, 8:21:15 PM
to ISO C++ Standard - Future Proposals
On Friday, March 4, 2016 at 2:58:21 PM UTC-5, Evan Teran wrote:
> So it's not uncommon to see people write code like this:
>
> union T {
>     uint8_t bytes[sizeof(uint32_t)];
>     uint32_t u32;
> };
>
> T x;
> x.u32 = 0x11223344;
> x.bytes[0] = 0xff; // undefined behavior :-(
>
> Of course this is met with those who are thrilled that it works, and just as many others who are determined to avoid undefined behavior and would like to avoid it at all costs.
> Now that C++ has a proper attribute syntax. Has anyone considered having an attribute which says "I want to pun this type, compiler please don't get too clever here".

Well, the thing is, type punning is not forbidden to allow the compiler to be "clever". Or rather, not just for that. For unions, there's really not much "clever" that compilers can do with them.

The C++ specification explains behavior. It specifies what the behavior of every expression ought to be. Now, consider your example. Even if we ignore compiler "cleverness"... what should the specification say the result of this operation will be?

What will `x.u32` be after the two statements are executed? It's either undefined, implementation-defined, or well-defined. You want to rule out "undefined". So... what value does `x.u32` have? You say that it returns "the underlying bit-pattern interpreted as the appropriate type", but what does that mean?

If I perform `x.u32 = 20;`, that is well-defined C++ code, and I know that following this statement, I can do `x.u32` and get back exactly 20. If I do `x.bytes[0] = 0x20;`, what do I get back with `x.u32`? What value is it?

> Of course by default we'd get the current rules about only having one active member at a time and all that goodness, allowing optimizers to assume that you won't do that and do clever things. And I recognize that there would still be concerns about endianness and such, but at least we've moved the issue from undefined behavior into "system specific" behavior, which low level developers tend to be comfortable with.

In my experience, "low level developers" don't care that the above is undefined behavior; they'll do it anyway because they have no choice.

Bo Persson

Mar 5, 2016, 5:55:11 AM
to std-pr...@isocpp.org
On 2016-03-04 20:58, Evan Teran wrote:
> So it's not uncommon to see people write code like this:
>
> union T {
>     uint8_t bytes[sizeof(uint32_t)];
>     uint32_t u32;
> };
>
> T x;
> x.u32 = 0x11223344;
> x.bytes[0] = 0xff; // undefined behavior :-(
>
> Of course this is met with those who are thrilled that it works, and just
> as many others who are determined to avoid undefined behavior and would
> like to avoid it at all costs.
> Now that C++ has a proper attribute syntax. Has anyone considered having
> an attribute which says "I want to pun this type, compiler please don't
> get too clever here".
>
> Something as simple as this:
>
> [[type-pun]] union T {
>     uint8_t bytes[sizeof(uint32_t)];
>     uint32_t u32;
> };
>
> T x;
> x.u32 = 0x11223344;
> x.bytes[0] = 0xff; // compiler won't get too clever with optimizations,
>                    // will do the obvious thing
>                    // and source code analyzers will know that the type punning is deliberate.
>
> The meaning being that the compiler can expect the user to write to any
> field, and read from any other field and it should do the obvious thing
> and return the underlying bit-pattern interpreted as the appropriate
> type. Something which many compilers do anyway.
>

The problem with putting this into the standard is that on some systems
certain bit patterns cannot be interpreted as the appropriate type. The
invalid value might be trapped by the hardware.

Then what?


Bo Persson



Edward Catmur

Mar 5, 2016, 6:06:15 AM
to ISO C++ Standard - Future Proposals
On Saturday, 5 March 2016 09:21:15 UTC+8, Nicol Bolas wrote:
> Well, the thing is, type punning is not forbidden to allow the compiler to be "clever". Or rather, not just for that. For unions, there's really not much "clever" that compilers can do with them.

Mostly, the cleverness is aliasing analysis rather than anything pertaining to unions per se. I'm a little confused by the given example, as character types aliasing other types is explicitly permitted by the standard and given something close to the desired meaning already.

Also, is this attribute intended to have local or global effect? If local, then one can convert bitwise between types already using memcpy. If global, the problem is that in C, which is supposed to have those semantics, no compiler has ever implemented them and there is serious movement toward removing them from the next version of the C standard. I suppose this is because it's difficult to determine whether two function parameters could alias via a union and assuming they could has severe effects on performance.

> What will `x.u32` be after the two statements are executed? It's either undefined, implementation-defined, or well-defined. You want to rule out "undefined". So... what value does `x.u32` have? You say that it returns "the underlying bit-pattern interpreted as the appropriate type", but what does that mean?

That's easy enough; it has the value which if written to the object would exhibit the byte sequence present, if such a value exists. That's how memcpy works already in practice.

> In my experience, "low level developers" don't care that the above is undefined behavior; they'll do it anyway because they have no choice.

For bitwise conversion, they can use memcpy. For global aliasing... well, are they sure that's what they really want?
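
A minimal sketch of the memcpy approach referred to here, converting between `float` and `std::uint32_t`. The helper names `to_bits` and `from_bits` are hypothetical, and the sketch assumes the two types have the same size. How much the standard guarantees about the resulting values is exactly what the rest of this thread argues about.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "this sketch assumes a 32-bit float");

// Copies the object representation of a float into an integer.
std::uint32_t to_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// Copies an integer's bytes back into a float.
float from_bits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}
```

Because the copy is local to each call, it has no effect on alias analysis elsewhere in the program, which is the "local" case distinguished above.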

Andrey Semashev

Mar 5, 2016, 6:24:31 AM
to std-pr...@isocpp.org
I think the standard could provide the same degree of guaranteed
behavior as it does for type punning through memcpy.

Andrey Semashev

Mar 5, 2016, 6:45:39 AM
to std-pr...@isocpp.org
On 2016-03-05 14:06, Edward Catmur wrote:
> On Saturday, 5 March 2016 09:21:15 UTC+8, Nicol Bolas wrote:
>> Well, the thing is, type punning is not forbidden to allow the
>> compiler to be "clever". Or rather, not just for that. For unions,
>> there's really not much "clever" that compilers can do with them.
>
> Mostly, the cleverness is aliasing analysis rather than anything
> pertaining to unions per se. I'm a little confused by the given
> example, as character types aliasing other types is explicitly
> permitted by the standard and given something close to the desired
> meaning already.
>
> Also, is this attribute intended to have local or global effect? If
> local, then one can convert bitwise between types already using
> memcpy. If global, the problem is that in C, which is supposed to
> have those semantics, no compiler has ever implemented them and there
> is serious movement toward removing them from the next version of the
> C standard. I suppose this is because it's difficult to determine
> whether two function parameters could alias via a union and assuming
> they could has severe effects on performance.

There was a related discussion on the Boost developers mailing list and
I've come to a proposal to allow union-based type punning but only when
the compiler can immediately see that the involved reads and stores are
from/to union members.

union variant
{
std::uint32_t as_uint32;
float as_float;
};

variant v1, v2, v3;

// This is defined behavior (DB)
v1.as_float = 1.0f;
std::cout << v1.as_uint32 << std::endl;

// This is also DB
using pmu_t = std::uint32_t (variant::*);
using pmf_t = float (variant::*);
pmu_t pmu = &variant::as_uint32;
pmf_t pmf = &variant::as_float;
v2.*pmf = 1.0f;
std::cout << v2.*pmu << std::endl;

// This is still formally UB
std::uint32_t* pu = &v3.as_uint32;
float* pf = &v3.as_float;
*pf = 1.0f;
std::cout << *pu << std::endl;

It did not involve an attribute but I guess such behavior could be
enabled with one.

I think this would be a reasonable compromise between optimization
opportunity and the behavior that most people expect from such code.

Regarding the argument that memcpy can be used as a replacement: the
problem with memcpy is that it's a potential function call and it cannot
be used in a constant expression. The function call can be optimized away
by the compiler, but the standard gives no guarantee to that effect, and
the developer can only hope for the best. Also, as a function call, memcpy
can be overridden at run time (e.g. with LD_PRELOAD). The constexpr part
is even more difficult to tackle with memcpy, for the above reasons.
Union-based type punning has the potential to solve these problems.

Nicol Bolas

Mar 5, 2016, 9:48:58 AM
to ISO C++ Standard - Future Proposals

Type punning through memcpy is defined in C++ based on layout compatibility. And `uint8_t[4]` and `uint32_t` are not layout compatible. So under your rules, it would still be undefined behavior.

C++ allows you to copy memory into a byte array and copy memory from a byte array into a type (if it was copied from a layout compatible original type). But it doesn't allow you to modify that byte array and have well-defined behavior when you copy it into some type.

Edward Catmur

Mar 6, 2016, 8:39:23 AM
to std-pr...@isocpp.org

Who's to say that you haven't arranged that those bytes contain the object representation of some value of the destination type? Clearly this must be OK if the bytes arrive in the program from disk or network, so why not by modification of some other value's object representation?

Nicol Bolas

Mar 6, 2016, 9:33:48 AM
to ISO C++ Standard - Future Proposals

The Standard For Programming Language C++ says that. You can copy, and you can copy back, but you cannot modify the values of the intermediary byte array.
 

> Clearly this must be OK if the bytes arrive in the program from disk or network, so why not by modification of some other value's object representation?


Copying to/from a byte stream reproduces the same value representation. Modifying it, by definition, does not reproduce the value representation.

There is no reasonable way for C++ to define the behavior that results from such operations.

Bo Persson

Mar 6, 2016, 11:59:58 AM
to std-pr...@isocpp.org
On 2016-03-06 14:39, 'Edward Catmur' via ISO C++ Standard - Future
Proposals wrote:
>
> On 5 Mar 2016 22:49, "Nicol Bolas" <jmck...@gmail.com> wrote:
> >
> >
> > C++ allows you to copy memory into a byte array and copy memory from
> a byte array into a type (if it was copied from a layout compatible
> original type). But it doesn't allow you to modify that byte array and
> have well-defined behavior when you copy it into some type.
>
> Who's to say that you haven't arranged that those bytes contain the
> object representation of some value of the destination type? Clearly
> this must be OK if the bytes arrive in the program from disk or network,
> so why not by modification of some other value's object representation?
>

You might, but the standard doesn't say how to do that.

It is allowed for an implementation to have tag bits marking the type of
a memory value. If you write to memory using a different type, those tag
bits might end up invalid when copied back to the original object.



Bo Persson



Edward Catmur

Mar 6, 2016, 7:19:02 PM
to std-pr...@isocpp.org

Character types don't have tag bits, and other types might not either.

Consider this program:

int32_t a = 0x11223344;
char c[4];
memcpy(c, &a, 4);
int32_t b = 0x11223355;
char d[4];
memcpy(d, &b, 4);
if (d[0] == 0x55 &&
    !memcmp(c + 1, d + 1, 3)) {
    c[0] = 0x55;
    memcpy(&a, c, 4);
    assert(a == b);
}

Is this well defined? On a platform where the if block is entered?

I'm not saying that the result of bit-level hackery should be defined in all cases, just that it should be when the bit sequence would have been acceptable if it had arrived in the program by some other means.
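
For readers who want to try the experiment, here is the same program wrapped in a function so it compiles on its own. The function name `patch_low_byte` and the boolean return value are additions for this sketch; the body is otherwise the snippet above with `std::` qualification. It returns whether the `if` block was entered, which is expected only on platforms that store the low-order byte first.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Runs the byte-patching experiment from the message above.
// Returns true if the branch was entered on this platform.
bool patch_low_byte() {
    std::int32_t a = 0x11223344;
    char c[4];
    std::memcpy(c, &a, 4);
    std::int32_t b = 0x11223355;
    char d[4];
    std::memcpy(d, &b, 4);
    if (d[0] == 0x55 && !std::memcmp(c + 1, d + 1, 3)) {
        // The representations of a and b differ only in the first byte,
        // so patch that byte in c and copy the result back into a.
        c[0] = 0x55;
        std::memcpy(&a, c, 4);
        assert(a == b);
        return true;
    }
    return false;
}
```

Whether the `assert` is guaranteed to hold by the standard, rather than merely holding in practice, is the question being put to the thread.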

Nicol Bolas

Mar 6, 2016, 8:30:43 PM
to ISO C++ Standard - Future Proposals

It's undefined, because the standard does not specify the representation of the data you put into `d`. You are allowed to copy it in from an `int32_t` and copy it out to an `int32_t`. But interpreting the bits as anything other than an `int32_t` (or something that is layout compatible with it) is undefined behavior.

So it became undefined behavior the moment you did `d[0] == 0x55`.
 

> I'm not saying that the result of bit-level hackery should be defined in all cases, just that it should be when the bit sequence would have been acceptable if it had arrived in the program by some other means.


And when exactly would that be? If you got `d` via a read from a file, what you got was a byte stream. The data in that byte stream is defined by the stream reading function and the file data you read from. If you got it from a network, then the bytes are defined by the networking API.

In neither case is it well-defined behavior for you to memcpy `d` into an `int32_t` unless you have certain knowledge that it came from someone else memcpy'ing an `int32_t` into a byte stream and saving it to a file or sending it across a network. And the "someone else" must be from the same C++ implementation (compiler, platform, etc.) as yourself.

And if you do know that it was an `int32_t` that was memcpy'd across files/networks... it's still not well defined what it means to access any of that data as anything other than an `int32_t`.

Edward Catmur

Mar 8, 2016, 6:39:15 AM
to std-pr...@isocpp.org
On Mon, Mar 7, 2016 at 1:30 AM, Nicol Bolas <jmck...@gmail.com> wrote:
> On Sunday, March 6, 2016 at 7:19:02 PM UTC-5, Edward Catmur wrote:
>> Consider this program:
>>
>> int32_t a = 0x11223344;
>> char c[4];
>> memcpy(c, &a, 4);
>> int32_t b = 0x11223355;
>> char d[4];
>> memcpy(d, &b, 4);
>> if (d[0] == 0x55 &&
>>     !memcmp(c + 1, d + 1, 3)) {
>>     c[0] = 0x55;
>>     memcpy(&a, c, 4);
>>     assert(a == b);
>> }
>>
>> Is this well defined? On a platform where the if block is entered?
>
> It's undefined, because the standard does not specify the representation of the data you put into `d`. You are allowed to copy it in from an `int32_t` and copy it out to an `int32_t`. But interpreting the bits as anything other than an `int32_t` (or something that is layout compatible with it) is undefined behavior.
>
> So it became undefined behavior the moment you did `d[0] == 0x55`.

How is that undefined behavior? Which rule or rules in the standard does it violate?

>> I'm not saying that the result of bit-level hackery should be defined in all cases, just that it should be when the bit sequence would have been acceptable if it had arrived in the program by some other means.
>
> And when exactly would that be? If you got `d` via a read from a file, what you got was a byte stream. The data in that byte stream is defined by the stream reading function and the file data you read from. If you got it from a network, then the bytes are defined by the networking API.
>
> In neither case is it well-defined behavior for you to memcpy `d` into an `int32_t` unless you have certain knowledge that it came from someone else memcpy'ing an `int32_t` into a byte stream and saving it to a file or sending it across a network. And the "someone else" must be from the same C++ implementation (compiler, platform, etc.) as yourself.
>
> And if you do know that it was an `int32_t` that was memcpy'd across files/networks... it's still not well defined what it means to access any of that data as anything other than an `int32_t`.

OK, so we can memcpy the bytes into a byte stream and then back into the program. What if we use base64 as the serialization format for the byte stream, or hex dump? What if we print the hex dump to the console and ask the operator to type it back in at the console keyboard? Why not play 20 Questions with the operator to guess the value of each hex character?

If the bits in the array of narrow characters are correct, it does not matter how they arrived in the program. The implementation is not permitted to tag values of narrow character type with their provenance.

Andrey Semashev

Mar 8, 2016, 6:44:03 AM
to std-pr...@isocpp.org
On 2016-03-08 14:39, 'Edward Catmur' via ISO C++ Standard - Future
Proposals wrote:
> On Mon, Mar 7, 2016 at 1:30 AM, Nicol Bolas <jmck...@gmail.com> wrote:
>
> On Sunday, March 6, 2016 at 7:19:02 PM UTC-5, Edward Catmur wrote:
>
> Consider this program:
>
> int32_t a = 0x11223344;
> char c[4];
> memcpy(c, &a, 4);
> int32_t b = 0x11223355;
> char d[4];
> memcpy(d, &b, 4);
> if (d[0] == 0x55 &&
> !memcmp(c + 1, d + 1, 3)) {
> c[0] = 0x55;
> memcpy(&a, c, 4);
> assert(a == b);
> }
>
> Is this well defined? On a platform where the if block is entered?
>
>
> It's undefined, because the standard does not specify the
> representation of the data you put into `d`. You are allowed to copy
> it in from an `int32_t` and copy it out to an `int32_t`. But
> /interpreting the bits/ as anything other than an `int32_t` (or
> something that is layout compatible with it) is undefined behavior.
>
> So it became undefined behavior the moment you did `d[0] == 0x55`.
>
>
> How is that undefined behavior? Which rule or rules in the standard does
> it violate?
>
> I'm not saying that the result of bit-level hackery should be
> defined in all cases, just that it should be when the bit
> sequence would have been acceptable if it had arrived in the
> program by some other means.
>
>
> And when exactly would that be? If you got `d` via a read from a
> file, what you got was a byte stream. The data in that byte stream
> is defined by the stream reading function and the file data you read
> from. If you got it from a network, then the bytes are defined by
> the networking API.
>
> In neither case is well-defined behavior for you to memcpy `d` into
> an `int32_t` /unless/ you have certain knowledge that it came from
> someone else memcpy'ing an `int32_t` into a byte stream and saving
> it to a file or sending it across a network. And the "someone else"
> must be from the same C++ implementation (compiler, platform, etc)
> as yourself.
>
> And if you do know that it was an `int32_t` that was memcpy'd across
> files/networks... it's still not well defined what it means to
> access any of that data as anything other than an `int32_t`.
>
>
> OK, so we can memcpy the bytes into a byte stream and then back into the
> program. What if we use base64 as the serialization format for the byte
> stream, or hex dump? What if we print the hex dump to the console and
> ask the operator to type it back in at the console keyboard? Why not
> play 20 Questions with the operator to guess the value of each hex
> character?
>
> If the bits in the array of narrow characters are correct, it does not
> matter how they arrived in the program. The implementation is not
> permitted to tag values of narrow character type with their provenance.

For the sake of completeness, the implementation is also not allowed to
use tag or padding bits in (u)intXX_t.

Nicol Bolas

Mar 8, 2016, 9:57:32 AM
to ISO C++ Standard - Future Proposals
On Tuesday, March 8, 2016 at 6:39:15 AM UTC-5, Edward Catmur wrote:
> On Mon, Mar 7, 2016 at 1:30 AM, Nicol Bolas <jmck...@gmail.com> wrote:
>> On Sunday, March 6, 2016 at 7:19:02 PM UTC-5, Edward Catmur wrote:
>>> Consider this program:
>>>
>>> int32_t a = 0x11223344;
>>> char c[4];
>>> memcpy(c, &a, 4);
>>> int32_t b = 0x11223355;
>>> char d[4];
>>> memcpy(d, &b, 4);
>>> if (d[0] == 0x55 &&
>>>     !memcmp(c + 1, d + 1, 3)) {
>>>     c[0] = 0x55;
>>>     memcpy(&a, c, 4);
>>>     assert(a == b);
>>> }
>>>
>>> Is this well defined? On a platform where the if block is entered?
>>
>> It's undefined, because the standard does not specify the representation of the data you put into `d`. You are allowed to copy it in from an `int32_t` and copy it out to an `int32_t`. But interpreting the bits as anything other than an `int32_t` (or something that is layout compatible with it) is undefined behavior.
>>
>> So it became undefined behavior the moment you did `d[0] == 0x55`.
>
> How is that undefined behavior? Which rule or rules in the standard does it violate?

It's the part of the standard that doesn't say what that will actually do.

The standard specifies behavior. If you do this:

int x = 20;
if (x == 15)
  doSomething();

The standard requires that this not call `doSomething`. That's well-defined behavior.

If you instead do:

int x = 20;
unsigned char b[sizeof(int)];
memcpy(b, &x, sizeof(int));
if (b[0] == 15)
  doSomething();

What is the behavior of this program? The standard does not say. The standard does not define what the "underlying bytes" that make up the value representation of `int` are. It does not define what exactly gets stored in `b`. It only defines that if you copy the array back to a layout-compatible type, you get the same value representation back.

And therefore, doing anything other than copying the array is not defined behavior by the standard. And is therefore undefined.

UB doesn't always mean "crash". It can mean "does the thing you want, but the standard doesn't guarantee anything". Many people write programs that rely on UB every day. Indeed, it's more or less impossible to write (for example) code that deals with endian issues without at some point relying on UB.

Andrey Semashev

Mar 8, 2016, 10:47:24 AM
to std-pr...@isocpp.org
On 2016-03-08 17:57, Nicol Bolas wrote:
> On Tuesday, March 8, 2016 at 6:39:15 AM UTC-5, Edward Catmur wrote:
>
> On Mon, Mar 7, 2016 at 1:30 AM, Nicol Bolas <jmck...@gmail.com> wrote:
>
> On Sunday, March 6, 2016 at 7:19:02 PM UTC-5, Edward Catmur wrote:
>
> Consider this program:
>
> int32_t a = 0x11223344;
> char c[4];
> memcpy(c, &a, 4);
> int32_t b = 0x11223355;
> char d[4];
> memcpy(d, &b, 4);
> if (d[0] == 0x55 &&
> !memcmp(c + 1, d + 1, 3)) {
> c[0] = 0x55;
> memcpy(&a, c, 4);
> assert(a == b);
> }
>
> Is this well defined? On a platform where the if block is
> entered?
>
>
> It's undefined, because the standard does not specify the
> representation of the data you put into `d`. You are allowed to
> copy it in from an `int32_t` and copy it out to an `int32_t`.
> But /interpreting the bits/ as anything other than an `int32_t`
> (or something that is layout compatible with it) is undefined
> behavior.
>
> So it became undefined behavior the moment you did `d[0] == 0x55`.
>
>
> How is that undefined behavior? Which rule or rules in the standard
> does it violate?
>
>
> It's the part of the standard that doesn't say what that will actually do.
>
> The standard specifies behavior. If you do this:
>
> int x = 20;
> if (x == 15)
>   doSomething();
>
> The standard /requires/ that this not call `doSomething`. That's
> well-defined behavior.
>
> If you instead do:
>
> int x = 20;
> unsigned char b[sizeof(int)];
> memcpy(b, &x, sizeof(int));
> if (b[0] == 15)
>   doSomething();
>
> What is the behavior of this program? The standard does not say. The
> standard does not define what the "underlying bytes" that make up
> the value representation of `int` are. It does not define what exactly
> gets stored in `b`. It only defines that if you copy the array back to a
> layout-compatible type, you get the same value representation back.

[snip]

> UB doesn't always mean "crash". It can mean "does the thing you want,
> but the standard doesn't guarantee anything". Many people write programs
> that rely on UB every day. Indeed, it's more or less impossible to write
> (for example) code that deals with endian issues without at some point
> relying on UB.

I think you're misusing the 'undefined behavior' term. Is this code
undefined behavior:

int n = std::rand();
if (n > 10)
doSomething();

Or is this:

int n = 0;
std::cin >> n;
if (n > 10)
doSomething();

Both these have well defined behavior AFAICT, but the result is not
known until the code is run. The example Edward presented is similar to
this in that its result will depend on runtime factors (the
representation of integers in the particular architecture, for instance).

Hyman Rosen

Mar 8, 2016, 11:53:55 AM
to std-pr...@isocpp.org
On Tue, Mar 8, 2016 at 9:57 AM, Nicol Bolas <jmck...@gmail.com> wrote:
> And therefore, doing anything other than copying the array is not defined behavior by the standard. And is therefore undefined.

You're using the 1.3 {Definitions} meaning, "behavior not specifically defined by the standard", but that's not the one that is meant when people talk about undefined behavior.  It's the 1.9 {Program execution} meaning, "behavior specifically described by the standard as undefined".

It is not the case that the standard imposes no requirements on the behavior of a program which examines a byte copied by memcpy from another object.

Myriachan

Mar 8, 2016, 6:23:51 PM
to ISO C++ Standard - Future Proposals
On Tuesday, March 8, 2016 at 6:57:32 AM UTC-8, Nicol Bolas wrote:
> If you instead do:
>
> int x = 20;
> unsigned char b[sizeof(int)];
> memcpy(b, &x, sizeof(int));
> if (b[0] == 15)
>   doSomething();
>
> What is the behavior of this program? The standard does not say. The standard does not define what the "underlying bytes" that make up the value representation of `int` are. It does not define what exactly gets stored in `b`. It only defines that if you copy the array back to a layout-compatible type, you get the same value representation back.
>
> And therefore, doing anything other than copying the array is not defined behavior by the standard. And is therefore undefined.
>
> UB doesn't always mean "crash". It can mean "does the thing you want, but the standard doesn't guarantee anything". Many people write programs that rely on UB every day. Indeed, it's more or less impossible to write (for example) code that deals with endian issues without at some point relying on UB.


This is what I get from reading the Standard:

By [basic.types]/4 and other areas, the result of copying an object to a byte array is an implementation-defined series of bytes.  The values of the bytes that represent the value are the value's value representation.  (It need not be unique, but that is implementation-defined too.)

The snippet of code above has implementation-defined behavior, not undefined behavior.  The result depends upon the implementation's choice of the value representation of the signed int 20.  It may even vary randomly (such as if the platform is big-endian and has "nails").  But the behavior is defined as having the result of reading that value representation.
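
A small sketch of the reading described above: copy an `int` into a byte array, look at the bytes, and copy them back. The names `IntBytes`, `bytes_of` and `int_from` are invented for this example. Under this reading, which bytes come out is up to the implementation, while copying the unmodified bytes back yields the original value.

```cpp
#include <array>
#include <cstring>

using IntBytes = std::array<unsigned char, sizeof(int)>;

// Copies the object representation of `value` into a byte array.
// The contents depend on the implementation (byte order, for instance).
IntBytes bytes_of(int value) {
    IntBytes b;
    std::memcpy(b.data(), &value, sizeof value);
    return b;
}

// Copies a byte array back into an int.
int int_from(const IntBytes& b) {
    int v;
    std::memcpy(&v, b.data(), sizeof v);
    return v;
}
```

For example, `bytes_of(20)[0] == 15` is simply false or true depending on the implementation's representation of 20, and `int_from(bytes_of(20))` gives back 20.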

If you modify the bytes then copy them back, the result is still defined behavior, but implementation-defined.  The effect, by [basic.types]/2-4, is that the result is whatever value that corresponds to the value representation contained within the bytes.  This may be a trap representation.

This can be formally proven in a similar manner to my proof that offsetof must work for all trivially-copyable types and not just standard-layout types.


Melissa

arvid.g...@gmail.com

Oct 17, 2017, 6:05:06 PM
to ISO C++ Standard - Future Proposals
Have there been any new attempts at bringing type punning through unions into C++?

Jan Wilmans

Oct 18, 2017, 1:38:38 AM
to std-pr...@isocpp.org
Sounds really useful to me, specifically in embedded environments, where hardware interfacing through memory-mapped I/O is common.

Edward Catmur

Oct 18, 2017, 7:32:57 PM
to std-pr...@isocpp.org


On 17 Oct 2017 23:05, <arvid.g...@gmail.com> wrote:
> Have their been made new attempts in bringing type-punning through unions into C++?
>
> On Friday, March 4, 2016 at 8:58:21 PM UTC+1, Evan Teran wrote:
>> So it's not uncommon to see people write code like this:
>>
>> union T {
>>     uint8_t bytes[sizeof(uint32_t)];
>>     uint32_t u32;
>> };
>>
>> T x;
>> x.u32 = 0x11223344;
>> x.bytes[0] = 0xff; // undefined behavior :-(
>>
>> Of course this is met with those who are thrilled that it works, and just as many others who are determined to avoid undefined behavior and would like to avoid it at all costs.
>> Now that C++ has a proper attribute syntax. Has anyone considered having an attribute which says "I want to pun this type, compiler please don't get too clever here".
>>
>> Something as simple as this:
>>
>> [[type-pun]] union T {
>>     uint8_t bytes[sizeof(uint32_t)];
>>     uint32_t u32;
>> };
>>
>> T x;
>> x.u32 = 0x11223344;
>> x.bytes[0] = 0xff; // compiler won't get too clever with optimizations, will do the obvious thing
>>                    // and source code analyzers will know that the type punning is deliberate.
>>
>> The meaning being that the compiler can expect the user to write to any field, and read from any other field and it should do the obvious thing and return the underlying bit-pattern interpreted as the appropriate type. Something which many compilers do anyway.

The problem is when you form a pointer to a member and write to a different member. Anyone using such an attribute would expect a read via that pointer to reflect the update, but this defeats type-based aliasing globally, since the translation unit containing the read may have no conception that the union even exists. 

This is, by the way, why some people involved in the C standard consider union type punning in that language to be a mistake. 
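
To make the aliasing concern concrete, here is a sketch of the kind of function an optimizer benefits from today. The name `store_both` is made up for this example. Because `std::uint32_t` and `float` are unrelated types, a compiler applying type-based alias analysis may assume the two pointers refer to different objects and return the constant 1 without reloading `*pu`.

```cpp
#include <cstdint>

// With type-based alias analysis, the store through `pf` is assumed not to
// modify `*pu`, so the final read can be folded to the value just stored.
std::uint32_t store_both(std::uint32_t* pu, float* pf) {
    *pu = 1u;
    *pf = 2.0f;
    return *pu;
}
```

If a punning attribute on some union made it valid to pass pointers to two members of the same union object here, this function could no longer be optimized that way, even though nothing in it mentions the union. Under the current rules, calling it with such pointers is undefined behavior.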



Of course by default we'd get the current rules about only having one active member at a time and all that goodness, allowing optimizers to assume that you won't do that and do clever things. And I recognize that there would still be concerns about endianness and such, but at least we've moved the issue from undefined behavior into "system specific" behavior, which low level developers tend to be comfortable with.

Thoughts?

--
You received this message because you are subscribed to a topic in the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this topic, visit https://groups.google.com/a/isocpp.org/d/topic/std-proposals/mPFd9uLsus8/unsubscribe.
To unsubscribe from this group and all its topics, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/5df63738-279b-44ae-be59-6e729577000d%40isocpp.org.

Myriachan

unread,
Oct 18, 2017, 7:59:59 PM10/18/17
to ISO C++ Standard - Future Proposals
On Wednesday, October 18, 2017 at 4:32:57 PM UTC-7, Edward Catmur wrote:
On 17 Oct 2017 23:05, <arvid.g...@gmail.com> wrote:
Have there been any new attempts at bringing type-punning through unions into C++?

On Friday, March 4, 2016 at 8:58:21 PM UTC+1, Evan Teran wrote:
So it's not uncommon to see people write code like this:

union T {
    uint8_t bytes[sizeof(uint32_t)];
    uint32_t u32;
};

T x;
x.u32 = 0x11223344;
x.bytes[0] = 0xff; // undefined behavior :-(


Of course this is met with those who are thrilled that it works, and just as many others who are determined to avoid undefined behavior and would like to avoid it at all costs.
Now that C++ has a proper attribute syntax, has anyone considered having an attribute which says "I want to pun this type, compiler please don't get too clever here"?

Something as simple as this:

[[type-pun]] union T {
    uint8_t bytes[sizeof(uint32_t)];
    uint32_t u32;
};

T x;
x.u32 = 0x11223344;
x.bytes[0] = 0xff; // compiler won't get too clever with optimizations, will do the obvious thing
                   // and source code analyzers will know that the type punning is deliberate.

The meaning being that the compiler can expect the user to write to any field, and read from any other field and it should do the obvious thing and return the underlying bit-pattern interpreted as the appropriate type. Something which many compilers do anyway.

The problem is when you form a pointer to a member and write to a different member. Anyone using such an attribute would expect a read via that pointer to reflect the update, but this defeats type-based aliasing globally, since the translation unit containing the read may have no conception that the union even exists. 

This is, by the way, why some people involved in the C standard consider union type punning in that language to be a mistake. 


The optimizer people don't like us taking pointers to union members even when type punning isn't involved at all:


Anyway, the question is really about what we're going to do now.  We can't go back and change how struct sockaddr_in6, OSVERSIONINFOEXW, etc. work.

Melissa

Edward Catmur

unread,
Oct 18, 2017, 8:17:51 PM10/18/17
to std-pr...@isocpp.org
That's up to the respective vendors. If they need to add private attributes to ensure that these quasi-union types continue to work as documented, then so be it. But there's no need to require such facilities to exist on platforms that don't have to support these APIs. 

Note that in the case of sockaddr, all the compiler needs is to permit casts between sockaddr, sockaddr_storage and the various protocol address types. There is no particular need for unions to be involved per my understanding. 


Melissa


Myriachan

unread,
Oct 18, 2017, 8:44:15 PM10/18/17
to ISO C++ Standard - Future Proposals
On Wednesday, October 18, 2017 at 5:17:51 PM UTC-7, Edward Catmur wrote:
On 19 Oct 2017 01:00, "Myriachan" <myri...@gmail.com> wrote:
The optimizer people don't like us taking pointers to union members even when type punning isn't involved at all:


Anyway, the question is really about what we're going to do now.  We can't go back and change how struct sockaddr_in6, OSVERSIONINFOEXW, etc. work.

That's up to the respective vendors. If they need to add private attributes to ensure that these quasi-union types continue to work as documented, then so be it. But there's no need to require such facilities to exist on platforms that don't have to support these APIs. 


Name one "hosted" (rather than "freestanding") platform that doesn't use structures/unions in this way somewhere in their API.

Note that in the case of sockaddr, all the compiler needs is to permit casts between sockaddr, sockaddr_storage and the various protocol address types. There is no particular need for unions to be involved per my understanding. 


From compiler writers' perspectives, there is very little difference between these unions and structs with a common initial sequence that get reinterpret_cast among each other.  As an example, comment 20 in the link I gave above shows Clang messing up when no unions are involved, in a similar situation to how struct sockaddr works with sa_family.

Melissa

Edward Catmur

unread,
Oct 18, 2017, 9:28:41 PM10/18/17
to std-pr...@isocpp.org


On 19 Oct 2017 01:44, "Myriachan" <myri...@gmail.com> wrote:
On Wednesday, October 18, 2017 at 5:17:51 PM UTC-7, Edward Catmur wrote:
On 19 Oct 2017 01:00, "Myriachan" <myri...@gmail.com> wrote:
The optimizer people don't like us taking pointers to union members even when type punning isn't involved at all:


Anyway, the question is really about what we're going to do now.  We can't go back and change how struct sockaddr_in6, OSVERSIONINFOEXW, etc. work.

That's up to the respective vendors. If they need to add private attributes to ensure that these quasi-union types continue to work as documented, then so be it. But there's no need to require such facilities to exist on platforms that don't have to support these APIs. 


Name one "hosted" (rather than "freestanding") platform that doesn't use structures/unions in this way somewhere in their API.

Hah. Well, if there's existing practice, there's something to standardize. But I don't think much more than the existing common initial sequence rule is really required. 


Note that in the case of sockaddr, all the compiler needs is to permit casts between sockaddr, sockaddr_storage and the various protocol address types. There is no particular need for unions to be involved per my understanding. 


From compiler writers' perspectives, there is very little difference between these unions and structs with a common initial sequence that get reinterpret_cast among each other.  As an example, comment 20 in the link I gave above shows Clang messing up when no unions are involved, in a similar situation to how struct sockaddr works with sa_family.

Sure, once you've got pointers to different elements of the same union. Casting between those pointers is problematic at present. This is still only common initial sequence stuff, though; there's no type punning at scalar level. 
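
As a rough illustration of the common initial sequence rule mentioned here, the sketch below uses hypothetical stand-in types (not the real POSIX sockaddr structs, and with an arbitrary tag value). Reading `family` through a different struct member of the union is permitted because the read goes through the union and the structs are standard-layout with a shared leading member.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for sockaddr-style types; not the real POSIX structs.
struct addr_header { unsigned short family; };
struct addr_v4 { unsigned short family; unsigned short port; unsigned char bytes[4]; };
struct addr_v6 { unsigned short family; unsigned short port; unsigned char bytes[16]; };

static_assert(std::is_standard_layout<addr_header>::value, "rule needs standard layout");
static_assert(std::is_standard_layout<addr_v4>::value, "rule needs standard layout");
static_assert(std::is_standard_layout<addr_v6>::value, "rule needs standard layout");

union any_addr {
    addr_header header;
    addr_v4 v4;
    addr_v6 v6;
};

constexpr unsigned short kFamilyV6 = 10;  // arbitrary tag chosen for this sketch

// Makes `v6` the active member of the union.
any_addr make_v6(unsigned short port) {
    any_addr a;
    a.v6 = addr_v6{kFamilyV6, port, {}};
    return a;
}

// Reads the shared leading member through `header` even though another struct
// member may be active; `family` lies in the common initial sequence.
unsigned short family_of(const any_addr& a) {
    return a.header.family;
}
```

Nothing beyond the first member is punned here: reading `a.v4.bytes` while `v6` is active would fall outside the common initial sequence.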


Melissa


Hyman Rosen

unread,
Oct 19, 2017, 10:30:20 AM10/19/17
to std-pr...@isocpp.org
On Wed, Oct 18, 2017 at 9:28 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:
Hah. Well, if there's existing practice, there's something to standardize. But I don't think much more than the existing common initial sequence rule is really required.

David M. Gay's old dtoa code (in C) uses both union type punning and memcpy across object boundaries (starting in the middle of one object!)  And indeed, it appears to have been made to stop working by modern compilers which started applying the aliasing rules: <http://patrakov.blogspot.com/2009/03/dont-use-old-dtoac.html>. 

this defeats type-based aliasing globally

Yes!  Defeat type-based aliasing (detection) now! 

Edward Catmur

unread,
Oct 19, 2017, 10:35:21 AM10/19/17
to std-pr...@isocpp.org
On Thu, Oct 19, 2017 at 3:29 PM, Hyman Rosen <hyman...@gmail.com> wrote:
On Wed, Oct 18, 2017 at 9:28 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:
Hah. Well, if there's existing practice, there's something to standardize. But I don't think much more than the existing common initial sequence rule is really required.

David M. Gay's old dtoa code (in C) uses both union type punning and memcpy across object boundaries (starting in the middle of one object!)  And indeed, it appears to have been made to stop working by modern compilers which started applying the aliasing rules: <http://patrakov.blogspot.com/2009/03/dont-use-old-dtoac.html>. 

Sorry, I meant existing (implementation) vendor practice. Existing practice from third-party library vendors is relevant to the Library, but not to the language.

Arvid Gerstmann

unread,
Oct 19, 2017, 11:26:37 AM10/19/17
to std-pr...@isocpp.org
TBAA is a useful concept; it should not be purposefully defeated.
Despite that, even if unions turn out to be not the best way to convert
between unrelated types, C++ should have a standard way which can be
used and taught. As of right now, 'memcpy' is the only working way.

I could think of a few ways:

- Making 'reinterpret_cast' work for converting between types,
which would not alias but strictly copy the bit-pattern into the
destination type (though I could see how this poses a few
problems)

- Creating a new cast (for example 'bit_cast': https://godbolt.org/g/1PWp67),
which can act like type punning through unions did / does.

- Creating a new 'union' (or annotating a union), which restricts
its usage to only converting between types.
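
For reference, the 'memcpy' approach mentioned above can be wrapped in a small helper. This is only a sketch under stated assumptions: the name `bit_cast_sketch` is hypothetical and is not the interface of any proposal, and the example values assume IEEE 754 single-precision floats.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>
#include <type_traits>

// Hypothetical helper: copies the object representation of `from` into a new
// object of type To. Both types must be trivially copyable and equally sized.
template <class To, class From>
To bit_cast_sketch(const From& from) noexcept {
    static_assert(sizeof(To) == sizeof(From), "sizes must match");
    static_assert(std::is_trivially_copyable<From>::value, "From must be trivially copyable");
    static_assert(std::is_trivially_copyable<To>::value, "To must be trivially copyable");
    static_assert(std::is_trivially_default_constructible<To>::value,
                  "this sketch default-constructs the result first");
    To to;
    std::memcpy(&to, &from, sizeof(To));
    return to;
}

static_assert(std::numeric_limits<float>::is_iec559, "example assumes IEEE 754 floats");

// Returns the bit pattern of a float as an unsigned 32-bit integer.
std::uint32_t float_bits(float f) {
    return bit_cast_sketch<std::uint32_t>(f);
}
```

The result is a copy, not an alias: later changes to the source object are not visible through the returned value.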
Arvid Gerstmann

Website: https://arvid.io | Skype: leandros9999
Twitter: https://twitter.com/ArvidGerstmann

Edward Catmur

unread,
Oct 19, 2017, 12:34:52 PM10/19/17
to std-pr...@isocpp.org
On Thu, Oct 19, 2017 at 4:26 PM, Arvid Gerstmann <a...@arvid.io> wrote:
TBAA is a useful concept; it should not be purposefully defeated.
Despite that, even if unions turn out to be not the best way to convert
between unrelated types, C++ should have a standard way which can be
used and taught. As of right now, 'memcpy' is the only working way.

I could think of a few ways:

    - Making 'reinterpret_cast' work for converting between types,
    which would not alias but strictly copy the bit-pattern into the
    destination type (though I could see how this poses a few
    problems)

    - Creating a new cast (for example 'bit_cast': https://godbolt.org/g/1PWp67),
    which can act like type punning through unions did / does.

Are you aware of the bit_cast proposal? https://wg21.link/p0476 

    - Creating a new 'union' (or annotating a union), which restricts
    its usage to only converting between types.

On 10/19, Hyman Rosen wrote:
> On Wed, Oct 18, 2017 at 9:28 PM, 'Edward Catmur' via ISO C++ Standard -
> Future Proposals <std-pr...@isocpp.org> wrote:
> >
> > Hah. Well, if there's existing practice, there's something to standardize.
> > But I don't think much more than the existing common initial sequence rule
> > is really required.
> >
>
> David M. Gay's old dtoa code (in C) uses both union type punning and memcpy
> across object boundaries (starting in the middle of one object!)  And
> indeed, it appears to have been made to stop working by modern compilers
> which started applying the aliasing rules: <
> http://patrakov.blogspot.com/2009/03/dont-use-old-dtoac.html>.
>
> this defeats type-based aliasing globally
> >
>
> Yes!  Defeat type-based aliasing (detection) now!
>
Arvid Gerstmann

Website: https://arvid.io | Skype: leandros9999
Twitter: https://twitter.com/ArvidGerstmann

Myriachan

unread,
Oct 19, 2017, 3:16:22 PM10/19/17
to ISO C++ Standard - Future Proposals
On Wednesday, October 18, 2017 at 6:28:41 PM UTC-7, Edward Catmur wrote:
On 19 Oct 2017 01:44, "Myriachan" <myri...@gmail.com> wrote:
On Wednesday, October 18, 2017 at 5:17:51 PM UTC-7, Edward Catmur wrote:
On 19 Oct 2017 01:00, "Myriachan" <myri...@gmail.com> wrote:
The optimizer people don't like us taking pointers to union members even when type punning isn't involved at all:


Anyway, the question is really about what we're going to do now.  We can't go back and change how struct sockaddr_in6, OSVERSIONINFOEXW, etc. work.

That's up to the respective vendors. If they need to add private attributes to ensure that these quasi-union types continue to work as documented, then so be it. But there's no need to require such facilities to exist on platforms that don't have to support these APIs. 


Name one "hosted" (rather than "freestanding") platform that doesn't use structures/unions in this way somewhere in their API.

Hah. Well, if there's existing practice, there's something to standardize. But I don't think much more than the existing common initial sequence rule is really required. 


From the way Richard Smith and Hal Finkel talked about it in the Clang bug I linked, it sounded like they wanted to change the rule such that the common initial sequence rule is only valid if you access the "wrong type" through the union type.  This breaks sockaddr, because you don't pass a pointer to a union type, you pass a pointer to the sockaddr.  Also, most programs declare e.g. a sockaddr_in6, then just reinterpret_cast the pointer to sockaddr * for calling connect(), rather than have a union at all.  (I personally use a union for such code, but it's never been required.)

Are you aware of the bit_cast proposal? https://wg21.link/p0476


The above wasn't a reply to me, but I really want bit_cast.  It covers the simple situations like float-to-int reinterpretation very well, and it makes your intentions very clear to both programmers and compilers.  I just don't think that it should be the only way.

bit_cast does have an unresolved issue.  It makes perfect sense that bit_cast would be constexpr, but in that case, it would need a rule that bit_cast is not constexpr if any pointer or reference types are involved in either the to or from types.  It doesn't seem reasonable to require that bit_cast function in the face of compilers' constexpr evaluation.  (For example, allowing such a bit_cast in constexpr context would make it possible to create a .cpp file that when compiled pwns your machine, as opposed to when run.)

Melissa

Hyman Rosen

unread,
Oct 19, 2017, 4:07:10 PM10/19/17
to std-pr...@isocpp.org
On Thu, Oct 19, 2017 at 12:34 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:
Are you aware of the bit_cast proposal? https://wg21.link/p0476 

Can bit_cast return a reference to the From object reinterpreted as the To object,
or is it strictly copied return value only?  If the latter, it's just more optimizationists
papering over what people really want to do.

Edward Catmur

unread,
Oct 19, 2017, 5:05:07 PM10/19/17
to std-pr...@isocpp.org
On Thu, Oct 19, 2017 at 8:16 PM, Myriachan <myri...@gmail.com> wrote:
On Wednesday, October 18, 2017 at 6:28:41 PM UTC-7, Edward Catmur wrote:
On 19 Oct 2017 01:44, "Myriachan" <myri...@gmail.com> wrote:
On Wednesday, October 18, 2017 at 5:17:51 PM UTC-7, Edward Catmur wrote:
On 19 Oct 2017 01:00, "Myriachan" <myri...@gmail.com> wrote:
The optimizer people don't like us taking pointers to union members even when type punning isn't involved at all:


Anyway, the question is really about what we're going to do now.  We can't go back and change how struct sockaddr_in6, OSVERSIONINFOEXW, etc. work.

That's up to the respective vendors. If they need to add private attributes to ensure that these quasi-union types continue to work as documented, then so be it. But there's no need to require such facilities to exist on platforms that don't have to support these APIs. 


Name one "hosted" (rather than "freestanding") platform that doesn't use structures/unions in this way somewhere in their API.

Hah. Well, if there's existing practice, there's something to standardize. But I don't think much more than the existing common initial sequence rule is really required. 


From the way Richard Smith and Hal Finkel talked about it in the Clang bug I linked, it sounded like they wanted to change the rule such that the common initial sequence rule is only valid if you access the "wrong type" through the union type.  This breaks sockaddr, because you don't pass a pointer to a union type, you pass a pointer to the sockaddr.  Also, most programs declare e.g. a sockaddr_in6, then just reinterpret_cast the pointer to sockaddr * for calling connect(), rather than have a union at all.  (I personally use a union for such code, but it's never been required.)

Well, that would be consistent, at least. As Hal Finkel says, if compilers break APIs like sockaddr, the solution would be to add appropriate attributes. After all, vendors aren't going to break APIs that they document as supported. Now, if that happens, third-party vendors will be able to use those attributes to patch up their libraries to ensure that they continue to work with modern compilers, so ultimately - if there is demand - we can consider standardizing those attributes.

Edward Catmur

unread,
Oct 19, 2017, 5:11:37 PM10/19/17
to std-pr...@isocpp.org
References are not TriviallyCopyable, so the answer is no.

I am sure that plenty of "what people really want to do" would be satisfied by bit_cast as presented. For example, the dtoa you mention above could readily be implemented in terms of a value-semantic bit_cast.
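
To make the "value-semantic" point concrete, here is a sketch of the kind of field extraction a dtoa-style routine performs, written with a copy instead of a union. The helper name is hypothetical, and the layout assumes IEEE 754 binary64 doubles whose byte order matches that of 64-bit integers on the target.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<double>::is_iec559, "sketch assumes IEEE 754 doubles");
static_assert(sizeof(double) == sizeof(std::uint64_t), "sketch assumes 64-bit doubles");

// Hypothetical helper: returns the 11-bit biased exponent field of a double.
// The bits are copied out by value; no pointer or union aliasing is involved.
int biased_exponent(double d) {
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);
    return static_cast<int>((bits >> 52) & 0x7ffu);
}
```

For example, `biased_exponent(1.0)` yields 1023, the exponent bias of the binary64 format.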

Hyman Rosen

unread,
Oct 19, 2017, 5:36:51 PM10/19/17
to std-pr...@isocpp.org
On Thu, Oct 19, 2017 at 5:11 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:
I am sure that plenty of "what people really want to do" would be satisfied by bit_cast as presented. For example, the dtoa you mention above could readily be implemented in terms of a value-semantic bit_cast.

People want to work in place, not copy back and forth.
That's been true ever since Fortran had an EQUIVALENCE statement.

And why is bit_cast restricted to trivially copyable types anyway?
Why shouldn't I be able to muck around with class innards,
including vtables and references and such, if I want to?

Nicol Bolas

unread,
Oct 19, 2017, 6:37:21 PM10/19/17
to ISO C++ Standard - Future Proposals


On Thursday, October 19, 2017 at 5:05:07 PM UTC-4, Edward Catmur wrote:


On Thu, Oct 19, 2017 at 8:16 PM, Myriachan <myri...@gmail.com> wrote:
On Wednesday, October 18, 2017 at 6:28:41 PM UTC-7, Edward Catmur wrote:
On 19 Oct 2017 01:44, "Myriachan" <myri...@gmail.com> wrote:
On Wednesday, October 18, 2017 at 5:17:51 PM UTC-7, Edward Catmur wrote:
On 19 Oct 2017 01:00, "Myriachan" <myri...@gmail.com> wrote:
The optimizer people don't like us taking pointers to union members even when type punning isn't involved at all:


Anyway, the question is really about what we're going to do now.  We can't go back and change how struct sockaddr_in6, OSVERSIONINFOEXW, etc. work.

That's up to the respective vendors. If they need to add private attributes to ensure that these quasi-union types continue to work as documented, then so be it. But there's no need to require such facilities to exist on platforms that don't have to support these APIs. 


Name one "hosted" (rather than "freestanding") platform that doesn't use structures/unions in this way somewhere in their API.

Hah. Well, if there's existing practice, there's something to standardize. But I don't think much more than the existing common initial sequence rule is really required. 


From the way Richard Smith and Hal Finkel talked about it in the Clang bug I linked, it sounded like they wanted to change the rule such that the common initial sequence rule is only valid if you access the "wrong type" through the union type.  This breaks sockaddr, because you don't pass a pointer to a union type, you pass a pointer to the sockaddr.  Also, most programs declare e.g. a sockaddr_in6, then just reinterpret_cast the pointer to sockaddr * for calling connect(), rather than have a union at all.  (I personally use a union for such code, but it's never been required.)

Well, that would be consistent, at least. As Hal Finkel says, if compilers break APIs like sockaddr, the solution would be to add appropriate attributes. After all, vendors aren't going to break APIs that they document as supported.

So, you expect compiler vendors to, in order:

1: Break existing code.

2: Provide non-standard attributes to undo the breakage, which everyone will now have to go and edit their code to use.

3: Hope that some attribute for this will be standardized.

No, that doesn't seem very likely to happen. It's far more likely that step 1 will be avoided, since it would almost certainly mean that the compiler itself would cease to be self-hosting.

Nicol Bolas

unread,
Oct 19, 2017, 6:41:20 PM10/19/17
to ISO C++ Standard - Future Proposals
On Thursday, October 19, 2017 at 5:36:51 PM UTC-4, Hyman Rosen wrote:
On Thu, Oct 19, 2017 at 5:11 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:
I am sure that plenty of "what people really want to do" would be satisfied by bit_cast as presented. For example, the dtoa you mention above could readily be implemented in terms of a value-semantic bit_cast.

People want to work in place, not copy back and forth.
That's been true ever since Fortran had an EQUIVALENCE statement.

I can't really bring myself to care about such things. As long as you can get the job done that you want, so long as the method is not obtuse, what does it matter how it happens?

Oh, and if you really want to do an in-place cast:

some_type *ptr = ...;
auto other_ptr = new(ptr) auto(bit_cast<other_type>(*ptr));

Thanks to guaranteed elision, we are guaranteed that the prvalue returned will directly initialize the memory at address `ptr`. So the return value object must have the same address as the parameter object. So unless there is something explicitly forbidding this, it should work fine.

And why is bit_cast restricted to trivially copyable types anyway?
Why shouldn't I be able to muck around with class innards,
including vtables and references and such, if I want to?

Because we like our object models to make sense.

Edward Catmur

unread,
Oct 19, 2017, 7:26:22 PM10/19/17
to std-pr...@isocpp.org
"Everyone" is a bit of a stretch. Most people don't author APIs that would be broken by such a change.

Ricardo Fabiano de Andrade

unread,
Oct 19, 2017, 7:29:42 PM10/19/17
to std-pr...@isocpp.org
Humm ... no, because placement new is not guaranteed to preserve the contents of ptr ?

On Thu, Oct 19, 2017 at 5:41 PM Nicol Bolas <jmck...@gmail.com> wrote:

Nicol Bolas

unread,
Oct 19, 2017, 8:18:52 PM10/19/17
to ISO C++ Standard - Future Proposals
On Thursday, October 19, 2017 at 7:29:42 PM UTC-4, Ricardo Andrade wrote:
Humm ... no, because placement new is not guaranteed to preserve the contents of ptr ?

The call to `::operator new(size_t, void*)` is not allowed to modify `ptr`. The standard very clearly says, "Intentionally performs no other action." No action other than returning `ptr`.

The expression `new(ptr) auto(bit_cast<other_type>(*ptr));` has 3 steps: resolve the memory which the object will be initialized in, resolve the expression used to initialize the object, and initialize the object inside the memory from step 1. It is the initialization of the object inside that memory which is not guaranteed to preserve the contents of that memory.

The rules of guaranteed elision tell us that the prvalue returned by `bit_cast` must directly initialize the memory of `ptr`. Therefore, `bit_cast`'s return value must be that address. And therefore, `bit_cast` is performing a copy from an address to the same address. And `bit_cast`'s defining trait is that it does preserve the contents of the memory. That is, the object being output must have the same bits as the object it was given.

What happens when the object it is given and the address it is writing to are the same thing is... well, it's the language-based version of having to protect against self-copy-assignment.

Ricardo Fabiano de Andrade

unread,
Oct 19, 2017, 11:59:30 PM10/19/17
to std-pr...@isocpp.org
Actually, I wasn't precise enough.
I meant "the contents pointed by ptr are not guaranteed to be preserved", which is what you've explained with "It is the initialization of the object inside that memory which is not guaranteed to preserve the contents of that memory."
My understanding is that the pointee of `new(ptr) auto(bit_cast<other_type>(*ptr));` is good to write to, but never to read from.

Are you saying that `bit_cast` circumvents that limitation by "memcpy initializing" the pointee with its own contents?


Edward Catmur

unread,
Oct 20, 2017, 1:31:04 AM10/20/17
to std-pr...@isocpp.org


On 20 Oct 2017 04:59, "Ricardo Fabiano de Andrade" <ricardofabi...@gmail.com> wrote:
Actually, I wasn't precise enough.
I meant "the contents pointed by ptr are not guaranteed to be preserved", which is what you've explained with "It is the initialization of the object inside that memory which is not guaranteed to preserve the contents of that memory."
My understanding is that the pointee of `new(ptr) auto(bit_cast<other_type>(*ptr));` is good to write to, but never to read from.

Are you saying that `bit_cast` circumvents that limitation by "memcpy initializing" the pointee with its own contents?


Hm, I don't think this will work, at least if bit_cast vivifies the `to` object, terminating `from`, before the memcpy. Firstly, I don't think basic.life allows you to memcpy from a dead object. Secondly, the process of terminating the old object and starting the lifetime of the new can itself alter the storage of the object, so there's no guarantee the bits you're copying are the same as they were. 

(There was a discussion a little while back about whether you can secretly pass values to a constructor by prefilling its storage; I'm fairly sure the conclusion was that you can't.) 

This isn't the fault of bit_cast, though; you can have the same issue with any function that takes a reference and returns a prvalue. 

Bo Persson

unread,
Oct 20, 2017, 5:01:47 AM10/20/17
to std-pr...@isocpp.org
On 2017-10-19 23:36, Hyman Rosen wrote:
> On Thu, Oct 19, 2017 at 5:11 PM, 'Edward Catmur' via ISO C++ Standard -
> Future Proposals <std-pr...@isocpp.org
> <mailto:std-pr...@isocpp.org>> wrote:
>
> I am sure that plenty of "what people really want to do" would be
> satisfied by bit_cast as presented. For example, the dtoa you
> mention above could readily be implemented in terms of a
> value-semantic bit_cast.
>
>
> People want to work in place, not copy back and forth.

If you check the code generated by the popular compilers, you will see
that `bit_cast` doesn't really copy anything (despite containing a memcpy).

Returning a value is optimized into a register load from the source.
That's it.



Bo Persson


Hyman Rosen

unread,
Oct 20, 2017, 10:23:35 AM10/20/17
to std-pr...@isocpp.org
On Fri, Oct 20, 2017 at 5:01 AM, Bo Persson <b...@gmb.dk> wrote:
If you check the code generated by the popular compilers, you will see that `bit_cast` doesn't really copy anything (despite containing a memcpy).

Returning a value is optimized into a register load from the source. That's it.

Nevertheless, if I want to, say, set a float to its next representable value, I want to do
    ++(uint32_t &)float_val;
instead of
    float_val = bit_cast<float>(1 + bit_cast<uint32_t>(float_val));

Thiago Macieira

unread,
Oct 20, 2017, 12:20:18 PM10/20/17
to std-pr...@isocpp.org
On Thursday, 19 October 2017 17:18:52 PDT Nicol Bolas wrote:
> The expression `new(ptr) auto(bit_cast<other_type>(*ptr));` has 3 steps:
> resolve the memory which the object will be initialized in, resolve the
> expression used to initialize the object, and initialize the object inside
> the memory from step 1. It is the initialization of the object inside that
> memory which is not guaranteed to preserve the contents of that memory.

Not entirely correct.

*Because* you're initialising a new object there, the previous contents can be
discarded and the compiler is allowed to assume that a new object's lifetime
began there. Any previously-stored values are lost.

GCC 6 implements this with the Dead Store Elimination optimisation. We ran
into this in the QtQml virtual machine and it was very hard to track down...

> The rules of guaranteed elision tell us that the prvalue returned by
> `bit_cast` must directly initialize the memory of `ptr`. Therefore,
> `bit_cast`'s return value must be that address. And therefore, `bit_cast`
> is performing a copy from an address to the same address. And `bit_cast`'s
> defining trait is that it *does preserve* the contents of the memory. That
> is, the object being output must have the same bits as the object it was
> given.

Right, this is what makes it work: you're initialising the new object at ptr
with the value that was already there, found via memcpy.

> What happens when the object it is given and the address it is writing to
> are the same thing is... well, it's the language-based version of having to
> protect against self-copy-assignment.

We just have to hope the compiler optimises this to nothing.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Thiago Macieira

Oct 20, 2017, 12:24:42 PM10/20/17
to std-pr...@isocpp.org
On Friday, 20 October 2017 07:23:11 PDT Hyman Rosen wrote:
> On Fri, Oct 20, 2017 at 5:01 AM, Bo Persson <b...@gmb.dk> wrote:
> > If you check the code generated by the popular compilers, you will see
> > that `bit_cast` doesn't really copy anything (despite containing a
> > memcpy).
> >
> > Returning a value is optimized into a register load from the source.
> > That's it.
>
> Nevertheless, if I want to, say, set a float to its next representable
> value, I want to do

No, you want to do:
float_val = std::nextafter(float_val, std::numeric_limits<float>::infinity());
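
A minimal sketch of the library route, wrapped in a hypothetical helper `next_up`. Both arguments are float, so the float overload of std::nextafter is selected and the result is the adjacent float. Unlike the integer increment, this also handles zero and negative values.

```cpp
#include <cmath>
#include <limits>

// Next representable float above f, using only the standard library.
// Passing a float infinity keeps the whole computation in float.
float next_up(float f) {
    return std::nextafter(f, std::numeric_limits<float>::infinity());
}
```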

Ville Voutilainen

Oct 20, 2017, 12:25:29 PM10/20/17
to ISO C++ Standard - Future Proposals
On 20 October 2017 at 19:19, Thiago Macieira <thi...@macieira.org> wrote:
>> What happens when the object it is given and the address it is writing to
>> are the same thing is... well, it's the language-based version of having to
>> protect against self-copy-assignment.
>
> We just have to hope the compiler optimises this to nothing.


Optimizations should not affect semantics. Either you can rely on it or you can't.
Sure, there's a vast space of UB where the _actual_ outcome, which is not semantically
guaranteed, can do vastly different things, but for reasonable code we should work towards
avoiding the dragons of "this code may launch.. eh, balloons, next year". We have bit-blasts
in Qt that happen to work. Some of them work because we tell the compiler to stop UB-related
memory tracking via casts. Unless the current trend of aggressive UB-related optimization
by compilers is curbed, we are going to need to turn some of that UB into non-UB.

Edward Catmur

Oct 20, 2017, 12:26:37 PM10/20/17
to std-pr...@isocpp.org
And float_val = std::nextafter(float_val) is too self-explanatory? 

Hyman Rosen

Oct 20, 2017, 2:21:17 PM10/20/17
to std-pr...@isocpp.org
On Fri, Oct 20, 2017 at 12:26 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:
> And float_val = std::nextafter(float_val) is too self-explanatory?

It was just an example.  And what do you think the implementors of nextafter do?

Arvid Gerstmann

Oct 20, 2017, 2:53:15 PM10/20/17
to std-pr...@isocpp.org
This https://github.com/ifduyue/musl/blob/master/src/math/nextafter.c
Mit freundlichen Grüßen, / Yours sincerely,

Arvid Gerstmann <a...@arvid-g.de>

---
Portfolio: http://arvid-gerstmann.de
Twitter: http://twitter.com/ArvidGerstmann
Google+: http://google.com/+ArvidGerstmann
---

Hyman Rosen

Oct 20, 2017, 3:05:48 PM10/20/17
to std-pr...@isocpp.org
On Fri, Oct 20, 2017 at 2:52 PM, Arvid Gerstmann <a...@arvid-g.de> wrote:
> This https://github.com/ifduyue/musl/blob/master/src/math/nextafter.c

Exactly.  To quote a few lines,

double nextafter(double x, double y) {
    union {double f; uint64_t i;} ux={x}, uy={y};
    // ...
    if (ux.i == uy.i) return y;

Union type punning.
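
For comparison, the same bit-level equality test can be expressed in C++ without reading an inactive union member. This is a sketch, not musl's code: `double_bits` and `same_representation` are hypothetical names, and a 64-bit double is assumed.

```cpp
#include <cstdint>
#include <cstring>

// Returns the object representation of a double as a 64-bit integer.
// Assumption: double is 64 bits wide (checked at compile time).
std::uint64_t double_bits(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t),
                  "unexpected size of double");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);  // defined behaviour, unlike the union read
    return bits;
}

// Mirrors the quoted `ux.i == uy.i` comparison.
bool same_representation(double x, double y) {
    return double_bits(x) == double_bits(y);
}
```

Compilers that optimise memcpy of a known small size typically turn this into a plain register move, as noted earlier in the thread.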

Edward Catmur

Oct 20, 2017, 3:59:09 PM10/20/17
to std-pr...@isocpp.org
In non-portable C. I fail to see the relevance. 

Thiago Macieira

Oct 20, 2017, 4:38:45 PM10/20/17
to std-pr...@isocpp.org
On Friday, 20 October 2017 09:25:26 PDT Ville Voutilainen wrote:
> On 20 October 2017 at 19:19, Thiago Macieira <thi...@macieira.org> wrote:
> >> What happens when the object it is given and the address it is writing to
> >> are the same thing is... well, it's the language-based version of having
> >> to
> >> protect against self-copy-assignment.
> >
> > We just have to hope the compiler optimises this to nothing.
>
> Optimizations should not affect semantics. Either you can rely on it
> or you can't.

That's not what I meant.

Nicol's code effectively does:
T tmp;
memcpy(&tmp, ptr, sizeof(T));
new (ptr) T(tmp);

Which is to say: read from ptr to a temporary, then write back to ptr from
that temporary. That is perfectly legal and there's no UB. The dead store
elimination can't cause a problem, since we reinitialised an object of type T
at ptr.

What I meant is that I hope the compiler generates zero code for this, since
it really does nothing at the machine level.

And they apparently work like that: https://godbolt.org/g/vveJZN
[ICC seems to be having trouble with the placement operator new returning
nullptr]
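
The three-step sequence described above can be wrapped in a small helper. This is a sketch under stated assumptions: `reinterpret_storage_as` is a hypothetical name, T must be trivially copyable and default constructible, and the caller must guarantee that ptr is suitably aligned and large enough for T.

```cpp
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical helper: starts the lifetime of a T at ptr while keeping the
// bytes already stored there.
template <class T>
T* reinterpret_storage_as(void* ptr) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be trivially copyable");
    static_assert(std::is_trivially_default_constructible<T>::value,
                  "T must be trivially default constructible");
    T tmp;
    std::memcpy(&tmp, ptr, sizeof(T));  // read the old bytes into a temporary
    return ::new (ptr) T(tmp);          // create the new object from that value
}
```

Because the new object is initialised from `tmp`, there is no dependence on the previous contents surviving the placement new, so dead store elimination has nothing to discard.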

Thiago Macieira

Oct 20, 2017, 4:51:59 PM10/20/17
to std-pr...@isocpp.org
Assembly.

https://code.woboq.org/userspace/glibc/sysdeps/ia64/fpu/s_nextafterf.S.html
https://code.woboq.org/userspace/glibc/sysdeps/ia64/fpu/s_nextafter.S.html
https://code.woboq.org/userspace/glibc/sysdeps/ia64/fpu/s_nextafterl.S.html

https://code.woboq.org/userspace/glibc/sysdeps/ieee754/flt-32/s_nextafterf.c.html

Note how GET_FLOAT_WORD uses inline assembly.

Yes, there is one implementation in
https://code.woboq.org/userspace/glibc/math/s_nextafter.c.html
which uses the union trick. But it's relying on GCC allowing the type punning
via union, which is a documented feature.

It would miscompile in compilers that don't allow that. But glibc uses so many
GCC-specific extensions I'd doubt that it compiles even with Clang.

Note also how the equivalent EXTRACT_WORDS64 macro, defined in
https://code.woboq.org/userspace/glibc/sysdeps/x86_64/fpu/math_private.h.html#_M/EXTRACT_WORDS64
uses assembly.

Bo Persson

Oct 21, 2017, 6:17:16 AM10/21/17
to std-pr...@isocpp.org
Again, this is standard library code which doesn't have to be portable.
It is not UB if your specific compiler defines the behavior.


Bo Persson




Hyman Rosen

Oct 22, 2017, 5:18:59 PM10/22/17
to std-pr...@isocpp.org
On Sat, Oct 21, 2017 at 6:16 AM, Bo Persson <b...@gmb.dk> wrote:
> Again, this is standard library code which doesn't have to be portable. It is not UB if your specific compiler defines the behavior.

It's something that wants to be an implementation of a standard library
function, but it's not an implementation by a specific compiler vendor.

More to the point, if people trying to write useful code have to write
non-portable code that exhibits UB by the language standard but not
by the specific compiler, that very much suggests that the language
standard is deficient or wrong.

And even more to the point, having this sort of UB in the language
allows new versions of compilers, influenced by the optimizationists,
to silently break existing code, as happened with old versions of dtoa.

And yet even more, the language of the standard that purports to define
legal and portable code is so impenetrable that it is impossible to know if
a program is obeying the rules or not.

Edward Catmur

Oct 24, 2017, 2:36:55 AM10/24/17
to std-pr...@isocpp.org


On 22 Oct 2017 22:19, "Hyman Rosen" <hyman...@gmail.com> wrote:
> On Sat, Oct 21, 2017 at 6:16 AM, Bo Persson <b...@gmb.dk> wrote:
> > Again, this is standard library code which doesn't have to be portable. It is not UB if your specific compiler defines the behavior.
>
> It's something that wants to be an implementation of a standard library
> function, but it's not an implementation by a specific compiler vendor.
>
> More to the point, if people trying to write useful code have to write
> non-portable code that exhibits UB by the language standard but not
> by the specific compiler, that very much suggests that the language
> standard is deficient or wrong.

Most useful code is not attempting to reinvent a wheel already provided by vendors. 


> And even more to the point, having this sort of UB in the language
> allows new versions of compilers, influenced by the optimizationists,
> to silently break existing code, as happened with old versions of dtoa.

Code that is not portable cannot be expected to work across versions of a compiler. The breakage as such is hardly silent; there are release notes, and optimization flags, and sanitizers. 


> And yet even more, the language of the standard that purports to define
> legal and portable code is so impenetrable that it is impossible to know if
> a program is obeying the rules or not.

If the rules truly are impenetrable, that is an argument for clarification, not removal. 


Hyman Rosen

Oct 24, 2017, 10:44:00 AM10/24/17
to std-pr...@isocpp.org
On Tue, Oct 24, 2017 at 2:36 AM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:
> Most useful code is not attempting to reinvent a wheel already provided by vendors.

Dismissing as irrelevant every piece of code written in a style that you don't
like is not a good way to convince people who don't already believe you of the
correctness of your views.

I don't care that type-based alias analysis may let compilers make code run
faster.  Those compilers have broken programs that had been behaving as
their authors expected for literal decades.  Those compilers have introduced
security errors by removing code that relied on signed integer overflow wrapping.
I think it's only recently that they have started emitting warnings in some cases
about what they're doing, and then they get the warnings wrong:
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55217>.
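
To make the overflow point concrete: a check written as `a + b < a` depends on wrapping, and an optimiser is allowed to assume it never fires for signed operands. A sketch of a check that stays within defined behaviour follows; `add_overflows` is a hypothetical name.

```cpp
#include <limits>

// Reports whether a + b would overflow int, without ever performing the
// overflowing addition (which would be undefined behaviour).
bool add_overflows(int a, int b) {
    if (b > 0) return a > std::numeric_limits<int>::max() - b;
    if (b < 0) return a < std::numeric_limits<int>::min() - b;
    return false;  // adding zero never overflows
}
```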

People on these lists, who are presumably much more expert in C++ than the
average bear, have trouble agreeing on whether some piece of code has defined
behavior.  But we are expected to trust that compiler programmers are getting
this right and are not going to be introducing silently lurking errors into compiled
code.

The standardization process is encouraging vendors to make brittle and broken
tools that fail to carry the intent of the programmers into the compiled programs.
The standardization process is wrapping itself into knots trying to express the
inexpressible.  And all so that optimizationists can point and say "look at that!"

Edward Catmur

Oct 24, 2017, 4:47:35 PM10/24/17
to std-pr...@isocpp.org


On 24 Oct 2017 15:44, "Hyman Rosen" <hyman...@gmail.com> wrote:
> On Tue, Oct 24, 2017 at 2:36 AM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:
> > Most useful code is not attempting to reinvent a wheel already provided by vendors.
>
> Dismissing as irrelevant every piece of code written in a style that you don't
> like is not a good way to convince people who don't already believe you of the
> correctness of your views.

That was not my intention; rewriting standard library facilities is entirely relevant and indeed necessary for some specific use cases. But it is still very much a minority pursuit, so there is little point in concentrating on it at the expense of correctness and performance of ordinary programs.


> I don't care that type-based alias analysis may let compilers make code run
> faster.  Those compilers have broken programs that had been behaving as
> their authors expected for literal decades.  Those compilers have introduced
> security errors by removing code that relied on signed integer overflow wrapping.
> I think it's only recently that they have started emitting warnings in some cases
> about what they're doing, and then they get the warnings wrong:
> <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55217>.

That bug is five years old. Personally I find sanitizers more useful than those warnings. 


> People on these lists, who are presumably much more expert in C++ than the
> average bear, have trouble agreeing on whether some piece of code has defined
> behavior.  But we are expected to trust that compiler programmers are getting
> this right and are not going to be introducing silently lurking errors into compiled
> code.

We have to trust compiler writers regardless, barring formal verification. And formal verification is easier the smaller and stricter the language it proves.


> The standardization process is encouraging vendors to make brittle and broken
> tools that fail to carry the intent of the programmers into the compiled programs.
> The standardization process is wrapping itself into knots trying to express the
> inexpressible.  And all so that optimizationists can point and say "look at that!"

Vendors contribute to the standards process and do so in response to the requirements of their customers. Permissive compilers are entirely possible within the standard, but are not what customers want. 