Fun with unions

fmatthew5876

Dec 12, 2012, 4:37:45 PM

Do you believe the following will work as intended according to the standard?
In other words, we should expect Vec4<float> to contain exactly 4 floats, and
the various union members become syntactic sugar.

For example if you want the middle two floats, you can do v.yz or v.gb.
Or if you do the following, you should expect the conditions to hold true.

v.xy = Vec2<float>(3.0f, 8.0f);
v.x == 3.0f
v.y == 8.0f

Assume Vec2 and Vec3 are defined similarly.

template <typename T>
struct Vec4 {
    Vec4() = default;
    explicit Vec4(T v);
    Vec4(T xx, T yy, T zz, T ww);
    Vec4(const Vec2<T>& xy, T zz, T ww);
    Vec4(T xx, const Vec2<T>& yz, T ww);
    Vec4(T xx, T yy, const Vec2<T>& zw);
    Vec4(const Vec2<T>& xy, const Vec2<T>& zw);
    Vec4(const Vec3<T>& xyz, T ww);
    Vec4(T xx, const Vec3<T>& yzw);
    explicit Vec4(const T* v);

    T& operator[](int i);
    const T& operator[](int i) const;

    enum Constants {
        size = 4,
    };
    typedef T value_type;

    union {
        T m[Vec4<T>::size];
        Vec3<T> xyz;
        Vec2<T> xy;
        struct {
            T x;
            union {
                Vec3<T> yzw;
                Vec2<T> yz;
                struct {
                    T y;
                    union {
                        Vec2<T> zw;
                        struct {
                            T z;
                            T w;
                        };
                    };
                };
            };
        };
        Vec3<T> rgb;
        Vec2<T> rg;
        struct {
            T r;
            union {
                Vec3<T> gba;
                Vec2<T> gb;
                struct {
                    T g;
                    union {
                        Vec2<T> ba;
                        struct {
                            T b;
                            T a;
                        };
                    };
                };
            };
        };
        Vec3<T> uvs;
        Vec2<T> uv;
        struct {
            T u;
            union {
                Vec3<T> vst;
                Vec2<T> vs;
                struct {
                    T v;
                    union {
                        Vec2<T> st;
                        struct {
                            T s;
                            T t;
                        };
                    };
                };
            };
        };
    };
};


fmatthew5876

Dec 12, 2012, 10:42:48 PM

Section 9.2.19 of the standard says:
"If a standard-layout union contains two or more standard-layout
structs that share a common initial sequence, and if the
standard-layout union object currently contains one of these
standard-layout structs, it is permitted to inspect the common
initial part of any of them. Two standard-layout structs share a
common initial sequence if corresponding members have
layout-compatible types and either neither member is a bit-field or
both are bit-fields with the same width for a sequence of one or
more initial members."

So for

union A {
    struct {
        float x, y;
        int z;
    };
    struct {
        float r, g;
        char b;
    };
};

If I understand the "common initial sequence" property correctly, this means
you can safely, and in a standards-compliant way, read and write x and r,
and y and g.

But how about this:

union B {
    float x;
    struct {
        float r;
    };
};

Do the members x and the anonymous struct containing r have a "common initial
sequence" namely of a single float?
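For reference, the case the quoted paragraph does cover can be sketched as follows. This is only an illustration: the names Xyz, Rgb, PointOrColor and read_g_after_writing_y are made up, and named structs are used because anonymous structs are a compiler extension rather than standard C++.

```cpp
#include <cassert>

// Hypothetical names for illustration; both are standard-layout structs.
struct Xyz { float x, y; int z; };
struct Rgb { float r, g; char b; };

// The first two members of each struct (float, float) form a common
// initial sequence; the third members (int vs. char) are not part of it.
union PointOrColor {
    Xyz p;
    Rgb c;
};

// Stores an Xyz in the union, then inspects the common initial part
// through the other struct, which is what the quoted paragraph permits.
inline float read_g_after_writing_y(float value) {
    PointOrColor u = {Xyz{1.0f, value, 0}};  // p is the active member
    return u.c.g;                            // g corresponds to p.y
}
```

Reading u.c.b after storing p would fall outside the common initial sequence, so the sketch deliberately does not do that.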

Bart van Ingen Schenau

Dec 13, 2012, 9:09:34 AM

On Wed, 12 Dec 2012 19:42:48 -0800, fmatthew5876 wrote:

> But how about this:
>
> union B {
> float x;
> struct {
> float r;
> };
> };
>
> Do the members x and the anonymous struct containing r have a "common
> initial sequence" namely of a single float?

No, they don't have a "common initial sequence", because x is not a
member of a struct (which in turn would be a member of the union). So
technically the results would not be guaranteed if you write to r and
then read the value of x.
But, a compiler would have to go out of its way to give it other behaviour
than that which is mandated for common initial sequences, because the
assertion "B b; assert(&b.x == &b.r);" must still hold. There may not be
any padding at the start of a POD-struct and the members of a union must
be allocated as if they are all the initial member.
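A minimal sketch of that layout argument, assuming a named struct R stands in for the anonymous one (the names R, s and members_share_address are hypothetical). It only checks addresses and offsets; it does not read an inactive member.

```cpp
#include <cassert>
#include <cstddef>

struct R { float r; };  // named stand-in for the anonymous struct

union B {
    float x;
    R s;
};

// Checks the two layout facts the argument relies on: every union member
// starts at the address of the union, and a standard-layout struct has
// no padding before its first member.
inline bool members_share_address() {
    B b;
    b.x = 0.0f;
    const void* px = &b.x;
    const void* ps = &b.s;
    return px == ps && offsetof(R, r) == 0;
}
```

Note that this says something about where the members live, not about whether a value written through one may be read through the other.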

Bart v Ingen Schenau

Johannes Sixt

Dec 14, 2012, 12:00:57 PM

On 13 Dec., 15:09, Bart van Ingen Schenau
<b...@ingen.ddns.info.invalid> wrote:
> On Wed, 12 Dec 2012 19:42:48 -0800, fmatthew5876 wrote:
> > But how about this:
>
> > union B {
> > float x;
> > struct {
> > float r;
> > };
> > };
>
> > Do the members x and the anonymous struct containing r have a "common
> > initial sequence" namely of a single float?
>
> No, they don't have a "common initial sequence", because x is not a
> member of a struct (which in turn would be a member of the union). So
> technically the results would not be guaranteed if you write to r and
> then read the value of x.
> But, a compiler would have to go out of its way to give it other behaviour
> than that which is mandated for common initial sequences, because the
> assertion "B b; assert(&b.x == &b.r);" must still hold.

But this is still not immune against aliasing problems: If you write a new
value to x, then read from r, you trigger undefined behavior as per 3.10p10
because of the lack of other guarantees, no?

-- Hannes

fmatthew5876

Dec 14, 2012, 7:34:34 PM

On Friday, December 14, 2012 12:00:57 PM UTC-5, Johannes Sixt wrote:
> On 13 Dez., 15:09, Bart van Ingen Schenau
> <b...@ingen.ddns.info.invalid> wrote:

> But this is still not immune against aliasing problems: If you write a new
> value to x, then read from r, you trigger undefined behavior as per 3.10p10
> because of the lack of other guarantees, no?

x and r have the same address. Does this not imply that you can read and
write them interchangeably?

Jeff Flinn

Dec 17, 2012, 2:48:44 PM

On 12/14/2012 7:34 PM, fmatthew5876 wrote:
> On Friday, December 14, 2012 12:00:57 PM UTC-5, Johannes Sixt wrote:
>> On 13 Dez., 15:09, Bart van Ingen Schenau
>> <b...@ingen.ddns.info.invalid> wrote:
>
>> But this is still not immune against aliasing problems: If you write a new
>> value to x, then read from r, you trigger undefined behavior as per 3.10p10
>> because of the lack of other guarantees, no?
>
> x and r have the same address, does this not imply you can read and
> write to them interchangeably?

My understanding is no. Google discriminated unions.
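For readers unfamiliar with the term, a discriminated (tagged) union might look roughly like the sketch below. The type IntOrFloat and its member names are hypothetical; the point is only that a tag records which member was written last, so code reads nothing but the active member.

```cpp
#include <cassert>

// Hypothetical tagged union: 'kind' remembers which member is active.
struct IntOrFloat {
    enum Kind { kInt, kFloat };
    Kind kind;
    union {          // anonymous unions are standard C++
        int i;
        float f;
    };

    void set(int v)   { i = v; kind = kInt; }
    void set(float v) { f = v; kind = kFloat; }

    bool holds_int() const   { return kind == kInt; }
    bool holds_float() const { return kind == kFloat; }
};
```

A caller checks holds_int() or holds_float() before touching i or f, which keeps every read on the member that was last written.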

Jeff

fmatthew5876

Dec 17, 2012, 7:29:47 PM

On Monday, December 17, 2012 2:48:44 PM UTC-5, Jeff Flinn wrote:
> My understanding is no. Google discriminated unions.
>

Sorry, I should have added that x and r have the same address and the
same type. If they have different types, all bets are off.

Daniel Krügler

Dec 18, 2012, 9:27:35 AM

On 2012-12-18 01:29, fmatthew5876 wrote:
> On Monday, December 17, 2012 2:48:44 PM UTC-5, Jeff Flinn wrote:
>> My understanding is no. Google discriminated unions.
>
> Sorry I should have added to that, x and r have the same address and
> the same type. If they have different types all bets are off.

They *do* have the same address (even though this is nowhere directly
expressed yet), but your incorrect assumption (at least as I
understand your claim) is that you can *write* an object as X and
then *read* it as Y (where X and Y are different types). Modulo
some special rules regarding arrays of (unsigned) char, such accesses
are restricted by the aliasing constraints of C++; see also

http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1116
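When the goal really is to reinterpret the bytes of one type as another, a commonly used route that stays within the rules is to copy the object representation with std::memcpy instead of reading through a differently typed lvalue. A minimal sketch, assuming a 32-bit IEEE 754 float (the function name float_bits is made up for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Assumptions of this sketch, checked at compile time.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");
static_assert(std::numeric_limits<float>::is_iec559,
              "sketch assumes IEEE 754 floats");

// Copies the bytes of a float into an integer of the same size.
// No object is read through an lvalue of an unrelated type.
inline std::uint32_t float_bits(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Under the stated assumptions, float_bits(1.0f) yields 0x3f800000.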

HTH & Greetings from Bremen,

Daniel Krügler

fmatthew5876

Jan 12, 2013, 2:03:02 PM

On Tuesday, December 18, 2012 9:27:35 AM UTC-5, Daniel Krügler wrote:
> On 2012-12-18 01:29, fmatthew5876 wrote:
>
> > On Monday, December 17, 2012 2:48:44 PM UTC-5, Jeff Flinn wrote:
>
> >> My understanding is no. Google discriminated unions.

Let's try to prove for a simple case that we can read and write
two different union members interchangeably.

Consider the following:

template <typename T>
struct vec2 {
    union {
        T m[2];
        struct {
            T x;
            T y;
        };
    };
};

We know that &m[0] == &x. In other words, the address of the array
of T m and the address of the T x are the same. What about &m[1] and
&y? m[1] and y have the same type, namely T. If we can show that
&m[1] == &y, then we can safely read and write m[1] and y
interchangeably.

We can prove that &m[1] == &y by proving that
(&m[1] - &m[0]) == (&y - &x).

First, let's consider (&m[1] - &m[0]). This quantity is required
to be sizeof(T). The standard mandates that the size of all
objects be large enough so that they can be stored in an array
so that alignment constraints are satisfied without any *external*
padding. Without this hard requirement, simple pointer arithmetic
would no longer work.

How about (&y - &x)? First, we cannot have (&y - &x) < sizeof(T).
If we did, then part of x would overlap with y, which is
prohibited. Can we have (&y - &x) > sizeof(T)?

Assume (&y - &x) > sizeof(T), then the compiler must have inserted
some padding between x and y.

Here is a quote from 9.2.14 of the C++11 standard.
"Nonstatic data members of a (non-union) class with the same
access control (Clause 11) are allocated so that later
members have higher addresses within a class object. The
order of allocation of non-static data members with
different access control is unspecified (11). Implementation
alignment requirements might cause two adjacent members not
to be allocated immediately after each other; so might
requirements for space for managing virtual functions
(10.3) and virtual base classes (10.1)."

This paragraph states two possible reasons for a class/struct
to pad elements. The first is for managing virtual functions.
vec2 has no virtual functions, so the compiler cannot add padding
for this reason. The only other reason the compiler is allowed
to add padding is for "implementation alignment requirements."

Are there any implementation alignment requirements that may
require padding between y and x of type T? The answer is no
because if there were, you could not create an array of type T.

The standard does not say the compiler is free to add padding
for any reason it wants. Only for virtual and alignment, which
we showed do not apply here.

Therefore, the compiler cannot add padding between y and x.
Therefore, (&y - &x) is not > sizeof(T), we also showed (&y-&x)
is not < sizeof(T).
Therefore, (&y - &x) = sizeof(T) = (&m[1] - &m[0]).
Since &m[0] == &x, (&y - &x) = (&m[1] - &x)
which implies &y = &m[1].
and finally since y and m[1] have the same address and same
type, they can be read and written from interchangeably.

Please go ahead and destroy my proof. Let's figure this out.

(Either way this exercise is entirely academic, no sane
compiler is going to add extra padding between 2 struct
members of the same type).
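Whatever the standard does or does not guarantee, the layout claim can at least be checked for a given implementation at compile time. The following is a rough sketch under that assumption: Pair is a hypothetical named stand-in for the anonymous struct { T x; T y; }, and the static_assert documents an expectation about this compiler rather than proving anything about the language.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical named stand-in for the anonymous struct { T x; T y; }.
template <typename T>
struct Pair {
    T x;
    T y;
};

// True on this implementation if y sits exactly sizeof(T) bytes after
// the start of the struct and the struct is the same size as T[2].
template <typename T>
constexpr bool packs_like_array() {
    return offsetof(Pair<T>, y) == sizeof(T) &&
           sizeof(Pair<T>) == sizeof(T[2]);
}

// Fails to compile on an implementation that pads between x and y.
static_assert(packs_like_array<float>(),
              "Pair<float> is not laid out like float[2] here");
```

A check like this turns a silent layout assumption into a build error, but it says nothing about whether reading one union member after writing another is permitted.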

Gerhard Fiedler

Jan 13, 2013, 4:07:51 AM

On 2013-01-12 13:03:02, in comp.lang.c++.moderated fmatthew5876 wrote:

> On Tuesday, December 18, 2012 9:27:35 AM UTC-5, Daniel Krügler wrote:
>
> Lets try to prove for a simple case that we can read and write
> two different union members interchangably.

AIUI, proving that the addresses are the same doesn't prove that we can
read and write interchangeably. Couldn't it be that the compiler assumes
(based on the standard) that this is not possible and uses some
optimizations that only work if the assumption is correct?

IMO, what you're trying to prove here is that a struct with multiple
members of the same type has the same size (and therefore alignment) as
an array. That's something quite different than what you state in this
phrase.

> Here is a quote from 9.2.14 of the C++11 standard.
> "Nonstatic data members of a (non-union) class with the same
> access control (Clause 11) are allocated so that later
> members have higher addresses within a class object. The
> order of allocation of non-static data members with
> different access control is unspecified (11). Implementation
> alignment requirements might cause two adjacent members not
> to be allocated immediately after each other; so might
> requirements for space for managing virtual functions
> (10.3) and virtual base classes (10.1)."
>
> This paragragh states two possible reasons for a class/struct
> to pad elements. The first is for managing virtual functions.
> vec2 has no virtual functions so the compiler cannot add padding
> for this reason. The only other reason the compiler is allowed
> to add padding is for "implementation alignment requirements."
>
> Are there any implementation alignment requirements that may
> require padding between y and x of type T? The answer is no
> because if there were, you could not create an array of type T.
>
> The standard does not say the compiler is free to add padding
> for any reason it wants. Only for virtual and alignment, which
> we showed do not apply here.
>
> Therefore, the compiler cannot add padding between y and x.

I'm not sure you interpreted "implementation alignment requirements"
correctly. This is not a black/white question; there are degrees of
requirements: there are possible alignments, and there are optimal
alignments.

Couldn't it be that aligning the objects as required by an array is
suboptimal, and in the array the compiler aligns them suboptimally
because that is required by the standard, but follows the "alignment
requirements" for optimal alignment of the given platform for the struct
case (because it can, according to the standard, in the way I understand
this)?

Gerhard

Francis Glassborow

Jan 13, 2013, 4:04:54 AM

But it notably does not state that these are the only reasons. It just
gives two common ones. So your argument fails.

BTW one reason compilers add padding is for purposes of debugging by
ensuring that distinct objects have space between them which allows
checking for writing one beyond the end of an array (and single objects
are arrays of one -- goes all the way back to C)

Richard Damon

Jan 13, 2013, 4:33:29 AM

On 1/12/13 2:03 PM, fmatthew5876 wrote:

Let me put forward a viable case that is a counter example.

Let T be "long double", implemented as a 10 byte IEEE float. It might
have a "hard" alignment requirement of 2, but this type of access causes
some slow down to fetch, so the compiler might impose an opportunistic
16 byte alignment. Because of the rules for arrays, m[1] will be aligned
for a valid, but slow access, but y might be spaced with 6 byte of
padding so that it can use the faster access.

Note also that your proof invokes undefined behavior: &y - &x is not
defined, because subtraction of pointers has defined behavior only if
the two operands point to elements of the same array. (Also note that
by definition &m[1] - &m[0] == 1, since pointer difference works as the
converse of pointer addition, and (&m[0]) + 1 == &m[1].)

The standard imposes few limitations on what the implementation may
choose for alignment rules. There is no explicit requirement that
"Implementation alignment requirements" force objects to be at their
minimum alignment requirement. Also, the phrase "Implementation
alignment requirements might cause two adjacent members not to be
allocated immediately after each other;" is written in a permissive
manner, not restrictive, and contains no solid constraints on the
implementation (it contains no MUST or SHALL, or even a SHOULD). To me
it seems more an explanatory comment warning the programmer that they
should NOT make the assumption that elements are packed next to each
other.
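The array half of that remark can be shown with a short sketch (the helper names element_distance and array_has_no_padding are made up for illustration). Pointer subtraction is applied only to elements of the same array, which is the case where it is defined.

```cpp
#include <cassert>
#include <cstddef>

// Difference of pointers to adjacent elements of one array, counted in
// elements, not bytes.
template <typename T>
std::ptrdiff_t element_distance() {
    T m[2] = {};
    return &m[1] - &m[0];
}

// Arrays have no padding between elements, so T[2] is exactly two Ts.
template <typename T>
constexpr bool array_has_no_padding() {
    return sizeof(T[2]) == 2 * sizeof(T);
}
```

No comparable expression is written for &y - &x here, since x and y are not elements of a common array.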

fmatthew5876

Jan 13, 2013, 9:23:37 PM

> AIUI, proving that the addresses are the same doesn't prove that we can
> read and write interchangeably. Couldn't it be that the compiler assumes
> (based on the standard) that this is not possible and uses some
> optimizations that only work if the assumption is correct?
>

But would that not violate the fact that if two pointers of the same
type point to the same object then they alias the same address?

>
>
> IMO, what you're trying to prove here is that a struct with multiple
> members of the same type has the same size (and therefore alignment) as
> an array. That's something quite different than what you state in this
> phrase.
>

If the struct and array were the same size and alignment, would it not
then prove that the 2 members of the union can be used interchangeably?

>
> Couldn't it be that aligning the objects as required by an array is
> suboptimal, and in the array the compiler aligns them suboptimally
> because that is required by the standard, but follows the "alignment
> requirements" for optimal alignment of the given platform for the struct
> case (because it can, according to the standard, in the way I understand
> this)?

While this idea of suboptimal alignment may be possible in theory, why
would any compiler choose to make arrays suboptimal? Arrays are the
bread and butter of anyone trying to write high-performance,
cache-friendly code.

fmatthew5876

Jan 14, 2013, 12:34:02 AM

{ Reformatted; please limit your lines to 70 characters. Quoted banner
removed -mod }

> But it notably does not state that these are the only reasons. It
> just gives two common ones. So your argument fails.

That's probably true; the argument requires a certain interpretation
of the standard.

> BTW one reason compilers add padding is for purposes of debugging by
> ensuring that distinct objects have space between them which allows
> checking for writing one beyond the end of an array (and single
> objects are arrays of one -- goes all the way back to C)

Do they actually ever do this for classes and structs, though? It seems
like it could add a lot of overhead. I've only seen this done on the
stack, to protect against buffer overflows. Even if debug padding were
added, the compiler would probably only add it to array members, not
to single members of a type T.

fmatthew5876

Jan 14, 2013, 10:12:17 AM

> Let T be "long double", implemented as a 10 byte IEEE float. It
> might have a "hard" alignment requirement of 2, but this type of
> access causes some slow down to fetch, so the compiler might impose
> an opportunistic 16 byte alignment. Because of the rules for arrays,
> m[1] will be aligned for a valid, but slow access, but y might be
> spaced with 6 byte of padding so that it can use the faster access.

In that case we would have sizeof(long double) == 16. I think they
would use internal padding to fulfill the alignment constraint. The
long double itself may only use the first/last n-bits of the storage
internally (endian), essentially being internally padded.

Again we have this notion of arrays as second class citizens. An array
whether allocated dynamically or created statically on the stack is
the fastest possible data structure to use when iterating over a
collection of elements 1 at a time.

Finally also consider the new alignof() operator. alignof(T) is one
value per type T. It doesn't change if T is allocated on the stack, as
a class member, dynamically allocated, or put into an array.
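A small sketch of what alignof reports, with a hypothetical wrapper type Holder. The array case is fixed by the language (alignof applied to an array type gives the alignment of the element type); for the wrapper, the sketch only assumes the struct is at least as strictly aligned as its member.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical wrapper used to look at T as a class member.
template <typename T>
struct Holder {
    T value;
};

// alignof on an array type yields the alignment of its element type.
template <typename T>
constexpr bool array_alignment_matches_element() {
    return alignof(T[4]) == alignof(T);
}

// A struct cannot be less strictly aligned than one of its members.
template <typename T>
constexpr bool holder_alignment_at_least_member() {
    return alignof(Holder<T>) >= alignof(T);
}
```

Both checks concern the required alignment of a type; neither says where a particular implementation will actually place an object.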

> Note also in your proof, you have invoked undefined behavior, as &y
> - &x is not defined, - for pointers only has defined behavior if the
> two operands are members of the same array. (Also note that by
> definition &m[1] - &m[0] == 1, since the difference of pointers work
> as the converse of pointer addition, and (&m[0])+1 = &m[1].

I believe if you cast them to character pointers you can do pointer
arithmetic on any pointers. You could also cast to intptr_t. That's
what my intention was.

> The standard imposes few limitations on what the implementation may
> chose for alignment rules. There is no explicit requirement that
> "Implementation alignment requirements" force object to be at their
> minimum alignment requirement. Also, the phrase "Implementation
> alignment requirements might cause two adjacent members not to be
> allocated immediately after each other;" is written in a permissive
> manner, not restrictive, and contains no solid constraints on the
> implementation (it contains no MUST or SHALL, or even a SHOULD), but
> to me seems more an explanatory comment warning the programmer that
> they should NOT make the assumption that elements are packed next to
> each other.

This is true. I admit the shakiest part of the argument is the
rather liberal interpretation of the standard with regard to
structure padding. It's true that it doesn't say you cannot add
padding for other reasons.

It would be nice if the standard offered more clarity on this subject.
Casting through unions is a popular and useful technique.

Gerhard Fiedler

Jan 15, 2013, 12:29:41 AM

On 2013-01-13 20:23:37, in comp.lang.c++.moderated fmatthew5876 wrote:

>> AIUI, proving that the addresses are the same doesn't prove that we
>> can read and write interchangeably. Couldn't it be that the
>> compiler assumes (based on the standard) that this is not possible
>> and uses some optimizations that only work if the assumption is
>> correct?
>>
> But would that not violate the fact that if two pointers of the same
> type point to the same object then they alias the same address?

I don't understand what would violate that fact. AIUI, a fact is that
the C++ standard doesn't guarantee any specific behavior for reading a
union member other than the one that was last written to. And AIUI the
compiler is free to perform optimizations that are based on the
assumption that you don't read a union member other than the last
written to. No?

>> IMO, what you're trying to prove here is that a struct with
>> multiple members of the same type has the same size (and therefore
>> alignment) as an array. That's something quite different than what
>> you state in this phrase.
>>
> If the struct and array were the same size and alignment, would it
> not then prove that the 2 members of the union can be used
> interchangeably?

Not AIUI -- see the argument about possible optimizations that the
compiler may use. I don't have any example at hand, but compiler
optimizations are so many that probably nobody knows them all for all
common compilers, and IMO it's best not to assume that the compiler
will behave in a given way unless that's defined by the standard or
the compiler docs.

>> Couldn't it be that aligning the objects as required by an array is
>> suboptimal, and in the array the compiler aligns them suboptimally
>> because that is required by the standard, but follows the
>> "alignment requirements" for optimal alignment of the given
>> platform for the struct case (because it can, according to the
>> standard, in the way I understand this)?
>
> While this idea of suboptimal alignment may be possible in theory
> why would any compiler choose to make arrays sub optimal? Arrays are
> the bread and butter of anyone trying to write high performance
> cache friendly code.

Because with arrays, the compiler doesn't really have a choice. Assume
T to be char[7]. It has a size of 7. An array of these things will
have them placed at intervals of 7. If best alignment requires an
alignment of, say, 8, this can be used with a struct, but not with an
array. Am I missing something?

(FWIW, it seems that if you are writing high performance code on such
a platform, you better avoid arrays of such types. Even if you only
need 7 bytes in your T, it may be advantageous to make it 8 bytes
wide.)
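The "make it 8 bytes wide" idea can be sketched with C++11 alignas. The type names Raw7 and Padded7 are hypothetical, and the exact sizes are what mainstream compilers are expected to produce, not something this sketch guarantees for every platform.

```cpp
#include <cassert>
#include <cstddef>

// Seven bytes of payload with the default (char) alignment.
struct Raw7 {
    char bytes[7];
};

// Same payload, but the type itself asks for 8-byte alignment, so its
// size is rounded up to a multiple of 8 and array elements stay aligned.
struct alignas(8) Padded7 {
    char bytes[7];
};
```

With Padded7 the padding is part of the type, so an array of Padded7 places every element on an 8-byte boundary, at the cost of one unused byte per element.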

Gerhard

fmatthew5876

Jan 15, 2013, 3:04:43 PM

{ Reformatted; please limit your lines to 70 characters. -mod }

> I don't understand what would violate that fact. AIUI, a fact is
> that the C++ standard doesn't guarantee any specific behavior for
> reading a union member other than the one that was last written
> to. And AIUI the

It doesn't guarantee any behavior sure, but does it guarantee
undefined behavior even if the addresses match?

> compiler is free to perform optimizations that are based on the
> assumption that you don't read a union member other than the last
> written to. No?

If reading and writing 2 union members invokes undefined behavior even
if they actually alias the same memory of the same type, then you
could just do this.

vec2<float> v;

// This is legal: take the addresses of 2 members of the union
float* a = &v.m[1];
float* b = &v.y;

// a and b point to the same memory as argued above and have the same type
assert(a == b);
// Therefore they alias the same object
assert(*a == *b);

Assuming my argument about the addresses of m[1] and y being the same
is correct, this should work as intended. We now have 2 pointers of
type float* which point to the same address. There would be something
seriously deficient if a == b did not imply *a == *b where a and b
are of the same pointer type.

It looks like if you wanted to be really pedantic and avoid undefined
behavior you could just do this every time you want to read and write
a union member.

*(&m[1]) = something;
*(&y) = something;

*(&y) == *(&m[1]);

Why does that work? Because as soon as you take the address of the
member with & you are no longer working with a member of a union but
instead a pointer to a T. The first two lines are simply dereferencing
and assigning to a pointer and the last one is dereferencing and
comparing two pointers. No undefined behavior is invoked.

Of course all of this is rather silly. I think what we really need is
a little clarification in the standard on unions and what happens if
you try to read and write different union members.

I really like being able to alias all of the members of a vector like
the vec4 in the original post and I see no reason why it should be a
bad thing.

If you still don't believe me how about this:

void foo(float* a, float* b) {
    assert(*a == *b);
}

float* a = &v.m[1];
float* b = &v.y;

foo(a, b);

Does the foo function now automatically invoke undefined behavior
because the pointers passed to it happened to come from union members?
That's impossible because foo has no idea where its pointers come
from.

> Because with arrays, the compiler doesn't really have a
> choice. Assume T to be char[7]. It has a size of 7. An array of
> these things will have them placed at intervals of 7. If best
> alignment requires an alignment of, say, 8, this can be used with a
> struct, but not with an array. Am I missing something?

The alignment requirement of char is 1, so the alignment of char[7]
will also be 1. Therefore the alignment of char[2][7] is also 1. There
is no reason the compiler would need to align this object on an 8-byte
boundary because it's nothing more than an array of bytes. char[2][7]
is really just a char[14] with built-in pointer math to compute the
indexing. The fact that it is contained within a struct doesn't make
any difference.

The following code outputs "align 1 size 14" on gcc 4.7, linux,
x86_64. g++ -std=c++11 test.cpp

#include <cstdio>

template <typename T>
struct vec2 {
    union {
        T m[2];
        struct {
            T x;
            T y;
        };
    };
};

int main() {
    vec2<char[7]> v;

    // %zu is the format for size_t; standard alignof takes a type,
    // hence decltype(v).
    std::printf("align %zu size %zu\n", alignof(decltype(v)), sizeof(v));
    return 0;
}

> (FWIW, it seems that if you are writing high performance code on
> such a platform, you better avoid arrays of such types. Even if you
> only need 7 bytes in your T, it may be advantageous to make it 8
> bytes wide.)

If you are writing high performance code on any platform you should be
using arrays (static or dynamic (aka std::vector)). Anything else will
be destroying your cache. Why on earth would structs be laid out more
optimally? Are we expecting people to do this if they want to process
10000 T's?

template <typename T>
struct {
    T t0;
    T t1;
    ...
    T t9999;
};

Frank Birbacher

Jan 15, 2013, 3:13:14 PM

Hi!

On 15.01.13 06:29, Gerhard Fiedler wrote:

> Because with arrays, the compiler doesn't really have a choice.
> Assume T to be char[7]. It has a size of 7. An array of these
> things will have them placed at intervals of 7.

Well, sizeof(char[7]) is in fact 7, but the alignment of an array is
the same as the alignment of its members. So char[7] has the alignment
of char, which is 1. So for your argument of suboptimal array
alignment the char[7] is a bad example.

A struct { int i; char c; } might do. But then on my compiler sizeof
this struct is 8 which again matches its alignment. So still optimal
array alignment. And sizeof(long double) == 16. So still no example
here for anyone who expected 12.

I don't know of a compiler that does suboptimal array alignment.
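The property behind those numbers can be sketched as a compile-time check: the size of a complete object type is a multiple of its alignment, which is what lets array elements sit back to back and still be aligned. The names IntChar and size_is_multiple_of_alignment are made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// The struct from the example above, given a name.
struct IntChar {
    int i;
    char c;
};

// sizeof(T) includes any tail padding, so consecutive array elements of
// T each start on a properly aligned address.
template <typename T>
constexpr bool size_is_multiple_of_alignment() {
    return sizeof(T) % alignof(T) == 0;
}
```

The concrete values of sizeof(IntChar) and sizeof(long double) remain implementation specific, so the sketch checks only the relationship between size and alignment.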

Frank

Richard Damon

Jan 16, 2013, 12:24:31 AM

On 1/14/13 10:12 AM, fmatthew5876 wrote:
>> Let T be "long double", implemented as a 10 byte IEEE float. It
>> might have a "hard" alignment requirement of 2, but this type of
>> access causes some slow down to fetch, so the compiler might impose
>> an opportunistic 16 byte alignment. Because of the rules for
>> arrays, m[1] will be aligned for a valid, but slow access, but y
>> might be spaced with 6 byte of padding so that it can use the
>> faster access.
>
> In that case we would have sizeof(long double) == 16. I think they
> would use internal padding to fulfill the alignment constraint. The
> long double itself may only use the first/last n-bits of the storage
> internally (endian), essentially being internally padded.

No, the proposed implementation only reserves 10 bytes for the long
double. If the long double is followed by other smaller items, they
might get packed into the same 16-byte chunk. If sizeof(long
double) in this case were made 16, then 6 bytes would be wasted for
every long double allocated, since the padding bytes could NOT be
shared with other variables.

> Again we have this notion of arrays as second class citizens. An
> array whether allocated dynamically or created statically on the
> stack is the fastest possible data structure to use when iterating
> over a collection of elements 1 at a time.

Where is there any promise that an array is the "fastest possible data
structure"? In the case I am describing there is a natural trade-off:
the 10-byte value can be allocated so as to be either "fast" or
"small". It may make sense that for single values fast is good,
especially if the wasted space can often be reclaimed by reordering
where variables are stored (which is allowed if they are not in a
struct), but when creating an array the size consideration becomes
more important. The wasted space might not be that much in the
single-variable case, as the alignment rule might well be that speed
is lost only if the 10-byte number crosses a 16-byte boundary, giving
even more options for not wasting the space.

> Finally also consider the new alignof() operator. alignof(T) is one
> value per type T. It doesn't change if T is allocated on the stack,
> as a class member, dynamically allocated, or put into an array.

alignof() does not promise that such an alignment is "optimal", only
that it is allowed. In the case described, alignof(long double) is 2;
this does not preclude the compiler from adding more padding to
optimize placement.

For some machines, reading a multi-byte value from an improperly
aligned address causes a malfunction (trap, or wrong value read),
implementations for these machines must have a value of alignof() big
enough to make it work. Other machines may support the non-aligned
read, but at some cost, on these the implementation has a choice of
what to do with alignof(), it could be 1, indicating that the
unaligned access is permissible, or it could make it higher, ignoring
that the hardware could support the unaligned access, but if it does,
then the compiler must use the higher value itself when it is
allocating objects. Note that if the implementation chose to make the
value of alignof() 1 in this case, there is no requirement in the
standard that would prohibit the addition of padding bytes between
objects to make the accesses of the multi-byte object more efficient.

>> Note also in your proof, you have invoked undefined behavior, as &y
>> - &x is not defined, - for pointers only has defined behavior if
>> the two operands are members of the same array. (Also note that by
>> definition &m[1] - &m[0] == 1, since the difference of pointers
>> work as the converse of pointer addition, and (&m[0])+1 = &m[1].
>
> I believe if you cast them to character pointers you can do pointer
> arithmetic on any pointers. You could also cast to intptr_t. That's
> what my intention was.

Casting the pointers will make the difference within the array come
out as sizeof(T), that is true. If x and y were unrelated variables,
the cast would not make their difference defined. I would have to look
more closely at the standard to see whether the fact that x and y are
members of the same struct makes the difference of their addresses,
cast to char*, defined (since the struct can be viewed as an array of
char).
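
As a hedged sketch of the two cases, assuming a hypothetical struct
Point (not from this thread): for members of a standard-layout struct,
offsetof gives the byte distance without subtracting member addresses
at all, and for array elements the char-pointer difference is sizeof(T).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical standard-layout struct, used only for illustration.
struct Point {
    float x;
    float y;
};

// Byte distance from x to y, computed from offsets rather than by
// subtracting pointers to two different members.
constexpr std::ptrdiff_t byte_distance_xy() {
    return static_cast<std::ptrdiff_t>(offsetof(Point, y)) -
           static_cast<std::ptrdiff_t>(offsetof(Point, x));
}

// For two adjacent elements of the same array, the difference of the
// char pointers is sizeof(float).
inline std::ptrdiff_t array_stride_bytes() {
    float m[2] = {0.0f, 0.0f};
    return reinterpret_cast<const char*>(&m[1]) -
           reinterpret_cast<const char*>(&m[0]);
}
```

Whether byte_distance_xy() equals sizeof(float) is exactly the padding
question discussed above, so the sketch does not assume it.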

Note that casting to intptr_t does NOT make your result hold. About
all the standard promises for intptr_t is that you can (if the type
exists) convert a pointer to intptr_t and then convert that exact same
value back to a pointer that will then be equivalent. There is no
promise that math on the intptr_t relates in any way to math on the
pointer. This particularly won't hold for some cases on segmented
architectures.
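
A small sketch of the one thing that is promised, with a hypothetical
helper named round_trip: convert to std::intptr_t and back, without
doing any arithmetic on the integer in between.

```cpp
#include <cassert>
#include <cstdint>

// Converts a pointer to std::intptr_t (an optional type) and back.
// No arithmetic is performed on the integer value, since the standard
// does not say how such arithmetic relates to pointer arithmetic.
inline int* round_trip(int* p) {
    std::intptr_t as_integer = reinterpret_cast<std::intptr_t>(p);
    return reinterpret_cast<int*>(as_integer);
}
```

Something like round_trip(&a) - round_trip(&b) on the integer values
would be outside that guarantee.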

>> The standard imposes few limitations on what the implementation may
>> choose for alignment rules. There is no explicit requirement that
>> "Implementation alignment requirements" force objects to be at their
>> minimum alignment requirement. Also, the phrase "Implementation
>> alignment requirements might cause two adjacent members not to be
>> allocated immediately after each other;" is written in a permissive
>> manner, not restrictive, and contains no solid constraints on the
>> implementation (it contains no MUST or SHALL, or even a SHOULD),
>> but to me seems more an explanatory comment warning the programmer
>> that they should NOT make the assumption that elements are packed
>> next to each other.
>
> This is true, I admit the shakiest part of the argument is the
> rather liberal interpretation of the standard with regards to
> structure padding. It's true that it doesn't say you cannot add
> padding for other reasons.
>
> It would be nice if the standard offered more clarity on this
> subject. Casting through unions is a popular and useful technique.

The key is that the standard only promises what it is willing to force
all implementations to do. The standard intentionally gives
implementations room to do things to generate better code for their
platform. In general, this form of "type punning" is going to be
implementation dependent anyway, so having the behavior be "undefined"
by the language standard isn't that bad if you have other
documentation/standards defining it.

The important part is that your program works, and that you know
where and when it will work. I suspect that the percentage of programs
that are "strictly conforming", relying on no implementation-defined
behavior, is vanishingly small. Implementation-defined behavior isn't
bad, but depending on it without knowing it can be, as you can be
caught unawares if that behavior changes (due to upgrades, porting to
new systems, etc.).
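
For comparison, here is a sketch of a way to reinterpret the bytes of an
object that does not go through a union: copying the object
representation with std::memcpy. The helper names float_to_bits and
bits_to_float are my own, and the static_assert states the one
assumption the sketch makes about the platform.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption of this sketch: float is 32 bits wide on this platform.
static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Copies the bytes of a float into an integer of the same size.
inline std::uint32_t float_to_bits(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}

// Copies the bytes back, recovering the original float.
inline float bits_to_float(std::uint32_t bits) {
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

The meaning of the resulting bit pattern still depends on the
implementation's floating-point format; only the copy itself is
portable.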

Gerhard Fiedler

Jan 16, 2013, 9:10:38 AM

On 2013-01-15 14:13:14, in comp.lang.c++.moderated Frank Birbacher
wrote:

> Am 15.01.13 06:29, schrieb Gerhard Fiedler:
>> Because with arrays, the compiler doesn't really have a choice.
>> Assume T to be char[7]. It has a size of 7. An array of these
>> things will have them placed at intervals of 7.
>
> Well, sizeof(char[7]) is in fact 7, but the alignment of an array is
> the same as the alignment of its members. So char[7] has the
> alignment of char, which is 1. So for your argument of suboptimal
> array alignment the char[7] is a bad example.

I'm not sure I follow you. I postulated a platform where optimal
alignment is 8. How would an array of char[7] members be aligned
optimally?

Gerhard

Gerhard Fiedler

Jan 16, 2013, 9:17:08 AM

On 2013-01-15 14:04:43, in comp.lang.c++.moderated fmatthew5876 wrote:

>> I don't understand what would violate that fact. AIUI, a fact is
>> that the C++ standard doesn't guarantee any specific behavior for
>> reading a union member other than the one that was last written
>> to. And AIUI the
>
> It doesn't guarantee any behavior sure, but does it guarantee
> undefined behavior even if the addresses match?

No. The standard doesn't define what happens with such an access -- it
just doesn't require it to work. Any given compiler is free to allow
such a pattern.

>> compiler is free to perform optimizations that are based on the
>> assumption that you don't read a union member other than the last
>> written to. No?
>
> If reading and writing 2 union members invokes undefined behavior
> even if they actually alias the same memory of the same type, then
> you could just do this.
>
> vec2<int> v;
>
> //This is legal, take the address of 2 members of the union
> float* a = &v.m[1];
> float* b = &v.y;

Assigning the address of an int to a float*? What are you doing here?

> //a and b point to the same memory as argued above and have same type
> assert(a == b);

This statement compares uninitialized memory.

> //Therefore they alias the same object
> assert(*a == *b)
>
> Assuming my argument about the addresses of m[1] and y being the
> same is correct, this should work as intended. We now have 2
> pointers of type float which point to the same address. There would
> be something seriously deficient if a == b did not imply *a == *b
> where a and b are the same pointer type.

The problem here is that the standard says that only the member that
has last been written may be read. So first you need to write one,
then you can read it -- it, not the other(s).

Again, a compiler is free to allow this, but the standard doesn't seem
to require that it works.

Could you state your goal here? Precisely, unambiguously? Is it to
prove that the standard requires that this works, even though it has
some seemingly clear language that it doesn't have to work?

> It looks like if you wanted to be really pedantic and avoid
> undefined behavior you could just do this every time you want to
> read and write a union member.
>
> *(&m[1]) = something;
> *(&y) = something;
>
> *(&y) == *(&m[1]);

The last line here seems to violate the standard. The last member to
be written to is the struct (for which you forgot to define a name),
but you're reading the array m (which specifically is not required to
work).

> Why does that work? Because as soon as you take the address of the
> member with & you are no longer working with a member of a union but
> instead a pointer to a T.

Which points to one or the other member of the union. AIUI, a pointer
to y accesses (reads or writes) y, and a pointer to m[1] accesses
m[1].

> I really like being able to alias all of the members of a vector
> like the vec4 in the original post and I see no reason why it should
> be a bad thing.

Who said it's a bad thing? I've done it before, and it seems that most
compilers/platforms tolerate such access patterns. But it also seems
that the standard doesn't require that they do.
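
For what it's worth, here is a sketch of one way to get named access
without relying on reading a union member other than the last one
written. The type Vec2Acc is hypothetical and is not the Vec2 from the
original post: it stores only the array and exposes x() and y() as
references into it, at the cost of writing v.x() instead of v.x.

```cpp
#include <cassert>

// Hypothetical alternative to the union-based vector: a single array
// is the only storage, and the named accessors refer into that array.
template <typename T>
struct Vec2Acc {
    T m[2];

    T& x() { return m[0]; }
    T& y() { return m[1]; }
    const T& x() const { return m[0]; }
    const T& y() const { return m[1]; }
};
```

Here v.y() and v.m[1] name the same object, so writing through one and
reading through the other raises no union question.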

>> Because with arrays, the compiler doesn't really have a choice.
>> Assume T to be char[7]. It has a size of 7. An array of these
>> things will have them placed at intervals of 7. If best alignment
>> requires an alignment of, say, 8, this can be used with a struct,
>> but not with an array. Am I missing something?
>
> The alignment requirement of char is 1, so the alignment of char[7]
> will also be 1. Therefore the alignment of char[2][7] is also 1.
> There is no reason the compiler would need to align this object on
> an 8 byte boundary because it's nothing more than an array of bytes.
> char[2][7] is really just a char[14] with built-in pointer math to
> compute the indexing. The fact that it is contained within a struct
> doesn't make any difference.

Are we talking about the same thing here? Different platforms and
hardware designs have different access patterns, and there are designs
where alignment on 8-byte boundaries results in better performance
than alignment on 7-byte boundaries.

As a simplified example, think of it like this... Imagine 8-byte wide
registers and memory data busses. Data aligned on 8-byte boundaries
can be moved (all imaginary and simplified) in a single operation from
memory into a register (where it can efficiently be processed). Data
aligned on 7-byte boundaries generally needs two memory accesses and a
shift to get it into a register.

So a compiler for this (imaginary) platform could very well decide to
pad a char[7] with 1 byte so that stuff is aligned on 8-byte
boundaries.

> The following code outputs "align 1 size 14" on gcc 4.7, linux,
> x86_64. g++ -std=c++11 test.cpp

Yup, no surprise here.
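
The test program itself is not quoted here, so the following is only a
sketch of the kind of check that would produce those numbers. Block and
Wrapped are hypothetical names. The size and alignment of the bare
array follow from sizeof(char) being 1; the size of the struct around
it is the part the standard leaves to the implementation.

```cpp
#include <cassert>
#include <cstddef>

// Bare two-dimensional char array.
typedef char Block[2][7];

// Hypothetical struct wrapping the same array.
struct Wrapped {
    char data[2][7];
};

// 14 elements of size 1, with no padding between array elements.
constexpr std::size_t block_size() { return sizeof(Block); }

// An array has the alignment of its element type.
constexpr std::size_t block_align() { return alignof(Block); }

// At least 14; whether it is exactly 14 is up to the implementation.
constexpr std::size_t wrapped_size() { return sizeof(Wrapped); }
```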

>> (FWIW, it seems that if you are writing high performance code on
>> such a platform, you better avoid arrays of such types. Even if you
>> only need 7 bytes in your T, it may be advantageous to make it 8
>> bytes wide.)
>
> If you are writing high performance code on any platform you should
> be using arrays (static or dynamic (aka std::vector)). Anything else
> will be destroying your cache. Why on earth would structs be laid
> out more optimally? Are we expecting people to do this if they want
> to process 10000 T's?
>
> template <typename T>
> struct {
> T t0;
> T t1;
> ...
> T t9999;
> };

Maybe re-read what I wrote. I never said not to use arrays or to
create such abominable structs (which couldn't replace the arrays
anyway -- no indexing possible). I just said that for a platform where
alignment on 8-byte boundaries brings performance advantages, you may
want to avoid using arrays with types T where sizeof(T)==7 (assuming a
char is a byte) and think of increasing the size of the type so that
sizeof(T)==8.
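
A sketch of what I mean, with hypothetical types Packed7 and Padded8.
Here the padding is requested explicitly with C++11 alignas rather than
left to the compiler: because the size of a type is a multiple of its
alignment, the aligned struct occupies 8 bytes and array elements land
on 8-byte boundaries.

```cpp
#include <cassert>
#include <cstddef>

// Seven payload bytes with no alignment request.
struct Packed7 {
    char data[7];
};

// The same payload, but the type asks for 8-byte alignment, so its
// size is rounded up to 8.
struct alignas(8) Padded8 {
    char data[7];
};

// Distance in bytes between consecutive elements of an array of T.
template <typename T>
constexpr std::size_t stride() {
    return sizeof(T[2]) / 2;
}
```

Whether the extra byte per element pays off is something to measure on
the platform in question.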

Gerhard

Francis Glassborow

Jan 17, 2013, 12:45:23 PM

On 16/01/2013 14:17, Gerhard Fiedler wrote:
> On 2013-01-15 14:04:43, in comp.lang.c++.moderated fmatthew5876 wrote:
>

>>> I don't understand what would violate that fact. AIUI, a fact is
>>> that the C++ standard doesn't guarantee any specific behavior for
>>> reading a union member other than the one that was last written
>>> to. And AIUI the
>>
>> It doesn't guarantee any behavior sure, but does it guarantee
>> undefined behavior even if the addresses match?

As undefined behaviour means that anything can happen (the behaviour
has not been defined) obviously the Standard guarantees undefined
behaviour whenever it chooses not to define the behaviour :)

Francis