Standard layout, layout compatibility, and reinterpret_cast

Shea Levy

Feb 22, 2014, 9:50:04 PM
to std-dis...@isocpp.org
Hi all,

As part of a little project I'm working on, I'm writing some modern
C++-style wrappers around traditional UNIX/C-style APIs. Of particular
relevance to my question, I have a standard-layout filedes class with a
single int member and a destructor that closes the file (along with
member-free subclasses like kqueue or named_file that provide
constructors specific to their type and type-specific member functions
like kevent), and a struct kevent wrapper that uses scoped enums and
bitmask types built on top of them to represent fields like filter,
flags, etc. (for those unfamiliar with kqueue, see the section on struct
kevent at http://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2).

I could of course write member functions that construct the proper
C data structures when I need to call an underlying C API, but I
purposefully defined them with the intent that their underlying memory
layout would be *equivalent* to the relevant C types, and that I could
just cast back and forth at the interface boundaries. What I have [1] is
working and seems to me like a reasonable interpretation of the *spirit*
of concepts like standard layout and layout compatible, but there are a
few assumptions I'm making that I can't seem to find justification for
in the standard (I am going based off of n3690).

In particular, I assume that:

* A standard-layout class with a single non-static member (and no base
classes with any non-static members) is layout-compatible with the
type of that member. For example, I assume that my filedes class is
layout-compatible with int.
* An enumeration type is layout-compatible with its underlying type (see
my SO question at [2]). I was able to find language that ensures two
enumerations with the same underlying type are layout-compatible with
each other, but nothing about the relationship between an enumeration
type and its underlying type.
* If T1 and T2 are standard-layout and layout-compatible with each
other, then you can safely reinterpret_cast back and forth between
them in all contexts. For example, my event class is (assuming my
previous assumptions are correct) layout-compatible with struct
kevent, so I can reinterpret_cast an array of events to an array of
struct kevent when calling the kevent function and reinterpret_cast
the results back when I'm done. If I only have a single member and I
only have a single element I want to point to (i.e. I'm dealing with a
pointer, not an array), this is covered by the "a pointer to a
standard-layout type is a pointer to its first element" rule, but once
we get to arrays or structs with more than one member that doesn't
suffice. I was able to find that T1 and T2 here have the same value
representation and the same alignment requirements, but nothing that
allows me to cast types with the same value representation and same
alignment requirements to each other.

I hope you'll agree that these assumptions are reasonable and intended
by the spirit of the standard, but I would like to verify a) that they
indeed are reasonable and b) whether or not they can actually be
justified based on the text of the standard alone. It would also be nice
to know if the approach is a good idea at all, but this project is
mostly just for fun so that's a secondary concern here :).

Cheers,
Shea Levy

P.S. This is mostly unrelated, but it would be nice to have an
is_layout_compatible type trait so I could have some static_asserts on
some of my assumptions here.
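
As a rough illustration of the P.S., the checks below are about as far as the C++11 type traits go. This is only a sketch: `filedes` here is a simplified, hypothetical stand-in for the class in [1] (no destructor, invented member names), and `filter` is an invented enum. Passing these asserts shows matching size and alignment, which is necessary for the layout assumptions but says nothing about whether a reinterpret_cast is valid. (C++20 later added std::is_layout_compatible to <type_traits>.)

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical, simplified stand-in for the filedes wrapper described above.
class filedes {
public:
    explicit filedes(int fd) noexcept : fd_(fd) {}
    int get() const noexcept { return fd_; }
private:
    int fd_;
};

// Hypothetical scoped enum standing in for one of the kevent field wrappers.
enum class filter : short { read = -1, write = -2 };

// Compile-time checks available with C++11 traits. They verify the
// preconditions of the assumptions, not layout compatibility itself.
static_assert(std::is_standard_layout<filedes>::value,
              "filedes must be standard-layout");
static_assert(sizeof(filedes) == sizeof(int),
              "filedes should be exactly as large as its int member");
static_assert(alignof(filedes) == alignof(int),
              "filedes should have the alignment of int");
static_assert(std::is_same<std::underlying_type<filter>::type, short>::value,
              "filter should be backed by short");

// Runtime sanity check: the wrapper hands back the descriptor it was given.
inline bool wrapped_fd_roundtrips(int fd) {
    filedes f(fd);
    assert(f.get() == fd);
    return f.get() == fd;
}
```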


[1]: https://bitbucket.org/shlevy/linkpaper/src/b5d1f3fe21f5/src , see
in particular filedes.hh and event.hh. Sorry if the coding style is
horrid, part of the goal of this project is to just try some things
and see how they work
[2]: http://stackoverflow.com/questions/21956017/are-enumeration-types-layout-compatible-with-their-underlying-type

Jens Maurer

Feb 23, 2014, 9:53:59 AM
to std-dis...@isocpp.org

(I have not looked at your source code.)

I'd like to point out that "layout-compatible" is only used in the
definition of a "common initial sequence" for union members,
as far as I know, nowhere else. Are you looking for that?

On 02/23/2014 03:50 AM, Shea Levy wrote:
> In particular, I assume that:
>
> * A standard-layout class with a single non-static member (and no base
> classes with any non-static members) is layout-compatible with the
> type of that member. For example, I assume that my filedes class is
> layout-compatible with int.

Not layout-compatible, but maybe you meant this:

9.2p19 "If a standard-layout class object has any non-static data members,
its address is the same as the address of its first non-static data member.
..."

> * An enumeration type is layout-compatible with its underlying type (see
> my SO question at [2]) I was able to find language that ensures two
> enumerations with the same underlying type are layout-compatible with
> each other, but nothing about the relationship between an enumeration
> type and its underlying type.

Note that there is no rule in 3.10p10 that would allow aliasing
between an enumeration and its underlying type, so you're in
dangerous territory here.

> * If T1 and T2 are standard-layout and layout-compatible with each
> other, then you can safely reinterpret_cast back and forth between
> them in all contexts.

Again, that's dangerous to assume, since you're actively subverting
alias analysis of your compiler; see 3.10p10. (If you wrap the two
classes in a union, you're possibly safe according to 9.2p18.)

> For example, my event class is (assuming my
> previous assumptions are correct) layout-compatible with struct
> kevent, so I can reinterpret_cast an array of events to an array of
> struct kevent when calling the kevent function and reinterpret_cast
> the results back when I'm done.

Not a good assumption to make.

> If I only have a single member and I
> only have a single element I want to point to (i.e. I'm dealing with a
> pointer, not an array), this is covered by the "a pointer to a
> standard-layout type is a pointer to its first element" rule, but once
> we get to arrays or structs with more than one member that doesn't
> suffice. I was able to find that T1 and T2 here have the same value
> representation and the same alignment requirements, but nothing that
> allows me to cast types with the same value representation and same
> alignment requirements to each other.

Right, and you shouldn't.

> I hope you'll agree that these assumptions are reasonable and intended
> by the spirit of the standard, but I would like to verify a) that they
> indeed are reasonable

They're not. See 3.10p10 for the leeway C++ gives to optimizers in
terms of alias analysis. Different class types, even if layout-compatible,
are different from a compiler's point of view, and the compiler may assume
that modifying an object of type X doesn't actually modify an object of
type Y. Your reinterpret_cast activity positively subverts that, and
you'll get what you deserve:

"If a program attempts to access the stored value of an object through a
glvalue of other than one of the following types the behavior is undefined:"

> and b) whether or not they can actually be
> justified based on the text of the standard alone.

They cannot.

> It would also be nice
> to know if the approach is a good idea at all,

It is not.

Jens
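
One way to stay clear of the aliasing problem described above is to convert by value at the API boundary instead of casting pointers. The sketch below assumes a simplified C struct, `c_event`, as a hypothetical stand-in for struct kevent, and an invented wrapper `event` with a scoped enum; none of these names come from the original code. Each member is copied and the enum is converted with static_cast, so no object is ever accessed through a glvalue of an unrelated type.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical C-side struct: a cut-down stand-in for struct kevent.
struct c_event {
    std::uintptr_t ident;
    short filter;
    unsigned short flags;
};

// Hypothetical scoped enum and C++ wrapper for the same data.
enum class filter_kind : short { read = -1, write = -2 };

struct event {
    std::uintptr_t ident;
    filter_kind filter;
    unsigned short flags;
};

// Wrapper -> C struct, member by member.
inline c_event to_c(const event& e) noexcept {
    c_event out;
    out.ident = e.ident;
    out.filter = static_cast<short>(e.filter);  // enum -> underlying type
    out.flags = e.flags;
    return out;
}

// C struct -> wrapper, member by member.
inline event from_c(const c_event& c) noexcept {
    event out;
    out.ident = c.ident;
    out.filter = static_cast<filter_kind>(c.filter);  // any short value is valid here
    out.flags = c.flags;
    return out;
}

// Round trip through both conversions.
inline bool roundtrip_ok(const event& e) {
    event back = from_c(to_c(e));
    assert(back.ident == e.ident);
    return back.filter == e.filter && back.flags == e.flags;
}
```

For arrays this costs a copy per element, which is the price of keeping the two types separate in the eyes of the optimizer.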

Dinka Ranns

Feb 23, 2014, 12:09:54 PM
to std-dis...@isocpp.org
On 23 February 2014 14:53, Jens Maurer <Jens....@gmx.net> wrote:
> (I have not looked at your source code.)
>
> I'd like to point out that "layout-compatible" is only used in the
> definition of a "common initial sequence" for union members,
> as far as I know, nowhere else.  Are you looking for that?

I was trying to find out what classifies as a "common initial sequence", but I can't find the relevant wording (the standard points to 9.2, but I don't see anything there). What exactly constitutes a "common initial sequence"? (Am I correct in thinking that the term is "common initial sequence", rather than an "initial sequence" that is common between two types?)

Also, in 9.5/p1 the standard says "... it is permitted to inspect the common initial sequence ..."
What does it mean "to inspect the common initial sequence"?
 

Thanks,
D.

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-discussio...@isocpp.org.
To post to this group, send email to std-dis...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-discussion/.

Christof Meerwald

Feb 23, 2014, 12:27:48 PM
to std-dis...@isocpp.org
On Sun, Feb 23, 2014 at 05:09:54PM +0000, Dinka Ranns wrote:
> I was trying to find out what classifies as "common initial sequence", but I
> can't find the relevant wording (the standard points to 9.2. but I don't
> see anything there).

9.2/p18 says "[...] Two standard-layout structs share a common initial
sequence if corresponding members have layout-compatible types and
either neither member is a bit-field or both are bit-fields with the
same width for a sequence of one or more initial members."

[...]
> Also, in 9.5/p1 the standard says "... it is permitted to inspect the
> common initial sequence ..."
> What does it mean "to inspect the common initial sequence" ?

I'd say access (lvalue-to-rvalue conversion) members that are part of
the common initial sequence.


Christof

--

http://cmeerw.org sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org xmpp:cmeerw at cmeerw.org

Dinka Ranns

Feb 23, 2014, 12:40:43 PM
to std-dis...@isocpp.org
On 23 February 2014 17:27, Christof Meerwald <cme...@cmeerw.org> wrote:
> On Sun, Feb 23, 2014 at 05:09:54PM +0000, Dinka Ranns wrote:
> > I was trying to find out what classifies as "common initial sequence", but I
> > can't find the relevant wording (the standard points to 9.2. but I don't
> > see anything there).
>
> 9.2/p18 says "[...] Two standard-layout structs share a common initial
> sequence if corresponding members have layout-compatible types and
> either neither member is a bit-field or both are bit-fields with the
> same width for a sequence of one or more initial members."
>
> [...]

I did see that, but that does not explain what a common initial sequence is; it only says when two standard-layout structs share one.

> > Also, in 9.5/p1 the standard says "... it is permitted to inspect the
> > common initial sequence ..."
> > What does it mean "to inspect the common initial sequence" ?
>
> I'd say access (lvalue-to-rvalue conversion) members that are part of
> the common initial sequence.
>
>
> Christof
>
> --
>
> http://cmeerw.org                              sip:cmeerw at cmeerw.org
> mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

Jens Maurer

Feb 23, 2014, 4:36:27 PM
to std-dis...@isocpp.org
On 02/23/2014 06:40 PM, Dinka Ranns wrote:
>
> On 23 February 2014 17:27, Christof Meerwald <cme...@cmeerw.org <mailto:cme...@cmeerw.org>> wrote:
>
> On Sun, Feb 23, 2014 at 05:09:54PM +0000, Dinka Ranns wrote:
> > I was trying to find out what classifies as "common initial sequence", but I
> > can't find the relevant wording (the standard points to 9.2. but I don't
> > see anything there).
>
> 9.2/p18 says "[...] Two standard-layout structs share a common initial
> sequence if corresponding members have layout-compatible types and
> either neither member is a bit-field or both are bit-fields with the
> same width for a sequence of one or more initial members."
>
> [...]
>
>
> I did see that, but that does not explain what a common initial sequence is, it only says when two standard-layout structs share one.

Do you need more than that?

Where else is "common initial sequence" used?

Jens


Dinka Ranns

Feb 23, 2014, 5:23:13 PM
to std-dis...@isocpp.org
In 9.5/p1 the standard says "... it is permitted to inspect the common initial sequence of any of
standard-layout struct members;" Even if I were to go with Christof's explanation that "to inspect" means to access members of the common initial sequence, I still don't know which members make up the common initial sequence and, consequently, what that note is trying to tell me...




Dean Michael Berris

Feb 23, 2014, 5:54:05 PM
to std-dis...@isocpp.org
So I think this explains what the note is saying:

struct A {
  char c;
  int x;
};

struct B {
  char d;
  char y[sizeof(int)];
};

union foo {
  A a;
  B b;
};

Here the members c (in A) and d (in B) form the common initial
sequence, so for an object u of type foo it is u.a.c and u.b.d that
correspond. The rule only applies to unions.
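
To make the example concrete, here is a small sketch of the one read that the note permits. The function name `read_tag_via_b` is invented for illustration. The union is initialized through `a`, and only the first member is then read through `b`; reading `u.b.y` at that point would fall outside the common initial sequence and is not covered by the rule.

```cpp
#include <cassert>

struct A {
    char c;
    int x;
};

struct B {
    char d;
    char y[sizeof(int)];
};

union foo {
    A a;
    B b;
};

// Initializes the union through a, then reads the first member through b.
inline char read_tag_via_b(char tag) {
    foo u = {A{tag, 0}};  // aggregate initialization makes a the active member
    assert(u.a.c == tag); // ordinary read of the active member
    return u.b.d;         // permitted: c and d form the common initial sequence
}
```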

Jens Maurer

Feb 23, 2014, 5:57:44 PM
to std-dis...@isocpp.org
On 02/23/2014 11:23 PM, Dinka Ranns wrote:
>
>
>
> On 23 February 2014 21:36, Jens Maurer <Jens....@gmx.net <mailto:Jens....@gmx.net>> wrote:
>
> On 02/23/2014 06:40 PM, Dinka Ranns wrote:
> >
> > On 23 February 2014 17:27, Christof Meerwald <cme...@cmeerw.org <mailto:cme...@cmeerw.org> <mailto:cme...@cmeerw.org <mailto:cme...@cmeerw.org>>> wrote:
> >
> > On Sun, Feb 23, 2014 at 05:09:54PM +0000, Dinka Ranns wrote:
> > > I was trying to find out what classifies as "common initial sequence", but I
> > > can't find the relevant wording (the standard points to 9.2. but I don't
> > > see anything there).
> >
> > 9.2/p18 says "[...] Two standard-layout structs share a common initial
> > sequence if corresponding members have layout-compatible types and
> > either neither member is a bit-field or both are bit-fields with the
> > same width for a sequence of one or more initial members."
> >
> > [...]
> >
> >
> > I did see that, but that does not explain what a common initial sequence is, it only says when two standard-layout structs share one.
>
> Do you need more than that?
>
> Where else is "common initial sequence" used?
>
>
> in 9.5/p1 the standard says "... it is permitted to inspect the common initial sequence of any of
> standard-layout struct members;" Even if I was to go with Christof's explanation that "to inspect" means to access members of the common initial sequence, i still don't know which members make up the common initial sequence, and , subsequently, what is that note trying to tell me...


Does it help to replace "share a common initial sequence" with "have a common initial sequence"
in the definition in 9.2p18?

Jens

Richard Smith

Feb 23, 2014, 7:58:31 PM
to std-dis...@isocpp.org
The second sentence of 9.2p18 defines what it means for structs to "share a common initial sequence", and that's the phrase that 9.5p1 uses, so I don't think we should change either of them. We should probably italicize the definition in 9.2p18, though. (While we're at it, there seems to be some confusion over what "to inspect" means here; it would probably help to replace "inspect" with "read".)

Dinka Ranns

Feb 24, 2014, 5:32:42 AM
to std-dis...@isocpp.org

Wait... before you start changing things around, I really am asking for a clarification :)

Dean's example helps, I think. So, my remaining questions are:

1) Am I correct in thinking that the "initial sequence" is a sequence of non-static members of a struct, starting from the first one declared?

2) Am I correct in thinking that the "common initial sequence" for two standard-layout structs is the largest initial sequence of those two structs for which corresponding members of the sequence have layout-compatible types, or, if they are bit-fields, they are of the same width?

3) Am I correct in thinking that the whole point of defining the common initial sequence is to say that you can alias a member in the common initial sequence of the active member through a member of the common initial sequence of a non-active member?


Again, not trying to propose new wording here, just trying to phrase it in human speak. :)



On 24 February 2014 00:58, Richard Smith <ric...@metafoo.co.uk> wrote:
On Sun, Feb 23, 2014 at 2:57 PM, Jens Maurer <Jens....@gmx.net> wrote:
On 02/23/2014 11:23 PM, Dinka Ranns wrote:
>
>
>
> On 23 February 2014 21:36, Jens Maurer <Jens....@gmx.net <mailto:Jens....@gmx.net>> wrote:
>
>     On 02/23/2014 06:40 PM, Dinka Ranns wrote:
>     >
>     > On 23 February 2014 17:27, Christof Meerwald <cme...@cmeerw.org <mailto:cme...@cmeerw.org> <mailto:cme...@cmeerw.org <mailto:cme...@cmeerw.org>>> wrote:
>     >
>     >     On Sun, Feb 23, 2014 at 05:09:54PM +0000, Dinka Ranns wrote:
>     >     > I was trying to find out what classifies as "common initial sequence", but I
>     >     > can't find the relevant wording (the standard points to 9.2. but I don't
>     >     > see anything there).
>     >
>     >     9.2/p18 says "[...] Two standard-layout structs share a common initial
>     >     sequence if corresponding members have layout-compatible types and
>     >     either neither member is a bit-field or both are bit-fields with the
>     >     same width for a sequence of one or more initial members."
>     >
>     >     [...]
>     >
>     >
>     > I did see that, but that does not explain what a common initial sequence is, it only says when two standard-layout structs share one.
>
>     Do you need more than that?
>
>     Where else is "common initial sequence" used?
>
>
> in  9.5/p1 the standard says "... it is permitted to inspect the common initial sequence of any of
> standard-layout struct members;"  Even if I was to go with Christof's explanation that "to inspect" means to access members of the common initial sequence, i still don't know which members make up the common initial sequence, and , subsequently, what is that note trying to tell me...


> > Does it help to replace "share a common initial sequence" with "have a common initial sequence"
> > in the definition in 9.2p18?

Not really :) The reason I choked on that sentence is that, to someone unfamiliar with the terminology, it reads pretty much like "two wheels that share an axle have a car in common", which to me is not a definition of an axle or a car. It would help if the phrasing were more along the lines of "common initial sequence *is a*..."

Also, taking that sentence as (corresponding members have layout-compatible types) and ((neither member is a bit-field) or (both are bit-fields with the same width)): when are two bit-fields layout-compatible? Are they ever? Is that equivalent to "if corresponding members have layout-compatible types, or if they are bit-fields and they are of the same width"?


> The second sentence of 9.2p18 defines what it means for structs to "share a common initial sequence", and that's the phrase that 9.5p1 uses, so I don't think we should change either of them. We should probably italicize the definition in 9.2p18, though. (While we're at it, there seems to be some confusion over what "to inspect" means here; it would probably help to replace "inspect" with "read".)

If I were to write a wish list to Santa about this, I would ask for:
* italics somewhere so I am clear it is a definition (if I can't have *common initial sequence is a*)
* clarification of what "to inspect" means, or replacing it with something like "access through" or "read through" or whatever is closest to a phrase I can find in other places
* some cosmetic surgery regarding the bit-field part of the definition

Disclaimer: this is only a personal wish list :)

D.

Jens Maurer

Feb 24, 2014, 12:48:00 PM
to std-dis...@isocpp.org
On 02/24/2014 11:32 AM, Dinka Ranns wrote:
>
> wait... before you start changing things around, I really am asking for a clarification :)
>
> Dean's example helps, I think. So, my remaining questions are:
>
> 1) Am I correct in thinking that the "initial sequence" is a sequence of non-static members of a struct, starting from the first one declared?
>
> 2) Am I correct in thinking that the "common initial sequence" for two standard-layout structs is the largest initial sequence of those two structs for which corresponding members of the sequence have layout-compatible types, or, if they are bit-fields, they are of the same width?
>
> 3) Am I correct in thinking that the whole point of defining the common initial sequence is to say that you can alias a member in the common initial sequence of the active member through a member of the common initial sequence of a non-active member?

Yes to all three.

Jens

Richard Smith

Feb 24, 2014, 4:54:01 PM
to std-dis...@isocpp.org
On Mon, Feb 24, 2014 at 2:32 AM, Dinka Ranns <dinka...@googlemail.com> wrote:
> wait... before you start changing things around, I really am asking for a clarification :)
>
> Dean's example helps, I think. So, my remaining questions are:
>
> 1) Am I correct in thinking that the "initial sequence" is a sequence of non-static members of a struct, starting from the first one declared?

This is what I take "a sequence of one or more initial members" to mean. But I think the wording is not exactly right: we should also require unnamed bit-fields to match, and those aren't members. I believe the intent is that we inspect the member-declarators that declare either non-static data members or unnamed bit-fields, in declaration order.

> 2) Am I correct in thinking that the "common initial sequence" for two standard-layout structs is the largest initial sequence of those two structs for which corresponding members of the sequence have layout-compatible types, or, if they are bit-fields, they are of the same width?

Not quite (see below).

> 3) Am I correct in thinking that the whole point of defining the common initial sequence is to say that you can alias a member in the common initial sequence of the active member through a member of the common initial sequence of a non-active member?

You can read such a member but not write it (so yes or no, depending on what you mean by "alias").

> Also, taking that sentence as (corresponding members have layout-compatible types) and ((neither member is a bit-field) or (both are bit-fields with the same width)): when are two bit-fields layout-compatible? Are they ever? Is that equivalent to "if corresponding members have layout-compatible types, or if they are bit-fields and they are of the same width"?

It's not equivalent to that. An "int : 3" bit-field and a "short : 3" bit-field don't match for the purposes of this definition. (Under some ABIs, the underlying type affects the layout.) The above rule says:

* If the corresponding members are both bit-fields, they match if they have layout-compatible types and they have the same width.
* If neither is a bit-field, they match if they have layout-compatible types.
* If only one of them is a bit-field, they do not match.
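
The three cases can be sketched with a few structs. All names below (`P`, `Q`, `R`, `S`, `PQ`, `read_kind_via_q`) are invented for illustration, and the comments state how the rule above is read here rather than anything a compiler checks for you.

```cpp
#include <cassert>

struct P {
    unsigned kind : 3;
    unsigned rest : 5;
};

// Q::tag matches P::kind: both are bit-fields, same type, same width.
// Q::other does not match P::rest: only one of the two is a bit-field.
// So the common initial sequence of P and Q is just the first member.
struct Q {
    unsigned tag : 3;
    int other;
};

// R::tag does not match P::kind: same width, but unsigned short and
// unsigned are different types and so not layout-compatible.
struct R {
    unsigned short tag : 3;
};

// S::tag does not match P::kind: S::tag is not a bit-field.
struct S {
    unsigned tag;
};

union PQ {
    P p;
    Q q;
};

// Stores a 3-bit value through p and reads it back through q.
inline unsigned read_kind_via_q(unsigned kind) {
    PQ u = {P{}};          // p is the active member, zero-initialized
    u.p.kind = kind & 7u;  // keep the value within three bits
    assert(u.p.kind == (kind & 7u));
    return u.q.tag;        // read within the common initial sequence
}
```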

> > The second sentence of 9.2p18 defines what it means for structs to "share a common initial sequence", and that's the phrase that 9.5p1 uses, so I don't think we should change either of them. We should probably italicize the definition in 9.2p18, though. (While we're at it, there seems to be some confusion over what "to inspect" means here; it would probably help to replace "inspect" with "read".)
>
> If I was to write a wish list to Santa about this , I would ask for
> * italics somewhere so I am clear it is a definition (if i can't have *common initial sequence is a*)
> * clarification of what "to inspect" means. Or replacing it with something like "access through" or "read through" or whatever is the closest to a phrase i can find in other places
> * some cosmetic surgery regarding the bit-field part of the definition
>
> disclaimer: this is only a personal wish list :)

I think there's enough wrong here to be worthy of a core issue. (Most of this is editorial, but at least the unnamed bit-field part appears to not be, and maybe also "to inspect" versus "to read".)

Shea Levy

Feb 24, 2014, 7:41:37 PM
to std-dis...@isocpp.org
Hi Jens,

Thanks for the detailed explanations! I now feel I have a better understanding of what layout-compatible is for (and what it's not for) and see why my assumptions could break aliasing rules. I have a few more questions, which I feel can best be expressed through some sample code and comments:

enum class zero : char { zero = 0 };
enum class nil : char { nil = 0 };
struct struct_of_zero { zero z; };
struct struct_of_nil { nil n; };
union nil_and_zero { struct_of_nil sn; struct_of_zero sz; };

nil_and_zero nz;
nz.sn.n = nil::nil;
assert(nz.sz.z == zero::zero); /* Q1: Will this ever be false? */

void f(struct_of_zero[], size_t count);

nil_and_zero nzs[2];
nzs[0].sn.n = nil::nil;
nzs[1].sn.n = nil::nil;
/* Q2: Is there any way I can pass nzs to f as-is, instead of constructing an intermediate array of struct_of_zero? */
/* Q3: If I must create the intermediate struct_of_zero array, can I legally use memcpy to copy nzs to it? I know I *shouldn't* */

struct struct_of_char { char c; };
union nil_and_char { struct_of_nil sn; struct_of_char sc; };

nil_and_char nc;
nc.sn.n = nil::nil;
assert(nc.sc.c == 0); /* Q4: Is this UB, since there is no rule saying an enum and its underlying type are layout-compatible? Should it be? */

struct s_int { int i; };
union int_and_s_int { int i; s_int si; };

int_and_s_int isi;
isi.i = 0;
assert(isi.si.i == 0); /* Q5: Is this UB, since int and s_int are not layout-compatible standard layout structs? Should it be? */


Thanks,
Shea

Faisal Vali

Feb 26, 2014, 12:36:15 AM
to std-dis...@isocpp.org

On Sun, Feb 23, 2014 at 4:54 PM, Dean Michael Berris <dbe...@google.com> wrote:
> struct A {
>   char c;
>   int x;
> };
>
> struct B {
>   char d;
>   char y[sizeof(int)];
> };
>
> union foo {
>   A a;
>   B b;
> };


On Mon, Feb 24, 2014 at 3:54 PM, Richard Smith <ric...@metafoo.co.uk> wrote:
> On Mon, Feb 24, 2014 at 2:32 AM, Dinka Ranns <dinka...@googlemail.com> wrote:
>
> > <snip>
> >
> > 3) am i correct in thinking that the whole point of defining the common initial sequence is to say that you can alias a member in the common initial sequence of the active member   through a member of the common initial sequence of a non-active member  ?
>
> You can read such a member but not write it (so yes or no, depending on what you mean by "alias").


Anyone know the rationale for prohibiting writes here (assuming neither is const qualified)?  

<snip>

Jens Maurer

Feb 26, 2014, 2:35:43 AM
to std-dis...@isocpp.org
On 02/26/2014 06:36 AM, Faisal Vali wrote:
>
>
> On Sun, Feb 23, 2014 at 4:54 PM, Dean Michael Berris <dbe...@google.com <mailto:dbe...@google.com>> wrote:
>
> struct A {
> char c;
> int x;
> };
>
> struct B {
> char d;
> char y[sizeof(int)];
> };
>
> union foo {
> A a;
> B b;
> };
>
>
> On Mon, Feb 24, 2014 at 3:54 PM, Richard Smith <ric...@metafoo.co.uk <mailto:ric...@metafoo.co.uk>> wrote:
>
> On Mon, Feb 24, 2014 at 2:32 AM, Dinka Ranns <dinka...@googlemail.com <mailto:dinka...@googlemail.com>> wrote:
>
>
> <snip>
>
> 3) am i correct in thinking that the whole point of defining the common initial sequence is to say that you can alias a member in the common initial sequence of the active member through a member of the common initial sequence of a non-active member ?
>
>
> You can read such a member but not write it (so yes or no, depending on what you mean by "alias").
>
>
> Anyone know the rationale for prohibiting writes here (assuming neither is const qualified)?

Writing changes the active member of a union (see 9.5), and the "trailing end"
members (beyond the common initial sequence) would then have indeterminate values
(see 4.1 conv.lval for the consequences).

I'd like to remind everybody that extending what unions can and cannot do
directly influences alias analysis of the compiler. Sticking to the conservative
minimum might therefore be a good course of action.
(There's still some grey area here that needs eventual cleanup.)

Jens
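
A conservative way to act on this is to switch the active member by creating a complete new object, so that nothing beyond the common initial sequence is left indeterminate. The sketch below uses placement new, as the example in 9.5 does; the helper name `switch_to_b` is invented, and the approach assumes both struct types are trivially destructible, as they are here.

```cpp
#include <cassert>
#include <new>

struct A {
    char c;
    int x;
};

struct B {
    char d;
    char y[sizeof(int)];
};

union foo {
    A a;
    B b;
};

// Makes b the active member by constructing a whole B in the union's
// storage. Every member of b gets a definite value (y is zero-filled).
inline B switch_to_b(foo& u, char tag) {
    ::new (static_cast<void*>(&u.b)) B{tag, {}};
    assert(u.b.d == tag);  // b is active now, so this is an ordinary read
    return u.b;
}
```

After the call, `u.a` is no longer the active member; reading `u.a.x` would be the mirror image of the problem described above.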

Jens Maurer

Feb 26, 2014, 2:47:15 AM
to std-dis...@isocpp.org
On 02/25/2014 01:41 AM, Shea Levy wrote:
> Hi Jens,
>
> Thanks for the detailed explanations! I now feel I have a better understanding of what layout-compatible is for (and what it's not for) and see why my assumptions could break aliasing rules. I have a few more questions, which I feel can best be expressed through some sample code and comments:
>
> enum class zero : char { zero = 0 };
> enum class nil : char { nil = 0 };

Those two have the same underlying type, thus are layout-compatible.

> struct struct_of_zero { zero z; };
> struct struct_of_nil { nil n; };

By consequence, these are layout-compatible, too.

> union nil_and_zero { struct_of_nil sn; struct_of_zero sz; };

The common initial sequence consists of "zero" and "nil" members, respectively.

> nil_and_zero nz;
> nz.sn.n = nil::nil;

> assert(nz.sz.z == zero::zero); /* Q1: Will this ever be false? */

The read is valid; the remaining question is whether the
enumerator value "zero::zero" is mapped to the same bit-pattern as
"nil::nil". I'm not sure we say that in 7.2, but it's hard
to conceive of an implementation that doesn't do the right thing
here.

> void f(struct_of_zero[], size_t count);
>
> nil_and_zero nzs[2];
> nzs[0].sn.n = nil::nil;
> nzs[1].sn.n = nil::nil;
> /* Q2: Is there any way I can pass nzs to f as-is, instead of constructing an intermediate array of struct_of_zero? */

Not that I know of. The size of the union array elements could be different from
the size of the elements of struct_of_zero[] (for example, due to additional,
unrelated union elements).

> /* Q3: If I must create the intermediate struct_of_zero array, can I legally use memcpy to copy nzs to it? I know I *shouldn't* */

See above about the bit-pattern issue.

> struct struct_of_char { char c; };
> union nil_and_char { struct_of_nil sn; struct_of_char sc; };
>
> nil_and_char nc;
> nc.sn.n = nil::nil;
> assert(nc.sc.c == 0); /* Q4: Is this UB, since there is no rule saying an enum and its underlying type are layout-compatible? Should it be? */

This is my current understanding.
"Should it be": for the "char" case when directly looking at the union probably not.

> struct s_int { int i; };
> union int_and_s_int { int i; s_int si; };
>
> int_and_s_int isi;
> isi.i = 0;
> assert(isi.si.i == 0); /* Q5: Is this UB, since int and s_int are not layout-compatible standard layout structs?

This is my current understanding.

> Should it be? */

No. Don't kill even more of the aliasing analysis of the compiler.

Jens
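
For Q4 in particular, the value can be obtained without any union or aliasing by converting the enumerator to its underlying type. The helper below is a sketch with an invented name, `underlying_value`; the enum and struct are the ones from the quoted code. A static_cast converts by value, so the question of layout compatibility between an enumeration and its underlying type never comes up.

```cpp
#include <cassert>
#include <type_traits>

enum class nil : char { nil = 0 };
struct struct_of_nil { nil n; };

// Converts a scoped enumerator to its underlying type by value.
template <typename E>
constexpr typename std::underlying_type<E>::type underlying_value(E e) noexcept {
    return static_cast<typename std::underlying_type<E>::type>(e);
}

static_assert(std::is_same<decltype(underlying_value(nil::nil)), char>::value,
              "nil is backed by char");

// Reads the stored enumerator as a char without going through a union.
inline char nil_as_char(const struct_of_nil& sn) {
    assert(sn.n == nil::nil);
    return underlying_value(sn.n);
}
```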

Shea Levy

Feb 26, 2014, 6:17:44 AM
to std-dis...@isocpp.org
Hi Jens,

Thanks again for the detail! I think I understand this to the extent I
need for my purposes (basically, 'don't use this feature'). Added some
comments in-line.

On Wed, Feb 26, 2014 at 08:47:15AM +0100, Jens Maurer wrote:
> On 02/25/2014 01:41 AM, Shea Levy wrote:
> > Hi Jens,
> >
> > Thanks for the detailed explanations! I now feel I have a better understanding of what layout-compatible is for (and what it's not for) and see why my assumptions could break aliasing rules. I have a few more questions, which I feel can best be expressed through some sample code and comments:
> >
> > enum class zero : char { zero = 0 };
> > enum class nil : char { nil = 0 };
>
> Those two have the same underlying type, thus are layout-compatible.
>
> > struct struct_of_zero { zero z; };
> > struct struct_of_nil { nil n; };
>
> By consequence, these are layout-compatible, too.
>
> > union nil_and_zero { struct_of_nil sn; struct_of_zero sz; };
>
> The common initial sequence consists of "zero" and "nil" members, respectively.
>
> > nil_and_zero nz;
> > nz.sn.n = nil::nil;
>
> > assert(nz.sz.z == zero::zero); /* Q1: Will this ever be false? */
>
> The read is valid; the remaining question is whether the
> enumerator value "zero::zero" is mapped to the same bit-pattern as
> "nil::nil". I'm not sure we say that in 7.2, but it's hard
> to conceive of an implementation that doesn't do the right thing
> here.
>

Is it possible that the "standard-layout types that are
layout-compatible have the same value representation" rule requires this
mapping to do the right thing?

>
> > void f(struct_of_zero[], size_t count);
> >
> > nil_and_zero nzs[2];
> > nzs[0].sn.n = nil.nil;
> > nzs[1].sn.n = nil.nil;
> > /* Q2: Is there any way I can pass nzs to f as-is, instead of constructing an intermediate array of struct_of_zero? */
>
> Not that I know of. The size of the union array elements could be different from
> the size of the elements of struct_of_zero[] (for example, due to additional,
> unrelated union elements).
>

Is this possible even if the union *only* contains nil and zero as it
does in this case? Let's throw in the additional assumption that nil and
zero are not over-aligned.
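
As a sketch of what I mean, the following only asks a particular implementation what it does; it is not a claim that the standard guarantees equal sizes. The function name sizes_match is hypothetical:

```cpp
#include <type_traits>

enum class zero : char { zero = 0 };
enum class nil : char { nil = 0 };

struct struct_of_zero { zero z; };
struct struct_of_nil { nil n; };
union nil_and_zero { struct_of_nil sn; struct_of_zero sz; };

static_assert(std::is_standard_layout<struct_of_zero>::value,
              "struct_of_zero is standard-layout");
static_assert(std::is_standard_layout<nil_and_zero>::value,
              "nil_and_zero is a standard-layout union");

// Hypothetical check: reports whether THIS implementation gives the union
// the same size and alignment as one of its members. A true result is an
// observation about the implementation, not a portable guarantee.
constexpr bool sizes_match() {
    return sizeof(nil_and_zero) == sizeof(struct_of_zero) &&
           alignof(nil_and_zero) == alignof(struct_of_zero);
}
```

Even where sizes_match() is true, that alone would not make it legal to treat an array of the union as an array of struct_of_zero.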

>
> > /* Q3: If I must create the intermediate struct_of_zero array, can I legally use memcpy to copy nzs to it? I know I *shouldn't* */
>
> See above about the bit-pattern issue.
>
> > struct struct_of_char { char c; };
> > union nil_and_char { struct_of_nil sn; struct_of_char sc; };
> >
> > nil_and_char nc;
> > nc.sn.n = nil.nil;
> > assert(nc.sc.c == 0); /* Q4: Is this UB, since there is no rule saying an enum and its underlying type are layout-compatible? Should it be? */
>
> This is my current understanding.
> "Should it be": for the "char" case when directly looking at the union probably not.
>

OK, so if I understand you correctly you think it might make sense to
make an enum and its underlying type layout-compatible?

> > struct s_int { int i; };
> > union int_and_s_int { int i; s_int si; };
> >
> > int_and_s_int isi;
> > isi.i = 0;
> > assert(isi.si.i == 0); /* Q5: Is this UB, since int and s_int are not layout-compatible standard layout structs?
>
> This is my current understanding.
>
> > Should it be? */
>
> No. Don't kill even more of the aliasing analysis of the compiler.
>

Do compilers do aliasing analysis based on which member of the union has
been set? I thought the entire point of unions was to allow different
names to alias (part of) the same memory.

~Shea

>
> Jens
>

Faisal Vali

Feb 26, 2014, 11:55:30 AM2/26/14
to std-dis...@isocpp.org
Thanks Jens. (In the interest of non-violence ;), I should assert that I wouldn't propose extending a union's capability without compelling use cases, and I can't readily think of one for allowing this.) But based on my complete lack of experience with compiler optimizers, I can quite unconfidently say that I would have expected the 'common initial sequence' to introduce a 'must-alias' relationship and hence not interfere with optimization.

I also find it interesting that writing to the shared common initial sequence subobject through a different name makes the original object's value indeterminate. Ah Bartleby! Ah C++! ;)



 

Jens Maurer

Feb 26, 2014, 1:25:50 PM2/26/14
to std-dis...@isocpp.org
On 02/26/2014 12:17 PM, Shea Levy wrote:
> Is it possible that the "standard-layout types that are
> layout-compatible have the same value representation" rule requires this
> mapping to do the right thing?

Which rule, exactly?

>>> void f(struct_of_zero[], size_t count);
>>>
>>> nil_and_zero nzs[2];
>>> nzs[0].sn.n = nil.nil;
>>> nzs[1].sn.n = nil.nil;
>>> /* Q2: Is there any way I can pass nzs to f as-is, instead of constructing an intermediate array of struct_of_zero? */
>>
>> Not that I know of. The size of the union array elements could be different from
>> the size of the elements of struct_of_zero[] (for example, due to additional,
>> unrelated union elements).
>>
>
> Is this possible even if the union *only* contains nil and zero as it
> does in this case? Let's throw in the additional assumption that nil and
> zero are not over-aligned.

I don't know of any rule that would allow this for that case.

>>> /* Q3: If I must create the intermediate struct_of_zero array, can I legally use memcpy to copy nzs to it? I know I *shouldn't* */
>>
>> See above about the bit-pattern issue.
>>
>>> struct struct_of_char { char c; };
>>> union nil_and_char { struct_of_nil sn; struct_of_char sc; };
>>>
>>> nil_and_char nc;
>>> nc.sn.n = nil.nil;
>>> assert(nc.sc.c == 0); /* Q4: Is this UB, since there is no rule saying an enum and its underlying type are layout-compatible? Should it be? */
>>
>> This is my current understanding.
>> "Should it be": for the "char" case when directly looking at the union probably not.
>>
>
> OK, so if I understand you correctly you think it might make sense to
> make an enum and its underlying type layout-compatible?

No, I believe that "char" is allowed to alias anything, so it should
be ok to read an active member of a union through non-active
"char" (or rather, unsigned char) members.

>>> Should it be? */
>>
>> No. Don't kill even more of the aliasing analysis of the compiler.
>>
>
> Do compilers do aliasing analysis based on which member of the union has
> been set? I thought the entire point of unions was to allow different
> names to alias (part of) the same memory.

How does your compiler know two pointers to different types are
actually pointing into the same union, and therefore may alias?

union U {
struct A a;
struct B b;
};


void f(A * a);
void g(const B * b);

void h()
{
U u;
u.a = { ... };
f(&u.a);
g(&u.b); // how does "g" know u.b is aliasing an "A"?
}

If A and B are not layout-compatible, they can't alias.
If they don't share a common initial sequence, they can't alias.
Otherwise, they may alias.

My opinion is we should furthermore state normatively that they can
never alias, unless accessed through the (visible) union.
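
A minimal sketch of the access that remains fine under this view, reading a common initial sequence member through the visible union. The member lists of A and B here are assumptions made up for illustration; the example above leaves them unspecified:

```cpp
// Hypothetical definitions: A and B are standard-layout structs whose
// common initial sequence is the single int member "tag".
struct A { int tag; float f; };
struct B { int tag; char c; };

union U {
    A a;
    B b;
};

// The union type is visible at the point of access, and "tag" belongs to
// the common initial sequence, so it may be read through u.b even when
// u.a is the active member.
inline int tag_via_b(const U& u) {
    return u.b.tag;
}
```

By contrast, a function that only receives a `const B *` has no way to see that the object lives inside a U, which is the situation g is in.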

Jens

Shea Levy

Feb 26, 2014, 1:55:19 PM2/26/14
to std-dis...@isocpp.org
On Wed, Feb 26, 2014 at 07:25:50PM +0100, Jens Maurer wrote:
> On 02/26/2014 12:17 PM, Shea Levy wrote:
> > Is it possible that the "standard-layout types that are
> > layout-compatible have the same value representation" rule requires this
> > mapping to do the right thing?
>
> Which rule, exactly?
>

From 3.9.2 paragraph 3 of N3797:

> Pointers to cv-qualified and cv-unqualified versions (3.9.3) of layout-compatible types shall have the same value representation and alignment requirements (3.11).

So I misremembered: that rule is about pointers to layout-compatible types
and does not specify a standard-layout requirement.

>
> >>> void f(struct_of_zero[], size_t count);
> >>>
> >>> nil_and_zero nzs[2];
> >>> nzs[0].sn.n = nil.nil;
> >>> nzs[1].sn.n = nil.nil;
> >>> /* Q2: Is there any way I can pass nzs to f as-is, instead of constructing an intermediate array of struct_of_zero? */
> >>
> >> Not that I know of. The size of the union array elements could be different from
> >> the size of the elements of struct_of_zero[] (for example, due to additional,
> >> unrelated union elements).
> >>
> >
> > Is this possible even if the union *only* contains nil and zero as it
> > does in this case? Let's throw in the additional assumption that nil and
> > zero are not over-aligned.
>
> I don't know of any rule that would allow this for that case.
>

OK, I suspected not.
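
So for Q2 I would fall back to building the intermediate array explicitly. A rough sketch, assuming sn is the active member of every element; the function name to_struct_of_zero and the use of std::vector are my own choices for illustration:

```cpp
#include <cstddef>
#include <vector>

enum class zero : char { zero = 0 };
enum class nil : char { nil = 0 };

struct struct_of_zero { zero z; };
struct struct_of_nil { nil n; };
union nil_and_zero { struct_of_nil sn; struct_of_zero sz; };

// Hypothetical helper: copy element by element, converting each value
// through the shared underlying type instead of reinterpreting memory.
// Assumes sn is the active member of every element in [src, src + count).
inline std::vector<struct_of_zero> to_struct_of_zero(const nil_and_zero* src,
                                                     std::size_t count) {
    std::vector<struct_of_zero> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        const char raw = static_cast<char>(src[i].sn.n);
        out.push_back(struct_of_zero{static_cast<zero>(raw)});
    }
    return out;
}
```

The result's data() and size() can then be handed to f, at the cost of one copy.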

>
> >>> /* Q3: If I must create the intermediate struct_of_zero array, can I legally use memcpy to copy nzs to it? I know I *shouldn't* */
> >>
> >> See above about the bit-pattern issue.
> >>
> >>> struct struct_of_char { char c; };
> >>> union nil_and_char { struct_of_nil sn; struct_of_char sc; };
> >>>
> >>> nil_and_char nc;
> >>> nc.sn.n = nil.nil;
> >>> assert(nc.sc.c == 0); /* Q4: Is this UB, since there is no rule saying an enum and its underlying type are layout-compatible? Should it be? */
> >>
> >> This is my current understanding.
> >> "Should it be": for the "char" case when directly looking at the union probably not.
> >>
> >
> > OK, so if I understand you correctly you think it might make sense to
> > make an enum and its underlying type layout-compatible?
>
> No, I believe that "char" is allowed to alias anything, so it should
> be ok to read an active member of a union through non-active
> "char" (or rather, unsigned char) members.
>

Ah, I see! I chose a bad example then :)
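
For my own notes, a sketch of the kind of access the char exemption covers: inspecting an object's bytes through unsigned char. The helper name first_byte is hypothetical, and what the byte contains is up to the implementation's representation of the enum:

```cpp
enum class nil : char { nil = 0 };
struct struct_of_nil { nil n; };

// Hypothetical helper: view the object representation through
// unsigned char, which is permitted to alias any object type.
// struct_of_nil is standard-layout, so its first member sits at offset 0.
inline unsigned char first_byte(const struct_of_nil& s) {
    const unsigned char* bytes = reinterpret_cast<const unsigned char*>(&s);
    return bytes[0];
}
```

This is why the struct_of_char case did not really probe the enum versus underlying type question: the read would be allowed for any active member.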

>
>
> >>> Should it be? */
> >>
> >> No. Don't kill even more of the aliasing analysis of the compiler.
> >>
> >
> > Do compilers do aliasing analysis based on which member of the union has
> > been set? I thought the entire point of unions was to allow different
> > names to alias (part of) the same memory.
>
> How does your compiler know two pointers to different types are
> actually pointing into the same union, and therefore may alias?
>
> union U {
> struct A a;
> struct B b;
> };
>
>
> void f(A * a);
> void g(const B * b);
>
> void h()
> {
> U u;
> u.a = { ... };
> f(&u.a);
> g(&u.b); // how does "g" know u.b is aliasing an "A"?
> }
>
> If A and B are not layout-compatible, they can't alias.
> If they don't share a common initial sequence, they can't alias.
> Otherwise, they may alias.
>
> My opinion is we should furthermore state normatively that they can
> never alias, unless accessed through the (visible) union.
>

Ah, good point. Your requirement makes good sense.