Using interchangeably layout compatible classes


Vlad from Moscow

Oct 1, 2015, 11:09:15 AM
to ISO C++ Standard - Discussion
I would like to clarify whether the following program is well-defined:

#include <iostream>

struct A
{
    A( char c, int i ) : c( c ), i( i ) {}
private:
    char c;
    int i;
   
    friend std::ostream & operator << ( std::ostream &os, const A &rhs )
    {
        return os << "c = " << rhs.c << ", i = " << rhs.i;
    }       
};

struct B
{
    char c;
    int i;
};

char & get_c( A &a )
{
    void *p = &a;
   
    B *pb = static_cast<B *>( p );
  
    return pb->c = 'B';
}

int main()
{
    A a( 'A', 65 );
    char &c = get_c( a) ;
  
    std::cout << a << std::endl;
   
    c = 'C';
    std::cout << a << std::endl;
}   

Vlad from Moscow

Oct 1, 2015, 12:02:08 PM
to ISO C++ Standard - Discussion
I made the code example more meaningful:

#include <iostream>

struct A
{
    A( char c, int i ) : c( c ), i( i ) {}
private:
    char c;
    int i;
   
    friend std::ostream & operator << ( std::ostream &os, const A &rhs )
    {
        return os << "c = " << rhs.c << ", i = " << rhs.i;
    }       
};

struct B
{
    char c;
    int i;
};
B & get_B( A &a )
{
    void *p = &a;
   
    B *pb = static_cast<B *>( p );
  
    return *pb;
}

int main()
{
    A a( 'A', 65 );
    std::cout << a << std::endl;
    B &b = get_B( a ) ;
  
    b.c = 'B'; b.i = 66;

Nicol Bolas

Oct 1, 2015, 1:38:10 PM
to ISO C++ Standard - Discussion


On Thursday, October 1, 2015 at 11:09:15 AM UTC-4, Vlad from Moscow wrote:
char & get_c( A &a )
{
    void *p = &a;
   
    B *pb = static_cast<B *>( p );
  
    return pb->c = 'B';
}

I'm not an expert on the strict-aliasing rules, but I'm pretty sure that breaks strict aliasing.

Layout compatibility means that the two types have the same value representation. This means that they store the same bit pattern.

If you combine this with trivial copyability (and only if you combine it with this), then this is legal:


A &a = ...;
B b;
memcpy(&b, &a, sizeof(B));

//b and a have the same value.

Trivial copyability means that you can memcpy the internal data into another object of that type. But since the two types in this case have the same value representation (due to layout-compatibility), you can copy data from one object to another.
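As a rough sketch of the copy-based approach described above: the two structs mirror A and B from the original question, while the getters and the `copy_to_B` helper are hypothetical additions made only so the result can be checked. It assumes, as the post does, that copying bytes between layout-compatible, trivially copyable types is acceptable.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Same shape as A in the question; the getters are hypothetical additions.
struct A
{
    A( char c, int i ) : c( c ), i( i ) {}
    char get_c() const { return c; }
    int get_i() const { return i; }
private:
    char c;
    int i;
};

struct B
{
    char c;
    int i;
};

// The copy relies on both properties named in the post.
static_assert( std::is_trivially_copyable<A>::value, "A must be trivially copyable" );
static_assert( std::is_trivially_copyable<B>::value, "B must be trivially copyable" );
static_assert( std::is_standard_layout<A>::value, "A must be standard-layout" );
static_assert( std::is_standard_layout<B>::value, "B must be standard-layout" );
static_assert( sizeof( A ) == sizeof( B ), "sizes must match for the copy" );

// Hypothetical helper: copies the bytes of an A into a separate B object
// instead of accessing the A through a B pointer.
B copy_to_B( const A &a )
{
    B b;
    std::memcpy( &b, &a, sizeof( B ) );
    return b;
}
```

Here `copy_to_B( A( 'A', 65 ) )` yields a B whose members compare equal to 'A' and 65, and the original A object is never accessed through a B pointer.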

Vlad from Moscow

Oct 1, 2015, 1:49:50 PM
to ISO C++ Standard - Discussion
According to the C++ Standard 3.9.2 Compound types, p.#3

"...Pointers to cv-qualified and cv-unqualified versions (3.9.3) of layout-compatible types shall have the same value representation and alignment requirements"

So what problem can there be with the pointer?

Tony V E

Oct 1, 2015, 5:54:21 PM
to std-dis...@isocpp.org
I thought "private" "class" etc (C++isms) allow the compiler to break the C compatibility.

ie struct A could actually have i before c, whereas struct B can't.


Nicol Bolas

Oct 1, 2015, 6:18:11 PM
to ISO C++ Standard - Discussion
On Thursday, October 1, 2015 at 1:49:50 PM UTC-4, Vlad from Moscow wrote:
According to the C++ Standard 3.9.2 Compound types, p.#3

"...Pointers to cv-qualified and cv-unqualified versions (3.9.3) of layout-compatible types shall have the same value representation and alignment requirements"

So what problem can be with the pointer?

The problem is that it violates strict aliasing. A C++ compiler is allowed to assume that two pointers/references to unrelated types do not reference the same memory. Therefore:

void *p = &a;

auto c = p->c

B *pb = static_cast<B *>( p );
pb->c = 'B';

if(c == p->c)
  return true;
else
  return false;

The compiler is permitted to assume that `c == p->c`. There is no code between the setting of `c` and this conditional that should be able to modify p->c. Therefore, compilers are allowed to behave as if `c == p->c`.

That's the strict aliasing rule. It has nothing to do with the layout of the types involved. Because you have violated that rule, the result of this function is undefined.
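A minimal sketch of the assumption described above, using two unrelated scalar types rather than A and B. The function names are hypothetical. The call shown is well-defined because the two pointers really do refer to different objects; the point is only that the compiler is entitled to rely on that.

```cpp
#include <cassert>

// pi and pf point to unrelated types, so the compiler may assume that the
// store through pf cannot change *pi.
int store_and_reload( int *pi, float *pf )
{
    *pi = 1;
    *pf = 2.0f;
    return *pi;   // may be compiled as "return 1" without re-reading memory
}

// Well-defined use: the two pointers refer to two distinct objects.
int call_with_distinct_objects()
{
    int i = 0;
    float f = 0.0f;
    return store_and_reload( &i, &f );
}
```

Passing a pointer obtained by casting `&i` to `float *` as the second argument would be the kind of access the rule forbids, so no particular result could be relied on.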

Miro Knejp

Oct 1, 2015, 6:20:03 PM
to std-dis...@isocpp.org
On 01.10.2015 at 23:54, Tony V E wrote:
> I thought "private" "class" etc (C++isms) allow the compiler to break
> the C compatibility.
>
> ie struct A could actually have i before c, whereas struct B can't.
Nope, A is standard-layout:
* [class]/7.2 "A class S is a standard-layout class if it:" ... "has the
same access control (Clause 11) for all non-static data members"
* [class]/8 "A standard-layout struct is a standard-layout class defined
with the class-key struct or the class-key class"
* [class.mem]/13 "Nonstatic data members of a (non-union) class with the
same access control (Clause 11) are allocated so that later members have
higher addresses within a class object. The order of allocation of
non-static data members with different access control is unspecified
(Clause 11)."

But the strict aliasing problem still remains.

Vlad from Moscow

Oct 1, 2015, 6:20:35 PM
to ISO C++ Standard - Discussion
To be a standard layout class it is enough that it

"has the same access control (Clause 11) for all non-static data members,"

That is, it does not matter whether the non-static data members are all private, all protected, or all public, as long as they all have the same access control.

The other language that the original structure comes from may know nothing about C++ access control.

That is how I understand it, and I would like to have the question clarified.

Nicol Bolas

Oct 1, 2015, 6:21:45 PM
to ISO C++ Standard - Discussion
On Thursday, October 1, 2015 at 5:54:21 PM UTC-4, Tony VE wrote:
I thought "private" "class" etc (C++isms) allow the compiler to break the C compatibility.

ie struct A could actually have i before c, whereas struct B can't.

Not since C++11. `A` is standard layout now, as is B. And the two are layout compatible, per 9.2, p16.

What would stop `A` from being standard layout is if only some of its NSDMs were public and some were private. But if they're all the same, then they're layout compatible. It doesn't (really) break encapsulation, because strict aliasing stops you from simply casting from one type to the other. You have to copy the data to a compatible object to be able to access it.
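The access-control condition can be checked with `std::is_standard_layout`. This is only an illustrative sketch; `AllPrivate` and `MixedAccess` are hypothetical types, the first shaped like A from the question.

```cpp
#include <cassert>
#include <type_traits>

// All non-static data members are private: still standard-layout.
struct AllPrivate
{
    AllPrivate( char c, int i ) : c( c ), i( i ) {}
    char get_c() const { return c; }
    int get_i() const { return i; }
private:
    char c;
    int i;
};

// One public and one private data member: not standard-layout.
struct MixedAccess
{
    char c;
    int get_i() const { return i; }
private:
    int i;
};

static_assert( std::is_standard_layout<AllPrivate>::value,
               "same access control for all data members" );
static_assert( !std::is_standard_layout<MixedAccess>::value,
               "mixed access control breaks standard layout" );
```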

Vlad from Moscow

Oct 1, 2015, 6:40:39 PM
to ISO C++ Standard - Discussion
Could you update your example? I do not think that this statement

auto c = p->c

is valid.

Nicol Bolas

Oct 1, 2015, 8:27:50 PM
to ISO C++ Standard - Discussion


On Thursday, October 1, 2015 at 6:40:39 PM UTC-4, Vlad from Moscow wrote:
Could you update your example because I do not think that this statement

auto c = p->c

is valid?

That is essentially beside the point. Like I said, I'm not an expert on strict aliasing rules.

However, according to section 3.10, p10, one thing seems clear: you accessed an object of one type through a glvalue of a different, unrelated type. That's breaking strict aliasing.

So I'm pretty sure it doesn't matter if you access `c` directly or through a public member function. You broke strict aliasing the moment you tried to access `c` through a pointer to type `B`.

A compiler might allow that to work as a member function where it might not have worked via direct access. Maybe one compiler does and another one doesn't. Maybe an update of a compiler causes it to switch from working to not working. Maybe on every other Friday, it'll work.

That is all essentially irrelevant.

As far as the standard is concerned, the compiler is not required to allow you to modify the object of type `A` through a pointer of type `B`. And therefore, undefined behavior results. Period.

Larry Evans

Oct 1, 2015, 9:43:26 PM
to std-dis...@isocpp.org
On 10/01/2015 07:27 PM, Nicol Bolas wrote:
>
>
> On Thursday, October 1, 2015 at 6:40:39 PM UTC-4, Vlad from Moscow wrote:
>
> Could you update your example because I do not think that this statement
>
> auto c =p->c
>
> is valid?
>
>
> That is essentially beside the point. Like I said, I'm not an expert on
> strict aliasing rules.
>

But it would make the example clearer to the reader:

A *pa = &a;
auto c = pa->c;

B *pb = static_cast<B *>( static_cast<void*>(pa) );

pb->c = 'B';//compiler can assume by strict aliasing that pa!=pb

if(c == pa->c)

Vlad from Moscow

Oct 2, 2015, 5:03:36 AM
to ISO C++ Standard - Discussion, cpplj...@suddenlink.net
Well, what about this code

int x = 10;

void *p = &x;

std::memset( p, 0, sizeof( int ) );

if ( x == 10 )
    return true;
else
    return false;...

or this code

A *pa = &a;
auto c = pa->c;

char *pc = static_cast<char *>( static_cast<void*>(pa) );

*pc = 'B';

if(c == pa->c)
  return true;
else
  return false;

or this code

void f( void *p )
{
    char *pc = static_cast<char *>( p );
   *pc = 'B';
}

A *pa = &a;
auto c = pa->c;
  
f( &c );

if(c == pa->c)
  return true;
else
  return false;


or this code

int cmp( const void *p1, const void *p2 )
{
    int a = *static_cast<const int *>( p1 );
    int b = *static_cast<const int *>( p2 );

    return ( a > b ) - ( b > a );
}

long long int x = 1;
auto y = x;

std::qsort( &x, 2, sizeof( int ), cmp );

if ( y == x )
    return true;
else
    return false;

David Krauss

Oct 2, 2015, 5:32:20 AM
to std-dis...@isocpp.org
On 2015–10–02, at 5:03 PM, Vlad from Moscow <vlad....@mail.ru> wrote:

Well, what about this code

int x = 10;

void *p = &x;

std::memset( p, 0, sizeof( int ) );

if ( x == 10 )
    return true;
else
    return false;…

Must be false. Strict aliasing says that a char* may always alias anything, so the compiler is forced to reload x.
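Wrapped in a function so it can be compiled, the snippet looks roughly like this (the function name is a hypothetical addition):

```cpp
#include <cassert>
#include <cstring>

// Returns whether x still compares equal to 10 after its bytes were cleared.
bool still_ten_after_memset()
{
    int x = 10;

    void *p = &x;

    std::memset( p, 0, sizeof( int ) );   // character-wise write to the bytes of x

    return x == 10;                        // x has to be re-read here; it is now 0
}
```

The function returns false, since the character accesses performed by memset are allowed to alias the int.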

or this code

A *pa = &a;
auto c = pa->c;

char *pc = static_cast<char *>( static_cast<void*>(pa) );

*pc = 'B';

if(c == pa->c)
  return true;
else
  return false;

Again false, same reason.

or this code

void f( void *p )
{
    char *pc = static_cast<char *>( p );
   *pc = 'B';
}

A *pa = &a;
auto c = pa->c;
  
f( &c );

if(c == pa->c)
  return true;
else
  return false;

Once again. Now it’s getting easier to observe the difference, by changing “char” to something else: http://melpon.org/wandbox/permlink/duPyEeeL1AQfxNyi

Interestingly, only versions of GCC prior to 4.5 actually behave differently. I didn’t look for any flags to ask newer versions to apply aliasing more eagerly, but it seems likely they took the warning diagnostic from 4.4 and decided not to apply the optimization in diagnosable cases.

or this code

int cmp( const void *p1, const void *p2 )
{
    int a = *static_cast<const int *>( p1 );
    int b = *static_cast<const int *>( p2 );

    return ( a > b ) - ( b > a );
}

long long int x = 1;
auto y = x;

std::qsort( &x, 2, sizeof( int ), cmp );

if ( y == x )
    return true;
else
    return false;

That looks like the wackiest endianness check I’ve ever seen.

This breaks strict aliasing, so it’s UB. But since qsort is usually statically linked from the C library, the chances are slim that anything could go wrong. Most likely, the optimizer will assume that it (legally) modifies every local variable whose address it can reach.

Vlad from Moscow

Oct 2, 2015, 6:04:45 AM
to ISO C++ Standard - Discussion
Does it follow that qsort is an ill-formed function, and that any function defined in a similar way is also ill-formed?

One more remark: memset does not have a parameter of type char *, so it is not clear why you concluded that x must be reloaded inside main. I could agree with you if memset were declared like

memset( char *, int, size_t );

as it was declared, if I am not mistaken, before the type void existed in C.

Vlad from Moscow

Oct 2, 2015, 6:13:49 AM
to ISO C++ Standard - Discussion
Also, returning to standard-layout classes: if we follow this logic, it seems that they break the strict aliasing rule from the outset, because they are usually declared differently in a module written in another language. So what is the point of introducing standard-layout classes if they break the rule from the outset?


On Friday, October 2, 2015 at 12:32:20 PM UTC+3, David Krauss wrote:

David Krauss

Oct 2, 2015, 6:32:21 AM
to std-dis...@isocpp.org
On 2015–10–02, at 6:04 PM, Vlad from Moscow <vlad....@mail.ru> wrote:

Does it follow that qsort is an ill-formed function, and that any function defined in a similar way is also ill-formed?

The access only happens within the callback. That’s not the responsibility of qsort.
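For contrast, here is a sketch of a qsort call where the callback reads the elements as the type they really have, so nothing in the callback breaks the aliasing rule. The names `cmp_int` and `sorts_ints` are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>

// The callback is where the typed access happens: each element is read as an
// int, and the array being sorted really does contain ints.
int cmp_int( const void *p1, const void *p2 )
{
    int a = *static_cast<const int *>( p1 );
    int b = *static_cast<const int *>( p2 );

    return ( a > b ) - ( b > a );
}

bool sorts_ints()
{
    int values[] = { 3, 1, 2 };

    std::qsort( values, 3, sizeof( int ), cmp_int );

    return values[0] == 1 && values[1] == 2 && values[2] == 3;
}
```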

And one more remark. Function memset does not have a parameter of type char *. So it is not clear why you decided that inside main x shall be reloaded. I could agree with you if memset were declared like

memset( char *, int, size_t );

as it was declared if I am not mistaken when the type void was absent in C.

The parameter type doesn’t matter; only accesses are significant. C11 §7.24.6.1 says that memset accesses a sequence of characters.

Vlad from Moscow

Oct 2, 2015, 8:21:24 AM
to ISO C++ Standard - Discussion
So if I write a similar function

void * my_memset( void *, int, size_t );

will the compiler then read my description of the function somewhere and treat it the same way as memset?

Or if I write

int x = 10;
int y = x;

int *p = ( int * )std::memset( &x, 0, sizeof( x ) );

*p = 20;

if ( x == y )
    return true;
else
    return false;

will I get true?

And one more example of undefined behaviour:

    int x = 10;
    int y = x;
   
    int *p = static_cast<int *>( static_cast<void *>( &x ) );
    *p = 20;

   
    if ( y == x )
    {
        std::cout << "True" << std::endl;
        std::cout << "x = " << x << ", y = " << y << std::endl;
    }       
    else
    {
        std::cout << "False" << std::endl;
        std::cout << "x = " << x << ", y = " << y << std::endl;
    }       

Nicol Bolas

Oct 2, 2015, 9:13:07 AM
to ISO C++ Standard - Discussion
On Friday, October 2, 2015 at 6:13:49 AM UTC-4, Vlad from Moscow wrote:
Also, returning to standard-layout classes: if we follow this logic, it seems that they break the strict aliasing rule from the outset, because they are usually declared differently in a module written in another language. So what is the point of introducing standard-layout classes if they break the rule from the outset?

Because converting one pointer to a different type is not the point of standard layout. Standard layout allows compatibility, but only really through copying, not through casting.

Some people compile their code without the strict aliasing rule. Or they just rely on undefined behavior. But as far as the standard is concerned, code that converts pointer types like this is undefined.

Nicol Bolas

Oct 2, 2015, 9:36:58 AM
to ISO C++ Standard - Discussion


On Friday, October 2, 2015 at 8:21:24 AM UTC-4, Vlad from Moscow wrote:
So if I write a similar function

void * my_memset( void *, int, size_t );

when the compiler will read my description of the function somewhere and behave accordingly in the same way as with the memset?

You keep talking about what "the compiler" does. Undefined behavior is based on the standard, not what compilers do.

If your `my_memset` function works by using a `char*`, then you get defined behavior. If your `my_memset` function works in some other way, then you don't get defined behavior.
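One way such a function could be written is sketched below. Only the name and parameter list of `my_memset` come from the question; the body and the `zero_an_int` helper are assumptions made for illustration. All accesses go through `unsigned char *`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical implementation: every write is a character access, which is
// allowed to alias an object of any type.
void *my_memset( void *dest, int value, std::size_t count )
{
    unsigned char *bytes = static_cast<unsigned char *>( dest );

    for ( std::size_t n = 0; n < count; ++n )
        bytes[n] = static_cast<unsigned char>( value );

    return dest;
}

// Clears an int through my_memset and reads it back as an int.
int zero_an_int()
{
    int x = 10;

    my_memset( &x, 0, sizeof( x ) );

    return x;   // all bytes are zero, so the value is 0
}
```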

Any particular compiler may allow it to work. But any particular compiler may also not allow it to work, and it would not be a compiler bug for them to make your code not work.
 
Or if I write

int x = 10;
int y = x;

int *p = ( int * )std::memset( &x, 0, sizeof( x ) );

*p = 20;

if ( x == y )
    return true;
else
   return false;

I will get true?

And one more example of undefined behaviour

    int x = 10;
    int y = x;
   
    int *p = static_cast<int *>( static_cast<void *>( &x ) );
    *p = 20;
   
    if ( y == x )
    {
        std::cout << "True" << std::endl;
        std::cout << "x = " << x << ", y = " << y << std::endl;
    }       
    else
    {
        std::cout << "False" << std::endl;
        std::cout << "x = " << x << ", y = " << y << std::endl;
    }

As previously stated, the strict aliasing rule is violated when you attempt to access an object (i.e., a particular piece of memory) through a pointer/reference to a type that is unrelated to the type that the object actually is.

If you convert an `int*` into a `void*` and then back into an `int*`, you get well-defined behavior, since the object being pointed at really is an `int`.
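The round trip can be sketched as follows (the function name is a hypothetical addition):

```cpp
#include <cassert>

// int* -> void* -> int*: the final pointer has the same type as the object
// it points to, so the assignment is an ordinary access to an int.
int assign_through_round_trip()
{
    int x = 10;

    void *pv = &x;
    int *p = static_cast<int *>( pv );
    *p = 20;

    return x;   // 20
}
```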

The problem with your `A` and `B` example is that the object itself is an `A`, but you're trying to access it via a pointer to `B`.

David Krauss

Oct 2, 2015, 9:42:22 AM
to std-dis...@isocpp.org

> On 2015–10–02, at 8:21 PM, Vlad from Moscow <vlad....@mail.ru> wrote:
>
> So if I write a similar function
>
> void * my_memset( void *, int, size_t );
>
> when the compiler will read my description of the function somewhere and behave accordingly in the same way as with the memset?

Eh… please don’t extrapolate from any speculations about what is or isn’t likely to aggravate a UB condition.

> Or if I write
>
> int x = 10;
> int y = x;
>
> int *p = ( int * )std::memset( &x, 0, sizeof( x ) );
>
> *p = 20;
>
> if ( x == y )
> return true;
> else
> return false;
>
> I will get true?

No, there’s nothing wrong with either of those assignments. You cleared the object representation of x, then restored the value of &x back to its rightful type int*, then used that to legally assign a new value of 20. No problems, the answer is false.

> And one more example of undefined behaviour
>
> int x = 10;
> int y = x;
>
> int *p = static_cast<int *>( static_cast<void *>( &x ) );
> *p = 20;
>
> if ( y == x )
> {
> std::cout << "True" << std::endl;
> std::cout << "x = " << x << ", y = " << y << std::endl;
> }
> else
> {
> std::cout << "False" << std::endl;
> std::cout << "x = " << x << ", y = " << y << std::endl;
> }

Again, casting to void* and then back to int* is fine.

The only way you get a problem from an X* pointer with a referent of unrelated type Y is if the compiler makes an invalid assumption that a Y can never be reached from an X*. I don’t know enough about alias analysis… I can’t say whether such assumptions are commonplace and difficult to avoid, or if they would be regarded as compiler bugs, or if in reality they no longer exist. See the end of the recent thread, “[class.mem]/19 and writing to common initial sequence members.” Jens Maurer says, “Having two seemingly type-unrelated pointers alias is totally unhelpful.” My impression is that the strict aliasing rule is imperfect, but you should be OK as long as you restrict function parameters and globals to void* or [unsigned] char* when the referent type is not as listed in the rule, [basic.lval] §3.10/10.
