Is aliasing of object memory with char* undefined by the standard?


Иван Вайгульт

Dec 15, 2017, 11:06:24 AM
to ISO C++ Standard - Discussion
The standard clearly says that a trivially copyable object can be copied to an array of chars and back. [basic.types.2]

It also allows copying one trivially copyable instance to another. [basic.types.3]

The representation of an object is said to be a sequence of unsigned chars. [basic.types.6.7]

The standard allows creating an object instance that reuses a char array. [basic.memobj.3]

Aliasing an object instance with char is allowed by [expr.basic.lval.11].

But none of this implies that I can do the following:

MyStruct s = {};
char* begin = (char*) &s;
char* end = begin + sizeof(MyStruct);
for (char* it = begin; it != end; ++it) {
    // do something with it here
}

Because it is not said anywhere, directly or indirectly, that this memory is contiguous and can be viewed as char[sizeof(MyStruct)].

Moreover, it is unclear how to write a procedure that copies object representation to an array.

Am I missing something? Was that intended? If so, what is the benefit of char aliasing if it cannot be used?

Thanks,
Ivan

Jens Maurer

Dec 15, 2017, 11:18:12 AM
to std-dis...@isocpp.org
On 12/15/2017 05:06 PM, Иван Вайгульт wrote:
> MyStruct s = {};
> char* begin = (char*) &s;
> char* end = begin + sizeof(MyStruct);
> for (char* it = begin; it != end; ++it) {
>     // do something with it here
> }
>
> Because it is not said anywhere, directly or indirectly, that this memory is contiguous and can be viewed as char[sizeof(MyStruct)].

[intro.memory] says that, I believe.

> Moreover, it is unclear how to write a procedure that copies object representation to an array.

std::memcpy?

Jens
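
As a rough sketch of the std::memcpy route suggested above: the snippet below copies a trivially copyable object into an unsigned char buffer and back, which is the round trip [basic.types] describes. The type MyStruct (with these members) and the helper round_trip are hypothetical names chosen for illustration; they do not come from the thread.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical example type; any trivially copyable type would do.
struct MyStruct {
    int id;
    double value;
};

static_assert(std::is_trivially_copyable<MyStruct>::value,
              "the round-trip guarantee only covers trivially copyable types");

// Copies the object representation of src into a byte buffer, then copies
// the buffer into a second object and returns that second object.
MyStruct round_trip(const MyStruct& src) {
    unsigned char buffer[sizeof(MyStruct)];
    std::memcpy(buffer, &src, sizeof(MyStruct));  // object -> bytes
    MyStruct dst;
    std::memcpy(&dst, buffer, sizeof(MyStruct));  // bytes -> object
    assert(dst.id == src.id);                     // sanity check on one member
    return dst;
}
```

Because the buffer is a real array of unsigned char, any pointer arithmetic needed to inspect the copied bytes happens inside an actual array, which sidesteps the question raised in the original post.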

Иван Вайгульт

Dec 15, 2017, 11:52:52 AM
to ISO C++ Standard - Discussion
This is the example from [intro.memory]:

struct {
  char a;
  int b:5,
  c:11,
  :0,
  d:8;
  struct {int ee:8;} e;
}

The top-level struct is said to contain four separate memory locations, and a memory location is defined as either an object of scalar type or a maximal sequence of adjacent bit-fields all having nonzero width. Nowhere is it said that those memory locations are stored contiguously.

> std::memcpy?
It is not clear how to implement memcpy in that case.

Nicol Bolas

Dec 15, 2017, 12:15:45 PM
to ISO C++ Standard - Discussion
On Friday, December 15, 2017 at 11:52:52 AM UTC-5, Иван Вайгульт wrote:
> > std::memcpy?
> It is not clear how to implement memcpy in that case.

You don't have to implement `memcpy`; it's part of the C++ standard library.
 


Иван Вайгульт

Dec 15, 2017, 12:37:48 PM
to ISO C++ Standard - Discussion
I know that memcpy is part of the standard. I understand that, from a common-sense perspective, accessing the object via a char array is fine. I just want to know whether this behavior is defined or not (or unspecified, at least). And if it really is UB, how would I implement memcpy (say, as a standard library developer) from the standard's perspective?

inkwizyt...@gmail.com

Dec 15, 2017, 2:01:08 PM
to ISO C++ Standard - Discussion


On Friday, December 15, 2017 at 6:37:48 PM UTC+1, Иван Вайгульт wrote:
> I know that memcpy is part of the standard. I understand that, from a common-sense perspective, accessing the object via a char array is fine. I just want to know whether this behavior is defined or not (or unspecified, at least). And if it really is UB, how would I implement memcpy (say, as a standard library developer) from the standard's perspective?

If you are a standard library developer, you can use anything that is normally UB to implement `memcpy`.
Of course, your code will not be portable in any way (or even be valid C++), but that is the whole point of the standard library: to hide all the non-portable parts.

Myriachan

Dec 15, 2017, 4:25:12 PM
to ISO C++ Standard - Discussion, inkwizyt...@gmail.com
But why should memcpy be a magic function?  The language is kind of broken right now because of this rule.  I really believe that char/unsigned char/std::byte pointers should be able to traverse any object like this.

As it currently stands, you can legally read the first byte by reinterpret_casting to a byte pointer, but you can't read any other byte, because such pointer arithmetic is undefined behavior.  This is broken.

Fixing this, though, has other problems that would have to be addressed, so unfortunately, I don't think it's changing anytime soon >.<

Melissa

inkwizyt...@gmail.com

Dec 15, 2017, 5:15:26 PM
to ISO C++ Standard - Discussion, inkwizyt...@gmail.com
I was only pointing out that if you write something as a standard library implementer, you need not obey the standard. The special status of `memcpy` and the impossibility of implementing it within the standard is a different matter.
I also think the standard should allow indexing across whole objects created in one memory block.
BTW, has anyone written a paper to push this change through the committee?

Иван Вайгульт

Dec 15, 2017, 5:25:44 PM
to ISO C++ Standard - Discussion
Just to clarify, I guess Melissa is talking about [expr.add.4], and the traversal is prohibited because there is no underlying char[sizeof(Object)] array, right? Could you please also explain why accessing the first byte is valid?

Myriachan

Dec 15, 2017, 7:58:13 PM
to ISO C++ Standard - Discussion
On Friday, December 15, 2017 at 2:25:44 PM UTC-8, Иван Вайгульт wrote:
> Just to clarify, I guess Melissa is talking about [expr.add.4], and the traversal is prohibited because there is no underlying char[sizeof(Object)] array, right? Could you please also explain why accessing the first byte is valid?


Yes, that's why the behavior is currently undefined if you implement your own custom memcpy() and attempt to copy between trivially copyable classes using it.  It's the pointer arithmetic that is undefined behavior, not the cast, nor accessing the memory as a different type.

As for why accessing the first byte is legal, [basic.lval]/8 defines the pointer aliasing rules, and char/unsigned char/std::byte are explicitly allowed to break the aliasing rules:

8 If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
...
(8.8) a char, unsigned char, or std::byte type.

Accessing any object through a char/unsigned char/std::byte pointer or reference is legal, assuming that the pointer itself is valid.  So we can access the first byte of a structure without undefined behavior, but we can't move the pointer to get to the rest of the structure.  Especially considering that offsetof() exists, this is weird but currently correct.

Melissa
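
To make the point concrete, here is a minimal sketch of the kind of hand-written byte copy under discussion. The function name naive_copy is hypothetical. On the reading given above, the cast and the byte access are permitted, while the indexing step is the part the wording of the time does not cover when the source is not actually an array of bytes. Compilers are generally observed to handle this loop as expected, but that is an observation about implementations, not a guarantee taken from the standard text quoted in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical hand-written byte copy, as a library user might write it.
void* naive_copy(void* dst, const void* src, std::size_t n) {
    assert(n == 0 || (dst != nullptr && src != nullptr));
    unsigned char* d = static_cast<unsigned char*>(dst);
    const unsigned char* s = static_cast<const unsigned char*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        // d[i] and s[i] involve pointer arithmetic ([expr.add]); this is the
        // step the thread argues is not covered when no char array exists.
        // Access through unsigned char itself is allowed by [basic.lval].
        d[i] = s[i];
    }
    return dst;
}
```

Calling naive_copy(&b, &a, sizeof a) on two int objects is the situation the thread describes: reading the byte at index 0 is fine under the aliasing rule, and the open question is only about indices 1 and up.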

Иван Вайгульт

Dec 16, 2017, 8:09:44 AM
to ISO C++ Standard - Discussion
Was that UB intended? Should I file an issue on github?

Jens Maurer

Dec 16, 2017, 9:21:54 AM
to std-dis...@isocpp.org
On 12/16/2017 02:09 PM, Иван Вайгульт wrote:
> Was that UB intended? Should I file an issue on github?

On which github?

https://github.com/cplusplus/draft/issues
is intended for editorial issues, and any effective change in this
area does not seem editorial to me.

See https://isocpp.org/std/submit-issue for submitting
non-editorial issues.

Jens



Иван Вайгульт

Dec 18, 2017, 4:48:43 AM
to ISO C++ Standard - Discussion
Non editorial issue, for sure.

Did I get it right that this was unintended and that I can file an issue?


Jens Maurer

Dec 18, 2017, 5:24:58 AM
to std-dis...@isocpp.org
On 12/18/2017 10:48 AM, Иван Вайгульт wrote:
> Non editorial issue, for sure.
>
> Did I get it right that this was unintended and that I can file an issue?

Richard Smith cleaned up the area of "storage" vs. "object" a short while
ago, and while we say that an "object occupies a region of storage" and
"The memory available to a C++ program consists of one or more sequences of
contiguous bytes", we seem to fall short of actually making the storage
for an object viewable as an array of std::byte.

Note that this is a rather tricky area, because changes here might
gravely affect the liberty of optimizers. In particular, we don't
want to have the ability that a pointer to one complete object can
be somehow used to gain access to another (possibly adjacent) complete
object.

In short, I'm not sure about the "unintended" part.

Jens



Иван Вайгульт

Dec 18, 2017, 7:55:21 AM
to ISO C++ Standard - Discussion
I found some interesting details about offsetof.

> The macro offsetof(type, member-designator) has the same semantics as
> the corresponding macro in the C standard library header <stddef.h>, but accepts a
> restricted set of type arguments in this document.

There is a normative reference to the C99 standard. From the C99 standard:
> offsetof [...] expands to an integer constant expression that has type size_t, the value of
> which is the offset in bytes, to the structure member (designated by member-designator),
> from the beginning of its structure (designated by type).

So it means that the value of offsetof is always the same and does not depend on how an object instance was created.

Note that an object instance can be created inside a char[sizeof(Object)] array (according to [basic.memobj.3]).

Hence, we may conclude that a POD structure is always stored in contiguous memory (but it is not viewable as a std::byte or char array). Am I right?

-Ivan
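
A small sketch of the offsetof argument, under stated assumptions: Sample, count_offset, and read_count_from_bytes are hypothetical names, and the example goes through a std::memcpy'd copy of the bytes so that all pointer arithmetic happens inside a real unsigned char array. It illustrates that the offset is a compile-time constant of the type rather than a property of any particular instance.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical standard-layout type used only for illustration.
struct Sample {
    char tag;
    int  count;
};

static_assert(std::is_standard_layout<Sample>::value,
              "offsetof is only unconditionally supported for standard-layout types");

// offsetof yields an integer constant expression, so it can initialize
// a constexpr variable: the value is fixed per type, not per object.
constexpr std::size_t count_offset = offsetof(Sample, count);
static_assert(count_offset + sizeof(int) <= sizeof(Sample),
              "the member lies within the object's storage");

// Reads `count` out of a copy of the object representation, using only
// std::memcpy and the constant offset.
int read_count_from_bytes(const Sample& s) {
    unsigned char bytes[sizeof(Sample)];
    std::memcpy(bytes, &s, sizeof(Sample));
    int result;
    std::memcpy(&result, bytes + count_offset, sizeof(int));
    assert(result == s.count);  // the bytes at the offset hold the member's value
    return result;
}
```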



Nicol Bolas

Dec 18, 2017, 1:25:40 PM
to ISO C++ Standard - Discussion


On Monday, December 18, 2017 at 7:55:21 AM UTC-5, Иван Вайгульт wrote:
> Hence, we may conclude that a POD structure is always stored in contiguous memory (but it is not viewable as a std::byte or char array). Am I right?

No. You may conclude that a standard-layout structure is always stored in the same layout, with the same spacing between member subobjects.

POD is dead; `offsetof` is restricted to the (much broader) category of standard-layout types.
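
The distinction between POD and standard layout can be sketched with the type traits from <type_traits>. The type Counter below is hypothetical: its user-provided default constructor makes it non-trivial (and therefore not a POD), yet it remains standard-layout, so offsetof still applies to it.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical type: standard-layout, but not trivial because of the
// user-provided default constructor, so it is not a POD.
struct Counter {
    Counter() : hits(0), misses(0) {}
    int hits;
    int misses;
};

static_assert(std::is_standard_layout<Counter>::value,
              "Counter is standard-layout");
static_assert(!std::is_trivial<Counter>::value,
              "Counter is not trivial, hence not a POD");

// offsetof remains usable because Counter is standard-layout.
constexpr std::size_t misses_offset = offsetof(Counter, misses);
static_assert(misses_offset >= sizeof(int),
              "misses is laid out after hits");
```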