Is/Can it be well-defined to subtract (unsigned char *) pointers within the same object (representation)?

D. B.

Aug 18, 2016, 4:59:01 AM

to std-pr...@isocpp.org

Given this: https://www.securecoding.cert.org/confluence/display/c/ARR36-C.+Do+not+subtract+or+compare+two+pointers+that+do+not+refer+to+the+same+array

We see the summary that, formally, the only pointer arithmetic operations that are defined are:

* taking the difference of 2 pointers within the same array
* comparing, using <, 2 pointers within the same object

...but what I want to do depends on being able to form well-defined pointer subtractions (to std::ptrdiff_t) between bytes within the object representation in the same object, or usually just between an arbitrary byte and the base address, i.e. given

auto const someBytePointer{ reinterpret_cast<unsigned char const *>(&someObj.someMember) };
std::ptrdiff_t myOffset{ someBytePointer - reinterpret_cast<unsigned char const *>(&someObj) };

I want to be able to guarantee that myOffset will hold the distance in unsigned char between someObj.someMember and the base address of someObj
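For concreteness, the computation can be sketched as a self-contained snippet. `Sample` and both function names are made up for illustration and are not taken from the project described here. The subtraction is written the same way as the two lines above, so whether it is formally defined remains the open question; the `offsetof` version is shown only as the uncontroversial baseline for a standard-layout type.

```cpp
#include <cassert>   // for the checks suggested in the note below
#include <cstddef>   // std::ptrdiff_t, std::size_t, offsetof

// Hypothetical standard-layout type standing in for someObj's type.
struct Sample {
    unsigned char header;
    unsigned char payload[7];
    unsigned char someMember;
};

// Byte distance from the start of obj to obj.someMember, computed by
// subtracting two unsigned char pointers (the operation under discussion).
inline std::ptrdiff_t member_offset_by_subtraction(Sample const& obj) {
    auto const base = reinterpret_cast<unsigned char const*>(&obj);
    auto const member = reinterpret_cast<unsigned char const*>(&obj.someMember);
    return member - base;
}

// The same quantity via offsetof, which is defined for standard-layout types.
inline std::size_t member_offset_by_offsetof() {
    return offsetof(Sample, someMember);
}
```

Because every member of `Sample` is an unsigned char, there is no padding, so both functions should agree on an offset of 8 on mainstream implementations.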

However, I don't think the above guarantees specifically allow this. And I'm not sure the wording elsewhere that any object can be considered as a 'sequence' of unsigned char - the object representation - formally allows said sequence to be considered as an array in order to get around the 1st allowed piece of arithmetic earlier... or that there's any other allowance making this well-defined.

Anyhow, TL;DR, the questions:

Is this well-defined? Or am I relying on UB here?
Can this be made well-defined? Martin Sebor at another CERT page - https://www.securecoding.cert.org/confluence/display/c/ARR37-C.+Do+not+add+or+subtract+an+integer+to+a+pointer+to+a+non-array+object?focusedCommentId=58359845#comment-58359845 - (after I believe mistakenly inferring that any object can be treated as an array of unsigned char, rather than the verbatim wording sequence) - says this:

If the decision is to retain this rule I believe that, at minimum, an exception should be added to make it possible to treat an object as an array of unsigned char, otherwise there would be no way to implement user-defined variants of memset() or memcpy() without violating this standard.

But I do still think this rule is overly restrictive and the decision to prohibit treating non-array objects of any type equivalently to arrays of a single element should be reconsidered. To avoid violating the rule, programmers whose code is subject to this standard would have to define all objects that might be potentially treated as arrays as arrays of size 1. That's an infeasible requirement in systems consisting of libraries or modules whose public APIs expose individual objects.

So there might be some momentum already for specifically defining this. I hope so! (Otherwise I'll have to radically rethink my design, as this would be the only piece of formal UB in my current project)

Thanks for any thoughts!

Viacheslav Usov

Aug 18, 2016, 6:28:35 AM

to ISO C++ Standard - Future Proposals

On Thu, Aug 18, 2016 at 10:58 AM, D. B. <db0...@gmail.com> wrote:

> However, I don't think the above guarantees specifically allow this. And I'm not sure the wording elsewhere that any object can be considered as a 'sequence' of unsigned char - the object representation - formally allows said sequence to be considered as an array in order to get around the 1st allowed piece of arithmetic earlier... or that there's any other allowance making this well-defined.

Per [intro.object]/5, "An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage."

Per [basic.types]/2, "For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value."

Per [basic.types]/9, (not quoting), an array of unsigned char is itself a trivially copyable type, so it is contiguous.

I may still be mistaken, but for me that implies that pointers to at least trivially copyable types can be recast as pointers into an array of unsigned chars and operated safely while they stay within the storage occupied by the object. This can probably be extended to cover standard-layout types.
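What [basic.types]/2 guarantees directly is the copy-out and copy-back round trip, which can be sketched as below. `Point` and `round_trip` are hypothetical names chosen for illustration. The sketch shows only the guaranteed copy through a separate buffer, not the in-place reinterpretation that is being inferred here.

```cpp
#include <cassert>      // for the checks suggested in the note below
#include <cstring>      // std::memcpy
#include <type_traits>  // std::is_trivially_copyable

// Hypothetical trivially copyable type.
struct Point {
    int x;
    int y;
};
static_assert(std::is_trivially_copyable<Point>::value,
              "the round-trip guarantee applies to trivially copyable types");

// Copy the bytes of `in` into an unsigned char array, then copy them back
// into a fresh object. The result must hold the original value.
inline Point round_trip(Point const& in) {
    unsigned char bytes[sizeof(Point)];
    std::memcpy(bytes, &in, sizeof in);

    Point out{};
    std::memcpy(&out, bytes, sizeof out);
    return out;
}
```

Calling `round_trip(Point{3, -4})` should give back a `Point` with `x == 3` and `y == -4`.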

Cheers,

V.

Bo Persson

Aug 18, 2016, 6:47:15 AM

to std-pr...@isocpp.org

On 2016-08-18 10:58, D. B. wrote:
> Given this:
> https://www.securecoding.cert.org/confluence/display/c/ARR36-C.+Do+not+subtract+or+compare+two+pointers+that+do+not+refer+to+the+same+array
>
> We see the summary that, formally, the only pointer arithmetic
> operations that are defined are:
>
> * taking the difference of 2 pointers *within the same array*
> * comparing using < 2 pointers *within the same object*
>
> ...but what I want to do depends on being able to form *well-defined
> pointer subtractions* (to /std::ptrdiff_t/) *between bytes within the
> object representation in the same object*, or usually just between an
> arbitrary byte and the base address, i.e. given
>
> * /auto const someBytePointer{ reinterpret_cast<unsigned char const
> *>(&someObj.someMember) };/
> * /std::ptrdiff_t myOffset{ someBytePointer -
> reinterpret_cast<unsigned char const *>(&someObj) };/
>
> I want to be able to *guarantee* that myOffset *will* hold the *distance
> in /unsigned char/* between /someObj.someMember/ and the base address of
> /someObj/

I think this fails already here, because the reinterpret_cast is not
guaranteed to work. For example, on a word addressed machine, a char
pointer might need a part-word index in addition to the word address.

Bo Persson

D. B.

Aug 18, 2016, 6:53:55 AM

to std-pr...@isocpp.org

Surely it must work, since the Standard provisions for char* and unsigned char* to be able to alias any other type - which we must assume from context is so that they can read object representation?

D. B.

Aug 18, 2016, 7:02:58 AM

to std-pr...@isocpp.org

Hi Viacheslav,

On Thu, Aug 18, 2016 at 11:28 AM, Viacheslav Usov <via....@gmail.com> wrote:

On Thu, Aug 18, 2016 at 10:58 AM, D. B. <db0...@gmail.com> wrote:

> > However, I don't think the above guarantees specifically allow this. And I'm not sure the wording elsewhere that any object can be considered as a 'sequence' of unsigned char - the object representation - formally allows said sequence to be considered as an array in order to get around the 1st allowed piece of arithmetic earlier... or that there's any other allowance making this well-defined.

> Per [intro.object]/5, "An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage."
>
> Per [basic.types]/2, "For any object (other than a base-class subobject) of trivially copyable type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value."

You have reached many of the same parts of the Standard as me, but I felt they didn't guarantee what I want.

In relation to the above, much of my doubt centres around some key specifiers here: can be copied into an array of char or unsigned char and copied back – not that the object itself, considered in place without copying, is (can be treated as) such an array.

The wording about object representation only calls it a sequence of unsigned char. And there is no overt indication - as far as I can see - that such a sequence is explicitly allowed to be operated on using pointer arithmetic - only that arrays are. So my concern is that taking char* pointers into the object representation then doing offset arithmetic on them is formally, if only by omission, UB.

Again, it works fine on my compiler and does what I expect - because the compiler doesn't go out of its way to be capricious and just does the intuitive thing... but I strive for perfectly formed code in everything else, and the vagueness around this question is really bugging me now that I've considered it in more depth than 'well, the object representation is kinda like an array'. ;-)

> I may still be mistaken, but for me that implies that pointers to at least trivially copyable types can be recast as pointers into an array of unsigned chars and operated safely while they stay within the storage occupied by the object. This can probably be extended to cover standard-layout types.

I'm not sure it follows from anything we've quoted, but I hope you're right! This is the exact guarantee I want.

The objects I'm talking about are trivially copyable, of course - and at the moment only consist of instances of unsigned char anyway - but with various acrobatics going on around them, which mean I cannot (without extreme toil) change them to be arrays of unsigned char.

Also, less selfishly, this seems - precisely because both you and I have independently inferred similar guarantees, but can't point at any passage that proves them - to be the kind of thing that users would expect, and which other clauses in the Standard do seem to indicate, if not explicitly... meaning that, as Martin Sebor said, it would be really advantageous if the Standard explicitly allowed this. I'm sure all sane implementations do, and the concept is probably widely assumed, but I worry that gaps in the Standard are making it officially undefined.

Whew, I need to lie down.

Viacheslav Usov

Aug 18, 2016, 7:56:56 AM

to ISO C++ Standard - Future Proposals

On Thu, Aug 18, 2016 at 1:02 PM, D. B. <db0...@gmail.com> wrote:

> In relation to the above, much of my doubt centres around some key specifiers here: can be copied into an array of char or unsigned char and copied back – not that the object itself, considered in place without copying, is (can be treated as) such an array.

reinterpret_cast between two pointer types (via the requirements imposed on static_cast) results in "the same address as the original pointer value". The result is also a safely derived pointer. It is therefore safe and well defined to convert from any type (I am neglecting alignment complications here and in what follows) to a pointer to an array of unsigned characters. Then, assuming the original type trivially copyable, the "address" is that of contiguous bytes of storage in both the original object and the array. Any subobject must necessarily have an address within that same contiguous region, and because the array is also contiguous, the reinterpret_cast of the subobject's address is also a safely derived pointer to an element of the array.

Cheers,

V.

D. B.

Aug 18, 2016, 8:09:03 AM

to std-pr...@isocpp.org

On Thu, Aug 18, 2016 at 12:56 PM, Viacheslav Usov <via....@gmail.com> wrote:

On Thu, Aug 18, 2016 at 1:02 PM, D. B. <db0...@gmail.com> wrote:

> > In relation to the above, much of my doubt centres around some key specifiers here: can be copied into an array of char or unsigned char and copied back – not that the object itself, considered in place without copying, is (can be treated as) such an array.

> reinterpret_cast between two pointer types (via the requirements imposed on static_cast) results in "the same address as the original pointer value". The result is also a safely derived pointer.

I'm with you so far...

> It is therefore safe and well defined to convert from any type (I am neglecting alignment complications here and in what follows) to a pointer to an array of unsigned characters.

This is where it gets unclear for me. The aliasing rules make it clear that we can convert a pointer to any object to an [unsigned] char *, which by inference is so that object representation can be read. But it never specifically says that the resulting char * can be considered to be part of a conceptual array of size sizeof(Obj). So, are we only allowed to read that original 1 char? :-)

My gripe is with this lack of clarity. The fact memcpy exists indicates that everyone else infers the resulting pointer is within a conceptual array - or even that all of memory is one big array - else we would not be allowed to simply increment and dereference to continue with the copy! But we seem doomed to inference only, as there is no wording that specifically guarantees this.

To be clear, I'm definitely not disagreeing with what you say in practical terms, but I wish it were expressly allowed by the wording of the Standard, as I can't see it in there. All I can do is infer from various other passages - things like 'Well, they were probably thinking X' and 'The existence of Y implies this is well-formed', and so on. It feels like trying to put a jigsaw puzzle together!

But if I'm just being dense and missing the logic, please do keep explaining until I get it. ;-)

Bo Persson

Aug 18, 2016, 9:32:14 AM

to std-pr...@isocpp.org

The missing guarantee is sizeof(&someObj) == sizeof(char*), which is not
required.

So if char* (and void*) is larger than int* or some_class*, a
reinterpret_cast will likely not work.

Bo Persson

Viacheslav Usov

Aug 18, 2016, 10:17:02 AM

to ISO C++ Standard - Future Proposals

On Thu, Aug 18, 2016 at 2:08 PM, D. B. <db0...@gmail.com> wrote:

> This is where it gets unclear for me. The aliasing rules make it clear that we can convert a pointer to any object to an [unsigned] char *, which by inference is so that object representation can be read. But it never specifically says that the resulting char * can be considered to be part of a conceptual array of size sizeof(Obj).

That part is easy. We can convert that to a pointer to unsigned char[sizeof object]. Then we can certainly access every element of that array.

The nasty part is that you want an offset, which means you would end up with two pointers to arrays, different arrays, so you still cannot do arithmetic with them directly. What I think you can do is take the address of the complete object, cast it to pointer to array of unsigned chars, get the address of the first element of that array, then increment the resultant pointer until it becomes equal to the address of the member of interest (that is cast to pointer to unsigned char).

That should work because each byte is required to have a unique address (so different addresses cannot refer to the same byte), and the comparison for equality of pointers compares the addresses they represent.
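The scan described above might look like the following sketch. `Packet` and `offset_by_scan` are hypothetical names. For brevity it casts the object's address straight to `unsigned char const*` rather than going through a pointer to `unsigned char[sizeof object]` first. It only increments a byte pointer and compares for equality; it never subtracts two pointers. Whether stepping through the object's bytes this way is itself sanctioned is the assumption being made in this message, not something the sketch proves.

```cpp
#include <cassert>   // for the checks suggested in the note below
#include <cstddef>   // std::ptrdiff_t, std::size_t

// Hypothetical standard-layout type made only of unsigned char members.
struct Packet {
    unsigned char a;
    unsigned char b[3];
    unsigned char c;
};

// Walk the bytes of obj from its base address until the walking pointer
// compares equal to the address of member; the step count is the offset.
// Returns -1 if member does not start within obj.
template <typename Obj, typename Member>
std::ptrdiff_t offset_by_scan(Obj const& obj, Member const& member) {
    unsigned char const* p = reinterpret_cast<unsigned char const*>(&obj);
    unsigned char const* const target =
        reinterpret_cast<unsigned char const*>(&member);

    for (std::size_t i = 0; i < sizeof(Obj); ++i, ++p) {
        if (p == target) {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

For a `Packet pk`, `offset_by_scan(pk, pk.c)` should yield 4 and `offset_by_scan(pk, pk.a)` should yield 0, at a cost linear in `sizeof(Packet)`.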

Either that, or the standard offsetof macro.

Cheers,

V.

D. B.

Aug 18, 2016, 10:27:03 AM

to std-pr...@isocpp.org

Why would the Standard explicitly allow aliasing via [unsigned] char lvalues to objects of different type, if it was UB to ever form such an lvalue in the first place?

I'm not sure why the sizeof different pointer types matters. A reinterpret_cast should be capable of doing any manipulation it wants to the numerical values of a pointer, as long as it ends up pointing to the same thing when converted to the destination type (and back). I'm not sure the sizeof is relevant. We can convert between other types with different sizes just fine; why not pointers? As long as no information is lost that can't be reconstituted between converting the numerical representations, I see no problem. Assuming that's what you mean, can you elaborate on precisely where such information could be lost irrecoverably?

Arthur O'Dwyer

Aug 18, 2016, 11:17:41 PM

to ISO C++ Standard - Future Proposals

On Thursday, August 18, 2016 at 7:27:03 AM UTC-7, D. B. wrote:

> On Thu, Aug 18, 2016 at 2:32 PM, Bo Persson <b...@gmb.dk> wrote:
>
> > On 2016-08-18 12:53, D. B. wrote:
> >
> > > On Thu, Aug 18, 2016 at 11:47 AM, Bo Persson <b...@gmb.dk> wrote:
> > >
> > > > I think this fails already here, because the reinterpret_cast is not
> > > > guaranteed to work. For example, on a word addressed machine, a char
> > > > pointer might need a part-word index in addition to the word address.
> > >
> > > Surely it must work, since the Standard provisions for char* and
> > > unsigned char* to be able to alias any other type - which we must assume
> > > from context is so that they can read object representation?
> >
> > The missing guarantee is sizeof(&someObj) == sizeof(char*), which is not required.
> >
> > So if char* (and void*) is larger than int* or some_class*, a reinterpret_cast will likely not work.
>
> Why would the Standard explicitly allow aliasing via [unsigned] char lvalues to objects of different type, if it was UB to ever form such an lvalue in the first place?
>
> I'm not sure why the sizeof different pointer types matters. A reinterpret_cast should be capable of doing any manipulation it wants to the numerical values of a pointer, as long as it ends up pointing to the same thing when converted to the destination type (and back). I'm not sure the sizeof is relevant. We can convert between other types with different sizes just fine; why not pointers? As long as no information is lost that can't be reconstituted between converting the numerical representations, I see no problem. Assuming that's what you mean, can you elaborate on precisely where such information could be lost irrecoverably?

Bo is talking about something different (and irrelevant).

5.2.10 [expr.reinterpret.cast] /7:

An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v)). Converting a prvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value.

That is, reinterpret_cast<foo*>(mybarptr) is identical in meaning to static_cast<foo*>(static_cast<void*>(mybarptr)).

What Bo was thinking of, probably, was

bar *mybarptr;

foo *myfooptr = reinterpret_cast<foo*&>(mybarptr);

which is equivalent to

foo *myfooptr = *static_cast<foo**>(static_cast<void *>(&mybarptr));

which of course does what you'd expect ONLY IF the two pointer types are layout-compatible (and is undefined behavior in this case anyway because of the type-punning, IIUC). This case is covered in 5.2.10 [expr.reinterpret.cast] /11:

A glvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result refers to the same object as the source glvalue, but with the specified type.

But this case is not at all what D.B. was talking about.
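The pointer-to-pointer equivalence from /7 (the case D.B. was actually asking about) can be checked with a small sketch. `cast_via_void` and the two helper predicates are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>   // for the checks suggested in the note below

// Spells out the two-step conversion that [expr.reinterpret.cast]/7 uses
// to define reinterpret_cast between object pointer types.
template <typename To, typename From>
To* cast_via_void(From* p) {
    return static_cast<To*>(static_cast<void*>(p));
}

// True if the two spellings produce pointers that compare equal.
inline bool same_result_as_reinterpret_cast(int* p) {
    return cast_via_void<unsigned char>(p) ==
           reinterpret_cast<unsigned char*>(p);
}

// True if converting to unsigned char* and back restores the original
// pointer value (unsigned char has the weakest alignment requirement).
inline bool round_trip_restores(int* p) {
    return reinterpret_cast<int*>(reinterpret_cast<unsigned char*>(p)) == p;
}
```

Both predicates should return true for the address of any `int` object; note that neither depends on `sizeof(int*)` being equal to `sizeof(unsigned char*)`.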

HTH,

–Arthur

Edward Catmur

Aug 19, 2016, 8:49:56 AM

to ISO C++ Standard - Future Proposals

On Thursday, 18 August 2016 15:17:02 UTC+1, Viacheslav Usov wrote:

On Thu, Aug 18, 2016 at 2:08 PM, D. B. <db0...@gmail.com> wrote:

> > This is where it gets unclear for me. The aliasing rules make it clear that we can convert a pointer to any object to an [unsigned] char *, which by inference is so that object representation can be read. But it never specifically says that the resulting char * can be considered to be part of a conceptual array of size sizeof(Obj).

> That part is easy. We can convert that to a pointer to unsigned char[sizeof object]. Then we can certainly access every element of that array.
>
> The nasty part is that you want an offset, which means you would end up with two pointers to arrays, different arrays, so you still cannot do arithmetic with them directly. What I think you can do is take the address of the complete object, cast it to pointer to array of unsigned chars, get the address of the first element of that array, then increment the resultant pointer until it becomes equal to the address of the member of interest (that is cast to pointer to unsigned char).

Exactly; we can't use [expr.rel] or [expr.add] between a pointer to complete object (or a pointer derived from a pointer to complete object) and a pointer to subobject, but we can use [expr.eq]. It's O(n) in the size of the complete object (before optimization), but if I understand correctly the motivation is just that a Standard-compliant implementation should be possible; on any actual system pointer arithmetic could be used.

> That should work because each byte is required to have a unique address (so different addresses cannot refer to the same byte), and the comparison for equality of pointers compares the addresses they represent.

Yes, although it is possible for equal but different pointers to refer to the same byte (that is, their value representation is distinct). This could be the case on e.g. a segmented memory system, and provides one obvious justification for the difference between the relational, arithmetic and equality operators: the equality operator is required to canonicalize segmented pointers (or, equally, compare them taking into account segmentation) while the relational and arithmetic operators can assume the pointers have the same segment id. Another possibility could be running C++ on top of an object-oriented machine (a Lisp machine, or a Javascript VM) where a char pointer is a tuple of reference to native object (not necessarily complete object) and byte offset.

> Either that, or the standard offsetof macro.

Which is however only conditionally supported for non-standard-layout types.

Viacheslav Usov

Aug 19, 2016, 10:53:21 AM

to ISO C++ Standard - Future Proposals

On Fri, Aug 19, 2016 at 2:49 PM, Edward Catmur <e...@catmur.co.uk> wrote:

> the equality operator is required to canonicalize segmented pointers (or, equally, compare them taking into account segmentation)

Since the standard already requires normalisation for equality tests, any conformant implementation must have the requisite machinery, so it should not be too difficult to make that machinery available for pointer arithmetic. For example, we could have

namespace std { template<typename T> ptrdiff_t pointer_distance(T *p, T *q); }

that is well-defined whenever p and q (both of type T) point within the same complete object (or the outermost array).
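To make the proposal concrete, here is a rough sketch of how such a function could be written today on an implementation with a flat address space. Everything in it is an assumption for illustration: it lives outside namespace std, it takes `T const*`, it returns the distance from `p` to `q` in units of `T` (the message does not specify a direction), and it relies on the implementation-defined mapping from pointers to `std::intptr_t` being linear, which the Standard does not promise. `Packet` is a hypothetical type.

```cpp
#include <cassert>   // for the checks suggested in the note below
#include <cstddef>   // std::ptrdiff_t
#include <cstdint>   // std::intptr_t (optional in the Standard, widely available)

// Hypothetical standard-layout type made only of unsigned char members.
struct Packet {
    unsigned char a;
    unsigned char b[3];
    unsigned char c;
};

// Sketch only: distance from p to q in units of T, computed on the integer
// values of the pointers. Assumes a flat address space in which converting
// a pointer to an integer preserves byte distances.
template <typename T>
std::ptrdiff_t pointer_distance(T const* p, T const* q) {
    auto const a = reinterpret_cast<std::intptr_t>(p);
    auto const b = reinterpret_cast<std::intptr_t>(q);
    return static_cast<std::ptrdiff_t>(
        (b - a) / static_cast<std::intptr_t>(sizeof(T)));
}
```

With a `Packet pk`, `pointer_distance(&pk.a, &pk.c)` should give 4 and the reversed call should give -4 on typical flat-memory platforms. On a segmented target the body would need the same normalisation that equality comparison already performs.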

Cheers,

V.

Edward Catmur

Aug 19, 2016, 1:24:50 PM

to std-pr...@isocpp.org

Or we could require that the specialization of std::distance for pointer types works for any two (aligned) pointers within the same region of storage, not just within the same array. So your pointer_distance, but with an existing name. This should be reasonably uncontroversial, seeing as we already have std::less specialized on pointer types providing a stable, equality-respecting order for any two pointers anywhere in memory.

As an aside, std::less<T*> means that you can find the offset of a member in O(log n) i.e. O(sizeof(size_t)), using binary search!
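A sketch of that binary search follows. `Packet` and `offset_by_bisection` are hypothetical names. The sketch assumes that the total order given by `std::less` agrees with increasing byte position inside the object, and it still forms `base + mid` on an `unsigned char` pointer, so it inherits the question about treating the object representation as an array. It also requires that `member` really lies inside `obj`.

```cpp
#include <cassert>     // for the checks suggested in the note below
#include <cstddef>     // std::size_t
#include <functional>  // std::less

// Hypothetical standard-layout type made only of unsigned char members.
struct Packet {
    unsigned char a;
    unsigned char b[3];
    unsigned char c;
};

// Find the byte offset of member within obj by bisection, using only the
// ordering supplied by std::less on pointers (no pointer subtraction).
// Precondition: member is a subobject of obj.
template <typename Obj, typename Member>
std::size_t offset_by_bisection(Obj const& obj, Member const& member) {
    auto const base = reinterpret_cast<unsigned char const*>(&obj);
    auto const target = reinterpret_cast<unsigned char const*>(&member);
    std::less<unsigned char const*> const before{};

    // Invariant: base + lo is not after target, and target is before base + hi.
    std::size_t lo = 0;
    std::size_t hi = sizeof(Obj);
    while (hi - lo > 1) {
        std::size_t const mid = lo + (hi - lo) / 2;
        if (before(target, base + mid)) {
            hi = mid;
        } else {
            lo = mid;
        }
    }
    return lo;
}
```

The loop runs roughly log2(sizeof(Obj)) times, compared with up to sizeof(Obj) steps for the linear equality scan discussed earlier in the thread.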

D. B.

Aug 19, 2016, 1:34:04 PM

to std-pr...@isocpp.org

On Fri, Aug 19, 2016 at 6:24 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:

> Or we could require that the specialization of std::distance for pointer types works for any two (aligned) pointers within the same region of storage, not just within the same array. So your pointer_distance, but with an existing name. This should be reasonably uncontroversial, seeing as we already have std::less specialized on pointer types providing a stable, equality-respecting order for any two pointers anywhere in memory.

Nice workarounds :-) but would it not be easier just to formally define the intuitive results of subtracting (aligned, preferably unsigned char) pointers? If we are apparently allowed to 'fake it' using operator< and iteration, ultimately arriving at the same number... what's the point of requiring all that hoop-jumping just to be Standard-compliant?

How does memcpy() get around these vague rules anyway? By incrementing and dereferencing within 2 regions of memory whose identity it doesn't know, which rules does it rely on? Or doesn't it? Since users should pass it 2 pointers and a size that fall within well-defined arrays or objects on both source and destination sides, it seems like another argument for defining pointer arithmetic proper within a single trivially copyable (and perhaps also standard layout?) object, not just within an array.

Anyway, appreciate all the thoughts. Keep them coming! :-)

Viacheslav Usov

Aug 19, 2016, 1:43:50 PM

to ISO C++ Standard - Future Proposals

On Fri, Aug 19, 2016 at 7:24 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:

> Or we could require that the specialization of std::distance for pointer types works for any two (aligned) pointers within the same region of storage, not just within the same array. So your pointer_distance, but with an existing name.

I considered that, but rejected it. My rationale was that when pointer normalisation is expensive, existing code using std::distance with pointers should not suffer. It can be argued, though, that existing code doing that can only be correct when no normalisation is needed, so an implementation could in principle be able to avoid extra costs. Then this is a QoI issue.

Another argument is that the fundamental issue deserves to be visible, rather than being trivialised by a catch-all std::distance.

Cheers,

V.

Viacheslav Usov

Aug 19, 2016, 1:47:04 PM

to ISO C++ Standard - Future Proposals

On Fri, Aug 19, 2016 at 7:34 PM, D. B. <db0...@gmail.com> wrote:

> How does memcpy() get around these vague rules anyway?

It is part of the implementation, so, if it is itself implemented in C++, it can exploit an implementation-defined behaviour, or it can be implemented "magically" to begin with. It does not need to be portable.
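For contrast, a user-written equivalent would have to look something like the sketch below (`byte_copy` is a hypothetical name, not a library function). It indexes through `unsigned char` pointers into objects that were not declared as arrays, which is precisely the step whose formal status is being questioned in this thread; mainstream compilers accept and run it as expected.

```cpp
#include <cassert>   // for the checks suggested in the note below
#include <cstddef>   // std::size_t

// Naive byte-wise copy in the style of memcpy. The two regions must not
// overlap, and each must be at least n bytes long.
inline void* byte_copy(void* dst, void const* src, std::size_t n) {
    auto const d = static_cast<unsigned char*>(dst);
    auto const s = static_cast<unsigned char const*>(src);
    for (std::size_t i = 0; i < n; ++i) {
        d[i] = s[i];  // pointer arithmetic inside a possibly non-array object
    }
    return dst;
}
```

Copying one `int` into another with `byte_copy(&b, &a, sizeof a)` should leave `b` equal to `a`.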

Cheers,

V.

Viacheslav Usov

Aug 19, 2016, 1:54:16 PM

to ISO C++ Standard - Future Proposals

On Fri, Aug 19, 2016 at 7:34 PM, D. B. <db0...@gmail.com> wrote:

> Nice workarounds :-) but would it not be easier just to formally define the intuitive results of subtracting (aligned, preferably unsigned char) pointers?

I'm old enough to remember and have used "near", "far" and "huge" pointers. On some ancient x86 architectures, huge pointers could address the whole 1 megabyte of RAM, and they behaved in this nice way you want. Problem is, they were much slower than the far pointers, which could also address the whole 1 megabyte of RAM, but were weird with respect to arithmetic and even the equality comparison (they were not really conformant in this part).

So the performance/niceness tradeoff was quite real, and I think what we see in the standard today was directly influenced by that. I do not think this is likely to change, at least till the people who remember the same story are on the committee :)

Cheers,

V.

Edward Catmur

Aug 19, 2016, 4:41:54 PM

to std-pr...@isocpp.org

It's not solely of historical interest, though. Optimizers can and do take advantage of the strict rules for pointer arithmetic and order comparison, so I think there would be considerable resistance to relaxing them.

Also, flat memory might be in the ascendant now, but who knows what the future holds? Perhaps strict pointer arithmetic and order comparison will be of advantage programming future heterogeneous architectures.

Edward Catmur

Aug 19, 2016, 4:46:58 PM

to std-pr...@isocpp.org

Thanks, I'm convinced by both your points.

I guess the difference between this and std::less<T*> is that the latter is essential to disarm a landmine in the Library, namely using pointers as the key type in associative containers, while computing offsets at runtime is far more a niche use case.

Thiago Macieira

Aug 19, 2016, 7:37:07 PM

to std-pr...@isocpp.org

On Friday, 19 August 2016 21:46:56 PDT 'Edward Catmur' via ISO C++ Standard - Future Proposals wrote:
> I guess the difference between this and std::less<T*> is that the latter is
> essential to disarm a landmine in the Library, namely using pointers as the
> key type in associative containers, while computing offsets at runtime is
> far more a niche use case.

I've used this technique before: when looping over two arrays, instead of
advancing both pointers (two registers that need updating) or using an index
variable (three registers in use, of which one only needs updating), I found
it useful to compute the flat-memory offset between the two pointers. Then I
had only two registers in use, with one only being updated in the loop.

I needed that because of register pressure on 32-bit x86 when doing string
comparisons with the PCMPESTRI instruction, which has too many input
arguments. It also read past the end of the strings. Fortunately, this code
was written in assembly, so no C++ rules were violated.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

D. B.

Aug 20, 2016, 5:26:40 AM

to std-pr...@isocpp.org

I get all the points about different types of pointers, memory regions, etc. - which are valid counterarguments for totally arbitrary subtractions. But I don't grasp why such caveats should disallow (in fact: don't logically allow) arithmetic within the same (trivially copyable/standard layout) object. Such an object must occupy a single, known region of storage, just as is true for arrays.

Could it not, then, be made well-defined to perform such arithmetic within such objects too? If the argument is that the compiler can't know the origin of the pointers, then it already has to handle that same uncertainty properly for pointers within arrays, yet not for pointers within objects... despite the latter being formally representable as a "sequence" of unsigned char. If the Standard already requires compilers to honour the array case, it seems like all the required mechanisms already exist to support objects. Refine the definition of said "sequence" by specifying that the object's base address can be cast to an array of unsigned char[ sizeof(T) ], and we seem to be there. But it can't be that simple - right? ;-)

I'm not saying they should add this just because I have a relatively rare use-case for it - well, I am, but also! - more that guarantees elsewhere in the Standard already seem to make it practically possible, and yet it's not theoretically defined.

However, it's possible I'm still missing something, so I apologise if you end up having to explain it yet again!

Edward Catmur

Aug 20, 2016, 8:05:53 AM

to std-pr...@isocpp.org

If we're talking pointers of arbitrary type, there's the optimization issue, but also that one of the pointers might not be aligned - think #pragma pack. If on the other hand you're only concerned about character pointers, then optimization and alignment cease to be issues.

Also consider that currently it would be allowable for the largest possible object to be larger than the largest possible array of non-character type. That would no longer work the same if we were able to subtract and compare arbitrary non-character pointers within the same object.

Note that if your object is constructed into user-visible storage then there already exists a character array beneath it and you can use std::launder on appropriate character pointers to retrieve pointers into the character array.

D. B.

Aug 20, 2016, 8:41:06 AM

to std-pr...@isocpp.org

On Sat, Aug 20, 2016 at 1:05 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:

> If we're talking pointers of arbitrary type, there's the optimization issue, but also that one of the pointers might not be aligned - think #pragma pack. If on the other hand you're only concerned about character pointers, then optimization and alignment cease to be issues.

Indeed - I am only worried about unsigned char * pointers. I tend to work with byte buffers, both out of necessity and the fact that they just make more sense to me. More importantly, because only these pointers have so many already-defined privileges wrt object representation, aliasing, etc.

Also consider that currently it would be allowable for the largest possible object to be larger than the largest possible array of non-character type. That would no longer work the same if we were able to subtract and compare arbitrary non-character pointers within the same object

No issue, as above. :-)

It sounds like maybe this is a reasonable idea to put forward for a wording change/enhancement, then? Your comments here make me quite optimistic, at least...

Lastly:

Note that if your object is constructed into user-visible storage then there already exists a character array beneath it and you can use std::launder on appropriate character pointers to retrieve pointers into the character array.

Sorry, could you explain what you mean by user-visible storage and how it means there's already such an array, and how launder would contribute?

D. B.

Aug 20, 2016, 8:56:34 AM8/20/16

to std-pr...@isocpp.org

Looking at this about launder et al:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4430.html

This addition is intriguing:

[ Footnote: An object that is not an array element is considered to belong to a single-element array for this purpose; see 5.3.1. A pointer past the end of the last element of an array of n elements is considered to point to a hypothetical element n for this purpose; see 3.9.2. ],

So, does this mean that...

We can cast an object to unsigned char *, either - adjust to taste - because said object contains an unsigned char in 1st member position, or by some interpretation of the object rep/aliasing rules
So we now have an unsigned char* p
...which is an unsigned char p[0]
...to which we can point 1-past the end and get something that acts like unsigned char p[1]
...from which we can point 1-past the end and get something that acts like unsigned char p[2]
...and repeat indefinitely?

I might be misreading, but... does anyone else think we can extrapolate this so far that any object can ultimately be treated as an array of unsigned char[sizeof(T)] after all?

I'm trying to work out whether what I want (which already seemed kinda implicit in C++14) has now been made explicit via deduction through the above.

D. B.

Aug 20, 2016, 9:04:09 AM8/20/16

to std-pr...@isocpp.org

2 fixes:

On Sat, Aug 20, 2016 at 1:56 PM, D. B. <db0...@gmail.com> wrote:

Looking at this about launder et al:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4430.html

This addition is intriguing:

[ Footnote: An object that is not an array element is considered to belong to a single-element array for this purpose; see 5.3.1. A pointer past the end of the last element of an array of n elements is considered to point to a hypothetical element n for this purpose; see 3.9.2. ],

This is marked as "Change in 5.7 (expr.add) paragraph 4"

Hard to explain the below but a correction to the 1st array size and some attempted clarifications:

So, does this mean that...

We can cast an object to unsigned char *, either - adjust to taste - because said object contains an unsigned char in 1st member position, or by some interpretation of the object rep/aliasing rules
So we now have an unsigned char* p

...which can act like an unsigned char q[1], so we now have &q[0] == p
...to which we can point 1-past the end and get a conceptual larger array, having 2 elements, in which &q[1] == p + 1
...from which we can point 1-past the end ...etc.... &q[2] == p + 2
...and repeat indefinitely?

Having said all that, whether in Standardese it directly follows - without literally iterating every element along the way - that we have an array of unsigned char[sizeof(T)]... I can't quite make out. But it seems mildly encouraging, at least...
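What the quoted footnote does cover can be sketched in a few lines. This is only an illustration (the function name one_past_distance is made up here): a lone object is treated as a one-element array, so forming the one-past-the-end pointer and subtracting is fine. It does not show that stepping further, to p + 2 and beyond, is covered.

```cpp
#include <cstddef>  // std::ptrdiff_t

// Hypothetical helper: pointer arithmetic on a single (non-array) object.
inline std::ptrdiff_t one_past_distance()
{
    unsigned char single = 0;
    unsigned char* p   = &single;  // behaves like &arr[0] of a 1-element array
    unsigned char* end = p + 1;    // one past the end: may be formed, must not be dereferenced
    return end - p;                // distance between the two pointers
}
```

Calling one_past_distance() yields 1. Nothing in this sketch extends the "array" beyond that single element.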

Edward Catmur

Aug 20, 2016, 10:06:39 AM8/20/16

to std-pr...@isocpp.org

No, sorry. What you have is a contiguous sequence of unsigned char[1], as e.g. if they were adjacent data members of a struct.

That is, unless such an array of characters already exists as it would if e.g. your object were constructed in storage provided by std::aligned_storage, or by a user-defined allocator.

D. B.

Aug 20, 2016, 10:21:44 AM8/20/16

to std-pr...@isocpp.org

On Sat, Aug 20, 2016 at 3:06 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:

No, sorry. What you have is a contiguous sequence of unsigned char[1], as e.g. if they were adjacent data members of a struct.

That is, unless such an array of characters already exists as it would if e.g. your object were constructed in storage provided by std::aligned_storage, or by a user-defined allocator.

I really appreciate your efforts to think of legal loopholes that might be able to make this well-defined for me. :-) Is std::aligned_storage defined to use an array of char, or can it do any implementation-defined acrobatics it wishes? If the answer is affirmative, that would be a lot easier than writing a custom allocator just to stay on the Standard's good side...

...but I might if I have to, at least until I can refactor the current setup to use an array directly... a thought that fills me with horror. As far as UB goes, I don't feel 100% bad about this bit due to how obtuse an implementation would have to be to subvert my intuition... only 99% bad. ;-)

Edward Catmur

Aug 20, 2016, 11:13:35 AM8/20/16

to std-pr...@isocpp.org

On Sat, Aug 20, 2016 at 3:21 PM, D. B. <db0...@gmail.com> wrote:

On Sat, Aug 20, 2016 at 3:06 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:
No, sorry. What you have is a contiguous sequence of unsigned char[1], as e.g. if they were adjacent data members of a struct.

That is, unless such an array of characters already exists as it would if e.g. your object were constructed in storage provided by std::aligned_storage, or by a user-defined allocator.

I really appreciate your efforts to think of legal loopholes that might be able to make this well-defined for me. :-) Is std::aligned_storage defined to use an array of char, or can it do any implementation-defined acrobatics it wishes? If the answer is affirmative, that would be a lot easier than writing a custom allocator just to stay on the Standard's good side...

Ah. No, it can be any POD. However, writing your own aligned_storage is easy if somewhat recursive (for "default-alignment"):

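#include <cstddef>      // std::size_t
#include <type_traits>  // std::aligned_storage_t

// Storage that is guaranteed to contain an array of unsigned char.
// The default alignment is borrowed from std::aligned_storage_t<Len>.
template<std::size_t Len, std::size_t Align = alignof(std::aligned_storage_t<Len>)>
struct my_aligned_storage {
    typedef struct {
        alignas(Align) unsigned char data[Len];
    } type;
};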
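A minimal usage sketch follows, under some assumptions. Widget and read_weight_via_bytes are hypothetical names invented for illustration, and the default-alignment argument is dropped so the sketch does not depend on std::aligned_storage_t. All pointer arithmetic is done on pointers derived from the data array itself, which is the case the thread agrees is covered.

```cpp
#include <cstddef>  // std::size_t, std::ptrdiff_t, offsetof
#include <cstring>  // std::memcpy
#include <new>      // placement new

// Same shape as the template above, but with an explicit alignment only.
template<std::size_t Len, std::size_t Align>
struct my_aligned_storage {
    typedef struct {
        alignas(Align) unsigned char data[Len];
    } type;
};

// Hypothetical payload type, used only for illustration.
struct Widget {
    int id;
    double weight;
};

// Construct a Widget inside the byte array, then read `weight` back
// through pointers that were formed from the array.
inline double read_weight_via_bytes()
{
    my_aligned_storage<sizeof(Widget), alignof(Widget)>::type storage;

    Widget* w = new (storage.data) Widget{7, 2.5};

    unsigned char* base  = storage.data;                     // element 0 of data[]
    unsigned char* field = base + offsetof(Widget, weight);  // still inside data[]
    std::ptrdiff_t off   = field - base;                     // both point into the same array

    double out;
    std::memcpy(&out, base + off, sizeof out);  // copy the member's bytes out

    w->~Widget();  // trivial here, shown for symmetry with the placement new
    return out;
}
```

The offset here comes from offsetof rather than from casting &w->weight, because whether a pointer obtained from the member counts as a pointer into data[] is exactly the point under discussion.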

Viacheslav Usov

Aug 26, 2016, 8:42:28 AM8/26/16

to ISO C++ Standard - Future Proposals

On Sat, Aug 20, 2016 at 11:26 AM, D. B. <db0...@gmail.com> wrote:

> I get all the points about different types of pointers, memory regions, etc. - which are valid counterarguments for totally arbitrary subtractions. But I don't grasp why such caveats should disallow (in fact: don't logically allow) arithmetic within the same (trivially copyable/standard layout) object. Such an object must occupy a single, known region of storage. Just as is true for arrays.

Let me try again.

(0) [intro.memory]/1 "Every byte has a unique address."

(1) [intro.object]/5 "An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage."

(2) [basic.compound]/3 "If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained."

(3) [basic.types]/9 "Scalar types, standard-layout class types (Clause 9), arrays of such types and cv-qualified versions of these types (3.9.3) are collectively called standard-layout types."

(4) [basic.align]/6 "Furthermore, the narrow character types (3.9.1) shall have the weakest alignment requirement."

(5) [conv.ptr]/2 "A prvalue of type “pointer to cv T”, where T is an object type, can be converted to a prvalue of type “pointer to cv void”. The result of converting a non-null pointer value of a pointer to object type to a “pointer to cv void” represents the address of the same byte in memory as the original pointer value"

(6) [expr.static.cast]/13 "A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. The null pointer value is converted to the null pointer value of the destination type. If the original pointer value represents the address A of a byte in memory and A satisfies the alignment requirement of T, then the resulting pointer value represents the same address as the original pointer value, that is, A"

(7) [expr.reinterpret.cast]/7 "An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v))."

(8) [expr.sizeof] "The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is an unevaluated operand (Clause 5), or a parenthesized type-id. The sizeof operator shall not be applied to an expression that has function or incomplete type, to the parenthesized name of such types, or to a glvalue that designates a bit-field. sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1."

Assume types A and C are trivially copyable or standard-layout types, type C has field a of type A, which is not a bit-field.

Per (5), (6) & (7), pac defined via

auto pac = reinterpret_cast<unsigned char(*)[sizeof (C)]>(&c);

is a pointer representing the address of the first byte of c.

Per (2), pc defined via

auto pc = &(*pac)[0];

is a pointer to the first unsigned char in the array pointed to by pac, and it represents the address of c.

Per (5), (6) & (7), pa defined via

auto pa = reinterpret_cast<unsigned char*>(&c.a);

is a pointer representing address X of the first byte of c.a.

Per (0) & (1), X is an address of a byte somewhere in the storage occupied by the array pointed to by pac. There must therefore be an unsigned char in the array pointed to by pac, with address X. Assume the latter is not true. Then, since per (8) there is exactly one byte per unsigned char, the byte at address X does not correspond to any byte representing any unsigned char in the array pointed to by pac, which contradicts either (1) or (4). Therefore, an unsigned char in the array pointed to by pac exists at address X. Per (2), pa, representing address X, points to an unsigned char in the array pointed to by pac.

Therefore, pc and pa are both pointers to unsigned chars in the array pointed to by pac.

Per (2), we should achieve the same if pc is defined directly via

auto pc = reinterpret_cast<unsigned char*>(&c);
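Assembled into one piece, the definitions above look roughly like this. It is a sketch only: the concrete layouts of A and C and the helper name offset_of_a are assumptions made for illustration, and the final subtraction is precisely the operation whose formal status this thread is arguing about. On mainstream implementations the result matches offsetof(C, a).

```cpp
#include <cstddef>      // std::ptrdiff_t, offsetof
#include <type_traits>  // std::is_standard_layout, std::is_trivially_copyable

// Hypothetical types matching "type C has field a of type A".
struct A { int value; };
struct C { char tag; A a; };

static_assert(std::is_standard_layout<C>::value &&
              std::is_trivially_copyable<C>::value,
              "premises (1) and (3): C is standard-layout and trivially copyable");
static_assert(sizeof(unsigned char) == 1, "premise (8)");

// Hypothetical helper: byte distance from the start of c to c.a.
inline std::ptrdiff_t offset_of_a(C& c)
{
    auto pac = reinterpret_cast<unsigned char(*)[sizeof(C)]>(&c);  // address of first byte of c
    auto pc  = &(*pac)[0];                                         // same address, as unsigned char*
    auto pa  = reinterpret_cast<unsigned char*>(&c.a);             // address X of first byte of c.a
    return pa - pc;  // the debated subtraction
}
```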

Cheers,

V.

D. B.

Aug 26, 2016, 9:07:17 AM8/26/16

to std-pr...@isocpp.org

Whew! That explanation is so logical, it makes my head hurt... ;-)

So you think that doing &c.a - &c is defined behaviour, via these rules? I know you originally said this (but in much less detail :-) but got the impression from some of your later messages that you had changed your mind/interpretation.

Specifically, this:

That part is easy. We can convert that to a pointer to unsigned char[sizeof object]. Then we can certainly access every element of that array.

The nasty part is that you want an offset, which means you would end up with two pointers to arrays, different arrays, so you still cannot do arithmetic with them directly.

Am I correct that you no longer think the 2nd paragraph is a barrier to this subtraction being defined, considering the steps you've shown?

If so, then simply 'imagining' an array pointed at by pac, combined with the rules for converting/casting addresses, would mean that pa and pc must be formally subtractable, although they were not directly obtained from an array. I hope I've read this right...

Thanks again!

Edward Catmur

Aug 26, 2016, 9:18:01 AM8/26/16

to std-pr...@isocpp.org

On Fri, Aug 26, 2016 at 1:42 PM, Viacheslav Usov <via....@gmail.com> wrote:

On Sat, Aug 20, 2016 at 11:26 AM, D. B. <db0...@gmail.com> wrote:

> I get all the points about different types of pointers, memory regions, etc. - which are valid counterarguments for totally arbitrary subtractions. But I don't grasp why such caveats should disallow (in fact: don't logically allow) arithmetic within the same (trivially copyable/standard layout) object. Such an object must occupy a single, known region of storage. Just as is true for arrays.

Let me try again.

(0) [intro.memory]/1 "Every byte has a unique address."

(1) [intro.object]/5 "An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage."

(2) [basic.compound]/3 "If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained."

Not since P0137R1. But I'll assume that you're either working within C++14 as written (although no implementation conforms to that as written) or are inserting calls to std::launder where required.

(3) [basic.types]/9 "Scalar types, standard-layout class types (Clause 9), arrays of such types and cv-qualified versions of these types (3.9.3) are collectively called standard-layout types."

(4) [basic.align]/6 "Furthermore, the narrow character types (3.9.1) shall have the weakest alignment requirement."

(5) [conv.ptr]/2 "A prvalue of type “pointer to cv T”, where T is an object type, can be converted to a prvalue of type “pointer to cv void”. The result of converting a non-null pointer value of a pointer to object type to a “pointer to cv void” represents the address of the same byte in memory as the original pointer value"

(6) [expr.static.cast]/13 "A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. The null pointer value is converted to the null pointer value of the destination type. If the original pointer value represents the address A of a byte in memory and A satisfies the alignment requirement of T, then the resulting pointer value represents the same address as the original pointer value, that is, A"

(7) [expr.reinterpret.cast]/7 "An object pointer can be explicitly converted to an object pointer of a different type. When a prvalue v of object pointer type is converted to the object pointer type “pointer to cv T”, the result is static_cast<cv T*>(static_cast<cv void*>(v))."

(8) [expr.sizeof] "The sizeof operator yields the number of bytes in the object representation of its operand. The operand is either an expression, which is an unevaluated operand (Clause 5), or a parenthesized type-id. The sizeof operator shall not be applied to an expression that has function or incomplete type, to the parenthesized name of such types, or to a glvalue that designates a bit-field. sizeof(char), sizeof(signed char) and sizeof(unsigned char) are 1."

Assume types A and C are trivially copyable or standard-layout types, type C has field a of type A, which is not a bit-field.

Per (5), (6) & (7), pac defined via

auto pac = reinterpret_cast<unsigned char(*)[sizeof (C)]>(&c);

is a pointer representing the address of the first byte of c.

Per (2), pc defined via

auto pc = &(*pac)[0];

is a pointer to the first unsigned char in the array pointed to by pac, and it represents the address of c.

pac does not point to an array of unsigned char (even assuming the use of std::launder) because such an array does not exist, unless it has previously been caused to exist.

Per (5), (6) & (7), pa defined via

auto pa = reinterpret_cast<unsigned char*>(&c.a);

is a pointer representing address X of the first byte of c.a.

Per (0) & (1), X is an address of a byte somewhere in the storage occupied by the array pointed to by pac. There must therefore be an unsigned char in the array pointed to by pac, with address X. Assume the latter is not true. Then, since per (8) there is exactly one byte per unsigned char, the byte at address X does not correspond to any byte representing any unsigned char in the array pointed to by pac, which contradicts either (1) or (4). Therefore, an unsigned char in the array pointed to by pac exists at address X. Per (2), pa, representing address X, points to an unsigned char in the array pointed to by pac.

Therefore, pc and pa are both pointers to unsigned chars in the array pointed to by pac.

Per (2), we should achieve the same if pc is defined directly via

auto pc = reinterpret_cast<unsigned char*>(&c);

This would all be fine if such an array of unsigned char were guaranteed to exist. There is no such guarantee.

Viacheslav Usov

Aug 26, 2016, 9:24:47 AM8/26/16

to ISO C++ Standard - Future Proposals

On Fri, Aug 26, 2016 at 3:07 PM, D. B. <db0...@gmail.com> wrote:

> So you think that doing &c.a - &c is defined behaviour, via these rules? I know you originally said this (but in much less detail :-) but got the impression from some of your later messages that you had changed your mind/interpretation.

Indeed I got carried away chasing my own far pointer red herring.

> Am I correct that you no longer think the 2nd paragraph is a barrier to this subtraction being defined, considering the steps you've shown?

The paragraph is partially correct, because formally you do get pointers to formally different arrays. What I was missing when I wrote that was (2) from my previous email, which is a very strong constraint on the weirdness pointers may have.

Cheers,

V.

Viacheslav Usov

Aug 26, 2016, 9:33:26 AM8/26/16

to ISO C++ Standard - Future Proposals

On Fri, Aug 26, 2016 at 3:17 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:

> Not since P0137R1. But I'll assume that you're either working within C++14 as written (although no implementation conforms to that as written) or are inserting calls to std::launder where required.

I'll look into those details and will probably have something to say later.

> pac does not point to an array of unsigned char (even assuming the use of std::launder) because such an array does not exist, unless it has previously been caused to exist.

I do not understand what you mean by this. Or, more precisely, what you mean by "exists". We are talking about storage within its duration; alignment requirements for narrow chars are met per (4); an array of narrow chars has vacuous initialisation; per [basic.life]/1, the lifetime of an array of narrow chars within that storage has begun. How is that different from "exists"?

Cheers,

V.

Edward Catmur

Aug 26, 2016, 10:30:50 AM8/26/16

to std-pr...@isocpp.org

Until you've read P0137R1 (and, unless you're smarter than me, reread it multiple times) what I have to say here will be surprising, so please bear that in mind. Per [intro.object]/1 and the associated drafting note, an array of N unsigned char is *not* created when you obtain storage for an object type T, sizeof(T) == N. This qualifies [basic.life]/1, since if an object is not created, its lifetime cannot begin even though appropriate storage has been obtained.

I can't find the conversation where this was discussed, but the gist was that we need to preclude any possibility that merely obtaining storage (i.e. malloc) causes a panoply of trivial objects to flash into existence at that memory location, while still allowing the use of that storage as an lvalue (i.e. `*(int*)malloc(sizeof(int)) = 42`).
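For code that would rather not depend on how that question is settled, one option is to create the object explicitly in the malloc'd storage with placement new, which is one of the object-creating constructs. A minimal sketch, with store_via_malloc being a hypothetical name:

```cpp
#include <cstdlib>  // std::malloc, std::free
#include <new>      // placement new

// Hypothetical helper: obtain raw storage, create an int in it explicitly,
// read it back, and release the storage.
inline int store_via_malloc()
{
    void* raw = std::malloc(sizeof(int));
    if (raw == nullptr) {
        return -1;  // allocation failed; -1 is an arbitrary sentinel for this sketch
    }

    int* p = new (raw) int(42);  // a new-expression creates the int object
    int result = *p;

    std::free(raw);  // int is trivially destructible, so no destructor call is needed
    return result;
}
```

This sidesteps the `*(int*)malloc(sizeof(int)) = 42` form rather than answering whether it is valid.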

Viacheslav Usov

Aug 26, 2016, 12:49:58 PM8/26/16

to ISO C++ Standard - Future Proposals

On Fri, Aug 26, 2016 at 4:30 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:

> Until you've read P0137R1 (and, unless you're smarter than me, reread it multiple times) what I have to say here will be surprising, so please bear that in mind. Per [intro.object]/1 and the associated drafting note, an array of N unsigned char is *not* created when you obtain storage for an object type T, sizeof(T) == N. This qualifies [basic.life]/1, since if an object is not created, its lifetime cannot begin even though appropriate storage has been obtained.

The current standard says there "An object is a region of storage", so creating an object means "creating a region of storage", it does not mean creating a "type", which suggests that the type is subject to interpretation of the storage. The interpretation is reinforced by [expr.reinterpret.cast]/11, which says: "A glvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result refers to the same object as the source glvalue, but with the specified type. [...] No temporary is created, no copy is made, and constructors (12.1) or conversion functions (12.3) are not called."

Observe that with the current definition "object is storage", there is no contradiction, because we can meaningfully speak of "same object, different type". This is why pac in my previous messages exists (in the current standard) even though it was not created as an object of an array type.
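The reference cast described by [expr.reinterpret.cast]/11 can be sketched as below. The function name is hypothetical, and the sketch only shows that the reference form and the pointer form designate the same address; it takes no position on which "object" that address belongs to.

```cpp
#include <cstdint>  // std::uint32_t

// Hypothetical helper: compare the reference cast with the pointer cast.
inline bool same_address_through_reference()
{
    std::uint32_t word = 0x01020304u;

    // Cast of a glvalue to "reference to unsigned char" ...
    unsigned char& first = reinterpret_cast<unsigned char&>(word);
    // ... and the corresponding pointer conversion.
    unsigned char* also = reinterpret_cast<unsigned char*>(&word);

    return &first == also;  // both designate the first byte of `word`
}
```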

P0137R1 changes that to "An object occupies a region of storage...", and it is not clear what "creates" means. We do not know what an object is. That's a pretty big defect, I'd say. This creates a problem with [expr.reinterpret.cast]/11 cited above, because it is impossible to say what it means and whether it really means anything.

Note finally that in C the notion of the object is consistent with the current C++:

3.15

1 object

region of data storage in the execution environment, the contents of which can represent values

2 NOTE When referenced, an object may be interpreted as having a particular type; see 6.3.2.1.

(end)

It says very clearly that the type is an interpretation (I wish we had the same clarity in C++). Given that standard-layout types are for compatibility with C and other languages, I do not see how P0137R1 could change that.

I would say P0137R1 is half-baked at this point, and we cannot use it to justify anything.

Cheers,

V.

D. B.

Aug 26, 2016, 12:57:21 PM8/26/16

to std-pr...@isocpp.org

I'll simply say thank you to both of you for continuing to ponder and debate this! It makes for intriguing reading about something I'm interested in, and raises some interesting questions about definitions and concepts in the Standard. Perhaps those questions can even lead to clearer Standardese some day, one way or another...

For now, I'll just take some time to process the last few messages and whatever you write next... at least until/if I can think of any useful input. :-)

Cheers!

Michał Dominiak

Aug 26, 2016, 1:28:11 PM8/26/16

to std-pr...@isocpp.org

On Fri, Aug 26, 2016 at 6:49 PM Viacheslav Usov <via....@gmail.com> wrote:

On Fri, Aug 26, 2016 at 4:30 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:

> Until you've read P0137R1 (and, unless you're smarter than me, reread it multiple times) what I have to say here will be surprising, so please bear that in mind. Per [intro.object]/1 and the associated drafting note, an array of N unsigned char is *not* created when you obtain storage for an object type T, sizeof(T) == N. This qualifies [basic.life]/1, since if an object is not created, its lifetime cannot begin even though appropriate storage has been obtained.

The current standard says there "An object is a region of storage", so creating an object means "creating a region of storage", it does not mean creating a "type", which suggests that the type is subject to interpretation of the storage. The interpretation is reinforced by [expr.reinterpret.cast]/11, which says: "A glvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result refers to the same object as the source glvalue, but with the specified type. [...] No temporary is created, no copy is made, and constructors (12.1) or conversion functions (12.3) are not called."

Observe that with the current definition "object is storage", there is no contradiction, because we can meaningfully speak of "same object, different type". This is why pac in my previous messages exists (in the current standard) even though it was not created as an object of an array type.

P0137R1 changes that to "An object occupies a region of storage...", and it is not clear what "creates" means. We do not know what an object is. That's a pretty big defect, I'd say. This creates a problem with [expr.reinterpret.cast]/11 cited above, because it is impossible to say what it means and whether it really means anything.

No, the change is (and this entire thing is the definition, not just the sentence that was there and is now in the middle of it):

An object is created by a definition (3.1), by a new-expression (5.3.4), when implicitly changing the active member of a union ([class.union]), or when a temporary object is created ([conv.rval], [class.temporary]). An object occupies a region of storage in its period of construction (12.7 class.cdtor), throughout its lifetime (3.8 basic.life), and in its period of destruction (12.7 class.cdtor).

The intent here is to describe what an object is, instead of giving a straight definition, because I honestly can't say how to word this in a way that'd give you a straight definition and not a description of what are the properties of an object.

Furthermore, it clearly states that objects are created (that'd be the word using the English definition of it, I imagine) by a number of language constructs that are listed there, which defines what "create an object" means (it means "use one of those language constructs").

Also I don't believe there's any problem with the wording w.r.t. [expr.reinterpret.cast]/11. That paragraph keeps meaning the exact same thing as before. The only thing that changes is when the object comes into existence and when it stops being an object.

Note finally that in C the notion of the object is consistent with the current C++:

3.15

1 object
region of data storage in the execution environment, the contents of which can represent values

2 NOTE When referenced, an object may be interpreted as having a particular type; see 6.3.2.1.

(end)

It says very clearly that the type is an interpretation (I wish we had the same clarity in C++). Given that standard-layout types are for compatibility with C and other languages, I do not see how P0137R1 could change that.

I do not believe there's any problem with that. The C++ standard will specify how it works on the C++ side; other sides are unaffected, though someone more versed in how a program with two different abstract machines is supposed to work would be better placed to say (...and this would be the CWG, which has been reviewing this paper for what I believe is the last 3 meetings).

I would say P0137R1 is half-baked at this point, and we cannot use it to justify anything.

Whether you consider it half-baked or not, it's a part of the working draft now, it's in the CD ballot, and then unless a NB comment removes it, it'll be in the next ballot and then in the international standard ISO 14882:2017 (...assuming we arrive on time...), so this is exactly the point where we start using it to explain how the language is going to work since C++17.

Cheers,
V.

--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/CAA7YVg0uSONt%2BMOoa4p933S-0H_vAvZTro00_ZHnr4ugZrsV4A%40mail.gmail.com.

Myriachan

Aug 26, 2016, 4:58:54 PM8/26/16

to ISO C++ Standard - Future Proposals

On Friday, August 26, 2016 at 6:18:01 AM UTC-7, Edward Catmur wrote:

On Fri, Aug 26, 2016 at 1:42 PM, Viacheslav Usov <via....@gmail.com> wrote:
On Sat, Aug 20, 2016 at 11:26 AM, D. B. <db0...@gmail.com> wrote:

> I get all the points about different types of pointers, memory regions, etc. - which are valid counterarguments for totally arbitrary subtractions. But I don't grasp why such caveats should disallow (in fact: don't logically allow) arithmetic within the same (trivially copyable/standard layout) object. Such an object must occupy a single, known region of storage. Just as is true for arrays.

Let me try again.

(0) [intro.memory]/1 "Every byte has a unique address."

(1) [intro.object]/5 "An object of trivially copyable or standard-layout type (3.9) shall occupy contiguous bytes of storage."

(2) [basic.compound]/3 "If an object of type T is located at an address A, a pointer of type cv T* whose value is the address A is said to point to that object, regardless of how the value was obtained."

Not since P0137R1. But I'll assume that you're either working within C++14 as written (although no implementation conforms to that as written) or are inserting calls to std::launder where required.

P0137R1 horribly breaks the language in ways that will bite people later.

Melissa

Michał Dominiak

Aug 26, 2016, 5:24:40 PM8/26/16

to std-pr...@isocpp.org

Well then, please do write a paper about it, submit it to a mailing and get a NB to consider it as a NB comment for the current ballot. I'm assuming you're talking as someone having some examples of how it'll bite us, and not just saying so because of some sort of intuition? :)


Viacheslav Usov

Aug 27, 2016, 7:42:49 AM8/27/16

to ISO C++ Standard - Future Proposals

On Fri, Aug 26, 2016 at 7:27 PM, Michał Dominiak <gri...@griwes.info> wrote:

> The intent here is to describe what an object is, instead of giving a straight definition, because I honestly can't say how to word this in a way that'd give you a straight definition and not a description of what are the properties of an object.

And, as I said, it fails to deliver on that intent. With the current "an object is a region of storage", we can tell that it is essentially the same definition as in C. We can infer that "type" is just an interpretation, which can be changed without affecting other interpretations. With P0137R1, that is not clear. I do not think you said explicitly whether "type is just an interpretation, which can be changed without affecting other interpretations" is what you think we will have post-P0137R1, but it is evident just from this thread that there are people who think otherwise.

> Also I don't believe there's any problem with the wording w.r.t. [expr.reinterpret.cast]/11. That paragraph keeps meaning the exact same thing as before. The only thing that changes is when the object comes into existence and when it stops being an object.

I hope that's what the authors of P0137R1 really have in mind. I hope that we will still have objects as merely a shorthand for "a region of storage", which we can interpret as any data type without having to "create" and "destroy" anything when the interpretation changes. Unfortunately, the message that P0137R1 sends to me (and other commenters) is different.

> I do not believe there's any problem with that. The C++ standard will specify how it works on the C++ side

If the eventual outcome of P0137R1 will be compatible (for standard-layout types) with "object is a region of storage" and "type is just an interpretation, which can be changed without affecting other interpretations", then this is all moot. If not, then we will have a major problem.

> Whether you consider it half-baked or not, it's a part of the working draft now, it's in the CD ballot, and then unless a NB comment removes it, it'll be in the next ballot and then in the international standard ISO 14882:2017 (...assuming we arrive on time...), so this is exactly the point where we start using it to explain how the language is going to work since C++17.

Oh, I am pretty sure we can end up with a defective international standard. We have seen that happen many, many times.

Cheers,

V.

Michał Dominiak

Aug 27, 2016, 8:30:28 AM

to std-pr...@isocpp.org

On Sat, Aug 27, 2016 at 1:42 PM Viacheslav Usov <via....@gmail.com> wrote:

> On Fri, Aug 26, 2016 at 7:27 PM, Michał Dominiak <gri...@griwes.info> wrote:

>> The intent here is to describe what an object is, instead of giving a straight definition, because I honestly can't say how to word this in a way that'd give you a straight definition and not a description of what are the properties of an object.

> And, as I said, it fails to deliver on that intent. With the current "an object is a region of storage", we can tell that it is essentially the same definition as in C. We can infer that "type" is just an interpretation, which can be changed without affecting other interpretations. With P0137R1, that is not clear. I do not think you said explicitly whether "type is just an interpretation, which can be changed without affecting other interpretations" is what you think we will have post-P0137R1, but it is evident just from this thread that there are people who think otherwise.

>> Also I don't believe there's any problem with the wording w.r.t. [expr.reinterpret.cast]/11. That paragraph keeps meaning the exact same thing as before. The only thing that changes is when the object comes into existence and when it stops being an object.

> I hope that's what the authors of P0137R1 really have in mind. I hope that we will still have objects as merely a shorthand for "a region of storage", which we can interpret as any data type without having to "create" and "destroy" anything when the interpretation changes. Unfortunately, the message that P0137R1 sends to me (and other commenters) is different.

The standard says this (and it has not been touched by P0137R1):

An object has a type ([basic.types]). The term object type refers to the type with which the object is created.

And immediately after (still not touched by P0137R1):

Some objects are polymorphic ([class.virtual]); the implementation generates information associated with each such object that makes it possible to determine that object's type during program execution. For other objects, the interpretation of the values found therein is determined by the type of the expressions (Clause [expr]) used to access them.

Please help me understand your paragraph starting with "I hope" in the context of this quote ([intro.object]/1). Seriously, this and your statements seem not to compute.

The way I'm reading this, P0137R1 doesn't change anything w.r.t. rules of interpreting the object types or their values, and essentially only redefines the lifetime of an object to start when it's created and end when it's destroyed, instead of being bound to the lifetime of its storage. I don't see any of the problems you're talking about after this change.

To reiterate: as far as I understand this, the changes in P0137R1 essentially make the object-storage association slightly weaker; namely, instead of object=storage, we now have "object must have storage; storage may have an object".
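
To make the "object must have storage; storage may have an object" reading concrete, here is a minimal sketch. The `Widget` type and the `create_in_storage` function are hypothetical names invented for illustration; the sketch only assumes that placement new starts an object's lifetime in existing storage and that an explicit destructor call ends it while the storage lives on.

```cpp
#include <cassert>
#include <new>

// Hypothetical type, used only for this illustration.
struct Widget {
    int value;
    explicit Widget(int v) : value(v) {}
};

// Creates a Widget in storage that already exists, reads it, destroys it.
int create_in_storage(int v) {
    // Storage exists from here on, but it does not yet hold a Widget.
    alignas(Widget) unsigned char storage[sizeof(Widget)];

    // Placement new creates the Widget; its lifetime starts here.
    Widget* w = new (storage) Widget(v);
    int result = w->value;

    // The explicit destructor call ends the Widget's lifetime.
    // The storage itself remains until the function returns.
    w->~Widget();
    return result;
}
```

Calling `create_in_storage(42)` yields 42. The same pattern (storage that may or may not currently hold an object) is what types like `std::optional` are built on.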

Let's wait for someone from CWG to confirm this. (I'm somewhat tempted to CC Richard on this thread...).

>> I do not believe there's any problem with that. The C++ standard will specify how it works on the C++ side

> If the eventual outcome of P0137R1 will be compatible (for standard-layout types) with "object is a region of storage" and "type is just an interpretation, which can be changed without affecting other interpretations", then this is all moot. If not, then we will have a major problem.

>> Whether you consider it half-baked or not, it's a part of the working draft now, it's in the CD ballot, and then unless a NB comment removes it, it'll be in the next ballot and then in the international standard ISO 14882:2017 (...assuming we arrive on time...), so this is exactly the point where we start using it to explain how the language is going to work since C++17.

> Oh, I am pretty sure we can end up with a defective international standard. We have seen that happen many, many times.

As I said before: do you believe it's a defect? Do write a paper. If anybody on the committee considers it a defect too, there will be discussion, possibly NB comments, and possibly the lack of the defect in the standard.

Nothing will come from just writing on a mailing list.

Just keep in mind that, as far as I understand, removing this also means that `std::optional` and `std::variant` don't actually work for all the types you want them to work with.

> Cheers,
> V.


Viacheslav Usov

Aug 27, 2016, 10:09:49 AM

to ISO C++ Standard - Future Proposals

On Sat, Aug 27, 2016 at 2:30 PM, Michał Dominiak <gri...@griwes.info> wrote:

> Please help me understand your paragraph starting with "I hope" in the context of this quote ([intro.object]/1). Seriously, this and your statements seem not to compute.

I do not see why. The language you quoted says, with more details and nuances, that "type is just an interpretation, which can be changed without affecting other interpretations".

> The way I'm reading this, P0137R1 doesn't change anything w.r.t. rules of interpreting the object types or their values, and essentially only redefines the lifetime of an object to start when it's created and end when it's destroyed, instead of being bound to the lifetime of its storage. I don't see any of the problems you're talking about after this change.

We can have imaginary problems, too. Here is what Edward wrote: "Per [intro.object]/1 and the associated drafting note, an array of N unsigned char is *not* created when you obtain storage for an object type T, sizeof(T) == N." I then read the new language, as suggested, many times over, and it may be that I was trying just too hard. At some point I felt that, because in the new definition an object is not storage, some additional "create" event is required to reinterpret the storage. What would you say to this?

It may be that Edward says something that cannot really be said with the current or future standard terms, which is why we end up with those problems. Specifically, he says "array of N unsigned char is *not* created". But, indeed, it is not an array that is created, it is an object that is (or occupies) a region of storage. And being an array (or not) is still just an interpretation of that storage.

Cheers,

V.

Michał Dominiak

Aug 27, 2016, 10:26:38 AM

to std-pr...@isocpp.org

On Sat, Aug 27, 2016 at 4:09 PM Viacheslav Usov <via....@gmail.com> wrote:

> On Sat, Aug 27, 2016 at 2:30 PM, Michał Dominiak <gri...@griwes.info> wrote:

>> Please help me understand your paragraph starting with "I hope" in the context of this quote ([intro.object]/1). Seriously, this and your statements seem not to compute.

> I do not see why. The language you quoted says, with more details and nuances, that "type is just an interpretation, which can be changed without affecting other interpretations".

>> The way I'm reading this, P0137R1 doesn't change anything w.r.t. rules of interpreting the object types or their values, and essentially only redefines the lifetime of an object to start when it's created and end when it's destroyed, instead of being bound to the lifetime of its storage. I don't see any of the problems you're talking about after this change.

> We can have imaginary problems, too. Here is what Edward wrote: "Per [intro.object]/1 and the associated drafting note, an array of N unsigned char is *not* created when you obtain storage for an object type T, sizeof(T) == N." I then read the new language, as suggested, many times over, and it may be that I was trying just too hard. At some point I felt that, because in the new definition an object is not storage, some additional "create" event is required to reinterpret the storage. What would you say to this?

I say "there's literally no reason to feel that". Please quote the parts of the new working draft that together lead you to that conclusion. So far there's only "a region of storage is not an object until it's created", and reinterpreting the object just needs a live object. Nowhere in the normative text is there anything to imply that reinterpreting an object requires another "create" event. Nowhere. (Specifically because the text for reinterpreting objects does not change.)

> It may be that Edward says something that cannot really be said with the current or future standard terms, which is why we end up with those problems. Specifically, he says "array of N unsigned char is *not* created". But, indeed, it is not an array that is created, it is an object that is (or occupies) a region of storage. And being an array (or not) is still just an interpretation of that storage.

The object that is being created is an object of the type that is created, not of other type (and if you feel like reading a tautology while reading this sentence, that feeling is correct, because it is a tautology). Obviously a char array isn't created. But this does not change the rule about using `char *` or `unsigned char *` obtained by reinterpreting the created object's address to examine the representation of the object.

Now, the thing that is somewhat problematic with P0137R1 is that before it, you could reinterpret a `char` array as a POD object without creating it; now, it's either 1) you cannot do that, because there was never any object at that storage in the first place, or 2) you can do that, because there was a char array that you've read data into, and you're just reinterpreting that as a POD. Frankly I'm not sure what the correct interpretation in this case is; could someone lend me their brain for a second to clear this up?
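
Whichever of the two interpretations is correct, copying the bytes into a real object sidesteps the question, because the destination is an object of the right type in either reading. A minimal sketch, assuming a trivially copyable type; `Header`, `read_header` and `round_trips` are hypothetical names made up for this example:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical POD type, used only for this illustration.
struct Header {
    std::uint32_t id;
    std::uint32_t length;
};

// Copies the bytes of a buffer into a Header object instead of
// reinterpreting the buffer in place as a Header.
Header read_header(const unsigned char* bytes) {
    Header h;
    std::memcpy(&h, bytes, sizeof h);
    return h;
}

// Round trip: write a Header into a byte array, then read it back.
bool round_trips(std::uint32_t id, std::uint32_t length) {
    Header in{id, length};
    unsigned char buffer[sizeof(Header)];
    std::memcpy(buffer, &in, sizeof in);
    Header out = read_header(buffer);
    return out.id == id && out.length == length;
}
```

This does not answer whether the in-place reinterpretation is valid; it only shows the route that does not depend on the answer.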

> Cheers,
> V.


Viacheslav Usov

Aug 27, 2016, 11:42:22 AM

to ISO C++ Standard - Future Proposals

On Sat, Aug 27, 2016 at 4:26 PM, Michał Dominiak <gri...@griwes.info> wrote:

> Now, the thing that is somewhat problematic with P0137R1 is that before it, you could reinterpret a `char` array as a POD object without creating it; now, it's either 1) you cannot do that, because there was never any object at that storage in the first place, or 2) you can do that, because there was a char array that you've read data into, and you're just reinterpreting that as a POD. Frankly I'm not sure what the correct interpretation in this case is; could someone lend me their brain for a second to clear this up?

I'd say you have the same problem that I "felt", it's just phrased differently. There is something in the new language that makes you feel this kind of uncertainty.

You say you can end up with a POD object without creating it. There is an object, so it was created, but with a different nominal object type. But the nominal type is intrinsically significant only during construction/initialisation/destruction; during the object's lifetime it is the type of the expressions used to access it that is significant (neglecting the polymorphic twist here). So even if you did not create a POD object, there is an object that you can access as if it were POD. Which is what your interpretation (2) says, I believe.

Cheers,

V.

Kazutoshi Satoda

Aug 28, 2016, 6:34:49 AM

to std-pr...@isocpp.org

On 2016/08/18 17:58 +0900, D. B. wrote:
> Anyhow, TL;DR, the questions:
>
> * Is this well-defined? Or am I relying on UB here?

I think you are relying on UB, unfortunately.

> * Can this be made well-defined?
...
> So there might be some momentum already for specifically defining this.
> I hope so! (Otherwise I'll have to radically rethink my design, as this
> would be the only piece of formal UB in my current project)

FYI, there is CWG issue #1701.
"Array vs sequence in object representation"
http://wg21.cmeerw.net/cwg/issue1701

Probably, we need a formal proposal (which shows what should be portable
and why) to proceed with the issue.

--
k_satoda

D. B.

Aug 28, 2016, 6:59:20 AM

to std-pr...@isocpp.org

On Sun, Aug 28, 2016 at 11:34 AM, Kazutoshi Satoda <k_sa...@f2.dion.ne.jp> wrote:

>> * Can this be made well-defined?
> ...
>> So there might be some momentum already for specifically defining this.
>> I hope so! (Otherwise I'll have to radically rethink my design, as this
>> would be the only piece of formal UB in my current project)

> FYI, there is CWG issue #1701.
> "Array vs sequence in object representation"
> http://wg21.cmeerw.net/cwg/issue1701

Excellent, I hadn't found that - thank you. Here's an alternative link: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1701

This definitional distinction between array and sequence is the crux of my question. To me, if the object representation is a sequence of unsigned char, and an instance can be copied into and back from an array thereof and retain its value - there absolutely must be the same count, layout, and relative positioning of unsigned char within both the copied-to/-from array and the object's own nebulous sequence... so why not just define them as equivalent?

Calling the object representation an array of unsigned char would define the pointer arithmetic I want - namely that given a trivially copyable (or ideally standard-layout) object, we (and especially it) should be able to reason about the byte offsets of arbitrary members wrt each other or its base address - while (albeit I'm not an expert) seemingly not breaking anything else.
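
For the narrower goal of getting a member's byte offset from the base address, `offsetof` is already specified for standard-layout classes and needs no pointer subtraction at all. A minimal sketch; the `Packet` type and the three accessor functions are hypothetical names, not anything from the project discussed here:

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical type made only of unsigned char members.
struct Packet {
    unsigned char tag;
    unsigned char flags;
    unsigned char payload[6];
};

// offsetof is only unconditionally supported for standard-layout classes.
static_assert(std::is_standard_layout<Packet>::value,
              "Packet must be standard-layout to use offsetof portably");

// Byte offsets of each member from the start of a Packet.
std::size_t tag_offset()     { return offsetof(Packet, tag); }
std::size_t flags_offset()   { return offsetof(Packet, flags); }
std::size_t payload_offset() { return offsetof(Packet, payload); }
```

This only covers offsets that are known per type at compile time. It does not help when the offset has to be computed from two arbitrary byte pointers at run time, which is the case the question is really about.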

In fact - only strengthening other cases that I can see. In point: Although memcpy was not the motivating factor for me asking this, it is something I quickly identified as being an issue, unless we hand-wave it away with 'well, the implementation can do whatever it wants'. That to me is not a full answer, although it might serve as a superficial one.

> Probably, we need a formal proposal (which shows what should be portable
> and why) to proceed with the issue.

Yeah. I might try to start writing one later... although I worry that more experienced users would do a better job, so I kinda hope they can chime in. :-)

There's also the issue of trying to get my motivating case down to a small enough sample to (A) be useful and not just confusing, and (B) not basically end up constituting the source of half of my current project! What I have is a side-effect of larger architectural choices that I still think are really clever, but which mean I'm working with objects made of UCs, not array of UCs. Refactoring it around the latter just to satisfy 'UB by omission' in the former case is not something I want to do if I can avoid it... when all the other parts of the Standard make it clear that any supported CPU already has the machinery to do what I need - and, of course, my current one happily does exactly what I expected, even at -O3.

So again, maybe other users are better equipped at present to show convincing use-cases, at least until I can distill and generalise only the pertinent aspects behind my question/preferred behaviour.

However, I would certainly want to contribute what I can, so I'll try to get together a sufficiently small and useful looking sample.

D. B.

Aug 30, 2016, 8:32:55 AM

to std-pr...@isocpp.org

On Sat, Aug 27, 2016 at 1:30 PM, Michał Dominiak <gri...@griwes.info> wrote:

> Let's wait for someone from CWG to confirm this. (I'm somewhat tempted to CC Richard on this thread...).

Feel free! It sure would be great if he could swing by, as the Editor. So if you happen to have his address handy... ;-)

I notice, for whatever little it might be worth, cppreference implies pointer arithmetic into the object representation as if it is an array is OK; in this example, they manually manipulate a byte that forms part of the object but not value representation, as a way to demonstrate the concept of the latter. I have no idea whether this is kosher, though.
http://en.cppreference.com/w/cpp/language/object#Object_representation_and_value_representation

...but it does indicate, at least, that whoever wrote that thinks what I originally thought, that treating the object representation "sequence of unsigned char[sizeof(T)]" as an array was OK.

The unclear relationship xor distinction between sequence and array here is the fundamental question for the CWG, as it would answer my more specific points by association. And as Kazutoshi Satoda mentioned, there's already an issue in drafting...
http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1701

...so maybe we can get somewhere with it.

I still can't tell whether P0137R1 really clarifies anything here... (i.e. whether placement newing into such an array, which would then have a parallel lifetime, entitles me to do said arithmetic within the limitations of the current vague sequence/array wording)

Kazutoshi Satoda

Aug 30, 2016, 2:00:44 PM

to std-pr...@isocpp.org

On 2016/08/28 19:59 +0900, D. B. wrote:
> On Sun, Aug 28, 2016 at 11:34 AM, Kazutoshi Satoda <k_sa...@f2.dion.ne.jp> wrote:

...

>> FYI, there is CWG issue #1701.
>> "Array vs sequence in object representation"
>> http://wg21.cmeerw.net/cwg/issue1701

...
> This definitional distinction between *array* and *sequence* is the crux of
> my question. To me, if the object representation is a *sequence* of *unsigned
> char*, and an instance can be copied into and back from an *array* thereof
> and retain its value - there absolutely must be the same count, layout, and
> relative positioning of *unsigned char* within both the copied-to/-from
> *array* and the object's own nebulous *sequence*... so why not just define
> them as equivalent?

That will affect the validity of some currently possible optimizations.
(Though I don't know whether this is what CWG is worried about.)

Here is a small example.

void f0(int* p);

struct X {
    int a;
    int b;
};

int f1(void)
{
    X x;
    x.a = 128;
    f0(&x.b);
    return x.a;
}

If there is no valid way to obtain a pointer to x.a from &x.b, a
compiler can transform the last "return x.a" into "return 128" to
eliminate a load from memory.

But if the existence of x also implies the existence of an array object
of type unsigned char [sizeof(x)] at the address of x (and there are
intuitive pointer-interconvertible rules on the array elements), that
means f0() can do something like the following:

#include <cstddef>  // offsetof

void f0(int* p)
{
    // I know p must point to X::b here.
    X* px = (X*)((unsigned char*)p - offsetof(X, b));
    ++px->a;
}

Then the above optimization will no longer be valid.

--
k_satoda

D. B.

Aug 30, 2016, 3:04:44 PM

to std-pr...@isocpp.org

Well, yeah, but isn't that illegal because of aliasing though, rather than pointer arithmetic specifically?

Anyway, the objects I work with contain only unsigned char and classes wrapping the same, so there's not an aliasing issue there: the object literally is just a bunch of unsigned char in a sequence, with no padding or etc, but officially I'm not allowed to perform offset calculations between different members as if they were an array.... rather than just a "bunch". :/

Kazutoshi Satoda

Aug 30, 2016, 3:57:45 PM

to std-pr...@isocpp.org

On 2016/08/31 4:04 +0900, D. B. wrote:
> Well, yeah, but isn't that illegal because of aliasing though, rather than
> pointer arithmetic specifically?

I don't see an aliasing problem in f0(). The access to px->a is an access
to an object of type int, through an lvalue of type int. If I'm missing
something, please explain which access has the problem.

--
k_satoda

D. B.

Aug 30, 2016, 4:10:49 PM

to std-pr...@isocpp.org

Maybe aliasing isn't the right word here, but I thought what your example does to form the pointer is already UB or implementation-defined at best... due to the cast from the weakest aligned char type to an object type that might be misaligned, and the fact the compiler has no evidence of there being a valid int at the adjusted position.

Whatever the many problems probably are, certainly it's horrid code and not something I would ever write! ;-) To be clear, I don't want the 'object as array of char' guarantee so that I can do offset acrobatics to other types... not at all. I'm not trying to define that given int a, b, c; then one should be able to say a[2] and get c.

All I want to do is calculate offsets between objects that are already unsigned char, in contiguous sequences, but which formally I can't because they weren't declared as an array of unsigned char.

Kazutoshi Satoda

Aug 30, 2016, 4:37:24 PM

to std-pr...@isocpp.org

On 2016/08/31 5:10 +0900, D. B. wrote:
> Maybe aliasing isn't the right word here, but I thought what your example
> does to form the pointer is already UB or implementation-defined at best...
> due to the cast from the weakest aligned char type to an object type that
> might be misaligned, and the fact the compiler has no evidence of there
> being a valid int at the adjusted position.

I don't see any alignment problem, provided that f0() is called with a
pointer argument which actually points to an X::b, and that offsetof() does
the right job.

> All I want to do is calculate offsets between objects that are already
> unsigned char, in contiguous *sequences*, but which formally I can't
> because they weren't declared as an *array* of unsigned char.

Sorry but, I got lost. How do you have "sequences" (what?) which are
not declared as an array? Probably some example code (which includes
how the objects are created, and how offsets are calculated) will help.
Sorry if it is already shown.

--
k_satoda

Edward Catmur

Aug 31, 2016, 6:41:35 AM

to std-pr...@isocpp.org

On Tue, Aug 30, 2016 at 9:10 PM, D. B. <db0...@gmail.com> wrote:

> Maybe aliasing isn't the right word here, but I thought what your example does to form the pointer is already UB or implementation-defined at best... due to the cast from the weakest aligned char type to an object type that might be misaligned, and the fact the compiler has no evidence of there being a valid int at the adjusted position.

But it might not be misaligned, and there might be a valid int at that position, so (especially if the programmer uses std::launder) the compiler has to assume that the access could be valid. This unfortunately is one of those cases where the benefit of doubt falls on the side of the programmer.

> Whatever the many problems probably are, certainly it's horrid code and not something I would ever write! ;-) To be clear, I don't want the 'object as array of char' guarantee so that I can do offset acrobatics to other types... not at all. I'm not trying to define that given int a, b, c; then one should be able to say a[2] and get c.

Of course what you want is perfectly legitimate! The problem is that defining storage 'array of char' provides more than you need, and enough for some other programmer to go off and do tricky things that break the balancing act between programmer and implementor.

> All I want to do is calculate offsets between objects that are already unsigned char, in contiguous sequences, but which formally I can't because they weren't declared as an array of unsigned char.

I think we agreed that you can; you just have to use linear scan or binary search - it's only direct subtraction that is barred to you.
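
A minimal sketch of the linear-scan variant, for readers who want to see what it looks like. `Record`, `offset_by_scan` and `scanned_offset_of_c` are hypothetical names introduced for this example. Note the assumption baked in: it relies on single-step increments and equality comparison of `unsigned char` pointers being allowed while traversing the object representation, which is the reading argued for in this thread rather than something settled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical type, used only for this illustration.
struct Record {
    unsigned char a;
    unsigned char b;
    unsigned char c[4];
};

// Counts the single-byte steps from the start of `obj` to `member`.
// Returns sizeof(T) if `member` is not found inside the object.
template <typename T>
std::size_t offset_by_scan(const T& obj, const void* member) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&obj);
    const unsigned char* target = static_cast<const unsigned char*>(member);
    std::size_t n = 0;
    // Only ++ and != are used; no pointer subtraction anywhere.
    while (n < sizeof(T) && p != target) {
        ++p;
        ++n;
    }
    return n;
}

// Example use on a local object: offset of Record::c found by scanning.
std::size_t scanned_offset_of_c() {
    Record r{};
    return offset_by_scan(r, r.c);
}
```

The cost is linear in the size of the object, which is exactly why being forced into this instead of a plain subtraction looks odd.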

Viacheslav Usov

Aug 31, 2016, 7:04:58 AM

to ISO C++ Standard - Future Proposals

On Wed, Aug 31, 2016 at 12:41 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:

> I think we agreed that you can; you just have to use linear scan or binary search - it's only direct subtraction that is barred to you.

Does that mean you now disagree with the analysis I presented earlier even in the case of strict C++14 conformance? Care to explain why?

Cheers,

V.

Edward Catmur

Aug 31, 2016, 9:16:36 AM

to std-pr...@isocpp.org

Sorry, I can't find that analysis. At one point you wrote:

> What I think you can do is take the address of the complete object, cast it to pointer to array of unsigned chars, get the address of the first element of that array, then increment the resultant pointer until it becomes equal to the address of the member of interest (that is cast to pointer to unsigned char).

I agree with this analysis, except that I don't believe that you can "cast [the address of the complete object] to pointer to array of unsigned chars"; I believe that you can cast a pointer to the complete object (with type pointer to the type of the complete object) to type pointer to unsigned char and get a pointer (to the complete object) that can be used to traverse the sequence of unsigned char that constitutes the object's storage. This after all is what memcpy is required to do.

Viacheslav Usov

Aug 31, 2016, 11:05:12 AM

to ISO C++ Standard - Future Proposals

On Wed, Aug 31, 2016 at 3:16 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:

> Sorry, I can't find that analysis.

https://groups.google.com/a/isocpp.org/d/msg/std-proposals/HaQO5Q_uiKg/6pgDHdjXEAAJ

> I don't believe that you can "cast [the address of the complete object] to pointer to array of unsigned chars"

Such a cast is essential to the analysis I referenced. Why do you not believe it is possible?

Cheers,

V.

Edward Catmur

Aug 31, 2016, 11:28:44 AM

to std-pr...@isocpp.org

On Wed, Aug 31, 2016 at 4:05 PM, Viacheslav Usov <via....@gmail.com> wrote:

> On Wed, Aug 31, 2016 at 3:16 PM, 'Edward Catmur' via ISO C++ Standard - Future Proposals <std-pr...@isocpp.org> wrote:

>> Sorry, I can't find that analysis.

> https://groups.google.com/a/isocpp.org/d/msg/std-proposals/HaQO5Q_uiKg/6pgDHdjXEAAJ

Oh, sorry, I understand what you're saying now. You're saying that a C++14-conformant implementation is obliged to regard the storage as containing (as well as the object) an array of unsigned char completely overlaying the object and its storage, and so pointer arithmetic is legitimate?

>> I don't believe that you can "cast [the address of the complete object] to pointer to array of unsigned chars"

> Such a cast is essential to the analysis I referenced. Why do you not believe it is possible?

I agree that if C++14 were to be implemented strictly in isolation such an implementation would have to behave as you have described. However I also believe that such is not intended, and that P0137R1 is clarifying (a defect in) the wording.

Chris Hallock

Aug 31, 2016, 1:21:26 PM

to ISO C++ Standard - Future Proposals

C++ allows accessing an object as a sequence of unsigned char, but (apparently?) not an array of unsigned char. If that's true, then it follows that you can't use pointer arithmetic on an unsigned char * to obtain offsets into the object.

This can be worked around on platforms with flat memory models by emulating pointer arithmetic yourself:
1) reinterpret_cast the pointer to a large-enough integral type
2) offset the integer
3) reinterpret_cast back to unsigned char *

But this is awkward and implementation-dependent (the casts are implementation-defined). It would be better to just straight up allow pointer arithmetic on an unsigned char * within the object representation of an object.
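
The three steps above can be sketched as follows. `Record`, `advance_bytes`, `byte_distance` and `emulation_matches_offsetof` are hypothetical names for this example, and the whole thing assumes a flat memory model in which the integer value of a pointer is its byte address; the standard only says the conversions are implementation-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical type, used only for this illustration.
struct Record {
    unsigned char a;
    unsigned char b;
    unsigned char c[4];
};

// Emulates p + n without doing pointer arithmetic on p.
unsigned char* advance_bytes(unsigned char* p, std::ptrdiff_t n) {
    std::uintptr_t address = reinterpret_cast<std::uintptr_t>(p);  // step 1
    address += static_cast<std::uintptr_t>(n);                     // step 2
    return reinterpret_cast<unsigned char*>(address);              // step 3
}

// Emulates member - base by subtracting the integer addresses.
std::ptrdiff_t byte_distance(const void* base, const void* member) {
    std::uintptr_t b = reinterpret_cast<std::uintptr_t>(base);
    std::uintptr_t m = reinterpret_cast<std::uintptr_t>(member);
    return static_cast<std::ptrdiff_t>(m - b);
}

// Checks the emulation against offsetof on a local object.
bool emulation_matches_offsetof() {
    Record r{};
    unsigned char* base = reinterpret_cast<unsigned char*>(&r);
    std::ptrdiff_t d = byte_distance(base, r.c);
    return d == static_cast<std::ptrdiff_t>(offsetof(Record, c))
        && advance_bytes(base, d) == r.c;
}
```

On mainstream flat-address platforms `emulation_matches_offsetof()` is expected to return true, but nothing in the standard promises that, which is the awkwardness being pointed out.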

D. B.

Aug 31, 2016, 1:27:29 PM

to std-pr...@isocpp.org

Right, and my question - and, previously unknown to me, that of issue 1701 - boils down to (A) what precisely is the distinction between sequence and array and (B) can it be reduced/removed so that such arithmetic would be OK.

I don't want to introduce pessimisations or exploitations in the process, but by the same token, if the Standard really says my only defined way to do this is to take the 2 pointers and increment the base one until it's equal to the offset one, then that is... amusing at best! We know that for trivially copyable types, the other allowances in the standard say such arithmetic can only have 1 possible result - and that's the one that compilers implement - and yet it's not codified.

Viacheslav Usov

Aug 31, 2016, 3:05:07 PM

to ISO C++ Standard - Future Proposals

On Wed, Aug 31, 2016 at 7:21 PM, Chris Hallock <christoph...@gmail.com> wrote:

> C++ allows accessing an object as a sequence of unsigned char, but (apparently?) not an array of unsigned char. If that's true, then it follows that you can't use pointer arithmetic on an unsigned char * to obtain offsets into the object.

No, that does not follow. We are not accessing the object at all.

Cheers,

V.

Chris Hallock

Sep 1, 2016, 1:59:23 PM

to ISO C++ Standard - Future Proposals

>> C++ allows accessing an object as a sequence of unsigned char, but (apparently?) not an array of unsigned char. If that's true, then it follows that you can't use pointer arithmetic on an unsigned char * to obtain offsets into the object.

> No, that does not follow. We are not accessing the object at all.

True; s/accessing/treating/. To clarify, I'm referring to the definition of adding or subtracting an integer from a pointer, 5.7/4 (emphasis mine): "[...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined." In other words, pointer arithmetic (with integer operands greater than 1) can only be done on array objects.
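
For contrast, this is the case that the quoted wording clearly does cover: arithmetic and subtraction on pointers into one declared array, including the one-past-the-end position. A minimal sketch; `distance_within_array` is a hypothetical name for this example.

```cpp
#include <cassert>
#include <cstddef>

// Pointer arithmetic on elements of a single declared array object.
std::ptrdiff_t distance_within_array() {
    unsigned char bytes[8] = {};
    unsigned char* first = bytes;      // points to bytes[0]
    unsigned char* fifth = bytes + 4;  // points to bytes[4]
    unsigned char* end = bytes + 8;    // one past the last element: allowed
    (void)end;                         // formed but never dereferenced
    return fifth - first;              // both operands are in the same array
}
```

The open question in this thread is whether the same operations are equally well-defined when the eight bytes are the object representation of a class object rather than a declared array.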

Viacheslav Usov

Sep 2, 2016, 7:50:51 AM

to ISO C++ Standard - Future Proposals

On Thu, Sep 1, 2016 at 7:59 PM, Chris Hallock <christoph...@gmail.com> wrote:

> True; s/accessing/treating/. To clarify, I'm referring to the definition of adding or subtracting an integer from a pointer, 5.7/4 (emphasis mine): "[...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined." In other words, pointer arithmetic (with integer operands greater than 1) can only be done on array objects.

[expr.reinterpret.cast]/11 "A glvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result refers to the same object as the source glvalue, but with the specified type".

Cheers,

V.

Chris Hallock

Sep 2, 2016, 1:30:57 PM

to ISO C++ Standard - Future Proposals

>> True; s/accessing/treating/. To clarify, I'm referring to the definition of adding or subtracting an integer from a pointer, 5.7/4 (emphasis mine): "[...] If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined." In other words, pointer arithmetic (with integer operands greater than 1) can only be done on array objects.

> [expr.reinterpret.cast]/11 "A glvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result refers to the same object as the source glvalue, but with the specified type".

Indeed, "the result refers to the same object". The object's type is unchanged. The "but with the specified type" clause refers to the reinterpret_cast expression. reinterpret_cast does not change the object's type, because an object's type is immutable. reinterpret_cast<ArrayType&>(classObject) still refers to a class object, not an array object, and it must be so because no array object at that location exists.
