A pointer to type X can point to an object of type Y (after P0137R1)

Kazutoshi Satoda

Aug 28, 2016, 5:58:56 AM

to std-pr...@isocpp.org

Given this example code:

struct X {...};
struct Y {...};
X oX;
Y oY;
X* pX = reinterpret_cast<X*>(&oY);

P0137R1 (http://wg21.link/P0137R1) introduces the possibility that
"pX points to oY" after a reinterpret_cast like the one above. A similar
interpretation came up in a recent discussion, and the author of
P0137R1 agreed with it.
https://groups.google.com/a/isocpp.org/d/msg/std-discussion/XYvVlTc3-to/B9wdNcYzBAAJ
On 2016/08/18 3:44 +0900, Kazutoshi Satoda wrote:
> The result of reinterpret_cast<Base*>(&d), which is the value of pb, is
> defined as static_cast<Base*>(static_cast<void*>(&d)) (in 5.2.10 p7),
> and the both static_cast are defined as "the pointer value is unchanged
> by the conversion" (4.11 p2, 5.2.9 p13) then the result points to d as
> &d does.
https://groups.google.com/a/isocpp.org/d/msg/std-discussion/XYvVlTc3-to/N5R0qoU7BAAJ
On 2016/08/18 6:06 +0900, Richard Smith wrote:
> The Base and Derived objects are not pointer-interconvertible, because
> Derived is not a standard-layout class type. Therefore pb is a pointer
> of type Base* that points to the d object, not to its Base subobject.
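
The cast equivalence discussed in the two quoted messages can be sketched as follows. Base and Derived here are hypothetical stand-ins (the quoted messages do not show their definitions); the only property assumed is that Derived is not a standard-layout class. The sketch compares pointer values and never dereferences the result.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical types, not taken from the thread. Derived is not
// standard-layout because both it and its base declare data members.
struct Base { int b; };
struct Derived : Base { int d; };

static_assert(!std::is_standard_layout<Derived>::value,
              "this sketch assumes a non-standard-layout Derived");

// Returns true when reinterpret_cast<Base*> gives the same pointer value
// as the two-step static_cast through void* that 5.2.10 p7 defines it as.
bool reinterpret_matches_two_step_cast(Derived& d) {
    Base* pb = reinterpret_cast<Base*>(&d);
    Base* via_void = static_cast<Base*>(static_cast<void*>(&d));
    return pb == via_void;  // compares pointer values only, no dereference
}

void demo_cast_equivalence() {
    Derived d{};
    assert(reinterpret_matches_two_step_cast(d));
}
```

The comparison says nothing about which object pb "points to" in the abstract machine; that is exactly the question the thread is about.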

The previous definition of "point to" in C++14 was:
> ... If an object of type T is located at an address A, a pointer of
> type cv T* whose value is the address A is said to point to that
> object, regardless of how the value was obtained. ...
Under that definition, a pointer to type X could point only to an
object of type X.

I think the situation "pX points to oY" causes some problems:

- The semantics of "pX + 1"

5.7 [expr.add] p4 says:
> When an expression that has integral type is added to or subtracted
> from a pointer, the result has the type of the pointer operand. If
> the expression P points to element x[i] of an array object x with n
> elements, the expressions P + J and J + P (where J has the value j)
> point to the (possibly-hypothetical) element
> x[i + j] if 0 <= i + j <= n; otherwise, the behavior is undefined.
> Likewise, the expression P - J points to the (possibly-hypothetical)
> element x[i − j] if 0 <= i − j <= n; otherwise, the behavior is
> undefined.

To satisfy this definition, pX + 1 must produce a pointer to type X
whose value is past the end of oY (note that a non-array object is
considered to be an array of 1 element here).
Producing such a value requires sizeof(Y), which compilers can't know
in general unless the type information is tracked at runtime.
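
A minimal sketch of the mismatch, using integer arithmetic on addresses so that no pointer arithmetic on pX is actually performed. The member layouts of X and Y are hypothetical (the original elides them), and the sketch assumes an implementation where std::uintptr_t exists and the pointer-to-integer mapping is a flat byte address, which is implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical definitions chosen only so that the sizes differ.
struct X { char c; };
struct Y { char c[64]; };

static_assert(sizeof(X) != sizeof(Y), "the sketch needs differing sizes");

// Address a compiler would compute for pX + 1 from the static type alone.
std::uintptr_t step_by_static_type(const Y& oY) {
    return reinterpret_cast<std::uintptr_t>(&oY) + sizeof(X);
}

// Address just past the end of oY, which the quoted wording would require
// if pX is taken to point to oY as element 0 of a one-element array.
std::uintptr_t past_end_of_object(const Y& oY) {
    return reinterpret_cast<std::uintptr_t>(&oY) + sizeof(Y);
}

void demo_pointer_step() {
    Y oY{};
    assert(step_by_static_type(oY) != past_end_of_object(oY));
}
```

The two results differ by sizeof(Y) - sizeof(X), and only the first can be computed from the type of pX.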

Proposed resolution (edits by <ins> and <del>):
When an expression that has integral type is added to or subtracted
from a pointer <ins>to T</ins>, the result has the type of the
pointer operand. If the expression P points to element x[i] of an
array object x <del>with n elements</del><ins>of type T2[n] where
T2 is similar to T</ins>, ...

With this resolution, pX + 1 has undefined behavior.

- The semantics of "pX->...", along with "*pX"

5.2.5 [expr.ref] p2 says:
> For the first option (dot) the first expression shall have complete
> class type. For the second option (arrow) the first expression shall
> have pointer to complete class type. The expression E1->E2 is
> converted to the equivalent form (*(E1)).E2; the remainder of 5.2.5
> will address only the first option (dot). In either case, the
> id-expression shall name a member of the class or of one of its base
> classes. ...
and bullet (4.2) in p4 says:
> - If E2 is a non-static data member and the type of E1 is
> "cq1 vq1 X", and the type of E2 is "cq2 vq2 T", the expression
> designates the named member of the object designated by the first
> expression.
As for (*(E1)), 5.3.1 [expr.unary.op] p1 says:
> The unary * operator performs indirection: the expression to which
> it is applied shall be a pointer to an object type, or a pointer to
> a function type and the result is an lvalue referring to the object
> or function to which the expression points. If the type of the
> expression is "pointer to T," the type of the result is "T."

*pX is an lvalue referring to oY, while the type of the expression
is X. E2 must then name a member of X, which might not be a member
of Y, and that (for example) renders bullet (4.2) meaningless.
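
The static side of this can be sketched with unevaluated operands, so nothing is dereferenced. The members only_in_x and only_in_y and the trait has_only_in_x are hypothetical names introduced for illustration.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical definitions: the two classes share no member names.
struct X { int only_in_x; };
struct Y { double only_in_y; };

// decltype operands are unevaluated, so no indirection happens here.
// The type of *pX is determined by the pointer type alone.
static_assert(std::is_same<decltype(*std::declval<X*>()), X&>::value,
              "*pX is an lvalue of type X");
static_assert(std::is_same<decltype(std::declval<X*>()->only_in_x), int>::value,
              "pX->only_in_x is looked up in X");

// Detects whether T has a member named only_in_x.
template <class T, class = void>
struct has_only_in_x : std::false_type {};
template <class T>
struct has_only_in_x<T, decltype(void(std::declval<T&>().only_in_x))>
    : std::true_type {};

static_assert(has_only_in_x<X>::value, "X has the member");
static_assert(!has_only_in_x<Y>::value, "Y has no such member");
```

So pX->only_in_x is well-formed from the type of pX, while the object that *pX would refer to under the new wording has no member of that name.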

I think this adds another case to be handled by CWG issue #232,
"Is indirection through a null pointer undefined behavior?"
http://wg21.cmeerw.net/cwg/issue232
(which has been in "drafting" status for 10 years...)

I suspect there are more problems like these.

I'll try to submit these as NB comments. Please correct me if the
analysis is wrong. Ideas about the resolution are also welcome.

--
k_satoda

Edward Catmur

Aug 29, 2016, 5:43:53 PM

to ISO C++ Standard - Future Proposals

I think that [expr.add]/6 handles this, if a little clumsily:

> For addition or subtraction, if the expressions P or Q have type
> “pointer to cv T”, where T and the array element type are not
> similar (4.4), the behavior is undefined.

It seems apparent that the "array element type" here is the runtime type Y whereas T is X.

[expr.static.cast] handles this better for derived-base conversions (p2 and p11); the use of "actually" makes it clear that the runtime type is what matters.
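
As a rough illustration of the "actually" wording in [expr.static.cast] p11, with hypothetical Base and Derived types that are not from the thread: the downcast below is defined only because the Base object really is a subobject of a Derived object.

```cpp
#include <cassert>

// Hypothetical types for illustration.
struct Base { int b; };
struct Derived : Base { int d; };

// Defined: pb points to a Base that is actually a subobject of a Derived,
// so the static_cast yields a pointer to the enclosing Derived object.
Derived* downcast_actual_derived(Derived& d) {
    Base* pb = &d;                     // implicit derived-to-base conversion
    return static_cast<Derived*>(pb);  // round-trips to &d
}

// By contrast, applying the same static_cast to the address of a
// standalone Base object would have undefined behavior, because that
// Base is not actually a subobject of any Derived.

void demo_downcast() {
    Derived d{};
    assert(downcast_actual_derived(d) == &d);
}
```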

> - The semantics of "pX->...", along with "*pX"
>
> 5.2.5 [expr.ref] p2 says:
>> For the first option (dot) the first expression shall have complete
>> class type. For the second option (arrow) the first expression shall
>> have pointer to complete class type. The expression E1->E2 is
>> converted to the equivalent form (*(E1)).E2; the remainder of 5.2.5
>> will address only the first option (dot). In either case, the
>> id-expression shall name a member of the class or of one of its base
>> classes. ...
> and bullet (4.2) in p4 says:
>> - If E2 is a non-static data member and the type of E1 is
>> "cq1 vq1 X", and the type of E2 is "cq2 vq2 T", the expression
>> designates the named member of the object designated by the first
>> expression.
> As for (*(E1)), 5.3.1 [expr.unary.op] p1 says:
>> The unary * operator performs indirection: the expression to which
>> it is applied shall be a pointer to an object type, or a pointer to
>> a function type and the result is an lvalue referring to the object
>> or function to which the expression points. If the type of the
>> expression is "pointer to T," the type of the result is "T."
>
> *pX is an lvalue referring to oY, while the type of the expression
> is X. E2 must then name a member of X, which might not be a member
> of Y, and that (for example) renders bullet (4.2) meaningless.

Agreed, that's pretty nasty.

> I think this adds another case to be handled by CWG issue #232,
> "Is indirection through a null pointer undefined behavior?"
> http://wg21.cmeerw.net/cwg/issue232
> (which has been in "drafting" status for 10 years...)
>
> I suspect there are more problems like these.

[conv.lval]/1 would seem to be another one, also following [expr.unary.op]/1.

> I'll try to submit these as NB comments. Please correct me if the
> analysis is wrong. Ideas about the resolution are also welcome.

Tightening [expr.unary.op]/1 would probably deal with the majority of cases, but might be viewed as too stringent (it's the old chestnut of whether &*expr is equivalent to expr...) as with CWG 232.

Thanks for your work on this.

Kazutoshi Satoda

Aug 30, 2016, 1:04:54 PM

to std-pr...@isocpp.org

On 2016/08/30 6:43 +0900, Edward Catmur wrote:
> On Sunday, 28 August 2016 10:58:56 UTC+1, Kazutoshi SATODA wrote:

...

>> I think the situation "pX points to oY" causes some problems:
>>
>> - The semantics of "pX + 1"

>> Proposed resolution (edits by <ins> and <del>):
>> When an expression that has integral type is added to or subtracted
>> from a pointer <ins>to T</ins>, the result has the type of the
>> pointer operand. If the expression P points to element x[i] of an
>> array object x <del>with n elements</del><ins>of type T2[n] where
>> T2 is similar to T</ins>, ...
>>
>> The resolution makes pX + 1 cause undefined behavior.
>
> I think that [expr.add]/6 handles this, if a little clumsily:
>
>> For addition or subtraction, if the expressions P or Q have type “pointer
>> to cv T”, where T and the array
>> element type are not similar (4.4), the behavior is undefined.
>
> It seems apparent that the "array element type" here is the runtime type Y
> whereas T is X.

Aha, I missed that paragraph, which was added by CWG #1504.
http://wg21.cmeerw.net/cwg/issue1504
I'm happy to withdraw this one. Thank you very much.

>> - The semantics of "pX->...", along with "*pX"

...

> Tightening [expr.unary.op]/1 would probably deal with the majority of
> cases, but might be viewed as too stringent (it's the old chestnut of
> whether &*expr is equivalent to expr...) as with CWG 232.

As for &*expr, I wonder whether simply borrowing the C wording would be
feasible. It says (from WG14 N1570 6.5.3.2 p3):
> The unary & operator yields the address of its operand. If the operand
> has type "type", the result has type "pointer to type". If the operand
> is the result of a unary * operator, neither that operator nor the
> & operator is evaluated and the result is as if both were omitted,
> except that the constraints on the operators still apply and the result
> is not an lvalue.
... with some additional words to exclude overloaded cases, of course.
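
Why overloaded operators would need to be excluded can be sketched with a hypothetical Handle class (not from the thread) that overloads unary *. For a built-in pointer, &*p has the same type as p, so treating both operators as omitted is at least type-preserving; for Handle it is not.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical class with an overloaded unary *.
struct Handle {
    int* target;
    int& operator*() const { return *target; }
};

// Built-in case: &*p has type int*, the same type as p itself.
// (decltype operands are unevaluated, so nothing is dereferenced.)
static_assert(std::is_same<decltype(&*std::declval<int*>()), int*>::value,
              "built-in &*p has the type of p");

// Overloaded case: &*h has type int*, not Handle, so "as if both were
// omitted" cannot apply to it.
static_assert(std::is_same<decltype(&*std::declval<Handle&>()), int*>::value,
              "&*h goes through Handle::operator*");
static_assert(!std::is_same<decltype(&*std::declval<Handle&>()), Handle>::value,
              "&*h is not equivalent to h");
```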

--
k_satoda
