offsetof for non-standard-layout classes


Andrey Semashev

Oct 12, 2016, 10:18:28 AM
To: std-dis...@isocpp.org
Hi,

[support.types.layout]/1 in N4606 says that "Use of the offsetof macro
with a type other than a standard-layout class (Clause 9) is
conditionally-supported." What exactly does "conditionally-supported"
mean here? I can see that the relevant paragraph in N4140 said the
result is undefined in this case.

Furthermore, is there a reason why offsetof is not defined for all
classes without virtual inheritance? For example:

struct A
{
    int a;
};

struct B : A
{
    int b;
};

I can't see in this example why offsetof(B, b) cannot be fully defined.

Thiago Macieira

Oct 12, 2016, 11:02:51 AM
To: std-dis...@isocpp.org
On Wednesday, October 12, 2016, at 17:18:25 CEST, Andrey Semashev
wrote:
> Furthermore, is there a reason why offsetof is not defined for all
> classes without virtual inheritance? For example:
>
> struct A
> {
> int a;
> };
>
> struct B : A

Assuming you meant struct B : virtual A here, otherwise B is standard-layout.

> {
> int b;
> };
>
> I can't see in this example why offsetof(B, b) cannot be fully defined.

There was a discussion a while ago that this is fully defined, but it is
implementation-defined and depends on the ABI.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Andrey Semashev

Oct 12, 2016, 11:13:33 AM
To: std-dis...@isocpp.org
On 10/12/16 18:02, Thiago Macieira wrote:
> On Wednesday, October 12, 2016, at 17:18:25 CEST, Andrey Semashev
> wrote:
>> Furthermore, is there a reason why offsetof is not defined for all
>> classes without virtual inheritance? For example:
>>
>> struct A
>> {
>> int a;
>> };
>>
>> struct B : A
>
> Assuming you meant struct B : virtual A here,

No, no virtual here.

> otherwise B is standard-layout.

Is it? I thought [class]/7 required that all non-static data members of
a standard-layout class be defined in a single class - either A or B in
this example.

Anyway, here's another example:

struct C
{
    virtual ~C();
};

struct D : C
{
    int d;
};

Despite the virtual functions, the binary layout of D is still known
to the compiler, and there seems to be no reason why offsetof(D, d)
wouldn't work.

>> {
>> int b;
>> };
>>
>> I can't see in this example why offsetof(B, b) cannot be fully defined.
>
> There was a discussion a while ago that this is fully defined, but it is
> implementation-defined and depends on the ABI.

I would guess that the particular offsets are specific to the ABI, but
the general definition of offsetof seems quite portable to me. Why not
just spell that in the standard?

Thiago Macieira

Oct 12, 2016, 1:48:57 PM
To: std-dis...@isocpp.org
On Wednesday, October 12, 2016, at 18:13:30 CEST, Andrey Semashev
wrote:
> struct C
> {
> virtual ~C();
> };
>
> struct D : C
> {
> int d;
> };
>
> Despite the virtual functions, the binary layout of D is still known
> to the compiler, and there seems to be no reason why offsetof(D, d)
> wouldn't work.

Not really. It's possible for an ABI to have a variable start position of D's
contents and store extra information in hidden sections of C or D.

Andrey Semashev

Oct 12, 2016, 2:13:53 PM
To: std-dis...@isocpp.org
On 10/12/16 20:47, Thiago Macieira wrote:
> On Wednesday, October 12, 2016, at 18:13:30 CEST, Andrey Semashev
> wrote:
>> struct C
>> {
>> virtual ~C();
>> };
>>
>> struct D : C
>> {
>> int d;
>> };
>>
>> Despite the virtual functions, the binary layout of D is still known
>> to the compiler, and there seems to be no reason why offsetof(D, d)
>> wouldn't work.
>
> Not really. It's possible for an ABI to have a variable start position of D's
> contents and store extra information in hidden sections of C or D.

That would seem like a very strange and inefficient ABI to me, but ok, I
guess.

Does this apply to my initial example with A and B as well?

Thiago Macieira

Oct 12, 2016, 2:23:50 PM
To: std-dis...@isocpp.org
On Wednesday, October 12, 2016, at 21:13:50 CEST, Andrey Semashev
wrote:
> Does this apply to my initial example with A and B as well?

If the standard says that B isn't standard-layout, then the ABI is allowed to
perform the same trick.

I just think that the definition of what is stdlayout changed recently and does
catch B now.

Andrey Semashev

Oct 12, 2016, 2:26:28 PM
To: std-dis...@isocpp.org
On 10/12/16 21:23, Thiago Macieira wrote:
> On Wednesday, October 12, 2016, at 21:13:50 CEST, Andrey Semashev
> wrote:
>> Does this apply to my initial example with A and B as well?
>
> If the standard says that B isn't standard-layout, then the ABI is allowed to
> perform the same trick.
>
> I just think that the definition of what is stdlayout changed recently and does
> catch B now.

Could you point me to a proposal or a place in the standard draft about
this change?

Thiago Macieira

Oct 12, 2016, 2:33:04 PM
To: std-dis...@isocpp.org
On Wednesday, October 12, 2016, at 21:26:25 CEST, Andrey Semashev
wrote:
> Could you point me to a proposal or a place in the standard draft about
> this change?

It was discussed this week in the thread "Standard-layout class" on this
mailing list.

Nicol Bolas

Oct 12, 2016, 2:59:45 PM
To: ISO C++ Standard - Discussion


On Wednesday, October 12, 2016 at 2:33:04 PM UTC-4, Thiago Macieira wrote:
> On Wednesday, October 12, 2016, at 21:26:25 CEST, Andrey Semashev
> wrote:
> > Could you point me to a proposal or a place in the standard draft about
> > this change?
>
> It was discussed this week in the thread "Standard-layout class" on this
> mailing list.

You may have misinterpreted that discussion. [class]/7.6 says:

> has all non-static data members and bit-fields in the class and its base classes *first declared in the same class*

Emphasis added. `B` does not have all its non-static data members "first declared in the same class". `B::b` is first declared in `B`, while `B::a` is first declared in `A`.

Andrey Semashev

Oct 12, 2016, 3:01:01 PM
To: std-dis...@isocpp.org
On 10/12/16 21:32, Thiago Macieira wrote:
> On Wednesday, October 12, 2016, at 21:26:25 CEST, Andrey Semashev
> wrote:
>> Could you point me to a proposal or a place in the standard draft about
>> this change?
>
> It was discussed this week in the thread "Standard-layout class" on this
> mailing list.

Umm, I don't think that discussion addresses my example. In my case A
and B don't derive from the same base class multiple times, which was
discussed in that thread.

Myriachan

Oct 12, 2016, 3:27:40 PM
To: ISO C++ Standard - Discussion

I had a thread some months ago where I showed in a sort of "proof" that even with the current Standard, non-standard-layout trivially-copyable types must in concept be compatible with offsetof().  If trivially-copyable types weren't compatible (in concept) with offsetof(), you could reach a contradiction with other parts of the Standard.  (Essentially, the fact that trivially-copyable types can be memcpy'd to an unsigned char array forces an implementation to use a fixed offset for each element.)

Personally, I think offsetof() should be allowed for any non-reference members or members' members... that are not within a virtual base class.  Whether you get undefined behavior when *using* that offset is a separate issue to me.

Convincing the Committee to change the rules is another matter entirely, though.

Melissa

Andrey Semashev

Oct 12, 2016, 4:15:22 PM
To: std-dis...@isocpp.org
On 10/12/16 22:27, Myriachan wrote:
> On Wednesday, October 12, 2016 at 12:01:01 PM UTC-7, Andrey Semashev wrote:
>
> On 10/12/16 21:32, Thiago Macieira wrote:
> > On Wednesday, October 12, 2016, at 21:26:25 CEST, Andrey Semashev
> > wrote:
> >> Could you point me to a proposal or a place in the standard draft about
> >> this change?
> >
> > It was discussed this week in the thread "Standard-layout class" on this
> > mailing list.
>
> Umm, I don't think that discussion addresses my example. In my case A
> and B don't derive from the same base class multiple times, which was
> discussed in that thread.
>
> I had a thread some months ago where I showed in a sort of "proof" that
> even with the current Standard, non-standard-layout trivially-copyable
> types must in concept be compatible with offsetof(). If
> trivially-copyable types weren't compatible (in concept) with
> offsetof(), you could reach a contradiction with other parts of the
> Standard. (Essentially, the fact that trivially-copyable types can be
> memcpy'd to an unsigned char array forces an implementation to use a
> fixed offset for each element.)

Come to think of it, yes, trivial copyability makes a lot of sense as
the criterion for offsetof support. I think having the standard say
this would make offsetof much more useful.

> Personally, I think offsetof() should be allowed for any non-reference
> members or members' members... that are not within a virtual base
> class. Whether you get undefined behavior when *using* that offset is a
> separate issue to me.

In practice, I think so as well. Even in the presence of virtual
functions I don't think any sane implementation would add hidden offset
fields like Thiago suggested. But I can accept this limitation.

> Convincing the Committee to change the rules is another matter entirely,
> though.

Interesting. Was there a discussion? Any active resistance to the idea?

I'd like this issue to be resolved but I'm not sure if this can be
considered a defect or a full proposal is needed. The problem is that in
the latter case I'll be unable to present the proposal to the committee.

Thiago Macieira

Oct 12, 2016, 4:33:54 PM
To: std-dis...@isocpp.org
On Wednesday, October 12, 2016, at 23:15:18 CEST, Andrey Semashev
wrote:
> > Personally, I think offsetof() should be allowed for any non-reference
> > members or members' members... that are not within a virtual base
> > class. Whether you get undefined behavior when *using* that offset is a
> > separate issue to me.
>
> In practice, I think so as well. Even in the presence of virtual
> functions I don't think any sane implementation would add hidden offset
> fields like Thiago suggested. But I can accept this limitation.

Virtual inheritance implies that the class is not trivially copyable, so we
can't use that criterion.

Also, we may be able to get offsetof(B, b) below for the current set of ABIs I
know of, but not for offsetof(B, a):

struct A { int a; };
struct B : virtual A
{
    int b;
};

But given that the class is not trivially copyable, the memcpy reason does
not apply -- you simply can't memcpy it. The compiler-generated copy
constructor B::B may be modifying the hidden members when a copy happens. So
it's possible that offsetof(B, b) may not be knowable either.

Andrey Semashev

Oct 12, 2016, 4:46:32 PM
To: std-dis...@isocpp.org
On 10/12/16 23:33, Thiago Macieira wrote:
> On Wednesday, October 12, 2016, at 23:15:18 CEST, Andrey Semashev
> wrote:
>>> Personally, I think offsetof() should be allowed for any non-reference
>>> members or members' members... that are not within a virtual base
>>> class. Whether you get undefined behavior when *using* that offset is a
>>> separate issue to me.
>>
>> In practice, I think so as well. Even in the presence of virtual
>> functions I don't think any sane implementation would add hidden offset
>> fields like Thiago suggested. But I can accept this limitation.
>
> Virtual inheritance implies that the class is not trivially copyable, so we
> can't use that criterion.

Although I would expect offsetof(B, b) for your example to give
meaningful results, I can accept that the implementation might do
something fancy just like when virtual functions are present. Having
offsetof support just trivially copyable types would already be a big
step forward.

Andrey Semashev

Oct 12, 2016, 5:11:22 PM
To: std-dis...@isocpp.org
Although on the other hand, presence of a user-defined copy constructor
is not the proper reason to prohibit offsetof - the constructor does not
affect binary layout at all.

I propose the following set of restrictions for a type X that is
supported by offsetof(X, m):

- X must not have virtual base classes
- X must not have virtual functions
- m must not identify a reference member or a bitfield
- if m identifies a nested member, the above restrictions apply to all
  types and members involved in identifying the final member

For example:

#include <cstddef>
#include <string>

int g = 5;

struct A
{
    int a;
};

struct AA
{
    int aa;
};

struct C : virtual A
{
    int c;
};

struct B : A, AA
{
    int b;
    int bitfield : 10;

    struct Nested
    {
        std::string s;
        int& x = g;
    }
    n;

    C c;
};

struct D : A
{
    int d;

    virtual ~D();
};

// well-defined (under the proposed rules):
offsetof(A, a);
offsetof(B, a);
offsetof(B, aa);
offsetof(B, b);
offsetof(B, n);
offsetof(B, n.s);
offsetof(B, c);

// ill-formed (under the proposed rules):
offsetof(B, bitfield);
offsetof(B, n.x);
offsetof(B, c.a);
offsetof(B, c.c);
offsetof(C, a);
offsetof(C, c);
offsetof(D, a);
offsetof(D, d);

Since offsetof is a constant expression I would prefer the program to be
ill-formed if those conditions are violated. The current behavior is
undefined, which I guess is for compatibility with C. If compatibility
is considered too important, I guess, we could leave the behavior undefined.

Myriachan

Oct 12, 2016, 6:58:28 PM
To: ISO C++ Standard - Discussion
On Wednesday, October 12, 2016 at 2:11:22 PM UTC-7, Andrey Semashev wrote:

> Since offsetof is a constant expression I would prefer the program to be
> ill-formed if those conditions are violated. The current behavior is
> undefined, which I guess is for compatibility with C. If compatibility
> is considered too important, I guess, we could leave the behavior undefined.


I think that it'd be better as undefined. On the majority of current implementations--all that I can think of, in fact--offsetof() will work properly on classes that have virtual functions but not virtual base classes. Some implementations will emit a warning if you try (e.g. clang) but that's still fine.

A secondary reason for leaving the behavior undefined is that offsetof is often implemented as a simple macro that uses pointer arithmetic.  For example, the Visual C++ implementation:

#define offsetof(s,m) ((size_t)&reinterpret_cast<char const volatile&>((((s*)0)->m)))

(Such pointer arithmetic involving nullptr is undefined behavior, but if the compiler defines that as working, it's fine for compiler/library internals to use it.)

It would be complicated or impossible to have such a macro consider such an expression ill-formed.

Current C (and C++) rules consider it undefined behavior to use nested members or array elements, but barring warnings, it does work on the major implementations.

Melissa

Richard Smith

Oct 12, 2016, 7:24:00 PM
To: std-dis...@isocpp.org
On Wed, Oct 12, 2016 at 2:11 PM, Andrey Semashev <andrey....@gmail.com> wrote:
> On 10/12/16 23:46, Andrey Semashev wrote:
>> On 10/12/16 23:33, Thiago Macieira wrote:
>>> On Wednesday, October 12, 2016, at 23:15:18 CEST, Andrey Semashev
>>> wrote:
>>>>> Personally, I think offsetof() should be allowed for any non-reference
>>>>> members or members' members... that are not within a virtual base
>>>>> class. Whether you get undefined behavior when *using* that offset
>>>>> is a separate issue to me.
>>>>
>>>> In practice, I think so as well. Even in the presence of virtual
>>>> functions I don't think any sane implementation would add hidden offset
>>>> fields like Thiago suggested. But I can accept this limitation.
>>>
>>> Virtual inheritance implies that the class is not trivially copyable,
>>> so we can't use that criterion.
>>
>> Although I would expect offsetof(B, b) for your example to give
>> meaningful results, I can accept that the implementation might do
>> something fancy just like when virtual functions are present. Having
>> offsetof support just trivially copyable types would already be a big
>> step forward.
>
> Although on the other hand, presence of a user-defined copy constructor is not the proper reason to prohibit offsetof - the constructor does not affect binary layout at all.

Can you provide standard wording justifying that claim? If a type is neither standard layout nor trivially copyable, I cannot off-hand think of any rule in the standard that requires the data members to even be stored within the object. In some cases it might be reasonable for a C++ implementation to do interesting things with such types, such as putting (some of) the members in a separate garbage-collected heap, or automatically applying the pimpl idiom for ABI stability, or adding or rearranging the members to support some kind of runtime instrumentation.

> Since offsetof is a constant expression I would prefer the program to be
> ill-formed if those conditions are violated. The current behavior is
> undefined, which I guess is for compatibility with C.

That's not true. Use on non-standard-layout types is conditionally-supported, which means:
1) An implementation must document which types are supported.
2) If a type is supported, the behavior must be as documented in the standard -- you get the offset of the member.
3) If a type is not supported, the program is ill-formed.

> If compatibility is considered too important, I guess, we could leave the behavior undefined.



Richard Smith

Oct 12, 2016, 7:27:05 PM
To: std-dis...@isocpp.org
See also http://cplusplus.github.io/LWG/lwg-defects.html#2709 which changed the rule from "undefined" to "conditionally-supported" for more background.

Thiago Macieira

Oct 13, 2016, 3:57:41 AM
To: std-dis...@isocpp.org
On Thursday, October 13, 2016, at 00:11:18 CEST, Andrey Semashev
wrote:
> > Although I would expect offsetof(B, b) for your example to give
> > meaningful results, I can accept that the implementation might do
> > something fancy just like when virtual functions are present. Having
> > offsetof support just trivially copyable types would already be a big
> > step forward.
>
> Although on the other hand, presence of a user-defined copy constructor
> is not the proper reason to prohibit offsetof - the constructor does not
> affect binary layout at all.

It may, because it adds the opportunity for the compiler to insert some code
to do things differently than memcpy could. The presence or absence of trivial
copy constructors and/or destructors does influence how parameters are passed
in function calls, after all.

I don't know any ABI that does this. It would take a specially sadistic ABI
developer :-)
Sorry, I disagree. Since C is neither trivially copyable, trivially
destructible nor standard-layout, the same applies to B. Any liberties that
the compiler may have with C also apply to B because of that. B may not have a
vtable in the way the current ABIs think of it, but it may still have hidden
members because of C's presence.

That looks logical, but it does take away some liberty.

> Since offsetof is a constant expression I would prefer the program to be
> ill-formed if those conditions are violated. The current behavior is
> undefined, which I guess is for compatibility with C. If compatibility
> is considered too important, I guess, we could leave the behavior undefined.


Andrey Semashev

Oct 13, 2016, 5:29:30 AM
To: std-dis...@isocpp.org
On 10/13/16 10:57, Thiago Macieira wrote:
> On Thursday, October 13, 2016, at 00:11:18 CEST, Andrey Semashev
> wrote:
>>> Although I would expect offsetof(B, b) for your example to give
>>> meaningful results, I can accept that the implementation might do
>>> something fancy just like when virtual functions are present. Having
>>> offsetof support just trivially copyable types would already be a big
>>> step forward.
>>
>> Although on the other hand, presence of a user-defined copy constructor
>> is not the proper reason to prohibit offsetof - the constructor does not
>> affect binary layout at all.
>
> It may, because it adds the opportunity for the compiler to insert some code
> to do things differently than memcpy could.

Technically, yes, the current standard gives much freedom to the
implementation wrt. classes with constructors. And although some magic
can be added to such classes, I don't think it should affect binary layout.

> The presence or absence of trivial
> copy constructors and/or destructors does influence how parameters are passed
> in function calls, after all.

Does it? How so?

> I don't know any ABI that does this. It would take a specially sadistic ABI
> developer :-)

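Right.

> Sorry, I disagree. Since C is neither trivially copyable, trivially
> destructible nor standard-layout, the same applies to B. Any liberties that
> the compiler may have with C also apply to B because of that. B may not have a
> vtable in the way the current ABIs think of it, but it may still have hidden
> members because of C's presence.
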
I don't think that should be allowed or at all needed. All compiler
magic should be confined to the B::c member, and indeed when its member
functions are called, they only have access to that member (i.e. the
amount of storage of B that is used for c).

C is not standard-layout for multiple reasons, but the point I'm making
in the above examples is that even if a type uses compiler magic like
virtual base classes, its size and alignment are still known, and this
allows offsetof in the containing class to operate normally.

> That looks logical, but it does take away some liberty.

Liberty should bring some potential benefits, otherwise it impedes
progress in other areas for no good reason. I don't see benefits from
having a hidden offset member in a class.

Andrey Semashev

Oct 13, 2016, 5:39:50 AM
To: std-dis...@isocpp.org
On 10/13/16 01:58, Myriachan wrote:
> On Wednesday, October 12, 2016 at 2:11:22 PM UTC-7, Andrey Semashev wrote:
>
>
> Since offsetof is a constant expression I would prefer the program
> to be
> ill-formed if those conditions are violated. The current behavior is
> undefined, which I guess is for compatibility with C. If compatibility
> is considered too important, I guess, we could leave the behavior
> undefined.
>
>
> I think that it'd be better as undefined. On the majority of current
> implementations--all that I can think of, in fact--offsetof() will work
> properly on classes that have virtual functions but not virtual base
> classes. Some implementations will throw a warning if you try (e.g.
> clang) but that's still fine.

Well, my aim is to convert that "unofficially works" status to "standard
behavior", at least in the obvious cases. Because otherwise for portable
code it's as good as "doesn't work".

> A secondary reason for leaving the behavior undefined is that offsetof
> is often implemented as a simple macro that uses pointer arithmetic.
> For example, the Visual C++ implementation:
>
> #define offsetof(s,m) ((size_t)&reinterpret_cast<char const volatile&>((((s*)0)->m)))
>
> (Such pointer arithmetic involving nullptr is undefined behavior, but if
> the compiler defines that as working, it's fine for compiler/library
> internals to use it.)
>
> It would be complicated or impossible to have such a macro consider such
> an expression ill-formed.
>
> Current C (and C++) rules consider it undefined behavior to use nested
> members or array elements, but barring warnings, it does work on the
> major implementations.

Ok, I suppose, to allow the naive implementation we can leave the
behavior undefined for cases not standardized.

Andrey Semashev

Oct 13, 2016, 6:14:10 AM
To: std-dis...@isocpp.org
On 10/13/16 02:23, Richard Smith wrote:
> On Wed, Oct 12, 2016 at 2:11 PM, Andrey Semashev
> <andrey....@gmail.com <mailto:andrey....@gmail.com>> wrote:
>
> Although on the other hand, presence of a user-defined copy
> constructor is not the proper reason to prohibit offsetof - the
> constructor does not affect binary layout at all.
>
> Can you provide standard wording justifying that claim? If a type is
> neither standard layout nor trivially copyable, I cannot off-hand think
> of any rule in the standard that requires the data members to even be
> stored within the object. In some cases it might be reasonable for a C++
> implementation to do interesting things with such types, such as putting
> (some of) the members in a separate garbage-collected heap, or
> automatically applying the pimpl idiom for ABI stability, or adding or
> rearranging the members to support some kind of runtime instrumentation.

Off the top of my head I can remember the standard requires non-static
data members to have increasing addresses within the class and the same
access mode region. Strictly speaking, that does not preclude the
compiler from doing things you describe but it surely adds some
restrictions. (BTW: garbage collection is not allowed because the
members have to be destroyed in a particular order on the object
destruction.)

> Since offsetof is a constant expression I would prefer the program
> to be ill-formed if those conditions are violated. The current
> behavior is undefined, which I guess is for compatibility with C.
>
> That's not true. Use on non-standard-layout types is
> conditionally-supported, which means:
> 1) An implementation must document which types are supported.
> 2) If a type is supported, the behavior must be as documented in the
> standard -- you get the offset of the member.
> 3) If a type is not supported, the program is ill-formed.

In that case I don't see how "conditionally-supported" differs from
"implementation-specific" for non-standard-layout types. Imprecise
wording perhaps?

Anyway, that doesn't really change my point. I think the standard should
define offsetof for more types than it currently does.

Andrey Semashev

Oct 13, 2016, 6:18:36 AM
To: std-dis...@isocpp.org
On 10/13/16 02:27, Richard Smith wrote:
>
> See also http://cplusplus.github.io/LWG/lwg-defects.html#2709 which
> changed the rule from "undefined" to "conditionally-supported" for more
> background.

Thanks for the link. The DR even mentions that the behavior should be
"implementation-defined" rather than "undefined", but somehow
"conditionally-supported" got into the final wording.

Thiago Macieira

Oct 13, 2016, 7:28:04 AM
To: std-dis...@isocpp.org, Andrey Semashev
On Thursday, October 13, 2016, at 12:29:26 CEST, Andrey Semashev
wrote:
> > The presence or absence of trivial
> > copy constructors and/or destructors does influence how parameters are
> > passed in function calls, after all.
>
> Does it? How so?

It changes whether it's passed in registers or by implicit reference.

Thiago Macieira

Oct 13, 2016, 7:30:55 AM
To: std-dis...@isocpp.org
On Thursday, October 13, 2016, at 13:14:06 CEST, Andrey Semashev
wrote:
> Anyway, that doesn't really change my point. I think the standard should
> define offsetof for more types than it currently does.

We haven't discussed why yet. In which cases is this going to be useful?

Andrey Semashev

Oct 13, 2016, 8:31:12 AM
To: std-dis...@isocpp.org
On 10/13/16 14:30, Thiago Macieira wrote:
> On Thursday, October 13, 2016, at 13:14:06 CEST, Andrey Semashev
> wrote:
>> Anyway, that doesn't really change my point. I think the standard should
>> define offsetof for more types than it currently does.
>
> We haven't discussed why yet. In which cases is this going to be useful?

My initial example with A and B is a simplified version of a real world
code that I have and want to use offsetof with it. I'll reproduce the
example here, for convenience:

struct A
{
    int a;
};

struct B : A
{
    int b;
};

Now imagine there are multiple Bs, all deriving from A. All these types
are trivially copyable. I want to pass instances of Bs between processes
on the local machine, which I achieve by memcpying them to/from shared
memory.

Occasionally I want to only access one member of these structs (say,
B::a), and I don't want to copy the whole struct from the shared memory.
That's where offsetof comes into play. If I know the offset and the size
of the member in the given struct, I can extract it from the byte
sequence representing that struct.

For that purpose having offsetof support trivially copyable types would
be enough, but we all understand that most implementations go beyond that.

How about the following approach. Make offsetof support certain types
optionally and indicate the support with a feature macro. For an
offsetof(X, m) expression, the possible X types would be divided into
the following groups:

1. Baseline: trivially-copyable types. These types will be supported by
any offsetof implementation (i.e. this relaxes the current restriction
on only the standard-layout types). No feature test macro needed.

2. Non-trivially-copyable types without virtual functions or base
classes. If offsetof supports these types, define
__cpp_offsetof_non_trivially_copyable to a value greater than 0.

3. Classes with virtual functions but not virtual base classes. If
supported, define __cpp_offsetof_virtual_funcs to a value greater than 0.

4. Classes with virtual base classes but not virtual functions. If the
implementation supports offsetof(X, m) where m does not identify a
member of a virtual base, define __cpp_offsetof_virtual_base to 1. If
the implementation supports m to be a member in a virtual base, define
__cpp_offsetof_virtual_base to 2.

5. Classes with both virtual base classes and functions. Similar to 4,
only the feature test macro is __cpp_offsetof_virtuals.

Also, the restrictions on the member identifier that I proposed earlier
still apply:

- m must not identify a reference member or a bitfield
- if m identifies a nested member, the effective category of X is the
max category of every type involved in identifying the final member
- also, if m identifies a nested member, it must not involve
dereferencing pointers or references

Other than the above, the behavior of offsetof is implementation-defined.

A note about virtual base classes. I believe the commonly used
implementation of virtual base classes allows offsetof to work correctly
in compile time provided that the compiler knows the final static type
of the member. This is a typical case when the data member is stored by
value.

Richard Smith

Oct 13, 2016, 1:43:41 PM
To: std-dis...@isocpp.org
On 13 Oct 2016 3:14 am, "Andrey Semashev" <andrey....@gmail.com> wrote:
> On 10/13/16 02:23, Richard Smith wrote:
>> On Wed, Oct 12, 2016 at 2:11 PM, Andrey Semashev
>> <andrey....@gmail.com <mailto:andrey.semashev@gmail.com>> wrote:
>>
>>>     Although on the other hand, presence of a user-defined copy
>>>     constructor is not the proper reason to prohibit offsetof - the
>>>     constructor does not affect binary layout at all.
>>
>> Can you provide standard wording justifying that claim? If a type is
>> neither standard layout nor trivially copyable, I cannot off-hand think
>> of any rule in the standard that requires the data members to even be
>> stored within the object. In some cases it might be reasonable for a C++
>> implementation to do interesting things with such types, such as putting
>> (some of) the members in a separate garbage-collected heap, or
>> automatically applying the pimpl idiom for ABI stability, or adding or
>> rearranging the members to support some kind of runtime instrumentation.
>
> Off the top of my head I can remember the standard requires non-static data members to have increasing addresses within the class and the same access mode region. Strictly speaking, that does not preclude the compiler from doing things you describe but it surely adds some restrictions. (BTW: garbage collection is not allowed because the members have to be destroyed in a particular order on the object destruction.)

The point of putting members on a GC heap is to allow them to be found as GC roots, not necessarily to allow them to be collected.

>>>     Since offsetof is a constant expression I would prefer the program
>>>     to be ill-formed if those conditions are violated. The current
>>>     behavior is undefined, which I guess is for compatibility with C.
>>
>> That's not true. Use on non-standard-layout types is
>> conditionally-supported, which means:
>> 1) An implementation must document which types are supported.
>> 2) If a type is supported, the behavior must be as documented in the
>> standard -- you get the offset of the member.
>> 3) If a type is not supported, the program is ill-formed.
>
> In that case I don't see how "conditionally-supported" differs from "implementation-specific" for non-standard-layout types. Imprecise wording perhaps?

Then you should look up the definition of "conditionally-supported" in the standard :)

> Anyway, that doesn't really change my point. I think the standard should define offsetof for more types than it currently does.

I think it would make sense to also require it to be guaranteed to be available for trivially copyable types.
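For standard-layout types, offsetof is a constant expression. Code that wants to stay inside the guaranteed subset can therefore state that assumption and check its layout at compile time. The sketch below assumes C++11 and the optional `std::uint32_t`/`std::uint16_t` typedefs. `Header` and `header_prefix` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical fixed-format header, used only for illustration.
struct Header {
    std::uint32_t magic;
    std::uint16_t version;
    std::uint16_t flags;
};

// Keep offsetof uses inside the subset every implementation must support.
static_assert(std::is_standard_layout<Header>::value,
              "offsetof is only guaranteed for standard-layout classes");

// Because offsetof is a constant expression, layout assumptions can be
// checked at compile time instead of being discovered at run time.
static_assert(offsetof(Header, magic) == 0, "first member starts the object");
static_assert(offsetof(Header, version) < offsetof(Header, flags),
              "members with the same access have increasing offsets");

// A constant expression can also be used as an array bound.
unsigned char header_prefix[offsetof(Header, flags)];
```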

Myriachan

Oct 14, 2016, 4:00:42 PM
To: ISO C++ Standard - Discussion
On Thursday, October 13, 2016 at 10:43:41 AM UTC-7, Richard Smith wrote:
> On 13 Oct 2016 3:14 am, "Andrey Semashev" <andrey....@gmail.com> wrote:
>> On 10/13/16 02:23, Richard Smith wrote:
>>> On Wed, Oct 12, 2016 at 2:11 PM, Andrey Semashev
>>> <andrey....@gmail.com <mailto:andrey....@gmail.com>> wrote:
>>>
>>>>     Although on the other hand, presence of a user-defined copy
>>>>     constructor is not the proper reason to prohibit offsetof - the
>>>>     constructor does not affect binary layout at all.
>>>
>>> Can you provide standard wording justifying that claim? If a type is
>>> neither standard layout nor trivially copyable, I cannot off-hand think
>>> of any rule in the standard that requires the data members to even be
>>> stored within the object. In some cases it might be reasonable for a C++
>>> implementation to do interesting things with such types, such as putting
>>> (some of) the members in a separate garbage-collected heap, or
>>> automatically applying the pimpl idiom for ABI stability, or adding or
>>> rearranging the members to support some kind of runtime instrumentation.
>>
>> Off the top of my head I can remember the standard requires non-static data members to have increasing addresses within the class and the same access mode region. Strictly speaking, that does not preclude the compiler from doing things you describe but it surely adds some restrictions. (BTW: garbage collection is not allowed because the members have to be destroyed in a particular order on the object destruction.)
>
> The point of putting members on a GC heap is to allow them to be found as GC roots, not necessarily to allow them to be collected.

An implementation in which members (that aren't in virtual bases) are outside the class's byte range may be currently allowed, but in many aspects is absurd.  It means things like the size of the class could be less than the sum of the size of its members.  It also means that using placement new could result in calling malloc, which is often what calling placement new is intending to avoid.

If an implementation wanted to put members into a garbage collection root, it could just register them in a list somewhere pointing within the class.  The members don't actually need to be outside the class.  Since any defined copy or move operation would have to go through a constructor and/or destructor, this works.

>>>>     Since offsetof is a constant expression I would prefer the program
>>>>     to be ill-formed if those conditions are violated. The current
>>>>     behavior is undefined, which I guess is for compatibility with C.
>>>
>>> That's not true. Use on non-standard-layout types is
>>> conditionally-supported, which means:
>>> 1) An implementation must document which types are supported.
>>> 2) If a type is supported, the behavior must be as documented in the
>>> standard -- you get the offset of the member.
>>> 3) If a type is not supported, the program is ill-formed.
>>
>> In that case I don't see how "conditionally-supported" differs from "implementation-specific" for non-standard-layout types. Imprecise wording perhaps?
>
> Then you should look up the definition of "conditionally-supported" in the standard :)

One thing that concerns me about defect report 2709 is that "conditionally supported" seems to imply that using offsetof in unsupported situations is ill-formed, when it should probably be undefined behavior.  Otherwise, to conform to the Standard, every compiler would have to implement offsetof using a keyword, rather than the simple reinterpret_cast maze some implementations use (e.g. Visual C++).

Relevant Standard quote:
If a program contains a violation of any diagnosable rule or an occurrence of a construct described in
this Standard as “conditionally-supported” when the implementation does not support that construct,
a conforming implementation shall issue at least one diagnostic message.

>> Anyway, that doesn't really change my point. I think the standard should define offsetof for more types than it currently does.
>
> I think it would make sense to also require it to be guaranteed to be available for trivially copyable types.


Yes, but because members of reference type do not break being trivially-copyable, a statement about the behavior being undefined for reference types would be needed.

What do you think about this wording?

The macro offsetof(type, member-designator) accepts a restricted set of type arguments in this International Standard.  Use of the offsetof macro with a type other than a standard-layout class or trivially copyable class (Clause 9) is conditionally-supported; if unsupported, the result is undefined.  The expression offsetof(type, member-designator) is never type-dependent (14.6.2.2) and it is value-dependent (14.6.2.3) if and only if type is dependent.  The result of applying the offsetof macro to a static data member or a function member, or a member-designator containing a member of reference type, is undefined.

Changes:
- Allows trivially-copyable types.
- If an implementation doesn't support some use case, the behavior is undefined.  Some existing implementations will simply explode if you try offsetof on e.g. members of virtual base classes, rather than throw a compiler error.  Others do throw an error.
- Since trivially-copyable types include classes with reference members, this notes that offsetof(type, referencemember) is undefined.  Also, offsetof(type, nonreferencemember.arraymember[1].referencemember.otherclassmember).
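The point about reference members can be checked directly. The sketch below assumes C++11 or later and a compiler whose type traits follow the standard wording; `R` is a hypothetical type. Its implicit copy constructor is trivial and its copy assignment operator is deleted, so `R` is trivially copyable. The reference member, however, keeps it from being standard-layout. Under the proposed wording, `offsetof(R, value)` would fall in the supported set, and `offsetof(R, ref)` would be the case the extra sentence makes undefined.

```cpp
#include <type_traits>

// Hypothetical type: trivial (implicit) copy constructor, deleted copy
// assignment because of the reference member, trivial destructor.
struct R {
    int& ref;
    int value;
};

static_assert(std::is_trivially_copyable<R>::value, "R is trivially copyable");
static_assert(!std::is_standard_layout<R>::value, "reference members rule out standard layout");
```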

Melissa

Richard Smith

Oct 14, 2016, 5:23:41 PM
To: std-dis...@isocpp.org
That was intentional. A compiler intrinsic of some kind is now required.

> Relevant Standard quote:
> If a program contains a violation of any diagnosable rule or an occurrence of a construct described in
> this Standard as “conditionally-supported” when the implementation does not support that construct,
> a conforming implementation shall issue at least one diagnostic message.
>
>>> Anyway, that doesn't really change my point. I think the standard should define offsetof for more types than it currently does.
>>
>> I think it would make sense to also require it to be guaranteed to be available for trivially copyable types.
>
> Yes, but because members of reference type do not break being trivially-copyable, a statement about the behavior being undefined for reference types would be needed.

We should actually change those cases to be ill-formed rather than undefined; it looks like we missed that when changing the "bad type" case to be ill-formed.

> What do you think about this wording?
>
> The macro offsetof(type, member-designator) accepts a restricted set of type arguments in this International Standard.  Use of the offsetof macro with a type other than a standard-layout class or trivially copyable class (Clause 9) is conditionally-supported; if unsupported, the result is undefined.  The expression offsetof(type, member-designator) is never type-dependent (14.6.2.2) and it is value-dependent (14.6.2.3) if and only if type is dependent.  The result of applying the offsetof macro to a static data member or a function member, or a member-designator containing a member of reference type, is undefined.
>
> Changes:
> - Allows trivially-copyable types.
> - If an implementation doesn't support some use case, the behavior is undefined.  Some existing implementations will simply explode if you try offsetof on e.g. members of virtual base classes, rather than throw a compiler error.  Others do throw an error.
> - Since trivially-copyable types include classes with reference members, this notes that offsetof(type, referencemember) is undefined.  Also, offsetof(type, nonreferencemember.arraymember[1].referencemember.otherclassmember).
>
> Melissa

--

Nicol Bolas

Oct 15, 2016, 1:35:21 PM
To: ISO C++ Standard - Discussion
On Friday, October 14, 2016 at 4:00:42 PM UTC-4, Myriachan wrote:
> Changes:
> - Allows trivially-copyable types.

... why?

The presence or absence of a user-defined copy constructor has nothing to do with the layout of a type. That's why standard layout doesn't care about constructors.

Consider the following:

struct A { int x; };

struct B : A { int y; };

struct C : A
{
  C(const C& c) : A(c), y(c.y) {}  // user-provided copy constructor

  int y;
};

Why should `offsetof(B, y)` work, but not `offsetof(C, y)`? If it's so important for us to be able to get offsets of the members of complex class hierarchies (and I don't believe that it is, but we're working under your rules here), why do you stop with trivial copyability?

The rules of trivial copyability are not rules about things that affect layout (the standard explicitly forbids trivial copying into a base class subobject). What you need is a different definition, based on things that actually affect layout. Like standard layout, but with fewer restrictions. A `fixed layout` that simply forbids virtual base classes or whatever.
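This example can be expressed as compile-time checks. The sketch below assumes C++11 or later, and the copy constructor body is filled in for illustration. `B` and `C` are both non-standard-layout. `B` is trivially copyable and `C` is not. The proposed wording would therefore admit `offsetof(B, y)` but not `offsetof(C, y)`, even though in practice the two classes typically have identical layouts.

```cpp
#include <type_traits>

struct A { int x; };
struct B : A { int y; };
struct C : A
{
    C(const C& c) : A(c), y(c.y) {}  // user-provided copy constructor
    int y;
};

// Neither is standard-layout: both have data members in a base and the derived class.
static_assert(!std::is_standard_layout<B>::value, "B is not standard-layout");
static_assert(!std::is_standard_layout<C>::value, "C is not standard-layout");

// Only the copy constructor separates them, and it decides trivial copyability.
static_assert(std::is_trivially_copyable<B>::value, "B is trivially copyable");
static_assert(!std::is_trivially_copyable<C>::value, "C is not trivially copyable");
```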

Myriachan

Oct 15, 2016, 2:22:50 PM
To: ISO C++ Standard - Discussion

If I had my way, I'd also allow offsetof() on any class that doesn't have virtual base classes (or, I suppose, members with virtual base classes, recursively).  The trouble would be convincing the Committee that that is a good idea.

I would rather have it be based on the structural properties than anything to do with the constructor, as you say.  "Trivial-layout class" is what I'd call it.  However, with the Standard as it is now, the memcpyable property of trivially copyable types is the only thing enforcing that offsetof() must have a reasonable definition.  Like Richard Smith says, compilers can currently theoretically use simply the presence of a constructor to decide to do exotic things with structure layouts.

Melissa

Andrey Semashev

Oct 15, 2016, 3:08:55 PM
To: std-dis...@isocpp.org
On 10/15/16 21:22, Myriachan wrote:
>
> I would rather have it be based on the structural properties than
> anything to do with the constructor, as you say. "Trivial-layout class"
> is what I'd call it. However, with the Standard as it is now, the
> memcpyable property of trivially copyable types is the only thing
> enforcing that offsetof() must have a reasonable definition. Like
> Richard Smith says, compilers can currently theoretically use simply the
> presence of a constructor to decide to do exotic things with structure
> layouts.

Well, theoretically it is possible that a trivially-copyable type has a
hidden offset member. When you perform a memcpy, you would also copy
that member, and the copy would still be operable.

However, I would just forbid such tricks for all classes without a
virtual base.

Myriachan

Oct 15, 2016, 6:07:30 PM
To: ISO C++ Standard - Discussion

This doesn't work because you can take the address of some member of the class, memcpy it from a different object, then still use the pointer you got from before the memcpy to read the copied value.

It would be an absurd situation if this didn't work:

#include <cstdio>
#include <cstring>

struct Meow
{
    int a;
private:
    // two different access-specifiers on data members
    // means not standard-layout.
    int b;
    friend void Function();
};

void Function()
{
    Meow x;
    Meow y;

    x.a = 4;
    x.b = 0;
    y.a = 2;
    y.b = 0;

    int *pxa = &x.a;

    // Even x = y; would be the same
    std::memcpy(&x, &y, sizeof(x));

    // Why shouldn't this print 2?
    std::printf("%d\n", *pxa);
}



Melissa

Nicol Bolas

Oct 15, 2016, 7:56:57 PM
To: ISO C++ Standard - Discussion

But the reason we were able to expand the POD definition into trivially copyable and standard layout was because compilers were not taking advantage of the degrees of freedom that the C++98/03 POD definition allowed. The committee looked at implementations and saw that adding non-virtual member functions was never used to affect the layout of various types. They saw that non-virtual base classes did not affect whether it was possible to memcpy a type.

So they expanded the definitions of these things, and compilers didn't need to change their implementations to implement them.

It seems to me that all you need to do is demonstrate 2 things: a good reason to expand `offsetof`'s domain, and that no compiler changes the layout of a type based on the presence or absence of a default special member function. Personally, I'd say that the latter ought to be a piece of cake.

It's the former that's hardest. The only motivations I've seen are either systems of dubious merit (interprocess memcpy of objects, yet wanting to avoid fully memcpying them out) or simply "because we can."

Thiago Macieira

Oct 16, 2016, 11:18:11 AM
To: std-dis...@isocpp.org
On Saturday, October 15, 2016, at 10:35:21 PDT, Nicol Bolas wrote:
> On Friday, October 14, 2016 at 4:00:42 PM UTC-4, Myriachan wrote:
> > Changes:
> > - Allows trivially-copyable types.
>
> ... why?
>
> The presence or absence of a user-defined copy constructor has nothing to
> do with the layout of a type. That's why standard layout doesn't care about
> constructors.

Correct, it doesn't define the layout, but the absence of a user-defined copy
constructor implies memcpy must be able to copy and duplicate the information
in the class. Since every byte in the class is addressable and contains useful
information or padding, then we can use offsetof.
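This argument rests on the guarantee in [basic.types]/2: the bytes of a trivially copyable object can be copied into an array of `unsigned char` and back, and the object keeps its value. The sketch below shows that guarantee for the non-standard-layout `B` from the original question. It assumes C++11 or later, and `roundtrip` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct A { int a; };
struct B : A { int b; };  // trivially copyable, but not standard-layout

static_assert(std::is_trivially_copyable<B>::value, "B is trivially copyable");

// Copy the object representation out to a byte buffer and back into a
// different B; every member value must survive the trip.
B roundtrip(const B& in)
{
    unsigned char bytes[sizeof(B)];
    std::memcpy(bytes, &in, sizeof(B));
    B out{};  // value-initialized: out.a == 0, out.b == 0
    std::memcpy(&out, bytes, sizeof(B));
    return out;
}
```

For example, a `B` with `a == 3` and `b == 4` comes back from `roundtrip` with the same values, which can be checked with `assert(out.a == 3 && out.b == 4)`.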

Nicol Bolas

Oct 16, 2016, 3:06:41 PM
To: ISO C++ Standard - Discussion
On Sunday, October 16, 2016 at 11:18:11 AM UTC-4, Thiago Macieira wrote:
> On Saturday, October 15, 2016, at 10:35:21 PDT, Nicol Bolas wrote:
>> On Friday, October 14, 2016 at 4:00:42 PM UTC-4, Myriachan wrote:
>>> Changes:
>>> - Allows trivially-copyable types.
>>
>> ... why?
>>
>> The presence or absence of a user-defined copy constructor has nothing to
>> do with the layout of a type. That's why standard layout doesn't care about
>> constructors.
>
> Correct, it doesn't define the layout, but the absence of a user-defined copy
> constructor implies memcpy must be able to copy and duplicate the information
> in the class. Since every byte in the class is addressable and contains useful
> information or padding, then we can use offsetof.

Trivial copyability is about whether the value representation of the object (its bits and bytes) is the sole defining characteristic of that specific object. If a class is trivially copyable, then it is just a block of bits. If it is not trivially copyable, then the object has logical state that extends beyond its binary representation. `unique_ptr` manages a resource, so it has logical state (the existence of that resource) that is outside of its binary state (a pointer to that resource).

Virtual functions/base classes disqualify a type from being trivially copyable as an implementation detail of virtual functions/base classes.

In any case, my point was that trivial copyability is essentially irrelevant. The question that `offsetof` asks is this: "Given just the definition of this class (and obviously any classes that definition relies on), can I statically compute the offset of every member subobject of this class?" That is not a question of binary state vs. logical state; it is a question of whether every instance of the type T has a fixed layout, no matter how it gets used or where it comes from.

So while trivially copyable types do have fixed layout, they are not the only types which do. Basically, my point is that there are plenty of non-trivially copyable classes (which are not also standard layout) on which `offsetof` could still work. The example I gave being one of them.

Thiago Macieira

Oct 16, 2016, 5:27:29 PM
To: std-dis...@isocpp.org
On Sunday, October 16, 2016, at 12:06:41 PDT, Nicol Bolas wrote:
> So while trivially copyable types do have fixed layout, they are not the
> only types which do. Basically, my point is that there are plenty of
> non-trivially copyable classes (which are not also standard layout) on
> which `offsetof` could still work. The example I gave being one of them.

Right, but we'll take the sufficient condition.

Nicol Bolas

Oct 16, 2016, 7:23:30 PM
To: ISO C++ Standard - Discussion
On Sunday, October 16, 2016 at 5:27:29 PM UTC-4, Thiago Macieira wrote:
> On Sunday, October 16, 2016, at 12:06:41 PDT, Nicol Bolas wrote:
>> So while trivially copyable types do have fixed layout, they are not the
>> only types which do. Basically, my point is that there are plenty of
>> non-trivially copyable classes (which are not also standard layout) on
>> which `offsetof` could still work. The example I gave being one of them.
>
> Right, but we'll take the sufficient condition.

But who says that it is "sufficient"? Can you give a justification why `offsetof` should fail for a type, just because it has a user-provided default constructor?

Or more to the point, whatever reason you have for applying `offsetof` to types beyond standard layout applies just as well for types beyond the union of trivially copyable and standard layout.

Thiago Macieira

Oct 17, 2016, 12:55:43 AM
To: std-dis...@isocpp.org, Nicol Bolas
On Sunday, October 16, 2016, at 16:23:30 PDT, Nicol Bolas wrote:
> On Sunday, October 16, 2016 at 5:27:29 PM UTC-4, Thiago Macieira wrote:
> > On Sunday, October 16, 2016, at 12:06:41 PDT, Nicol Bolas wrote:
> > > So while trivially copyable types do have fixed layout, they are not the
> > > only types which do. Basically, my point is that there are plenty of
> > > non-trivially copyable classes (which are not also standard layout) on
> > > which `offsetof` could still work. The example I gave being one of them.
> >
> > Right, but we'll take the sufficient condition.
>
> But who says that it *is* "sufficient"? Can you give a justification why
> `offsetof` should fail for a type, just because it has a user-provided
> default constructor?

That would be a "necessary condition".

In any and every class that is trivially copyable, offsetof works.

> Or more to the point, whatever reason you have for applying `offsetof` to
> types beyond standard layout applies just as well for types beyond the
> union of trivially copyable and standard layout.

I personally don't have such a need. The OP did want more, but it gets
tricky very quickly.

Kazutoshi Satoda

Oct 22, 2016, 10:02:58 AM
To: std-dis...@isocpp.org
On 2016/10/12 23:18 +0900, Andrey Semashev wrote:
> [support.types.layout]/1 in N4606 says that "Use of the offsetof macro
> with a type other than a standard-layout class (Clause 9) is
> conditionally-supported." What exactly does "conditionally-supported"
> mean here? I can see that the relevant paragraph in N4140 said the
> result is undefined in this case.
>
> Furthermore, is there a reason why offsetof is not defined for all
> classes without virtual inheritance? For example:
>
> struct A
> {
> int a;
> };
>
> struct B : A
> {
> int b;
> };
>
> I can't see in this example why offsetof(B, b) cannot be fully defined.

AFAIK, pointer arithmetic using such offsets is currently undefined.
https://groups.google.com/a/isocpp.org/d/topic/std-discussion/qppMafIo5PQ
But the discussion in this thread goes like it is usable. Are you sure
of that? Or am I missing something?

--
k_satoda

Andrey Semashev

Oct 22, 2016, 5:12:23 PM
To: std-dis...@isocpp.org
I believe it's not undefined, assuming that offsetof works. B occupies
contiguous storage, as do B::a and B::b within that storage (see
[basic.types]/4). I see no reason why pointer arithmetic would not work
under these conditions.

Chris Hallock

Oct 22, 2016, 10:00:26 PM
To: ISO C++ Standard - Discussion

Per 5.7/4, pointer arithmetic is explicitly specified as undefined unless the pointer and the result each point to the same object, or to elements of the same array object, or one past the end of such (paraphrasing). This is absurdly limiting; see core issues 1701 and 2182.
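To make the 5.7/4 restriction concrete: for pointer arithmetic, a pointer to an object that is not an array element behaves like a pointer into an array of one element, so only `p + 0` and `p + 1` are valid. The sketch below is minimal, and `one_past_end_distance` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>

std::ptrdiff_t one_past_end_distance()
{
    int x = 0;
    int* p = &x;       // points to a non-array object
    int* end = p + 1;  // OK: one past the end (comparable, not dereferenceable)
    // int* bad = p + 2;  // undefined behavior: beyond one past the end
    assert(end != p);
    return end - p;    // distance within the notional one-element array
}
```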

Andrey Semashev

Oct 23, 2016, 6:05:45 AM
To: std-dis...@isocpp.org
But the pointer arithmetic is within the same object.

Chris Hallock

Oct 23, 2016, 10:22:06 AM
To: ISO C++ Standard - Discussion

Right, it's OK in practice, just not de jure legal with the current wording (a defect, IMO). What's interesting is that 5.7/4 originated (with different wording, but basically the same effect) with DR 1504 between C++11 and C++14, and (according to that DR) the driving motivation was merely to un-define pointer arithmetic using base class pointers with an array of derived objects.

Chris Hallock

Oct 23, 2016, 10:31:47 AM
To: ISO C++ Standard - Discussion
> [...] What's interesting is that 5.7/4 originated [...] with DR 1504 [...]

Whoops, I got that wrong... getting 5.7/4 mixed up with 5.7/6.

Kazutoshi Satoda

Oct 23, 2016, 10:59:12 AM
To: std-dis...@isocpp.org
> <http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1701> and 2182
> <http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#2182>.

My understanding of the restriction (of ranges and types) is to allow
some optimizations. Though I don't have a strong opinion on whether it is
worth prohibiting some widespread pointer arithmetic.

Consider this code snippet:

struct X { int a, b; };
void f(int*);
int g(void)
{
    X x = {123, 456};
    f(&x.b);
    return x.a;
}

In the above code, there is no (currently) defined way in f() to
modify x.a. So compilers can transform "return x.a" into "return 123".

If arbitrary pointer arithmetic with byte offset is allowed, such a
transformation is prohibited and compilers will be forced to reload x.a
from memory, to support possible modifications to x.a in f(), like
this:

void f(int* pb)
{
    X* px = reinterpret_cast<X*>(reinterpret_cast<char*>(pb) - offsetof(X, b));
    px->a++;
}

--
k_satoda

Chris Hallock

Oct 23, 2016, 4:02:15 PM
To: ISO C++ Standard - Discussion

That optimization is currently invalid anyway (at least on some platforms), because f() might be defined as follows:

void f(int* pb)
{
    auto bAddress = reinterpret_cast<std::intptr_t>(pb);              // not portable
    int* pa = reinterpret_cast<int*>(bAddress - int(offsetof(X, b))); // not portable
    // don't even need std::launder (I think?)
    *pa = 333; // mwahaha!
}

Pointer arithmetic can be simulated with integer arithmetic on some (most?) platforms.

Andrey Semashev

Oct 23, 2016, 8:38:07 PM
To: std-dis...@isocpp.org
On Sun, Oct 23, 2016 at 5:22 PM, Chris Hallock
No, I meant this is defined by the standard. In particular, the 5.7/4
paragraph you mentioned makes pointer arithmetic defined, when applied
on `unsigned char*` that points at the beginning of the storage used
to store the complete object.

  struct S
  {
    int x, y;
  };

  S s;
  unsigned char* p1 = reinterpret_cast< unsigned char* >(&s);
  unsigned char* p2 = p1 + offsetof(S, y); // well-defined

The dodgy part is not the arithmetic but using the adjusted pointer
to access the member of the complete object. The naive solution is to
use reinterpret_cast again:

  int* p3 = reinterpret_cast< int* >(p2);
  assert(p3 == &s.y);

I think this was UB up until C++14, which has an updated definition of
reinterpret_cast in 5.2.10/7 and static_cast in 5.2.9/13. But even if
it's still considered UB by some clause that I'm missing, a simple
workaround is still possible:

  int y;
  std::memcpy(&y, p2, sizeof(int));
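Here is a self-contained version of the memcpy workaround, as a sketch assuming C++11 or later; `read_y_via_bytes` is a hypothetical helper. It avoids dereferencing a `reinterpret_cast`-ed `int*`. The step `base + offsetof(S, y)` still relies on treating the object representation as an array of bytes, which is the point Chris raises about core issue 1701.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

struct S { int x, y; };
static_assert(std::is_standard_layout<S>::value, "offsetof(S, y) is fully defined");

// Read S::y through the object's bytes: advance a byte pointer by
// offsetof(S, y), then copy sizeof(int) bytes into a fresh int.
int read_y_via_bytes(const S& s)
{
    const unsigned char* base = reinterpret_cast<const unsigned char*>(&s);
    int y;
    std::memcpy(&y, base + offsetof(S, y), sizeof y);
    return y;
}
```

For example, `assert(read_y_via_bytes(S{1, 2}) == 2)` holds on any implementation that treats the bytes this way.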

Andrey Semashev

Oct 23, 2016, 8:48:09 PM
To: std-dis...@isocpp.org
On Sun, Oct 23, 2016 at 5:59 PM, Kazutoshi Satoda
<k_sa...@f2.dion.ne.jp> wrote:
>
> My understanding of the restriction (of ranges and types) is to allow
> some optimizations. Though I don't have a strong opinion on whether it is
> worth prohibiting some widespread pointer arithmetic.
>
> Consider this code snippet:
>
> struct X { int a, b; };
> void f(int*);
> int g(void)
> {
> X x = {123, 456};
> f(&x.b);
> return x.a;
> }
>
> In the above code, there is no (currently) defined way in f() to
> modify x.a. So compilers can transform "return x.a" into "return 123".

I don't think this is a valid optimization because otherwise a similar
optimization would produce absurd results:

void f(int* p); // assigns p[1] = 10;

int g()
{
    int p[2] = { 0, 1 };
    f(p);
    return p[1]; // returns 1?
}

In both examples the compiler must not assume the other members
reachable through the pointer are not modified. Here, reachability
means any object that resides in the storage accessible through p.

Chris Hallock

Oct 24, 2016, 12:03:49 AM
To: ISO C++ Standard - Discussion
On Sunday, October 23, 2016 at 8:38:07 PM UTC-4, Andrey Semashev wrote:
> No, I meant this is defined by the standard. In particular, the 5.7/4
> paragraph you mentioned makes pointer arithmetic defined, when applied
> on `unsigned char*` that points at the beginning of the storage used
> to store the complete object.
>
>   struct S
>   {
>     int x, y;
>   };
>
>   S s;
>   unsigned char* p1 = reinterpret_cast< unsigned char* >(&s);
>   unsigned char* p2 = p1 + offsetof(S, y); // well-defined

The "snag" in this example is that p1 doesn't point to an element of an array object, so the expression p1 + J is undefined for J > 1. (The object representation isn't an array object; 3.9/4 only defines it as a "sequence of N unsigned char objects", which is issue 1701.)

Nicol Bolas

Oct 24, 2016, 12:08:30 AM
To: ISO C++ Standard - Discussion

How do you define "accessible through p"?

The reason the above works is because `p` points to an array. By contrast, the other code is not given a pointer to an array; it is given a pointer to an integer.

So where in the standard does it say that the address of `x.a` is "accessible through" `&x.b`?

Andrey Semashev

Oct 24, 2016, 8:30:01 AM
To: std-dis...@isocpp.org
> <http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#1701>.)

Hmm, I always assumed that the "sequence of N unsigned char objects"
meant a contiguous sequence, in other words an array. It doesn't say
that explicitly but that is the natural thing to assume because
otherwise lots of code such as memory pools and memory allocators are
UB, and there is apparently no way to implement them in C++ because
there is no way to iterate over raw storage returned by malloc and
operator new.

I think the standard does indicate that iterating should be possible in
[basic.stc.dynamic.allocation]/2:

[...] The pointer returned shall be suitably aligned so that it can be
converted to a pointer to any suitable complete object type (18.6.2.1)
and then used to access the object or array in the storage allocated [...]

In other words, I should be able to convert void* to unsigned char* and
iterate over storage as if it was an array of unsigned char, as
guaranteed by [expr.add]/4.

Now, you may argue that you don't call an allocation function for an
object with automatic storage duration. True. But does that mean such
objects use some different kind of storage? I don't think so.

Also, consider [basic.types]/2 and 3, which state that a trivially
copyable type can be copied with memcpy. memcpy, as a function, is
defined in the C standard as a function that copies an array of
characters (see C11, 7.24.2.1 and 7.24.1/2); the C++ standard makes no
provision on whether the pointers passed to memcpy are interpreted as
arrays, raw storage, objects or something else.

I agree there's a lot of uncertainty here, and I think this should be
considered a defect. I think the standard should be clear that one can
iterate over storage used by objects in the same way it is possible to
iterate over an array of (unsigned) chars, because otherwise lots of
code and parts of the standard break.

Whether dereferencing a pointer obtained through that iteration yields
any particularly defined value is another question, but at least it
should be guaranteed that when such obtained pointer actually points to
an object of type T, dereferencing that pointer will yield a valid
reference to T. Otherwise offsetof is simply unusable.
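The argument above is that code like the following sketch must remain valid. It allocates raw storage, walks it as `unsigned char`, and then constructs an object in it. The sketch assumes C++11 or later; `S` and `construct_in_raw_storage` are hypothetical. Whether the byte loop is strictly defined under the current wording is exactly the question being debated here.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct S { int x, y; };

int construct_in_raw_storage()
{
    void* raw = ::operator new(sizeof(S));  // storage suitably aligned for S
    unsigned char* bytes = static_cast<unsigned char*>(raw);
    for (std::size_t i = 0; i < sizeof(S); ++i)
        bytes[i] = 0;                       // byte-wise iteration over raw storage
    S* s = ::new (raw) S{7, 8};             // placement new into the same storage
    int result = s->x + s->y;
    s->~S();
    ::operator delete(raw);
    return result;
}
```

Calling `construct_in_raw_storage()` returns `15`. Allocators and memory pools depend on this pattern.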

Andrey Semashev

Oct 24, 2016, 8:42:32 AM
To: std-dis...@isocpp.org
My definition is as follows. An object `x` is reachable through pointer
`p` if there exists a well-defined sequence of operations on `p` that
can yield a reference to `x`.

> The reason the above works is because `p` points to an array. By
> contrast, the other code is not given a pointer to an array; it is given
> a pointer to an integer.
>
> So where in the standard does it say that the address of `x.a` is
> "accessible through" `&x.b`?

I tried to explain my reasoning in my reply to Chris, `x.a` is reachable
through a pointer to `x.b` because the pointer refers to a storage
allocated for `x`, which includes both `x.a` and `x.b`. Here, I assume
that iterating over storage is a well-defined operation.

In this light, there is no difference whether `p` points at an array or
not. Indeed, `f()` has no knowledge of that.

Andrey Semashev

Oct 24, 2016, 1:10:55 PM
To: std-dis...@isocpp.org
On 10/12/16 17:18, Andrey Semashev wrote:
> Hi,
>
> [support.types.layout]/1 in N4606 says that "Use of the offsetof macro
> with a type other than a standard-layout class (Clause 9) is
> conditionally-supported." What exactly does "conditionally-supported"
> mean here? I can see that the relevant paragraph in N4140 said the
> result is undefined in this case.
>
> Furthermore, is there a reason why offsetof is not defined for all
> classes without virtual inheritance? For example:
>
> struct A
> {
> int a;
> };
>
> struct B : A
> {
> int b;
> };
>
> I can't see in this example why offsetof(B, b) cannot be fully defined.

I've written a proposal draft and started a discussion thread in
[std-proposals]:

[offsetof-ext] Draft proposal for extended offsetof

Anyone interested is welcome to comment on the proposal.

Thanks.

Myriachan

Oct 27, 2016, 7:13:08 PM
To: ISO C++ Standard - Discussion
On Sunday, October 23, 2016 at 9:08:30 PM UTC-7, Nicol Bolas wrote:

> How do you define "accessible through p"?
>
> The reason the above works is because `p` points to an array. By contrast, the other code is not given a pointer to an array; it is given a pointer to an integer.
>
> So where in the standard does it say that the address of `x.a` is "accessible through" `&x.b`?


I believe that this case with uintptr_t is (implementation-)defined behavior due to these things:

[reinterpret.cast]
4. A pointer can be explicitly converted to any integral type large enough to hold it. The mapping function is implementation-defined.  ...
5. A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type will have its original value; mappings between pointers and integers are otherwise implementation-defined. 


Once the pointer is an integer, you can do whatever arithmetic you want with it.  Due to most implementations having a sane-enough definition of integer-pointer conversion, you now have an integer whose value is equal to reinterpret_cast<intptr_t>(&x.a).  Then, casting back to int *, the "same pointer type" as &x.a, is valid.  Lvalue-to-rvalue conversion is legal because x.a was instantiated in memory as type "int", so no aliasing violation occurs.

One problem with the model is that if "a" and "b" were independent variables rather than members of the same structure, and the called function magically acquired the distance from "b" to "a" (and "a" actually existed in memory), then it would be weird but valid.  As things are now, this is one of the self-inconsistencies in the Standard.

Melissa

Andrey Semashev

Oct 27, 2016, 7:30:19 PM
To: std-dis...@isocpp.org
On Fri, Oct 28, 2016 at 2:13 AM, Myriachan <myri...@gmail.com> wrote:
> On Sunday, October 23, 2016 at 9:08:30 PM UTC-7, Nicol Bolas wrote:
>>
>> So where in the standard does it say that the address of `x.a` is
>> "accessible through" `&x.b`?
>
> I believe that this case with uintptr_t is (implementation-)defined behavior
> due to these things:
>
>> [reinterpret.cast]
>> 4. A pointer can be explicitly converted to any integral type large enough
>> to hold it. The mapping function is implementation-defined. ...
>>
>> 5. A value of integral type or enumeration type can be explicitly
>> converted to a pointer. A pointer converted to an integer of sufficient size
>> (if any such exists on the implementation) and back to the same pointer type
>> will have its original value; mappings between pointers and integers are
>> otherwise implementation-defined.
>
> Once the pointer is an integer, you can do whatever arithmetic you want with
> it.

I think the standard does not guarantee equivalence between pointer
arithmetic and integer arithmetic. The quoted paragraphs only
guarantee that you can cast the pointer to integer and back, but
nothing beyond that.

> One problem with the model is that if "a" and "b" were independent variables
> rather than members of the same structure, and the called function magically
> acquired the distance from "b" to "a" (and "a" actually existed in memory),
> then it would be weird but valid. As things are now, this is one of the
> self-inconsistencies in the Standard.

But, AFAIK, there is no way to obtain that distance. Doing pointer
arithmetic with pointers referring to separate storage (i.e.
different objects or elements of different arrays) is undefined behavior.

Thiago Macieira

Oct 27, 2016, 8:44:08 PM
To: std-dis...@isocpp.org
On Friday, October 28, 2016, at 02:30:17 PDT, Andrey Semashev
wrote:
> I think the standard does not guarantee equivalence between pointer
> arithmetic and integer arithmetic. The quoted paragraphs only
> guarantee that you can cast the pointer to integer and back, but
> nothing beyond that.

Indeed it isn't.

int *ptr = &array[0];
uintptr_t u = uintptr_t(ptr);
assert(u + 1 == uintptr_t(ptr + 1)); // fails!

Thiago Macieira

Oct 27, 2016, 9:20:56 PM
To: std-dis...@isocpp.org
On Thursday, October 27, 2016, at 17:44:03 PDT, Thiago Macieira
wrote:
> On Friday, October 28, 2016, at 02:30:17 PDT, Andrey Semashev
> wrote:
> > I think the standard does not guarantee equivalence between pointer
> > arithmetic and integer arithmetic. The quoted paragraphs only
> > guarantee that you can cast the pointer to integer and back, but
> > nothing beyond that.
>
> Indeed it isn't.
>
> int *ptr = &array[0];
> uintptr_t u = uintptr_t(ptr);
> assert(u + 1 == uintptr_t(ptr + 1)); // fails!

And back to DOS days, if you compiled the following in the huge memory model
(so you have 32-bit-wide pointers and can access all real-mode memory):

char array[131072];

char *ptr = &array[0];
uintptr_t u = uintptr_t(ptr);
assert(u + 65536 == uintptr_t(ptr + 65536)); // fails!

ptr + 65536 is actually equal to u + 16777216.

Myriachan

Oct 27, 2016, 9:45:26 PM
To: ISO C++ Standard - Discussion
On Thursday, October 27, 2016 at 6:20:56 PM UTC-7, Thiago Macieira wrote:
> On Thursday, October 27, 2016, at 17:44:03 PDT, Thiago Macieira
> wrote:
> > On Friday, October 28, 2016, at 02:30:17 PDT, Andrey Semashev
> > wrote:
> > > I think the standard does not guarantee equivalence between pointer
> > > arithmetic and integer arithmetic. The quoted paragraphs only
> > > guarantee that you can cast the pointer to integer and back, but
> > > nothing beyond that.
> >
> > Indeed it isn't.
> >
> > int *ptr = &array[0];
> > uintptr_t u = uintptr_t(ptr);
> > assert(u + 1 == uintptr_t(ptr + 1)); // fails!
>
> And back to DOS days, if you compiled the following in the huge memory model
> (so you have 32-bit-wide pointers and can access all real-mode memory):
>
> char array[131072];
>
> char *ptr = &array[0];
> uintptr_t u = uintptr_t(ptr);
> assert(u + 65536 == uintptr_t(ptr + 65536)); // fails!
>
> ptr + 65536 is actually equal to u + 16777216.


The Standard doesn't say anything about what arithmetic on [u]intptr_t means, but it does say that the mapping between pointer and integer is implementation-defined.  If your implementation defines a flat memory space, or a flat-enough memory space, then arithmetic on uintptr_t is valid, provided that your resulting value equals the integral version of a pointer to a valid object and has the correct type or an aliasing-compatible type.  So long as you're aware of the nonportability of certain parts of your code, it will work.

I see certain implementations as having "semi-flat" memory spaces.  In what I call a "semi-flat" memory space, the memory space may not be flat overall, but it is flat within any allocated object.  In such a case, uintptr_t arithmetic is valid, provided you don't cross an object allocation boundary.  The MS-DOS and Win16 "large" memory model meets that criterion, but as you noted above, the "huge" model does not.

Melissa

Thiago Macieira

Oct 27, 2016, 10:48:21 PM
To: std-dis...@isocpp.org
On Thursday, October 27, 2016, at 18:20:48 PDT, Thiago Macieira
wrote:
> char array[131072];
>
> char *ptr = &array[0];
> uintptr_t u = uintptr_t(ptr);
> assert(u + 65536 == uintptr_t(ptr + 65536)); // fails!
>
> ptr + 65536 is actually equal to u + 16777216.

Actually, my math was wrong. It was u + 268435456.

The reason why is left as an exercise to the reader.