    // Possibly UB?
    bool is_this_ub = value->IsBoolean();

Is this UB?
N4296 9.3.1(2) states "If a non-static member function of a class X is called for an object that is not of type X, or of a type derived from X, the behavior is undefined". It seems like an InternalValue*& is not a Value&, so the call to Value::IsBoolean() through the reinterpret_cast pointer is UB, but I could be misreading the standard.

If this is UB, what could be done to make it not UB?
One thought I had was to rewrite Value like this:

    class Value // single element POD struct
    {
    public:
        bool IsBoolean();
    private:
        Value(); // not constructible
        InternalValue** mFirstMember;
    };

At this point, since Value is a POD, it must have the same representation as its only member, so a reinterpret_cast between pointers to those types should be safe. Is that true?
Alternatively, perhaps some magic with std::launder would also make this be defined behavior?
On Tue, Jun 7, 2016 at 11:40 AM, <ryani...@gmail.com> wrote:
> // Possibly UB?
> bool is_this_ub = value->IsBoolean();
> Is this UB?

Yes.

> If this is UB, what could be done to make it not UB?
Redesign this library to not do this. For instance:

    class Value {
    public:
        static bool IsBoolean(Value *v) { return (*ToHandle(v))->IsBoolean(); }
        // ...
    };
> class Value // single element POD struct
> {
> public:
>     bool IsBoolean();
> private:
>     Value(); // not constructible
>     InternalValue** mFirstMember;

You mean InternalValue * here, I think. (You want InternalValue** and Value* to have the same representation, not InternalValue** and Value.)

> };
> At this point, since Value is a POD, it must have the same representation as its only member, so a reinterpret-cast between pointers to those types should be safe. Is that true?

Not necessarily -- an object of type Value still doesn't exist, so it's still UB. If you want to know whether it's safe with any particular implementation, you'll need to ask the people providing that implementation. (If you originally created an object of type Value, rather than creating an object of type InternalValue*, then this *is* safe and correct if Value is a standard-layout class type.)
A hypothetical sufficiently-smart compiler could look at the whole program, determine that an object of type Value is never created, and then delete all the definitions of non-static member functions of that class. Or (in a sanitizing mode) it could build a side table listing which objects of what types exist at what addresses, and cause your program to crash with a diagnostic on the call to Value::IsBoolean. (And so on; these are just examples.)
> Alternatively, perhaps some magic with std::launder would also make this be defined behavior?

No; std::launder does not create objects. Keep in mind [intro.object]/6:
"Two objects that are not bit-fields may have the same address if one is a subobject of the other, or if at least one is a base class subobject of zero size and they are of different types; otherwise, they shall have distinct addresses."
You can't have an object of type InternalValue* and an unrelated object of type Value at the same address.
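The parenthetical in the reply above (create a real Value object first, then convert between a pointer to it and a pointer to its first member) can be sketched as follows. This is only an illustration under stated assumptions: InternalValue here is a hypothetical stand-in with a single flag, the Value constructor is made public so that an object can actually be created, and the helper name is invented for the example.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for the library's internal type.
struct InternalValue {
    bool is_bool;
    bool IsBoolean() const { return is_bool; }
};

// Standard-layout wrapper whose only member is the internal pointer.
class Value {
public:
    explicit Value(InternalValue* p) : mFirstMember(p) {}
    bool IsBoolean() const { return mFirstMember->IsBoolean(); }
private:
    InternalValue* mFirstMember;
};

static_assert(std::is_standard_layout<Value>::value,
              "the first-member conversion below relies on standard layout");

// Hypothetical helper: a Value object really exists here, so converting
// between a pointer to it and a pointer to its first member is allowed.
inline bool is_boolean_via_first_member(InternalValue* iv) {
    Value v(iv);                                                   // an actual Value object
    InternalValue** slot = reinterpret_cast<InternalValue**>(&v);  // points at mFirstMember
    Value* back = reinterpret_cast<Value*>(slot);                  // and back to the Value
    return back->IsBoolean();
}
```

The difference from the original question is the first line of the helper: the storage holds a Value from the start, rather than an InternalValue* that is later viewed as a Value.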
On 06/08/2016 02:53 AM, Nicol Bolas wrote:
> You're thinking of trivially copyable. Standard layout only governs byte-wise compatibility between two types. It does not mean that it is legal to create such an object by bytewise copying it. It is trivially copyable types that can be bytewise copied into new objects of that type.
Even with trivially copyable, 3.9p2 and 3.9p3 seem to say that an
object must already exist before memcpy works.
Richard is working on clarifications of what "object exists" actually
means; see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0137r0.html .
(With either the new or the existing wording around "object", it's not
clear that std::vector can actually be implemented in C++. This seems
a sub-optimal state of affairs.)
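For reference, the case that 3.9p2 does cover can be sketched like this: the bytes of a trivially copyable object are copied out and then back into an object that already exists. The struct Pod and the function name are hypothetical, chosen only for the example.

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

// Hypothetical trivially copyable type used only for this illustration.
struct Pod {
    int a;
    double b;
};

inline Pod round_trip(const Pod& src) {
    static_assert(std::is_trivially_copyable<Pod>::value,
                  "memcpy round trips are only specified for trivially copyable types");
    unsigned char bytes[sizeof(Pod)];
    std::memcpy(bytes, &src, sizeof(Pod));   // object -> byte array
    Pod dst{};                               // the destination object already exists
    std::memcpy(&dst, bytes, sizeof(Pod));   // byte array -> existing object
    return dst;
}
```

What this does not show, and what the thread is asking about, is copying bytes into storage where no Pod has been created yet.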
--
---
You received this message because you are subscribed to a topic in the Google Groups "ISO C++ Standard - Discussion" group.
To unsubscribe from this topic, visit https://groups.google.com/a/isocpp.org/d/topic/std-discussion/p4BXNhTHY7U/unsubscribe.
To unsubscribe from this group and all its topics, send an email to std-discussio...@isocpp.org.
To post to this group, send email to std-dis...@isocpp.org.
Visit this group at https://groups.google.com/a/isocpp.org/group/std-discussion/.
On Wed, Jun 8, 2016 at 5:52 PM, Richard Smith <ric...@metafoo.co.uk> wrote:
> On Wed, Jun 8, 2016 at 12:00 AM, Jens Maurer <Jens....@gmx.net> wrote:
>> On 06/08/2016 02:53 AM, Nicol Bolas wrote:
>>> You're thinking of trivially copyable. Standard layout only governs byte-wise compatibility between two types. It does not mean that it is legal to create such an object by bytewise copying it. It is trivially copyable types that can be bytewise copied into new objects of that type.
>> Even with trivially copyable, 3.9p2 and 3.9p3 seem to say that an
>> object must already exist before memcpy works.
>
> Gabriel Dos Reis' http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2013/n3751.pdf made a start at specifying the behavior here, such that memcpy can be used to reinterpret the bits of one type as another, and can be used to start the lifetime of an object, but IIRC we've not seen any updates since Urbana-Champaign.
>
>> Richard is working on clarifications what "object exists" actually
>> means; see http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/p0137r0.html .
>> (With either the new or the existing wording around "object", it's not
>> clear that std::vector can actually be implemented in C++. This seems
>> a sub-optimal state of affairs.)
>
> Dynamic array resizing remains a problem for our formal object model. :(
You didn't specify how to implement the piece that's not implementable :-)

    T *vector<T>::data() { return ??? }

    vector<int> vi;
    vi.push_back(1);
    vi.push_back(2);
    vi.push_back(3);
    vi.data()[2] = 12; // ub, there is no array object on which to do array indexing

C++98's vector was fine, since it didn't pretend to expose an array to the user (there was no data(), iterators could be used to encapsulate the reinterpret_casts, and there was no contiguous iterator guarantee), but this has been unimplementable in the formal C++ object model since C++03 guaranteed that (&*vi.begin())[2] should work.

Obviously it's actually fine in practice (and your implementation will certainly make sure it works); the question here is how to tweak the formal wording to give the guarantees we actually want. There are a number of different options with different tradeoffs (should we require explicit code in std::vector to create an array object? should we allow nontrivial pointer arithmetic / array indexing on pointers that don't point to arrays? should we magically conjure an array object into existence to make this work? should we allow an array object to be created without actually initializing all of its elements? how should the lifetime of an array object work anyway?). I'll probably write a paper on that once we're done with p0137.
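To make the problem concrete, here is a minimal sketch of the usual implementation strategy: raw storage from ::operator new, elements created one at a time with placement new, and data() returning a pointer to the first element. TinyVec and its members are hypothetical names, the capacity is fixed, and over-aligned element types are not handled. Under the wording discussed in this thread, each placement new creates a single T and no T[N] array object is ever created, which is why indexing through data() is the contested step.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical fixed-capacity container; a sketch, not a real vector.
template <class T>
class TinyVec {
public:
    explicit TinyVec(std::size_t cap)
        : mem_(static_cast<T*>(::operator new(cap * sizeof(T)))), cap_(cap) {}

    ~TinyVec() {
        // Destroy the constructed elements in reverse order, then free the storage.
        for (std::size_t i = size_; i > 0; --i) mem_[i - 1].~T();
        ::operator delete(mem_);
    }

    TinyVec(const TinyVec&) = delete;
    TinyVec& operator=(const TinyVec&) = delete;

    void push_back(const T& v) {
        assert(size_ < cap_);
        // Creates exactly one T object in the raw storage; no array object is created.
        ::new (static_cast<void*>(mem_ + size_)) T(v);
        ++size_;
    }

    // The contested part: callers index this pointer as if it pointed into a T[N].
    T* data() { return mem_; }
    std::size_t size() const { return size_; }

private:
    T* mem_;
    std::size_t cap_;
    std::size_t size_ = 0;
};
```

Used as in the message above (three push_back calls followed by data()[2] = 12), this behaves as expected on real implementations; the open question is only whether the formal object model says so.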
From: Richard Smith Sent: Friday, June 10, 2016 4:16 PM Reply To: std-dis...@isocpp.org Subject: Re: [std-discussion] More UB questions |
Does malloc start the lifetime of anything?
> So we only get an object from a definition, new-expression, or temporary, and [basic.life]/1 can't start the lifetime of an array object because there is no such object in the first place.

So I get why this is important to specify like this in the standard; it vastly simplifies talking about what behavior is defined and what isn't. However, why is it important that this be true for actual code?

What I'm getting at is that there is a set of code transformations that we think of as 'valid', and the standard should recognize the transitive closure of those transformations when considering what objects are 'live'.
For example, if C is trivially destructible and p is a pointer to a live object of type C, we should have

    p->~C();

is equivalent to

    /* nothing */ ;

Similarly if C is also zero-constructible we have

    p->~C();
    new(p) C;

is equivalent to

    p->~C();
    memset(p, 0, sizeof(C));

which by the above is equivalent to

    memset(p, 0, sizeof(C));

In particular, this would allow code like the above memset (which exists in every serious program I've seen) to be defined as starting *p's lifetime, as the transformation is valid in both directions!
And the program could have defined behavior both assuming that after this line p is a live, valid object, or assuming it is dead (destructed but not unallocated memory).
It's impossible for the compiler to know which the user intended here so it has to infer it from the code that follows.
Can you educate me as to what we gain from having the compiler know that this object is alive?
> And how exactly does that work? What statements would cause you to "infer" that an object's lifetime has begun? What statements would cause you to "infer" that an object's lifetime has ended?

The standard already makes those statements: (paraphrased) "calling a non-static member function on a dead object is undefined". In order to invoke UB on p->Foo(), the compiler must prove that p is dead. If p is trivially constructible, then p can actually *never* be proved dead.
Honestly, we might fundamentally disagree on the purpose of C++ as a language. I see it as a systems programming language which offers zero-cost abstractions and ways to make abstractions zero-cost whenever the hardware supports it.
The memory model is that objects are equivalent to arrays of bytes.
The standard should legitimize the memory model it describes by making it easy to treat them that way when it's appropriate to do so.
An example from my previous company, using pre-C++11 compiler:
AutoRefCount<T>: A smart pointer type that calls AddRef() / Release() as needed on the contained object ("intrusive" ref-counting).
vector<T>: An STL-like vector class.

Our codebase contained a vector of ref-counted pointers. We would see a performance degradation on frames where this vector was resized, caused by the equivalent of this loop:

    // exception-handling code omitted
    for( i=0; i<size; ++i )
    {
        // calls AddRef() on the target
        new( &newMemory[i] ) T( oldMemory[i] );
    }
    // if no exceptions, destruct the old stuff
    for( i=0; i<size; ++i )
    {
        // calls Release() on the target
        oldMemory[i].~T();
    }

The addref/release pairs were thrashing our data cache for no real benefit.

We knew that semantically a copy-and-destruct operation for AutoRefCount was equivalent to memcpy, so we implemented this optimization in the vector's grow path. It already had support for trivially copyable objects, but not for trivially copy+destructible ones.
(Note that this *isn't* the same as trivially movable as described in the standard, a concept which I think is mostly useless as specified as it doesn't handle this case and generally ends up being equivalent to trivial copy).
Now, you could argue that move semantics solves this problem, and you'd be somewhat correct, but it would be tricky for an optimizer to eliminate the second loop where it re-traverses the array and verifies that all the pointers are zero. But I bet there are other concepts which don't yet have special support in the compiler, and there always will be.

So, how are you proposing that someone implement this sort of optimization in a world where we must explicitly declare to the compiler what objects are alive?
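The optimization described above is often expressed today as an opt-in trait plus a relocation helper. The following is only a sketch of that idea, not the code from the codebase mentioned: is_trivially_relocatable and relocate_n are hypothetical names, the trait is not part of the standard library, and it defaults to std::is_trivially_copyable so that the memcpy path is well-defined unless someone specializes it. Specializing it for a type like AutoRefCount is exactly the step whose formal status this thread disputes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>
#include <type_traits>
#include <utility>

// Hypothetical opt-in trait (not a standard facility). By default only
// trivially copyable types qualify, for which memcpy is well-defined.
template <class T>
struct is_trivially_relocatable : std::is_trivially_copyable<T> {};

// Moves n objects from src into uninitialized storage at dst and ends the
// lifetime of the sources ("copy + destruct" as a single operation).
template <class T>
void relocate_n(T* src, std::size_t n, T* dst) {
    if (n == 0) return;
    assert(src != nullptr && dst != nullptr);
    if constexpr (is_trivially_relocatable<T>::value) {
        // One bulk copy; no per-element AddRef()/Release()-style traffic.
        std::memcpy(static_cast<void*>(dst), static_cast<const void*>(src),
                    n * sizeof(T));
    } else {
        // General path: move-construct each element, then destroy the source.
        for (std::size_t i = 0; i < n; ++i) {
            ::new (static_cast<void*>(dst + i)) T(std::move(src[i]));
            src[i].~T();
        }
    }
}
```

A vector's grow path would call relocate_n(oldMemory, size, newMemory) and then release the old block without running destructors on it.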
>> Can you educate me as to what we gain from having the compiler know that this object is alive?
> I don't understand what you're saying here.
I am asking what benefit actual programs/programmers get from not having a simple memory model, and a simple model of object lifetime that matches what implementations actually do.
As it is, the standard is extremely divergent from actual practice (c.f. "can't actually implement vector in C++") and therefore not useful.
I don't want it to be tied down by notions of ideological purity, as fundamentally programming languages exist to write programs, and C++ is foremost a pragmatic language. There are plenty of research languages that offer fascinating work if you want to see what you can do by focusing on purity of ideas over pragmatism. I love those languages. I write lots of Haskell in my free time. But when I need to be pragmatic, I need a "pragmatic" tool in my belt, and C++ is the best one for it right now.
> The standard says that accessing an object after its lifetime has ended is undefined behavior. Because undefined behavior does not require a diagnostic, compilers are not required to detect that an object's lifetime has ended. They will simply access it as if it were live; whatever will be, will be.

That may be what they do now, but it is of vital importance what the standard declares as UB as compilers continue to take advantage of UB detection to treat code as unreachable. As the theorem provers and whole-program optimization used by compilers get better, more and more UB will be found and (ab-)used, and suddenly my "working" code (because it relies on UB, as you say) causes demons to fly out of my nose.
> And thereby invoke undefined behavior. There is no such thing as "trivially copy+destructible".

Yes there is. It may not be a defined term in the standard, but it is a simple concept to understand, and it enables optimizations that are not possible without it. I consider it a defect that the standard does not allow me to define such concepts in terms of the lower-level definitions of memory layout, as it means the standard must either grow without bound to encompass all possible concepts for object usage, or else fail in its purpose of being a general-purpose language suitable for systems programming.
> `AutoRefCount` is not trivial in any way. It has no trivial constructors and it has no trivial destructors. Because it is not trivially copyable, you cause undefined behavior by copying it with memcpy ([basic.types]). Because it is not trivially destructible, you cause undefined behavior by deallocating the memory without calling the destructor ([basic.life]).

This is incorrect. [basic.life]:

> (4) For an object of a class type with a non-trivial destructor, the program is
> not required to call the destructor explicitly before the storage which the
> object occupies is reused or released; however, if there is no explicit call
> to the destructor or if a delete-expression (5.3.5) is not used to release
> the storage, the destructor shall not be implicitly called and any program
> that depends on the side effects produced by the destructor has undefined
> behavior.

Emphasis on "any program that depends on the side effects produced by the destructor"; the whole point of this code transformation is that the resulting program does not rely on the side-effects of the destructor.
[basic.types] does not state that copying a non-trivially-copyable type with memcpy is UB, it simply states that the behavior is explicitly defined for trivially-copyable types.
AutoRefCount is a standard-layout class type, and therefore is represented by a contiguous array of bytes which contain its member variables:
a single raw pointer, which *is* trivially copyable.
The only question is how to indicate that the lifetime of the new AutoRefCount has begun; an analogue to [Basic.life](4) for object construction that says that you can avoid calling the constructor if you don't rely on the side-effects of said constructor.
> [basic.life]:
> (4) For an object of a class type with a non-trivial destructor, the program is
> not required to call the destructor explicitly before the storage which the
> object occupies is reused or released; however, if there is no explicit call
> to the destructor or if a delete-expression (5.3.5) is not used to release
> the storage, the destructor shall not be implicitly called and any program
> that depends on the side effects produced by the destructor has undefined
> behavior.
As far as I'm concerned, [basic.life](4) is a mistake and should be removed. If you give an object a non-trivial destructor, and it doesn't get called, then your program should be considered broken.
I'm not sure I can continue having a discussion with you if you continue to be intellectually dishonest.

For example, you ask

> if you're just going to ignore the rules and do what your compiler lets you get
> away with anyway... why does it matter to you what the standard says?

which, guess what, I already answered -- in fact, you immediately quote my answer:

> That may be what they do now, but it is of vital importance what the standard
> declares as UB as compilers continue to take advantage of UB detection to
> treat code as unreachable. As the theorem provers and whole-program
> optimization used by compilers get better, more and more UB will be found
> and (ab-)used, and suddenly my "working" code (because it relies on UB, as
> you say) causes demons to fly out of my nose.

and you follow with this claim

> That's what happens when you rely on undefined behavior.

You can't have it both ways; I care about what is in the standard because I care about my programs continuing to work,
but I also care about being able to write programs that make the hardware do what I want. The argument I am putting forward is that this behavior shouldn't be undefined; that the standard and common coding practice should be in agreement. There are ways to have a language that semantically makes sense without hardcoding everything into explicit syntax about object construction, but you dismiss this idea out of hand.
> Why should we want that? As far as I'm concerned, [basic.life](4) is a mistake and should be removed. If you give an object a non-trivial destructor, and it doesn't get called, then your program should be considered broken.

More dishonesty here. You call me out when I propose that things are problems in the standard, but then you go ahead and do the same *in the same message*.
> That's not how a standard works.
>
> A standard specifies behavior. It specifies what will happen if you do a
> particular thing. If the standard does not explicitly say what the results
> of something are, then those results are undefined by default.

I agree with this statement. I was simply pointing out that [basic.types] is not sufficient to call this behavior UB; I don't know the entire standard word-by-word (and neither do you, as evidenced by your misquote of [basic.life]), and I'm not 100% convinced that there is nothing elsewhere in the standard that defines what should happen in this case, although given that it's not defined at that point, I am willing to believe that it's more likely than not UB. When something explicitly is declared UB it's easier to quote the relevant section!
So, let's go back to trying to find some common ground. (Apologies to any of our readers who were hoping for a good old-fashioned flame war!)

Here's an example of some Haskell code using a GHC optimization extension:

    {-# RULES
    "map/map" forall f g xs. map f (map g xs) = map (f.g) xs
      #-}

Here the programmer of "map" knows that semantically the code on both sides of the equals sign is the same, but that the right-hand one will generally be more efficient to evaluate. Rewrite RULES inform the compiler of this knowledge and ask it to make a code transformation for us, adding additional optimization opportunities that can show up after inlining or other optimizations.

Now, you could trivially write a false rule

    {-# RULES
    "timesIsPlus" forall x y. x*y = x+y
      #-}

The fact that the programmer could write a buggy program doesn't mean that the language makes no sense.
This feature is designed to be used by programmers who have proved that the code transformation being applied is valid. In C++-standards-ese I would say "a program with a rewrite rule where the right hand side is observably different from the left hand side has undefined behavior"; the compiler is free to apply, or not apply the rewrite rule, and it's up to the author of the rule to guarantee that the difference between the LHS and RHS of the rule is not observable.

I am not suggesting a wild-west world where everyone just memcpy's objects everywhere. I think you're right that this isn't a useful place to be.
I am suggesting a world where it is up to the programmer to define places where that makes sense and is legal; one where the memory model works in tandem with the object model to allow low-level optimizations to be done where needed. [basic.life](4) is an example of this sort of world, it specifies that I can re-use the storage for an object without calling the destructor if I can prove that my program doesn't rely on the side-effects of that destructor. It doesn't say I'm allowed to just not call destructors willy-nilly--it puts an obligation on me as a programmer to be more careful if I am writing crazy code.
It's this same sentiment that puts reinterpret_cast and const_cast in the language; tools of great power but that also carry great responsibility.
Similarly, objects on real hardware are made out of bytes, and sometimes it's useful to think of them as objects, sometimes as bytes, and sometimes as both simultaneously. What are the best ways to enable this? Are there ways that make sense or do you think it's fundamentally incompatible with the design of the language?
From: Ryan Ingram Sent: Monday, June 13, 2016 2:55 AM To: Nicol Bolas Reply To: std-dis...@isocpp.org Cc: ISO C++ Standard - Discussion |
Subject: Re: [std-discussion] More UB questions |
I used to use it commonly like this:

    static_assert(sizeof(float) == sizeof(int32_t));
    float x = ...;
    int32_t x_representation = reinterpret_cast<int32_t&>(x);
    // do IEEE floating-point magic here e.g. https://en.wikipedia.org/wiki/Fast_inverse_square_root
    // or use it for serialization.

The current standard has no way to do this sort of 'representation-cast'.
Implementations now generally outlaw using reinterpret_cast to violate strict aliasing, but allow representation casting to be done via unions. But even that is an extension, and according to a strict reading of the standard it is UB. I think that this is exactly the kind of thing that should have "implementation-defined" behavior rather than be UB. The difference is that usually implementation-defined behavior has bounds to how wild implementations can go, for example: "the resulting value is implementation-defined" vs. "is UB". It puts boundaries on how non-portable this code is.
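For comparison, the byte-copy approach that 3.9p2 covers can express the same representation read without reinterpret_cast or unions, because both the float and the integer are existing objects of trivially copyable type. This is a sketch, not a claim about what the poster's code did: float_bits and float_from_bits are hypothetical helper names, and the bit patterns mentioned assume an IEEE-754 single-precision float.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reads the object representation of a float.
inline std::uint32_t float_bits(float x) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);   // copy bytes between two existing objects
    return bits;
}

// Hypothetical helper: the inverse direction.
inline float float_from_bits(std::uint32_t bits) {
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "this sketch assumes a 32-bit float");
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}
```

On an IEEE-754 platform, float_bits(1.0f) yields 0x3F800000. Compilers typically reduce these small fixed-size copies to a register move, so the form costs nothing compared with the cast.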
From: Nicol Bolas Sent: Monday, June 13, 2016 1:07 AM To: ISO C++ Standard - Discussion Reply To: std-dis...@isocpp.org |
Subject: Re: [std-discussion] More UB questions |
Can you educate me as to what we gain from having the compiler know that this object is alive?

> By your reasoning, every use of `p` is illegal, because it is acting on an object whose lifetime has been ended by the non-statement right before it.

No, p's state is indeterminate. Each use of p that requires *p to be alive communicates information: "*p must still be alive at this point." Similarly, each use that requires *p to be dead communicates that it must be dead at that point. If p must ever both be simultaneously alive and dead, *then* the behavior would be undefined.

It's just like if you get passed an integer parameter x; inside an if(x >= 0) branch you can infer that x is non-negative and use unsigned operations if they happen to be faster on your hardware, but before that statement x's state is indeterminate.

We already rely on the compiler to do these sorts of inferences. If a function has a pointer argument 'p' and immediately calls p->Foo(), then the compiler can assume (1) p is non-null, and (2) p refers to a live object of its type. But before that line the compiler doesn't and cannot know the programmer's intention.

> But it can't start C's lifetime. Because, by your rules, it starts the lifetime of `C` and every other type that can fit into that memory.

Not exactly; in the absence of some sort of aliasing-laundering mechanism we know it only starts the lifetime of objects of type C (and whatever C's members are), since we have p : C*.

> And how exactly does that work? What statements would cause you to "infer" that an object's lifetime has begun? What statements would cause you to "infer" that an object's lifetime has ended?

The standard already makes those statements: (paraphrased) "calling a non-static member function on a dead object is undefined". In order to invoke UB on p->Foo(), the compiler must prove that p is dead. If p is trivially constructible, then p can actually *never* be proved dead.
If p is zero-constructible, then after p->~C(), p is dead until a new-expression or a memclear of p's memory, etc.
Honestly, we might fundamentally disagree on the purpose of C++ as a language.
I see it as a systems programming language which offers zero-cost abstractions and ways to make abstractions zero-cost whenever the hardware supports it. The memory model is that objects are equivalent to arrays of bytes. The standard should legitimize the memory model it describes by making it easy to treat them that way when it's appropriate to do so.
>> Can you educate me as to what we gain from having the compiler know that this object is alive?
> I don't understand what you're saying here.

I am asking what benefit actual programs/programmers get from not having a simple memory model, and a simple model of object lifetime that matches what implementations actually do. As it is, the standard is extremely divergent from actual practice (c.f. "can't actually implement vector in C++") and therefore not useful.
I want the standard to be useful and also match what real programs and real programmers do. I don't want it to be tied down by notions of ideological purity, as fundamentally programming languages exist to write programs, and C++ is foremost a pragmatic language. There are plenty of research languages that offer fascinating work if you want to see what you can do by focusing on purity of ideas over pragmatism. I love those languages. I write lots of Haskell in my free time. But when I need to be pragmatic, I need a "pragmatic" tool in my belt, and C++ is the best one for it right now.
> The standard says that accessing an object after its lifetime has ended is undefined behavior. Because undefined behavior does not require a diagnostic, compilers are not required to detect that an object's lifetime has ended. They will simply access it as if it were live; whatever will be, will be.

That may be what they do now, but it is of vital importance what the standard declares as UB as compilers continue to take advantage of UB detection to treat code as unreachable. As the theorem provers and whole-program optimization used by compilers get better, more and more UB will be found and (ab-)used, and suddenly my "working" code (because it relies on UB, as you say) causes demons to fly out of my nose.
On Saturday, June 11, 2016 at 4:35:28 PM UTC-4, Ryan Ingram wrote:
>>> Can you educate me as to what we gain from having the compiler know that this object is alive?
>> I don't understand what you're saying here.
> I am asking what benefit actual programs/programmers get from not having a simple memory model, and a simple model of object lifetime that matches what implementations actually do.
You get to have classes that make sense. You get to have encapsulation of data structures and functional invariants. You get reasonable assurance that invariants established by the constructor or other functions cannot be broken by external code, unless the external code does something which provokes undefined behavior (like, say, memcpy-ing a non-trivially copyable class).
So what we get with these rules is the ability to live and function in a reasonably sane world. That's what the C++ memory model exists to create.
The world you seem to want to live in is C-with-classes.

> As it is, the standard is extremely divergent from actual practice (c.f. "can't actually implement vector in C++") and therefore not useful.
The issue with `vector` has to do primarily with arrays, since they're kinda weird in C++. This is a defect because the standard is self-contradictory: it requires something which cannot be implemented without provoking UB.
A standard specifies behavior. It specifies what will happen if you do a particular thing. If the standard does not explicitly say what the results of something are, then those results are undefined by default.
On 06/14/2016 09:16 AM, FrankHB1989 wrote:
>
> On Sunday, June 12, 2016 at 7:29:46 AM UTC+8, Nicol Bolas wrote:
>
> A standard specifies behavior. It specifies what will happen if you do a particular thing. If the standard does not explicitly say what the results of something are, then those results are undefined /by default/.
>
> I doubt this is true in ISO C++.
>
> A language standard, in general, gives you rules about /compliance/ or /conformance/ of implementations. Requirements on the behavior of programs or program executions are not essential, but they make the rules easier to understand.
>
> If something is not specified, it is /underspecified/. This does not necessarily imply "undefined", since the meaning of "undefined" is itself specified by normative text elsewhere, with a clear definition. Sometimes rules may be missing, making the standard defective or even inconsistent. In such cases the behavior is also underspecified, but not undefined.
>
> Your "by default" statement can be achieved by a rule in normative text, e.g. in ISO C:
>
> 4/2 If a ‘‘shall’’ or ‘‘shall not’’ requirement that appears outside of a constraint or runtime constraint
> is violated, the behavior is undefined. Undefined behavior is otherwise
> indicated in this International Standard by the words ‘‘undefined behavior’’ or *by the
> omission of any explicit definition of behavior*. *There is no difference in emphasis among
> these three; they all describe ‘‘behavior that is undefined’’.*
>
> (Emphasis mine.)
>
> But I find no similar rules for the entire language in ISO C++.
See 1.3.25 with similar phrasing (although in a note) and 1.4 otherwise.
As an aside, I think it's a great advantage that C and C++ explicitly
specify "undefined behavior" in the standard, as opposed to relying on
the absence of specification. That avoids doubt whether the omission
of specification was intentional or not.
One problem could be that N objects really are not sequential. For
example if an array is limited to a single segment and you have two
other variables in different segments, how would you combine those?
Bo Persson
On Friday, 23 February 2018 09:03:42 PST Todd Fleming wrote:
> On Friday, February 23, 2018 at 11:20:34 AM UTC-5, Thiago Macieira wrote:
> > On Friday, 23 February 2018 08:13:40 PST Bo Persson wrote:
> > > I see no problem with std::less et al being similar (for implementations
> > > that do seemingly odd things anyway).
> >
> > Like Edward said, I don't see a problem for std::less to have magic. I do see
> > a problem for std::vector to do so, especially for handling the allocated
> > memory block as an array. There's just too much code out there that depends on
> > this functionality, so we need a solution in the core language.
>
> Like http://wg21.link/p0593r2 ?
Yes. Primitives like std::bless and std::launder, even though the story is
getting really complex here, are acceptable magic.
There's no non-magical way to implement those things; they are
communicating library intent to the compiler.
On Fri, Feb 23, 2018 at 1:01 PM, Ville Voutilainen <ville.vo...@gmail.com> wrote:
> There's no non-magical way to implement those things; they are
> communicating library intent to the compiler.
Type-based alias analysis has led C and C++ down the garden path.
Pointer arithmetic within the bounds of an allocated memory segment
is only a problem because the language standards are distorted and
wrong due to optimisationist influence.
On Wed, Jun 8, 2016 at 11:08 PM, Ryan Ingram <ryani...@gmail.com> wrote:
> (With either the new or the existing wording around "object", it's not
> clear that std::vector can actually be implemented in C++. This seems
> a sub-optimal state of affairs.)
Hmm, perhaps I don't understand the memory model then. My usual understanding of how an implementation of vector<> would work in "strict" C++ is something along these lines (ignoring exception safety for the time being, and only showing a couple of the required methods):
template <typename T>
class vector {
    char* mBegin; // allocated with new char[]
    char* mEnd; // pointer within mBegin array or one-off end
    char* mCapacity; // pointer one off end of mBegin array
public:
    // methods
};
T& vector<T>::operator[] (int index)
{
    return *reinterpret_cast<T*>(mBegin + index * sizeof(T));
}
void vector<T>::push_back(const T& elem)
{
    if(mEnd == mCapacity) Grow();
    new(mEnd) T(elem);
    mEnd += sizeof(T);
}
void vector<T>::Grow() // private
{
    int nElems = (mCapacity - mBegin) / sizeof(T);
    nElems *= 2;
    if( nElems < 1 ) nElems = 1;
    char* newBegin = new char[ nElems * sizeof(T) ];
    char* oldCur = mBegin;
    char* newCur = newBegin;
    for(; oldCur < mEnd; oldCur += sizeof(T), newCur += sizeof(T)) {
        new(newCur) T(*reinterpret_cast<T*>(oldCur));
        reinterpret_cast<T*>(oldCur)->~T();
    }
    int size = mEnd - mBegin;
    delete [] mBegin;
    mBegin = newBegin;
    mEnd = mBegin + size;
    mCapacity = mBegin + (nElems * sizeof(T));
}
Which part of this is undefined according to the standard?
You didn't specify how to implement the piece that's not implementable :-)
T *vector<T>::data() { return ??? }
On Saturday, March 3, 2018 at 6:34:54 AM UTC-8, Language Lawyer wrote:
> On Thursday, June 9, 2016 at 11:05:43 PM UTC+3, Richard Smith wrote:
> > You didn't specify how to implement the piece that's not implementable :-)
> > T *vector<T>::data() { return ??? }
> BTW, is operator[] implementable?
> I'll ask more specifically: what allows one to reinterpret_cast a pointer to storage into pointer to object stored there (reinterpret_cast<T*>(mBegin + index * sizeof(T)))?
> I've thought it is related to whether the pointers are pointer-interconvertible (http://eel.is/c++draft/basic.compound#def:pointer-interconvertible ), but the rules there don't seem to allow such a cast.
The reinterpret_cast is not the problem. It's legal because you're reinterpret_casting to a pointer type matching the dynamic type of the object--namely T. What makes std::vector unimplementable is the pointer arithmetic, most obviously with std::vector<T>::data().
alignas(int) std::byte storage[sizeof(int) * 3];
int *p = new(&storage[sizeof(int) * 0]) int{ 1 };
new(&storage[sizeof(int) * 1]) int{ 2 };
new(&storage[sizeof(int) * 2]) int{ 3 };
assert(*(p + 2) == 3); // undefined behavior
What's undefined behavior is the addition of 2 to p. A singular object is considered an array of size 1 for pointer arithmetic purposes. Pointer arithmetic is only defined if the value that you're adding is such that the pointer remains within the array. It is completely irrelevant to the current text of the Standard that the object immediately after is of the same dynamic type.
Because std::vector is required to support emplace_back/push_back in such a way that iterators and pointers are not invalidated unless the backing store needs to be grown, std::vector by necessity must construct individual objects into storage much like the example I gave. So the following is equivalent to my example:
std::vector<int> v;
v.reserve(3);
v.emplace_back(1);
int *p = v.data();
v.emplace_back(2);
v.emplace_back(3);
assert(*(p + 2) == 3); // undefined behavior???
It is impossible to implement std::vector in such a way that the above is well-defined while still meeting the other requirements of std::vector. Therefore, std::vector is necessarily a magic class, like, say, std::initializer_list.
This state of affairs is clearly broken, and there has been at least one proposal to fix the Standard in this regard, one by Richard Smith.
Melissa
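For readers coming to this thread later, the vector example above can be sketched with std::allocator in place of raw byte storage. This is a minimal illustration and not code from any poster; the function name third_element_via_allocator is hypothetical. As far as I know, C++20 adopted the approach of the proposal mentioned above and specifies that std::allocator<T>::allocate starts the lifetime of an array object (but not of its elements), which is what would make the final pointer arithmetic defined. Under the wording discussed in this thread, the same code has the problem Melissa describes.

```cpp
#include <cassert>
#include <memory>
#include <new>

// Fills freshly allocated storage one element at a time, the way a vector's
// push_back would, then reads the last element through pointer arithmetic.
// Assumption: the C++20 rule that allocate() begins the lifetime of an
// int[3] array object without starting the lifetime of its elements.
int third_element_via_allocator() {
    std::allocator<int> alloc;
    int* first = alloc.allocate(3);  // storage for 3 ints, no elements alive yet
    for (int i = 0; i < 3; ++i) {
        ::new (static_cast<void*>(first + i)) int(i + 1);  // construct 1, 2, 3
    }
    int result = *(first + 2);       // arithmetic within the int[3] array
    alloc.deallocate(first, 3);      // int is trivially destructible, no destroy step
    return result;
}
```

Calling third_element_via_allocator() mirrors the reserve(3) / emplace_back sequence in the message: the elements are constructed individually, yet the pointer to the first one is used to reach the third.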
On Mon, Mar 5, 2018 at 8:37 PM, Myriachan <myri...@gmail.com> wrote:
> On Saturday, March 3, 2018 at 6:34:54 AM UTC-8, Language Lawyer wrote:
> > On Thursday, June 9, 2016 at 11:05:43 PM UTC+3, Richard Smith wrote:
> > > You didn't specify how to implement the piece that's not implementable :-)
> > > T *vector<T>::data() { return ??? }
> > BTW, is operator[] implementable?
> > I'll ask more specifically: what allows one to reinterpret_cast a pointer to storage into pointer to object stored there (reinterpret_cast<T*>(mBegin + index * sizeof(T)))?
> > I've thought it is related to whether the pointers are pointer-interconvertible (http://eel.is/c++draft/basic.compound#def:pointer-interconvertible ), but the rules there don't seem to allow such a cast.
> The reinterpret_cast is not the problem. It's legal because you're reinterpret_casting to a pointer type matching the dynamic type of the object--namely T. What makes std::vector unimplementable is the pointer arithmetic, most obviously with std::vector<T>::data().
Unfortunately this isn't quite correct; the reinterpret_cast is insufficient. It is necessary to use launder [ptr.launder] to convert a T* pointer representing the address of a memory location to a pointer to a T object located at that address and pointer-interconvertible (as Language Lawyer mentions) with the T* pointer. launder is suitable for use by (e.g.) optional (an alternative is to use a union with a unit type), but as you correctly note below it is insufficient for the purposes of vector::data() returning a pointer that can be the operand of a non-trivial arithmetic expression.
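The distinction drawn here can be sketched as follows. This is a minimal illustration and not code from the thread; the names read_int_at and launder_demo are hypothetical, and C++17 is assumed for std::byte and std::launder. The arithmetic is done on the std::byte array, which really is an array, and std::launder is then applied to obtain a pointer to the int object at that address. It shows per-element access of the kind operator[] needs; it does not give data() a pointer on which p + 2 would be defined.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Reads the i-th int living in a byte buffer. The pointer arithmetic is done
// on the std::byte array, never on a pointer to a single int object.
int read_int_at(std::byte* storage, std::size_t i) {
    std::byte* where = storage + i * sizeof(int);         // stays inside the byte array
    return *std::launder(reinterpret_cast<int*>(where));  // pointer to the int at that address
}

// Builds three ints in raw storage, as in the earlier example in this thread,
// and reads the third one back without doing arithmetic on an int*.
int launder_demo() {
    alignas(int) std::byte storage[sizeof(int) * 3];
    new (&storage[sizeof(int) * 0]) int{1};
    new (&storage[sizeof(int) * 1]) int{2};
    new (&storage[sizeof(int) * 2]) int{3};
    return read_int_at(storage, 2);
}
```

Because every access recomputes the address from the byte array, this pattern works for an indexing function but cannot be hidden behind a plain T* handed out to the caller, which is the gap the data() discussion is about.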