vector<T*> vs vector<unique_ptr<T>> problems


Matthew Fioravante

Jan 14, 2015, 6:55:55 PM
to std-pr...@isocpp.org
 We have unique_ptr<T[]> and vector<unique_ptr<T>> but there is one big use case where both are inadequate.
 
 
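struct FooSlow {
  // unique_ptr<T> does not convert to T*, so the subset has to be copied out.
  std::vector<T*> getSubset(int i) const {
    std::vector<T*> out;
    for (auto it = _v.begin() + i; it != _v.end(); ++it) out.push_back(it->get());
    return out;
  }
  std::vector<std::unique_ptr<T>> _v;
};

struct FooFast {
  // carray_view<T*> is the proposed read-only view, i.e. array_view<T* const>.
  carray_view<T*> getSubset(int i) const { return { _v.begin() + i, _v.end() }; }
  std::vector<T*> _v;

  ~FooFast() { while(!_v.empty()) { delete _v.back(); _v.pop_back(); } }
};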


Here we have a linear array of pointers in our Foo object and want to return a view to a subsection of the array. With vector<unique_ptr<T>>, we are forced to create a copy because of the type system.
 
With vector<T*>, we can directly return a view referencing a subset, but now we have to take care of the copy constructor, copy assignment, move constructor, move assignment, and the destructor.
 
Has there been any talk about a managed vector of pointers?
Having a std::ptr_vector<T*> or something similar which has the same operations as vector but cleans up the underlying memory would fix this issue.
 
struct FooFast {
  carray_view<T*> getSubset(int i) const { return { _v.begin() + i, _v.end() }; }
  std::ptr_vector<T*> _v; // hypothetical container that deletes its pointers
};
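
A managed vector of pointers along these lines can be sketched in a few dozen lines. The following is only an illustration, not a proposed specification: the name owning_ptr_vector and its member set are assumptions made for this example, and it supports only plain delete. It stores T* internally, so data() can hand out a contiguous read-only array of raw pointers without copying.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical type for illustration; not a standard or Boost container.
template <typename T>
class owning_ptr_vector {
public:
    owning_ptr_vector() = default;

    // Unique ownership: copying is disabled, moving transfers the pointers.
    owning_ptr_vector(const owning_ptr_vector&) = delete;
    owning_ptr_vector& operator=(const owning_ptr_vector&) = delete;

    owning_ptr_vector(owning_ptr_vector&& other) noexcept
        : ptrs_(std::move(other.ptrs_)) {
        other.ptrs_.clear();  // make the moved-from state explicitly empty
    }
    owning_ptr_vector& operator=(owning_ptr_vector&& other) noexcept {
        if (this != &other) {
            destroy_all();
            ptrs_ = std::move(other.ptrs_);
            other.ptrs_.clear();
        }
        return *this;
    }

    ~owning_ptr_vector() { destroy_all(); }

    // Takes ownership; if push_back throws, p still owns the object.
    void push_back(std::unique_ptr<T> p) {
        ptrs_.push_back(p.get());
        p.release();
    }

    std::size_t size() const { return ptrs_.size(); }

    // Contiguous array of raw pointers; the pointers themselves are read-only.
    T* const* data() const { return ptrs_.data(); }

    T* operator[](std::size_t i) const {
        assert(i < ptrs_.size());
        return ptrs_[i];
    }

private:
    void destroy_all() {
        for (T* p : ptrs_) delete p;
        ptrs_.clear();
    }

    std::vector<T*> ptrs_;
};
```

Because the storage really is a std::vector<T*>, a view type such as the proposed carray_view<T*> could be built from data() and size() with no cast at all.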

 

Another alternative would be to have some guarantee in the standard that reinterpret_cast<T*>(std::unique_ptr<T>()) is valid. That is, a unique_ptr with the default deleter would have exactly the same memory layout as a raw pointer, so we could do some ugly casting.
 
// Only valid if unique_ptr<T> is guaranteed to be layout-compatible with T*.
template <typename T>
T* const* unique_ptr_cast(const std::unique_ptr<T>* ptr) { return reinterpret_cast<T* const*>(ptr); }

struct FooFast {
  carray_view<T*> getSubset(int i) const { return { unique_ptr_cast(_v.data() + i), unique_ptr_cast(_v.data() + _v.size()) }; }
  std::vector<std::unique_ptr<T>> _v;
};
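
Whether such a cast even has a chance of working depends on the library implementation. As a hedged sketch, the helper below reports whether unique_ptr<T> happens to have the size and alignment of T* on the current implementation, and shows the portable fallback of copying the raw pointers out. The names same_footprint_as_raw_pointer and raw_pointers are invented for this example. A matching footprint is necessary but not sufficient: the standard does not promise that the cast is valid.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// True when unique_ptr<T> (default deleter) has the same size and alignment
// as T* on this implementation. This is an observation, not a guarantee.
template <typename T>
constexpr bool same_footprint_as_raw_pointer() {
    return sizeof(std::unique_ptr<T>) == sizeof(T*) &&
           alignof(std::unique_ptr<T>) == alignof(T*);
}

// Portable fallback: copy the raw pointers of v[first..end) into a new vector.
template <typename T>
std::vector<T*> raw_pointers(const std::vector<std::unique_ptr<T>>& v,
                             std::size_t first) {
    assert(first <= v.size());
    std::vector<T*> out;
    out.reserve(v.size() - first);
    for (std::size_t i = first; i < v.size(); ++i) out.push_back(v[i].get());
    return out;
}
```

The copy is exactly the cost the original post wants to avoid; the check only makes the unportable assumption visible at compile time.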

Thoughts?

Nicola Gigante

Jan 15, 2015, 10:46:34 AM
to std-pr...@isocpp.org
Hi.
I had a similar design problem recently but I solved it with ranges.

If you return a range that “adapts” the vector<unique_ptr<T>> by calling .get() on the instances, you’re done.

I used Boost.Range but I think that with the Niebler's library it would be as simple as:

auto getSubset(int i) const { return _v | view::slice(i, _v.size()) | view::transform(&unique_ptr<T>::get); }

This also nicely decouples your clients from the fact that you’re using a vector.

So I think this problem will be solved once we have ranges.
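
For readers without a ranges library, the same idea can be approximated by hand. The sketch below is a stand-in for the pipeline above, not the Boost.Range or range-v3 API: raw_ptr_view, Widget and this Foo are hypothetical names. The view stores two vector iterators and calls .get() on dereference, so no pointers are copied.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <vector>

// Hypothetical read-only view over part of a vector<unique_ptr<T>>;
// dereferencing its iterator yields a plain T*.
template <typename T>
class raw_ptr_view {
    using base_iterator =
        typename std::vector<std::unique_ptr<T>>::const_iterator;

public:
    class iterator {
    public:
        using iterator_category = std::input_iterator_tag;
        using value_type = T*;
        using difference_type = std::ptrdiff_t;
        using pointer = void;
        using reference = T*;

        explicit iterator(base_iterator it) : it_(it) {}
        T* operator*() const { return it_->get(); }
        iterator& operator++() { ++it_; return *this; }
        iterator operator++(int) { iterator tmp(*this); ++it_; return tmp; }
        bool operator==(const iterator& o) const { return it_ == o.it_; }
        bool operator!=(const iterator& o) const { return it_ != o.it_; }

    private:
        base_iterator it_;
    };

    raw_ptr_view(base_iterator first, base_iterator last)
        : first_(first), last_(last) {}

    iterator begin() const { return iterator(first_); }
    iterator end() const { return iterator(last_); }
    std::size_t size() const { return static_cast<std::size_t>(last_ - first_); }

private:
    base_iterator first_, last_;
};

struct Widget { int id; };

struct Foo {
    // Returns a view of elements [i, end) without copying any pointers.
    raw_ptr_view<Widget> getSubset(std::size_t i) const {
        assert(i <= _v.size());
        return raw_ptr_view<Widget>(
            _v.begin() + static_cast<std::ptrdiff_t>(i), _v.end());
    }
    std::vector<std::unique_ptr<Widget>> _v;
};
```

The return type no longer mentions unique_ptr, but it is still a distinct type from a view over vector<T*>.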

That said, if in general reinterpret_cast from unique_ptr<T> to T* was not undefined behavior,
it would be great. (Or am I missing something here?)

Bye,
Nicola

 

Nicola Gigante

Jan 15, 2015, 10:50:03 AM
to std-pr...@isocpp.org
On 15 Jan 2015, at 16:46, Nicola Gigante <nicola....@gmail.com> wrote:

That said, if in general reinterpret_cast from unique_ptr<T> to T* was not undefined behavior,
it would be great. (Or am I missing something here?)


I meant from unique_ptr<T>* to T**, of course.

Bye,
Nicola

Ville Voutilainen

Jan 15, 2015, 10:52:41 AM
to std-pr...@isocpp.org
Sounds like such a conversion would create another opportunity to run
into the situation
described in
http://www.stroustrup.com/bs_faq2.html#conversion

Matthew Fioravante

Jan 15, 2015, 11:03:45 AM
to std-pr...@isocpp.org

On Thursday, January 15, 2015 at 10:46:34 AM UTC-5, Nicola Gigante wrote:
That said, if in general reinterpret_cast from unique_ptr<T> to T* was not undefined behavior,
it would be great. (Or am I missing something here?)
 
 
There are other cases where this kind of aliasing would be helpful:
 
extern "C" void graphics_api_render(float* xyzw_points, size_t count);

struct Vec4 {
  float x, y, z, w;
};

Vec4 v[256];
//Fill in the vectors

//Pass to graphics API
graphics_api_render(reinterpret_cast<float*>(v), 256);

I'm not sure if it would be worth looking into adding a "user defined aliasing" feature to be able to do this portably and ensure correctness via the compiler.
 
Something like this:
 

//Specialization for default unique_ptr
template <typename T>
class unique_ptr<T, std::default_delete<T>> {
  alias T*; //Says that we can alias_cast<T**>(this);
};

struct Vec4 {
  float x, y, z, w;
  alias float; //Says that we can alias_cast<float*>(this);
};

//Concepts: Fails to compile unless T can alias V* via alias keyword
template <typename T, typename V>
T alias_cast(V* obj) { return reinterpret_cast<T>(obj); }

The alias keyword could also be used to poke holes into the strict aliasing rule. That is if we have a Vec4* and a float*, the compiler may no longer assume that they do not alias each other.
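
Until something like alias_cast exists, one portable way to hand Vec4 data to a float* interface is to check the layout assumptions at compile time and copy the bytes. This is a sketch rather than what the post proposes: it costs a copy, and flatten is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

struct Vec4 { float x, y, z, w; };

// Layout assumptions that the reinterpret_cast version relies on silently.
static_assert(std::is_standard_layout<Vec4>::value, "Vec4 must be standard-layout");
static_assert(std::is_trivially_copyable<Vec4>::value, "Vec4 must be trivially copyable");
static_assert(sizeof(Vec4) == 4 * sizeof(float), "Vec4 must have no padding");

// Copies the object representation of n Vec4 values into a flat float buffer.
inline std::vector<float> flatten(const Vec4* points, std::size_t n) {
    assert(points != nullptr || n == 0);
    std::vector<float> out(4 * n);
    if (n != 0) std::memcpy(out.data(), points, n * sizeof(Vec4));
    return out;
}
```

The resulting buffer can be passed as out.data() to an API that expects float*, with the element count chosen according to that API's convention.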

Matthew Fioravante

Jan 15, 2015, 5:18:20 PM
to std-pr...@isocpp.org
 


On Thursday, January 15, 2015 at 10:46:34 AM UTC-5, Nicola Gigante wrote:

 
If you return a range that “adapts” the vector<unique_ptr<T>> by calling .get() on the instances, you’re done.

I used Boost.Range but I think that with the Niebler's library it would be as simple as:

auto getSubset(int i) const { return _v | view::slice(i, _v.size()) | view::transform(&unique_ptr<T>::get); }

This also nicely decouples your clients from the fact that you’re using a vector.

So I think this problem will be solved once we have ranges.
 
 
Adapting ranges doesn't actually solve it for me. I'll show more of my use case:
 
struct Foo {
  vector<T*> inputs; //Owned by someone else
  vector<std::vector<T*>> stages; //Owned by us

  array_view<T*> getInputs() { return inputs; }
  array_view<T*> getOutputs() {
    return stages.empty() ? getInputs() : stages.back();
  }

  ~Foo() { for(auto& s: stages) for(auto& t: s) delete t; }
};

 

In this situation I'm really forced to manage the memory myself because std::unique_ptr<T> and T* are different types. If we could portably alias unique_ptr<T>* to T**, then I could take advantage of the automatic memory management.

David Rodríguez Ibeas

Jan 16, 2015, 10:30:50 AM
to std-pr...@isocpp.org
Not ideal, but you could have an additional member:

vector<T*> current;

And have it initialized with 'inputs' and updated with the current view of the data whenever a stage completes... If you don't need to go back to the inputs after the first stage completes, you could reuse the 'inputs' member.

Regarding the aliasing of 'unique_ptr<T>*' with 'T**' (or even 'T *const *'), the problem is that this makes sense only for specializations of 'std::unique_ptr<T,D>' with stateless deleters, and would probably be confusing (why is it sometimes "convertible", sometimes not) or dangerous (reinterpreting the deleter as a 'T*').

David


Nicola Gigante

Jan 16, 2015, 12:46:57 PM
to std-pr...@isocpp.org
On 15 Jan 2015, at 23:18, Matthew Fioravante <fmatth...@gmail.com> wrote:


 
Adapting ranges don't actually solve it for me, I'll show more of my use case:
 
struct Foo {
  vector<T*> inputs; //Owned by someone else
  vector<std::vector<T*>> stages; //Owned by us
 
  array_view<T*> getInputs() { return inputs; }
  array_view<T*> getOutputs() {
    return stages.empty() ? getInputs() : stages.back();
  }
  ~Foo() { for(auto& s: stages) for(auto& t: s) delete t; }
};
 
In this situation I'm really forced to manage the memory myself because std::unique_ptr<T> and T* are different types. If we could portably alias unique_ptr<T>* to T**, then I could take advantage of the automatic memory management.


I’m sorry, I don’t get it.

Why can’t you return a view of inputs if stages is empty and return a view of stages.back() otherwise?

Bye,
Nicola

Matthew Fioravante

Jan 16, 2015, 12:58:01 PM
to std-pr...@isocpp.org

On Friday, January 16, 2015 at 10:30:50 AM UTC-5, David Rodríguez Ibeas wrote:
Not ideal, but you could have an additional member:

vector<T*> current;

And have it initialized with 'inputs' and updated with the current view of the data whenever a stage completes... If you don't need to go back to the inputs after the first stage completes, you could reuse the 'inputs' member.
 
Allocating another vector and wasting more space and paying for additional memory allocations just to work around a quirk of the language is a tradeoff I'm rarely willing to make. 

Regarding the aliasing of 'unique_ptr<T>*' with 'T**' (or even 'T *const *'), the problem is that this makes sense only for specializations of 'std::unique_ptr<T,D>' with stateless deleters, and would probably be confusing (why is it sometimes "convertible", sometimes not) or dangerous (reinterpreting the deleter as a 'T*').
 
Using something new like alias_cast<T>() could catch these issues (casting a unique_ptr* with a stateful deleter to T*) at compile time. I do not think it would be a good idea to just create one weird exception for reinterpret_cast<> and leave it to users to discover misuse at runtime.
 
 
On Friday, January 16, 2015 at 12:46:57 PM UTC-5, Nicola Gigante wrote:
 
I’m sorry, I don’t get it.

Why can’t you return a view of inputs if stages is empty and return a view of stages.back() otherwise?
 
 
I can if I manage the memory myself and use std::vector<T*> for both inputs and stages. If I want to use std::vector<std::unique_ptr<T>> for the stages I'm screwed, because then stages and inputs are different types and I cannot create a view which may reference either one.
 
Adding a single new / delete pair to any class adds a lot of necessary boilerplate code (the big 5), potential for memory bugs, and careful exception safety concerns in implementation. I really don't like writing new and delete anymore.
 
 

inkwizyt...@gmail.com

Jan 16, 2015, 2:06:50 PM
to std-pr...@isocpp.org
One solution is to use a unique pointer with a custom deleter for both vectors. Foreign pointers would get a dummy (no-op) deleter.
If you create a custom class for that deleter, the overhead would be 1 byte if you are lucky.

Another is to replace the vector with something that owns the pointers in it and can return an array of them.
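
The first suggestion could look roughly like this. It is a sketch under assumptions: maybe_delete, maybe_owner, adopt and borrow are names invented here, and the ownership flag is stored in the deleter, so each element is typically larger than a raw T* once padding is counted.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical deleter: frees the object only when `owns` is true.
template <typename T>
struct maybe_delete {
    bool owns;
    explicit maybe_delete(bool o = true) : owns(o) {}
    void operator()(T* p) const { if (owns) delete p; }
};

// One pointer type for both owned and foreign objects.
template <typename T>
using maybe_owner = std::unique_ptr<T, maybe_delete<T>>;

// Wraps a pointer that this object is responsible for deleting.
template <typename T>
maybe_owner<T> adopt(T* p) {
    return maybe_owner<T>(p, maybe_delete<T>(true));
}

// Wraps a pointer owned by someone else; destruction is a no-op.
template <typename T>
maybe_owner<T> borrow(T* p) {
    assert(p != nullptr);  // in this sketch a null foreign pointer is treated as a bug
    return maybe_owner<T>(p, maybe_delete<T>(false));
}
```

With this, inputs and stages can both be std::vector<maybe_owner<T>>, so a single view type can refer to either of them.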

Nevin Liber

Jan 16, 2015, 2:11:31 PM
to std-pr...@isocpp.org
On 14 January 2015 at 17:55, Matthew Fioravante <fmatth...@gmail.com> wrote:
 We have unique_ptr<T[]> and vector<unique_ptr<T>> but there is one big use case where both are inadequate.
 
 
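struct FooSlow {
  // unique_ptr<T> does not convert to T*, so the subset has to be copied out.
  std::vector<T*> getSubset(int i) const {
    std::vector<T*> out;
    for (auto it = _v.begin() + i; it != _v.end(); ++it) out.push_back(it->get());
    return out;
  }
  std::vector<std::unique_ptr<T>> _v;
};

struct FooFast {
  // carray_view<T*> is the proposed read-only view, i.e. array_view<T* const>.
  carray_view<T*> getSubset(int i) const { return { _v.begin() + i, _v.end() }; }
  std::vector<T*> _v;

  ~FooFast() { while(!_v.empty()) { delete _v.back(); _v.pop_back(); } }
};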


Here we have a linear array of pointers in our Foo object and want to return a view to a subsection of the array. With vector<unique_ptr<T>>, we are forced to create a copy because of the type system.
 
With vector<T*>, we can directly return a view referencing a subset but now we have to take care of copy(), copy=, move(), move=, and the destructor.

Given that you said this is a big use case already, how are you dealing with copying FooSlow objects?

If you really want to do something like this, why can't you just use vector<shared_ptr<T>>?  You at least get the pointer syntax with any range you return.
 
  Another alternative would be to have some guarantee in the standard that reinterpret_cast<T*>(std::unique_ptr<T>()) is valid.

Seems extremely unlikely C++ will move in this direction.

I don't see why this case is different than any other case where you have a vector<A> and want to return a range<B>.  I don't see that the unique_ptr<T> -> T* case is all that special (or even common).
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com>  (847) 691-1404

Matthew Fioravante

Jan 16, 2015, 2:36:11 PM
to std-pr...@isocpp.org

On Friday, January 16, 2015 at 2:11:31 PM UTC-5, Nevin ":-)" Liber wrote:
 
Given that you said this is a big use case already, how are you dealing with copying FooSlow objects?
 
I'm not. I've opted to manage the memory manually in my code by just using 2 vector<T*>.
 
 
If you really want to do something like this, why can't you just use vector<shared_ptr<T>>?  You at least get the pointer syntax with any range you return.
 
Using shared_ptr<> would indeed solve my current problem, but that change has much further-reaching consequences. Adding shared_ptr<T> all over the place is not usually a good solution unless your system is designed around shared ownership semantics. Shared ownership has performance overhead and it also makes it very hard to reason about when resources are actually cleaned up.
 
 
  Another alternative would be to have some guarantee in the standard that reinterpret_cast<T*>(std::unique_ptr<T>()) is valid.

Seems extremely unlikely C++ will move in this direction.
 
That's not surprising; such a feature seems low level, dangerous, and only useful in some narrow cases. Still, it would be nice to have. Even though it's ugly, it would allow me in this case to design a nice interface with the ugly cast buried in the private implementation details.
 
I'd be interested to know if such a thing could be used within a greater scope for optimizing a ranges library.
 
I don't see why this case is different than any other case where you have a vector<A> and want to return a range<B>.  I don't see that the unique_ptr<T> -> T* case is all that special (or even common).
 
The general problem is, as you said, vector<A> -> range<B>. Actually my case is even worse because I'm trying to construct a range<B> sometimes from vector<A> and sometimes from vector<B>, so the hypothetical range type needs to keep track of how it was constructed and do the adaptation as needed.
 
This is generally not possible unless your range object contains a lot of additional logic and references for transparently adapting A to B. This additional logic for doing the adaptation has some performance cost as well as adding complexity to the code. In terms of interface design, returning an array_view<T> is as self-explanatory as it gets. Returning some weird templated range object not as much.
 
My situation is the case where A is bitwise compatible with B and has the same semantics. In this situation the range can just reinterpret_cast A to B and be iterated over by the caller with 0 overhead.
 
For my particular example, unique_ptr (with the default deleter) is just a raw pointer with some extra type system magic to tell the C++ compiler to generate code to manage the memory for you. This seems the most common use case. The other one I've seen a lot, as mentioned before, is linear algebra vectors and matrices which need to be cast to float* when it's time to send them to the graphics API.
 
 

Nevin Liber

Jan 16, 2015, 2:54:46 PM
to std-pr...@isocpp.org
On 16 January 2015 at 13:36, Matthew Fioravante <fmatth...@gmail.com> wrote:
 
  Another alternative would be to have some guarantee in the standard that reinterpret_cast<T*>(std::unique_ptr<T>()) is valid.

Seems extremely unlikely C++ will move in this direction.
That's not surprising; such a feature seems low level, dangerous, and only useful in some narrow cases. Still, it would be nice to have.

Would it?  This provides a hole in encapsulation as big as a truck.

Do you really want to allow people to legally write

*reinterpret_cast<T**>(p) = nullptr;

instead of p.release()?  Or worse,

*reinterpret_cast<T**>(p)  = q;

thus bypassing all of unique_ptr's invariants.
 
Even though it's ugly, it would allow me in this case to design a nice interface with the ugly cast buried in the private implementation details.

So you want your classes to have private implementation details but you don't want the standard library classes to have private implementation details.  Really?

The general problem is, as you said, vector<A> -> range<B>. Actually my case is even worse because I'm trying to construct a range<B> sometimes from vector<A> and sometimes from vector<B>, so the hypothetical range type needs to keep track of how it was constructed and do the adaptation as needed.
 
This is generally not possible unless your range object contains a lot of additional logic and references for transparently adapting A to B. This additional logic for doing the adaptation has some performance cost as well as adding complexity to the code. In terms of interface design, returning an array_view<T> is as self-explanatory as it gets. Returning some weird templated range object not as much.
 
My situation is the case where A is bitwise compatible with B and has the same semantics. In this situation the range can just reinterpret_cast A to B and be iterated over by the caller with 0 overhead.
 
For my particular example unique_ptr (with default deleter) is just a raw pointer with some extra type system magic to tell the C++ compiler to generate code to manage the memory for you.

A.  Nothing in the standard requires that implementation.

B.  Classes maintain invariants.  You shouldn't be able to legally get around that just by performing a cast, or you'll never be able to reason about the resulting mess.

Greg Marr

Jan 16, 2015, 2:59:08 PM
to std-pr...@isocpp.org
On Friday, January 16, 2015 at 2:36:11 PM UTC-5, Matthew Fioravante wrote:
My situation is the case where A is bitwise compatible with B and has the same semantics. In this situation the range can just reinterpret_cast A to B and be iterated over by the caller with 0 overhead.

Ah, but there's the problem.  Even though A may be bitwise compatible with B, it does not have the same semantics.  There is a major difference when it comes to assignment.

With std::vector<T *>, assigning to an element just changes the pointer.  With std::vector<std::unique_ptr<T>>, assigning to an element deletes an object.  Therefore, the two types are not compatible.
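
The difference is easy to observe with an instrumented type. In this sketch (Tracked and the two helper functions are names made up for the example), the same-looking assignment leaves the object alive in the raw-pointer vector and destroys it in the unique_ptr vector.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Counts how many instances are currently alive.
struct Tracked {
    static int alive;
    Tracked() { ++alive; }
    ~Tracked() { --alive; }
};
int Tracked::alive = 0;

// Overwrites element 0 of a vector<Tracked*> and reports how many objects
// are alive right after the assignment.
inline int overwrite_raw() {
    std::vector<Tracked*> v;
    v.push_back(new Tracked);
    Tracked* old = v[0];
    v[0] = nullptr;               // only the pointer value changes
    int alive_after = Tracked::alive;
    delete old;                   // manual cleanup so the sketch does not leak
    return alive_after;
}

// Same assignment on a vector<unique_ptr<Tracked>>.
inline int overwrite_unique() {
    std::vector<std::unique_ptr<Tracked>> v;
    v.push_back(std::unique_ptr<Tracked>(new Tracked));
    v[0] = nullptr;               // destroys the previously owned object
    assert(v[0] == nullptr);
    return Tracked::alive;
}
```

A mutable T** view over unique_ptr storage would give the first behaviour to memory that expects the second, which is the incompatibility described above.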

Matthew Fioravante

Jan 16, 2015, 3:21:22 PM
to std-pr...@isocpp.org

On Friday, January 16, 2015 at 2:54:46 PM UTC-5, Nevin ":-)" Liber wrote:
Would it?  This provides a hole in encapsulation as big as a truck.

Do you really want to allow people to legally write

*reinterpret_cast<T**>(p) = nullptr;

instead of p.release()?  Or worse,

*reinterpret_cast<T**>(p)  = q;

thus bypassing all of unique_ptr's invariants.
This is a very good point, I forgot to add in const. I didn't show it in my example here, but I'm actually returning a carray_view<T*> (i.e. array_view<T* const>).

In this case we can only view the pointer but not modify it. We can of course do non-const operations on the pointed-to objects, but that's the same for a const unique_ptr. We can't do anything here that would violate unique_ptr's invariants.
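
A small sketch of that claim, with the hypothetical names Counter and bump: through a const reference to either kind of pointer, the pointee can still be modified, but the pointer itself cannot be reseated.

```cpp
#include <cassert>
#include <memory>

struct Counter { int value = 0; };

// Const view of a raw pointer: the pointee is mutable, the pointer is not.
inline void bump(Counter* const& p) {
    assert(p != nullptr);
    p->value += 1;
    // p = nullptr;   // would not compile: p is const
}

// Const view of a unique_ptr: same shallow constness.
inline void bump(const std::unique_ptr<Counter>& p) {
    assert(p != nullptr);
    p->value += 1;
    // p.reset();     // would not compile: reset() is a non-const member
}
```

Both overloads compile to the same kind of access; the difference between the two parameter types is only visible to code that tries to change ownership.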
 
For the case of the Vec4, aliasing to non-const float* may be fine as well as long as the Vec4 type doesn't maintain any invariants over its elements (basically a POD style struct).
 
 
Even though it's ugly, it would allow me in this case to design a nice interface with the ugly cast buried in the private implementation details.

So you want your classes to have private implementation details but you don't want the standard library classes to have private implementation details.  Really?
 
Adding a specification that std::unique_ptr<T> can alias to T* const makes that detail a part of its interface and as a consequence places restrictions on the private implementation.
 

The general problem is, as you said, vector<A> -> range<B>. Actually my case is even worse because I'm trying to construct a range<B> sometimes from vector<A> and sometimes from vector<B>, so the hypothetical range type needs to keep track of how it was constructed and do the adaptation as needed.
 
This is generally not possible unless your range object contains a lot of additional logic and references for transparently adapting A to B. This additional logic for doing the adaptation has some performance cost as well as adding complexity to the code. In terms of interface design, returning an array_view<T> is as self-explanatory as it gets. Returning some weird templated range object not as much.
 
My situation is the case where A is bitwise compatible with B and has the same semantics. In this situation the range can just reinterpret_cast A to B and be iterated over by the caller with 0 overhead.
 
For my particular example unique_ptr (with default deleter) is just a raw pointer with some extra type system magic to tell the C++ compiler to generate code to manage the memory for you.

A.  Nothing in the standard requires that implementation.
 
It's the most reasonable implementation of unique_ptr. The whole idea of this thread is to discuss whether some feature to specify that restriction, and subsequently adding it to unique_ptr, would make sense.
 

B.  Classes maintain invariants.  You shouldn't be able to legally get around that just by performing a cast, or you'll never be able to reason about the resulting mess.
 
I agree, and if used correctly via const the invariants can be maintained.
 

Douglas Boffey

Jan 17, 2015, 3:01:24 AM
to std-pr...@isocpp.org
Of course, you could always use a union (I can't believe I just said that :) )

Matthew Fioravante

Feb 23, 2015, 6:52:17 PM
to std-pr...@isocpp.org
Sorry to beat a probably dead horse, but I really keep running into this problem over and over again. Even if you aren't using array_view, it's still a problem.

class Foo {
public:
  const auto& getA() { return a; }
  const auto& getB() { return b; }
private:
  vector<std::unique_ptr<T>> a;
  vector<T*> b;
};

In order to avoid a copy, we're forced to leak the implementation details that A is managed by unique_ptr but B is not. Even if we decide making a copy is ok, this still means that getA() has to return by value while getB() can continue to return by const reference. The ideal scenario which is just returning array_view<T*> is impossible because there is nowhere to store the copy of a.

Both a and b are just vectors of pointers. The fact that a has additional logic tacked on to automatically manage memory is irrelevant to the client of Foo.

This issue is making unique_ptr very cumbersome to use. Ideally I'd like to just be able to specify memory management once within the containing class data members and then pass around array_view<T*> everywhere externally, clients not knowing, caring, or recompiling whether the view points to a set of managed or unmanaged pointers.

Nicol Bolas

Feb 23, 2015, 9:24:43 PM
to std-pr...@isocpp.org
On Monday, February 23, 2015 at 6:52:17 PM UTC-5, Matthew Fioravante wrote:
Sorry to beat a probably dead horse, but I really keep running into this problem over and over again. Even if you aren't using array_view, it's still a problem.

class Foo {
public:
  const auto& getA() { return a; }
  const auto& getB() { return b; }
private:
  vector<std::unique_ptr<T>> a;
  vector<T*> b;
};

In order to avoid a copy, we're forced to leak the implementation details that A is managed by unique_ptr but B is not. Even if we decide making a copy is ok, this still means that getA() has to return by value while getB() can continue to return by const reference. The ideal scenario which is just returning array_view<T*> is impossible because there is nowhere to store the copy of a.

Both a and b are just vectors of pointers. The fact that a has additional logic tacked on to automatically manage memory is irrelevant to the client of Foo.

But it's very relevant to Foo. And since he's the owner and creator of the array, he's the one who decides these things.

For all intents and purposes, your problem is the lack of modules in C++. Here's what I mean.

What you really want to do is give people a non-modifiable view of your stored data. Your users won't care what specific types you may wrap your stored data in, so long as they obey certain basic principles (ie: the expectations of a pointer to T).

The "correct" way to handle this is with templates. That is, code that wants to consume your array_view would be a template, on the array view's type (or the template type could be more general, a non-modifiable range of pointer-to-T-like objects. Whatever). The problem is that now, all your consuming code needs to live in header files. As must any code that just passes your stuff along to others. Templates are rather viral like that.

If you had modules, you wouldn't have to care. You'd just return an array_view<unique_ptr<T>>, and those who wanted to consume it would be a template that takes an appropriate range. Nobody would be subjected to the details of your memory management (unless they deliberately choose to be). And thanks to modules, they wouldn't be subjected to the horrible compile-times that come along with doing all this (which is the #1 reason why people avoid doing it today. #2 is the lack of concepts).

Basically, this would be a solved problem if compile-time wasn't an issue.
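
As a rough illustration of the template approach (Item and total_weight are hypothetical names), a consumer written against "a range of pointer-like things" accepts both storage choices unchanged, at the price of being instantiated once per element type.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Item { int weight; };

// Works for any range whose elements support operator-> and comparison
// with nullptr, e.g. vector<Item*> or vector<unique_ptr<Item>>.
template <typename PointerRange>
int total_weight(const PointerRange& items) {
    int total = 0;
    for (const auto& p : items) {
        assert(p != nullptr);
        total += p->weight;
    }
    return total;
}
```

The function body must be visible wherever it is called, which is the header-file and compile-time cost discussed above.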

This issue is making unique_ptr very cumbersome to use. Ideally I'd like to just be able to specify memory management once within the containing class data members and then pass around array_view<T*> everywhere externally, clients not knowing, caring, or recompiling whether the view points to a set of managed or unmanaged pointers.

Well, that's just not going to happen. std::unique_ptr does not (necessarily) have zero overhead. It all depends on the deleter. Plus, that would require aliasing.

The only solution here is a container that specifically handles memory allocation, like Boost's ptr_vector. But I don't think this is a frequent enough problem to have them standardize such a special use container.

Matthew Fioravante

Feb 23, 2015, 10:26:48 PM
to std-pr...@isocpp.org
If unique_ptr<T> is untouchable, the other solution could be to make a new smart pointer class trivial_ptr<T>, which is like unique_ptr<T> but supports only a stateless custom deleter and is guaranteed to alias with T* const. It can provide functions to safely perform the cast. One can easily move a unique_ptr<T> into and out of trivial_ptr<T>.

class Foo {
public:
    array_view<T*> getA() const {
        if (_a.empty()) return {};
        return { &_a.front().alias(), _a.size() };
    }
    array_view<T*> getB() const { return _b; }

private:
    vector<trivial_ptr<T>> _a;
    vector<T*> _b;
};
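
A minimal sketch of what such a trivial_ptr<T> might look like follows. Everything here is an assumption for illustration: the member names alias() and to_unique(), the hard-coded use of delete, and the static_asserts, which hold on common ABIs but only document the intent.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical owning pointer whose only data member is a T*.
template <typename T>
class trivial_ptr {
public:
    trivial_ptr() noexcept : ptr_(nullptr) {}
    explicit trivial_ptr(T* p) noexcept : ptr_(p) {}

    // Takes over ownership from a unique_ptr with the default deleter.
    trivial_ptr(std::unique_ptr<T> p) noexcept : ptr_(p.release()) {}

    trivial_ptr(trivial_ptr&& o) noexcept : ptr_(o.ptr_) { o.ptr_ = nullptr; }
    trivial_ptr& operator=(trivial_ptr&& o) noexcept {
        if (this != &o) { delete ptr_; ptr_ = o.ptr_; o.ptr_ = nullptr; }
        return *this;
    }
    trivial_ptr(const trivial_ptr&) = delete;
    trivial_ptr& operator=(const trivial_ptr&) = delete;

    ~trivial_ptr() { delete ptr_; }

    T* get() const noexcept { return ptr_; }
    T& operator*() const { assert(ptr_ != nullptr); return *ptr_; }
    T* operator->() const { assert(ptr_ != nullptr); return ptr_; }

    // Read-only access to the stored pointer itself. No cast is needed
    // because the member really is a T*.
    T* const& alias() const noexcept { return ptr_; }

    // Hands ownership back to a unique_ptr, leaving this object empty.
    std::unique_ptr<T> to_unique() noexcept {
        T* p = ptr_;
        ptr_ = nullptr;
        return std::unique_ptr<T>(p);
    }

private:
    T* ptr_;
};

static_assert(sizeof(trivial_ptr<int>) == sizeof(int*), "one pointer, nothing else");
static_assert(std::is_standard_layout<trivial_ptr<int>>::value, "standard-layout");
```

For a single object, alias() gives a well-defined T* const&. Treating a whole array of trivial_ptr<T> as an array of T* still needs the aliasing guarantee this thread is asking for.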



On Monday, February 23, 2015 at 9:24:43 PM UTC-5, Nicol Bolas wrote:
On Monday, February 23, 2015 at 6:52:17 PM UTC-5, Matthew Fioravante wrote:
Sorry to beat a probably dead horse, but I really keep running into this problem over and over again. Even if you aren't using array_view, it's still a problem.

class Foo {
public:
  const auto& getA() { return a; }
  const auto& getB() { return b; }
private:
  vector<std::unique_ptr<T>> a;
  vector<T*> b;
};

In order to avoid a copy, we're forced to leak the implementation details that A is managed by unique_ptr but B is not. Even if we decide making a copy is ok, this still means that getA() has to return by value while getB() can continue to return by const reference. The ideal scenario which is just returning array_view<T*> is impossible because there is nowhere to store the copy of a.

Both a and b are just vectors of pointers. The fact that a has additional logic tacked on to automatically manage memory is irrelevant to the client of Foo.

But it's very relevant to Foo. And since he's the owner and creator of the array, he's the one who decides these things.

Yes that's right, but Foo doesn't have to advertise the fact unless asked for it. Which data is or is not managed may be a part of the Foo specification, but this information is not necessary for clients requesting a const view over the data. The fact that it's managed by unique_ptr and not some other unique ownership mechanism is a pure implementation detail which should not be leaked to clients.

By returning different types, we make it more difficult to write generic code without templates because we have this artificial dependency leaking all over the place. Even if foo does manage the resources referred to by getA(), the resource management may be performed using another data structure other than the array getA() references. If getA() could return array_view<T*>, we would have better encapsulation and freedom for the implementation of Foo.

For all intents and purposes, your problem is the lack of modules in C++. Here's what I mean.

What you really want to do is give people a non-modifiable view of your stored data. Your users won't care what specific types you may wrap your stored data in, so long as they obey certain basic principles (ie: the expectations of a pointer to T).

The "correct" way to handle this is with templates. That is, code that wants to consume your array_view would be a template, on the array view's type (or the template type could be more general, a non-modifiable range of pointer-to-T-like objects. Whatever). The problem is that now, all your consuming code needs to live in header files. As must any code that just passes your stuff along to others. Templates are rather viral like that.

If you had modules, you wouldn't have to care. You'd just return an array_view<unique_ptr<T>>, and those who wanted to consume it would be a template that takes an appropriate range. Nobody would be subjected to the details of your memory management (unless they deliberately choose to be). And thanks to modules, they wouldn't be subjected to the horrible compile-times that come along with doing all this (which is the #1 reason why people avoid doing it today. #2 is the lack of concepts).

Basically, this would be a solved problem if compile-time wasn't an issue.

I haven't studied the latest work in modules yet. If I understand correctly, you're saying with modules basically everything can be a template and whether or not the range iterates over unique_ptr<T> or T* is irrelevant because it will just flow through the type system via type deduction like magic.

This works fine until you need to pass the thing to a third party library which views a pointer by taking T* by reference. It also burdens all of your code with more template syntax when it could just be simple function calls. The more meta-programming you add to your code, the more complex it becomes and the more time you have to spend making things generic. Finally, templates generate code. If I have 2 implementations of a function, 1 for T* and 1 for std::unique_ptr<T> then the template system will generate 2 copies of the same code.
 

This issue is making unique_ptr very cumbersome to use. Ideally I'd like to just be able to specify memory management once within the containing class data members and then pass around array_view<T*> everywhere externally, clients not knowing, caring, or recompiling whether the view points to a set of managed or unmanaged pointers.

Well, that's just not going to happen. std::unique_ptr does not (necessarily) have zero overhead. It all depends on the deleter.

If there is a custom or maybe even stateless deleter, I have yet to find a good reason (other than attaching debugging info) to not restrict implementations of unique_ptr to store only a T*. Adding a new type like trivial_ptr<T> seems less optimal than just reusing unique_ptr<T> with a stateless deleter. The alias() call can be conceptified to reject unique_ptrs with stateful deleters.
 
Plus, that would require aliasing.

Ugly scary aliasing is the most efficient way to solve this problem. On modern machines, arrays are almost always the most efficient data structure to store data. array_view<T> brings an extremely flexible range library like solution for this most ubiquitous and efficient data layout, often without requiring templates at all.

If you have an array of wrapper<T> stored in memory whose lifetime is managed by a stateless wrapper type, you need aliasing to efficiently retrieve a const view over the array of T.
 

The only solution here is a container that specifically handles memory allocation, like Boost's ptr_vector. But I don't think this is a frequent enough problem to have them standardize such a special use container.

I think the ptr_vector approach is not generic enough. Basically you have to create a copy of the std::vector<T> specification with some customized behavior only for pointers. Lifetime management is fully specified by T so a better solution is just to create a new stateless wrapper type to manage T* and be aliasable to T*.
 

Nicol Bolas

Feb 24, 2015, 9:39:41 AM2/24/15
to std-pr...@isocpp.org
On Monday, February 23, 2015 at 10:26:48 PM UTC-5, Matthew Fioravante wrote:
But it's very relevant to Foo. And since he's the owner and creator of the array, he's the one who decides these things.

Yes, that's right, but Foo doesn't have to advertise the fact unless asked for it. Which data is or is not managed may be a part of the Foo specification, but this information is not necessary for clients requesting a const view over the data. The fact that it's managed by unique_ptr and not some other unique ownership mechanism is a pure implementation detail which should not be leaked to clients.

By returning different types, we make it more difficult to write generic code without templates because we have this artificial dependency leaking all over the place. Even if foo does manage the resources referred to by getA(), the resource management may be performed using another data structure other than the array getA() references. If getA() could return array_view<T*>, we would have better encapsulation and freedom for the implementation of Foo.
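One way to read "the resource management may be performed using another data structure" is sketched below: ownership lives in one vector and a parallel vector of raw pointers backs the view. This is only an illustration. array_view here is a hypothetical stand-in for the proposed type, Widget and the member names are invented, and exception safety is ignored to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the array_view discussed in the thread:
// a non-owning (pointer, length) pair over contiguous const elements.
template <typename T>
struct array_view {
    const T* first;
    std::size_t count;
    const T* begin() const { return first; }
    const T* end() const { return first + count; }
    std::size_t size() const { return count; }
    const T& operator[](std::size_t i) const { assert(i < count); return first[i]; }
};

struct Widget { int id; };

class Foo {
public:
    void add(int id) {
        std::unique_ptr<Widget> w(new Widget{id});
        raw_.push_back(w.get());          // array that views are taken from
        owners_.push_back(std::move(w));  // lifetime management
    }

    // View of elements [i, size()); no unique_ptr appears in the signature.
    array_view<Widget*> getSubset(std::size_t i) const {
        assert(i <= raw_.size());
        return { raw_.data() + i, raw_.size() - i };
    }

private:
    std::vector<std::unique_ptr<Widget>> owners_;
    std::vector<Widget*> raw_;
};
```

No aliasing is involved, but every pointer is stored twice and the two vectors must be kept in sync by hand.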

Foo wouldn't be free to use an internal implementation that wasn't magically aliasable to a C-array of T. For example, it couldn't suddenly decide that it needs shared ownership.

If your consuming code was fully generic (ie: templates), then Foo would truly be free to implement whatever it wanted.
 
For all intents and purposes, your problem is the lack of modules in C++. Here's what I mean.

What you really want to do is give people a non-modifiable view of your stored data. Your users won't care what specific types you may wrap your stored data in, so long as they obey certain basic principles (ie: the expectations of a pointer to T).

The "correct" way to handle this is with templates. That is, code that wants to consume your array_view would be a template, on the array view's type (or the template type could be more general, a non-modifiable range of pointer-to-T-like objects. Whatever). The problem is that now, all your consuming code needs to live in header files. As must any code that just passes your stuff along to others. Templates are rather viral like that.

If you had modules, you wouldn't have to care. You'd just return an array_view<unique_ptr<T>>, and those who wanted to consume it would be a template that takes an appropriate range. Nobody would be subjected to the details of your memory management (unless they deliberately choose to be). And thanks to modules, they wouldn't be subjected to the horrible compile-times that come along with doing all this (which is the #1 reason why people avoid doing it today. #2 is the lack of concepts).

Basically, this would be a solved problem if compile-time wasn't an issue.
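A rough sketch of the template approach described above, with invented names (Item, sum_values). The consumer only assumes that each element of the range can be compared with nullptr and supports operator->, so it accepts raw pointers and unique_ptr alike.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Item { int value; };

// Works for any range whose elements behave like a non-null pointer to Item.
template <typename PointerRange>
int sum_values(const PointerRange& range) {
    int total = 0;
    for (const auto& p : range) {
        assert(p != nullptr);  // precondition: no empty slots in the range
        total += p->value;
    }
    return total;
}
```

Calling sum_values on both a std::vector<std::unique_ptr<Item>> and a std::vector<Item*> compiles and gives the same result, but it instantiates the function twice, which is the code-duplication cost raised earlier in the thread.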

I haven't studied the latest work in modules yet. If I understand correctly, you're saying with modules basically everything can be a template and whether or not the range iterates over unique_ptr<T> or T* is irrelevant because it will just flow through the type system via type deduction like magic.

This works fine until you need to pass the thing to a third party library which views a pointer by taking T* by reference.

Yes, that happens. But how is that any different from any interface with a third party library? There are always going to be libraries that use something odd in their interfaces, that requires translation. What if you're trying to stick that array into a Lua table or a Python array? You need some form of translation.

You would have this exact problem if 'Foo' stopped using a contiguous, alias-able array. Isn't that the whole point? To not have third-party libraries place arbitrary restrictions on your data structures (like "you must use a contiguous array of pointers to some type")?
 
It also burdens all of your code with more template syntax when it could just be simple function calls. The more meta-programming you add to your code, the more complex it becomes and the more time you have to spend making things generic.

That's the point of concepts; it makes "template syntax" not be a "burden". The point of modules is to make using "more template syntax" practical.

So my original point stands. It's not that you don't have a solution in C++. It's that the C++ solution has these burdens on it that shouldn't be there. Once those burdens are eased and/or eliminated, then your problem is effectively solved.
 

This issue is making unique_ptr very cumbersome to use. Ideally I'd like to just be able to specify memory management once within the containing class data members and then pass around array_view<T*> everywhere externally, clients not knowing, caring, or recompiling whether the view points to a set of managed or unmanaged pointers.

Well, that's just not going to happen. std::unique_ptr does not (necessarily) have zero overhead. It all depends on the deleter.

If there is a custom or maybe even stateless deleter, I have yet to find a good reason (other than attaching debugging info) to not restrict implementations of unique_ptr to store only a T*. Adding a new type like trivial_ptr<T> seems less optimal than just reusing unique_ptr<T> with a stateless deleter. The alias() call can be conceptified to reject unique_ptrs with stateful deleters.
 
Plus, that would require aliasing.

Ugly scary aliasing is the most efficient way to solve this problem. On modern machines, arrays are almost always the most efficient data structure to store data. array_view<T> brings an extremely flexible, range-library-like solution for this most ubiquitous and efficient data layout, often without requiring templates at all.

If you have an array of wrapper<T> stored in memory whose lifetime is managed by a stateless wrapper type, you need aliasing to efficiently retrieve a const view over the array of T.

That would require aliasing. Which is forbidden in C++, regardless of the size or representation of whatever wrapper you're using. Therefore, what you're asking for is a language change, not merely a library thing.

And breaking strict aliasing just ain't gonna happen.

You can either keep wanting something you're not going to get, or work with what you actually have. I prefer option 2.
 
 

The only solution here is a container that specifically handles memory allocation, like Boost's ptr_vector. But I don't think this is a frequent enough problem to have them standardize such a special use container.

I think the ptr_vector approach is not generic enough. Basically you have to create a copy of the std::vector<T> specification with some customized behavior only for pointers. Lifetime management is fully specified by T so a better solution is just to create a new stateless wrapper type to manage T* and be aliasable to T*.

Why is it "not generic enough"? Sure, it specifically supports only unique ownership, but... so what? What other kind of "lifetime management" are you going to be able to do that is both stateless and (theoretically) aliasable to T*?

ptr_vector is specifically designed to solve your exact problem. And it doesn't require breaking strict aliasing (which again, is not gonna happen). So why would you not use it?
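For readers who have not used such a container, here is a minimal sketch of the idea: a vector of raw pointers that deletes its elements on destruction and can hand out the contiguous pointer array directly. The name owning_ptr_vector and its members are hypothetical; this is not Boost's ptr_vector interface, only the general shape of the technique.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical minimal owning container of heap-allocated T objects.
template <typename T>
class owning_ptr_vector {
public:
    owning_ptr_vector() = default;
    // Copying would delete every element twice, so it is disabled.
    owning_ptr_vector(const owning_ptr_vector&) = delete;
    owning_ptr_vector& operator=(const owning_ptr_vector&) = delete;
    ~owning_ptr_vector() { for (T* p : ptrs_) delete p; }

    template <typename... Args>
    T& emplace_back(Args&&... args) {
        // Hold the new object in a unique_ptr until the slot exists, so a
        // throwing push_back cannot leak it.
        std::unique_ptr<T> tmp(new T(std::forward<Args>(args)...));
        ptrs_.push_back(tmp.get());
        return *tmp.release();
    }

    T& operator[](std::size_t i) { assert(i < ptrs_.size()); return *ptrs_[i]; }
    std::size_t size() const { return ptrs_.size(); }

    // Contiguous array of raw pointers, usable as the base of a view over T*.
    T* const* data() const { return ptrs_.data(); }

private:
    std::vector<T*> ptrs_;
};
```

Since the storage really is a vector of T*, a (data(), size()) pair can back an array_view<T*> without any casting.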

Matthew Fioravante

Feb 27, 2015, 10:03:28 PM2/27/15
to std-pr...@isocpp.org
If we have ptr_vector, why not other kinds of ptr_ data structures? Are you absolutely sure vector is the only data structure worth this specialization?

 
Sure, it specifically supports only unique ownership, but... so what?
What other kind of "lifetime management" are you going to be able to do that is both stateless and (theoretically) aliasable to T*?

Any stateless memory allocator can be used in this context.
 

ptr_vector is specifically designed to solve your exact problem. And it doesn't require breaking strict aliasing (which again, is not gonna happen). So why would you not use it?

If ptr_vector is the only possible solution then it will do because at least it solves the immediate problem.

inkwizyt...@gmail.com

Feb 28, 2015, 8:35:49 AM2/28/15
to std-pr...@isocpp.org
I think it is possible to create a `unique_container` that wraps around normal containers and has a limited interface to maintain invariants.
The simplest, dumbest implementation would look like:
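#include <cstddef>
#include <type_traits>
#include <utility>

template<typename Container>
class unique_container
{
    // Container::value_type is assumed to be a raw pointer type such as T*.
    using element = typename std::remove_pointer<typename Container::value_type>::type;
    Container c;
public:
    unique_container() = default;
    // Copying would delete every element twice, so it is disabled.
    unique_container(const unique_container&) = delete;
    unique_container& operator=(const unique_container&) = delete;

    // Constructs a new element on the heap from the forwarded arguments.
    template<typename... T>
    void emplace_back(T&&... t) { c.push_back(new element(std::forward<T>(t)...)); }

    // Destroys element i, then removes its slot from the container.
    void erase(std::size_t i) { delete c[i]; c.erase(c.begin() + i); }

    // Read-only access to the underlying container of raw pointers.
    const Container& get() const { return c; }

    ~unique_container() { for (auto* p : c) { delete p; } }
};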

Nicol Bolas

Feb 28, 2015, 12:40:31 PM2/28/15
to std-pr...@isocpp.org

Because they don't make sense. The only reason ptr_vector is even needed is to effectively allow a std::vector<unique_ptr<T>> to be aliased as an array of T*. If you need a std::list<T*> with some particular containment strategy, just wrap that strategy in a smart pointer and make a std::list<smart_ptr<T>>.

 
Sure, it specifically supports only unique ownership, but... so what?
What other kind of "lifetime management" are you going to be able to do that is both stateless and (theoretically) aliasable to T*?

Any stateless memory allocator can be used in this context.

I'll assume you said, "any memory allocator whose state is not stored in the smart pointer". Since that allows for intrusive smart pointer emulation. They do have state; it's just that their state isn't stored in the smart pointer. It's stored in the T object.

So yes, I could see some kind of "deleter" object being used for a standardized ptr_vector, which would provide an interface for intrusive state. On acquiring a pointer, it would call an acquire method, and on releasing it, it would call a release method. So if you're using shared ownership via intrusive data, then your acquire and release methods would cover that.
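A sketch of what such an acquire/release interface might look like, assuming an intrusive reference count stored in the element. Node, IntrusivePolicy and policy_ptr_vector are invented names; nothing like this is in the standard, and it is only one possible reading of the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical element type carrying its own (intrusive) reference count.
struct Node {
    int refs = 0;
    int value = 0;
    explicit Node(int v) : value(v) {}
};

// Hypothetical policy: stateless itself, all state lives in the Node.
struct IntrusivePolicy {
    static void acquire(Node* p) { ++p->refs; }
    static void release(Node* p) { if (--p->refs == 0) delete p; }
};

template <typename T, typename Policy>
class policy_ptr_vector {
public:
    policy_ptr_vector() = default;
    // Copying shares the elements: every pointer is acquired once more.
    policy_ptr_vector(const policy_ptr_vector& other) : ptrs_(other.ptrs_) {
        for (T* p : ptrs_) Policy::acquire(p);
    }
    policy_ptr_vector& operator=(const policy_ptr_vector&) = delete;
    ~policy_ptr_vector() { for (T* p : ptrs_) Policy::release(p); }

    void push_back(T* p) {
        assert(p != nullptr);
        ptrs_.push_back(p);   // if this throws, the pointer was never acquired
        Policy::acquire(p);
    }

    std::size_t size() const { return ptrs_.size(); }
    // The storage is a plain array of T*, so it can be viewed without casts.
    T* const* data() const { return ptrs_.data(); }

private:
    std::vector<T*> ptrs_;
};
```

With a policy whose release simply deletes and whose acquire does nothing, the same container would model unique ownership, provided copying is disabled.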