Complexities caused by using unique_ptr and move semantics

185 views
Skip to first unread message

Matthew Fioravante

unread,
Mar 23, 2017, 12:38:19 PM3/23/17
to ISO C++ Standard - Future Proposals
I don't have a proposal or a real solution. This is more intended to highlight some of the problems related to unique ownership and move semantics.

1. Its too easy to accidentally use a moved from object by mistake

Consider this code:

void Container::addThing(unique_ptr<Thing> p) {
 
this->_things.push_back(std::move(p));
 
if(logging_enabled) {
   std
::cout << "Added thing " << p->name() << " " << p->foo() << " " << p->bar() << std::endl;
 
}
 
return;
}

I've seen this kind of bug many times. Its very easy to write and the compiler provides no help to you. Even better since the bug is hidden when logging is disabled, it could easily pass into production. Its also somewhat of an expert problem when explaining to novices.

Now here's an attempt to fix the bug the right way. But we still have a mistake! Again no help from the compiler. Also unless you're really paying attention, a quick skim through the code will likely miss this one.

void Container::addThing(unique_ptr<Thing> p) {
 
auto* pc = p.get();
 
this->_things.push_back(std::move(p));
 
if(logging_enabled) {
   std
::cout << "Added thing " << pc->name() << " " << pc->foo() << " " << p->bar() << std::endl;
 
}
 
return;
}

After we move p into _things, we never need to touch p again. Unfortunately in this case and many others, we can't really introduce scopes with {} to eliminate p from the local namespace and prevent these kinds of bugs. 

A possible solution here might be some kind of [[discard]] attribute or a std::move_final() which is std::move() + [[discard]] together. So that the compiler would warn on any use of p after the move. 


2. unique_ptr<T> and T* being different types can cause pessimizations because of language rules.

Of course its a good thing that these types are different. One is an owner and one is not. Using a different type means we can enlist the help of the compiler to enforce correctness.

Sometimes I have a container with keeps a set of unique_ptrs, and I want to view that set.

class OwningContainer {
 
public:
 
array_view<const unique_ptr<Thing>> getThings() const { return _things; }
 
private:
  std
::vector<unique_ptr<Thing>> _things;
};

class NonOwningContainer {
 
public:
 
array_view<const Thing*> getThings() const { return _things; }
 
private:
  std
::vector<Thing*> _things;
};

//Out of line function, maybe lives in a 3rd party library.
void f(array_view<const Thing*> things);

void g(const OwningContainer& c){
  f
(c.getThings()); //Compiler Error
}
void h(const NonOwningContainer& c) {
  f
(c.getThings()); //Ok
}

The getThings() method of OwningContainer and NonOwningContainer methods have the exact same semantics. We're getting a const view of the stored thing pointers. The returned objects are even bitwise and machine code (after optimization) identical, but are "marked up" by the compiler with different types.

From the limited perspective of the code calling getThings(), whether or not the container owns the pointers is an implementation detail. The caller does not and should not care whether the pointers are stored raw, unique_ptr. He just wants to view the collection and do something with it.

The big problem here is that the return types of getThings() are different. This means that in a generic context, you need to start introducing templates in order to handle all possible pointer types. Adding templates complicates the code and slows down compile times.

Also since const unique_ptr<T> and const T* are semantically and even bitwise identical, using templates here will unnecessarily bloat your code with 2 functions that do the exact same thing. This increases your binary size and puts more pressure on the icache.

In this example, we must change f() to be a template. Even though the actual compiled down machine code will be identical. The biggest problem I have with this example is that the problem comes from artificial language rules and not physical limits about how hardware and memory works. That goes against the zero overhead principle of C++.

In order to avoid making f() a template here, there are a few options today we can try with OwningContainer:
1. Return vector<Thing*> by value, doing a copy and memory allocations at every call. (very slow)
2. Store a second vector<Thing*> inside OwningContainer, keep it in sync with the unique_ptr version, and return it in getThings(). (twice memory usage, slow, complicated, error prone)
3. Abandon unique_ptr inside of OwningContainer, and go back to manually managing the memory. (error prone and greatly increases development time)

Possible solutions to this include:
1. Add some kind of way to alias a unique_ptr<T> into a T*. Letting me essentially convert an array_view<const unique_ptr<T>> to array_view<const T>. In terms of how the implementation actually works on the machine, this is a trivial no-op. In terms of language rules its a complete nightmare.
2. Provide a specialized unique_vector<T*> which essentially operates like vector<unique_ptr<T>>, but exposes T* in its const interface. Then we can construct array_view<const T> over vector<T> and unique_vector<T>. Hiding the ownership implementation details and avoiding the need to for artificial templates.  This would solve the immediate example I've shown, but I'm not sure if its too specific and leaves out other similar situations.

How would you solve these 2 issues?

Have you seen any other complexities show up in your code from adopting unique_ptr?

Sean Middleditch

unread,
Mar 24, 2017, 5:57:11 PM3/24/17
to ISO C++ Standard - Future Proposals
On Thursday, March 23, 2017 at 9:38:19 AM UTC-7, Matthew Fioravante wrote:
1. Its too easy to accidentally use a moved from object by mistake

This is a problem with C++'s move semantics (e.g., non-destructive move and unspecific states). These problems have been discussed plenty of times before, particularly in destructive-move threads.

One solution in the works may be contract programming. Some of the proposals allow a function to set a post-condition state on an object which can be verified by a compiler or static analysis in a pre-condition. e.g., unique_ptr might have something like:

  [[post_condition(src): empty]
  unique_ptr(unique_ptr&& src);

  [[pre_condition(this): !empty]
  unique_ptr::operator->();

Such a setup can further hypothetically address other issues like accidental dereferencing of default-constructed unique_ptrs. As well as a wide host of other issues.
 
2. unique_ptr<T> and T* being different types can cause pessimizations because of language rules.

I personally most see this as a problem caused by a lack of generators, or more specifically at the lack of range concepts (which of course are in the works).

You usually don't want to require specifically a vector nor specifically a pointer of any kind. You want to return an object that can be enumerated to produce references to T objects. The vector itself or unique_ptr vs raw pointers is an implementation detail that callers should not have to care about.

With range concepts, this all still has to be a template of course, but you don't have to over-worry about the specifics while using the interface.

With generators, this can be abstracted without templates (though perhaps at a runtime cost).

Basically, you want to write something like:

  iterable<T&> foo::get_things() { return make_iterable(_things, [](auto& p){ return *p; }); }

so user code looks unanimously like so:

  for (auto& val : f.get_things())
   do_stuff_with(val);

that specific case can be implemented as a set of "mapping_iterator" (an iterator that returns the result of a wrapped iterator filtered through a function) stored in a range, but there'd likely need to be something more general to cover more use cases.

I don't know for sure if such functionality is in the ranges ts anywhere, but I'd honestly be surprised if it wasn't.

Nicol Bolas

unread,
Mar 24, 2017, 6:25:51 PM3/24/17
to ISO C++ Standard - Future Proposals
On Friday, March 24, 2017 at 5:57:11 PM UTC-4, Sean Middleditch wrote:
On Thursday, March 23, 2017 at 9:38:19 AM UTC-7, Matthew Fioravante wrote:
1. Its too easy to accidentally use a moved from object by mistake

This is a problem with C++'s move semantics (e.g., non-destructive move and unspecific states). These problems have been discussed plenty of times before, particularly in destructive-move threads.

One solution in the works may be contract programming. Some of the proposals allow a function to set a post-condition state on an object which can be verified by a compiler or static analysis in a pre-condition. e.g., unique_ptr might have something like:

  [[post_condition(src): empty]
  unique_ptr(unique_ptr&& src);

  [[pre_condition(this): !empty]
  unique_ptr::operator->();

Such a setup can further hypothetically address other issues like accidental dereferencing of default-constructed unique_ptrs. As well as a wide host of other issues.
 
2. unique_ptr<T> and T* being different types can cause pessimizations because of language rules.

I personally most see this as a problem caused by a lack of generators, or more specifically at the lack of range concepts (which of course are in the works).

You usually don't want to require specifically a vector nor specifically a pointer of any kind. You want to return an object that can be enumerated to produce references to T objects. The vector itself or unique_ptr vs raw pointers is an implementation detail that callers should not have to care about.

With range concepts, this all still has to be a template of course, but you don't have to over-worry about the specifics while using the interface.

With generators, this can be abstracted without templates (though perhaps at a runtime cost).

Basically, you want to write something like:

  iterable<T&> foo::get_things() { return make_iterable(_things, [](auto& p){ return *p; }); }


That's a really bad tradeoff: type-erasing performance plus the cost of filtering, all just to gain some increased genericity.

But Boost.Range already has something like this: `any_range`, coupled with transforming the input range by a function.

Matthew Fioravante

unread,
Mar 25, 2017, 1:00:23 PM3/25/17
to ISO C++ Standard - Future Proposals

I agree that generic ranges are great and badly needed, but I don't think they apply here.

You're actually not gaining anything here doing it this way. You're even losing. The paradigm is backwards (generic outputs instead of inputs).

To achieve generic code support, Container only needs to return *something* that looks like a range. The generic caller doesn't care at all what that *something* actually is. An array_view<T> is a range. By returning a very specific range type array_view<T>, you satisfy anyone writing generic code. For free you also satisfy the specific case of someone who is only interested in arrays. You can do everything on an array_view<T> that you can on a generic range. Its easier to debug because you know the actual type returned. By returning a generic object, all you're doing is throwing away useful type information. Lets also not forget the horribly long compiler error messages that will come when this is used wrong.

In terms of generic code, its better for inputs to be more generic, and outputs to be more specific (while still fulfilling a generic concept). Generic inputs minimizes the constraints on what I can pass into the API. Specific outputs that match concepts don't provide any limitations, but they can add optimization opportunities and compatibility with other API's which aren't generic. If I don't need the extra information provided by having a specific type, I can just erase it myself  when I pass it along to the next generic API. The important point is I had the choice to do the erasure on my end, instead of it being forced upon me by the Container author's getThings().

The great thing about array_view<T> is that it gives you a level of genericity (contiguous ranges), without the heavy handedness of templates. Compile times are fast, no code bloat, out of line compilation. Its also damn efficient.

Sean Middleditch

unread,
Mar 26, 2017, 6:11:08 PM3/26/17
to std-pr...@isocpp.org
On Fri, Mar 24, 2017 at 3:25 PM, Nicol Bolas <jmck...@gmail.com> wrote:

Basically, you want to write something like:

  iterable<T&> foo::get_things() { return make_iterable(_things, [](auto& p){ return *p; }); }


That's a really bad tradeoff: type-erasing performance plus the cost of filtering, all just to gain some increased genericity.

Not at all, because that'd be a template that gets fully inlined and result in identical machine code (assuming a suitably capable optimizer). Again, that's all writeable _today_ if you put in the leg work (we have in our libraries), it's just not a standard pattern in the C++ libraries, and you don't get the niceties that concepts provides wrt errors and composition.

Generators and mapped iterables aren't new here, just like C++ lambdas weren't new - they're just syntactic sugar for a well-established pattern.
 

But Boost.Range already has something like this: `any_range`, coupled with transforming the input range by a function.

--
You received this message because you are subscribed to a topic in the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this topic, visit https://groups.google.com/a/isocpp.org/d/topic/std-proposals/nc5Sj-LtKbA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
To view this discussion on the web visit https://groups.google.com/a/isocpp.org/d/msgid/std-proposals/136fc743-bc63-48c8-b1d5-dc4084dd13cf%40isocpp.org.



--

Nicol Bolas

unread,
Mar 26, 2017, 9:12:05 PM3/26/17
to ISO C++ Standard - Future Proposals


On Sunday, March 26, 2017 at 6:11:08 PM UTC-4, Sean Middleditch wrote:
On Fri, Mar 24, 2017 at 3:25 PM, Nicol Bolas <jmck...@gmail.com> wrote:

Basically, you want to write something like:

  iterable<T&> foo::get_things() { return make_iterable(_things, [](auto& p){ return *p; }); }


That's a really bad tradeoff: type-erasing performance plus the cost of filtering, all just to gain some increased genericity.

Not at all, because that'd be a template that gets fully inlined and result in identical machine code (assuming a suitably capable optimizer).

How? `iterable` doesn't specify the type of iterator used to iterate over `_things`, nor does it specify the type of range that `_things` is. But it still has to store either the range or an iterator pair over that range. Therefore, the storage for those things must be type-erased in some fashion. And that requires some form of indirect call to move to the next item in the iterator, as well as to call the functor. Not to mention having to type-erase the functor itself.

And type-erasure is the enemy of inlining.

If that were an `iterable<span<T>, function_type>`, then I could buy the inlining. But not so long as type erasure has to be used.

Sean Middleditch

unread,
Mar 26, 2017, 11:59:58 PM3/26/17
to std-pr...@isocpp.org
On Mar 26, 2017 6:12 PM, "Nicol Bolas" <jmck...@gmail.com> wrote:


On Sunday, March 26, 2017 at 6:11:08 PM UTC-4, Sean Middleditch wrote:
On Fri, Mar 24, 2017 at 3:25 PM, Nicol Bolas <jmck...@gmail.com> wrote:

Basically, you want to write something like:

  iterable<T&> foo::get_things() { return make_iterable(_things, [](auto& p){ return *p; }); }


That's a really bad tradeoff: type-erasing performance plus the cost of filtering, all just to gain some increased genericity.

Not at all, because that'd be a template that gets fully inlined and result in identical machine code (assuming a suitably capable optimizer).

How? `iterable` doesn't specify the

It's an expositional snippet. This might shock and surprise you, but it wasn't copied verbatim from a working "thingy" codebase I just had lying around. :)

I expect that `Iterator` could be defined as a concept in C++2x. The C++14 template version should use `auto` as the return value, yes. My mistake.

Thanks for the totally necessary lesson on runtime abstraction costs though.


--
You received this message because you are subscribed to a topic in the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this topic, visit https://groups.google.com/a/isocpp.org/d/topic/std-proposals/nc5Sj-LtKbA/unsubscribe.
To unsubscribe from this group and all its topics, send an email to std-proposals+unsubscribe@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.

Matthew Woehlke

unread,
Mar 28, 2017, 2:22:47 PM3/28/17
to std-pr...@isocpp.org
On 2017-03-23 12:38, Matthew Fioravante wrote:
> 2. Provide a specialized unique_vector<T*> which essentially operates like
> vector<unique_ptr<T>>, but exposes T* in its const interface.

See http://doc.qt.io/qt-5/qlist.html. Despite Marc Mutz' crusade against
them, I still tend to think that a container type that is like vector<T>
but stores its elements indirectly seems like it should be one of the
basic container types.

--
Matthew
Reply all
Reply to author
Forward
0 new messages