is_relocatable type trait

632 views
Skip to first unread message

Andrey Davydov

unread,
Mar 28, 2017, 3:33:55 AM3/28/17
to ISO C++ Standard - Future Proposals
Introduction
I'd like to raise the issue of the relocatable types one more time. It is well known concept used under different names in the following libraries: Folly (IsRelocatable), BDE (IsBitwiseMoveable), EASTL (has_trivial_relocate), Qt (Q_MOVABLE_TYPE).
It is natural generalization of the trivially copyable types concept which can be used for the optimization of objects relocation.

Motivation
Let's consider how the operation of std::vector content relocation can be implemented for some complex user defined type T:
template<typename T>
void relocate(T* new_buffer, T* old_buffer, size_t size)
{
 
for (size_t i = 0; i != size; ++i)
   
new (new_buffer[i]) T(std::move_if_noexcept(old_buffer[i]));

 
for (size_t i = 0; i != size; ++i)
    old_buffer
[i].~T();
}
On the other side for trivially copyable types this operation reduces to std::memcpy(new_buffer, old_buffer, size * sizeof(T)). Let's call the type T relocatable if two consequtive calls
new(new_place) T(std::move(old_place));
old_place
.~T();
can be replaced by memcpy(&new_place, &old_place, sizeof(T)). Of course, all trivially copyable types are relocatable by definition. Let's enumerate some relocatable types from the standard library.
  • shared_ptr: move constructor of the shared_ptr and destructor of the empty shared_ptr don't change shared count.
  • unique_ptr<T, D>, if D is relocatable (std::default_delete is trivially copyable therefore it is relocatable).
  • string and containers (at least when std::allocator is used).
  • optional<T>, variant<Ts...>, if T and Ts... are relocatable.
At least following operations from the standard library can be optimized for relocatable types:
  • resize of vector,
  • resize of deque,
  • swap of the empty and non-empty optionals,
  • swap of the variants with the different current indices.
Also it is possible to specialize std::swap for relocatable types and consequently a little optimize some of standard algorithms: *sort*, inplace_merge, partition, unique, rotate, remove*, ...
template<typename T>
enable_if_t<is_relocatable_v<T>> swap(T& x, T& y)
{
 
constexpr auto size = sizeof(T);
  aligned_storage_t
<size> tmp;
  memcpy
(&tmp, &x,   size);
  memcpy
(&x,   &y,   size);
  memcpy
(&y,   &tmp, size);
}

Yet another possible usage of the
 concept of relocatable type is extension of copy ellision. Let's consider following code fragment: 
vector<int> foo(bool c)
{
  vector
<int> xs = { 1, 2, 3 }, ys = { 4, 5, 6 };
 
if (c)
   
return xs;
 
else
   
return ys;
}

auto result = foo(rand());
copy ellision doesn't work here, and so `result` will be move constructed, but, in principle, return value of `foo` can be relocated to `result`.

Proposed language changes
  • Define relocatable types in [basic.types] p9 after trivially copyable types.
... are collectively called trivially copyable types. Trivially copyable types, relocatable classes, arrays of such types and cv-qualified versions of these types are collectively called relocable types.
  • Define relocatable class in [class] after trivially copyable class.
A class s is a relocatable class if it either explicitly marked as relocatable or
  1. all basic classes are relocatable,
  2. all non-static data members are relocatable,
  3. s has a non-virtual, non-deleted and non user-defined destructor.
It should be discussed how to explicitly mark a class as relocatable. One of the possible solutions is to allow specializations of the type trait std::is_relocatable. For example,
template<typename T, typename D>
struct is_relocatable<unique_ptr<T, D>> : bool_constant<is_relocatable_v<D>> {};

Any comments are welcome!

Jens Maurer

unread,
Mar 28, 2017, 3:47:06 AM3/28/17
to std-pr...@isocpp.org
On 2017-03-28 09:33, Andrey Davydov wrote:
> Introduction
> I'd like to raise the issue of the /relocatable /types one more time. It
> is well known concept used under different names in the following
> libraries: Folly (IsRelocatable), BDE (IsBitwiseMoveable), EASTL
> (has_trivial_relocate), Qt (Q_MOVABLE_TYPE).
> It is natural generalization of the /trivially copyable /types concept
> which can be used for the optimization of objects relocation.
>
> Motivation
> Let's consider how the operation of std::vector content relocation can
> be implemented for some complex user defined type T:
> |
> template<typenameT>
> voidrelocate(T*new_buffer,T*old_buffer,size_t size)
> {
> for(size_t i =0;i !=size;++i)
> new(new_buffer[i])T(std::move_if_noexcept(old_buffer[i]));
>
> for(size_t i =0;i !=size;++i)
> old_buffer[i].~T();
> }
> |
> On the other side for trivially copyable types this operation reduces to
> std::memcpy(new_buffer, old_buffer, size * sizeof(T)). Let's call the
> type T /relocatable /if two consequtive calls
> |
> new(new_place)T(std::move(old_place));
> old_place.~T();
> |
> can be replaced by memcpy(&new_place, &old_place, sizeof(T)). Of course,
> all trivially copyable types are /relocatable /by definition. Let's
> enumerate some relocatable types from the standard library.

Well, but ~T does end the lifetime of T, whereas the equivalent "memcpy"
leaves the lifetime of the stuff at old_place untouched. Not to mention
that bit-blasting doesn't start the lifetime of something at new_place,
either.

> * shared_ptr: move constructor of the shared_ptr and destructor of the
> empty shared_ptr don't change shared count.
> * unique_ptr<T, D>, if D is relocatable (std::default_delete is
> trivially copyable therefore it is relocatable).
> * string and containers (at least when std::allocator is used).

What assembly code do you get from the existing move-construct and
destroy operations, compared to memcpy? Is there something we could
teach the optimizer to help out?

> * optional<T>, variant<Ts...>, if T and Ts... are relocatable.

[...]

> * Define /relocatable class /in [class] after /trivially copyable class./
>
> A class s is a /relocatable class/ if it either explicitly marked as
> relocatable or
>
> 1. all basic classes are relocatable,

What are "basic classes"?

> 2. all non-static data members are relocatable,
> 3. s has a non-virtual, non-deleted and non user-defined destructor.

Well, classes that have a pointer to itself are also not relocatable,
but your definition seems to make them so.

> It should be discussed how to explicitly mark a class as relocatable.
> One of the possible solutions is to allow specializations of the type
> trait std::is_relocatable. For example,
> |
> template<typenameT,typenameD>
> structis_relocatable<unique_ptr<T,D>>:bool_constant<is_relocatable_v<D>>{};

I'm opposed to defining core language properties depending on the
specialization of a library-level type trait.

Jens


Andrey Davydov

unread,
Mar 28, 2017, 4:54:05 AM3/28/17
to ISO C++ Standard - Future Proposals
Well, but ~T does end the lifetime of T, whereas the equivalent "memcpy"
leaves the lifetime of the stuff at old_place untouched.  Not to mention
that bit-blasting doesn't start the lifetime of something at new_place,
either.
It's true, but what practical problems for relocatable types does it cause to? Is it valid, for example, to implement std::copy of contiguous range of POD types as one call of memcpy?

>   * shared_ptr: move constructor of the shared_ptr and destructor of the
>     empty shared_ptr don't change shared count.
>   * unique_ptr<T, D>, if D is relocatable (std::default_delete is
>     trivially copyable therefore it is relocatable).
>   * string and containers (at least when std::allocator is used).

What assembly code do you get from the existing move-construct and
destroy operations, compared to memcpy?  Is there something we could
teach the optimizer to help out?
I'd like to replace n calls of move constructor and n calls of destructor (for std::unique_ptr it, for example, contains if) by single memcpy during resize of std::vector.
Nothing from my proposal (may be, except the extension of copy ellision) requires additional logic in optimizer or code generator. It's pure language frontend feature which should be used by library authors.
>   * Define /relocatable class /in [class] after /trivially copyable class./
>
> A class s is a /relocatable class/ if it either explicitly marked as
> relocatable or
>
>  1. all basic classes are relocatable,

What are "basic classes"?
It's my fault. Of course, there should be "base classes". 
 
>  2. all non-static data members are relocatable,
>  3. s has a non-virtual, non-deleted and non user-defined destructor.

Well, classes that have a pointer to itself are also not relocatable,
but your definition seems to make them so.
Agree, it is a serious issue. The first thought is to extend list of requirements by following item:
4. s doesn't have user defined copy constructor, move constructor, copy assignment operator, and move assignment operator.

> It should be discussed how to explicitly mark a class as relocatable.
> One of the possible solutions is to allow specializations of the type
> trait std::is_relocatable. For example,
> |
> template<typenameT,typenameD>
> structis_relocatable<unique_ptr<T,D>>:bool_constant<is_relocatable_v<D>>{};

I'm opposed to defining core language properties depending on the
specialization of a library-level type trait.
I also have concern about it. Another possible solution for marking class as relocatable is to introduce special class std::relocatable_marker and check that it is a direct base of the defined class. For example,
namespace std
{
 
template<bool> struct conditional_relocatable_marker;
 
 
template<>
 
struct conditional_relocatable_marker<true>
 
{
   
using type = relocatable_marker;
 
};
 
 
template<>
 
struct conditional_relocatable_marker<false>
 
{
   
struct type {};
 
};

 
 
template<typename T, typename D>

 
class unique_ptr : typename conditional_relocatable_marker<is_relocatable_v<D>>::type
 
{
   
// ...
 
};
}


Jens Maurer

unread,
Mar 28, 2017, 5:19:35 AM3/28/17
to std-pr...@isocpp.org
On 2017-03-28 10:54, Andrey Davydov wrote:
> Well, but ~T does end the lifetime of T, whereas the equivalent
> "memcpy"
> leaves the lifetime of the stuff at old_place untouched. Not to
> mention
> that bit-blasting doesn't start the lifetime of something at new_place,
> either.
>
> It's true, but what practical problems for relocatable types does it
> cause to?

The concept of "lifetime" is increasingly exploited by optimizers
to determine, for example, possible aliasing.

> Is it valid, for example, to implement std::copy of contiguous
> range of POD types as one call of memcpy?

Sorry, what's POD doing in this discussion?

> > * shared_ptr: move constructor of the shared_ptr and destructor
> of the
> > empty shared_ptr don't change shared count.
> > * unique_ptr<T, D>, if D is relocatable (std::default_delete is
> > trivially copyable therefore it is relocatable).
> > * string and containers (at least when std::allocator is used).
>
> What assembly code do you get from the existing move-construct and
> destroy operations, compared to memcpy? Is there something we could
> teach the optimizer to help out?
>
> I'd like to replace *n* calls of move constructor and *n* calls of
> destructor (for std::unique_ptr it, for example, contains *if) *by
> single memcpy during resize of std::vector.

Can the optimizer determine that the "if" in unique_ptr's destructor
is redundant for your case? If not, what's preventing it from doing
so? Certainly, the move constructor is inlined, so what does it
actually do? Presumably zero the value that the "if" is checking.

> Nothing from my proposal (may be, except the extension of copy ellision)
> requires additional logic in optimizer or code generator. It's pure
> language frontend feature which should be used by library authors.

I'm questioning whether we need this feature, and exploring how
far we can get using other means.

> > * Define /relocatable class /in [class] after /trivially
> copyable class./
> >
> > A class s is a /relocatable class/ if it either explicitly marked as
> > relocatable or
> >
> > 1. all basic classes are relocatable,
>
> What are "basic classes"?
>
> It's my fault. Of course, there should be "*base* classes".

So, where does this recursion start? Which types (e.g. built-in types)
are defined to be relocatable to start with?

>
> > 2. all non-static data members are relocatable,
> > 3. s has a non-virtual, non-deleted and non user-defined destructor.
>
> Well, classes that have a pointer to itself are also not relocatable,
> but your definition seems to make them so.
>
> Agree, it is a serious issue. The first thought is to extend list of
> requirements by following item:
> 4. s doesn't have user defined copy constructor, move constructor, copy
> assignment operator, and move assignment operator.

How is that different from "trivially copyable", then?

> > It should be discussed how to explicitly mark a class as relocatable.
> > One of the possible solutions is to allow specializations of the type
> > trait std::is_relocatable. For example,
> > |
> > template<typenameT,typenameD>
> >
> structis_relocatable<unique_ptr<T,D>>:bool_constant<is_relocatable_v<D>>{};
>
>
> I'm opposed to defining core language properties depending on the
> specialization of a library-level type trait.
>
> I also have concern about it. Another possible solution for marking
> class as relocatable is to introduce special class
> std::relocatable_marker and check that it is a direct base of the
> defined class. For example,

I agree it's a possible solution, but no thanks.

Jens

Nicol Bolas

unread,
Mar 28, 2017, 10:05:08 AM3/28/17
to ISO C++ Standard - Future Proposals
On Tuesday, March 28, 2017 at 4:54:05 AM UTC-4, Andrey Davydov wrote:
Well, but ~T does end the lifetime of T, whereas the equivalent "memcpy"
leaves the lifetime of the stuff at old_place untouched.  Not to mention
that bit-blasting doesn't start the lifetime of something at new_place,
either.
It's true, but what practical problems for relocatable types does it cause to? Is it valid, for example, to implement std::copy of contiguous range of POD types as one call of memcpy?

If by "POD types" you mean trivially copyable types, then yes it's legal to use `memcpy` to implement `std::copy`, so long as `T` is copy-assignable (yes, it is possible to create a `T` which is trivially copyable and yet not copy-assignable).

>   * shared_ptr: move constructor of the shared_ptr and destructor of the
>     empty shared_ptr don't change shared count.
>   * unique_ptr<T, D>, if D is relocatable (std::default_delete is
>     trivially copyable therefore it is relocatable).
>   * string and containers (at least when std::allocator is used).

What assembly code do you get from the existing move-construct and
destroy operations, compared to memcpy?  Is there something we could
teach the optimizer to help out?
I'd like to replace n calls of move constructor and n calls of destructor (for std::unique_ptr it, for example, contains if) by single memcpy during resize of std::vector.

But you can't really do that. Why? Because you haven't really moved the object properly.

That's why "relocation" is often called "destructive-move". Because it only makes sense if the move operation simultaneously ends the lifetime of the old object. Doing just a `memcpy` won't destroy the old objects, and if the user calls those destructors, bad things happen.

A better way of dealing with this would be to have a function specifically to invoke a "relocation" operation:

T* std::relocate_and_construct(void *dst, T *src, size_t count = 1);
T
* std::relocate_and_assign(T *dst, T *src, size_t count = 1);

Both functions are explicitly defined to end the lifetime of `count` elements of `src`. The thing is though that they won't call `T`'s destructor. The lifetime will be ended as if the function had started the lifetime of some other (trivial) object in that storage.

The first function explicitly begins the lifetime of `T`s within `dst`. The second function assigns the values to existing `T`s. In both cases, the return value is a pointer to the newly created `T`s (in the original storage, of course).

The first function requires that `T` is relocatable and copy constructible.

Matthew Woehlke

unread,
Mar 28, 2017, 10:05:31 AM3/28/17
to std-pr...@isocpp.org
On 2017-03-28 05:19, Jens Maurer wrote:
> How is that different from "trivially copyable", then?

Answering this somewhat out of context... a class like unique_ptr is
obviously *not* trivially copyable, but it *should* be *relocatable*.
Relocatability is a special concept that in some cases needs to be
opt-in that says a type's invariants can be preserved when a) one
creates a new instance of the type that is bitwise-identical to the old
instance *and* then b) drops the old instance on the floor (that is,
ends its lifetime but without executing any destruction code).

Obviously, types with trivial copy construction and trivial destruction
meet these criteria. Such types meet (a) and (b) *independently*, which
is a superset of types which meet (a) and (b) only when those operations
are coupled. We need this feature for the latter category (again,
offering unique_ptr as an example).

The best proposal I recall seeing for this was from J. "Nicol Bolas"
McKesson that IIRC introduced a (defaultable) relocation operator. I'm
not sure what happened to it, though; I don't think it was ever
published in a mailing.

--
Matthew

Thiago Macieira

unread,
Mar 28, 2017, 12:19:31 PM3/28/17
to std-pr...@isocpp.org
On terça-feira, 28 de março de 2017 00:33:54 PDT Andrey Davydov wrote:
> A class s is a *relocatable class* if it either explicitly marked as
> relocatable or
>
> 1. all basic classes are relocatable,
> 2. all non-static data members are relocatable,
> 3. s has a non-virtual, non-deleted and non user-defined destructor.

Classes with user-provided destructors should be allowed to be relocatable
too.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center

Thiago Macieira

unread,
Mar 28, 2017, 12:22:05 PM3/28/17
to std-pr...@isocpp.org
On terça-feira, 28 de março de 2017 00:47:02 PDT Jens Maurer wrote:
> > can be replaced by memcpy(&new_place, &old_place, sizeof(T)). Of course,
> > all trivially copyable types are /relocatable /by definition. Let's
> > enumerate some relocatable types from the standard library.
>
> Well, but ~T does end the lifetime of T, whereas the equivalent "memcpy"
> leaves the lifetime of the stuff at old_place untouched. Not to mention
> that bit-blasting doesn't start the lifetime of something at new_place,
> either.

Obviously that needs to change. Relocatable types have trivial destructors, so
the lifetime of the object is tied to the lifetime of the storage allocation.

> > * shared_ptr: move constructor of the shared_ptr and destructor of the
> >
> > empty shared_ptr don't change shared count.
> >
> > * unique_ptr<T, D>, if D is relocatable (std::default_delete is
> >
> > trivially copyable therefore it is relocatable).
> >
> > * string and containers (at least when std::allocator is used).
>
> What assembly code do you get from the existing move-construct and
> destroy operations, compared to memcpy? Is there something we could
> teach the optimizer to help out?

We cannot. The most important gain we're going to have is not the memcpy, it's
the fact that the resize() operation in the container can be implemented by a
call to realloc(), as opposed to malloc() + memcpy().

> I'm opposed to defining core language properties depending on the
> specialization of a library-level type trait.

True, but we need to figure out a way.

Jens Maurer

unread,
Mar 28, 2017, 12:31:27 PM3/28/17
to std-pr...@isocpp.org
On 2017-03-28 18:21, Thiago Macieira wrote:
> On terça-feira, 28 de março de 2017 00:47:02 PDT Jens Maurer wrote:
>>> can be replaced by memcpy(&new_place, &old_place, sizeof(T)). Of course,
>>> all trivially copyable types are /relocatable /by definition. Let's
>>> enumerate some relocatable types from the standard library.
>>
>> Well, but ~T does end the lifetime of T, whereas the equivalent "memcpy"
>> leaves the lifetime of the stuff at old_place untouched. Not to mention
>> that bit-blasting doesn't start the lifetime of something at new_place,
>> either.
>
> Obviously that needs to change. Relocatable types have trivial destructors, so
> the lifetime of the object is tied to the lifetime of the storage allocation.

Specific proposals that include core wording are welcome.

>>> * shared_ptr: move constructor of the shared_ptr and destructor of the
>>>
>>> empty shared_ptr don't change shared count.
>>>
>>> * unique_ptr<T, D>, if D is relocatable (std::default_delete is
>>>
>>> trivially copyable therefore it is relocatable).
>>>
>>> * string and containers (at least when std::allocator is used).
>>
>> What assembly code do you get from the existing move-construct and
>> destroy operations, compared to memcpy?

This question is still unanswered.

> Is there something we could
>> teach the optimizer to help out?
>
> We cannot. The most important gain we're going to have is not the memcpy, it's
> the fact that the resize() operation in the container can be implemented by a
> call to realloc(), as opposed to malloc() + memcpy().

The standard allocators (and related containers)
don't have a realloc() interface.

Also, I've seen proposals floating around to allow the allocator to
inform the container of the "real" size that was allocated. That
appears to mean that any realloc() beyond that will cause a
malloc + memcpy under the hood.

Jens

Thiago Macieira

unread,
Mar 28, 2017, 12:47:02 PM3/28/17
to std-pr...@isocpp.org
On terça-feira, 28 de março de 2017 09:31:21 PDT Jens Maurer wrote:
> > We cannot. The most important gain we're going to have is not the memcpy,
> > it's the fact that the resize() operation in the container can be
> > implemented by a call to realloc(), as opposed to malloc() + memcpy().
>
> The standard allocators (and related containers)
> don't have a realloc() interface.

Because it couldn't be used until now. Chicken and the egg.

Qt containers do have the ability to reallocate (we don't use std::allocator).

> Also, I've seen proposals floating around to allow the allocator to
> inform the container of the "real" size that was allocated. That
> appears to mean that any realloc() beyond that will cause a
> malloc + memcpy under the hood.

Right, we'd like a more powerful allocation mechanism that did that, plus a
non-relocating reallocation. However, every time that was brought up, it
seemed that we couldn't get WG14 on board, so it's unlikely to happen in the C
library.

If it won't happen in the C library, we can't have it. We're left with
realloc() as it stands. I'd like to use it.

Matthew Woehlke

unread,
Mar 28, 2017, 12:59:45 PM3/28/17
to std-pr...@isocpp.org
On 2017-03-28 12:21, Thiago Macieira wrote:
> On terça-feira, 28 de março de 2017 00:47:02 PDT Jens Maurer wrote:
>> Well, but ~T does end the lifetime of T, whereas the equivalent "memcpy"
>> leaves the lifetime of the stuff at old_place untouched. Not to mention
>> that bit-blasting doesn't start the lifetime of something at new_place,
>> either.
>
> Obviously that needs to change. Relocatable types have trivial destructors, so
> the lifetime of the object is tied to the lifetime of the storage allocation.

*Not necessarily*. See unique_ptr... relocatable (or should be), but the
default dtor certainly isn't trivial!

As I said in my other message, a relocatable type is one for which copy
construction and destruction are trivial *when those operations are
coupled*. They *do not* need to be individually trivial.

>> What assembly code do you get from the existing move-construct and
>> destroy operations, compared to memcpy? Is there something we could
>> teach the optimizer to help out?
>
> We cannot. The most important gain we're going to have is not the memcpy, it's
> the fact that the resize() operation in the container can be implemented by a
> call to realloc(), as opposed to malloc() + memcpy().

Actually, we can't¹ even use memcpy now for some relocatable types. The
*biggest* gain is that a container resize can be implemented with memcpy
instead of (non-trivial) move ctor + dtor for every element.

The real question is if the optimizer can optimize something like:

auto new_item = std::move(old_item);
~old_item;

...to just a memcpy, if the dtor is non-trivial but has no side effects
if the object being destroyed is moved-from. Keep in mind this includes
being able to tell that the side effects of the move ctor on the *old*
object become dead code that can be elided. Then add to the above
trivial example that you are doing this for a lot of objects at once,
and the destruction does not immediately follow move construction
(especially when it's via `delete[]`). Also keep in mind that to
"succeed" here, the optimizer also has to be able to detect and merge
adjacent memcpy's.

This assumes that the ctor and dtor are both inline; if either is in a
separate TU, then you're SOL. (Such a class could still have a defaulted
relocation operation!)

(¹ Various libraries' containers — including Qt's — do it anyway, but
it's technically UB. The whole point is to a) be able to do it *without*
UB, and b) using a standardized mechanism that anyone implementing their
own container can use.)

--
Matthew

Nicol Bolas

unread,
Mar 28, 2017, 1:20:25 PM3/28/17
to ISO C++ Standard - Future Proposals


On Tuesday, March 28, 2017 at 12:22:05 PM UTC-4, Thiago Macieira wrote:
On terça-feira, 28 de março de 2017 00:47:02 PDT Jens Maurer wrote:
> > can be replaced by memcpy(&new_place, &old_place, sizeof(T)). Of course,
> > all trivially copyable types are /relocatable /by definition. Let's
> > enumerate some relocatable types from the standard library.
>
> Well, but ~T does end the lifetime of T, whereas the equivalent "memcpy"
> leaves the lifetime of the stuff at old_place untouched.  Not to mention
> that bit-blasting doesn't start the lifetime of something at new_place,
> either.

Obviously that needs to change. Relocatable types have trivial destructors, so
the lifetime of the object is tied to the lifetime of the storage allocation.

But they don't. `unique_ptr` does not have a trivial destructor. This is why it's important for the relocation operation to explicitly terminate the lifetime of the input value. Because for relocation to work, that termination cannot call the destructor.

Jens Maurer

unread,
Mar 28, 2017, 1:47:01 PM3/28/17
to std-pr...@isocpp.org
On 2017-03-28 18:46, Thiago Macieira wrote:
> Qt containers do have the ability to reallocate (we don't use std::allocator).

How do those containers deal with the lifetime issues?

>> Also, I've seen proposals floating around to allow the allocator to
>> inform the container of the "real" size that was allocated. That
>> appears to mean that any realloc() beyond that will cause a
>> malloc + memcpy under the hood.
>
> Right, we'd like a more powerful allocation mechanism that did that, plus a
> non-relocating reallocation. However, every time that was brought up, it
> seemed that we couldn't get WG14 on board, so it's unlikely to happen in the C
> library.
>
> If it won't happen in the C library, we can't have it.

Why? The C++ interface to allocation is "new" and "delete",
not "malloc" and "free".

Jens

Thiago Macieira

unread,
Mar 28, 2017, 1:51:03 PM3/28/17
to std-pr...@isocpp.org
On terça-feira, 28 de março de 2017 10:20:24 PDT Nicol Bolas wrote:
> > Obviously that needs to change. Relocatable types have trivial
> > destructors, so
> > the lifetime of the object is tied to the lifetime of the storage
> > allocation.
>
> But they don't. `unique_ptr` does not have a trivial destructor. This is
> why it's important for the relocation operation to explicitly terminate the
> lifetime of the input value. Because for relocation to work, that
> termination cannot call the destructor.

Right.

I think we need to analyse this in the context of a larger problem, otherwise
we won't solve the long-term problems.

First, there's the destructive move, an operation that ties both a move
construction of the destination with the destruction of the source. It needs
to be a new function that one can add to their class. In most cases, it will
be what the move constructor does, except that it knows the source will never
ever be used again, so it need not leave that in a valid state.

For example, for std::shared_ptr:
- copy: increase reference count, leave source unchanged
- move: no change in reference count, reset source to empty
- move-destruct: no change in reference count, leave source unchanged

Second, we need a way to invoke this operation. It's not simply
a = std::move(b);

It could be:

template <typename T>
T *move_destroy(T *dest, T *source, size_t count = 1);

a function that requires ABI knowledge and possibly intrinsics. We also need
to solve the problem of whether the size cookie should be copied (probably
not).

Third, we have the relocation, which is a special case of move-destruction
that is a simple memcpy and abandon source.

Fourth, we have the case of triviality. A class that is trivially move
constructible and trivially destructible is trivially move-destructible. A
trivial move-destruction is a memcpy and abandon, so trivial move-destruction
is relocation.

The reason I'm separating the concerns here is that a type could implement a
move-destruction which is not simple memcpy + drop. I don't know if we could
say that a complex type with non-trivial move constructor and non-trivial
destructor could have trivial move-destructor. If we can, then the concept of
trivial move destruction is the same as relocation.

Finally, there's realloc() and lifetimes. I want to apply realloc() to
anything that is relocatable, whether it has trivial constructor and
destructor or not.

Matthew Woehlke

unread,
Mar 28, 2017, 2:07:21 PM3/28/17
to std-pr...@isocpp.org
On 2017-03-28 13:50, Thiago Macieira wrote:
> Third, we have the relocation, which is a special case of move-destruction
> that is a simple memcpy and abandon source.

Uh...

> Fourth, we have the case of triviality. A class that is trivially move
> constructible and trivially destructible is trivially move-destructible. A
> trivial move-destruction is a memcpy and abandon, so trivial move-destruction
> is relocation.

Okay, ugh... what I think most of us call "relocation" you are calling
"move-destruction". Can we please stick to "relocation" (what you call
move-destruction) and "trivial relocation" (what you call relocation)?

> I don't know if we could say that a complex type with non-trivial
> move constructor and non-trivial destructor could have trivial
> move-destructor.

We can, and we need to be able to do so, for unique_ptr and similar classes.

--
Matthew

Nicol Bolas

unread,
Mar 28, 2017, 2:10:55 PM3/28/17
to ISO C++ Standard - Future Proposals


On Tuesday, March 28, 2017 at 1:51:03 PM UTC-4, Thiago Macieira wrote:
On terça-feira, 28 de março de 2017 10:20:24 PDT Nicol Bolas wrote:
> > Obviously that needs to change. Relocatable types have trivial
> > destructors, so
> > the lifetime of the object is tied to the lifetime of the storage
> > allocation.
>
> But they don't. `unique_ptr` does not have a trivial destructor. This is
> why it's important for the relocation operation to explicitly terminate the
> lifetime of the input value. Because for relocation to work, that
> termination cannot call the destructor.

Right.

I think we need to analyse this in the context of a larger problem, otherwise
we won't solve the long-term problems.

First, there's the destructive move, an operation that ties both a move
construction of the destination with the destruction of the source. It needs
to be a new function that one can add to their class. In most cases, it will
be what the move constructor does, except that it knows the source will never
ever be used again, so it need not leave that in a valid state.

It's not just that it doesn't leave it in a valid state. From a wording perspective, this operation must terminate the lifetime of the old object.

Otherwise, the lifetime hasn't ended and the source is still a live object.

For example, for std::shared_ptr:
        - copy: increase reference count, leave source unchanged
        - move: no change in reference count, reset source to empty
        - move-destruct: no change in reference count, leave source unchanged

Second, we need a way to invoke this operation. It's not simply
        a = std::move(b);

It could be:

        template <typename T>
        T *move_destroy(T *dest, T *source, size_t count = 1);

a function that requires ABI knowledge and possibly intrinsics. We also need
to solve the problem of whether the size cookie should be copied (probably
not).

Third, we have the relocation, which is a special case of move-destruction
that is a simple memcpy and abandon source.

I'm not sure I agree with the need for such a back-door. It seems to me that, if `T` is trivially move-destructible, `move_destroy` can do this just fine. We shouldn't encourage people to use memcpy&drop instead of the equally-performance-friendly alternative.

Fourth, we have the case of triviality. A class that is trivially move
constructible and trivially destructible is trivially move-destructible. A
trivial move-destruction is a memcpy and abandon, so trivial move-destruction
is relocation.

But we also need to recognize that users need to declare a type which is not trivially moveable nor trivially destructible is trivially move-destructible. `unique_ptr` being the perfect example (assuming the deleter type is empty or is trivially move-destructible); there is no reason `unique_ptr<T>` should have a non-trivial move-destruction operator.

So we need to be able to declare that a type's move-destruction operation is trivial, even if its other operations like trivial copy and so forth are non-trivial.

The reason I'm separating the concerns here is that a type could implement a
move-destruction which is not simple memcpy + drop.

Here's the question I have for that: why implement move-destruction at all in that case? If your type needs to do actual work in its move-destructor, why bother? The `move_destroy` call is going to have to loop over `count` and call your move-destruction function. How will that perform better than calling the move constructor, followed by the destructor?

Also, move-destruction cannot ever throw.

It seems to me that there are really only two types: trivially-move-destructible types and types that cannot be move-destructed. The third type, non-trivial move destruction, seems rather pointless.
 
I don't know if we could say that a complex type with non-trivial move constructor and non-trivial destructor could have trivial move-destructor.

`unique_ptr`.
 

Nicol Bolas

unread,
Mar 28, 2017, 2:13:50 PM3/28/17
to ISO C++ Standard - Future Proposals
On Tuesday, March 28, 2017 at 2:07:21 PM UTC-4, Matthew Woehlke wrote:
On 2017-03-28 13:50, Thiago Macieira wrote:
> Fourth, we have the case of triviality. A class that is trivially move
> constructible and trivially destructible is trivially move-destructible. A
> trivial move-destruction is a memcpy and abandon, so trivial move-destruction
> is relocation.

Okay, ugh... what I think most of us call "relocation" you are calling
"move-destruction". Can we please stick to "relocation" (what you call
move-destruction) and "trivial relocation" (what you call relocation)?

The problem is that this discussion is not new. Some people call it "relocation". Others "destructive-move". There have been several formal proposals for doing it.

This is well-trod ground, and the terminology for it changes with every iteration.

And personally, I think destructive-move is the right wording. It is vital to recognize the importance that the operation simultaneously ends the lifetime of the old object. Because without that, you break the object model.

Matthew Woehlke

unread,
Mar 28, 2017, 2:32:37 PM3/28/17
to std-pr...@isocpp.org
Honestly, I can live with either... what I find objectionable is using
two *different* terms, "A" and "B", instead of "A" and "trivial A", and
in particular, defining "B" as a trivial operation when it doesn't have
"trivial" in the name (because what you really mean by "B" is "trivial A").

Please don't do that :-).

--
Matthew

Matthew Woehlke

unread,
Mar 28, 2017, 2:40:37 PM3/28/17
to std-pr...@isocpp.org
On 2017-03-28 14:10, Nicol Bolas wrote:
> On 2017-03-28 13:50, Thiago Macieira wrote:
>> The reason I'm separating the concerns here is that a type could implement
>> a move-destruction which is not simple memcpy + drop.
>
> Here's the question I have for that: why implement move-destruction at all
> in that case? If your type needs to do actual work in its move-destructor,
> why bother? The `move_destroy` call is going to have to loop over `count`
> and call your move-destruction function. How will that perform better than
> calling the move constructor, followed by the destructor?

Let's say I have a type that has pointer members that relate to `this`.
It is not trivially move-destructible since it needs to fix up these
pointers when it is relocated. But it might need to do complicated
things in its move ctor and/or dtor.

Rare? Probably, but what's the value in precluding the possibility?

--
Matthew

Nicol Bolas

unread,
Mar 28, 2017, 2:54:37 PM3/28/17
to ISO C++ Standard - Future Proposals
It makes declaring a type to be move-destructibe/relocatable/etc to mean something very specific about the nature of that operation.

The only advantage I could see in your case would be if the move constructor would have to be non-`noexcept`, while the destructive-move function could be `noexcept`.

Ville Voutilainen

unread,
Mar 28, 2017, 3:33:54 PM3/28/17
to ISO C++ Standard - Future Proposals
On 28 March 2017 at 21:10, Nicol Bolas <jmck...@gmail.com> wrote:
> It's not just that it doesn't leave it in a valid state. From a wording
> perspective, this operation must terminate the lifetime of the old object.

That is far beyond something from a wording perspective; it's the
fundamental issue
why destructive move proposals thus far haven't been accepted. The motivation of
a destructive move has not been strong enough to add a new way of
ending the lifetime
of an object. Let alone starting to talk about a new kind of a special
member function,
or new kinds of type classifications.

Nicol Bolas

unread,
Mar 28, 2017, 3:52:47 PM3/28/17
to ISO C++ Standard - Future Proposals

So the already explained performance improvements are not strong enough of a motivation? I really don't understand that kind of thinking. What exactly do people have to do to show you why this functionality is needed? What exactly would be good enough for you?

Ville Voutilainen

unread,
Mar 28, 2017, 4:20:27 PM3/28/17
to ISO C++ Standard - Future Proposals
On 28 March 2017 at 22:52, Nicol Bolas <jmck...@gmail.com> wrote:
>> That is far beyond something from a wording perspective; it's the
>> fundamental issue
>> why destructive move proposals thus far haven't been accepted. The
>> motivation of
>> a destructive move has not been strong enough to add a new way of
>> ending the lifetime
>> of an object. Let alone starting to talk about a new kind of a special
>> member function,
>> or new kinds of type classifications.
> So the already explained performance improvements are not strong enough of a
> motivation? I really don't understand that kind of thinking. What exactly do

The *"explained"* performance benefits? Show us hard numbers, and show
that similar
improvements aren't achievable without fundamental changes to the
object and memory model.

Nicol Bolas

unread,
Mar 28, 2017, 4:41:11 PM3/28/17
to ISO C++ Standard - Future Proposals

No "fundamental changes to the object and memory model" are being proposed. At its most minimal, there is a single function which has a part of its definition that it ends the lifetime of one of its parameters.

Ending the lifetime of an object without calling the destructor is legal. Right now, in the memory model as it currently stands: [basic.life]/5

> For an object of a class type with a non-trivial destructor, the program is not required to call the destructor explicitly before the storage which the object occupies is reused or released; however, if there is no explicit call to the destructor or if a delete-expression (8.3.5) is not used to release the storage, the destructor shall not be implicitly called and
any program that depends on the side effects produced by the destructor has undefined behavior.

A type which is relocatable/destructive-moveable is essentially saying, "when invoking the relocation/destructive-move operation, this program does not depend on side effects produced by the destructor of this type." And therefore, the destructive move operation is able to say "I'm ending the lifetime of the source object(s)."

The object and memory models are unchanged. We're simply using what the memory model currently allows, in a way that other types can key off of.

Ville Voutilainen

unread,
Mar 28, 2017, 4:57:08 PM3/28/17
to ISO C++ Standard - Future Proposals
On 28 March 2017 at 23:41, Nicol Bolas <jmck...@gmail.com> wrote:
>> The *"explained"* performance benefits? Show us hard numbers, and show
>> that similar
>> improvements aren't achievable without fundamental changes to the
>> object and memory model.
>
>
> No "fundamental changes to the object and memory model" are being proposed.

Save for a new kind of function that can end the lifetime of objects,
and provides a means
to transfer ownership at the same time. Sure, the underlying model
allows this, but providing
a codified mechanism for it in order to make certain kinds of
std::list implementations
faster hasn't thus far convinced enough of the committee to adopt the proposals.

> At its most minimal, there is a single function which has a part of its
> definition that it ends the lifetime of one of its parameters.

Yes, I've seen Halpern's destructive move proposals.

Thiago Macieira

unread,
Mar 28, 2017, 6:57:08 PM3/28/17
to std-pr...@isocpp.org
Then please implement the resizing of a memory allocation without changing the
pointer address. For example:

void *operator new(size_t size);
void operator delete(void *ptr);
void resize_inplace(void *ptr, size_t newsize);

I'd like to see the implementation of that third function.

Thiago Macieira

unread,
Mar 28, 2017, 6:58:21 PM3/28/17
to std-pr...@isocpp.org
On terça-feira, 28 de março de 2017 11:07:18 PDT Matthew Woehlke wrote:
> > Fourth, we have the case of triviality. A class that is trivially move
> > constructible and trivially destructible is trivially move-destructible. A
> > trivial move-destruction is a memcpy and abandon, so trivial
> > move-destruction is relocation.
>
> Okay, ugh... what I think most of us call "relocation" you are calling
> "move-destruction". Can we please stick to "relocation" (what you call
> move-destruction) and "trivial relocation" (what you call relocation)?

Only if:

> > I don't know if we could say that a complex type with non-trivial
> > move constructor and non-trivial destructor could have trivial
> > move-destructor.
>
> We can, and we need to be able to do so, for unique_ptr and similar classes.

We can make a non-trivial type have a trivial relocation.

Thiago Macieira

unread,
Mar 29, 2017, 12:42:01 AM3/29/17
to std-pr...@isocpp.org
On terça-feira, 28 de março de 2017 11:10:55 PDT Nicol Bolas wrote:
> > First, there's the destructive move, an operation that ties both a move
> > construction of the destination with the destruction of the source. It
> > needs
> > to be a new function that one can add to their class. In most cases, it
> > will
> > be what the move constructor does, except that it knows the source will
> > never
> > ever be used again, so it need not leave that in a valid state.
>
> It's not just that it doesn't leave it in a valid state. From a wording
> perspective, this operation must *terminate* the lifetime of the old object.
>
> Otherwise, the lifetime hasn't ended and the source is still a live object.

I agree.

> > Third, we have the relocation, which is a special case of move-destruction
> > that is a simple memcpy and abandon source.
>
> I'm not sure I agree with the need for such a back-door. It seems to me
> that, if `T` is trivially move-destructible, `move_destroy` can do this
> just fine. We shouldn't encourage people to use memcpy&drop instead of the
> equally-performance-friendly alternative.

I would agree, except for the realloc case below.

> > The reason I'm separating the concerns here is that a type could implement
> > a
> > move-destruction which is not simple memcpy + drop.
>
> Here's the question I have for that: why implement move-destruction at all
> in that case? If your type needs to do actual work in its move-destructor,
> why bother? The `move_destroy` call is going to have to loop over `count`
> and call your move-destruction function. How will that perform better than
> calling the move constructor, followed by the destructor?

I just wasn't ruling it out. Let's take for example:

struct S
{
struct Private {
struct S *q;
};
Private *d;
S(); // out-of-line

S(S &&other) : d(other.d)
{
d.q = this;
other.d = nullptr;
}
~S() { delete d; }
};

In this case, the move-destruction could be more efficient than move followed by
destruction, especially if the destructor were out-of-line.

In fact, in any case where either the move constructor or the destructor is
out-of-line, a move-destructor would be more efficient.

> Also, move-destruction cannot *ever* throw.

I'd say it can throw as much as a destructor can.

> It seems to me that there are really only two types:
> trivially-move-destructible types and types that cannot be move-destructed.
> The third type, non-trivial move destruction, seems rather pointless.

Maybe pointless, but like I said, I wasn't ruling them out.

> > Finally, there's realloc() and lifetimes. I want to apply realloc() to
> > anything that is relocatable, whether it has trivial constructor and
> > destructor or not.

Marc Mutz

unread,
Mar 29, 2017, 5:27:47 AM3/29/17
to std-pr...@isocpp.org
These classes naturally come up if your pimpl'd class has a back pointer to
the public class. It still might be faster to just fix up the back link in the
private class than to perform a move + dtor call, in particular because
compilers suck at removing dtor calls, the more so if any non-inline function
is called in between.

--
Marc Mutz <marc...@kdab.com> | Senior Software Engineer
KDAB (Deutschland) GmbH & Co.KG, a KDAB Group Company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts

Matthew Woehlke

unread,
Mar 29, 2017, 2:19:22 PM3/29/17
to std-pr...@isocpp.org
On 2017-03-28 16:20, Ville Voutilainen wrote:
> On 28 March 2017 at 22:52, Nicol Bolas wrote:
>> So the already explained performance improvements are not strong enough of a
>> motivation? I really don't understand that kind of thinking. What exactly do
>
> The *"explained"* performance benefits? Show us hard numbers,

Okay.

I created a type that contains a unique_ptr<string>, which is always
initialized to "hello, world!". The copy ctor, dtor, and assignment
operator were all defaulted. I then inserted 1000 of these into a
modified¹ QVector in three trials; one with the move ctor suppressed,
one with a defaulted move ctor, and one with Q_MOVABLE_TYPE. I also ran
both sets with -O0 and -O3.

Results:

-O0 -O3
copy 5.53 10.37
move 5.40 8.53
relocate 4.43 8.24

(The -O0 numbers are smaller because I ran 5x as many iterations for the
-O3 case.)

I'm actually surprised the difference is so small, considering the
difference in code paths:

// relocate case
memcpy(dst, srcBegin, srcEnd - srcBegin);
dst += srcEnd - srcBegin;
if (false) ...

// move case
while (srcBegin != srcEnd)
new (dst++) T(std::move(*srcBegin++));

This suggests that either a) memcpy is slower than we think it is, or b)
the compiler is able to optimize the move case better than we think it
can. (Note: I have not attempted to examine the generated machine
instructions for any of these tests.)

Interestingly, if I increase the item count to 100k, the difference
between the move and relocate cases becomes noise.

(¹ Stock QVector does not use move construction during resizing; I used
a local copy that I modified to do so.)

Another interesting test I have not done would be sorting a list when
the swap operation can be implemented either as relocation or "the hard
way". Relocation would especially win if

> and show that similar improvements aren't achievable without
> fundamental changes to the object and memory model.

I can't do that; you're asking us to prove a negative, which any
educated person knows is next to impossible. I think the onus should
instead be on the nay-sayers to demonstrate a plausible way of achieving
similar benefits *without* relocation.

--
Matthew

Ville Voutilainen

unread,
Mar 29, 2017, 2:37:34 PM3/29/17
to ISO C++ Standard - Future Proposals
On 29 March 2017 at 21:19, Matthew Woehlke <mwoehlk...@gmail.com> wrote:
> This suggests that either a) memcpy is slower than we think it is, or b)
> the compiler is able to optimize the move case better than we think it
> can. (Note: I have not attempted to examine the generated machine
> instructions for any of these tests.)

Perhaps a proposal author should.

> Interestingly, if I increase the item count to 100k, the difference
> between the move and relocate cases becomes noise.

Which was, I think, a claim made by some opponents of destructive move.

>> and show that similar improvements aren't achievable without
>> fundamental changes to the object and memory model.
>
> I can't do that; you're asking us to prove a negative, which any
> educated person knows is next to impossible. I think the onus should
> instead be on the nay-sayers to demonstrate a plausible way of achieving
> similar benefits *without* relocation.

You can think whatever you want, but the onus is on a proposal author
to convince
the committee to accept the proposal, not the other way around. I'm
also not asking you
to prove a negative; I'm hinting at exploring whether compilers can be
made smarter,
and what the effort difference of that compared to teaching compilers
new lifetime rules is.

Nicol Bolas

unread,
Mar 29, 2017, 3:28:18 PM3/29/17
to ISO C++ Standard - Future Proposals

I think a better case would be for a type which is not trivially copyable. I expect compilers to be able to optimize away a loop over a trivial destructor. It's a lot harder for them to optimize away a loop over a non-trivial destructor.

Compare the difference between a `vector<T*>` and `vector<unique_ptr<T>>`. The former will have the same move performance characteristics as a relocatable `unique_ptr` would.

Nicol Bolas

unread,
Mar 29, 2017, 3:41:56 PM3/29/17
to ISO C++ Standard - Future Proposals
On Wednesday, March 29, 2017 at 2:37:34 PM UTC-4, Ville Voutilainen wrote:
On 29 March 2017 at 21:19, Matthew Woehlke <mwoehlk...@gmail.com> wrote:
> This suggests that either a) memcpy is slower than we think it is, or b)
> the compiler is able to optimize the move case better than we think it
> can. (Note: I have not attempted to examine the generated machine
> instructions for any of these tests.)

Perhaps a proposal author should.

> Interestingly, if I increase the item count to 100k, the difference
> between the move and relocate cases becomes noise.

Which was, I think, a claim made by some opponents of destructive move.

>> and show that similar improvements aren't achievable without
>> fundamental changes to the object and memory model.
>
> I can't do that; you're asking us to prove a negative, which any
> educated person knows is next to impossible. I think the onus should
> instead be on the nay-sayers to demonstrate a plausible way of achieving
> similar benefits *without* relocation.

You can think whatever you want, but the onus is on a proposal author
to convince
the committee to accept the proposal, not the other way around.

Tell me: did the various rvalue-reference proposals have to go through so much scrutiny? Did their authors have to prove that compilers couldn't find ways to detect "movement" type operations without explicit syntax and object models for it? What about the proposals that led to the creation of the TriviallyCopyable concept; did the authors of those have to prove that compilers would be incapable of achieving performance equivalent to a `memcpy` without letting us actually `memcpy` them?

There have been plenty of proposals that have been accepted that don't rise to such a level of scrutiny. Why is this one so special?

And most importantly of all, why does it matter? What is the reason for being against explicitly providing users the ability to do such things, when the alternative is essentially hoping that compilers can optimize the cruft away?

I don't understand the logic of declaring QoI to be an effective strategy of making C++ fast.

I'm
also not asking you
to prove a negative; I'm hinting at exploring whether compilers can be
made smarter,
and what the effort difference of that compared to teaching compilers
new lifetime rules is.

But there is no effort in teaching compilers "new lifetime rules," since there are no "new lifetime rules". Consider the following function:

void func(T *src, size_t count = 1)
{
 
new(src) unsigned char[count * sizeof(T)];
}

This function ends the lifetime of all objects stored between `src` and `src + count`. The compiler must already be able to handle this, because that's part of the standard right now.

So a function that ends the lifetime of one of its arguments without calling the destructor is not new.

Jens Maurer

unread,
Mar 29, 2017, 4:16:06 PM3/29/17
to std-pr...@isocpp.org
On 03/29/2017 09:41 PM, Nicol Bolas wrote:
> Tell me: did the various rvalue-reference proposals have to go through so much scrutiny?

I'm not sure about "various", but the one that ended up in the standard
did get its share of scrutiny.

> Did their authors have to prove that compilers couldn't find ways to detect "movement" type operations without explicit syntax and object models for it?

For an example such as std::vector<std::string>, I believe
we had performance charts and, as far as I remember, it was
clear that we would need to require the compiler to fold
allocations and deallocations of dynamic memory, which is
(in general) and non-analyzed external function call, absent
LTO or similar.

> What about the proposals that led to the creation of the TriviallyCopyable concept; did the authors of those have to /prove/ that compilers would be incapable of achieving performance equivalent to a `memcpy` without letting us actually `memcpy` them?

I don't know what you mean here. From a standards perspective, all
that "trivially copyable" does is enable [basic.types] p2 and p3;
note that these paragraphs are carefully crafted to refer to
existing objects (presumably within their lifetimes), whose values
are then modified using memcpy. That doesn't really help the
std::vector reallocation case.

On the implementation level, "trivially copyable" is a useful term
to indicate that copies are not observable by the user unless the
address of an object is taken (i.e. no user code is called to
perform the copy), so those types are suitable for putting into
registers or, in fact, for memcpy'ing on the implementation level.

> There have been plenty of proposals that have been accepted that /don't/ rise to such a level of scrutiny. Why is this one so special?

Because 20+ years of exposure to the C++ memory model makes us
scared as hell to try to mess with it. For example, it took us
that long to realize that std::vector (in particular the data()
member returning a pointer to an array lookalike) isn't actually
implementable with the current memory model. See also P0137R1.

> And most importantly of all, why does it matter? What is the reason for being against explicitly providing users the ability to do such things, when the alternative is essentially /hoping/ that compilers can optimize the cruft away?

Because breaking the memory model means breaking good-looking
C++ code in unpredictable ways once the optimizers get an
understanding of the memory model update.

That is much more scary than adding yet another syntax-level
feature such as structured bindings or implicit comparisons or
even concepts.

> I don't understand the logic of declaring QoI to be an effective strategy of making C++ fast.

Because QoI is what keeps the workload for the committee manageable?

> But there is no effort in teaching compilers "new lifetime rules," since there are no "new lifetime rules". Consider the following function:

> void func(T *src, size_t count =1)
> {
> new(src) unsigned char[count * sizeof(T)];
> }
>
> This function ends the lifetime of all objects stored between `src` and `src + count`.

Yes,
- possibly already causing undefined behavior if those objects had a non-trivial
destructor,
- using "new", which we already taught to the optimizers as being "special".

> The compiler must already be able to handle this, because that's part of the standard right now.
>
> So a function that ends the lifetime of one of its arguments without calling the destructor is not new.

Sure, but is it worth the effort (for the destructive move case)?

Jens

Matthew Woehlke

unread,
Mar 29, 2017, 4:17:10 PM3/29/17
to std-pr...@isocpp.org
On 2017-03-29 15:28, Nicol Bolas wrote:
> On Wednesday, March 29, 2017 at 2:19:22 PM UTC-4, Matthew Woehlke wrote:
>> I created a type that contains a unique_ptr<string>, which is always
>> initialized to "hello, world!". The copy ctor, dtor, and assignment
>> operator were all defaulted. I then inserted 1000 of these into a
>> modified¹ QVector in three trials; one with the move ctor suppressed,
>> one with a defaulted move ctor, and one with Q_MOVABLE_TYPE. I also ran
>> both sets with -O0 and -O3.
>>
>> Results:
>>
>> -O0 -O3
>> copy 5.53 10.37
>> move 5.40 8.53
>> relocate 4.43 8.24
>>
>> (The -O0 numbers are smaller because I ran 5x as many iterations for the
>> -O3 case.)
>>
>> I'm actually surprised the difference is so small [...]
>
> I think a better case would be for a type which is *not* trivially
> copyable. I expect compilers to be able to optimize away a loop over a
> trivial destructor. It's a lot harder for them to optimize away a loop over
> a non-trivial destructor.

...but the dtor *isn't* trivial! It has to check if the pointer is null,
and free it if not. Nor is move construction trivial; it has to steal
the pointer from the moved-from unique_ptr. Apparently the compiler is
able to detect that it will always be null because the unique_ptr was
moved-from and can thus optimize away the dtor.

The only other possibility that comes to mind is that Qt is treating my
type as relocatable *without* Q_MOVABLE_TYPE if it has a move ctor...

> Compare the difference between a `vector<T*>` and `vector<unique_ptr<T>>`.
> The former will have the same move performance characteristics as a
> relocatable `unique_ptr` would.

Okay, that seems like a plausible test:

vector<unique_ptr<int> 4.4
vector<int*> 2.8

Well. Those are better numbers, anyway :-).

--
Matthew

Jens Maurer

unread,
Mar 29, 2017, 5:05:29 PM3/29/17
to std-pr...@isocpp.org
On 03/29/2017 08:19 PM, Matthew Woehlke wrote:
> On 2017-03-28 16:20, Ville Voutilainen wrote:
>> The *"explained"* performance benefits? Show us hard numbers,
>
> Okay.
>
> I created a type that contains a unique_ptr<string>, which is always
> initialized to "hello, world!". The copy ctor, dtor, and assignment
> operator were all defaulted. I then inserted 1000 of these into a
> modified¹ QVector in three trials; one with the move ctor suppressed,
> one with a defaulted move ctor, and one with Q_MOVABLE_TYPE. I also ran
> both sets with -O0 and -O3.
>
> Results:
>
> -O0 -O3
> copy 5.53 10.37

How does "copy" work, since the type you created doesn't seem
have a copy constructor? (At least unique_ptr doesn't.)

> move 5.40 8.53
> relocate 4.43 8.24

Thanks for the experiment. On my 32-bit Intel box,

struct S {
std::unique_ptr<std::string> p;
};

void f(S * dst, S * src, S * end)
{
while (src != end)
new (dst++) S(std::move(*src++));
}

compiles to (inner loop shown only; gcc 6.3.0):

.L3:
addl $4, %eax
testl %edx, %edx
je .L4
movl -4(%eax), %ecx
movl $0, -4(%eax)
movl %ecx, (%edx)
.L4:
addl $4, %edx
cmpl %eax, %ebx
jne .L3

(I'm not sure what the null pointer check for the
destination is supposed to do here.)

Presumably, the optimizer doesn't know in this case that
dst and src won't overlap. I've tried both "restrict" and
a local "malloc" for dst to teach the optimizer that fact,
but failed.

(Even for memcpy, the optimizer needs to assert non-overlap.)

This doesn't look far from "memcpy" and "memset", it seems to me.
All memory accesses are sequential, which greatly helps throughput
and automatic prefetch.

> Interestingly, if I increase the item count to 100k, the difference
> between the move and relocate cases becomes noise.

Yup, you're measuring RAM access latency and not much else.

> (¹ Stock QVector does not use move construction during resizing; I used
> a local copy that I modified to do so.)
>
> Another interesting test I have not done would be sorting a list when
> the swap operation can be implemented either as relocation or "the hard
> way". Relocation would especially win if

Sentence chopped off?

Jens

Ville Voutilainen

unread,
Mar 29, 2017, 5:14:12 PM3/29/17
to ISO C++ Standard - Future Proposals
On 30 March 2017 at 00:05, Jens Maurer <Jens....@gmx.net> wrote:
> .L3:
> addl $4, %eax
> testl %edx, %edx
> je .L4
> movl -4(%eax), %ecx
> movl $0, -4(%eax)
> movl %ecx, (%edx)
> .L4:
> addl $4, %edx
> cmpl %eax, %ebx
> jne .L3
>
> (I'm not sure what the null pointer check for the
> destination is supposed to do here.)

That would be https://gcc.gnu.org/bugzilla/show_bug.cgi?id=35878

Thiago Macieira

unread,
Mar 29, 2017, 7:06:08 PM3/29/17
to std-pr...@isocpp.org
On terça-feira, 28 de março de 2017 13:20:25 PDT Ville Voutilainen wrote:
> The *"explained"* performance benefits? Show us hard numbers, and show
> that similar
> improvements aren't achievable without fundamental changes to the
> object and memory model.

Here are my numbers. This was also posted to the Qt development mailing list,
where the same discussion is happening.

Source code: http://paste.opensuse.org/83426813
[Does not compile with standard Qt, use [1]]

Without further ado, results. Analysis and conclusions below.

GCC 6.3:,CPU cycles,% of move,
copyTrivial,"7.015.001,04688",105%,
moveTrivial,"6.653.361,60938",100%,
memcpyTrivial,"5.556.582,67969",84%,
reallocTrivial,"3.209.727,01953",48%,
copyComplex,"50.938.402,71875",99%,
moveComplex,"51.306.523,56250",100%,
memcpyComplex,"40.689.058,50000",79%,
reallocComplex,"13.450.587,92188",26%,
,,,
Clang trunk:,,,
copyTrivial,"6.366.532,53906",101%,
moveTrivial,"6.284.109,14063",100%,
memcpyTrivial,"6.410.652,50000",102%,
reallocTrivial,"3.926.226,85547",62%,
copyComplex,"44.778.841,93750",84%,
moveComplex,"53.097.208,06250",100%,
memcpyComplex,"38.966.088,87500",73%,
reallocComplex,"12.701.627,09375",24%,
,,,
ICC 17:,,,
copyTrivial,"7.656.487,46094",100%,
moveTrivial,"7.688.472,17969",100%,
memcpyTrivial,"8.346.315,19531",109%,
reallocTrivial,"4.885.068,30859",64%,
copyComplex,"68.326.384,37500",91%,
moveComplex,"75.145.256,50000",100%,
memcpyComplex,"52.546.942,75000",70%,
reallocComplex,"29.427.426,37500",39%,

Analysis:

1) I've valground the application and it leaks no memory

2) the application was compiled with exceptions disabled, as we're not testing
the exceptional code paths. This also simulates a complex type that has a
noexcept move constructor.

3) because this is using QArrayData allocation with GrowsForward, the
reallocation strategy does not happen on every execution. In fact, to append
one million items, the allocate() function is called only 59 times.

4) the benchmarks only the appending into the vector. The freeing of the
vector's elements at the end is not included.

5) there are four times two tests:
copy + destroy original
move + destroy original
memcpy
pure realloc
run each for
a trivial type (int)
a complex but relocatable type (QString)

6) the test is skewed towards realloc because there's no other memory
allocation, so realloc() is free to resize the memory block in-place, as it
sees fit. Tough luck for the other tests.

7) the copy and move operations for the trivial type are, as expected, within
the noise of each other. The 5% of the spike for GCC does not appear in all
runs (the first test suffers a little due to warming up and the need to obtain
memory from the OS).

8) the difference between copy/move and memcpy for the trivial type depend on
the compiler. I compiled using -O2, which means GCC did not use its
vectoriser, but Clang did (-march=native, so they used AVX2). Meanwhile, all
three called the libc memcpy function in the memcpy version, which has
vectorisation (in ICC's case, it called __intel_avx_rep_memcpy).

Conclusions:

A) for trivial types, the relocation provides negligible, if any, improvement,
as the operation is already there. But as shown, not all compilers are as
smart and may miss optimisation opportunities by its absence. The more
information the compiler has, the better it can generate code.

B) for non-trivial types, the relocation provides 25% or more performance
improvement (1 / 80% = 125%). That's probably because QString's destructor is
actually non-negligible, as it needs to check if the data block it points to
needs to be freed.

C) the realloc strategy is clearly the winner here, achieving up to 316%
improvement over today's state-of-the-art. As stated in the analysis, this
happens with only 59 allocations out of 1 million elements appended. Now,
since nothing else was allocated in this benchmark, realloc() most likely did
not have to perform memcpy, so these numbers are not representative of all
applications, but they are possible and can even be surpassed with dedicated
arena allocators.

[1] https://gitlab.com/thiagomacieira/qtbase (rebases often!)

Andrey Davydov

unread,
Mar 30, 2017, 1:23:12 AM3/30/17
to ISO C++ Standard - Future Proposals
Perhaps a proposal author should.
Ok, there is my benchmark, tell me please if it contains any systematic error.
#include <memory>
#include <vector>
#include <cstring>
#include <chrono>
#include <iostream>

template<class T>
void relocate_using_move(T * new_buffer, T * old_buffer, size_t size)
{
    for (size_t i = 0; i != size; ++i)
        new_buffer[i] = std::move_if_noexcept(old_buffer[i]);

    for (size_t i = 0; i != size; ++i)
        old_buffer[i].~T();
};

template<class T>
void relocate_using_memcpy(T * new_buffer, T * old_buffer, size_t size)
{
    using value_storage_type = std::aligned_storage_t<sizeof(T), alignof(T)>; 
    std::memcpy(new_buffer, old_buffer, size * sizeof(value_storage_type));
};

using clock_type = std::chrono::high_resolution_clock;

template<class Functor>
clock_type::duration repeat_n_times(Functor f, size_t times)
{
    auto start = clock_type::now();
    while (times --> 0)
        f();
    auto finish = clock_type::now();
    return finish - start;
}

void print(const char * message, clock_type::duration time)
{
    using namespace std::chrono;

    std::cout << message << ": " << duration_cast<milliseconds>(time).count() << "ms\n";
}

template<class T>
void destroy(T * buffer, size_t size)
{
    for (size_t i = 0; i != size; ++i)
        buffer[i].~T();
}

int main()
{
    using value_type = std::vector<int>;
    using value_storage_type = std::aligned_storage_t<sizeof(value_type), alignof(value_type)>;

    const size_t N = 100000;

    std::unique_ptr<value_storage_type[]> buf1(new value_storage_type[N]);
    std::unique_ptr<value_storage_type[]> buf2(new value_storage_type[N]);

    for (size_t i = 0; i != N; ++i)
        new(&buf1[i]) value_type(10);

    const auto buf_ptr1 = reinterpret_cast<value_type *>(buf1.get());
    const auto buf_ptr2 = reinterpret_cast<value_type *>(buf2.get());

    const size_t invocations_num = 10;

    print("relocate using move  ", repeat_n_times([&] () {
        relocate_using_move(buf_ptr2, buf_ptr1, N);
        relocate_using_move(buf_ptr1, buf_ptr2, N);
    }, 10));
    print("relocate using memcpy", repeat_n_times([&] () {
        relocate_using_memcpy(buf_ptr2, buf_ptr1, N);
        relocate_using_memcpy(buf_ptr1, buf_ptr2, N);
    }, 10));

    destroy(buf_ptr1, N);
}
Results for `gcc -O2`:
relocate using move  : 18ms
relocate using memcpy: 4ms
Results for `gcc -O3`:
relocate using move  : 15ms 
relocate using memcpy: 4ms

System information:
gcc version 6.3.1
Linux 4.9.11
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Matthew Woehlke

unread,
Mar 30, 2017, 10:12:09 AM3/30/17
to std-pr...@isocpp.org
On 2017-03-29 17:05, Jens Maurer wrote:
> On 03/29/2017 08:19 PM, Matthew Woehlke wrote:
>> I created a type that contains a unique_ptr<string>, which is always
>> initialized to "hello, world!". The copy ctor, dtor, and assignment
>> operator were all defaulted. I then inserted 1000 of these into a
>> modified¹ QVector in three trials; one with the move ctor suppressed,
>> one with a defaulted move ctor, and one with Q_MOVABLE_TYPE. I also ran
>> both sets with -O0 and -O3.
>>
>> Results:
>>
>> -O0 -O3
>> copy 5.53 10.37
>
> How does "copy" work, since the type you created doesn't seem
> have a copy constructor? (At least unique_ptr doesn't.)

Right; sorry. My type had a reasonable but not defaulted copy ctor:

item(item const& other)
: data{std::make_unique<std::string>(*other.data)}
{}

(I was originally using plain old `int` as the data type; the results
were comparable to using `std::string`. The original intent was to have
a type that acted like a simple CoW value type.)

>> Another interesting test I have not done would be sorting a list when
>> the swap operation can be implemented either as relocation or "the hard
>> way". Relocation would especially win if
>
> Sentence chopped off?

I guess... alas, now I can't remember what I meant to say :-).

--
Matthew

Jens Maurer

unread,
Mar 30, 2017, 11:25:36 AM3/30/17
to std-pr...@isocpp.org
On 03/29/2017 11:05 PM, Jens Maurer wrote:
> void f(S * dst, S * src, S * end)
> {
> while (src != end)
> new (dst++) S(std::move(*src++));
> }
>
> compiles to (inner loop shown only; gcc 6.3.0):
>
> .L3:
> addl $4, %eax
> testl %edx, %edx
> je .L4
> movl -4(%eax), %ecx
> movl $0, -4(%eax)
> movl %ecx, (%edx)
> .L4:
> addl $4, %edx
> cmpl %eax, %ebx
> jne .L3

Update with calling the destructor interleaved:

#define __restrict__
void f(S * __restrict__ dst, S * __restrict__ src, S * end)
{
while (src != end) {
new (dst++) S(std::move(*src));
(*src).~S();
++src;
}
}

.L12:
testl %esi, %esi
je .L3
movl (%ebx), %eax
movl $0, (%ebx)
movl %eax, (%esi)
.L3:
movl (%ebx), %edi
testl %edi, %edi
je .L4
/* destructor of std::string */
.L4:
addl $4, %ebx
addl $4, %esi
cmpl %ebx, %ebp
jne .L12

Again, the point here is that the compiler doesn't know that
"dst" and "src" don't overlap. Adding __restrict__ helps
(I don't know why it didn't work before):

void f(S * __restrict__ dst, S * __restrict__ src, S * end)
{
while (src != end) {
new (dst++) S(std::move(*src));
(*src).~S();
++src;
}
}

.L10:
testl %esi, %esi
je .L3
movl (%ebx), %eax
movl $0, (%ebx)
movl %eax, (%esi)
.L4:
addl $4, %ebx
addl $4, %esi
cmpl %ebx, %edi
jne .L10

So, the destructor invocation went away entirely, including
the condition branch. It seems worthwhile to invest in syntax
means to inform the compiler of pointer value restrictions;
such a facility would presumably be more widely applicable
than destructive moves.

Jens

Jens Maurer

unread,
Mar 30, 2017, 11:28:18 AM3/30/17
to std-pr...@isocpp.org
On 03/30/2017 01:05 AM, Thiago Macieira wrote:
> Here are my numbers. This was also posted to the Qt development mailing list,
> where the same discussion is happening.
>
> Source code: http://paste.opensuse.org/83426813
> [Does not compile with standard Qt, use [1]]
>
> Without further ado, results. Analysis and conclusions below.

> copyComplex,"44.778.841,93750",84%,
> moveComplex,"53.097.208,06250",100%,

> copyComplex,"68.326.384,37500",91%,
> moveComplex,"75.145.256,50000",100%,

This is odd. Why is the copy less expensive than the move?

Jens

Thiago Macieira

unread,
Mar 30, 2017, 11:54:37 AM3/30/17
to std-pr...@isocpp.org
On quarta-feira, 29 de março de 2017 22:23:12 PDT Andrey Davydov wrote:
> Ok, there is my benchmark, tell me please if it contains any systematic
> error.
[cut]
> Results for `gcc -O2`:
> relocate using move : 18ms
> relocate using memcpy: 4ms
> Results for `gcc -O3`:
> relocate using move : 15ms
> relocate using memcpy: 4ms

Nope, that looks like what I'd expect. You gave me a scare for a second for
using std::vector<int>, but you're not measuring vector's time for moving
ints, but the time to move std::vector<int> itself. Your code makes a
(correct) assumption about whether libstdc++'s std::vector<int> is
relocatable.

One hint: run for longer than a couple of milliseconds, to get any transients
smoothed out.

Thiago Macieira

unread,
Mar 30, 2017, 12:44:15 PM3/30/17
to std-pr...@isocpp.org
Checking the assembly that Clang and ICC generated...

For ICC: I can see the move constructor there. ICC decided to unroll the
moving loop once, so it moves two QStrings per loop, which makes each loop
operate on a multiple of 16 (64-bit QString for me is 24 bytes). In the copy
case, I do not see any unrolling. So it's possible that the difference can be
attributed to the additional overhead due to the unrolling.

Remember: QString is copy-on-write and since the QStrings in this are all
empty, the copy constructor copies the 24 bytes, tests one bit and then
continues to the next iteration. It's quite fast.

For Clang: there's no difference! Somehow, Clang called the move constructor
in the copy case! That doesn't make sense to me. Why is it doing that?

As for the difference in time, it appears to be noise. When running again, the
move and copy trade places as the fastest.

Andrey Davydov

unread,
Mar 31, 2017, 2:00:33 AM3/31/17
to ISO C++ Standard - Future Proposals
On quarta-feira, 29 de março de 2017 22:23:12 PDT Andrey Davydov wrote:
> Ok, there is my benchmark, tell me please if it contains any systematic
> error.
[cut]
> Results for `gcc -O2`:
> relocate using move  : 18ms
> relocate using memcpy: 4ms
> Results for `gcc -O3`:
> relocate using move  : 15ms
> relocate using memcpy: 4ms

Nope, that looks like what I'd expect. You gave me a scare for a second for
using std::vector<int>, but you're not measuring vector's time for moving
ints, but the time to move std::vector<int> itself. Your code makes a
(correct) assumption about whether libstdc++'s std::vector<int> is
relocatable.
I have used std::vector (instead of std::unique_ptr, for example) because its move constructor consists of quite a few operations (relatively to unique_ptr).

One hint: run for longer than a couple of milliseconds, to get any transients
smoothed out.
It can be seen that when the test dataset is placed entirely in L3 cache, then relocation using memcpy is faster about 4 times (for instance, 213ms against 854 ms). In the other case it's faster slightly less th 2 times (2594 against 4751ms).

Antony Polukhin

unread,
Mar 31, 2017, 3:27:15 AM3/31/17
to std-pr...@isocpp.org
2017-03-31 9:00 GMT+03:00 Andrey Davydov <andrey.a...@gmail.com>:
>> On quarta-feira, 29 de março de 2017 22:23:12 PDT Andrey Davydov wrote:
>> > Ok, there is my benchmark, tell me please if it contains any systematic
>> > error.
>> [cut]
>> > Results for `gcc -O2`:
>> > relocate using move : 18ms
>> > relocate using memcpy: 4ms
>> > Results for `gcc -O3`:
>> > relocate using move : 15ms
>> > relocate using memcpy: 4ms

As I understand, the is_relocatable trait is mostly for speeding up
the std::vector::resize/uinitialized_move+deallocate_range/std::vector::reserve.

GCC's std::vector::reserve has the following code:

pointer __tmp = _M_allocate_and_copy(__n,
_GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(this->_M_impl._M_start),
_GLIBCXX_MAKE_MOVE_IF_NOEXCEPT_ITERATOR(this->_M_impl._M_finish));
std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
_M_get_Tp_allocator());

From the banchmarks from above it seems that std::vector::reserve is
not well optimized.



Here are some of my experiments. The following chunk of code is very
well optimized by GCC 7.0.1 20170330 with -std=c++1z -O2

#include <memory>
void relocate_using_move(std::shared_ptr<int>*__restrict__ new_buffer,
std::shared_ptr<int>*__restrict__ old_buffer, size_t size) {
typedef std::shared_ptr<int> T;
for (size_t i = 0; i != size; ++i) {
new (new_buffer + i) T{std::move(old_buffer[i])};
old_buffer[i].~T();
}
}


Disassembly looks like a simple loop:

relocate_using_move(std::shared_ptr<int>*, std::shared_ptr<int>*,
unsigned long):
test rdx, rdx
je .L1
sal rdx, 4
add rdx, rdi
.L3:
mov rax, QWORD PTR [rsi]
add rdi, 16
add rsi, 16
mov QWORD PTR [rdi-16], rax
mov rax, QWORD PTR [rsi-8]
mov QWORD PTR [rdi-8], rax
cmp rdi, rdx
jne .L3
.L1:
rep ret


But any minor change to that code and the disassembly becomes *much*
worse. Minor changes I've tried:
* removing the first __restrict__
* removing the second __restrict__
* making 2 loops instead of a single one:

void relocate_using_move(std::shared_ptr<int>*__restrict__ new_buffer,
std::shared_ptr<int>*__restrict__ old_buffer, size_t size) {
typedef std::shared_ptr<int> T;
for (size_t i = 0; i != size; ++i) {
new (new_buffer + i) T{std::move(old_buffer[i])};
}

for (size_t i = 0; i != size; ++i) {
old_buffer[i].~T();
}
}


GCC's std::vector::reserve has all 3 optimization barriers from above:
no __restrict__s and two loops under the hood.


QUESTION: Does the C++ memory model allows to transform

void f(T* new_buffer, T* old_buffer, size_t size) {
for (size_t i = 0; i != size; ++i) {
new (new_buffer + i) T{std::move(old_buffer[i])};
}

for (size_t i = 0; i != size; ++i) {
old_buffer[i].~T();
}
}


into

void f(T* __restrict__ new_buffer, T* __restrict__ old_buffer, size_t size) {
for (size_t i = 0; i != size; ++i) {
new (new_buffer + i) T{std::move(old_buffer[i])};
old_buffer[i].~T();
}
}


If the answer to the question is "yes", then it's better not to have
the proposed trait as tuning the optimizer will do the job. If the
answer is "no", then looks like the C++ memory model has to be
improved. If the answer is "no" and the memory model could not be
improved, then the proposed trait or some unitialized_move_destroy
function seem to be the right answer.

--
Best regards,
Antony Polukhin

Jens Maurer

unread,
Mar 31, 2017, 3:41:47 AM3/31/17
to std-pr...@isocpp.org
This matches my experience; see my other e-mail.
(The useless null pointer check is now removed; good.
And the dead store to the move-source has gone away, too.)

In order to get optimal performance, we now need to teach
the gcc optimizer to recognize that this is a simple
memcpy loop. Although recent Intel hardware should get
pretty close to memcpy speeds with a simple loop like
the above.

> QUESTION: Does the C++ memory model allows to transform
>
> void f(T* new_buffer, T* old_buffer, size_t size) {
> for (size_t i = 0; i != size; ++i) {
> new (new_buffer + i) T{std::move(old_buffer[i])};
> }
>
> for (size_t i = 0; i != size; ++i) {
> old_buffer[i].~T();
> }
> }
>
>
> into
>
> void f(T* __restrict__ new_buffer, T* __restrict__ old_buffer, size_t size) {
> for (size_t i = 0; i != size; ++i) {
> new (new_buffer + i) T{std::move(old_buffer[i])};
> old_buffer[i].~T();
> }
> }

The standard library can certainly do that transformation inside its
std::vector implementation, provided that the move constructor is
non-throwing.

On a C++ level, we also need a non-throwing move constructor.
Other than that, I don't think std::vector specifies when it
calls move constructors vs. destructors when it reallocates,
so the transformation is probably good under as-if.

> If the answer to the question is "yes", then it's better not to have
> the proposed trait as tuning the optimizer will do the job.

Why don't we simply ask the standard library implementer to fuse
the two loops and add a few __restrict__ annotations?

> If the
> answer is "no", then looks like the C++ memory model has to be
> improved. If the answer is "no" and the memory model could not be
> improved, then the proposed trait or some unitialized_move_destroy
> function seem to be the right answer.

Jens

Thiago Macieira

unread,
Mar 31, 2017, 12:06:33 PM3/31/17
to std-pr...@isocpp.org
On sexta-feira, 31 de março de 2017 00:41:43 PDT Jens Maurer wrote:
> Why don't we simply ask the standard library implementer to fuse
> the two loops and add a few __restrict__ annotations?

We should, but from my benchmarks, we're still leaving 50% of the performance
on the floor by not permitting realloc().

Jens Maurer

unread,
Mar 31, 2017, 1:31:42 PM3/31/17
to std-pr...@isocpp.org
On 03/31/2017 06:06 PM, Thiago Macieira wrote:
> On sexta-feira, 31 de março de 2017 00:41:43 PDT Jens Maurer wrote:
>> Why don't we simply ask the standard library implementer to fuse
>> the two loops and add a few __restrict__ annotations?
>
> We should, but from my benchmarks, we're still leaving 50% of the performance
> on the floor by not permitting realloc().

But I thought the benchmark setup was such that realloc() would actually
extend the allocation in-place and not perform a copy at all?

If so, a future interface for C++ realloc() could take that into
account. Not copying or moving at all is certainly an option for
std::vector when being enlarged. if the memory subsystem allows it.
Note that this also works for types that are not relocatable.

Jens


Thiago Macieira

unread,
Mar 31, 2017, 7:11:00 PM3/31/17
to std-pr...@isocpp.org
On sexta-feira, 31 de março de 2017 10:31:27 PDT Jens Maurer wrote:
> On 03/31/2017 06:06 PM, Thiago Macieira wrote:
> > On sexta-feira, 31 de março de 2017 00:41:43 PDT Jens Maurer wrote:
> >> Why don't we simply ask the standard library implementer to fuse
> >> the two loops and add a few __restrict__ annotations?
> >
> > We should, but from my benchmarks, we're still leaving 50% of the
> > performance on the floor by not permitting realloc().
>
> But I thought the benchmark setup was such that realloc() would actually
> extend the allocation in-place and not perform a copy at all?

Yes and no.

Yes, I did design it so that it could do that. And as I said in the analysis
and conclusions, given a good arena allocator, that's exactly what will happen
and therefore is a valid condition.

No because realloc() actually doesn't do it. When debugging another issue
related to qReallocAligned, I found out that glibc's realloc() will often move
things when the block grows. It probably has a heuristic algorithm to avoid
fragmentation. More importantly, when the size crosses a certain threshold
(128kB on glibc's malloc), it will start using mmap instead of the regular
heap, so it *has* to move, but again after that it will begin using mremap()
which will have a ~O(1) cost instead of O(n).

In all, I find that this is actually very representative of real world
situations, since there were realloc() calls that did not need memcpy and
there were some that did.

> If so, a future interface for C++ realloc() could take that into
> account. Not copying or moving at all is certainly an option for
> std::vector when being enlarged. if the memory subsystem allows it.
> Note that this also works for types that are not relocatable.

Non-relocatable types can't benefit from realloc() that can memcpy or mremap
behind your back.

We've been through this: that would require realloc_in_place() but no one has
got WG14 to accept that.

Matthew Fioravante

unread,
Apr 17, 2017, 1:18:52 PM4/17/17
to ISO C++ Standard - Future Proposals
I have not read the entire thread, but I can say one thing about this.

If one of the perceived goals of this proposal is to enable realloc() support in vector, then I think that's extremely misguided. The problem there is not being able to define relocatability but instead that the interface to realloc() is fundamentally broken. Instead of adding a kludge around realloc(), just fix realloc() instead.

A better approach is a memory allocation function like bool expand(T*, size_t new_size), which either grows the allocation in place if it can or returns false if it cannot. Then the calling code (which is templated on T) can fallback to allocating a new buffer and doing the copy/move operations as it currently does.

I think one of the reasons facebook and others introduce is_relocatable() is because realloc() is all we have currently. If expand() existed, do you think they would bother? I've done this is_relocatable() thing before. Its a huge pain and if I had expand() I wouldn't have bothered.

Matthew Fioravante

unread,
Apr 17, 2017, 1:30:14 PM4/17/17
to ISO C++ Standard - Future Proposals
I'd also like to point out that in place reallocation from expand() works for any type T. Whereas realloc() is limited to only T's that are relocatable. That's another huge disadvantage of realloc(). Its really not a function that can be useful in C++ and we need to stop discussing complex hacks like is_relocatable<T> to try to use it.

Thiago Macieira

unread,
Apr 17, 2017, 1:57:18 PM4/17/17
to std-pr...@isocpp.org
Em segunda-feira, 17 de abril de 2017, às 10:18:52 PDT, Matthew Fioravante
escreveu:
> If one of the perceived goals of this proposal is to enable realloc()
> support in vector, then I think that's extremely misguided. The problem
> there is not being able to define relocatability but instead that the
> interface to realloc() is fundamentally broken. Instead of adding a kludge
> around realloc(), just fix realloc() instead.

I don't remember for certain if anyone even asked WG14 for this. I have a
vague memory of someone reporting that they did and WG14 wasn't interested.
That means it's practically a closed case: we can't fix realloc, period.

> A better approach is a memory allocation function like bool expand(T*,
> size_t new_size), which either grows the allocation in place if it can or
> returns false if it cannot. Then the calling code (which is templated on T)
> can fallback to allocating a new buffer and doing the copy/move operations
> as it currently does.

Yes, we would like that. Other proposed names were realloc_inplace() or
malloc_resize(). That would be even better than using realloc, since we could
attempt to extend the allocation for non-relocatable T.

But looks like it will not be.

> I think one of the reasons facebook and others introduce is_relocatable()
> is because realloc() is all we have currently. If expand() existed, do you
> think they would bother? I've done this is_relocatable() thing before. Its
> a huge pain and if I had expand() I wouldn't have bothered.

Yes. expand() or resize() or whatever won't obviate the need for using realloc
for relocatable T, though. It's possible some implementations of realloc can
be even more efficient than malloc+memcpy+free, such as for example changing the
page tables and doing no memcpy. Therefore, realloc support is still required.

Magnus Fromreide

unread,
Apr 18, 2017, 2:28:51 AM4/18/17
to std-pr...@isocpp.org
On Mon, Apr 17, 2017 at 10:57:12AM -0700, Thiago Macieira wrote:
> Em segunda-feira, 17 de abril de 2017, às 10:18:52 PDT, Matthew Fioravante
> escreveu:
> > If one of the perceived goals of this proposal is to enable realloc()
> > support in vector, then I think that's extremely misguided. The problem
> > there is not being able to define relocatability but instead that the
> > interface to realloc() is fundamentally broken. Instead of adding a kludge
> > around realloc(), just fix realloc() instead.
>
> I don't remember for certain if anyone even asked WG14 for this. I have a
> vague memory of someone reporting that they did and WG14 wasn't interested.
> That means it's practically a closed case: we can't fix realloc, period.

Are you talking about c/n1085? That proposal was submitted to the WG14 but
nobody presented it and offered to do the work - as far as I understand
the standardization business that means they weren't interested enough
to do the work for us but they were not outright hostile about it.

This is at least how I am interpreting Howard Hinnant's message in the
thread with subject "try_realloc" from 2015-08-02.

/MF

(Does anyone know how to provide a stable link to a message?)
Reply all
Reply to author
Forward
0 new messages