template<typename T>
void relocate(T* new_buffer, T* old_buffer, size_t size)
{
    for (size_t i = 0; i != size; ++i)
        new (&new_buffer[i]) T(std::move_if_noexcept(old_buffer[i]));
    for (size_t i = 0; i != size; ++i)
        old_buffer[i].~T();
}

new (&new_place) T(std::move(old_place));
old_place.~T();

template<typename T>
enable_if_t<is_relocatable_v<T>> swap(T& x, T& y)
{
    constexpr auto size = sizeof(T);
    aligned_storage_t<size> tmp;
    memcpy(&tmp, &x, size);
    memcpy(&x, &y, size);
    memcpy(&y, &tmp, size);
}

vector<int> foo(bool c)
{
    vector<int> xs = { 1, 2, 3 }, ys = { 4, 5, 6 };
    if (c)
        return xs;
    else
        return ys;
}
auto result = foo(rand());

template<typename T, typename D>
struct is_relocatable<unique_ptr<T, D>> : bool_constant<is_relocatable_v<D>> {};

Well, but ~T does end the lifetime of T, whereas the equivalent "memcpy"
leaves the lifetime of the stuff at old_place untouched. Not to mention
that bit-blasting doesn't start the lifetime of something at new_place,
either.
> * shared_ptr: move constructor of the shared_ptr and destructor of the
> empty shared_ptr don't change shared count.
> * unique_ptr<T, D>, if D is relocatable (std::default_delete is
> trivially copyable therefore it is relocatable).
> * string and containers (at least when std::allocator is used).
What assembly code do you get from the existing move-construct and
destroy operations, compared to memcpy? Is there something we could
teach the optimizer to help out?
> * Define /relocatable class /in [class] after /trivially copyable class./
>
> A class s is a /relocatable class/ if it is either explicitly marked as
> relocatable or
>
> 1. all basic classes are relocatable,
What are "basic classes"?
> 2. all non-static data members are relocatable,
> 3. s has a non-virtual, non-deleted and non user-defined destructor.
Well, classes that have a pointer to itself are also not relocatable,
but your definition seems to make them so.
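To make the self-pointer objection concrete, here is a minimal sketch. `SelfRef` and `bytewise_image_keeps_invariant` are hypothetical names, not taken from the proposal. The type has an implicit destructor, so it would pass the three quoted conditions, yet a byte-for-byte image of it carries a pointer into the old object.

```cpp
#include <cstddef>
#include <cstring>
#include <utility>

// Hypothetical type: keeps a pointer to its own member. The destructor is
// implicit (not user-defined, not virtual, not deleted).
struct SelfRef {
    char buf[16];
    char* data;  // invariant: data == buf of the same object

    SelfRef() : buf{}, data(buf) {}

    // The move constructor re-establishes the invariant for the new object.
    SelfRef(SelfRef&& other) noexcept : data(buf) {
        std::memcpy(buf, other.buf, sizeof buf);
    }

    bool invariant_holds() const { return data == buf; }
};

// Copies the bytes of `src` into a scratch buffer and reports whether an
// object living in that buffer would still satisfy the invariant.
bool bytewise_image_keeps_invariant(const SelfRef& src) {
    alignas(SelfRef) unsigned char raw[sizeof(SelfRef)];
    std::memcpy(raw, &src, sizeof(SelfRef));

    // Read back the pointer value stored in the byte image.
    char* copied_data;
    std::memcpy(&copied_data, raw + offsetof(SelfRef, data), sizeof copied_data);

    // Where `buf` would live if the object were relocated into `raw`.
    char* relocated_buf = reinterpret_cast<char*>(raw) + offsetof(SelfRef, buf);
    return copied_data == relocated_buf;
}
```

A move-constructed `SelfRef` reports `invariant_holds() == true`, while `bytewise_image_keeps_invariant` returns false because the copied pointer still refers to the source's `buf`.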
> It should be discussed how to explicitly mark a class as relocatable.
> One of the possible solutions is to allow specializations of the type
> trait std::is_relocatable. For example,
>
> template<typename T, typename D>
> struct is_relocatable<unique_ptr<T, D>> : bool_constant<is_relocatable_v<D>> {};
I'm opposed to defining core language properties depending on the
specialization of a library-level type trait.
namespace std
{
    template<bool> struct conditional_relocatable_marker;

    template<>
    struct conditional_relocatable_marker<true>
    {
        using type = relocatable_marker;
    };

    template<>
    struct conditional_relocatable_marker<false>
    {
        struct type {};
    };

    // No "typename" here: a base-specifier is already known to name a type.
    template<typename T, typename D>
    class unique_ptr : conditional_relocatable_marker<is_relocatable_v<D>>::type
    {
        // ...
    };
}
Well, but ~T does end the lifetime of T, whereas the equivalent "memcpy"
leaves the lifetime of the stuff at old_place untouched. Not to mention
that bit-blasting doesn't start the lifetime of something at new_place,
either.
It's true, but what practical problems does it cause for relocatable types? Is it valid, for example, to implement std::copy over a contiguous range of POD types as a single call to memcpy?
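For reference, the trivially copyable case asked about here can be sketched as follows. `copy_trivial` and `Point` are hypothetical names used only for illustration. The sketch relies on the existing rule that copying the underlying bytes of a trivially copyable object into another object of the same type gives it the same value, and it assumes the two ranges do not overlap.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

struct Point { int x; int y; };  // hypothetical trivially copyable type

// Copies `count` objects with a single memcpy. Only valid for trivially
// copyable T and non-overlapping ranges.
template <typename T>
void copy_trivial(T* dst, const T* src, std::size_t count)
{
    static_assert(std::is_trivially_copyable<T>::value,
                  "memcpy-based copy requires a trivially copyable type");
    if (count != 0)
        std::memcpy(dst, src, count * sizeof(T));
}
```

Note that the destination objects here already exist, which is exactly the part that differs from relocating into raw storage.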
> * shared_ptr: move constructor of the shared_ptr and destructor of the
> empty shared_ptr don't change shared count.
> * unique_ptr<T, D>, if D is relocatable (std::default_delete is
> trivially copyable therefore it is relocatable).
> * string and containers (at least when std::allocator is used).
What assembly code do you get from the existing move-construct and
destroy operations, compared to memcpy? Is there something we could
teach the optimizer to help out?
I'd like to replace n move-constructor calls and n destructor calls (the destructor of std::unique_ptr, for example, contains an if) with a single memcpy when std::vector resizes.
T* std::relocate_and_construct(void *dst, T *src, size_t count = 1);
T* std::relocate_and_assign(T *dst, T *src, size_t count = 1);
On Tuesday, 28 March 2017 00:47:02 PDT Jens Maurer wrote:
> > can be replaced by memcpy(&new_place, &old_place, sizeof(T)). Of course,
> > all trivially copyable types are /relocatable /by definition. Let's
> > enumerate some relocatable types from the standard library.
>
> Well, but ~T does end the lifetime of T, whereas the equivalent "memcpy"
> leaves the lifetime of the stuff at old_place untouched. Not to mention
> that bit-blasting doesn't start the lifetime of something at new_place,
> either.
Obviously that needs to change. Relocatable types have trivial destructors, so
the lifetime of the object is tied to the lifetime of the storage allocation.
On Tuesday, 28 March 2017 10:20:24 PDT Nicol Bolas wrote:
> > Obviously that needs to change. Relocatable types have trivial
> > destructors, so
> > the lifetime of the object is tied to the lifetime of the storage
> > allocation.
>
> But they don't. `unique_ptr` does not have a trivial destructor. This is
> why it's important for the relocation operation to explicitly terminate the
> lifetime of the input value. Because for relocation to work, that
> termination cannot call the destructor.
Right.
I think we need to analyse this in the context of a larger problem, otherwise
we won't solve the long-term problems.
First, there's the destructive move, an operation that ties both a move
construction of the destination with the destruction of the source. It needs
to be a new function that one can add to their class. In most cases, it will
be what the move constructor does, except that it knows the source will never
ever be used again, so it need not leave that in a valid state.
For example, for std::shared_ptr:
- copy: increase reference count, leave source unchanged
- move: no change in reference count, reset source to empty
- move-destruct: no change in reference count, leave source unchanged
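The first two rows can be observed with today's std::shared_ptr; the third has no equivalent in the current language, so it is not shown. This is only a sketch, and `Counts` and `observe_shared_ptr` are hypothetical names.

```cpp
#include <memory>
#include <utility>

struct Counts {
    long after_copy;    // use_count observed after copying
    long after_move;    // use_count observed after moving
    bool source_empty;  // whether the moved-from pointer was reset
};

Counts observe_shared_ptr()
{
    auto p = std::make_shared<int>(42);         // one owner
    std::shared_ptr<int> copy = p;              // copy: count goes up, p unchanged

    Counts c;
    c.after_copy = p.use_count();

    std::shared_ptr<int> moved = std::move(p);  // move: count unchanged, p reset
    c.after_move = moved.use_count();
    c.source_empty = (p == nullptr);
    return c;
}
```

The write that resets `p` to empty is the work a move-destruct operation could skip, since the source would never be observed again.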
Second, we need a way to invoke this operation. It's not simply
a = std::move(b);
It could be:
template <typename T>
T *move_destroy(T *dest, T *source, size_t count = 1);
a function that requires ABI knowledge and possibly intrinsics. We also need
to solve the problem of whether the size cookie should be copied (probably
not).
Third, we have the relocation, which is a special case of move-destruction
that is a simple memcpy and abandon source.
Fourth, we have the case of triviality. A class that is trivially move
constructible and trivially destructible is trivially move-destructible. A
trivial move-destruction is a memcpy and abandon, so trivial move-destruction
is relocation.
The reason I'm separating the concerns here is that a type could implement a
move-destruction which is not simple memcpy + drop.
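A rough sketch of how the general and trivial cases could be dispatched. All names here (`is_trivially_relocatable_sketch`, `relocate_impl`, `relocate_sketch`) are hypothetical, and the trait simply encodes the "trivially move constructible and trivially destructible" rule from the fourth point. Under the current standard the memcpy path is only formally sanctioned for trivially copyable types, so treat this as an illustration of the idea rather than conforming code for every T.

```cpp
#include <cstddef>
#include <cstring>
#include <new>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical trait, not a standard one.
template <typename T>
struct is_trivially_relocatable_sketch
    : std::integral_constant<bool,
                             std::is_trivially_move_constructible<T>::value &&
                             std::is_trivially_destructible<T>::value> {};

// Trivial case: copy the bytes and abandon the source.
template <typename T>
void relocate_impl(T* dst, T* src, std::size_t count, std::true_type)
{
    if (count != 0)
        std::memcpy(static_cast<void*>(dst), static_cast<const void*>(src),
                    count * sizeof(T));
}

// General case: move-construct each element, then destroy its source.
template <typename T>
void relocate_impl(T* dst, T* src, std::size_t count, std::false_type)
{
    for (std::size_t i = 0; i != count; ++i) {
        ::new (static_cast<void*>(dst + i)) T(std::move(src[i]));
        src[i].~T();
    }
}

// dst must point to suitably aligned storage for `count` objects of type T.
template <typename T>
T* relocate_sketch(T* dst, T* src, std::size_t count)
{
    relocate_impl(dst, src, count, is_trivially_relocatable_sketch<T>{});
    return dst;
}

static_assert(is_trivially_relocatable_sketch<int>::value,
              "int takes the memcpy path");
static_assert(!is_trivially_relocatable_sketch<std::string>::value,
              "std::string takes the move-and-destroy path");
```

A type with its own move-destruction that is not a plain memcpy would need a third overload, which is the separation of concerns described above.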
I don't know if we could say that a complex type with a non-trivial move constructor and a non-trivial destructor could have a trivial move-destructor.
On 2017-03-28 13:50, Thiago Macieira wrote:
> Fourth, we have the case of triviality. A class that is trivially move
> constructible and trivially destructible is trivially move-destructible. A
> trivial move-destruction is a memcpy and abandon, so trivial move-destruction
> is relocation.
Okay, ugh... what I think most of us call "relocation" you are calling
"move-destruction". Can we please stick to "relocation" (what you call
move-destruction) and "trivial relocation" (what you call relocation)?
On 29 March 2017 at 21:19, Matthew Woehlke <mwoehlk...@gmail.com> wrote:
> This suggests that either a) memcpy is slower than we think it is, or b)
> the compiler is able to optimize the move case better than we think it
> can. (Note: I have not attempted to examine the generated machine
> instructions for any of these tests.)
Perhaps a proposal author should.
> Interestingly, if I increase the item count to 100k, the difference
> between the move and relocate cases becomes noise.
Which was, I think, a claim made by some opponents of destructive move.
>> and show that similar improvements aren't achievable without
>> fundamental changes to the object and memory model.
>
> I can't do that; you're asking us to prove a negative, which any
> educated person knows is next to impossible. I think the onus should
> instead be on the nay-sayers to demonstrate a plausible way of achieving
> similar benefits *without* relocation.
You can think whatever you want, but the onus is on a proposal author to
convince the committee to accept the proposal, not the other way around. I'm
also not asking you to prove a negative; I'm hinting at exploring whether
compilers can be made smarter, and how the effort of doing that compares to
the effort of teaching compilers new lifetime rules.
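template <typename T>
void func(T *src, size_t count = 1)
{
    // Reusing the storage ends the lifetime of the T objects without calling ~T.
    new(src) unsigned char[count * sizeof(T)];
}
Perhaps a proposal author should.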
#include <chrono>
#include <cstddef>
#include <cstring>
#include <iostream>
#include <memory>
#include <new>
#include <type_traits>
#include <utility>
#include <vector>

template<class T>
void relocate_using_move(T * new_buffer, T * old_buffer, size_t size)
{
    // new_buffer is raw storage, so construct in place instead of assigning.
    for (size_t i = 0; i != size; ++i)
        new (&new_buffer[i]) T(std::move_if_noexcept(old_buffer[i]));

    for (size_t i = 0; i != size; ++i)
        old_buffer[i].~T();
}

template<class T>
void relocate_using_memcpy(T * new_buffer, T * old_buffer, size_t size)
{
    using value_storage_type = std::aligned_storage_t<sizeof(T), alignof(T)>;
    // Cast to void* so the byte copy of a non-trivial T is explicit.
    std::memcpy(static_cast<void *>(new_buffer), static_cast<const void *>(old_buffer),
                size * sizeof(value_storage_type));
}

using clock_type = std::chrono::high_resolution_clock;

template<class Functor>
clock_type::duration repeat_n_times(Functor f, size_t times)
{
    auto start = clock_type::now();
    while (times-- > 0)
        f();
    auto finish = clock_type::now();
    return finish - start;
}

void print(const char * message, clock_type::duration time)
{
    using namespace std::chrono;

    std::cout << message << ": " << duration_cast<milliseconds>(time).count() << "ms\n";
}

template<class T>
void destroy(T * buffer, size_t size)
{
    for (size_t i = 0; i != size; ++i)
        buffer[i].~T();
}

int main()
{
    using value_type = std::vector<int>;
    using value_storage_type = std::aligned_storage_t<sizeof(value_type), alignof(value_type)>;

    const size_t N = 100000;

    std::unique_ptr<value_storage_type[]> buf1(new value_storage_type[N]);
    std::unique_ptr<value_storage_type[]> buf2(new value_storage_type[N]);

    // Only buf1 holds live objects; buf2 starts out as raw storage.
    for (size_t i = 0; i != N; ++i)
        new(&buf1[i]) value_type(10);

    const auto buf_ptr1 = reinterpret_cast<value_type *>(buf1.get());
    const auto buf_ptr2 = reinterpret_cast<value_type *>(buf2.get());

    const size_t invocations_num = 10;

    // Each invocation relocates there and back, so the objects end up in buf1 again.
    print("relocate using move  ", repeat_n_times([&] () {
        relocate_using_move(buf_ptr2, buf_ptr1, N);
        relocate_using_move(buf_ptr1, buf_ptr2, N);
    }, invocations_num));
    print("relocate using memcpy", repeat_n_times([&] () {
        relocate_using_memcpy(buf_ptr2, buf_ptr1, N);
        relocate_using_memcpy(buf_ptr1, buf_ptr2, N);
    }, invocations_num));

    destroy(buf_ptr1, N);
}
On Wednesday, 29 March 2017 22:23:12 PDT Andrey Davydov wrote:
> Ok, there is my benchmark, tell me please if it contains any systematic
> error.
[cut]
> Results for `gcc -O2`:
> relocate using move : 18ms
> relocate using memcpy: 4ms
> Results for `gcc -O3`:
> relocate using move : 15ms
> relocate using memcpy: 4ms
Nope, that looks like what I'd expect. You gave me a scare for a second for
using std::vector<int>, but you're not measuring vector's time for moving
ints, but the time to move std::vector<int> itself. Your code makes a
(correct) assumption about whether libstdc++'s std::vector<int> is
relocatable.
One hint: run for longer than a couple of milliseconds, to get any transients
smoothed out.
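One way to act on that hint, sketched under assumptions: instead of a fixed repetition count, keep calling the workload until a minimum wall-clock time has passed and report how many iterations fit. `Timing` and `run_for_at_least` are hypothetical helpers, not part of the benchmark above.

```cpp
#include <chrono>
#include <cstddef>

struct Timing {
    std::size_t iterations;                      // how many times f ran
    std::chrono::steady_clock::duration elapsed; // total time spent
};

// Calls f repeatedly until at least min_time has elapsed. The clock is read
// once per iteration, so f should do enough work to dwarf that overhead.
template <typename F>
Timing run_for_at_least(F f, std::chrono::milliseconds min_time)
{
    using clock = std::chrono::steady_clock;
    Timing t{0, clock::duration::zero()};
    const auto start = clock::now();
    do {
        f();
        ++t.iterations;
        t.elapsed = clock::now() - start;
    } while (t.elapsed < min_time);
    return t;
}
```

Dividing `elapsed` by `iterations` then gives a per-round figure that is less sensitive to start-up transients than a run of a few milliseconds.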