During the process of implementing the proposal P0957 (https://wg21.link/p0957), I found that if the concept of "Trivially Swappable" is defined, the performance of the implementation will be improved to a certain extent without reducing usability.
I am also wondering if this concept could help in generating default move constructors.
On Thursday, June 28, 2018 at 11:45:08 PM UTC-4, Mingxin Wang wrote:
> During the process of implementing the proposal P0957 (https://wg21.link/p0957), I found that if the concept of "Trivially Swappable" is defined, the performance of the implementation will be improved to a certain extent without reducing usability.
OK, so... what would this concept mean? Can you provide a definition of these requirements and what they would allow you to do?
> I am also wondering if this concept could help in generating default move constructors.
Do we need help generating default move constructors? Is `= default` not good enough? Or are you talking about something else?
--
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
On Friday, June 29, 2018 at 12:16:05 PM UTC+8, Nicol Bolas wrote:
> On Thursday, June 28, 2018 at 11:45:08 PM UTC-4, Mingxin Wang wrote:
> > During the process of implementing the proposal P0957 (https://wg21.link/p0957), I found that if the concept of "Trivially Swappable" is defined, the performance of the implementation will be improved to a certain extent without reducing usability.
> OK, so... what would this concept mean? Can you provide a definition of these requirements and what they would allow you to do?
Informally, a type meets the TriviallySwappable requirements if the "std::swap" function overload of this type performs a bitwise swap operation.
How does this interface with the concept of relocatable? One would think relocatable implies trivially swappable, same as noexcept movable implies noexcept swappable.
On Friday, June 29, 2018 at 9:19:41 AM UTC+2, Mingxin Wang wrote:
> Informally, a type meets the TriviallySwappable requirements if the "std::swap" function overload of this type performs bitwise swap operation.
And how would you detect or define this? Probably the best way would be to check whether a function `bit_swap(&a, &b)` exists. This function could work for relocation too (you swap with uninitialized memory).
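One way to make the detection question concrete is an opt-in trait combined with a `bit_swap` helper. The following is only a sketch under stated assumptions: `is_trivially_swappable`, `bit_swap`, and `generic_swap` are hypothetical names (none of them exist in the standard library), and the opt-in for `std::unique_ptr` assumes the common implementation in which it is a bare pointer. Byte-copying a non-trivially-copyable type is not sanctioned by the current standard, which is exactly what a new concept would have to address.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical trait (not in the standard): defaults to trivially copyable.
template <class T>
struct is_trivially_swappable : std::is_trivially_copyable<T> {};

// Assumed opt-in: unique_ptr with the default deleter is a bare pointer on
// common implementations. The standard does not guarantee this.
template <class T>
struct is_trivially_swappable<std::unique_ptr<T>> : std::true_type {};

// Swap the object representations of *a and *b through a byte buffer.
template <class T>
void bit_swap(T* a, T* b) noexcept {
  static_assert(is_trivially_swappable<T>::value,
                "T must opt in to bitwise swapping");
  if (a == b) return;  // memcpy requires non-overlapping ranges
  unsigned char tmp[sizeof(T)];
  std::memcpy(tmp, static_cast<const void*>(a), sizeof(T));
  std::memcpy(static_cast<void*>(a), static_cast<const void*>(b), sizeof(T));
  std::memcpy(static_cast<void*>(b), tmp, sizeof(T));
}

// Use the bitwise path when the trait allows it, else the ordinary swap.
template <class T>
void generic_swap(T& a, T& b) {
  if constexpr (is_trivially_swappable<T>::value) {
    bit_swap(std::addressof(a), std::addressof(b));
  } else {
    using std::swap;
    swap(a, b);
  }
}
```

A container or type-erased wrapper could call `generic_swap` and get the bitwise path only for types that have explicitly opted in.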
On Friday, June 29, 2018 at 1:56:01 AM UTC-7, Gašper Ažman wrote:
> How does this interface with the concept of relocatable? One would think relocatable implies trivially swappable, same as noexcept movable implies noexcept swappable.
In my C++Now 2018 talk on "trivially relocatable," I promised in the outline to talk about its relationship to "trivially swappable," and then did not actually do so; sorry!
Essentially, yes, if a type is trivially relocatable then it intuitively ought to be considered trivially swappable. However, there are two minor caveats that I can think of off the top of my head (and the reason I didn't talk about it at C++Now is that I haven't thought about it much, and the reason for *that* is that I don't have a motivating use-case).
Caveat (A): Trivial relocation can be optimized into memcpy() or memmove(). Trivial swap cannot be optimized into mem-anything, because there is no libc primitive for swapping arrays of bytes. We could certainly propose to add a __builtin_memswap() that would perform the swap "in-place" in cache-line-sized blocks, but I'm not aware of any proposals nor prior art in that area.
Caveat (B): Notice that whereas "relocate" means "move-construct, then destroy", we might say that "swap" means "move-construct, then move-assign, then move-assign, then destroy." (This being the operation done by the unconstrained std::swap template.) This involves a relationship among 3 operations, which might be a little scarier than relocate's relationship among 2 operations, which is scarier than the current Standard Library's "trivially X" traits which all involve only a single operation.
Caveat (C): For small types like unique_ptr, __builtin_memswap() will not be any faster than the unconstrained std::swap template. The point of optimizing into mem-anything is to get speedups on large arrays, such as during std::vector reallocation. std::vector swapping is already fast, and cannot be made faster by __builtin_memswap(). Now, std::array swapping could be made faster; consider:
std::array<std::unique_ptr<int>, 10000> a;
std::array<std::unique_ptr<int>, 10000> b;
a.swap(b); // could probably get a factor-of-2 speedup on this operation by using __builtin_memswap
But, this is not an operation that happens often enough in real programs for anyone to get really motivated about.
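To illustrate what Caveat (A) has in mind, here is a minimal sketch of a block-wise byte swap. It is not an existing libc or compiler primitive: the name `mem_swap` and the 64-byte block size are assumptions chosen for illustration, and a real `__builtin_memswap()` would presumably be tuned per target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: swap n bytes between two non-overlapping regions,
// one fixed-size block at a time, so the temporary stays small no matter
// how large the regions are.
inline void mem_swap(void* a, void* b, std::size_t n) noexcept {
  constexpr std::size_t kBlock = 64;  // assumed cache-line size
  unsigned char tmp[kBlock];
  auto* pa = static_cast<unsigned char*>(a);
  auto* pb = static_cast<unsigned char*>(b);
  while (n > 0) {
    const std::size_t chunk = n < kBlock ? n : kBlock;
    std::memcpy(tmp, pa, chunk);
    std::memcpy(pa, pb, chunk);
    std::memcpy(pb, tmp, chunk);
    pa += chunk;
    pb += chunk;
    n -= chunk;
  }
}
```

For the `std::array` case above, such a helper would replace 10000 element-wise swaps with a single pass over the two byte ranges, provided the element type is known to tolerate bitwise swapping.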
On Friday, 29 June 2018 00:19:40 PDT Mingxin Wang wrote:
> Informally, a type meets the *TriviallySwappable* requirements if the
> "std::swap" function overload of this type performs bitwise swap operation.
Why do you need the concept? Why can't you just use std::swap for your
use case?
On Saturday, June 30, 2018 at 6:51:12 AM UTC+8, Arthur O'Dwyer wrote:
> Caveat (A): Trivial relocation can be optimized into memcpy() or memmove(). Trivial swap cannot be optimized into mem-anything, because there is no libc primitive for swapping arrays of bytes. We could certainly propose to add a __builtin_memswap() that would perform the swap "in-place" in cache-line-sized blocks, but I'm not aware of any proposals nor prior art in that area.
I think it is acceptable just to make it "implementation-defined". If the type to swap is small, the compiler may generate specific instructions performing efficient copy operations for 1, 2, 4, 8... bytes (or tricks like `a ^= b ^= a ^= b`). Otherwise, the implementation may invoke `memcpy` twice. This is the usual implementation I have seen for the specializations of `std::swap`.
> Caveat (B): Notice that whereas "relocate" means "move-construct, then destroy", we might say that "swap" means "move-construct, then move-assign, then move-assign, then destroy." (This being the operation done by the unconstrained std::swap template.) This involves a relationship among 3 operations, which might be a little scarier than relocate's relationship among 2 operations, which is scarier than the current Standard Library's "trivially X" traits which all involve only a single operation.
By saying "I am also wondering if this concept could help in generating default move constructors", I am thinking of the possibility of making "swap" a primitive, in other words, making "swap beneath move", so that the generated move constructors are always exception-safe.
Compiler-generated move constructors are always as exception-safe as the move constructors they call. And your feature here can't change that.
On Saturday, June 30, 2018 at 9:09:21 AM UTC+8, Nicol Bolas wrote:
> On Friday, June 29, 2018 at 9:03:26 PM UTC-4, Mingxin Wang wrote:
> > By saying "I am also wondering if this concept could help in generating default move constructors", I am thinking of the possibility to make "swap" a primitive, in other words, to make "swap beneath move", and the generated move constructors are always exception-safe.
> That would not have that effect. A type with a throwing move constructor is one that cannot be empty, and therefore allocates state even if it is empty. The classic example being `std::list` implementations that are required to have a single node. Your "swap primitive" wouldn't be able to allocate that memory, so it could not use that implementation.
I do not see the difference between allocating constructions and non-allocating constructions. Generating move constructors with `swap` requires the types to be trivially swappable and default constructible, rather than trivially default constructible. Thus I think the move constructor of `std::list` can theoretically be generated with `swap`.
> Compiler-generated move constructors are always as exception-safe as the move constructors they call. And your feature here can't change that.
You are right about that. However, the default move constructors are not always correct, e.g. for `std::unique_ptr`, because there is a chance for the default move constructors to have different semantics from other hand-written constructors. Generating move constructors with `swap` could avoid such abuse.
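The "swap beneath move" idea can be sketched by hand for a single class: the move constructor default-constructs the new object and then swaps it with the source. This is only an illustration of the pattern being discussed, not a proposed language rule; the `buffer` class and its members are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical resource-owning type used only to illustrate the pattern.
class buffer {
 public:
  buffer() noexcept = default;  // cheap, non-allocating empty state
  explicit buffer(std::size_t n) : data_(new int[n]()), size_(n) {}

  // "Swap beneath move": default-construct, then swap with the source.
  // This is noexcept only because both of those steps are noexcept here.
  buffer(buffer&& other) noexcept : buffer() { swap(other); }

  buffer& operator=(buffer&& other) noexcept {
    buffer tmp(std::move(other));
    swap(tmp);
    return *this;
  }

  buffer(const buffer&) = delete;
  buffer& operator=(const buffer&) = delete;

  ~buffer() { delete[] data_; }

  void swap(buffer& other) noexcept {
    std::swap(data_, other.data_);
    std::swap(size_, other.size_);
  }

  std::size_t size() const noexcept { return size_; }

 private:
  int* data_ = nullptr;
  std::size_t size_ = 0;
};
```

Note that the pattern leans on the default constructor: if default construction had to allocate, as in the single-node `std::list` example quoted above, the resulting move constructor could still throw.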
On Friday, June 29, 2018 at 6:51:12 PM UTC-4, Arthur O'Dwyer wrote:
> Caveat (A): Trivial relocation can be optimized into memcpy() or memmove(). Trivial swap cannot be optimized into mem-anything, because there is no libc primitive for swapping arrays of bytes. We could certainly propose to add a __builtin_memswap() that would perform the swap "in-place" in cache-line-sized blocks, but I'm not aware of any proposals nor prior art in that area.
#include <cstddef>
#include <cstring>

template <class T>
void trivial_swap(T &a, T &b)
{
    std::byte buff[sizeof(T)];
    std::memcpy(buff, &a, sizeof(T));  // save a's bytes
    std::memcpy(&a, &b, sizeof(T));    // b is no longer valid.
    std::memcpy(&b, buff, sizeof(T));  // b is valid again.
}
I believe the validity of this code naturally falls out of a type being TriviallyRelocatable.
On Friday, 29 June 2018 18:04:50 PDT Nicol Bolas wrote:
> On Friday, June 29, 2018 at 6:08:10 PM UTC-4, Thiago Macieira wrote:
> > On Friday, 29 June 2018 00:19:40 PDT Mingxin Wang wrote:
> > > Informally, a type meets the *TriviallySwappable* requirements if the
> > > "std::swap" function overload of this type performs bitwise swap
> > > operation.
> >
> > Why do you need the concept? Why can't you just use std::swap for your
> > use case?
>
> He didn't really explain the problem very well. It's a performance issue,
> not a functionality issue.
Ok, so XY issue. He has a problem X (performance), he thinks he can solve it
with Y (concept) and asked about Y.
> Say you have a type-erased type like `any`. It's storing some type-erased
> value, and it's using small buffer optimization. If you move an `any`, and
> the stored object fits within the small buffer (like a `unique_ptr<T>`),
> then you can only move it by invoking `unique_ptr<T>`'s move constructor.
> This requires an indirect call through the type-erasure machinery.
Sorry, but why does it? Why can't there be a std::swap(std::any &, std::any &)
that knows that it can simply swap the internals, fast?
> A swap operation would require 3 of these moves. That's really slow.
>
> However, if the `any` could, at swap time, detect that its contents were
> TriviallySwappable, it could perform the swap with 3 memcpy operations.
std::any would know that its contents are trivially swappable without the need
for the concept. It knows that by design.
On Saturday, 30 June 2018 09:45:49 PDT Thiago Macieira wrote:
> That's why I am saying this is QoI: the implementation can design it in such
> a way. Or they can make different trade-offs, by saying that it will avoid
> memory allocation but requiring then that there be a runtime dispatch in
> order to do moves and swaps.
>
> Can we have the cake and eat it too? I like that idea.
Actually, is this worth it? Maybe with a different example, other than
std::any. I'd like to see such a concrete example.
std::any can only be trivially swapped with another std::any if *both* are
trivially swappable.
> std::any would know that its contents are trivially swappable without the need
> for the concept. It knows that by design.
If the type inside the any has a pointer to itself, you can't swap the bytes.
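A small sketch may make the self-pointer objection concrete. The `self_ref` type and its helpers are hypothetical names invented for this illustration; the struct is deliberately kept trivially copyable so that the byte copies themselves are well defined, and only the class invariant breaks.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical type whose invariant is "self points at my own value".
struct self_ref {
  int value;
  int* self;
};

inline void init(self_ref& s, int v) {
  s.value = v;
  s.self = &s.value;
}

inline bool invariant_holds(const self_ref& s) { return s.self == &s.value; }

// Swap the raw bytes of two objects, as a bitwise swap would.
inline void byte_swap(self_ref& a, self_ref& b) {
  unsigned char tmp[sizeof(self_ref)];
  std::memcpy(tmp, &a, sizeof(self_ref));
  std::memcpy(&a, &b, sizeof(self_ref));
  std::memcpy(&b, tmp, sizeof(self_ref));
}
```

After `byte_swap(a, b)`, `a.self` holds the address that `b` stored before the swap, so it points into `b` rather than into `a`. A correct swap for such a type would have to exchange the values and leave each `self` pointer alone.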
On Saturday, 30 June 2018 06:26:01 PDT Nicol Bolas wrote:
> > std::any would know that its contents are trivially swappable without the
> > need
> > for the concept. It knows that by design.
>
> It can't know that because "trivially swappable" isn't a thing. With no
> standard-defined mechanism to key into, the only thing it could do is pick
> a number of standard-defined types and declare that they fit the bill.
My point is that std::any can be designed in such a way that it knows by
construction that it is always trivially swappable. A trivial implementation
contains a pointer to a base class of polymorphic type that is derived for
each hosted type. A pointer is trivially swappable, therefore std::any's
swap() function is trivially swappable.
That's why I am saying this is QoI: the implementation can design it in such a
way. Or they can make different trade-offs, by saying that it will avoid
memory allocation but requiring then that there be a runtime dispatch in order
to do moves and swaps.
Can we have the cake and eat it too? I like that idea.
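The pointer-only design described above can be sketched as follows. This is not how any particular standard library implements `std::any`; `simple_any`, `holder_base`, `holder`, and `get` are hypothetical names, and the sketch always allocates on the heap, which is the trade-off being discussed.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>
#include <utility>

// Hypothetical any-like type whose only data member is a pointer.
class simple_any {
  struct holder_base {
    virtual ~holder_base() = default;
    virtual const std::type_info& type() const noexcept = 0;
  };

  // One derived holder per hosted type.
  template <class T>
  struct holder final : holder_base {
    explicit holder(T v) : value(std::move(v)) {}
    const std::type_info& type() const noexcept override { return typeid(T); }
    T value;
  };

 public:
  simple_any() = default;

  template <class T>
  explicit simple_any(T value) : ptr_(new holder<T>(std::move(value))) {}

  simple_any(simple_any&& other) noexcept
      : ptr_(std::exchange(other.ptr_, nullptr)) {}

  simple_any(const simple_any&) = delete;
  simple_any& operator=(const simple_any&) = delete;

  ~simple_any() { delete ptr_; }

  // Swapping two pointers never touches the hosted objects, so no
  // dispatch through the type-erasure machinery is needed.
  void swap(simple_any& other) noexcept { std::swap(ptr_, other.ptr_); }

  // Returns a pointer to the hosted value, or nullptr on a type mismatch.
  template <class T>
  T* get() noexcept {
    if (ptr_ != nullptr && ptr_->type() == typeid(T)) {
      return &static_cast<holder<T>*>(ptr_)->value;
    }
    return nullptr;
  }

 private:
  holder_base* ptr_ = nullptr;
};
```

Because the hosted object never moves, even a type holding a pointer to itself stays valid across `swap`. The cost is one allocation per stored value, which a small buffer optimization would avoid.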
On Saturday, 30 June 2018 10:31:39 PDT Nicol Bolas wrote:
> > Can we have the cake and eat it too? I like that idea.
>
> How is your suggestion having cake and eating it? With your way, an
> implementation must pick one or the other. With the suggested solution, you
> get both.
It isn't. I meant to say that I am convinced there can be a better situation.
#include <cstddef>
#include <cstdio>
#include <new>
#include <type_traits>
#include <utility>

/**
 * The definition of is_trivially_relocatable
 * Let's assume that any move constructible type is trivially relocatable by
 * default.
 */
template <class T>
struct is_trivially_relocatable : std::is_move_constructible<T> {};

template <class T>
inline constexpr bool is_trivially_relocatable_v = is_trivially_relocatable<T>::value;

/**
 * The configuration of SBO and corresponding data structure
 */
inline constexpr std::size_t SBO_SIZE = sizeof(void*);

template <class T>
inline constexpr bool USE_SBO =
    is_trivially_relocatable_v<T> &&
    sizeof(T) <= SBO_SIZE &&
    std::alignment_of_v<T> <= std::alignment_of_v<void*>;

union storage_t {
  void* data_ptr_;
  mutable char data_[SBO_SIZE];
};

/**
 * A basic implementation for the dispatch specifically for Callable types
 * Note that there is NO runtime overhead in checking whether a type is suitable
 * for SBO.
 */
template <class R, class... Args>
struct vtable_t {
 public:
  template <class T>
  constexpr explicit vtable_t(std::in_place_type_t<T>)
      : fun_(fun_impl<T>), destroy_(destroy_impl<T>) {}

  R(*fun_)(storage_t&, Args&&...);
  void(*destroy_)(storage_t&);

 private:
  template <class T>
  static R fun_impl(storage_t& s, Args&&... args) {
    if constexpr (USE_SBO<T>) {
      return (*reinterpret_cast<T*>(s.data_))(std::forward<Args>(args)...);
    } else {
      return (*static_cast<T*>(s.data_ptr_))(std::forward<Args>(args)...);
    }
  }

  template <class T>
  static void destroy_impl(storage_t& s) {
    if constexpr (USE_SBO<T>) {
      reinterpret_cast<T*>(s.data_)->~T();
    } else {
      delete static_cast<T*>(s.data_ptr_);
    }
  }
};

/**
 * Declare the value of the vtables
 */
template <class T, class R, class... Args>
inline constexpr vtable_t<R, Args...> VTABLE{std::in_place_type<T>};

/**
 * A basic functionality defined by `std::function`
 */
template <class T>
class my_function;

template <class R, class... Args>
class my_function<R(Args...)> {
 public:
  /**
   * An empty state of the vtable is not designed to keep the code simple
   */
  my_function() : vtable_(nullptr) {}

  template <class T>
  my_function(T&& data)
      : my_function(std::in_place_type<std::decay_t<T>>, std::forward<T>(data)) {}

  template <class T, class... _Args>
  explicit my_function(std::in_place_type_t<T>, _Args&&... args) {
    if constexpr (USE_SBO<T>) {
      new (reinterpret_cast<T*>(storage_.data_)) T(std::forward<_Args>(args)...);
    } else {
      storage_.data_ptr_ = new T(std::forward<_Args>(args)...);
    }
    vtable_ = &VTABLE<T, R, Args...>;
  }

  /**
   * This is what I want: No dispatch when moving or swapping
   * The code is correct because if the type is NOT trivially relocatable, it
   * will be stored on the heap, and its pointer is simply trivially relocatable.
   */
  my_function(my_function&& rhs) {
    storage_ = rhs.storage_;
    vtable_ = rhs.vtable_;
    rhs.vtable_ = nullptr;
  }

  /**
   * In order to keep the code simple, the copy constructor was not implemented
   */
  my_function(const my_function&) = delete;

  ~my_function() {
    if (vtable_ != nullptr) {
      vtable_->destroy_(storage_);
    }
  }

  R operator()(Args... args) {
    // Return the callee's result so that non-void R works as well.
    return vtable_->fun_(storage_, std::forward<Args>(args)...);
  }

  /**
   * Performing bit-wise swap operation is OK!
   */
  void swap(my_function& rhs) {
    std::swap(storage_, rhs.storage_);
    std::swap(vtable_, rhs.vtable_);
  }

 private:
  storage_t storage_;
  const vtable_t<R, Args...>* vtable_;
};

int main() {
  my_function<void()> f([] { puts("lalala"); });
  f();
  auto g = std::move(f);  // Nice!
  g();
  f.swap(g);  // Nice!
  f();
  return 0;
}
After reading the code, I find that the implementation is not exactly consistent with what I imagined. I did not mean that the implementation should be incompatible with non-trivially-relocatable types, but that it should optimize for trivially-relocatable types. In order to clarify the motivation, I made an illustrative example similar to `std::function`, which is also equivalent to the code expected to be generated with `static_proxy<Callable<void>>` defined in the PFA (p0957).