| From: Tristan Brindle Sent: Monday, January 4, 2016 2:41 AM To: ISO C++ Standard - Future Proposals Reply To: std-pr...@isocpp.org Subject: [std-proposals] Views on a small_vector<T, N>? |
| From: Tony V E Sent: Monday, January 4, 2016 9:11 AM To: Tristan Brindle; ISO C++ Standard - Future Proposals Subject: Re: [std-proposals] Views on a small_vector<T, N>? |
Would there be any interest in proposing such a container for standardisation?
Also, for nonthrowing or trivial T, could a small buffer optimization inside std::vector<T> be conforming? (as a constant such as 16 elements is still constant time?).
I am not yet convinced that such a special-purpose container belongsin the main standard, the concern being how many more special-purposecontainers would then follow the precedent. I think that is a worthwhilequestion to pursue though, either as part of this as a proposal, or as aseparate follow-up.
Details for any eventual proposal:
1) Make sure that `small_vector` can be moved into/outof regular `vector`. Obviously, they must use compatible allocator types. Also obviously, if the incoming `vector` is smaller than the small size, then no memory allocation takes place and the old `vector` doesn't necessarily lose its memory. It would then look like a copy operation, and it would only be noexcept if the copy for `T` were noexcept.
2) Since the number of "small" elements is stated up-front, there should be a function to ask "are you heap allocated yet?", rather than asking for the capacity and comparing it to the small element count.
On 5 January 2016 at 09:36, Nicol Bolas <jmck...@gmail.com> wrote:Details for any eventual proposal:
1) Make sure that `small_vector` can be moved into/outof regular `vector`. Obviously, they must use compatible allocator types. Also obviously, if the incoming `vector` is smaller than the small size, then no memory allocation takes place and the old `vector` doesn't necessarily lose its memory. It would then look like a copy operation, and it would only be noexcept if the copy for `T` were noexcept.What is the motivation? This seems like a lot of complexity for an incredibly minor use case.
If an API takes a vector, and I use a small_vector (or vice-versa), I want to be able to transfer that information with the fastest performance possible. After all, `small_vector` exists for performance reasons. We wouldn't want to do things to make code slower.
On Wednesday, 6 January 2016 04:59:39 UTC+13, Nicol Bolas wrote:
On Tuesday, January 5, 2016 at 10:45:16 AM UTC-5, Nevin ":-)" Liber wrote:On 5 January 2016 at 09:36, Nicol Bolas <jmck...@gmail.com> wrote:Details for any eventual proposal:
1) Make sure that `small_vector` can be moved into/outof regular `vector`. Obviously, they must use compatible allocator types. Also obviously, if the incoming `vector` is smaller than the small size, then no memory allocation takes place and the old `vector` doesn't necessarily lose its memory. It would then look like a copy operation, and it would only be noexcept if the copy for `T` were noexcept.What is the motivation? This seems like a lot of complexity for an incredibly minor use case.
If an API takes a vector, and I use a small_vector (or vice-versa), I want to be able to transfer that information with the fastest performance possible. After all, `small_vector` exists for performance reasons. We wouldn't want to do things to make code slower.
This should also extend to different `small_vector` types that have different small element counts.Let's view a `vector<T>` as being the special case `small_vector<T, 0>`, i.e. zero elements preallocated*. Then imagine we had a move operation
I disagree. I'd love to use this to replace vector, at a minimum because it doesn't have a bool specialization.
> On Jan 6, 2016, at 6:48 AM, Andrew Tomazos <andrew...@gmail.com> wrote:
>
> Not sure if this is a good or bad idea quite yet - but may be worth exploring.
Adding a third parameter at the end is an ABI breakage and hinders usability, as most people rarely have to specify allocators.
Adding it as the second parameter breaks source for anyone using allocators or passing vectors generically.
The only benefit is reusing the name. Doesn't seem worth it to me.
Other than these trivially obvious concerns, what else makes this worth exploring?
--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> +1-312-623-5420
On Wed, Jan 6, 2016 at 4:12 PM, Nevin ":-)" Liber <nli...@gmail.com> wrote:
> On Jan 6, 2016, at 6:48 AM, Andrew Tomazos <andrew...@gmail.com> wrote:
>
> Not sure if this is a good or bad idea quite yet - but may be worth exploring.
Adding a third parameter at the end is an ABI breakage and hinders usability, as most people rarely have to specify allocators.
Adding it as the second parameter breaks source for anyone using allocators or passing vectors generically.
The only benefit is reusing the name. Doesn't seem worth it to me.You are underestimating the advantages I think. It's not just "reusing the name", its reusing the entire template. It's conceptually simpler, easier to specify, easier to implement, its a single class template definition (the branch if (num_preallocated_elements > 0) would be optimized out in the zero case). The interfaces and implementations of operations that dispatch on two std::vectors of different num_preallocated_elements (comparisons, conversions, etc, etc) would be cleaner.I don't however have a good answer currently for your ABI and source compatibility concerns.
class T,
size_t NumPreallocatedElements = 0,
class Allocator = std::allocator<T>
Hmmm...
--
Tristan
Looking at it this way, `small_vector` is actually a poor name choice.
Perhaps a better name would be `varray`, or variable array.
Can't you do that by:
template <typename T, typename Allocator>
class vector : public small_vector<T, 0, Allocator>
{
using small_vector;
};
On 6 January 2016 at 12:01, Thiago Macieira <thi...@macieira.org> wrote:Can't you do that by:
template <typename T, typename Allocator>
class vector : public small_vector<T, 0, Allocator>
{
using small_vector;
};You still have to forward non-compiler-generated constructors and assignment operators.Is this something you would want the standard to mandate, or is it just an implementation detail?
template<typename T, typename Allocator>
class small_vector<T, 0, Allocator> : public vector<T, Allocator>
{
using vector;
//Add interfaces specific to small_vector.
}
On 6 January 2016 at 10:46, Tristan Brindle <tcbr...@gmail.com> wrote:
Looking at it this way, `small_vector` is actually a poor name choice.I'd much rather we not use the word "vector" in the name for the initial discussion, as it limits thinking on design choices. No one says it has to have exactly the same interface (no more, no less) as vector.
--For instance, it might make sense to have a bool emplace_back_if_space(...) member that returns false if the container is already at its capacity. We may or may not want to add such a thing to std::vector.Perhaps a better name would be `varray`, or variable array.I would say no. In C++, arrays refer to things that have exactly N elements (well, maybe not valarray; I don't know, as I've never had cause to use it).I also think that bike shedding the name at this point is the least of our worries. (Note: while I'm against it, calling it vector is not just a bike shedding issue.)Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> +1-847-691-1404
--
On Wednesday 06 January 2016 17:05:23 Andrew Tomazos wrote:
> You are underestimating the advantages I think. It's not just "reusing the
> name", its reusing the entire template. It's conceptually simpler, easier
> to specify, easier to implement, its a single class template definition
> (the branch if (num_preallocated_elements > 0) would be optimized out in
> the zero case). The interfaces and implementations of operations that
> dispatch on two std::vectors of different num_preallocated_elements
> (comparisons, conversions, etc, etc) would be cleaner.
Can't you do that by:
template <typename T, typename Allocator>
class vector : public small_vector<T, 0, Allocator>
{
using small_vector;
};
On Wed, Jan 6, 2016 at 7:35 PM, Nevin Liber <ne...@eviloverlord.com> wrote:On 6 January 2016 at 10:46, Tristan Brindle <tcbr...@gmail.com> wrote:
Looking at it this way, `small_vector` is actually a poor name choice.I'd much rather we not use the word "vector" in the name for the initial discussion, as it limits thinking on design choices. No one says it has to have exactly the same interface (no more, no less) as vector.The advantage of having exactly the same interface (or even being exactly the same type) is the same as why we didn't standardize two different flavors of, for example, variant. Having multiple different, mostly similar, but slightly different versions of things adds complexity to the language for little gain.
On Wed, Jan 6, 2016 at 7:35 PM, Nevin Liber <ne...@eviloverlord.com> wrote:On 6 January 2016 at 10:46, Tristan Brindle <tcbr...@gmail.com> wrote:
Looking at it this way, `small_vector` is actually a poor name choice.I'd much rather we not use the word "vector" in the name for the initial discussion, as it limits thinking on design choices. No one says it has to have exactly the same interface (no more, no less) as vector.The advantage of having exactly the same interface (or even being exactly the same type) is the same as why we didn't standardize two different flavors of, for example, variant. Having multiple different, mostly similar, but slightly different versions of things adds complexity to the language for little gain.
I'd much rather we not use the word "vector" in the name for the initial discussion, as it limits thinking on design choices. No one says it has to have exactly the same interface (no more, no less) as vector.
Perhaps a better name would be `varray`, or variable array.
I would say no. In C++, arrays refer to things that have exactly N elements (well, maybe not valarray; I don't know, as I've never had cause to use it).
I also think that bike shedding the name at this point is the least of our worries. (Note: while I'm against it, calling it vector is not just a bike shedding issue.)
Option E -- replacing std::vector as the default container! -- might seem at first glance to be unrealistic, perhaps even a good suggestion for an April Fool. At first I dismissed it out of hand, but having given it some more thought I'm warming to the idea. With the arrival of concepts and ranges, and the talk of "STL2", this is a unique opportunity to do something so radical. A vararray, as proposed, is a strict superset of vector. It is never worse and sometimes better. We can easily maintain API compatibility. The hundreds of tutorials and millions of lines of code that use vector wouldn't suddenly be wrong, any more than the millions of lines of code that use pair are wrong now that tuple exists. There is an appetite in the C++ community at the moment to embrace the new; if we are ever going to do something like this, now is the time.
Ranges makes modifications to the iterator paradigm. Since it makes such sweeping changes to iterators and creates new algorithms (and new concepts like ranges and adapters), it brings with it the possibility of "STL2", which is good. But here's the thing: until we actually get all of that stuff finished, we're not in a position to start specifying how containers should function.
I'd love to see a new vector where you can extract the memory from it as a free-standing object, and pass it around elsewhere (along with the allocator, of course). It could be given to a C-style API. Or more to the point, a C-style API could provide such data, which could be used as the initial storage for the vector (again, along with the allocator).
And if you're doing this for vector, you'd have to do it for string. Both what I just suggested and explicit sizing for small-strings.
Some people will want a vector that can have an explicit limit to how big it gets.
Oh and since we're talking about changing the container paradigm, why not throw away allocators and use a better allocation interface? Maybe one that understands the concept of "type erasure". There are going to be API suggestions for the associative containers. There are going to be changes suggested for list and deque. Someone will probably suggest a blanket ban on any throwing-move container implementation for the new containers (I wouldn't say no to that one). And so on.
The discussion you're wanting to start is endless. Everyone and their mother will come out and add their own pet feature.
If you want to start having that discussion, fine. And maybe in preparation of the completion of the Ranges TS (probably version 2), it might be appropriate to start collecting ideas now. But just be prepared for where it leads.
On Thu, Jan 7, 2016 at 3:41 PM, Nicol Bolas <jmck...@gmail.com> wrote:Ranges makes modifications to the iterator paradigm. Since it makes such sweeping changes to iterators and creates new algorithms (and new concepts like ranges and adapters), it brings with it the possibility of "STL2", which is good. But here's the thing: until we actually get all of that stuff finished, we're not in a position to start specifying how containers should function.That's the standard dilemma between waiting for the slow big vague new thing or making fast incremental concrete changes. The best approach is to propose both, and let the committee gate/merge things as they will.
I'd love to see a new vector where you can extract the memory from it as a free-standing object, and pass it around elsewhere (along with the allocator, of course). It could be given to a C-style API. Or more to the point, a C-style API could provide such data, which could be used as the initial storage for the vector (again, along with the allocator).Why can't that be proposed as an extension to vector today? I don't see the relevance. It seems orthogonal to everything discussed, and a compatible extension to either/both vector and vararray.
And if you're doing this for vector, you'd have to do it for string. Both what I just suggested and explicit sizing for small-strings.Is there an existing practice of small_string? Isn't there already a "small string optimization" possible and used in std::string? std::string is mainly targetted at text, whereas vector/vararray are targeted at contiguous homogenous sequences of an arbitrary type. They are intentionally distinct.
Some people will want a vector that can have an explicit limit to how big it gets.This could be a feature of vararray, it just depends how much demand there is. The tradeoff is between additional complexity versus extra functionality. Whether it is worth it depends on how much it would be used.
Oh and since we're talking about changing the container paradigm, why not throw away allocators and use a better allocation interface? Maybe one that understands the concept of "type erasure". There are going to be API suggestions for the associative containers. There are going to be changes suggested for list and deque. Someone will probably suggest a blanket ban on any throwing-move container implementation for the new containers (I wouldn't say no to that one). And so on.This is some sort of slippery slope fallacy. Noone is going to reject vararray because they think we should wait for an allocators v2. Very few people use anything but the default allocators anyway - virtually noone cares.
The discussion you're wanting to start is endless. Everyone and their mother will come out and add their own pet feature.I don't agree. A proposal of vararray should not be disccouraged by this thought. All proposals get pulled in all directions. It doesn't mean it won't be successful.
> On 8/01/2016, at 3:41 AM, Nicol Bolas <jmck...@gmail.com> wrote:
> Also, if you're going to start saying "hey, new container paradigm!", then there are going to be a lot of people who want in on that conversation.
Personally, I absolutely don’t want a new container *paradigm*. What I want is a new container that’s identical to vector, except that it can optionally use some storage in the object itself to avoid the allocator in certain situations. That was what I suggested in the first place.
But consequences of that are that either
* You need to forbid people from creating small_vector/vararray with zero elements stack-reserved, or
* You need to make vararray<T, 0> and vector<T> the same type, which implies a specialisation for vararray<bool, 0>, or
* If you do neither of these, you need to accept that people will start using vararray<T, 0> as a “better vector”, simply because they can use it generically without worrying about the bool case.
I don't have an answer to that. My abject hatred for the `vector<bool>` specialization (and my continued annoyance with the fact that nobody's put forth any effort to fix it)
On Thursday, January 7, 2016 at 9:27:30 PM UTC-5, Tristan Brindle wrote:
> On 8/01/2016, at 3:41 AM, Nicol Bolas <jmck...@gmail.com> wrote:
> Also, if you're going to start saying "hey, new container paradigm!", then there are going to be a lot of people who want in on that conversation.
Personally, I absolutely don’t want a new container *paradigm*. What I want is a new container that’s identical to vector, except that it can optionally use some storage in the object itself to avoid the allocator in certain situations. That was what I suggested in the first place.
But consequences of that are that either
* You need to forbid people from creating small_vector/vararray with zero elements stack-reserved, or
* You need to make vararray<T, 0> and vector<T> the same type, which implies a specialisation for vararray<bool, 0>, or* If you do neither of these, you need to accept that people will start using vararray<T, 0> as a “better vector”, simply because they can use it generically without worrying about the bool case.
On the one hand, the conceptual differences between `small_vector<T, 0>` and `small_vector<T, 1>` are minimal. Which means if you forbid the 0 case, or if you make the 0 case an alias of `vector` with its bool specialization intact, people will just use `small_vector<T, 1>` to work around it.
So `small_vector` is going to compete with `vector`, no matter what you do.
However, there are substantive differences between `small_vector<T, 0>` and `small_vector<T, 1>` Move behavior/iterator invalidation is a good example. `small_vector<T, 0>` would function in every way exactly like `vector`, while `small_vector<T, 1>` would function like a `small_vector<T, 20>`. Or more to the point, like `string`. No matter how small the size is, as long as it isn't zero, you can't guarantee the same move behavior.
Given these facts, we need to ask ourselves these questions:
1) Do we want to create a type that is intended to be special case, but will compete with a fundamental library type beyond that special case, just because of an idiotic decision made 20+ years ago?
2) If we do want that, do we want to encourage such competition by allowing the new type to completely subsume all uses of the old?
After all, people are only supposed to use `small_vector` to be an optimization. But the moment people brought up the fact that `small_vector` wouldn't have a `bool` specialization, I started mentally doing a find/replace for `vector` to `small_vector` in my code.
The correct answer to question 2 seems, to me, to be "yes". If there's going to be competition, then the new type ought to win. It does nobody any good to see template code using `small_vector<T, 1>` just to avoid the `T = bool` case for `vector`. So `small_vector<T, 0>` should be in every way equivalent to `vector`, and it should be interoperable with `vector` (movement from one to the other). With the exception of `vector<bool>` of course; `small_vector` shouldn't have that. If we're going to allow `small_vector` to compete with `vector` outside of its particular optimization domain, then just replace it.
So the big question is really #1: do we want to add a type, originally proposed solely for a narrow case of optimization, that subsumes an existing type which is widely used (not to mention promoted) in lots of C++?
I don't have an answer to that. My abject hatred for the `vector<bool>` specialization (and my continued annoyance with the fact that nobody's put forth any effort to fix it) makes me want to say "Hell yes!" At the same time, I despise the idea of replacing a perfectly good class over some idiotic experimental decision from 20+ years. I also hate the idea of using a type outside of its intended purpose; `small_vector` is supposed to be for a narrow optimization.
Plus, if there's going to be an eventual STL2 that includes containers, we don't want to have to create a third dynamic array type. That's just too confusing.
To me, the only viable solutions are:
1) Design `small_vector` so that it completely takes over `vector` in all situations (except for people who like `vector<bool>`.
2) We don't have `small_vector` at all.
On Fri, Jan 8, 2016 at 10:18 AM, Nicol Bolas <jmck...@gmail.com> wrote:On Thursday, January 7, 2016 at 9:27:30 PM UTC-5, Tristan Brindle wrote:
> On 8/01/2016, at 3:41 AM, Nicol Bolas <jmck...@gmail.com> wrote:
> Also, if you're going to start saying "hey, new container paradigm!", then there are going to be a lot of people who want in on that conversation.
Personally, I absolutely don’t want a new container *paradigm*. What I want is a new container that’s identical to vector, except that it can optionally use some storage in the object itself to avoid the allocator in certain situations. That was what I suggested in the first place.
But consequences of that are that either
* You need to forbid people from creating small_vector/vararray with zero elements stack-reserved, or
* You need to make vararray<T, 0> and vector<T> the same type, which implies a specialisation for vararray<bool, 0>, or* If you do neither of these, you need to accept that people will start using vararray<T, 0> as a “better vector”, simply because they can use it generically without worrying about the bool case.
On the one hand, the conceptual differences between `small_vector<T, 0>` and `small_vector<T, 1>` are minimal. Which means if you forbid the 0 case, or if you make the 0 case an alias of `vector` with its bool specialization intact, people will just use `small_vector<T, 1>` to work around it.
So `small_vector` is going to compete with `vector`, no matter what you do.
However, there are substantive differences between `small_vector<T, 0>` and `small_vector<T, 1>` Move behavior/iterator invalidation is a good example. `small_vector<T, 0>` would function in every way exactly like `vector`, while `small_vector<T, 1>` would function like a `small_vector<T, 20>`. Or more to the point, like `string`. No matter how small the size is, as long as it isn't zero, you can't guarantee the same move behavior.small_vector<T,0> only behaves like vector<T> if we specify that behaviour.
ie if we specify that iterators are not invalidated, etc.The typical implementation may allow for that, but I don't think we should specify it.
ie let's not specialize the specification for the 0 case. Allow small_vector<T,0>, but don't specify its behaviour (nor T's requirements) differently that small_vector<T,1>.
small_vector<int, 4> v(12, 2220);
small_vector<int, 4> v2(std::move(v));void func(small_vector<int, 4> &&v)
{
if(v.size() > v.buffer_size())
v.reserve(v.buffer_size() * 1.5);
small_vector<int, 4> v2(std::move(v));
}
Let's design small vector targeting usage like small_vector<T,20>, not to replace vector. Let small_vector<T,0> fall out naturally, without special casing it.
The correct answer to question 2 seems, to me, to be "yes". If there's going to be competition, then the new type ought to win. It does nobody any good to see template code using `small_vector<T, 1>` just to avoid the `T = bool` case for `vector`. So `small_vector<T, 0>` should be in every way equivalent to `vector`, and it should be interoperable with `vector` (movement from one to the other). With the exception of `vector<bool>` of course; `small_vector` shouldn't have that. If we're going to allow `small_vector` to compete with `vector` outside of its particular optimization domain, then just replace it.small_vector<T,1> has different properties than vector<T>. It is _mostly_ the same, but one handles move-only T (or unmovable even) better than the other, etc.
Plus, if there's going to be an eventual STL2 that includes containers, we don't want to have to create a third dynamic array type. That's just too confusing.
To me, the only viable solutions are:
1) Design `small_vector` so that it completely takes over `vector` in all situations (except for people who like `vector<bool>`.
2) We don't have `small_vector` at all.3) design small_vector to be a sbo_vector. Ignore whether it replaces vector.
So it is OK if it complete takes over `vector` by accident?
On 8 January 2016 at 16:32, Nicol Bolas <jmck...@gmail.com> wrote:So it is OK if it complete takes over `vector` by accident?
Sure. If a better type wins, so be it.First you complain that the committee never fixes the vector<bool> wart.This will fix that wart. All we have to do afterwards is deprecate std::vector except for the bool specialization. (Some might say that was my secret plan all along :-))For some reason, you no longer want it fixed.
Tongue and cheek aside, boost::container::small_vector allows 0. I suspect (but haven't looked) that the other existing ones do also. The whole point of standardizing this is because people keep inventing this type; if the committee goes and creates a type that is less useful than the ones in existing practice, we will have failed and wasted both committee time and implementor time.
I consider not allowing 0 to be as bad a wart as specializing vector for bool.
On 2016–01–09, at 10:48 PM, Nicol Bolas <jmck...@gmail.com> wrote:I didn't say that. I'm saying that, if `small_vector` progresses forward as it is, it will replace `vector`. And that fact needs to be recognized and discussed for its implications.
Also, once a small vector allocates, then gets small again, does it free the allocation and go back to local memory? I would guess not. So checking the number of elements doesn't tell you if iterators will be maintained. I guess you need to check capacity or we offer an explicit function?
small_vector<int, 6> v;
v.reserve(12);I imagine code trying to work with maybe-valid-if-this-or-that terators is going to be convoluted and error-prone.
On 8 January 2016 at 22:32, Nicol Bolas <jmck...@gmail.com> wrote:
> On Friday, January 8, 2016 at 11:06:46 AM UTC-5, Tony V E wrote:
>
> So what you're saying is that, if I have a `small_vector<T, 4>` which
> contains 12 elements, and I move it to another `small_vector<T, 4>`, you're
> telling me that the specification of `small_vector` should not allow me to
> keep the iterators?
>
Here is one reason to forbid keeping the iterators.
In my implementation of small_vector, sizeof(small_vector<char, 3>) ==
sizeof(small_vector<char, 8>), because of padding. Giving the space is
already used, small_vector<char, 3> stores 8 elements before it
allocates from the heap.
On 2016–01–09, at 10:48 PM, Nicol Bolas <jmck...@gmail.com> wrote:I didn't say that. I'm saying that, if `small_vector` progresses forward as it is, it will replace `vector`. And that fact needs to be recognized and discussed for its implications.small_vector is more complex (prone to bloat) and less capable (no iterator transfers, at least not reliably).
Also, just a note, one use-case would be forbidding heap allocation and letting the local capacity be an upper bound on the size.