vector::push_back() is inefficient, as every time it checks whether the vector has reached its full capacity and needs to allocate more memory. Such a check is often unnecessary, e.g. if reserve() was called, see the example below.
I can vouch that it makes a difference. For frequent enough insertions, the cost comes not from mispredicted branches but the integer instructions to fetch and compare the capacity pointer.
The workaround is to not use push_back, but IMHO it should be the right tool for the job.
vector (size_type n, const value_type& val,
const allocator_type& alloc = allocator_type());
Into:
template<typename Generator>
vector (size_type n, Generator gen,
const allocator_type& alloc = allocator_type());
If we then had a way to write the algorithm as a coroutine, the vector would determine its capacity only once and let the filling flow.
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.
In principle it's a valid problem, such checks can adversely affect inner loops, but the solution is way too specialized and obtuse.How about a means to pass hints to the allocator, maybe a dispatch tag indicating to forward arguments from push_*, insert_*, emplace_*, and constructors to allocator::allocate? This would in general enable dumb allocators where the client can specify where allocations go, and when they are allowed. It would be a powerful interface.
Shouldn't this be a task of the compiler? (To inline push_back and optimize away the 'if (size == capacity)' check if it knows that 'size < capacity' because of a previous .reserve().)
You should refactor your loop into a range, and construct the vector directly from that. This would resolve the issue.
Well, if I don't want to resize the vector (and construct objects), I can't see what work-around would
allow avoiding push_back. A vector of optionals would allow the resize without constructing
the underlying objects, and then I could just use operator[].
Ok, point taken about the v[i] case, but the whole insertion in the
grow() case could be passed of to another method (grow(parameter)) since
that is a problem only when a reallocation happens so there still is no
reason this would prevent inlining of the non-growing case.
/MF
void push_back(value_type&& _Val)
{ // insert by moving into element at end
if (_Inside(_STD addressof(_Val)))
{ // push back an element
size_type _Idx = _STD addressof(_Val) - this->_Myfirst;
if (this->_Mylast == this->_Myend)
_Reserve(1);
_Orphan_range(this->_Mylast, this->_Mylast);
this->_Getal().construct(this->_Mylast,
_STD forward<value_type>(this->_Myfirst[_Idx]));
++this->_Mylast;
}
else
{ // push back a non-element
if (this->_Mylast == this->_Myend)
_Reserve(1);
_Orphan_range(this->_Mylast, this->_Mylast);
this->_Getal().construct(this->_Mylast,
_STD forward<value_type>(_Val));
++this->_Mylast;
}
}
Assuming that the Widgets in your original example are costly to construct and overwrite,
the optional<Widget>s are not costly to construct.
I think a elegant solution for this (and a lot of other similar)
problem would be
something like this:
void foo(std::vector<char>& v)
{
assume(v.reserve() > v.size());
v.push_back('a'); // the compiler would do dead-code elimination and
then inline
}
It would unfortunately be much easier because ranges are so irritating to construct. This, however, is immaterial to the fact that the other solutions are a disgusting hack which is only an approximation to the real solution.
Why is that check any cheaper than the check for push_back to see ifthe vector should grow?
That is a quality of implementation issue but I would measure ratherthan guess regarding optimizations of standard library functions sinceoptimizers can be surprisingly good.
If one looks at GCC instead the implementation is
If we're going to further increase the performance of vector, the next most effective step to take is to teach allocators how to expand existing allocations in place (realloc without the possibility of an address change), and teach containers how to use such allocators. Doing this can produce as much as a 2X performance improvement compared to an un-reserved push_back loop. However even this at-best-2X improvement was not sufficiently enticing for the committee in the past. I have no reason to think it will be in the future.
Howard
Howard
--
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.
/MF
>Allocator::construct which is now a variadic template function and expanding the empty parameter pack creates a value construction rather than a default construction.If your allocator::construct does nothing, then the memory will be uninitialized. Every compiler on earth will inline a completely empty function. std::allocator's default behavior is to zero initialize the memory. If you want to use uninitialized behavior, just change out the allocator to do that.
>The placement new usage could be somewhat problematic as it circumvents the allocator. If Allocator::construct does someting else than what the default allocator does a special method on vector may be warranted, for instance initialize_element(i, args...), which basically forwards to Allocator::construct.If you allow this, how does vector know the number of elements to destroy (call destructors on) in vector's destructor? Do you want to make it undefined to allow the vector to be destroyed with uninitialized data?
How many cases are actually out there that need to avoid zero-initialization of the memory that aren't met by unique_ptr<T[]>?
We could consider a totally different way to handle this, which also has other uses. Admittedly, this can easily crash if misused (just as index out of range or too many push_back_ calls). I will call this tentatively uninitialized_resize(size_t sz). It just moves the end pointer and reallocates as required, but leaves the new elements uninitialized.
Fwiw, I decided it was time to see some real code and real numbers so we know what we're dealing with.
1 - generate_Widget
2 - reserve, push_back
3 - reserve, emplace_back
4 - reserve, push_back_
What I ended up with was that
* 1 and 4 resulted in the same code
Looking at the generated assembler it seems that the real killer is that
the ptr_end and ptr_capacity values are updated (written to memory) on
each lap through the code in case 3 and 4.
Have you profiled and found the capacity check significant in comparison to choosing the container instance to push_back to?Even if significant, are you sure the easily branch predicted capacity check is significant compared to updating size?
Has noone mentioned that the name is abysmal? OK, I'll do it: push_back_ is a terrible, terrible name. I prefer the uninitialized_resize() suggestion if only because the name at least gives a clue that it's dangerous if you use it wrong.
I agree with Howard's first mail that this is a pretty specialized request and so I'm not keen on adding it to the standard because you don't want to write some extra code for your specialized scenario. std::vector doesn't need to be ideal for every possible use case.
To use std::vector<POD> as a buffer to store data raw data to be filled by an external function without paying for zero initialization of POD,
--
Has noone mentioned that the name is abysmal? OK, I'll do it: push_back_ is a terrible, terrible name. I prefer the uninitialized_resize() suggestion if only because the name at least gives a clue that it's dangerous if you use it wrong.
I agree with Howard's first mail that this is a pretty specialized request and so I'm not keen on adding it to the standard because you don't want to write some extra code for your specialized scenario. std::vector doesn't need to be ideal for every possible use case.
What about "unchecked_push_back()" ?
Ok, well the push_back_ proposal doesn't get rid of the size modifications. Which was my whole point. The extra 1 or 2 instructions for the capacity check are negligible in comparison.
Have you shown that this comparison is at all significant in a real world use case? The size update is the expensive portion, and this proposal doesn't remove the size update.
Instead of an uninitialized_resize() this involves another ugly method, which we could tentatively call just_set_the_end_pointer(). The usage scenario is this:
However, I still think that there are practical uses for this, and as sadly enough not all vectors are equal, if you change the allocator you change the type and the vector is useless in non-template APIs written for regular vectors. While such APIs may be deemed as bad practize now that template oriented programming is prevalent it is very common in real world applications and in some cases necessary, such as when the called method is virtual.
Sorry but I don't understand your response. I'm just giving an alternative to some proposals in this thread like the one proposed by Bengt:and other posts talking about the cost of zero-initialization when dealing with C libraries.
vector.reserve(n);
int actual_count = socket.read(vector.data(), n);
vector.just_set_the_end_pointer(actual_count);
Writing a custom allocator just to avoid the value-initialization overhead is not a trivial task
and you are changing the type of the vector, which is not very convenient to maintain ABI compatibility,
and you miss move semantics with other classes holding std::allocator-based vectors.
I know that std::default_init is not in any proposal, it's was a quick example to show how default initialization for all allocators could be achieved (just like in C++11 variadic construct functions for emplace were added). allocator_traits is the proxy object that dispatches object construction for containers between allocators' "construct" methods and default implementations based on placement new. Currently, with allocator_traits+std::allocator you can obtain these placement calls:
::new (p) T() //value initialization for DefaultInsertable types
::new (p) T(a) //1 arg constructor for EmplaceInsertable types
::new (p) T(a, b) //2 arg, etc.. for EmplaceInsertable types
but there is no equivalent to:
::new (p) T //default initialization for DefaultInsertable types
--
On Monday, September 9, 2013 3:59:35 PM UTC+1, Jonathan Wakely wrote:Has noone mentioned that the name is abysmal? OK, I'll do it: push_back_ is a terrible, terrible name. I prefer the uninitialized_resize() suggestion if only because the name at least gives a clue that it's dangerous if you use it wrong.That trailing unserscore is a common convention.
In any case, the proposal is not about the name, I'm quite flexible to change it. E.g. unchecked_push_back suggested by Ion is perfectly fine with me.I agree with Howard's first mail that this is a pretty specialized request and so I'm not keen on adding it to the standard because you don't want to write some extra code for your specialized scenario. std::vector doesn't need to be ideal for every possible use case.The problem is that no STL container is good enough for this particular use case. It looks like falling back to malloc is the most natural approach here. (While I like the generating iterators methods, it's far from natural - how many programmers would invent it?)
On segunda-feira, 9 de setembro de 2013 11:28:51, Victor....@ncl.ac.uk
wrote:
> That trailing unserscore is a common convention.
Can you point to examples in the Standard Library public API?
vector<Widget> vw;
vector<Widget> others;
//vw.reserve(...); commented out since the extra allocation will be done in one pass below, so adding this won't changes a lot of things.
vw.push_back(others.begin(), others.end()); // efficient because the capacity checks will be done only once.
[I suspect I am re-inventing the wheel, but a quick search of this forum didn't yield any similar proposal.]vector::push_back() is inefficient, as every time it checks whether the vector has reached its full capacity and needs to allocate more memory. Such a check is often unnecessary, e.g. if reserve() was called, see the example below. Hence, the proposal is to add vector::push_back_() which skips this check. Calling it for a full vector results in undefined behaviour (consistent with operator[] which does not check that the index is valid). Of course, an assertion checking the capacity should be used in push_back_() to simplify debugging, but the optimised code has no overhead.---------Example: The vector class is often used like this:vector<Widget> vw;vw.reserve(...);for(...){vw.push_back(...); // inefficient due to unnecessary capacity checks; push_back_() should be used instead}The capacity check is unnecessary in this example and leads to loss of performance (code like this is likely to be in a tight internal loop). Furthermore, unnecessary overhead cannot be removed, e.g. replacing this code byvector<Widget> vw(...);for(...){vw[...]=...;}does not solve the problem, as now the default constructor of Widget is called many times, only for the created objects to be immediately overwritten in the loop.---------Of course, the same argument works for emplace_back_(). One might also consider adding back_inserter_ to further exploit the new functionality, e.g.:vector<Widget> vw;vw.reserve(...);transform(...,back_inserter_(vw),...);
--
I'd prefer push_back_unchecked because it allows for better code completion.
--
[quote]Hence the conclusion: the performance of a tight loop can be improved by 40% by replacing push_back by unchecked_push_back. Looks like a good bate![/quote]At the expense of completely destroying all of vector's safety guarantees. It is easy to be fast if you don't have to be correct.Note again that we have to distinguish the behavior of not checking capacity (which was the original proposal) from not updating size. If you are proposing the capacity check being removed, I doubt that was a significant part of that 40%. If you are removing the size check, then using vector without causing undefined behavior is pretty much impossible.
If you want uninitialized semantics, then the container should not be called vector.
You want something else. That is a specialized use case for which the existing containers are not a good fit.
7. Custom allocators suggested by several posters. The idea is, roughly, to pass a custom allocator to a vector, such that when the vector's constructor taking the size is called, the memory would be left uninitialised. Then placement-new can be used to create Widgets.
On Sep 12, 2013, at 1:48 PM, Nevin Liber <ne...@eviloverlord.com> wrote:
> Howard's benchmark minimized the cost of (2).
Thank you Nevin, I was beginning to wonder if anyone recalled that.
[quote]Hence the conclusion: the performance of a tight loop can be improved by 40% by replacing push_back by unchecked_push_back. Looks like a good bate![/quote]At the expense of completely destroying all of vector's safety guarantees. It is easy to be fast if you don't have to be correct.
Note again that we have to distinguish the behavior of not checking capacity (which was the original proposal) from not updating size. If you are proposing the capacity check being removed, I doubt that was a significant part of that 40%. If you are removing the size check, then using vector without causing undefined behavior is pretty much impossible.
If you want uninitialized semantics, then the container should not be called vector. You want something else. That is a specialized use case for which the existing containers are not a good fit.
Oh, and (7) is about adding an allocator which does default initialization instead of a value initialization on its object when passed no constructor parameters. There is nothing unsafe about doing so, and I would be strongly in favor of adding such an allocator.
I think it is time to reign in the excitement just a notch or two. Here is a very slightly modified test that puts a std::string in Widget instead of an int:
[...]
explicit Widget(int data) : data_(data, '\0') {}
Under the best of circumstances, with the cheapest of Widgets, you might see up to a 40% gain. However it would not be unexpected to see no gain at all, even with just a moderately expensive Widget (e.g. contains a single std::string that gets constructed beyond the short string buffer).
Here I should probably mention that I did some inspection of the
generated assembly and found that if one could teach the optimizer to
only store the end_ptr when it needs to expand the storage or when the
loop ends and not on each lap then the gain from unchecked_push_back is
pretty low.
I don't think anyone suggested that, although I could be wrong.
My suggestion about custom allocators was to extend the library to allow allocator (and sequence storage) propagation in cases where the allocator is of different, but convertible type. This is a very general improvement, allowing any kind of change to a container's "memory contract." For the problem at hand, the implementation should also ensure that if allocator::allocate cannot allocate anything (according to some criteria), that no capacity check occurs.
You would reserve space inside the vector, then move it to a non-allocating vector of different specialization, then everything else would be as usual except incurring another allocation would be UB.
I think those who don't want to add std::vector::buffer_overrun() recalled that your test "emphasizes any performance gains as much as possible" and those who do want to add std::vector::buffer_overrun() recalled only the performance gains :-)
Please add it (or potential_buffer_overrun(), if it makes the OP feel better) to the list if the OP intends to go through with this proposal.
On Thursday, September 12, 2013 6:21:21 PM UTC+1, Billy O'Neal wrote:[quote]Hence the conclusion: the performance of a tight loop can be improved by 40% by replacing push_back by unchecked_push_back. Looks like a good bate![/quote]At the expense of completely destroying all of vector's safety guarantees. It is easy to be fast if you don't have to be correct.As I wrote, the so called "vector's safety guarantees" are a bad joke if you have data(), operator[] and iterators that can become invalid. My argument was that unchecked_push_back is quite in line with those "safety guarantees", maybe even on the safer side of them.
Is bit difference between making some pointer and iterators invalid and permanently breaking vector. every exception during `unchecked_push_back` will break program.
2. My other issue which is for filling vectors which initially act as buffers for some legacy function etc.
--
I can't see that a special allocator solves the problem, as vectors with different allocators are different types.
A problem with the assignable allocator solution
template<typename T>
struct DefaultInitAllocator {
template<typename U>
void construct(U* p)
{ ::new (static_cast<void*>(p)) U; }
template<typename U, typename... Args>
void construct(U* p, Args&&... args)
{ ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...); }
// ... rest of the allocator interface
};
is that it uses allocators to differentiate something that is completely different from setting a policy on from where memory is allocated, which is what allocators are designed for.
Even with the invention of some sort of assignable allocator suggested earlier in this thread and provisions in all classes that use allocators so that you can move and swap the vector with the special allocator with a vector with the standard allocator it results in rather convoluted code compared to an extra ctor parameter or special resize method.
I can't see that a special allocator solves the problem, as vectors with different allocators are different types. Even with the invention of some sort of assignable allocator suggested earlier in this thread and provisions in all classes that use allocators so that you can move and swap the vector with the special allocator with a vector with the standard allocator it results in rather convoluted code compared to an extra ctor parameter or special resize method.
vector< int > q;
q.reserve( 30 );
vector< int, other_allocator > r( std::move( q ) ); // Now r has whatever special semantics.
A problem with the assignable allocator solution is that it uses allocators to differentiate something that is completely different from setting a policy on from where memory is allocated, which is what allocators are designed for. With assignable allocators the writer of the new allocator class is promising that as these allocators can be assigned to each other they are equivalent when it comes to doing their work, allocation/deallocation, and the only difference is that they do something else, i.e. construction, in different ways. To me this is not a very clean design.
Alternatively an allocator could have a bool member which is inpected in each construct to see if value or default construction is required.
Maybe easier but now construct() performance is reduced which does not sit well with this performance directed improvement, and it is still not the default allocator.
vector< int > q;
q.reserve( 30 );
vector< int, other_allocator > r( std::move( q ) ); // Now r has whatever special semantics.
Is this so convoluted?
--
Of course, presumably you would want to use resize rather than reserve.
--
In C++03 resize was only one function with a default parameter and
used effectively value-initialization. The effect of the
default-allocator of std::vector has the same effect.