proposal to add vector::push_back_()

1409 views
Skip to first unread message

Victor....@ncl.ac.uk

unread,
Sep 6, 2013, 6:05:03 PM9/6/13
to std-pr...@isocpp.org
[I suspect I am re-inventing the wheel, but a quick search of this forum didn't yield any similar proposal.]
 
vector::push_back() is inefficient, as every time it checks whether the vector has reached its full capacity and needs to allocate more memory. Such a check is often unnecessary, e.g. if reserve() was called, see the example below. Hence, the proposal is to add vector::push_back_() which skips this check. Calling it for a full vector results in undefined behaviour (consistent with operator[] which does not check that the index is valid). Of course, an assertion checking the capacity should be used in push_back_() to simplify debugging, but the optimised code has no overhead.
 
---------
Example: The vector class is often used like this:
 
vector<Widget> vw;
vw.reserve(...);
for(...){
    vw.push_back(...); // inefficient due to unnecessary capacity checks; push_back_() should be used instead
}
 
The capacity check is unnecessary in this example and leads to loss of performance (code like this is likely to be in a tight internal loop). Furthermore, unnecessary overhead cannot be removed, e.g. replacing this code by
 
vector<Widget> vw(...);
for(...){
    vw[...]=...;
}
 
does not solve the problem, as now the default constructor of Widget is called many times, only for the created objects to be immediately overwritten in the loop.
---------
 
Of course, the same argument works for emplace_back_(). One might also consider adding back_inserter_ to further exploit the new functionality, e.g.:
 
vector<Widget> vw;
vw.reserve(...);
transform(...,back_inserter_(vw),...);
 
 

David Krauss

unread,
Sep 6, 2013, 10:34:31 PM9/6/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
In principle it's a valid problem, such checks can adversely affect inner loops, but the solution is way too specialized and obtuse.

How about a means to pass hints to the allocator, maybe a dispatch tag indicating to forward arguments from push_*, insert_*, emplace_*, and constructors to allocator::allocate? This would in general enable dumb allocators where the client can specify where allocations go, and when they are allowed. It would be a powerful interface.

the.ultim...@gmail.com

unread,
Sep 7, 2013, 4:59:45 AM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
Hello Victor,

 
vector::push_back() is inefficient, as every time it checks whether the vector has reached its full capacity and needs to allocate more memory. Such a check is often unnecessary, e.g. if reserve() was called, see the example below.

I don't deny that there is a textual check at every push_back(). Do you have measurements on the acutal impact on execution? My theory is, if you reserved enough memory, the check should pass everytime and branch prediction would make your insertions throttle full speed.

Laurent

David Krauss

unread,
Sep 7, 2013, 7:10:25 AM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

I can vouch that it makes a difference. For frequent enough insertions, the cost comes not from mispredicted branches but the integer instructions to fetch and compare the capacity pointer.

The workaround is to not use push_back, but IMHO it should be the right tool for the job.

Ville Voutilainen

unread,
Sep 7, 2013, 7:57:45 AM9/7/13
to std-pr...@isocpp.org
Well, if I don't want to resize the vector (and construct objects), I can't see what work-around would
allow avoiding push_back. A vector of optionals would allow the resize without constructing
the underlying objects, and then I could just use operator[].

the.ultim...@gmail.com

unread,
Sep 7, 2013, 8:31:22 AM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

I can vouch that it makes a difference. For frequent enough insertions, the cost comes not from mispredicted branches but the integer instructions to fetch and compare the capacity pointer.

The workaround is to not use push_back, but IMHO it should be the right tool for the job.

And we're back at the von Neumann bottleneck. Wouldn't a cleaner way to work around this be to leave the control flow of the filling to the vector itself?

My thought on this isn't complete yet, but now that we can write function objects more easily, wouldn't the right approach be to generalize the following "fill" constructor:

vector (size_type n, const value_type& val,
 
const allocator_type& alloc = allocator_type());

Into
:

template<typename Generator>
vector
(size_type n, Generator gen,
 
const allocator_type& alloc = allocator_type());

If we then had a way to write the algorithm as a coroutine, the vector would determine its capacity only once and let the filling flow.

Maurice Bos

unread,
Sep 7, 2013, 8:40:08 AM9/7/13
to std-pr...@isocpp.org
Shouldn't this be a task of the compiler? (To inline push_back and optimize away the 'if (size == capacity)' check if it knows that 'size < capacity' because of a previous .reserve().)

--
 
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

DeadMG

unread,
Sep 7, 2013, 9:45:35 AM9/7/13
to std-pr...@isocpp.org
You should refactor your loop into a range, and construct the vector directly from that. This would resolve the issue.

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 2:12:24 PM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

On Saturday, September 7, 2013 3:34:31 AM UTC+1, David Krauss wrote:
In principle it's a valid problem, such checks can adversely affect inner loops, but the solution is way too specialized and obtuse.
 
How about a means to pass hints to the allocator, maybe a dispatch tag indicating to forward arguments from push_*, insert_*, emplace_*, and constructors to allocator::allocate? This would in general enable dumb allocators where the client can specify where allocations go, and when they are allowed. It would be a powerful interface.
 
If we modify vector class anyway, such a solution is more heavy-weight. Also difficult to teach. In fact, I'm not convinced it would actually give any speedup in practice.

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 2:19:03 PM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
Just to add: the implementation of push_back_() would be something like
 
*ptr_end++=parameter;
 
and so almost certainly inlined. In contrast, push_back() is unlikely to get inlined, as it does some complicated stuff, like reallocation and handling the ugly special case like v.push_back(v[i]).
 
 

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 2:26:21 PM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
I believe this will not result in any speedup, as the generator will have internal state, and will need extra calls (unless fully inlined). Moreover, it is less convinient to use - there are generate and generate_n functions in <algorithm> which do essentially the same, and in my experience they are rarely used.
 

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 2:32:48 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 1:40:08 PM UTC+1, Maurice Bos wrote:
Shouldn't this be a task of the compiler? (To inline push_back and optimize away the 'if (size == capacity)' check if it knows that 'size < capacity' because of a previous .reserve().)
 
 
As I said before, push_back is quite complicated, so unlikely to be inlined. Even it it is, the compiler has to be very clever to find this optimisation. In some situations, like inner loop or critical section, the programmer would want to ensure that this optimisation has been applied.

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 2:36:12 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 2:45:35 PM UTC+1, DeadMG wrote:
You should refactor your loop into a range, and construct the vector directly from that. This would resolve the issue.
 
Or, much easier, allocate raw memory using malloc, and work with it.

DeadMG

unread,
Sep 7, 2013, 2:39:14 PM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
It would unfortunately be much easier because ranges are so irritating to construct. This, however, is immaterial to the fact that the other solutions are a disgusting hack which is only an approximation to the real solution.

Magnus Fromreide

unread,
Sep 7, 2013, 2:58:51 PM9/7/13
to std-pr...@isocpp.org
I would expect the implementetion of push_back to be something along the
line of

inline void vector<T>::push_back(const T& parameter) {
if (ptr_end == ptr_capacity)
grow();
allocator.construct(ptr_end, parameter);
++ptr_end;
}

where grow() grows the vector if needed. Here I don't expect grow() to
be fully inlined but I do not see why the rest couldn't be inlined.

Note that the your push_back_ also needs to do

allocator.construct(ptr_end, parameter);
++ptr_end; // Increment after as construct could throw

since I expect the memory from ptr_end to ptr_capacity to be
uninitialized, so it really isn't that much simpler.

/MF


Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 3:03:15 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 12:57:45 PM UTC+1, Ville Voutilainen wrote:

Well, if I don't want to resize the vector (and construct objects), I can't see what work-around would
allow avoiding push_back. A vector of optionals would allow the resize without constructing
the underlying objects, and then I could just use operator[].
 
Doesn't solve the problem: still many optionals are constructed, only to be immediately overwritten.

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 3:09:47 PM9/7/13
to std-pr...@isocpp.org
Ok, point taken about allocator.construct.
 
However, your implementation of puch_back has a subtle bug: it has undefined behaviour for calls like v.push_back(v[0]) when v is full. Hence, you need to add some code to handle this special case. This complicates the implementation and makes it less likely to be inlined.

Ville Voutilainen

unread,
Sep 7, 2013, 3:14:17 PM9/7/13
to std-pr...@isocpp.org
Assuming that the Widgets in your original example are costly to construct and overwrite,
the optional<Widget>s are not costly to construct.

Magnus Fromreide

unread,
Sep 7, 2013, 3:21:58 PM9/7/13
to std-pr...@isocpp.org
Ok, point taken about the v[i] case, but the whole insertion in the
grow() case could be passed of to another method (grow(parameter)) since
that is a problem only when a reallocation happens so there still is no
reason this would prevent inlining of the non-growing case.

/MF

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 3:41:28 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 8:21:58 PM UTC+1, Magnus Fromreide wrote:
Ok, point taken about the v[i] case, but the whole insertion in the
grow() case could be passed of to another method (grow(parameter)) since
that is a problem only when a reallocation happens so there still is no
reason this would prevent inlining of the non-growing case.

/MF

 
Ok, maybe one can achieve inlining for push_bask, in principle at least. Below is an actual implementation from VS2012. I find it difficult to believe it's inlined...

 

void push_back(value_type&& _Val)
 { // insert by moving into element at end
 if (_Inside(_STD addressof(_Val)))
  { // push back an element
  size_type _Idx = _STD addressof(_Val) - this->_Myfirst;
  if (this->_Mylast == this->_Myend)
   _Reserve(1);
  _Orphan_range(this->_Mylast, this->_Mylast);
  this->_Getal().construct(this->_Mylast,
   _STD forward<value_type>(this->_Myfirst[_Idx]));
  ++this->_Mylast;
  }
 else
  { // push back a non-element
  if (this->_Mylast == this->_Myend)
   _Reserve(1);
  _Orphan_range(this->_Mylast, this->_Mylast);
  this->_Getal().construct(this->_Mylast,
   _STD forward<value_type>(_Val));
  ++this->_Mylast;
  }
 }

 

 

 
 

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 3:50:33 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 8:14:17 PM UTC+1, Ville Voutilainen wrote:

Assuming that the Widgets in your original example are costly to construct and overwrite,
the optional<Widget>s are not costly to construct.
 
Sure - this can work in some circumstances. However, we are talking about saving a couple of processor instructions here, so "costly" should be viewed from this perspective. Also, that extra flag in optional<> can increase the element size, spoil the alignment, and result in something that's actually slower and takes more memory. The bottom line: occasionally it works, but it's not a general solution to the problem.

Felipe Magno de Almeida

unread,
Sep 7, 2013, 3:54:30 PM9/7/13
to std-pr...@isocpp.org
On Fri, Sep 6, 2013 at 7:05 PM, <Victor....@ncl.ac.uk> wrote:
> [I suspect I am re-inventing the wheel, but a quick search of this forum
> didn't yield any similar proposal.]
>
> vector::push_back() is inefficient, as every time it checks whether the
> vector has reached its full capacity and needs to allocate more memory. Such
> a check is often unnecessary, e.g. if reserve() was called, see the example
> below. Hence, the proposal is to add vector::push_back_() which skips this
> check. Calling it for a full vector results in undefined behaviour
> (consistent with operator[] which does not check that the index is valid).
> Of course, an assertion checking the capacity should be used in push_back_()
> to simplify debugging, but the optimised code has no overhead.

I think a elegant solution for this (and a lot of other similar)
problem would be
something like this:

void foo(std::vector<char>& v)
{
assume(v.reserve() > v.size());
v.push_back('a'); // the compiler would do dead-code elimination and
then inline
}

[snip - example]

Regards,
--
Felipe Magno de Almeida

Ville Voutilainen

unread,
Sep 7, 2013, 3:58:50 PM9/7/13
to std-pr...@isocpp.org
True enough. I think this proposal has quite decent merit. I can't really work around it otherwise than postponing
the construction with optional, so it looks like a change to vector is what's necessary. If benchmarks show
that it's necessary, that is. ;)

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 4:28:16 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 8:54:30 PM UTC+1, Felipe Almeida wrote:

I think a elegant solution for this (and a lot of other similar)
problem would be
something like this:

void foo(std::vector<char>& v)
{
  assume(v.reserve() > v.size());
  v.push_back('a'); // the compiler would do dead-code elimination and
then inline
}
 
This is an interesting idea, but not quite perfect:
 * in a critical section or a tight loop I'd positively want to make sure that the optimisation has occured - but "assume" does not give any positive guarantees
 * it would be non-trivial for the compiler to do this optimisation for the following reasons:
    - the user can only refer to public methods like capacity() and size() in the "assume" clause while the implementation of puch_back would probably refer to private data directly, and the compiler will have to figure out the relashionship between these.
    - there is a special case in push_back that is handled separately: v.push_back(v[0]); you need to say to the compiler somehow that if the allocation is not required then this case should not be treated as special.

Magnus Fromreide

unread,
Sep 7, 2013, 4:36:27 PM9/7/13
to std-pr...@isocpp.org
That is a quality of implementation issue but I would measure rather
than guess regarding optimizations of standard library functions since
optimizers can be surprisingly good.

If one looks at GCC instead the implementation is

void
push_back(value_type&& __x)
{ emplace_back(std::move(__x)); }

template<typename _Tp, typename _Alloc>
template<typename... _Args>
void
vector<_Tp, _Alloc>::
emplace_back(_Args&&... __args)
{
if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
{
_Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
std::forward<_Args>(__args)...);
++this->_M_impl._M_finish;
}
else
_M_emplace_back_aux(std::forward<_Args>(__args)...);
}



Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 4:41:08 PM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

On Saturday, September 7, 2013 7:39:14 PM UTC+1, DeadMG wrote:
It would unfortunately be much easier because ranges are so irritating to construct. This, however, is immaterial to the fact that the other solutions are a disgusting hack which is only an approximation to the real solution.
 
By that logic, vector::operator[] is also a "disgusting hack" as it results in indefined behaviour if the index is out of range, instead of throwing an exception. However, the committee felt that efficiency was more important here than protecting the user from shooting themselves in a foot. The C++ philosophy is not to protect programmers from themselves, but to offer a choice - purists can still use the old good push_back. However, in some situation I do want maximum efficiency.
 
As I said in the original posting, any reasonable STL implementation would add an assertion to push_back_ to check that the assumption on the capacity actually holds - this would make debugging such violations as easy as debugging out-of[range problems.

Nevin Liber

unread,
Sep 7, 2013, 4:49:20 PM9/7/13
to std-pr...@isocpp.org
Why is that check any cheaper than the check for push_back to see if
the vector should grow?

--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> (312) 623-5420

Howard Hinnant

unread,
Sep 7, 2013, 4:50:51 PM9/7/13
to std-pr...@isocpp.org
On Sep 7, 2013, at 3:41 PM, Victor....@ncl.ac.uk wrote:

> Ok, maybe one can achieve inlining for push_bask, in principle at least. Below is an actual implementation from VS2012. I find it difficult to believe it's inlined...
>
> void push_back(value_type&& _Val)
> { // insert by moving into element at end
> if (_Inside(_STD addressof(_Val)))
> { // push back an element
> size_type _Idx = _STD addressof(_Val) - this->_Myfirst;
> if (this->_Mylast == this->_Myend)
> _Reserve(1);
> _Orphan_range(this->_Mylast, this->_Mylast);
> this->_Getal().construct(this->_Mylast,
> _STD forward<value_type>(this->_Myfirst[_Idx]));
> ++this->_Mylast;
> }
> else
> { // push back a non-element
> if (this->_Mylast == this->_Myend)
> _Reserve(1);
> _Orphan_range(this->_Mylast, this->_Mylast);
> this->_Getal().construct(this->_Mylast,
> _STD forward<value_type>(_Val));
> ++this->_Mylast;
> }
> }

I recommend you stick to quoting open sourced implementations. Here's one where you won't get in trouble:

http://libcxx.llvm.org

template <class _Tp, class _Allocator>
_LIBCPP_INLINE_VISIBILITY inline
void
vector<_Tp, _Allocator>::push_back(const_reference __x)
{
if (this->__end_ != this->__end_cap())
{
__alloc_traits::construct(this->__alloc(),
_VSTD::__to_raw_pointer(this->__end_), __x);
++this->__end_;
}
else
__push_back_slow_path(__x);
}

As others have alluded to, if push_back, and reserve+push_back is too expensive for you, then you are in to a specialty niche situation, and you should be prepared to do some extra work.

1. You need to tell the vector up front how many elements there are going to be.

2. You need to create a Widget generator which the function can use to construct widgets with.

3. Cast the generator in the form of a random access iterator. Random access is important as that is how the vector will get your up-front size statement. I strongly suspect you can safely ignore the requirement that a random access iterator must return a Widget& from operator*(). vector doesn't need or use that requirement. And it is a dumb requirement anyway that we should change.

Here's a sketch:

#include <vector>

class Widget
{
int data_;
public:
explicit Widget(int data) : data_(data) {}
};

class generator_Widget
{
int count_;
int state_;

public:
typedef std::random_access_iterator_tag iterator_category;
typedef Widget value_type;
typedef std::ptrdiff_t difference_type;
typedef const Widget* pointer;
typedef Widget reference;

explicit generator_Widget(int count, int state = 0)
: count_(count), state_(state) {}

reference operator*() const {return Widget(state_);}
pointer operator->() const {return nullptr;}

generator_Widget& operator++() {++count_; --state_; return *this;}
generator_Widget operator++(int)
{generator_Widget tmp(*this); ++(*this); return tmp;}

generator_Widget& operator--() {--count_; ++state_; return *this;}
generator_Widget operator--(int)
{generator_Widget tmp(*this); --(*this); return tmp;}

generator_Widget& operator+=(difference_type n) {count_ += n; state_ -= n; return *this;}
generator_Widget operator+(difference_type n) const
{generator_Widget tmp(*this); tmp += n; return tmp;}
friend generator_Widget operator+(difference_type n, generator_Widget x)
{x += n; return x;}
generator_Widget& operator-=(difference_type n) {return *this += -n;}
generator_Widget operator-(difference_type n) const
{generator_Widget tmp(*this); tmp -= n; return tmp;}

reference operator[](difference_type n) const {return *(*this + n);}

friend
bool
operator==(const generator_Widget& x, const generator_Widget& y)
{return x.count_ == y.count_;}

friend
bool
operator!=(const generator_Widget& x, const generator_Widget& y)
{return !(x == y);}

friend
bool
operator<(const generator_Widget& x, const generator_Widget& y)
{return x.count_ < y.count_;}

friend
bool
operator>(const generator_Widget& x, const generator_Widget& y)
{return y < x;}

friend
bool
operator<=(const generator_Widget& x, const generator_Widget& y)
{return !(y < x);}

friend
bool
operator>=(const generator_Widget& x, const generator_Widget& y)
{return !(x < y);}

friend
difference_type
operator-(const generator_Widget& x, const generator_Widget& y)
{return x.count_ - y.count_;}

};

int
main()
{
std::vector<Widget> v(generator_Widget(0, 10), generator_Widget(10));
}

Yes, its a pain to write all this boilerplate. Maybe you can generalize the iterator/generator so you only have to write it once. Or maybe somebody (boost?) has already written it for you. But it isn't going to get much faster than this. The vector constructor will subtract the pair of random access iterators in its constructor and allocate the right amount of memory. And then it will iterate through the range constructing each Widget from the dereferenced iterator.

Another technique you could use is to give the vector an allocator whose construct member is a no-op:

template <class U, class... Args>
void
construct(U* p, Args&&...)
{
}


Then construct your vector:

vector<Widget, my_alloc<Widget>> v(N); // allocate but don't construct the Widgets

And then loop through your vector and actually construct each Widget:

for (auto& w : v)
::new(&w) Widget(...);

You'll need to watch out for exception safety on this one. If any of your Widget constructors throws, then ~Widget() better be able to cope with an unspecified Widget state.

Neither of these techniques is very elegant. But they are both workable. They seem to me a better approach than adding an extremely error-prone member function to vector.

If we're going to further increase the performance of vector, the next most effective step to take is to teach allocators how to expand existing allocations in place (realloc without the possibility of an address change), and teach containers how to use such allocators. Doing this can produce as much as a 2X performance improvement compared to an un-reserved push_back loop. However even this at-best-2X improvement was not sufficiently enticing for the committee in the past. I have no reason to think it will be in the future.

Howard

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 4:59:32 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 9:49:20 PM UTC+1, Nevin ":-)" Liber wrote:
Why is that check any cheaper than the check for push_back to see if
the vector should grow?
Because assertion are only active in the debug mode, and are removed in the release build.

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 5:12:44 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 9:36:27 PM UTC+1, Magnus Fromreide wrote:
That is a quality of implementation issue but I would measure rather
than guess regarding optimizations of standard library functions since
optimizers can be surprisingly good.
 
 
I wouldn't overestimate the intellect of optimisers, especially when pointers are involved...
 
In any case, there is no positive assurance that the optimisation is applied. This is also unstable - minor changes in the code can upset it. Moreover, it is easy enought to confuse the optimiser in practice. E.g. imagine several threads compute some good stuff and push their results into a vector, protecting puch_back calls by a lock. The user knows the upper bound on the total number of results, and so can reserve memory in advance. However, the code executed by the threads is far away from the call to reserve(), so the compiler is powerless to optimise the critical section.
 
 
If one looks at GCC instead the implementation is
 
Ok, can actually be inlined if you're lucky with your STL vendor :-)

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 5:33:10 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 9:50:51 PM UTC+1, Howard Hinnant wrote:
If we're going to further increase the performance of vector, the next most effective step to take is to teach allocators how to expand existing allocations in place (realloc without the possibility of an address change), and teach containers how to use such allocators.  Doing this can produce as much as a 2X performance improvement compared to an un-reserved push_back loop.  However even this at-best-2X improvement was not sufficiently enticing for the committee in the past.  I have no reason to think it will be in the future.

Howard
 
The generator solution is unelegant, I think I'd rather use malloc and manage memory myself than doing anything like this. It is also of limited use compared to push_back_ - e.g. imagine several threads push their results into a vector, protecting puch_back_ calls by a lock; this is difficult to reformulate as generators.
 
The allocator solution is interesting, and would probably work, but I believe it is no less "error-prone" than push_back_. Do you really want to recommend it? I think push_back_ is no more difficult / error-prone than operator[], so the users can cope with it.

Howard Hinnant

unread,
Sep 7, 2013, 5:45:32 PM9/7/13
to std-pr...@isocpp.org
On Sep 7, 2013, at 5:33 PM, Victor....@ncl.ac.uk wrote:

> The generator solution is unelegant, I think I'd rather use malloc and manage memory myself than doing anything like this.

That is certainly another possibility.

> It is also of limited use compared to push_back_ - e.g. imagine several threads push their results into a vector, protecting puch_back_ calls by a lock; this is difficult to reformulate as generators.

And the generator method would also be unnecessary. The cost of the lock would put the cost of the one if-statement you're trying to avoid into the noise.

>
> The allocator solution is interesting, and would probably work, but I believe it is no less "error-prone" than push_back_. Do you really want to recommend it? I think push_back_ is no more difficult / error-prone than operator[], so the users can cope with it.

The no-construct allocator technique is very error prone. The one advantage it has over push_back_ is that it is in C++11 and later, and push_back_ isn't.

Howard

Howard Hinnant

unread,
Sep 7, 2013, 7:45:19 PM9/7/13
to std-pr...@isocpp.org
On Sep 7, 2013, at 4:50 PM, Howard Hinnant <howard....@gmail.com> wrote:

> Here's a sketch:
>
> #include <vector>
>
> class Widget
> {
> int data_;
> public:
> explicit Widget(int data) : data_(data) {}
> };
>
> class generator_Widget
> {

Fwiw, I decided it was time to see some real code and real numbers so we know what we're dealing with. I'm using the cheapest Widget I can imagine - just wraps an int. This emphasizes any performance gains as much as possible. As Widget becomes more expensive to construct, the performance gains of one technique over the other will probably become negligible.

Here's the code I'm testing:

#include <iostream>
#include <chrono>
#include <vector>

class Widget
{
int data_;
public:
explicit Widget(int data) : data_(data) {}
};

class generator_Widget
{
int count_;
int state_;

public:
// typedef std::input_iterator_tag iterator_category;
const unsigned N = 1000000;
using namespace std::chrono;
auto t0 = high_resolution_clock::now();
#if 1
std::vector<Widget> v(generator_Widget(0, N), generator_Widget(N));
#else
std::vector<Widget> v;
int state = N;
v.reserve(N);
for (unsigned i = 0; i < N; ++i, --state)
v.push_back(Widget(state));
#endif
auto t1 = std::chrono::high_resolution_clock::now();
std::cout << duration_cast<microseconds>(t1-t0).count() << "\xC2\xB5s\n";
}

I'm testing the above with clang++ -O3, tip-of-trunk libc++. For me this outputs:

$ a.out
2468µs
$ a.out
2431µs
$ a.out
2443µs
$ a.out
2709µs

If I change the above to use push_back():

#if 0

Then I get:

$ a.out
3848µs
$ a.out
3608µs
$ a.out
3635µs
$ a.out
3765µs

That's around a 40% performance hit, which is significant.

I also noted a generator/iterator adaptor in boost:

http://www.boost.org/doc/libs/1_54_0/libs/utility/generator_iterator.htm

However this adaptor is strictly conforming, and thus classifying itself as an input_iterator. This is a performance killer. I modeled this with:

typedef std::input_iterator_tag iterator_category;
// typedef std::random_access_iterator_tag iterator_category;

and got:

$ a.out
5256µs
$ a.out
5426µs
$ a.out
5233µs
$ a.out
5388µs

Another 48% slower than the push_back loop. The main reason for this is that vector can't predict a-preori the final capacity it will need, and thus must grow its capacity geometrically during the iteration. libc++ grows with a factor of 2. This is also the approximate time I get with the push_back() method if I remove the v.reserve(N).

I haven't bothered testing the allocator-no-op-constructor method. I believe it will be no faster, and possibly slower than the random-access-iterator-generator technique because it requires two passes over the elements, and one of the passes might be optimized out.

Howard

Billy O'Neal

unread,
Sep 8, 2013, 12:43:27 AM9/8/13
to std-proposals
Note that you can use the generator even if you aren't constructing the vector by calling assign or insert. While the generator solution is clunky if someone has really really profiled their code, and really found that a significant fraction is spent in push_back (which I think is *exceedingly* unlikely), then they can use the generator solution.

> The no-construct allocator technique is very error prone.  The one advantage it has over push_back_ is that it is in C++11 and later, and push_back_ isn't.
 
It also has the advantage that the client can't "screw up" and go off the end of the allocated region. The resulting program is perfectly safe, which wouldn't happen with the proposed push_back_ solution. (Yes, the generator is more complex, but the vast majority of bugs there will result in easy to diagnose crashes or simply fail to compile)
 
The benchmark above neglects to mention that there is also a size modification going on inside push_back -- writes are typically more expensive than reads, such as the comparison against capacity. If the vector has been reserved in advance, then the capacity check is going to be predicted with high accuracy by any machine worth its salt, so it should parallelize well.
 
However, writing the value of size in the above example changes on every iteration of the loop, and causes cross loop body memory dependencies. It also doubles or triples the amount of memory being written to, because each int being written is causing a size_t to be written for the size update.
 
The push_back_ proposal doesn't remove the size modification, only the capacity check. Are we sure that the capacity check is actually the cause of that 40% difference?
 
 

Billy O'Neal
Malware Response Instructor - BleepingComputer.com



Howard

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

Magnus Fromreide

unread,
Sep 8, 2013, 2:25:38 AM9/8/13
to std-pr...@isocpp.org
I did some quick performance runs with Howard's code (gcc-3.9 20130903,
x86_64) but added two more cases

1 - generate_Widget
2 - reserve, push_back
3 - reserve, emplace_back
4 - reserve, push_back_

What I ended up with was that

* 1 and 4 resulted in the same code
* 3 is marginally faster than 2, but still in the same ballpark

Looking at the generated assembler it seems that the real killer is that
the ptr_end and ptr_capacity values are updated (written to memory) on
each lap through the code in case 3 and 4. (The state variable is also
spilled but forcing that into a register doesn't seem to affect the
performance)

/MF


Billy O'Neal

unread,
Sep 8, 2013, 3:44:43 AM9/8/13
to std-proposals
It seems like the implementation shouldn't be writing ptr_capacity if the capacity is unchanged.... :/

Billy O'Neal
Malware Response Instructor - BleepingComputer.com



/MF


Magnus Fromreide

unread,
Sep 8, 2013, 3:54:16 AM9/8/13
to std-pr...@isocpp.org
On Sun, 2013-09-08 at 00:44 -0700, Billy O'Neal wrote:
> It seems like the implementation shouldn't be writing ptr_capacity if
> the capacity is unchanged.... :/

It doesn't, but it reads it from memory every turn.

/MF


Bengt Gustafsson

unread,
Sep 8, 2013, 11:18:56 AM9/8/13
to std-pr...@isocpp.org
We could consider a totally different way to handle this, which also has other uses. Admittedly, this can easily crash if misused (just as index out of range or too many push_back_ calls). I will call this tentatively uninitialized_resize(size_t sz). It just moves the end pointer and reallocates as required, but leaves the new elements uninitialized. Then, to initialize the elements, placement new is used. This avoids the increments for each push_back_() call. The additional use is for interfacing with legacy libraries and similar which construct an array of objects at a specified "buffer" address. Today there is no way to use a vector as such a buffer without extra initialization code running at resize() or when constructing the vector with the size. Note that this is true for basic types such as char and int too (in C++11), although I'm uncertain if this is intended (see note below).

Example:

vector<Widget> v;
v.uninitialized_resize(10);

for (auto& i : v)
    new (&i) Widget;

// Get an image from a third party library into a vector:

vector<char> image;
image.uninitialized_resize(size.x*size.y);
GetImage(image.data());

The placement new usage could be somewhat problematic as it circumvents the allocator. If Allocator::construct does someting else than what the default allocator does a special method on vector may be warranted, for instance initialize_element(i, args...), which basically forwards to Allocator::construct.


Note: The rather complicated reason for not being able to set the size of even a vector<char> without clearing the memory block is that the new resize() and ctor overloads without the initializer element does not just default construct the elements (which would be a nop for char) but rather calls Allocator::construct which is now a variadic template function and expanding the empty parameter pack creates a value construction rather than a default constuction. To me it seems that this combination effect may not have been desired. On the other hand this resize() overload replaces the older =0 on the second argument of resize() so even in C++98 it was impossible to construct a vector<char> without writing the elements. Probably many programmers have used the feature that in contrast with a plain char or array of char a vector of char has 0 in its elements. uninitialized_resize() thus adds a new functionality to vector in addition to what push_back_ would yield.

Billy O'Neal

unread,
Sep 8, 2013, 11:41:06 AM9/8/13
to std-proposals
>Allocator::construct which is now a variadic template function and expanding the empty parameter pack creates a value construction rather than a default construction.
 
If your allocator::construct does nothing, then the memory will be uninitialized. Every compiler on earth will inline a completely empty function. std::allocator's default behavior is to zero initialize the memory. If you want to use uninitialized behavior, just change out the allocator to do that.
 
>The placement new usage could be somewhat problematic as it circumvents the allocator. If Allocator::construct does someting else than what the default allocator does a special method on vector may be warranted, for instance initialize_element(i, args...), which basically forwards to Allocator::construct.
 
If you allow this, how does vector know the number of elements to destroy (call destructors on) in vector's destructor? Do you want to make it undefined to allow the vector to be destroyed with uninitialized data?
 
How many cases are actually out there that need to avoid zero-initialization of the memory that aren't met by unique_ptr<T[]>?
 
I remain unconvinced that any of this makes a measurable difference in real applications; or even that if it did that the class allowing these different behaviors should be called vector.

Billy O'Neal
Malware Response Instructor - BleepingComputer.com


Bengt Gustafsson

unread,
Sep 8, 2013, 2:35:11 PM9/8/13
to std-pr...@isocpp.org


Den söndagen den 8:e september 2013 kl. 17:41:06 UTC+2 skrev Billy O'Neal:
>Allocator::construct which is now a variadic template function and expanding the empty parameter pack creates a value construction rather than a default construction.
 
If your allocator::construct does nothing, then the memory will be uninitialized. Every compiler on earth will inline a completely empty function. std::allocator's default behavior is to zero initialize the memory. If you want to use uninitialized behavior, just change out the allocator to do that.
 
Right, but as vectors with different allocators are different types you can't subsequently send the vector as an argument to a non-template function. Also, at least if the vector is a member of a class there may well be other uses where you want the allocator to initialize the memory.

 
>The placement new usage could be somewhat problematic as it circumvents the allocator. If Allocator::construct does someting else than what the default allocator does a special method on vector may be warranted, for instance initialize_element(i, args...), which basically forwards to Allocator::construct.
 
If you allow this, how does vector know the number of elements to destroy (call destructors on) in vector's destructor? Do you want to make it undefined to allow the vector to be destroyed with uninitialized data?
 
This is a main drawback if exceptions are possible. In many cases you don't need to consider this, but especially for generic code you would have to be careful and add a try block which sets the size back to where you were in case you got an exception.
 
How many cases are actually out there that need to avoid zero-initialization of the memory that aren't met by unique_ptr<T[]>?
 
It is often a matter of trying to reconcile C libraries or legacy libraries, other languages, device drivers etc. You want to work with the convenient data structures such as vector, but especially when receiving data from such external sources it can be a deterrant to have to either copy the received data an extra time or clear the buffer before it is overwritten. All just because there is no way to tell vector that "someone else" has constructed the data.

Victor....@ncl.ac.uk

unread,
Sep 8, 2013, 3:17:54 PM9/8/13
to std-pr...@isocpp.org

On Sunday, September 8, 2013 4:18:56 PM UTC+1, Bengt Gustafsson wrote:
We could consider a totally different way to handle this, which also has other uses. Admittedly, this can easily crash if misused (just as index out of range or too many push_back_ calls). I will call this tentatively uninitialized_resize(size_t sz). It just moves the end pointer and reallocates as required, but leaves the new elements uninitialized.
 
I think this will never be adopted as vector no longer can enforce its invariants. Example:
{
    vector<Widget> v;
    v.uninitialized_resize(10);
    // at this point the desctructors are called, resulting in a crash
}

Victor....@ncl.ac.uk

unread,
Sep 8, 2013, 3:24:12 PM9/8/13
to std-pr...@isocpp.org

On Sunday, September 8, 2013 12:45:19 AM UTC+1, Howard Hinnant wrote:
Fwiw, I decided it was time to see some real code and real numbers so we know what we're dealing with.

 
Many thanks, Howard, for your effort! I know it was my job to do the experiments :-\

Victor....@ncl.ac.uk

unread,
Sep 8, 2013, 3:31:21 PM9/8/13
to std-pr...@isocpp.org

On Sunday, September 8, 2013 7:25:38 AM UTC+1, Magnus Fromreide wrote:
1 - generate_Widget
2 - reserve, push_back
3 - reserve, emplace_back
4 - reserve, push_back_
 
Many thanks for the figures!
 
 
What I ended up with was that

* 1 and 4 resulted in the same code
 
do you mean the same assembly code or the same performance? (I'd expect the code to be different, as 1 doesn't have to store ptr_end on each loop iteration.)
 
 Looking at the generated assembler it seems that the real killer is that
the ptr_end and ptr_capacity values are updated (written to memory) on
each lap through the code in case 3 and 4.
 
 
Did you mean 2 and 3?
 
If I'm interpreting your and Howard's results right, push_back_ and generator have similar performance, which is 40% better than push_back. Correct?
 

Magnus Fromreide

unread,
Sep 8, 2013, 4:29:30 PM9/8/13
to std-pr...@isocpp.org
On Sun, 2013-09-08 at 12:31 -0700, Victor....@ncl.ac.uk wrote:
>
> On Sunday, September 8, 2013 7:25:38 AM UTC+1, Magnus Fromreide wrote:
> 1 - generate_Widget
> 2 - reserve, push_back
> 3 - reserve, emplace_back
> 4 - reserve, push_back_
>
> Many thanks for the figures!
>
>
> What I ended up with was that
>
> * 1 and 4 resulted in the same code
>
> do you mean the same assembly code or the same performance? (I'd
> expect the code to be different, as 1 doesn't have to store ptr_end on
> each loop iteration.)

The same assembly code between the calls to system_clock::now.
Down to the label names.
Yes, I was just as surprised as you.

I suppose the compiler figures out that nothing can disturb the sequence
of assignments so it can hoist the saving of ptr_end out of the loop
when Widget is this simple.

> Looking at the generated assembler it seems that the real
> killer is that
> the ptr_end and ptr_capacity values are updated (written to
> memory) on
> each lap through the code in case 3 and 4.
>
>
> Did you mean 2 and 3?

Yes.

> If I'm interpreting your and Howard's results right, push_back_ and
> generator have similar performance, which is 40% better than
> push_back. Correct?

Yes.

/MF

David Krauss

unread,
Sep 8, 2013, 10:42:45 PM9/8/13
to std-pr...@isocpp.org

The problem is that generators are far less flexible. You can't simultaneously insert into several containers. Inserting a non-predetermined number of objects requires the generator to return a status code as well as the result, which adds overhead in C++ (I don't suppose it's even part of the Generator concept; haven't looked it up).

One use case with lots of containers is sorting a large number of objects into bins, for example generating graph nodes with unknown but bounded degree, given a list of edges.

I had a thought, that if conversions were allowed when propagating allocators, you could use std::allocator (or whatever) to get memory for a container, then convert to a dummy allocator which does not perform additional allocations. Ideally the bounds-overflow check would become a no-op and be removed by dead code elimination. But, looking at the actual implementation of libstdc++-v3, that could be too difficult to arrange.

billy...@gmail.com

unread,
Sep 9, 2013, 12:07:34 AM9/9/13
to std-proposals
Have you profiled and found the capacity check significant in comparison to choosing the container instance to push_back to?

Even if significant, are you sure the easily branch predicted capacity check is significant compared to updating size?

Sent from a touchscreen. Please excuse the brevity and tpyos.

David Krauss

unread,
Sep 9, 2013, 2:10:22 AM9/9/13
to std-pr...@isocpp.org
On Monday, September 9, 2013 12:07:34 PM UTC+8, Billy O'Neal wrote:
Have you profiled and found the capacity check significant in comparison to choosing the container instance to push_back to?

Even if significant, are you sure the easily branch predicted capacity check is significant compared to updating size?

The specific case I mentioned was from years ago, and I don't know the performance impact of push_back partly because I abandoned standard containers and used a custom scheme (based on open hashing and circular linked lists). Updating vector::size is deadly not only because it requires extra instructions, but because the vector object and the data sequence are on different cache lines, so you can easily double the memory write bandwidth requirement.

Again, the topic here is what happens after branch prediction does its job. The extra instructions still need to be retired.

Correct me if I'm wrong, but using a generator here requires
  1. The length of the sequence is predetermined before the elements are generated.
  2. The length of the sequence is not predetermined at compile time, or you'd just use an array instead.
  3. Consecutive elements from the generated sequence go into the same container.
It's not hard to find a case which is simple enough that the capacity check has an impact, yet fails #1 and 3. Tokenizing a string requires the sequence length to vary (although the length is usually unbounded, I'm just going off the top of my head). Partitioning an input sequence into two output containers would fail #3. The container selection could just be a range check or modulus; such a thing could easily be a very tight inner loop.

I wonder if any cases where construction by generator is applicable pass requirement #2 — that it shouldn't be a fixed-size array.

As for exception safety and construction-destruction semantics in a fixed-size array, I think the performant generic solution would be an RAII array construction sentry. It's nice for the library to maintain a size count, but redundant if the final size is fixed. But, I'm not sure how best to prevent the full array destructor from running to allow such a sentry to do its job, in the general case where the sentry lifetime is during initialization but the array lifetime is longer. I guess it would require union trickery and an explicit destructor call in the non-exceptional case.

Micro-optimization tends to get very esoteric very fast, and although I did say this problem is valid, I don't know if it's really possible to address such problems in a general fashion in the standard library. Boost has plenty of tools for slightly-different data structures. Maybe that array sentry would fit in nicely there. But we can't try to accommodate every specific inner loop with a member function, or extrapolate too surely from specific member functions to real-world application solutions.

Custom allocators could improve container performance if they were more flexible. That's the best way to expand the library. The contract to obtain memory is the real issue. There are fewer such contracts than specific kinds of inner loops. Handling more memory allocation schemes would more usefully expand the applicability of the container library than adding canned inner loops. In this case, the user should meaningfully be able to say they don't want more memory.

For what it's worth, the last time I was bit by something like this was in a tokenizer, and the problem was malloc allocating the small strings, not the size/end update or capacity check. A third-party arena allocator might have solved my problem, but nothing included with GCC did the trick. I ended up coding a custom subset of std::string functionality as a "drop-in replacement."

Jonathan Wakely

unread,
Sep 9, 2013, 10:59:35 AM9/9/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
Has noone mentioned that the name is abysmal?  OK, I'll do it: push_back_ is a terrible, terrible name.  I prefer the uninitialized_resize() suggestion if only because the name at least gives a clue that it's dangerous if you use it wrong.

I agree with Howard's first mail that this is a pretty specialized request and so I'm not keen on adding it to the standard because you don't want to write some extra code for your specialized scenario.  std::vector doesn't need to be ideal for every possible use case.

Nevin Liber

unread,
Sep 9, 2013, 11:15:57 AM9/9/13
to std-pr...@isocpp.org
On 9 September 2013 09:59, Jonathan Wakely <c...@kayari.org> wrote:
Has noone mentioned that the name is abysmal?  OK, I'll do it: push_back_ is a terrible, terrible name.  I prefer the uninitialized_resize() suggestion if only because the name at least gives a clue that it's dangerous if you use it wrong.

I prefer buffer_overrun(), since that is what this will be enabling.
 
I agree with Howard's first mail that this is a pretty specialized request and so I'm not keen on adding it to the standard because you don't want to write some extra code for your specialized scenario.  std::vector doesn't need to be ideal for every possible use case.

+1
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com(847) 691-1404

Ion Gaztañaga

unread,
Sep 9, 2013, 12:07:26 PM9/9/13
to std-pr...@isocpp.org
El 09/09/2013 17:15, Nevin Liber escribi�:
> Has noone mentioned that the name is abysmal? OK, I'll do it:
> push_back_ is a terrible, terrible name. I prefer the
> uninitialized_resize() suggestion if only because the name at least
> gives a clue that it's dangerous if you use it wrong.

What about "unchecked_push_back()" ?

Resizing without constructing some elements (as the vector could store
some already constructed elements) breaks vector invariants. For
non-trivial types the user should construct the data using the same
allocator as vector::allocator_type and it should implement the
allocator_traits dispatching logic to be consistent with what vector
currently does.

"unchecked_push_back" just tells the vector that it should add a new
element without any boundary checking, just like we have operator[] vs at().

To use std::vector<POD> as a buffer to store data raw data to be filled
by an external function without paying for zero initialization of POD,
shouldn't something like this do the trick?

vector.resize(N, std::default_init);
receive_network_packet
(vector.data(), vector.size(), /*out*/ received_bytes);
vector.resize(received_bytes);

vector calls allocator_traits::construct(alloc, p, std::default_init)
for each element if the allocator does not define such function. The
default implementation for "allocator_traits::construct(p,
std::default_init)" would be "::new (p) T". I guess the compiler could
inline the call to allocator_traits::construct and see ::new(p) T does
nothing, simplifying the initialization loop to nothing. Otherwise, a
compiler traits is needed to detect trivially default-initializable types.

Best,

Ion

Nevin Liber

unread,
Sep 9, 2013, 12:32:52 PM9/9/13
to std-pr...@isocpp.org
On 9 September 2013 11:07, Ion Gaztañaga <igazt...@gmail.com> wrote:
To use std::vector<POD> as a buffer to store data raw data to be filled by an external function without paying for zero initialization of POD,

That case can already be solved with allocators.

This proposal is purely to allow optimizing libraries the flexibility to remove one if check.

std::default_init is (a) not proposed (I didn't see it in the papers for Chicago) and (b) a whole different can of worms, as it either requires revising allocators (since allocators, not containers, are currently responsible for construction of objects inside containers) and/or moving/sharing responsibility of construction from the allocator to the container.  If you want to discuss that again, please start a new thread or resurrect one of the previous discussions on it.

Billy O'Neal

unread,
Sep 9, 2013, 1:06:21 PM9/9/13
to std-proposals
> Updating vector::size is deadly not only because it requires extra instructions, but because the vector object and the data sequence are on different cache lines, so you can easily double the memory write bandwidth requirement
 
Ok, well the push_back_ proposal doesn't get rid of the size modifications. Which was my whole point. The extra 1 or 2 instructions for the capacity check are negligible in comparison.
 
I don't think the cache line thing is going to be a big deal because the cache line containing size already needs to be accessed to get the pointer to vector's allocated buffer.
 
> I prefer the uninitialized_resize() suggestion if only because the name at least gives a clue that it's dangerous if you use it wrong.
Ok, but that's a different proposal, which is going to be difficult because it breaks semantics of vector's destructor. I think if you want to restrict this to POD types that don't need destructor calls, then there's no reason someone can't write a container with the desired semantics and contains static_asserts that T be a POD type.
 
> This proposal is purely to allow optimizing libraries the flexibility to remove one if check.
 
Have you shown that this comparison is at all significant in a real world use case? The size update is the expensive portion, and this proposal doesn't remove the size update.
 

Billy O'Neal
Malware Response Instructor - BleepingComputer.com


--

Victor....@ncl.ac.uk

unread,
Sep 9, 2013, 2:28:51 PM9/9/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

On Monday, September 9, 2013 3:59:35 PM UTC+1, Jonathan Wakely wrote:
Has noone mentioned that the name is abysmal?  OK, I'll do it: push_back_ is a terrible, terrible name.  I prefer the uninitialized_resize() suggestion if only because the name at least gives a clue that it's dangerous if you use it wrong.
 
That trailing unserscore is a common convention. In any case, the proposal is not about the name, I'm quite flexible to change it. E.g. unchecked_push_back suggested by Ion is perfectly fine with me.
 
 
I agree with Howard's first mail that this is a pretty specialized request and so I'm not keen on adding it to the standard because you don't want to write some extra code for your specialized scenario.  std::vector doesn't need to be ideal for every possible use case.
 
The problem is that no STL container is good enough for this particular use case. It looks like falling back to malloc is the most natural approach here. (While I like the generating iterators methods, it's far from natural - how many programmers would invent it?)
 

Victor....@ncl.ac.uk

unread,
Sep 9, 2013, 2:29:58 PM9/9/13
to std-pr...@isocpp.org

On Monday, September 9, 2013 5:07:26 PM UTC+1, Ion Gaztañaga wrote:

What about "unchecked_push_back()" ?
 
Fine with me. 

Victor....@ncl.ac.uk

unread,
Sep 9, 2013, 2:38:18 PM9/9/13
to std-pr...@isocpp.org

On Monday, September 9, 2013 6:06:21 PM UTC+1, Billy O'Neal wrote:
Ok, well the push_back_ proposal doesn't get rid of the size modifications. Which was my whole point. The extra 1 or 2 instructions for the capacity check are negligible in comparison.
 
In Magnus' experiments the compiler was clever enough to remove these stores. I'd expect it to be easy to spot for the optimiser: on each iteration it loads and stores this value, and no one else modifies it, so the store instruction can be postponed to the very end.
 
 
 
Have you shown that this comparison is at all significant in a real world use case? The size update is the expensive portion, and this proposal doesn't remove the size update.
 
So far there are only small benchmarks by Howard and Magnus showing 40% speedup.

Thiago Macieira

unread,
Sep 9, 2013, 3:28:13 PM9/9/13
to std-pr...@isocpp.org
On segunda-feira, 9 de setembro de 2013 11:28:51, Victor....@ncl.ac.uk
wrote:
> That trailing unserscore is a common convention.

Can you point to examples in the Standard Library public API?

I can see examples of that in the private code in libstdc++-v3:

bits/stl_set.h: { return _M_t._M_insert_unique_(__position, __x); }
bits/stl_tree.h: _M_insert_(_Const_Base_ptr __x, _Const_Base_ptr __p,
_Arg&& __v)
ext/pb_ds/detail/unordered_iterator/iterator.hpp: iterator_()

In libc++ I can't find a single function ending in underscore, only internal
macros and variable names.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
signature.asc

Bengt Gustafsson

unread,
Sep 9, 2013, 5:03:26 PM9/9/13
to std-pr...@isocpp.org
I also think that the uninitialized_resize() I suggested would not be accepted due to the problems with the destruction of the maybe-constructed elements. However, I still think that there are practical uses for this, and as sadly enough not all vectors are equal, if you change the allocator you change the type and the vector is useless in non-template APIs written for regular vectors. While such APIs may be deemed as bad practize now that template oriented programming is prevalent it is very common in real world applications and in some cases necessary, such as when the called method is virtual. Due to this I'm willing to give it another try. I think that from the standpoint of understandability this one is worse, but from the point of view of keeping the vector promises it may be better.

Instead of an uninitialized_resize() this involves another ugly method, which we could tentatively call just_set_the_end_pointer(). The usage scenario is this:

vector.reserve(n);
int actual_count = socket.read(vector.data(), n);
vector.just_set_the_end_pointer(actual_count);

There is actually not much difference here, except that if the element type has a dtor it would be the socket or whatever functionality is messing around in the memory area that is responsible for handling destruction in case of a problem. As far as the vector is concerned there's still nothing there. Only when the just_set_the_end_pointer() method is called the vector takes responsiblity and at this time there is no problem in doing so. Some quite obvious nothrow guarantee must be placed on just_set_the_end_pointer() I guess, but as it just sets the end pointer that would not be a problem. The case that the initializing functionality is overstepping the buffer bounds is no worse than if the buffer was an array, a heap block or whatever.

The function name of course has to be given serious thought so that it really conveys what is going on. I would be quite happy to enable_if this function for POD elements only but I think the original poster proposing push_back_() would not. Also it is kind of falling into the vector<bool> trap again, although not as seriously.

Nevin Liber

unread,
Sep 9, 2013, 5:19:30 PM9/9/13
to std-pr...@isocpp.org
On 9 September 2013 16:03, Bengt Gustafsson <bengt.gu...@beamways.com> wrote:

Instead of an uninitialized_resize() this involves another ugly method, which we could tentatively call just_set_the_end_pointer(). The usage scenario is this:

-1.  Classes have invariants; we should not be providing easy methods to break those invariants. 

I really don't see why any of this has to be shoehorned into std::vector.   If people can change their code to use the new methods you propose, they can also change their implementations to use more appropriate data structures.

Ion Gaztañaga

unread,
Sep 9, 2013, 6:19:23 PM9/9/13