proposal to add vector::push_back_()

1,493 views
Skip to first unread message

Victor....@ncl.ac.uk

unread,
Sep 6, 2013, 6:05:03 PM9/6/13
to std-pr...@isocpp.org
[I suspect I am re-inventing the wheel, but a quick search of this forum didn't yield any similar proposal.]
 
vector::push_back() is inefficient, as every time it checks whether the vector has reached its full capacity and needs to allocate more memory. Such a check is often unnecessary, e.g. if reserve() was called, see the example below. Hence, the proposal is to add vector::push_back_() which skips this check. Calling it for a full vector results in undefined behaviour (consistent with operator[] which does not check that the index is valid). Of course, an assertion checking the capacity should be used in push_back_() to simplify debugging, but the optimised code has no overhead.
 
---------
Example: The vector class is often used like this:
 
vector<Widget> vw;
vw.reserve(...);
for(...){
    vw.push_back(...); // inefficient due to unnecessary capacity checks; push_back_() should be used instead
}
 
The capacity check is unnecessary in this example and leads to loss of performance (code like this is likely to be in a tight internal loop). Furthermore, unnecessary overhead cannot be removed, e.g. replacing this code by
 
vector<Widget> vw(...);
for(...){
    vw[...]=...;
}
 
does not solve the problem, as now the default constructor of Widget is called many times, only for the created objects to be immediately overwritten in the loop.
---------
 
Of course, the same argument works for emplace_back_(). One might also consider adding back_inserter_ to further exploit the new functionality, e.g.:
 
vector<Widget> vw;
vw.reserve(...);
transform(...,back_inserter_(vw),...);
 
 

David Krauss

unread,
Sep 6, 2013, 10:34:31 PM9/6/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
In principle it's a valid problem, such checks can adversely affect inner loops, but the solution is way too specialized and obtuse.

How about a means to pass hints to the allocator, maybe a dispatch tag indicating to forward arguments from push_*, insert_*, emplace_*, and constructors to allocator::allocate? This would in general enable dumb allocators where the client can specify where allocations go, and when they are allowed. It would be a powerful interface.

the.ultim...@gmail.com

unread,
Sep 7, 2013, 4:59:45 AM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
Hello Victor,

 
vector::push_back() is inefficient, as every time it checks whether the vector has reached its full capacity and needs to allocate more memory. Such a check is often unnecessary, e.g. if reserve() was called, see the example below.

I don't deny that there is a textual check at every push_back(). Do you have measurements on the acutal impact on execution? My theory is, if you reserved enough memory, the check should pass everytime and branch prediction would make your insertions throttle full speed.

Laurent

David Krauss

unread,
Sep 7, 2013, 7:10:25 AM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

I can vouch that it makes a difference. For frequent enough insertions, the cost comes not from mispredicted branches but the integer instructions to fetch and compare the capacity pointer.

The workaround is to not use push_back, but IMHO it should be the right tool for the job.

Ville Voutilainen

unread,
Sep 7, 2013, 7:57:45 AM9/7/13
to std-pr...@isocpp.org
Well, if I don't want to resize the vector (and construct objects), I can't see what work-around would
allow avoiding push_back. A vector of optionals would allow the resize without constructing
the underlying objects, and then I could just use operator[].

the.ultim...@gmail.com

unread,
Sep 7, 2013, 8:31:22 AM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

I can vouch that it makes a difference. For frequent enough insertions, the cost comes not from mispredicted branches but the integer instructions to fetch and compare the capacity pointer.

The workaround is to not use push_back, but IMHO it should be the right tool for the job.

And we're back at the von Neumann bottleneck. Wouldn't a cleaner way to work around this be to leave the control flow of the filling to the vector itself?

My thought on this isn't complete yet, but now that we can write function objects more easily, wouldn't the right approach be to generalize the following "fill" constructor:

vector (size_type n, const value_type& val,
 
const allocator_type& alloc = allocator_type());

Into
:

template<typename Generator>
vector
(size_type n, Generator gen,
 
const allocator_type& alloc = allocator_type());

If we then had a way to write the algorithm as a coroutine, the vector would determine its capacity only once and let the filling flow.

Maurice Bos

unread,
Sep 7, 2013, 8:40:08 AM9/7/13
to std-pr...@isocpp.org
Shouldn't this be a task of the compiler? (To inline push_back and optimize away the 'if (size == capacity)' check if it knows that 'size < capacity' because of a previous .reserve().)

--
 
---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

DeadMG

unread,
Sep 7, 2013, 9:45:35 AM9/7/13
to std-pr...@isocpp.org
You should refactor your loop into a range, and construct the vector directly from that. This would resolve the issue.

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 2:12:24 PM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

On Saturday, September 7, 2013 3:34:31 AM UTC+1, David Krauss wrote:
In principle it's a valid problem, such checks can adversely affect inner loops, but the solution is way too specialized and obtuse.
 
How about a means to pass hints to the allocator, maybe a dispatch tag indicating to forward arguments from push_*, insert_*, emplace_*, and constructors to allocator::allocate? This would in general enable dumb allocators where the client can specify where allocations go, and when they are allowed. It would be a powerful interface.
 
If we modify vector class anyway, such a solution is more heavy-weight. Also difficult to teach. In fact, I'm not convinced it would actually give any speedup in practice.

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 2:19:03 PM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
Just to add: the implementation of push_back_() would be something like
 
*ptr_end++=parameter;
 
and so almost certainly inlined. In contrast, push_back() is unlikely to get inlined, as it does some complicated stuff, like reallocation and handling the ugly special case like v.push_back(v[i]).
 
 

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 2:26:21 PM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
I believe this will not result in any speedup, as the generator will have internal state, and will need extra calls (unless fully inlined). Moreover, it is less convinient to use - there are generate and generate_n functions in <algorithm> which do essentially the same, and in my experience they are rarely used.
 

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 2:32:48 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 1:40:08 PM UTC+1, Maurice Bos wrote:
Shouldn't this be a task of the compiler? (To inline push_back and optimize away the 'if (size == capacity)' check if it knows that 'size < capacity' because of a previous .reserve().)
 
 
As I said before, push_back is quite complicated, so unlikely to be inlined. Even it it is, the compiler has to be very clever to find this optimisation. In some situations, like inner loop or critical section, the programmer would want to ensure that this optimisation has been applied.

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 2:36:12 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 2:45:35 PM UTC+1, DeadMG wrote:
You should refactor your loop into a range, and construct the vector directly from that. This would resolve the issue.
 
Or, much easier, allocate raw memory using malloc, and work with it.

DeadMG

unread,
Sep 7, 2013, 2:39:14 PM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
It would unfortunately be much easier because ranges are so irritating to construct. This, however, is immaterial to the fact that the other solutions are a disgusting hack which is only an approximation to the real solution.

Magnus Fromreide

unread,
Sep 7, 2013, 2:58:51 PM9/7/13
to std-pr...@isocpp.org
I would expect the implementetion of push_back to be something along the
line of

inline void vector<T>::push_back(const T& parameter) {
if (ptr_end == ptr_capacity)
grow();
allocator.construct(ptr_end, parameter);
++ptr_end;
}

where grow() grows the vector if needed. Here I don't expect grow() to
be fully inlined but I do not see why the rest couldn't be inlined.

Note that the your push_back_ also needs to do

allocator.construct(ptr_end, parameter);
++ptr_end; // Increment after as construct could throw

since I expect the memory from ptr_end to ptr_capacity to be
uninitialized, so it really isn't that much simpler.

/MF


Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 3:03:15 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 12:57:45 PM UTC+1, Ville Voutilainen wrote:

Well, if I don't want to resize the vector (and construct objects), I can't see what work-around would
allow avoiding push_back. A vector of optionals would allow the resize without constructing
the underlying objects, and then I could just use operator[].
 
Doesn't solve the problem: still many optionals are constructed, only to be immediately overwritten.

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 3:09:47 PM9/7/13
to std-pr...@isocpp.org
Ok, point taken about allocator.construct.
 
However, your implementation of puch_back has a subtle bug: it has undefined behaviour for calls like v.push_back(v[0]) when v is full. Hence, you need to add some code to handle this special case. This complicates the implementation and makes it less likely to be inlined.

Ville Voutilainen

unread,
Sep 7, 2013, 3:14:17 PM9/7/13
to std-pr...@isocpp.org
Assuming that the Widgets in your original example are costly to construct and overwrite,
the optional<Widget>s are not costly to construct.

Magnus Fromreide

unread,
Sep 7, 2013, 3:21:58 PM9/7/13
to std-pr...@isocpp.org
Ok, point taken about the v[i] case, but the whole insertion in the
grow() case could be passed of to another method (grow(parameter)) since
that is a problem only when a reallocation happens so there still is no
reason this would prevent inlining of the non-growing case.

/MF

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 3:41:28 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 8:21:58 PM UTC+1, Magnus Fromreide wrote:
Ok, point taken about the v[i] case, but the whole insertion in the
grow() case could be passed of to another method (grow(parameter)) since
that is a problem only when a reallocation happens so there still is no
reason this would prevent inlining of the non-growing case.

/MF

 
Ok, maybe one can achieve inlining for push_bask, in principle at least. Below is an actual implementation from VS2012. I find it difficult to believe it's inlined...

 

void push_back(value_type&& _Val)
 { // insert by moving into element at end
 if (_Inside(_STD addressof(_Val)))
  { // push back an element
  size_type _Idx = _STD addressof(_Val) - this->_Myfirst;
  if (this->_Mylast == this->_Myend)
   _Reserve(1);
  _Orphan_range(this->_Mylast, this->_Mylast);
  this->_Getal().construct(this->_Mylast,
   _STD forward<value_type>(this->_Myfirst[_Idx]));
  ++this->_Mylast;
  }
 else
  { // push back a non-element
  if (this->_Mylast == this->_Myend)
   _Reserve(1);
  _Orphan_range(this->_Mylast, this->_Mylast);
  this->_Getal().construct(this->_Mylast,
   _STD forward<value_type>(_Val));
  ++this->_Mylast;
  }
 }

 

 

 
 

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 3:50:33 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 8:14:17 PM UTC+1, Ville Voutilainen wrote:

Assuming that the Widgets in your original example are costly to construct and overwrite,
the optional<Widget>s are not costly to construct.
 
Sure - this can work in some circumstances. However, we are talking about saving a couple of processor instructions here, so "costly" should be viewed from this perspective. Also, that extra flag in optional<> can increase the element size, spoil the alignment, and result in something that's actually slower and takes more memory. The bottom line: occasionally it works, but it's not a general solution to the problem.

Felipe Magno de Almeida

unread,
Sep 7, 2013, 3:54:30 PM9/7/13
to std-pr...@isocpp.org
On Fri, Sep 6, 2013 at 7:05 PM, <Victor....@ncl.ac.uk> wrote:
> [I suspect I am re-inventing the wheel, but a quick search of this forum
> didn't yield any similar proposal.]
>
> vector::push_back() is inefficient, as every time it checks whether the
> vector has reached its full capacity and needs to allocate more memory. Such
> a check is often unnecessary, e.g. if reserve() was called, see the example
> below. Hence, the proposal is to add vector::push_back_() which skips this
> check. Calling it for a full vector results in undefined behaviour
> (consistent with operator[] which does not check that the index is valid).
> Of course, an assertion checking the capacity should be used in push_back_()
> to simplify debugging, but the optimised code has no overhead.

I think a elegant solution for this (and a lot of other similar)
problem would be
something like this:

void foo(std::vector<char>& v)
{
assume(v.reserve() > v.size());
v.push_back('a'); // the compiler would do dead-code elimination and
then inline
}

[snip - example]

Regards,
--
Felipe Magno de Almeida

Ville Voutilainen

unread,
Sep 7, 2013, 3:58:50 PM9/7/13
to std-pr...@isocpp.org
True enough. I think this proposal has quite decent merit. I can't really work around it otherwise than postponing
the construction with optional, so it looks like a change to vector is what's necessary. If benchmarks show
that it's necessary, that is. ;)

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 4:28:16 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 8:54:30 PM UTC+1, Felipe Almeida wrote:

I think a elegant solution for this (and a lot of other similar)
problem would be
something like this:

void foo(std::vector<char>& v)
{
  assume(v.reserve() > v.size());
  v.push_back('a'); // the compiler would do dead-code elimination and
then inline
}
 
This is an interesting idea, but not quite perfect:
 * in a critical section or a tight loop I'd positively want to make sure that the optimisation has occured - but "assume" does not give any positive guarantees
 * it would be non-trivial for the compiler to do this optimisation for the following reasons:
    - the user can only refer to public methods like capacity() and size() in the "assume" clause while the implementation of puch_back would probably refer to private data directly, and the compiler will have to figure out the relashionship between these.
    - there is a special case in push_back that is handled separately: v.push_back(v[0]); you need to say to the compiler somehow that if the allocation is not required then this case should not be treated as special.

Magnus Fromreide

unread,
Sep 7, 2013, 4:36:27 PM9/7/13
to std-pr...@isocpp.org
That is a quality of implementation issue but I would measure rather
than guess regarding optimizations of standard library functions since
optimizers can be surprisingly good.

If one looks at GCC instead the implementation is

void
push_back(value_type&& __x)
{ emplace_back(std::move(__x)); }

template<typename _Tp, typename _Alloc>
template<typename... _Args>
void
vector<_Tp, _Alloc>::
emplace_back(_Args&&... __args)
{
if (this->_M_impl._M_finish != this->_M_impl._M_end_of_storage)
{
_Alloc_traits::construct(this->_M_impl, this->_M_impl._M_finish,
std::forward<_Args>(__args)...);
++this->_M_impl._M_finish;
}
else
_M_emplace_back_aux(std::forward<_Args>(__args)...);
}



Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 4:41:08 PM9/7/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

On Saturday, September 7, 2013 7:39:14 PM UTC+1, DeadMG wrote:
It would unfortunately be much easier because ranges are so irritating to construct. This, however, is immaterial to the fact that the other solutions are a disgusting hack which is only an approximation to the real solution.
 
By that logic, vector::operator[] is also a "disgusting hack" as it results in indefined behaviour if the index is out of range, instead of throwing an exception. However, the committee felt that efficiency was more important here than protecting the user from shooting themselves in a foot. The C++ philosophy is not to protect programmers from themselves, but to offer a choice - purists can still use the old good push_back. However, in some situation I do want maximum efficiency.
 
As I said in the original posting, any reasonable STL implementation would add an assertion to push_back_ to check that the assumption on the capacity actually holds - this would make debugging such violations as easy as debugging out-of[range problems.

Nevin Liber

unread,
Sep 7, 2013, 4:49:20 PM9/7/13
to std-pr...@isocpp.org
Why is that check any cheaper than the check for push_back to see if
the vector should grow?

--
Nevin ":-)" Liber <mailto:ne...@eviloverlord.com> (312) 623-5420

Howard Hinnant

unread,
Sep 7, 2013, 4:50:51 PM9/7/13
to std-pr...@isocpp.org
On Sep 7, 2013, at 3:41 PM, Victor....@ncl.ac.uk wrote:

> Ok, maybe one can achieve inlining for push_bask, in principle at least. Below is an actual implementation from VS2012. I find it difficult to believe it's inlined...
>
> void push_back(value_type&& _Val)
> { // insert by moving into element at end
> if (_Inside(_STD addressof(_Val)))
> { // push back an element
> size_type _Idx = _STD addressof(_Val) - this->_Myfirst;
> if (this->_Mylast == this->_Myend)
> _Reserve(1);
> _Orphan_range(this->_Mylast, this->_Mylast);
> this->_Getal().construct(this->_Mylast,
> _STD forward<value_type>(this->_Myfirst[_Idx]));
> ++this->_Mylast;
> }
> else
> { // push back a non-element
> if (this->_Mylast == this->_Myend)
> _Reserve(1);
> _Orphan_range(this->_Mylast, this->_Mylast);
> this->_Getal().construct(this->_Mylast,
> _STD forward<value_type>(_Val));
> ++this->_Mylast;
> }
> }

I recommend you stick to quoting open sourced implementations. Here's one where you won't get in trouble:

http://libcxx.llvm.org

template <class _Tp, class _Allocator>
_LIBCPP_INLINE_VISIBILITY inline
void
vector<_Tp, _Allocator>::push_back(const_reference __x)
{
if (this->__end_ != this->__end_cap())
{
__alloc_traits::construct(this->__alloc(),
_VSTD::__to_raw_pointer(this->__end_), __x);
++this->__end_;
}
else
__push_back_slow_path(__x);
}

As others have alluded to, if push_back, and reserve+push_back is too expensive for you, then you are in to a specialty niche situation, and you should be prepared to do some extra work.

1. You need to tell the vector up front how many elements there are going to be.

2. You need to create a Widget generator which the function can use to construct widgets with.

3. Cast the generator in the form of a random access iterator. Random access is important as that is how the vector will get your up-front size statement. I strongly suspect you can safely ignore the requirement that a random access iterator must return a Widget& from operator*(). vector doesn't need or use that requirement. And it is a dumb requirement anyway that we should change.

Here's a sketch:

#include <vector>

class Widget
{
int data_;
public:
explicit Widget(int data) : data_(data) {}
};

class generator_Widget
{
int count_;
int state_;

public:
typedef std::random_access_iterator_tag iterator_category;
typedef Widget value_type;
typedef std::ptrdiff_t difference_type;
typedef const Widget* pointer;
typedef Widget reference;

explicit generator_Widget(int count, int state = 0)
: count_(count), state_(state) {}

reference operator*() const {return Widget(state_);}
pointer operator->() const {return nullptr;}

generator_Widget& operator++() {++count_; --state_; return *this;}
generator_Widget operator++(int)
{generator_Widget tmp(*this); ++(*this); return tmp;}

generator_Widget& operator--() {--count_; ++state_; return *this;}
generator_Widget operator--(int)
{generator_Widget tmp(*this); --(*this); return tmp;}

generator_Widget& operator+=(difference_type n) {count_ += n; state_ -= n; return *this;}
generator_Widget operator+(difference_type n) const
{generator_Widget tmp(*this); tmp += n; return tmp;}
friend generator_Widget operator+(difference_type n, generator_Widget x)
{x += n; return x;}
generator_Widget& operator-=(difference_type n) {return *this += -n;}
generator_Widget operator-(difference_type n) const
{generator_Widget tmp(*this); tmp -= n; return tmp;}

reference operator[](difference_type n) const {return *(*this + n);}

friend
bool
operator==(const generator_Widget& x, const generator_Widget& y)
{return x.count_ == y.count_;}

friend
bool
operator!=(const generator_Widget& x, const generator_Widget& y)
{return !(x == y);}

friend
bool
operator<(const generator_Widget& x, const generator_Widget& y)
{return x.count_ < y.count_;}

friend
bool
operator>(const generator_Widget& x, const generator_Widget& y)
{return y < x;}

friend
bool
operator<=(const generator_Widget& x, const generator_Widget& y)
{return !(y < x);}

friend
bool
operator>=(const generator_Widget& x, const generator_Widget& y)
{return !(x < y);}

friend
difference_type
operator-(const generator_Widget& x, const generator_Widget& y)
{return x.count_ - y.count_;}

};

int
main()
{
std::vector<Widget> v(generator_Widget(0, 10), generator_Widget(10));
}

Yes, its a pain to write all this boilerplate. Maybe you can generalize the iterator/generator so you only have to write it once. Or maybe somebody (boost?) has already written it for you. But it isn't going to get much faster than this. The vector constructor will subtract the pair of random access iterators in its constructor and allocate the right amount of memory. And then it will iterate through the range constructing each Widget from the dereferenced iterator.

Another technique you could use is to give the vector an allocator whose construct member is a no-op:

template <class U, class... Args>
void
construct(U* p, Args&&...)
{
}


Then construct your vector:

vector<Widget, my_alloc<Widget>> v(N); // allocate but don't construct the Widgets

And then loop through your vector and actually construct each Widget:

for (auto& w : v)
::new(&w) Widget(...);

You'll need to watch out for exception safety on this one. If any of your Widget constructors throws, then ~Widget() better be able to cope with an unspecified Widget state.

Neither of these techniques is very elegant. But they are both workable. They seem to me a better approach than adding an extremely error-prone member function to vector.

If we're going to further increase the performance of vector, the next most effective step to take is to teach allocators how to expand existing allocations in place (realloc without the possibility of an address change), and teach containers how to use such allocators. Doing this can produce as much as a 2X performance improvement compared to an un-reserved push_back loop. However even this at-best-2X improvement was not sufficiently enticing for the committee in the past. I have no reason to think it will be in the future.

Howard

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 4:59:32 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 9:49:20 PM UTC+1, Nevin ":-)" Liber wrote:
Why is that check any cheaper than the check for push_back to see if
the vector should grow?
Because assertion are only active in the debug mode, and are removed in the release build.

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 5:12:44 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 9:36:27 PM UTC+1, Magnus Fromreide wrote:
That is a quality of implementation issue but I would measure rather
than guess regarding optimizations of standard library functions since
optimizers can be surprisingly good.
 
 
I wouldn't overestimate the intellect of optimisers, especially when pointers are involved...
 
In any case, there is no positive assurance that the optimisation is applied. This is also unstable - minor changes in the code can upset it. Moreover, it is easy enought to confuse the optimiser in practice. E.g. imagine several threads compute some good stuff and push their results into a vector, protecting puch_back calls by a lock. The user knows the upper bound on the total number of results, and so can reserve memory in advance. However, the code executed by the threads is far away from the call to reserve(), so the compiler is powerless to optimise the critical section.
 
 
If one looks at GCC instead the implementation is
 
Ok, can actually be inlined if you're lucky with your STL vendor :-)

Victor....@ncl.ac.uk

unread,
Sep 7, 2013, 5:33:10 PM9/7/13
to std-pr...@isocpp.org

On Saturday, September 7, 2013 9:50:51 PM UTC+1, Howard Hinnant wrote:
If we're going to further increase the performance of vector, the next most effective step to take is to teach allocators how to expand existing allocations in place (realloc without the possibility of an address change), and teach containers how to use such allocators.  Doing this can produce as much as a 2X performance improvement compared to an un-reserved push_back loop.  However even this at-best-2X improvement was not sufficiently enticing for the committee in the past.  I have no reason to think it will be in the future.

Howard
 
The generator solution is unelegant, I think I'd rather use malloc and manage memory myself than doing anything like this. It is also of limited use compared to push_back_ - e.g. imagine several threads push their results into a vector, protecting puch_back_ calls by a lock; this is difficult to reformulate as generators.
 
The allocator solution is interesting, and would probably work, but I believe it is no less "error-prone" than push_back_. Do you really want to recommend it? I think push_back_ is no more difficult / error-prone than operator[], so the users can cope with it.

Howard Hinnant

unread,
Sep 7, 2013, 5:45:32 PM9/7/13
to std-pr...@isocpp.org
On Sep 7, 2013, at 5:33 PM, Victor....@ncl.ac.uk wrote:

> The generator solution is unelegant, I think I'd rather use malloc and manage memory myself than doing anything like this.

That is certainly another possibility.

> It is also of limited use compared to push_back_ - e.g. imagine several threads push their results into a vector, protecting puch_back_ calls by a lock; this is difficult to reformulate as generators.

And the generator method would also be unnecessary. The cost of the lock would put the cost of the one if-statement you're trying to avoid into the noise.

>
> The allocator solution is interesting, and would probably work, but I believe it is no less "error-prone" than push_back_. Do you really want to recommend it? I think push_back_ is no more difficult / error-prone than operator[], so the users can cope with it.

The no-construct allocator technique is very error prone. The one advantage it has over push_back_ is that it is in C++11 and later, and push_back_ isn't.

Howard

Howard Hinnant

unread,
Sep 7, 2013, 7:45:19 PM9/7/13
to std-pr...@isocpp.org
On Sep 7, 2013, at 4:50 PM, Howard Hinnant <howard....@gmail.com> wrote:

> Here's a sketch:
>
> #include <vector>
>
> class Widget
> {
> int data_;
> public:
> explicit Widget(int data) : data_(data) {}
> };
>
> class generator_Widget
> {

Fwiw, I decided it was time to see some real code and real numbers so we know what we're dealing with. I'm using the cheapest Widget I can imagine - just wraps an int. This emphasizes any performance gains as much as possible. As Widget becomes more expensive to construct, the performance gains of one technique over the other will probably become negligible.

Here's the code I'm testing:

#include <iostream>
#include <chrono>
#include <vector>

class Widget
{
int data_;
public:
explicit Widget(int data) : data_(data) {}
};

class generator_Widget
{
int count_;
int state_;

public:
// typedef std::input_iterator_tag iterator_category;
const unsigned N = 1000000;
using namespace std::chrono;
auto t0 = high_resolution_clock::now();
#if 1
std::vector<Widget> v(generator_Widget(0, N), generator_Widget(N));
#else
std::vector<Widget> v;
int state = N;
v.reserve(N);
for (unsigned i = 0; i < N; ++i, --state)
v.push_back(Widget(state));
#endif
auto t1 = std::chrono::high_resolution_clock::now();
std::cout << duration_cast<microseconds>(t1-t0).count() << "\xC2\xB5s\n";
}

I'm testing the above with clang++ -O3, tip-of-trunk libc++. For me this outputs:

$ a.out
2468µs
$ a.out
2431µs
$ a.out
2443µs
$ a.out
2709µs

If I change the above to use push_back():

#if 0

Then I get:

$ a.out
3848µs
$ a.out
3608µs
$ a.out
3635µs
$ a.out
3765µs

That's around a 40% performance hit, which is significant.

I also noted a generator/iterator adaptor in boost:

http://www.boost.org/doc/libs/1_54_0/libs/utility/generator_iterator.htm

However this adaptor is strictly conforming, and thus classifying itself as an input_iterator. This is a performance killer. I modeled this with:

typedef std::input_iterator_tag iterator_category;
// typedef std::random_access_iterator_tag iterator_category;

and got:

$ a.out
5256µs
$ a.out
5426µs
$ a.out
5233µs
$ a.out
5388µs

Another 48% slower than the push_back loop. The main reason for this is that vector can't predict a-preori the final capacity it will need, and thus must grow its capacity geometrically during the iteration. libc++ grows with a factor of 2. This is also the approximate time I get with the push_back() method if I remove the v.reserve(N).

I haven't bothered testing the allocator-no-op-constructor method. I believe it will be no faster, and possibly slower than the random-access-iterator-generator technique because it requires two passes over the elements, and one of the passes might be optimized out.

Howard

Billy O'Neal

unread,
Sep 8, 2013, 12:43:27 AM9/8/13
to std-proposals
Note that you can use the generator even if you aren't constructing the vector by calling assign or insert. While the generator solution is clunky if someone has really really profiled their code, and really found that a significant fraction is spent in push_back (which I think is *exceedingly* unlikely), then they can use the generator solution.

> The no-construct allocator technique is very error prone.  The one advantage it has over push_back_ is that it is in C++11 and later, and push_back_ isn't.
 
It also has the advantage that the client can't "screw up" and go off the end of the allocated region. The resulting program is perfectly safe, which wouldn't happen with the proposed push_back_ solution. (Yes, the generator is more complex, but the vast majority of bugs there will result in easy to diagnose crashes or simply fail to compile)
 
The benchmark above neglects to mention that there is also a size modification going on inside push_back -- writes are typically more expensive than reads, such as the comparison against capacity. If the vector has been reserved in advance, then the capacity check is going to be predicted with high accuracy by any machine worth its salt, so it should parallelize well.
 
However, writing the value of size in the above example changes on every iteration of the loop, and causes cross loop body memory dependencies. It also doubles or triples the amount of memory being written to, because each int being written is causing a size_t to be written for the size update.
 
The push_back_ proposal doesn't remove the size modification, only the capacity check. Are we sure that the capacity check is actually the cause of that 40% difference?
 
 

Billy O'Neal
Malware Response Instructor - BleepingComputer.com



Howard

--

---
You received this message because you are subscribed to the Google Groups "ISO C++ Standard - Future Proposals" group.
To unsubscribe from this group and stop receiving emails from it, send an email to std-proposal...@isocpp.org.
To post to this group, send email to std-pr...@isocpp.org.
Visit this group at http://groups.google.com/a/isocpp.org/group/std-proposals/.

Magnus Fromreide

unread,
Sep 8, 2013, 2:25:38 AM9/8/13
to std-pr...@isocpp.org
I did some quick performance runs with Howard's code (gcc-3.9 20130903,
x86_64) but added two more cases

1 - generate_Widget
2 - reserve, push_back
3 - reserve, emplace_back
4 - reserve, push_back_

What I ended up with was that

* 1 and 4 resulted in the same code
* 3 is marginally faster than 2, but still in the same ballpark

Looking at the generated assembler it seems that the real killer is that
the ptr_end and ptr_capacity values are updated (written to memory) on
each lap through the code in case 3 and 4. (The state variable is also
spilled but forcing that into a register doesn't seem to affect the
performance)

/MF


Billy O'Neal

unread,
Sep 8, 2013, 3:44:43 AM9/8/13
to std-proposals
It seems like the implementation shouldn't be writing ptr_capacity if the capacity is unchanged.... :/

Billy O'Neal
Malware Response Instructor - BleepingComputer.com



/MF


Magnus Fromreide

unread,
Sep 8, 2013, 3:54:16 AM9/8/13
to std-pr...@isocpp.org
On Sun, 2013-09-08 at 00:44 -0700, Billy O'Neal wrote:
> It seems like the implementation shouldn't be writing ptr_capacity if
> the capacity is unchanged.... :/

It doesn't, but it reads it from memory every turn.

/MF


Bengt Gustafsson

unread,
Sep 8, 2013, 11:18:56 AM9/8/13
to std-pr...@isocpp.org
We could consider a totally different way to handle this, which also has other uses. Admittedly, this can easily crash if misused (just as index out of range or too many push_back_ calls). I will call this tentatively uninitialized_resize(size_t sz). It just moves the end pointer and reallocates as required, but leaves the new elements uninitialized. Then, to initialize the elements, placement new is used. This avoids the increments for each push_back_() call. The additional use is for interfacing with legacy libraries and similar which construct an array of objects at a specified "buffer" address. Today there is no way to use a vector as such a buffer without extra initialization code running at resize() or when constructing the vector with the size. Note that this is true for basic types such as char and int too (in C++11), although I'm uncertain if this is intended (see note below).

Example:

vector<Widget> v;
v.uninitialized_resize(10);

for (auto& i : v)
    new (&i) Widget;

// Get an image from a third party library into a vector:

vector<char> image;
image.uninitialized_resize(size.x*size.y);
GetImage(image.data());

The placement new usage could be somewhat problematic as it circumvents the allocator. If Allocator::construct does someting else than what the default allocator does a special method on vector may be warranted, for instance initialize_element(i, args...), which basically forwards to Allocator::construct.


Note: The rather complicated reason for not being able to set the size of even a vector<char> without clearing the memory block is that the new resize() and ctor overloads without the initializer element does not just default construct the elements (which would be a nop for char) but rather calls Allocator::construct which is now a variadic template function and expanding the empty parameter pack creates a value construction rather than a default constuction. To me it seems that this combination effect may not have been desired. On the other hand this resize() overload replaces the older =0 on the second argument of resize() so even in C++98 it was impossible to construct a vector<char> without writing the elements. Probably many programmers have used the feature that in contrast with a plain char or array of char a vector of char has 0 in its elements. uninitialized_resize() thus adds a new functionality to vector in addition to what push_back_ would yield.

Billy O'Neal

unread,
Sep 8, 2013, 11:41:06 AM9/8/13
to std-proposals
>Allocator::construct which is now a variadic template function and expanding the empty parameter pack creates a value construction rather than a default construction.
 
If your allocator::construct does nothing, then the memory will be uninitialized. Every compiler on earth will inline a completely empty function. std::allocator's default behavior is to zero initialize the memory. If you want to use uninitialized behavior, just change out the allocator to do that.
 
>The placement new usage could be somewhat problematic as it circumvents the allocator. If Allocator::construct does someting else than what the default allocator does a special method on vector may be warranted, for instance initialize_element(i, args...), which basically forwards to Allocator::construct.
 
If you allow this, how does vector know the number of elements to destroy (call destructors on) in vector's destructor? Do you want to make it undefined to allow the vector to be destroyed with uninitialized data?
 
How many cases are actually out there that need to avoid zero-initialization of the memory that aren't met by unique_ptr<T[]>?
 
I remain unconvinced that any of this makes a measurable difference in real applications; or even that if it did that the class allowing these different behaviors should be called vector.

Billy O'Neal
Malware Response Instructor - BleepingComputer.com


Bengt Gustafsson

unread,
Sep 8, 2013, 2:35:11 PM9/8/13
to std-pr...@isocpp.org


Den söndagen den 8:e september 2013 kl. 17:41:06 UTC+2 skrev Billy O'Neal:
>Allocator::construct which is now a variadic template function and expanding the empty parameter pack creates a value construction rather than a default construction.
 
If your allocator::construct does nothing, then the memory will be uninitialized. Every compiler on earth will inline a completely empty function. std::allocator's default behavior is to zero initialize the memory. If you want to use uninitialized behavior, just change out the allocator to do that.
 
Right, but as vectors with different allocators are different types you can't subsequently send the vector as an argument to a non-template function. Also, at least if the vector is a member of a class there may well be other uses where you want the allocator to initialize the memory.

 
>The placement new usage could be somewhat problematic as it circumvents the allocator. If Allocator::construct does someting else than what the default allocator does a special method on vector may be warranted, for instance initialize_element(i, args...), which basically forwards to Allocator::construct.
 
If you allow this, how does vector know the number of elements to destroy (call destructors on) in vector's destructor? Do you want to make it undefined to allow the vector to be destroyed with uninitialized data?
 
This is a main drawback if exceptions are possible. In many cases you don't need to consider this, but especially for generic code you would have to be careful and add a try block which sets the size back to where you were in case you got an exception.
 
How many cases are actually out there that need to avoid zero-initialization of the memory that aren't met by unique_ptr<T[]>?
 
It is often a matter of trying to reconcile C libraries or legacy libraries, other languages, device drivers etc. You want to work with the convenient data structures such as vector, but especially when receiving data from such external sources it can be a deterrant to have to either copy the received data an extra time or clear the buffer before it is overwritten. All just because there is no way to tell vector that "someone else" has constructed the data.

Victor....@ncl.ac.uk

unread,
Sep 8, 2013, 3:17:54 PM9/8/13
to std-pr...@isocpp.org

On Sunday, September 8, 2013 4:18:56 PM UTC+1, Bengt Gustafsson wrote:
We could consider a totally different way to handle this, which also has other uses. Admittedly, this can easily crash if misused (just as index out of range or too many push_back_ calls). I will call this tentatively uninitialized_resize(size_t sz). It just moves the end pointer and reallocates as required, but leaves the new elements uninitialized.
 
I think this will never be adopted as vector no longer can enforce its invariants. Example:
{
    vector<Widget> v;
    v.uninitialized_resize(10);
    // at this point the desctructors are called, resulting in a crash
}

Victor....@ncl.ac.uk

unread,
Sep 8, 2013, 3:24:12 PM9/8/13
to std-pr...@isocpp.org

On Sunday, September 8, 2013 12:45:19 AM UTC+1, Howard Hinnant wrote:
Fwiw, I decided it was time to see some real code and real numbers so we know what we're dealing with.

 
Many thanks, Howard, for your effort! I know it was my job to do the experiments :-\

Victor....@ncl.ac.uk

unread,
Sep 8, 2013, 3:31:21 PM9/8/13
to std-pr...@isocpp.org

On Sunday, September 8, 2013 7:25:38 AM UTC+1, Magnus Fromreide wrote:
1 - generate_Widget
2 - reserve, push_back
3 - reserve, emplace_back
4 - reserve, push_back_
 
Many thanks for the figures!
 
 
What I ended up with was that

* 1 and 4 resulted in the same code
 
do you mean the same assembly code or the same performance? (I'd expect the code to be different, as 1 doesn't have to store ptr_end on each loop iteration.)
 
 Looking at the generated assembler it seems that the real killer is that
the ptr_end and ptr_capacity values are updated (written to memory) on
each lap through the code in case 3 and 4.
 
 
Did you mean 2 and 3?
 
If I'm interpreting your and Howard's results right, push_back_ and generator have similar performance, which is 40% better than push_back. Correct?
 

Magnus Fromreide

unread,
Sep 8, 2013, 4:29:30 PM9/8/13
to std-pr...@isocpp.org
On Sun, 2013-09-08 at 12:31 -0700, Victor....@ncl.ac.uk wrote:
>
> On Sunday, September 8, 2013 7:25:38 AM UTC+1, Magnus Fromreide wrote:
> 1 - generate_Widget
> 2 - reserve, push_back
> 3 - reserve, emplace_back
> 4 - reserve, push_back_
>
> Many thanks for the figures!
>
>
> What I ended up with was that
>
> * 1 and 4 resulted in the same code
>
> do you mean the same assembly code or the same performance? (I'd
> expect the code to be different, as 1 doesn't have to store ptr_end on
> each loop iteration.)

The same assembly code between the calls to system_clock::now.
Down to the label names.
Yes, I was just as surprised as you.

I suppose the compiler figures out that nothing can disturb the sequence
of assignments so it can hoist the saving of ptr_end out of the loop
when Widget is this simple.

> Looking at the generated assembler it seems that the real
> killer is that
> the ptr_end and ptr_capacity values are updated (written to
> memory) on
> each lap through the code in case 3 and 4.
>
>
> Did you mean 2 and 3?

Yes.

> If I'm interpreting your and Howard's results right, push_back_ and
> generator have similar performance, which is 40% better than
> push_back. Correct?

Yes.

/MF

David Krauss

unread,
Sep 8, 2013, 10:42:45 PM9/8/13
to std-pr...@isocpp.org

The problem is that generators are far less flexible. You can't simultaneously insert into several containers. Inserting a non-predetermined number of objects requires the generator to return a status code as well as the result, which adds overhead in C++ (I don't suppose it's even part of the Generator concept; haven't looked it up).

One use case with lots of containers is sorting a large number of objects into bins, for example generating graph nodes with unknown but bounded degree, given a list of edges.

I had a thought, that if conversions were allowed when propagating allocators, you could use std::allocator (or whatever) to get memory for a container, then convert to a dummy allocator which does not perform additional allocations. Ideally the bounds-overflow check would become a no-op and be removed by dead code elimination. But, looking at the actual implementation of libstdc++-v3, that could be too difficult to arrange.

billy...@gmail.com

unread,
Sep 9, 2013, 12:07:34 AM9/9/13
to std-proposals
Have you profiled and found the capacity check significant in comparison to choosing the container instance to push_back to?

Even if significant, are you sure the easily branch predicted capacity check is significant compared to updating size?

Sent from a touchscreen. Please excuse the brevity and tpyos.

David Krauss

unread,
Sep 9, 2013, 2:10:22 AM9/9/13
to std-pr...@isocpp.org
On Monday, September 9, 2013 12:07:34 PM UTC+8, Billy O'Neal wrote:
Have you profiled and found the capacity check significant in comparison to choosing the container instance to push_back to?

Even if significant, are you sure the easily branch predicted capacity check is significant compared to updating size?

The specific case I mentioned was from years ago, and I don't know the performance impact of push_back partly because I abandoned standard containers and used a custom scheme (based on open hashing and circular linked lists). Updating vector::size is deadly not only because it requires extra instructions, but because the vector object and the data sequence are on different cache lines, so you can easily double the memory write bandwidth requirement.

Again, the topic here is what happens after branch prediction does its job. The extra instructions still need to be retired.

Correct me if I'm wrong, but using a generator here requires
  1. The length of the sequence is predetermined before the elements are generated.
  2. The length of the sequence is not predetermined at compile time, or you'd just use an array instead.
  3. Consecutive elements from the generated sequence go into the same container.
It's not hard to find a case which is simple enough that the capacity check has an impact, yet fails #1 and 3. Tokenizing a string requires the sequence length to vary (although the length is usually unbounded, I'm just going off the top of my head). Partitioning an input sequence into two output containers would fail #3. The container selection could just be a range check or modulus; such a thing could easily be a very tight inner loop.

I wonder if any cases where construction by generator is applicable pass requirement #2 — that it shouldn't be a fixed-size array.

As for exception safety and construction-destruction semantics in a fixed-size array, I think the performant generic solution would be an RAII array construction sentry. It's nice for the library to maintain a size count, but redundant if the final size is fixed. But, I'm not sure how best to prevent the full array destructor from running to allow such a sentry to do its job, in the general case where the sentry lifetime is during initialization but the array lifetime is longer. I guess it would require union trickery and an explicit destructor call in the non-exceptional case.

Micro-optimization tends to get very esoteric very fast, and although I did say this problem is valid, I don't know if it's really possible to address such problems in a general fashion in the standard library. Boost has plenty of tools for slightly-different data structures. Maybe that array sentry would fit in nicely there. But we can't try to accommodate every specific inner loop with a member function, or extrapolate too surely from specific member functions to real-world application solutions.

Custom allocators could improve container performance if they were more flexible. That's the best way to expand the library. The contract to obtain memory is the real issue. There are fewer such contracts than specific kinds of inner loops. Handling more memory allocation schemes would more usefully expand the applicability of the container library than adding canned inner loops. In this case, the user should meaningfully be able to say they don't want more memory.

For what it's worth, the last time I was bit by something like this was in a tokenizer, and the problem was malloc allocating the small strings, not the size/end update or capacity check. A third-party arena allocator might have solved my problem, but nothing included with GCC did the trick. I ended up coding a custom subset of std::string functionality as a "drop-in replacement."

Jonathan Wakely

unread,
Sep 9, 2013, 10:59:35 AM9/9/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
Has noone mentioned that the name is abysmal?  OK, I'll do it: push_back_ is a terrible, terrible name.  I prefer the uninitialized_resize() suggestion if only because the name at least gives a clue that it's dangerous if you use it wrong.

I agree with Howard's first mail that this is a pretty specialized request and so I'm not keen on adding it to the standard because you don't want to write some extra code for your specialized scenario.  std::vector doesn't need to be ideal for every possible use case.

Nevin Liber

unread,
Sep 9, 2013, 11:15:57 AM9/9/13
to std-pr...@isocpp.org
On 9 September 2013 09:59, Jonathan Wakely <c...@kayari.org> wrote:
Has noone mentioned that the name is abysmal?  OK, I'll do it: push_back_ is a terrible, terrible name.  I prefer the uninitialized_resize() suggestion if only because the name at least gives a clue that it's dangerous if you use it wrong.

I prefer buffer_overrun(), since that is what this will be enabling.
 
I agree with Howard's first mail that this is a pretty specialized request and so I'm not keen on adding it to the standard because you don't want to write some extra code for your specialized scenario.  std::vector doesn't need to be ideal for every possible use case.

+1
--
 Nevin ":-)" Liber  <mailto:ne...@eviloverlord.com(847) 691-1404

Ion Gaztañaga

unread,
Sep 9, 2013, 12:07:26 PM9/9/13
to std-pr...@isocpp.org
El 09/09/2013 17:15, Nevin Liber escribi�:
> Has noone mentioned that the name is abysmal? OK, I'll do it:
> push_back_ is a terrible, terrible name. I prefer the
> uninitialized_resize() suggestion if only because the name at least
> gives a clue that it's dangerous if you use it wrong.

What about "unchecked_push_back()" ?

Resizing without constructing some elements (as the vector could store
some already constructed elements) breaks vector invariants. For
non-trivial types the user should construct the data using the same
allocator as vector::allocator_type and it should implement the
allocator_traits dispatching logic to be consistent with what vector
currently does.

"unchecked_push_back" just tells the vector that it should add a new
element without any boundary checking, just like we have operator[] vs at().

To use std::vector<POD> as a buffer to store data raw data to be filled
by an external function without paying for zero initialization of POD,
shouldn't something like this do the trick?

vector.resize(N, std::default_init);
receive_network_packet
(vector.data(), vector.size(), /*out*/ received_bytes);
vector.resize(received_bytes);

vector calls allocator_traits::construct(alloc, p, std::default_init)
for each element if the allocator does not define such function. The
default implementation for "allocator_traits::construct(p,
std::default_init)" would be "::new (p) T". I guess the compiler could
inline the call to allocator_traits::construct and see ::new(p) T does
nothing, simplifying the initialization loop to nothing. Otherwise, a
compiler traits is needed to detect trivially default-initializable types.

Best,

Ion

Nevin Liber

unread,
Sep 9, 2013, 12:32:52 PM9/9/13
to std-pr...@isocpp.org
On 9 September 2013 11:07, Ion Gaztañaga <igazt...@gmail.com> wrote:
To use std::vector<POD> as a buffer to store data raw data to be filled by an external function without paying for zero initialization of POD,

That case can already be solved with allocators.

This proposal is purely to allow optimizing libraries the flexibility to remove one if check.

std::default_init is (a) not proposed (I didn't see it in the papers for Chicago) and (b) a whole different can of worms, as it either requires revising allocators (since allocators, not containers, are currently responsible for construction of objects inside containers) and/or moving/sharing responsibility of construction from the allocator to the container.  If you want to discuss that again, please start a new thread or resurrect one of the previous discussions on it.

Billy O'Neal

unread,
Sep 9, 2013, 1:06:21 PM9/9/13
to std-proposals
> Updating vector::size is deadly not only because it requires extra instructions, but because the vector object and the data sequence are on different cache lines, so you can easily double the memory write bandwidth requirement
 
Ok, well the push_back_ proposal doesn't get rid of the size modifications. Which was my whole point. The extra 1 or 2 instructions for the capacity check are negligible in comparison.
 
I don't think the cache line thing is going to be a big deal because the cache line containing size already needs to be accessed to get the pointer to vector's allocated buffer.
 
> I prefer the uninitialized_resize() suggestion if only because the name at least gives a clue that it's dangerous if you use it wrong.
Ok, but that's a different proposal, which is going to be difficult because it breaks semantics of vector's destructor. I think if you want to restrict this to POD types that don't need destructor calls, then there's no reason someone can't write a container with the desired semantics and contains static_asserts that T be a POD type.
 
> This proposal is purely to allow optimizing libraries the flexibility to remove one if check.
 
Have you shown that this comparison is at all significant in a real world use case? The size update is the expensive portion, and this proposal doesn't remove the size update.
 

Billy O'Neal
Malware Response Instructor - BleepingComputer.com


--

Victor....@ncl.ac.uk

unread,
Sep 9, 2013, 2:28:51 PM9/9/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

On Monday, September 9, 2013 3:59:35 PM UTC+1, Jonathan Wakely wrote:
Has noone mentioned that the name is abysmal?  OK, I'll do it: push_back_ is a terrible, terrible name.  I prefer the uninitialized_resize() suggestion if only because the name at least gives a clue that it's dangerous if you use it wrong.
 
That trailing unserscore is a common convention. In any case, the proposal is not about the name, I'm quite flexible to change it. E.g. unchecked_push_back suggested by Ion is perfectly fine with me.
 
 
I agree with Howard's first mail that this is a pretty specialized request and so I'm not keen on adding it to the standard because you don't want to write some extra code for your specialized scenario.  std::vector doesn't need to be ideal for every possible use case.
 
The problem is that no STL container is good enough for this particular use case. It looks like falling back to malloc is the most natural approach here. (While I like the generating iterators methods, it's far from natural - how many programmers would invent it?)
 

Victor....@ncl.ac.uk

unread,
Sep 9, 2013, 2:29:58 PM9/9/13
to std-pr...@isocpp.org

On Monday, September 9, 2013 5:07:26 PM UTC+1, Ion Gaztañaga wrote:

What about "unchecked_push_back()" ?
 
Fine with me. 

Victor....@ncl.ac.uk

unread,
Sep 9, 2013, 2:38:18 PM9/9/13
to std-pr...@isocpp.org

On Monday, September 9, 2013 6:06:21 PM UTC+1, Billy O'Neal wrote:
Ok, well the push_back_ proposal doesn't get rid of the size modifications. Which was my whole point. The extra 1 or 2 instructions for the capacity check are negligible in comparison.
 
In Magnus' experiments the compiler was clever enough to remove these stores. I'd expect it to be easy to spot for the optimiser: on each iteration it loads and stores this value, and no one else modifies it, so the store instruction can be postponed to the very end.
 
 
 
Have you shown that this comparison is at all significant in a real world use case? The size update is the expensive portion, and this proposal doesn't remove the size update.
 
So far there are only small benchmarks by Howard and Magnus showing 40% speedup.

Thiago Macieira

unread,
Sep 9, 2013, 3:28:13 PM9/9/13
to std-pr...@isocpp.org
On segunda-feira, 9 de setembro de 2013 11:28:51, Victor....@ncl.ac.uk
wrote:
> That trailing unserscore is a common convention.

Can you point to examples in the Standard Library public API?

I can see examples of that in the private code in libstdc++-v3:

bits/stl_set.h: { return _M_t._M_insert_unique_(__position, __x); }
bits/stl_tree.h: _M_insert_(_Const_Base_ptr __x, _Const_Base_ptr __p,
_Arg&& __v)
ext/pb_ds/detail/unordered_iterator/iterator.hpp: iterator_()

In libc++ I can't find a single function ending in underscore, only internal
macros and variable names.

--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel Open Source Technology Center
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358
signature.asc

Bengt Gustafsson

unread,
Sep 9, 2013, 5:03:26 PM9/9/13
to std-pr...@isocpp.org
I also think that the uninitialized_resize() I suggested would not be accepted due to the problems with the destruction of the maybe-constructed elements. However, I still think that there are practical uses for this, and as sadly enough not all vectors are equal, if you change the allocator you change the type and the vector is useless in non-template APIs written for regular vectors. While such APIs may be deemed as bad practize now that template oriented programming is prevalent it is very common in real world applications and in some cases necessary, such as when the called method is virtual. Due to this I'm willing to give it another try. I think that from the standpoint of understandability this one is worse, but from the point of view of keeping the vector promises it may be better.

Instead of an uninitialized_resize() this involves another ugly method, which we could tentatively call just_set_the_end_pointer(). The usage scenario is this:

vector.reserve(n);
int actual_count = socket.read(vector.data(), n);
vector.just_set_the_end_pointer(actual_count);

There is actually not much difference here, except that if the element type has a dtor it would be the socket or whatever functionality is messing around in the memory area that is responsible for handling destruction in case of a problem. As far as the vector is concerned there's still nothing there. Only when the just_set_the_end_pointer() method is called the vector takes responsiblity and at this time there is no problem in doing so. Some quite obvious nothrow guarantee must be placed on just_set_the_end_pointer() I guess, but as it just sets the end pointer that would not be a problem. The case that the initializing functionality is overstepping the buffer bounds is no worse than if the buffer was an array, a heap block or whatever.

The function name of course has to be given serious thought so that it really conveys what is going on. I would be quite happy to enable_if this function for POD elements only but I think the original poster proposing push_back_() would not. Also it is kind of falling into the vector<bool> trap again, although not as seriously.

Nevin Liber

unread,
Sep 9, 2013, 5:19:30 PM9/9/13
to std-pr...@isocpp.org
On 9 September 2013 16:03, Bengt Gustafsson <bengt.gu...@beamways.com> wrote:

Instead of an uninitialized_resize() this involves another ugly method, which we could tentatively call just_set_the_end_pointer(). The usage scenario is this:

-1.  Classes have invariants; we should not be providing easy methods to break those invariants. 

I really don't see why any of this has to be shoehorned into std::vector.   If people can change their code to use the new methods you propose, they can also change their implementations to use more appropriate data structures.

Ion Gaztañaga

unread,
Sep 9, 2013, 6:19:23 PM9/9/13
to std-pr...@isocpp.org
El 09/09/2013 18:32, Nevin Liber escribi�:
> To use std::vector<POD> as a buffer to store data raw data to be
> filled by an external function without paying for zero
> initialization of POD,
>
> That case can already be solved with allocators.
> This proposal is purely to allow optimizing libraries the flexibility to
> remove one if check.
>
> std::default_init is (a) not proposed (I didn't see it in the papers for
> Chicago) and (b) a whole different can of worms, as it either requires
> revising allocators (since allocators, not containers, are currently
> responsible for construction of objects inside containers) and/or
> moving/sharing responsibility of construction from the allocator to the
> container. If you want to discuss that again, please start a new thread
> or resurrect one of the previous discussions on it.

Sorry but I don't understand your response. I'm just giving an
alternative to some proposals in this thread like the one proposed by Bengt:

vector.reserve(n);
int actual_count = socket.read(vector.data(), n);
vector.just_set_the_end_pointer(actual_count);

and other posts talking about the cost of zero-initialization when
dealing with C libraries.

Writing a custom allocator just to avoid the value-initialization
overhead is not a trivial task and you are changing the type of the
vector, which is not very convenient to maintain ABI compatibility, and
you miss move semantics with other classes holding std::allocator-based
vectors. In this case, the socket is writing to uninitialized memory,
constructing objects without passing through allocator_traits protocol.
If value_type has no trivial destructor then resources will be leaked.

I know that std::default_init is not in any proposal, it's was a quick
example to show how default initialization for all allocators could be
achieved (just like in C++11 variadic construct functions for emplace
were added). allocator_traits is the proxy object that dispatches object
construction for containers between allocators' "construct" methods and
default implementations based on placement new. Currently, with
allocator_traits+std::allocator you can obtain these placement calls:

::new (p) T() //value initialization for DefaultInsertable types
::new (p) T(a) //1 arg constructor for EmplaceInsertable types
::new (p) T(a, b) //2 arg, etc.. for EmplaceInsertable types

but there is no equivalent to:
::new (p) T //default initialization for DefaultInsertable types

I consider this a (tiny) hole in the specification. If default
initialization is a good idea in the language, then it should be there
also for collections. The typical use cases is avoiding more costly
value initializations for types that will be shortly externally assigned
(socket, IPC, etc.).

We can add a new function to allocator_traits requesting "default
initialization" when constructing an DefaultInsertable object, just like
it offers variadic constructors while C++03 allocators only provide a
copy construction based "construct" member. Changing allocator_traits
does not require changing user defined allocators, just like we didn't
need to update our C++03 allocators with C++11 containers.

I agree that this is off-topic in a "push_back()_" titled thread, sorry
for any inconvenience this might have caused.

Best,

Ion

David Krauss

unread,
Sep 9, 2013, 6:30:31 PM9/9/13
to std-pr...@isocpp.org


On Tuesday, September 10, 2013 5:03:26 AM UTC+8, Bengt Gustafsson wrote:
However, I still think that there are practical uses for this, and as sadly enough not all vectors are equal, if you change the allocator you change the type and the vector is useless in non-template APIs written for regular vectors. While such APIs may be deemed as bad practize now that template oriented programming is prevalent it is very common in real world applications and in some cases necessary, such as when the called method is virtual.

Since we're already in micro-optimization land, legacy code and especially virtual dispatches essentially don't exist. The present motivation is to stop users from abandoning the Standard Library entirely in favor of malloc, or a user-defined end pointer within the vector. Such user tendencies result in reliability and security traded off for performance.

There's no sense in opening a potential correctness hole for the sake of keeping std::allocator in use.

Nevin Liber

unread,
Sep 9, 2013, 6:34:36 PM9/9/13
to std-pr...@isocpp.org
On 9 September 2013 17:19, Ion Gaztañaga <igazt...@gmail.com> wrote:
Sorry but I don't understand your response. I'm just giving an alternative to some proposals in this thread like the one proposed by Bengt:


  vector.reserve(n);
  int actual_count = socket.read(vector.data(), n);
  vector.just_set_the_end_pointer(actual_count);

and other posts talking about the cost of zero-initialization when dealing with C libraries.

Writing a custom allocator just to avoid the value-initialization overhead is not a trivial task

1)  It isn't that hard in C++11.  Which part of writing the allocator are you finding tricky?

2)  I would strongly support someone proposing such an allocator for C++17.
 
and you are changing the type of the vector, which is not very convenient to maintain ABI compatibility,

And changing the standard is *more* convenient, for a feature you cannot count on having for at least half a decade?
 
and you miss move semantics with other classes holding std::allocator-based vectors.

So we should shoehorn *everything* into std::vector<T, std::allocator<T>> because some people are unwilling to change their interfaces?
 
I know that std::default_init is not in any proposal, it's was a quick example to show how default initialization for all allocators could be achieved (just like in C++11 variadic construct functions for emplace were added). allocator_traits is the proxy object that dispatches object construction for containers between allocators' "construct" methods and default implementations based on placement new. Currently, with allocator_traits+std::allocator you can obtain these placement calls:

::new (p) T() //value initialization for DefaultInsertable types
::new (p) T(a) //1 arg constructor for EmplaceInsertable types
::new (p) T(a, b) //2 arg, etc.. for EmplaceInsertable types

but there is no equivalent to:
::new (p) T //default initialization for DefaultInsertable types

There is also no equivalent to:

:: new (p) T{} 
:: new (p) T{a} 
:: new (p) T{a, b} 

or 

:: new (p) T
:: new (p) T{a} 
:: new (p) T{a, b} 

People desire lots of combinations of things.

While it isn't great that allocators have two responsibilities (getting/freeing storage and constructing/destroying objects), that is the customization point for it.

-- 

Jonathan Wakely

unread,
Sep 10, 2013, 5:58:37 AM9/10/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk


On Monday, September 9, 2013 7:28:51 PM UTC+1, Victor....@ncl.ac.uk wrote:

On Monday, September 9, 2013 3:59:35 PM UTC+1, Jonathan Wakely wrote:
Has noone mentioned that the name is abysmal?  OK, I'll do it: push_back_ is a terrible, terrible name.  I prefer the uninitialized_resize() suggestion if only because the name at least gives a clue that it's dangerous if you use it wrong.
 
That trailing unserscore is a common convention.

Where else is it used in the standard library?
 
In any case, the proposal is not about the name, I'm quite flexible to change it. E.g. unchecked_push_back suggested by Ion is perfectly fine with me.
 
 
I agree with Howard's first mail that this is a pretty specialized request and so I'm not keen on adding it to the standard because you don't want to write some extra code for your specialized scenario.  std::vector doesn't need to be ideal for every possible use case.
 
The problem is that no STL container is good enough for this particular use case. It looks like falling back to malloc is the most natural approach here. (While I like the generating iterators methods, it's far from natural - how many programmers would invent it?)

So write your own container, the STL containers were never meant to be all things to all men, they are examples of data structures that follow the STL interface, which is about iterators and algorithms, not containers.
 
Specialized use cases deserve specialized containers.

Victor....@ncl.ac.uk

unread,
Sep 10, 2013, 3:53:46 PM9/10/13
to std-pr...@isocpp.org

On Monday, September 9, 2013 8:28:13 PM UTC+1, Thiago Macieira wrote:
On segunda-feira, 9 de setembro de 2013 11:28:51, Victor....@ncl.ac.uk
wrote:
> That trailing unserscore is a common convention.

Can you point to examples in the Standard Library public API?
 
Ok, this is a good argument, let's go with Ion's suggestion then (unchecked_push_back). 

masse....@gmail.com

unread,
Sep 11, 2013, 3:27:27 PM9/11/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
Hi,
I haven't read all the comments, but it seems to me that your are finding really complicated solution to a simple problem.
The problem is that we are doing the check every time when adding multiple values?
When the we don't we provide a method wich will be able to add mutiple values directly and do the check only one?
something like:
template< class RandomIt >
void push_back( RandomIt first, RandomIt last );
RandomIt should be a RandomAccessIterator, so that we can compute the difference between first and last to do the check.
Note that in the case where the allocated memory isn't suffisent and a reallocation happen, things could also be optimized because we already know how much space will be necessary in order to do all the insertion.
The code will then become :

vector<Widget> vw;
vector
<Widget> others;
//vw.reserve(...); commented out since the extra allocation will be done in one pass below, so adding this won't changes a lot of things.
vw
.push_back(others.begin(), others.end()); // efficient because the capacity checks will be done only once.


Note that this solution is probably not the best in term of pure efficiency, but I do believe it is good compromise between efficiency and code savety.

On Saturday, September 7, 2013 12:05:03 AM UTC+2, Victor....@ncl.ac.uk wrote:
[I suspect I am re-inventing the wheel, but a quick search of this forum didn't yield any similar proposal.]
 
vector::push_back() is inefficient, as every time it checks whether the vector has reached its full capacity and needs to allocate more memory. Such a check is often unnecessary, e.g. if reserve() was called, see the example below. Hence, the proposal is to add vector::push_back_() which skips this check. Calling it for a full vector results in undefined behaviour (consistent with operator[] which does not check that the index is valid). Of course, an assertion checking the capacity should be used in push_back_() to simplify debugging, but the optimised code has no overhead.
 
---------
Example: The vector class is often used like this:
 
vector<Widget> vw;
vw.reserve(...);
for(...){
    vw.push_back(...); // inefficient due to unnecessary capacity checks; push_back_() should be used instead
}
 
The capacity check is unnecessary in this example and leads to loss of performance (code like this is likely to be in a tight internal loop). Furthermore, unnecessary overhead cannot be removed, e.g. replacing this code by
 
vector<Widget> vw(...);
for(...){
    vw[...]=...;
}
 
does not solve the problem, as now the default constructor of Widget is called many times, only for the created objects to be immediately overwritten in the loop.
---------
 
Of course, the same argument works for emplace_back_(). One might also consider adding back_inserter_ to further exploit the new functionality, e.g.:
 
vector<Widget> vw;
vw.reserve(...);
transform(...,back_inserter_(vw),...);
 
 

stackm...@hotmail.com

unread,
Sep 11, 2013, 4:35:45 PM9/11/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
I'd prefer push_back_unchecked because it allows for better code completion.

Billy O'Neal

unread,
Sep 11, 2013, 5:27:37 PM9/11/13
to std-proposals, Victor....@ncl.ac.uk
That already exists, the member function is called insert.
 
vw.insert(vw.begin(), others.begin(), others.end()); //C++98/03

Billy O'Neal
Malware Response Instructor - BleepingComputer.com


--

Victor....@ncl.ac.uk

unread,
Sep 12, 2013, 12:06:34 PM9/12/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

On Wednesday, September 11, 2013 9:35:45 PM UTC+1, stackm...@hotmail.com wrote:
I'd prefer push_back_unchecked because it allows for better code completion.
 
This would be somewhat misleading, as one can read it, "push back an unchecked Widget".
 
Victor.

Victor....@ncl.ac.uk

unread,
Sep 12, 2013, 1:12:53 PM9/12/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
Dear all,
 
I think it would be appropriate to provide a (somewhat biased) summary of the discussion so far.
 
 * Name:
 
It was pointed by several posters that the original name push_back_ has no analogs in the STL. This is a strong argument, and I'm quite happy to change it. A couple of  alternatives were suggested, my current favourite is unchecked_push_back suggested by Ion. Any other suggestions?
 
 * Performance:
 
The independent (from me) experiments by Howard and Magnus showed 40% improvement over push_back. The loop code was included into the measurments, so the "pure" performance gain of unchecked_push_back over push_back must be higher. However, it would be difficult to disentangle it from the optimised code, and maybe the end user would be more interested in the performance of the whole loop anyway.
 
Hence the conclusion: the performance of a tight loop can be improved by 40% by replacing push_back by unchecked_push_back. Looks like a good bate!
 
* Alternatives to unchecked_push_back:
 
Several alternatives have been suggested how to improve the performance without dragging unchecked_push_back into the standard (I tried to make a complete list, but apologies if I missed anything).
 
1. Use malloc and manage memory yourself. (Ok, unique_ptr with a deleter calling free() would simplify the task.) This is probably a solution an average programmer would come up with.
 
Advantages: good performance, easy
 
Disadvantages: repulsive for C++ programmers, unsafe, difficult to interface with the rest of the C++ code
 
2. Generating iterators suggested by Howard: use the range constructor of vector, but give it an iterator that would compute a new Widget when dereferenced.
 
Advantages: safe, uses what already exists in the library, in experiments it peformed as well as unchecked_push_back.
 
Disadvantages: mixing iterators with generators; "pain to write all this boilerplate" - to quote Howard.
 
In principle, the idea of a generating iterator could make a proposal of its own, but that would require amending the standard anyway.
 
3.  Filling constructor: add a vector constructor accepting a generator.
Advantages: safe, potentially can peform as well as unchecked_push_back.
 
Disadvantages: still requires addition to the library. (Afterthought: is this at all possible in a natural way? vector already has templated constructors with two parameters. But still can be done somehow, e.g. with a fake parameter.)
 
4.  The "assume" trick suggested by Felipe.
 
Advantages: simple, does not require additions to the standard
 
Disadvantages: no positive assurance that the optimisation is actually applied; non-trivial for the compiler to deduce the optimisation; non-standard extension
 
5. uninitialized_resize suggested by Bengt.
 
Advantages: simple and efficient
 
Disadvantages: require additions to the standard, exceedingly unsafe (breaks vector's invariants); if fact Bengt dropped this suggestion in favour of the next one.
 
6. just_set_the_end_pointer suggested by Bengt.
 
Advantages: simple and efficient
 
Disadvantages: require additions to the standard, quite unsafe (gives the user direct write access to vector's end pointer).
 
7. Custom allocators suggested by several posters. The idea is, roughly, to pass a custom allocator to a vector, such that when the vector's constructor taking the size is called, the memory would be left uninitialised. Then placement-new can be used to create Widgets.
 
Advantages: uses what already exists in the library; might achieve the same performance as unchecked_push_back, but this was not tested.
 
Disadvantages: exceedingly unsafe: might call the destructor of an uninitialised Widget, e.g. in case of an exception; might break the "usual" invariants expected of the vector.
 
* Safety: When discussing safety, we should bare in mind that vector was primeraly designed for efficiency and interface with C code [Meyers]. As such, it has several unsafe features, such as data() returing descriptors of internal data, unchecked operator[], the possibility of the pointer returned by data() and iterators to become invalid (the latter are very, very difficult to check by assertions), etc. In view of these, unchecked_push_back looks rather benign: it has a simple pre-condition which is easy to check by an assertion.

Billy O'Neal

unread,
Sep 12, 2013, 1:21:21 PM9/12/13
to std-proposals
[quote]
Hence the conclusion: the performance of a tight loop can be improved by 40% by replacing push_back by unchecked_push_back. Looks like a good bate!
[/quote]
At the expense of completely destroying all of vector's safety guarantees. It is easy to be fast if you don't have to be correct.
 
Note again that we have to distinguish the behavior of not checking capacity (which was the original proposal) from not updating size. If you are proposing the capacity check being removed, I doubt that was a significant part of that 40%. If you are removing the size check, then using vector without causing undefined behavior is pretty much impossible.
 
If you want uninitialized semantics, then the container should not be called vector. You want something else. That is a specialized use case for which the existing containers are not a good fit.
 

Billy O'Neal
Malware Response Instructor - BleepingComputer.com


--

Nevin Liber

unread,
Sep 12, 2013, 1:48:03 PM9/12/13
to std-pr...@isocpp.org
On 12 September 2013 12:21, Billy O'Neal <billy...@gmail.com> wrote:
[quote]
Hence the conclusion: the performance of a tight loop can be improved by 40% by replacing push_back by unchecked_push_back. Looks like a good bate!
[/quote]
At the expense of completely destroying all of vector's safety guarantees. It is easy to be fast if you don't have to be correct.
 
Note again that we have to distinguish the behavior of not checking capacity (which was the original proposal) from not updating size. If you are proposing the capacity check being removed, I doubt that was a significant part of that 40%. If you are removing the size check, then using vector without causing undefined behavior is pretty much impossible.

push_back normally does three things in the non-growing case:

1.  pointer comparison (capacity check)
2.  placement new (object construction)
3.  pointer increment (size correction)

Howard's benchmark minimized the cost of (2).  I can very easily believe if all your code does is a pointer comparison and pointer increment, you can get a 40% increase if you eliminate the pointer comparison.  I'm betting if you eliminate (3) instead of (1) while still minimizing (2), you'll also get a double digit percentage increase.

For most real uses of vector, (2) dominates, putting (1) and (3) in the noise.

I'm sticking with my name of buffer_overrun() for this misfeature.  Please add it (or potential_buffer_overrun(), if it makes the OP feel better) to the list if the OP intends to go through with this proposal.
 
 If you want uninitialized semantics, then the container should not be called vector.

+100.
 
You want something else. That is a specialized use case for which the existing containers are not a good fit.

The motivation of needing this performance in an organization where it is impossible to change interfaces to get that performance is unconvincing to me.  That's a problem for the OP's company to fix and not for the standard to address.

Oh, and (7) is about adding an allocator which does default initialization instead of a value initialization on its object when passed no constructor parameters.  There is nothing unsafe about doing so, and I would be strongly in favor of adding such an allocator.

Howard Hinnant

unread,
Sep 12, 2013, 2:07:58 PM9/12/13
to std-pr...@isocpp.org
On Sep 12, 2013, at 1:48 PM, Nevin Liber <ne...@eviloverlord.com> wrote:

> Howard's benchmark minimized the cost of (2).

Thank you Nevin, I was beginning to wonder if anyone recalled that.

On Sep 7, 2013, at 7:45 PM, Howard Hinnant <howard....@gmail.com> wrote:

> I'm using the cheapest Widget I can imagine - just wraps an int. This emphasizes any performance gains as much as possible. As Widget becomes more expensive to construct, the performance gains of one technique over the other will probably become negligible.

Since I'm seeing statements like:

On Sep 12, 2013, at 1:12 PM, Victor....@ncl.ac.uk wrote:

> The independent (from me) experiments by Howard and Magnus showed 40% improvement over push_back. The loop code was included into the measurments, so the "pure" performance gain of unchecked_push_back over push_back must be higher.

I think it is time to reign in the excitement just a notch or two. Here is a very slightly modified test that puts a std::string in Widget instead of an int:

#include <iostream>
#include <chrono>
#include <vector>
#include <string>

class Widget
{
std::string data_;
public:
explicit Widget(int data) : data_(data, '\0') {}
};

class generator_Widget
{
int count_;
int state_;

public:
typedef std::input_iterator_tag iterator_category;
// typedef std::random_access_iterator_tag iterator_category;
typedef Widget value_type;
typedef std::ptrdiff_t difference_type;
typedef const Widget* pointer;
typedef Widget reference;

explicit generator_Widget(int count, int state = 0)
: count_(count), state_(state) {}

reference operator*() const {return Widget(state_);}
pointer operator->() const {return nullptr;}

generator_Widget& operator++() {++count_; --state_; return *this;}
generator_Widget operator++(int)
{generator_Widget tmp(*this); ++(*this); return tmp;}

generator_Widget& operator--() {--count_; ++state_; return *this;}
generator_Widget operator--(int)
{generator_Widget tmp(*this); --(*this); return tmp;}

generator_Widget& operator+=(difference_type n) {count_ += n; state_ -= n; return *this;}
generator_Widget operator+(difference_type n) const
{generator_Widget tmp(*this); tmp += n; return tmp;}
friend generator_Widget operator+(difference_type n, generator_Widget x)
{x += n; return x;}
generator_Widget& operator-=(difference_type n) {return *this += -n;}
generator_Widget operator-(difference_type n) const
{generator_Widget tmp(*this); tmp -= n; return tmp;}

reference operator[](difference_type n) const {return *(*this + n);}

friend
bool
operator==(const generator_Widget& x, const generator_Widget& y)
{return x.count_ == y.count_;}

friend
bool
operator!=(const generator_Widget& x, const generator_Widget& y)
{return !(x == y);}

friend
bool
operator<(const generator_Widget& x, const generator_Widget& y)
{return x.count_ < y.count_;}

friend
bool
operator>(const generator_Widget& x, const generator_Widget& y)
{return y < x;}

friend
bool
operator<=(const generator_Widget& x, const generator_Widget& y)
{return !(y < x);}

friend
bool
operator>=(const generator_Widget& x, const generator_Widget& y)
{return !(x < y);}

friend
difference_type
operator-(const generator_Widget& x, const generator_Widget& y)
{return x.count_ - y.count_;}

};

int
main()
{
const unsigned N = 100000;
using namespace std::chrono;
auto t0 = high_resolution_clock::now();
#if 1
std::vector<Widget> v(generator_Widget(0, N), generator_Widget(N));
#else
std::vector<Widget> v;
int state = N;
v.reserve(N);
for (unsigned i = 0; i < N; ++i, --state)
v.push_back(Widget(state));
#endif
auto t1 = std::chrono::high_resolution_clock::now();
std::cout << duration_cast<milliseconds>(t1-t0).count() << "ms\n";
}

Comparing the iterator construction technique with just doing reserve+push_back now gives:

$ clang++ -O3 -std=c++11 test.cpp
$ a.out
1902ms
$ a.out
1955ms
$ a.out
1892ms
$ clang++ -O3 -std=c++11 test.cpp
$ a.out
1944ms
$ a.out
1896ms
$ a.out
1893ms

Question: For the above test, did I run #if 1 or #if 0 first?

Answer: It doesn't matter. They are the same speed.

Summary:

Under the best of circumstances, with the cheapest of Widgets, you might see up to a 40% gain. However it would not be unexpected to see no gain at all, even with just a moderately expensive Widget (e.g. contains a single std::string that gets constructed beyond the short string buffer).

Howard

Magnus Fromreide

unread,
Sep 12, 2013, 2:39:37 PM9/12/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
On Thu, 2013-09-12 at 10:12 -0700, Victor....@ncl.ac.uk wrote:
> Dear all,
>
> I think it would be appropriate to provide a (somewhat biased) summary
> of the discussion so far.
>
> * Name:
>
> It was pointed by several posters that the original name push_back_
> has no analogs in the STL. This is a strong argument, and I'm quite
> happy to change it. A couple of alternatives were suggested, my
> current favourite is unchecked_push_back suggested by Ion. Any other
> suggestions?
>
> * Performance:
>
> The independent (from me) experiments by Howard and Magnus showed 40%
> improvement over push_back. The loop code was included into the
> measurments, so the "pure" performance gain of unchecked_push_back
> over push_back must be higher. However, it would be difficult to
> disentangle it from the optimised code, and maybe the end user would
> be more interested in the performance of the whole loop anyway.
>
> Hence the conclusion: the performance of a tight loop can be improved
> by 40% by replacing push_back by unchecked_push_back. Looks like a
> good bate!

Here I should probably mention that I did some inspection of the
generated assembly and found that if one could teach the optimizer to
only store the end_ptr when it needs to expand the storage or when the
loop ends and not on each lap then the gain from unchecked_push_back is
pretty low.

I still think it would be a better idea for you to work on improving the
optimizers.

/MF

David Krauss

unread,
Sep 12, 2013, 6:35:35 PM9/12/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk


On Friday, September 13, 2013 1:12:53 AM UTC+8, Victor....@ncl.ac.uk wrote:

7. Custom allocators suggested by several posters. The idea is, roughly, to pass a custom allocator to a vector, such that when the vector's constructor taking the size is called, the memory would be left uninitialised. Then placement-new can be used to create Widgets.

I don't think anyone suggested that, although I could be wrong.

My suggestion about custom allocators was to extend the library to allow allocator (and sequence storage) propagation in cases where the allocator is of different, but convertible type. This is a very general improvement, allowing any kind of change to a container's "memory contract." For the problem at hand, the implementation should also ensure that if allocator::allocate cannot allocate anything (according to some criteria), that no capacity check occurs.

You would reserve space inside the vector, then move it to a non-allocating vector of different specialization, then everything else would be as usual except incurring another allocation would be UB.

Jonathan Wakely

unread,
Sep 13, 2013, 8:31:46 AM9/13/13
to std-pr...@isocpp.org
On Thursday, September 12, 2013 7:07:58 PM UTC+1, Howard Hinnant wrote:
On Sep 12, 2013, at 1:48 PM, Nevin Liber <ne...@eviloverlord.com> wrote:

> Howard's benchmark minimized the cost of (2).

Thank you Nevin, I was beginning to wonder if anyone recalled that.

I think those who don't want to add std::vector::buffer_overrun() recalled that your test "emphasizes any performance gains as much as possible" and those who do want to add std::vector::buffer_overrun() recalled only the performance gains :-)


 

Victor....@ncl.ac.uk

unread,
Sep 14, 2013, 9:55:08 AM9/14/13
to std-pr...@isocpp.org

On Thursday, September 12, 2013 6:21:21 PM UTC+1, Billy O'Neal wrote:
[quote]
Hence the conclusion: the performance of a tight loop can be improved by 40% by replacing push_back by unchecked_push_back. Looks like a good bate!
[/quote]
At the expense of completely destroying all of vector's safety guarantees. It is easy to be fast if you don't have to be correct.
 
As I wrote, the so called "vector's safety guarantees" are a bad joke if you have data(), operator[] and iterators that can become invalid. My argument was that unchecked_push_back is quite in line with those "safety guarantees", maybe even on the safer side of them.
 
 
 Note again that we have to distinguish the behavior of not checking capacity (which was the original proposal) from not updating size. If you are proposing the capacity check being removed, I doubt that was a significant part of that 40%. If you are removing the size check, then using vector without causing undefined behavior is pretty much impossible.
 
 
There is some confusion here. I never proposed removing the size update, and 40% figure was for the original proposal. However, the compiler managed to detect that size update was unnecessary and optimised the code. For push_back it would be impossible or at least very difficult for the compiler, as size update in each loop iteration would be necessary in case the vector has to grow on the next iteration (which it won't, but the compiler cannot work it out).
 
 
If you want uninitialized semantics, then the container should not be called vector. You want something else. That is a specialized use case for which the existing containers are not a good fit.
 
 
I don't want uninitialized semantics.
 

Victor....@ncl.ac.uk

unread,
Sep 14, 2013, 10:00:55 AM9/14/13
to std-pr...@isocpp.org

On Thursday, September 12, 2013 6:48:03 PM UTC+1, Nevin ":-)" Liber wrote:
 
Oh, and (7) is about adding an allocator which does default initialization instead of a value initialization on its object when passed no constructor parameters.  There is nothing unsafe about doing so, and I would be strongly in favor of adding such an allocator.
 
It would be unsafe if Widget has non-trivial destructor. E.g. it can wrap a pointer, which should be initialised to nullptr. If left uninitialised and ~Widget deletes it, you have undefined behaviour.

Victor....@ncl.ac.uk

unread,
Sep 14, 2013, 10:09:46 AM9/14/13
to std-pr...@isocpp.org

On Thursday, September 12, 2013 7:07:58 PM UTC+1, Howard Hinnant wrote:

I think it is time to reign in the excitement just a notch or two.  Here is a very slightly modified test that puts a std::string in Widget instead of an int:
[...]

    explicit Widget(int data) : data_(data, '\0') {}
 
This creates a new loop nested in the one we are trying to optimise, and the total work grows from O(n) to O(n^2). Very slight modification indeed!
 
 
Under the best of circumstances, with the cheapest of Widgets, you might see up to a 40% gain.  However it would not be unexpected to see no gain at all, even with just a moderately expensive Widget (e.g. contains a single std::string that gets constructed beyond the short string buffer).
 
Sure, as push_back is no longer in the innermost loop.
 
In any case, I agee that the intended use cases are when Widgets are small, e.g. wrap int or pointer. 

Victor....@ncl.ac.uk

unread,
Sep 14, 2013, 10:15:49 AM9/14/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

On Thursday, September 12, 2013 7:39:37 PM UTC+1, Magnus Fromreide wrote:
Here I should probably mention that I did some inspection of the
generated assembly and found that if one could teach the optimizer to
only store the end_ptr when it needs to expand the storage or when the
loop ends and not on each lap then the gain from unchecked_push_back is
pretty low.
 
 
In the case of push_back, you cannot eliminate that nasty store from the loop, as grow() needs it to be set properly. What one can do is to put it to the other branch of "if". However, it is difficult to do for the compiler, as it needs to work out that this other branch is rarely/never used, i.e. in effect it should be clever enough to work out that push_back can be replaced by unchecked_push_back. Ok, in some situations the compiler can be just lucky or use profile-guided optimisation.

Victor....@ncl.ac.uk

unread,
Sep 14, 2013, 10:19:02 AM9/14/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk

On Thursday, September 12, 2013 11:35:35 PM UTC+1, David Krauss wrote:

I don't think anyone suggested that, although I could be wrong.
 
Howard did.
 
My suggestion about custom allocators was to extend the library to allow allocator (and sequence storage) propagation in cases where the allocator is of different, but convertible type. This is a very general improvement, allowing any kind of change to a container's "memory contract." For the problem at hand, the implementation should also ensure that if allocator::allocate cannot allocate anything (according to some criteria), that no capacity check occurs.

You would reserve space inside the vector, then move it to a non-allocating vector of different specialization, then everything else would be as usual except incurring another allocation would be UB.
 
Apologies, this is quite a different idea and should have been included into the list. It is much safer, but will be a breaking change, e.g. the stateless allocators can no longer be optimised out.
 

Victor....@ncl.ac.uk

unread,
Sep 14, 2013, 10:20:23 AM9/14/13
to std-pr...@isocpp.org

On Friday, September 13, 2013 1:31:46 PM UTC+1, Jonathan Wakely wrote:

I think those who don't want to add std::vector::buffer_overrun() recalled that your test "emphasizes any performance gains as much as possible" and those who do want to add std::vector::buffer_overrun() recalled only the performance gains :-)
 
What a surprise! :-) 

Victor....@ncl.ac.uk

unread,
Sep 14, 2013, 10:29:36 AM9/14/13
to std-pr...@isocpp.org

On Thursday, September 12, 2013 6:48:03 PM UTC+1, Nevin ":-)" Liber wrote:

Please add it (or potential_buffer_overrun(), if it makes the OP feel better) to the list if the OP intends to go through with this proposal.
 
At the moment I don't see enought support here, so no intention to make a formal proposal. This might change if there is a sudden surge of support.

inkwizyt...@gmail.com

unread,
Sep 14, 2013, 2:26:54 PM9/14/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk


On Saturday, September 14, 2013 3:55:08 PM UTC+2, Victor....@ncl.ac.uk wrote:

On Thursday, September 12, 2013 6:21:21 PM UTC+1, Billy O'Neal wrote:
[quote]
Hence the conclusion: the performance of a tight loop can be improved by 40% by replacing push_back by unchecked_push_back. Looks like a good bate!
[/quote]
At the expense of completely destroying all of vector's safety guarantees. It is easy to be fast if you don't have to be correct.
 
As I wrote, the so called "vector's safety guarantees" are a bad joke if you have data(), operator[] and iterators that can become invalid. My argument was that unchecked_push_back is quite in line with those "safety guarantees", maybe even on the safer side of them.
 
 Is bit difference between making some pointer and iterators invalid and permanently breaking vector. every exception during `unchecked_push_back` will break program.

Victor....@ncl.ac.uk

unread,
Sep 14, 2013, 3:24:37 PM9/14/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk, inkwizyt...@gmail.com

On Saturday, September 14, 2013 7:26:54 PM UTC+1, inkwizyt...@gmail.com wrote:
 
 Is bit difference between making some pointer and iterators invalid and permanently breaking vector. every exception during `unchecked_push_back` will break program.
 
This is just not true. (If it were, push_back would also be broken.)

inkwizyt...@gmail.com

unread,
Sep 14, 2013, 3:55:15 PM9/14/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk
Right, I mix it with `uninitialized_resize()` that have that property (Billy O'Neal probably said this about `uninitialized_resize()` too).

Bengt Gustafsson

unread,
Sep 15, 2013, 12:52:58 PM9/15/13
to std-pr...@isocpp.org, Victor....@ncl.ac.uk, inkwizyt...@gmail.com
This discussion is being muddled by the fact that two suggestions with different goals are mixed. I admit to being the culprit here as I thought that the push_back_ suggestion was a subset of the uninitialized_resize() suggestion I had. In fact, due to the destruction at exception issue the issues are basically separate.

1. The original issue, which is for types that have a destructor which must be run in the case of an unwind.

Here typically complex objects are moved or copied to the vector and the test for the resize is probably negligable. In many cases the introduction of emplace_back as a complement to push_back() in C++11 has a much higher potential for speed up.

2. My other issue which is for filling vectors which initially act as buffers for some legacy function etc.

My initial suggestion uninitialized_resize() does the job but is dangeous if applied to types with a destructor. Later a suggestion to add a "tag" to the sizing constructor to instruct the allocator to use default construction instead of value construction solved this problem as there is no difference between these constructions except for the basic types and pointers. In complement with this constructor there would also naturally exist a resize() overload taking the same tag.

Note that this is now safe as only types without a destructor get an uninitialized construction:

struct default_construct_t {
} default_construct;

vector::vector(size_t sz, default_construct_t);
vector::resize(size_t sz, default_construct_t);

Nevin Liber

unread,
Sep 15, 2013, 2:36:10 PM9/15/13
to std-pr...@isocpp.org
On 15 September 2013 11:52, Bengt Gustafsson <bengt.gu...@beamways.com> wrote:

2. My other issue which is for filling vectors which initially act as buffers for some legacy function etc.

I've addressed this.  Twice.  Here goes a third time:

It can be done safely today (and heck, even in C++03) by writing an allocator which does a default initialization instead of value initialization when calling construct() (with no parameters).  Ion Gaztañaga stated it was hard to write such an allocator; but that has not been my experience in C++11.  He has yet to respond as to why he finds writing such an allocator difficult (I'd like to know, as that difficulty may be something the standard should address).

To go further, I'd be strongly in favor of adding such an allocator to the standard, were someone to propose it.  Unlike the other proposals, this is just an additional independent library and does not require modification of containers or allocators to implement.

Bengt Gustafsson

unread,
Sep 15, 2013, 4:57:08 PM9/15/13
to std-pr...@isocpp.org
I can't see that a special allocator solves the problem, as vectors with different allocators are different types. Even with the invention of  some sort of assignable allocator suggested earlier in this thread and provisions in all classes that use allocators so that you can move and swap the vector with the special allocator with a vector with the standard allocator it results in rather convoluted code compared to an extra ctor parameter or special resize method. 

A problem with the assignable allocator solution is that it uses allocators to differentiate something that is completely different from setting a policy on from where memory is allocated, which is what allocators are designed for. With assignable allocators the writer of the new allocator class is promising that as these allocators can be assigned to each other they are equivalent when it comes to doing their work, allocation/deallocation, and the only difference is that they do something else, i.e. construction, in different ways. To me this is not a very clean design. Also the amount of changes in the standard is much larger as all users of allocators are affected (to implement the actual assignment/copy construction which must now be templated on the allocator type).  Each new allocation scheme will now need a parallel non_initializing_allocator class for orthogonality. 

Alternatively an allocator could have a bool member which is inpected in each construct to see if value or default construction is required. Maybe easier but now construct() performance is reduced which does not sit well with this performance directed improvement, and it is still not the default allocator. Each construction of the vector needs to get a parameter which is a an allocator constructed with true or false value. This value can't be changed between allocations as you can't get a non-const reference to the allocator from the vector, only a copy. Even if you could it would not be very safe as you need to set/reset the flag between calls.

The counter argument will be that "then change all those vectors to be vectors with the special allocator". This is bad design as now all uses of this vector, which is typically a class member, are changed. What is needed is a per-site possibility to define if zero initiation is to be done, even if the data member is the same. A resize_default_constructed() is a quite nice solution to this problem as it does not need a special tag class and it doesn't compromize safety for classes with non-trivial destructors. The only drawback I can see is that it needs a new default_construct() method on the allocator, but this method is of course needed only if the method is called, i.e. almost only on the default allocator type.

Note the difference from the previously suggested uninitialized_resize(), in that this method calls the default constructor for the elements. It so happens that the default constructor for basic types and pointers do nothing and so the entire loop containing the placement new and incrementing the index etc. will very likely be optimized away. Not so for non-pod types, where it works exactly as the normal resize, as default and value construction are the same.

Example:

templat<typename T, ...> class allocator {
    void default_construct(void* at) { new(at) T; }
};
template<...> class vector {
    ...
    void resize_default_constructed(size_t newsize);
};

// Fill data from a socket without clearing the bytes first.
vector<char> myBuffer;
myBuffer.resize_default_constructed(1000);
socket.read(myBuffer.data(), 1000);


Not being able to construct the vector directly with a specified number of default constructed members is a minor annoyance, which I don't think warrants the tag-class trick previously suggested.

Billy O'Neal

unread,
Sep 15, 2013, 5:08:32 PM9/15/13
to std-proposals
[quote]
I can't see that a special allocator solves the problem, as vectors with different allocators are different types.
[/quote]
I see this as a *good* thing. Vector currently maintains the invariant that no accessible portion of the vector is ever uninitialized. If you want a routine to allow uninitialized data, then that routine should declare that it is capable of dealing with that.
 
[quote]
The counter argument will be that "then change all those vectors to be vectors with the special allocator". This is bad design as now all uses of this vector, which is typically a class member, are changed.
[/quote]
You are changing vector's semantics everywhere. It follows that code changing everywhere is a reasonable result of that.

Billy O'Neal
Malware Response Instructor - BleepingComputer.com


--

Nevin Liber

unread,
Sep 15, 2013, 7:09:55 PM9/15/13
to std-pr...@isocpp.org
On 15 September 2013 15:57, Bengt Gustafsson <bengt.gu...@beamways.com> wrote:
I can't see that a special allocator solves the problem, as vectors with different allocators are different types.

So??  Not everything needs to be shoehorned into std::vector<T, std::allocator<T>>.

The argument of:

1.  You need better performance than vector<T, allocator<T>> gives me today.
2.  You are unwilling to change any interfaces from using vector<T, allocator<T>> to get better performance.
3.  You are willing to chance pushing through an intrusive change to the standard library to get better performance.
4.  You are willing to wait five or more years before you can get this better performance.

can be reduced to:

You really don't need better performance than vector<T, allocator<T>>.

 
 A problem with the assignable allocator solution

I don't know what you mean by "assignable allocator".

As I wrote it in <http://stackoverflow.com/questions/15097783/value-initialized-objects-in-c11-and-stdvector-constructor/15119665#15119665>, I'm talking about using a different allocator which implements something like:


template<typename T>
struct DefaultInitAllocator {
    template<typename U>
    void construct(U* p)
    { ::new (static_cast<void*>(p)) U; }

    template<typename U, typename... Args>
    void construct(U* p, Args&&... args)
    { ::new (static_cast<void*>(p)) U(std::forward<Args>(args)...); }

    // ... rest of the allocator interface
};
 
is that it uses allocators to differentiate something that is completely different from setting a policy on from where memory is allocated, which is what allocators are designed for.

*shrug*  I guess we'll have to agree to disagree.  Last I checked, standard library containers construct/destroy objects by calling functions in their allocator.

Even with the invention of  some sort of assignable allocator suggested earlier in this thread and provisions in all classes that use allocators so that you can move and swap the vector with the special allocator with a vector with the standard allocator it results in rather convoluted code compared to an extra ctor parameter or special resize method. 

If you think it is easier to push through an intrusive change to both containers and allocators just to handle this very specialized use case, it's probably time for me to bow out of this discussion and wish you good luck with writing up and presenting your proposal.

David Krauss

unread,
Sep 15, 2013, 9:55:45 PM9/15/13
to std-pr...@isocpp.org


On Monday, September 16, 2013 4:57:08 AM UTC+8, Bengt Gustafsson wrote:
I can't see that a special allocator solves the problem, as vectors with different allocators are different types. Even with the invention of  some sort of assignable allocator suggested earlier in this thread and provisions in all classes that use allocators so that you can move and swap the vector with the special allocator with a vector with the standard allocator it results in rather convoluted code compared to an extra ctor parameter or special resize method. 

vector< int > q;
q
.reserve( 30 );
vector
< int, other_allocator > r( std::move( q ) ); // Now r has whatever special semantics.

Is this so convoluted?

This sort of code attaches invariants to an object using the type system, allowing appropriate functions to be selected automatically. Adding special members requires manual selection of functions, so the user has to work to refactor code and verify that such refactoring remains valid.
 
A problem with the assignable allocator solution is that it uses allocators to differentiate something that is completely different from setting a policy on from where memory is allocated, which is what allocators are designed for. With assignable allocators the writer of the new allocator class is promising that as these allocators can be assigned to each other they are equivalent when it comes to doing their work, allocation/deallocation, and the only difference is that they do something else, i.e. construction, in different ways. To me this is not a very clean design.

If by assignable allocators you mean my suggestion, no, I suggested to assign an allocator that cannot perform any allocation but performs construction as normal.

It seems there are at least 2-3 mutually exclusive allocator suggestions here.

- No-initialize allocator
- Default-initialize allocator
- No-allocate allocator
 
Alternatively an allocator could have a bool member which is inpected in each construct to see if value or default construction is required.

Adding state adds overhead. If one code path doesn't want two different allocator behaviors, then the runtime state also means a static invariant isn't getting checked by the type system as it should.

Applying and checking these invariants which vary from place to place, but apply to the same thing, is what assignable allocators would solve.
 
Maybe easier but now construct() performance is reduced which does not sit well with this performance directed improvement, and it is still not the default allocator.

If the special allocator assigns to and from the default allocator, then you can use the default allocator wherever you wish. Am I mistaking what you refer to by "assignable"?

But a default allocator vector and a special allocator vector cannot hold the same sequence at the same time. Move semantics would be needed to switch between them. Doesn't seem to be a problem as vector doesn't allow concurrent reading and resizing anyway.

Bengt Gustafsson

unread,
Sep 17, 2013, 5:08:14 AM9/17/13
to std-pr...@isocpp.org
This is getting ever more confusing. since my last post yesterday I have gotten two replies.

One from Nevin who correctly pointed out that getting an "assignable allocator" idea through the committe. But this was not my suggestion, what I wrote was a similar reaction to (now apparently) David's suggestion.

The other was from David who didn't think that his code was convoluted. I can agree to that, as it can be wrapped into a function or something. However, this solution is what needs the five years of committe work, isn't it? Or do you mean that you can move construct vectors with different allocators from each other in the current standard? I have checked both the current standard and the C++14 draft and could not find such a constructor... maybe I'm just blind.

By comparison I think that my suggestion of adding one (1) trivial method to vector and one even more trivial method to allocator should be easier to handle in the committe work.

By the way, David's code is a bit backwards as written. To do what I want you first have a vector with the special no-init construct() and then call resize() (not reserve()) on it, and finally move() it into a regular vector which is what you need for the rest of the program:

vector<int, other_allocator> q;  // default constructing construct() on this allocator.
q.resize( 30 );                  // A nop apart from moving the end pointer
vector
<int> r( std::move( q ) ); // Now r has standard allocator and default constructed elements, ready to be filled from some function.

Nevin Liber

unread,
Sep 17, 2013, 10:53:01 AM9/17/13
to std-pr...@isocpp.org

On 15 September 2013 20:55, David Krauss <pot...@gmail.com> wrote:
vector< int > q;
q
.reserve( 30 );
vector
< int, other_allocator > r( std::move( q ) ); // Now r has whatever special semantics.

Is this so convoluted?

Yes.

In the above example, you are saying that std::allocator is responsible for allocating space while other_allocator is responsible for deallocating that space.  Ugh.

I don't see how such a thing is workable in general, other than as a hack between two allocators that intimately know about each other.

Billy O'Neal

unread,
Sep 17, 2013, 12:34:39 PM9/17/13
to std-proposals
Of course, presumably you would want to use resize rather than reserve.
 
Billy3

Billy O'Neal
Malware Response Instructor - BleepingComputer.com


--

Nevin Liber

unread,
Sep 17, 2013, 12:59:50 PM9/17/13
to std-pr...@isocpp.org
On 17 September 2013 11:34, Billy O'Neal <billy...@gmail.com> wrote:
Of course, presumably you would want to use resize rather than reserve.

That's even more convoluted, as you now have *both* allocators used for constructing objects in the same container while only other_allocator is used for destroying objects.

Billy O'Neal

unread,
Sep 17, 2013, 3:25:36 PM9/17/13
to std-proposals
That's true -- but calling reserve here doesn't change the size of the vector -- it isn't legal to store anything there. Reserve never calls allocator::construct() or anything like that. If that's what you want, you can already do that today.
 
If you actually want to make it valid to store things in the vector, however, then you're going to have to resize it.
 
In terms of powering this, I don't see any reason we can't have a
 
template<typename U, typename V>
struct allocator_can_be_assigned : public std::false_type {};
 
template<>
struct allocator_can_be_assigned<std::allocator, my_no_construct_allocator> : public std::true_type {};
 
or similar. Any pair of allocators of course would have to deal with any of these cases correctly if they declare they support this pattern.
 
There is precedent for allocators declaring support for things like this; e.g. the existing std::allocator_traits<T>::propagate_on_container_copy_assignment and friends.

Billy O'Neal
Malware Response Instructor - BleepingComputer.com


--

Bengt Gustafsson

unread,
Sep 17, 2013, 4:05:04 PM9/17/13
to std-pr...@isocpp.org
Yesterday I updated David's code so that it does the intended function, but you still discuss the shortcomings of the original version. Here's the updated version again:

vector<int, other_allocator> q;  // default constructing construct() on this allocator.
q.resize( 30 );                  // A nop apart from moving the end pointer
vector
<int> r( std::move( q ) ); // Now r has standard allocator and default constructed elements, ready to be filled from some function.

This is not to say that I like this solution as it is a bigger change to the library and will require a long standardization process. On the other hand allocator_can_be_assigned will have more uses than this example which may be worth fighting for, which could boost the use of allocators. I don't think that there are that many allocator classes being made out there today, which in part is due to the fact that each allocator creates a completely new set of containers which don't interplay in non-template code.

The vector::resize_default_constructed() method by comparison is a very small and local change. As no one has commented on this last version of special resize methods I don't know if you still consider it unsafe. If so, please explain how.

Billy O'Neal

unread,
Sep 17, 2013, 4:19:26 PM9/17/13
to std-proposals
resize() already default constructs elements. I don't understand your proposal. If you really meant resize_uninitialized or something like that then you have to solve the hairy problem to define what happens in the vector if elements are uninitialized when the vector is destroyed.

Billy O'Neal
Malware Response Instructor - BleepingComputer.com


Bengt Gustafsson

unread,
Sep 18, 2013, 12:30:58 AM9/18/13
to std-pr...@isocpp.org
No it doesn't, it value-constructs the elements, which means that basic types and pointers are zeroed out. What I'm proposing is a way to fill a vector of such types with "garbage" instead, i.e. do nothing. This saves time when you use the vector for instance to receive data from a legacy function or I/O library that expects buffers of simple types such as char. For types with constructors there is no difference.

Daniel Krügler

unread,
Sep 18, 2013, 12:15:07 PM9/18/13
to std-pr...@isocpp.org
> There is some ambiguity here -- the "default constructor" of a built in
> value, e.g. that the user requests by doing "new int[42]()" (note the extra
> parens) zero initializes the object.

This form of initialization is value-initialization as described by 8.5 p11.

> In fact, N3691 23.3.7.3
> [vector.capacity]/12 (emphasis mine) says:
>
> void resize(size_type sz);
> Effects: If sz <= size(), equivalent to calling pop_back() size() - sz
> times. If size() < sz, appends sz - size() default-inserted elements to the
> sequence.

Attention: Default-insertion is no core language term, but a
library-defined term, see 23.2.1 p13:

"An element of X is default-inserted if it is initialized by
evaluation of the expression
allocator_traits<A>::construct(m, p)"

In C++03 resize was only one function with a default parameter and
used effectively value-initialization. The effect of the
default-allocator of std::vector has the same effect.

> I can't find anywhere in the standard where "default construct" is a
> concept. The closest thing I can find is "default initialize", N3691 8.5
> [dcl.init]/7.

You should no longer find a lot of occurrences in the Library
specification talking about default construct[ed] where the meaning is
unclear, several issues had cleaned them up (e.g. LWG 1012, 1420,
868), where the meaning was indeed misleading. Do you have a specific
example in mind where this term is still used and is still misleading?
(Examples where the affected type is a class type with user-provided
default constructor are usually not misleading).

Thanks,

- Daniel

Nevin Liber

unread,
Sep 18, 2013, 12:59:54 PM9/18/13
to std-pr...@isocpp.org
On 18 September 2013 11:15, Daniel Krügler <daniel....@gmail.com> wrote:

In C++03 resize was only one function with a default parameter and
used effectively value-initialization. The effect of the
default-allocator of std::vector has the same effect.

I misspoke when I thought this problem could be solved in C++03 with an allocator.   This reminded me that in C++03, the only way to put an object into a vector was by copying it, even if it is copying a value-initialized object.

I wonder if one of the reasons we haven't seen an allocator solution to this is because it couldn't be done before C++11.
It is loading more messages.
0 new messages