
std::vector<>::capacity()-growth


Bonita Montero

Mar 16, 2021, 1:31:20 PM
Some time ago I wrote a class called gvector<> which derived from vector<>
and which grows the capacity of the vector by a constant value or a
constant factor, whichever is greater, usually starting at the constant
value and then being overtaken by the constant factor at a certain size,
when the latter results in a larger capacity than the constant offset.
Today someone in another forum told me that some vector<> implementations
grow the capacity in powers of two. I couldn't believe that because it
appeared to me as too much bloat. So I wrote a little program to test
this:

#include <iostream>
#include <vector>

using namespace std;

int main()
{
    vector<size_t> vec;
    size_t lc = vec.capacity();
    for( size_t i = 1000000; i; --i )
    {
        vec.emplace_back( i );
        if( lc != vec.capacity() )
        {
            lc = vec.capacity();
            cout << lc << endl;
        }
    }
}

These are the results with the current MSVC compiler:

1
2
3
4
6
9
13
19
28
42
63
94
141
211
316
474
711
1066
1599
2398
3597
5395
8092
12138
18207
27310
40965
61447
92170
138255
207382
311073
466609
699913
1049869

I don't know which arithmetic pattern this growth follows, but
it is obviously not linear. Here are the results of the libstdc++:

1
2
4
8
16
32
64
128
256
512
1024
2048
4096
8192
16384
32768
65536
131072
262144
524288
1048576

So libstdc++ in fact grows the vector in powers of two. That's not what
I expected, since so far I assumed that growing the capacity efficiently
is the duty of the programmer, and growing the vector in such hops
seems like real bloat to me.

Bo Persson

Mar 16, 2021, 2:08:21 PM
Here "efficiently" can mean either using less time or using less memory.

Obviously libstdc++ grows in larger chunks (capacity x 2), which results
in fewer reallocations. MSVC instead chose to grow by a factor of 1.5,
which possibly results in more allocations, but uses less memory.

One advantage of the 1.5 growth is that it can possibly reuse the space
deallocated previously, while with a factor of 2.0 the previous space is
always *just* too small to be reused (1+2+4+8 < 16, 1+2+4+8+16 < 32,
and so on).

For more info see

https://stackoverflow.com/questions/5232198/about-vectors-growth




Bo Persson

Bonita Montero

Mar 16, 2021, 2:42:06 PM
And is emplace_back() also required to satisfy this constant-time
requirement? Strictly speaking this is a bit different from push_back().
And does std::string also have such exponential growth behaviour?

Bonita Montero

Mar 16, 2021, 2:47:49 PM
> And is emplace_back() also required to satisfy this constant-time
> requirement? Strictly speaking this is a bit different from push_back().
> And does std::string also have such exponential growth behaviour?

They do - I've checked it at en.cppreference.com.

Marcel Mueller

Mar 16, 2021, 4:40:58 PM
On 16.03.21 at 19:08, Bo Persson wrote:
>> So libstdc++ in fact grows the vector in powers of two. That's not what
>> I expected, since so far I assumed that growing the capacity efficiently
>> is the duty of the programmer, and growing the vector in such hops
>> seems like real bloat to me.
>
> Here "efficiently" can mean either using less time or using less memory.
>
> Obviously libstdc++ grows in larger chunks (capacity x 2), which results
> in fewer reallocations. MSVC instead chose to grow by a factor of 1.5,
> which possibly results in more allocations, but uses less memory.
>
> One advantage of the 1.5 growth is that it can possibly reuse the space
> deallocated previously, while with a factor of 2.0 the previous space is
> always *just* too small to be reused (1+2+4+8 < 16, 1+2+4+8+16 < 32,
> and so on).

In theory, yes. In practice this only applies if no other allocation has
been done in between. Otherwise it won't work because of fragmentation of
the virtual address space.

A sufficiently sophisticated allocator can deal with such situations. It
uses different pools depending on the allocation size. This can
significantly reduce memory and address space fragmentation.

However, std::vector is not a good choice for very large data, not even
when the size is known beforehand and no reallocation is required. It is
more efficient to introduce some fragmentation into large data. This
significantly simplifies memory management.

I have had very good experience with B-tree structures with a fixed node
size of 127. Well designed, they can operate with only a few bits of
overhead per entry.
Although they are O(log(n)) in many operations, the base of 127 requires
only 5 levels to store 2^32 elements, which is likely already an
out-of-memory condition. So a B-tree based structure is in practice
almost constant time.

Unfortunately there is still no B-tree in the standard library. And if
this changes at some time, it will most likely not be that efficient.


Marcel

Juha Nieminen

Mar 17, 2021, 2:10:51 AM
Bonita Montero <Bonita....@gmail.com> wrote:
> Some time ago I wrote a class called gvector<> which derived from vector<>
> and which grows the capacity of the vector by a constant value or a
> constant factor, whichever is greater, usually starting at the constant
> value and then being overtaken by the constant factor at a certain size,
> when the latter results in a larger capacity than the constant offset.
> Today someone in another forum told me that some vector<> implementations
> grow the capacity in powers of two. I couldn't believe that
> because it appeared to me as too much bloat.

Growing the capacity in powers of two simply means that the growth
factor is 2.

I haven't actually checked the math, but I believe that the factor
of 2 is used because of the standard requirement that push_back()
must be amortized-constant-time, and I think that doesn't work
with any factor less than 2 (although, as said, I haven't checked
the math).

It can be thought of like this: For each time a reallocation happens,
which incurs an O(n) copying operation, you need an additional n
O(1) operations to compensate (ie. keep the overall insertion time
at O(1)). This can only work if the growth factor is 2.

Alf P. Steinbach

Mar 17, 2021, 3:03:51 AM
On 17.03.2021 07:10, Juha Nieminen wrote:
> Bonita Montero <Bonita....@gmail.com> wrote:
>> Some time ago I wrote a class called gvector<> which derived from vector<>
>> and which grows the capacity of the vector by a constant value or a
>> constant factor, whichever is greater, usually starting at the constant
>> value and then being overtaken by the constant factor at a certain size,
>> when the latter results in a larger capacity than the constant offset.
>> Today someone in another forum told me that some vector<> implementations
>> grow the capacity in powers of two. I couldn't believe that
>> because it appeared to me as too much bloat.
>
> Growing the capacity in powers of two simply means that the growth
> factor is 2.
>
> I haven't actually checked the math, but I believe that the factor
> of 2 is used because of the standard requirement that push_back()
> must be amortized-constant-time, and I think that doesn't work
> with any factor less than 2 (although, as said, I haven't checked
> the math).

With an initial size m and buffer increase factor C the work in copying
from old to new buffers is

m + Cm + C²m + C³m + ... + Cʳm = m(1 + C + C² + C³ + ... + Cʳ)
= m(Cʳ⁺¹-1)/(C-1)

where r, the number of buffer increases, is the floor of the C-logarithm
of the current size n, so that Cʳ is O(n) regardless of the value of C.

A simple way to understand the formula (which I used to write it down
just now) is that e.g. 10³ - 1 = 999 = 9×111 = 9×(10² + 10 + 1).


> It can be thought of like this: For each time a reallocation happens,
> which incurs an O(n) copying operation, you need an additional n
> O(1) operations to compensate (ie. keep the overall insertion time
> at O(1)). This can only work if the growth factor is 2.

It works for any growth factor > 1, but the smaller it is, the more time
is spent on buffer reallocations, and perhaps the more fragmented memory
becomes, while saving a bit on buffer size.


- Alf

Paavo Helde

Mar 17, 2021, 11:23:48 AM
On 16.03.2021 19:31, Bonita Montero wrote:
> So libstdc++ in fact grows the vector in powers of two. That's not what
> I expected, since so far I assumed that growing the capacity efficiently
> is the duty of the programmer, and growing the vector in such hops
> seems like real bloat to me.

It cannot be left to the programmer because otherwise push_back() and
emplace_back() would not have amortized constant complexity, as mandated
by the standard.

The exact factor (1.5 or 2) depends on the desired memory/speed balance.
Nowadays memory is plentiful, so newer compilers like clang prefer a
larger factor. MSVC has most probably kept the legacy factor of 1.5 from
some time in the past when memory tended to be more scarce.

Just found this blog post about strings which discusses the same thing
among others: https://shaharmike.com/cpp/std-string/






Bonita Montero

Mar 19, 2021, 3:18:19 AM
So I just wondered whether the growth overhead really is constant.
To check that I wrote this tiny test program:

#include <iostream>
#include <vector>
#include <chrono>

using namespace std;
using namespace chrono;

int main()
{
    using hrc_tp = time_point<high_resolution_clock>;
    for( size_t n = (size_t)1 << 8; n <= (size_t)1 << 24; n *= 2 )
    {
        vector<size_t> vi;
        hrc_tp start = high_resolution_clock::now();
        for( size_t i = n; i; --i )
            vi.emplace_back( i );
        double ns = (double)(int64_t)duration_cast<nanoseconds>(
            high_resolution_clock::now() - start ).count() / n;
        cout << n << "\t" << ns << endl;
    }
}


The results on my Threadripper 3990X are the following:

256 31.56250
512 20.15620
1024 8.28125
2048 7.65625
4096 6.81641
8192 11.10350
16384 8.15430
32768 6.22803
65536 5.69458
131072 6.26465
262144 4.69452
524288 5.28549
1048576 3.97713
2097152 4.02081
4194304 4.27639
8388608 4.72761
16777216 3.93250

So the overhead is roughly constant from 32768 elements onwards.
I'm aware that this is hard to guarantee because the memory allocator
won't have linear behaviour.

Please post your results and tell us your compiler / standard library
version!

MrSpook_...@3k5_1.biz

Mar 19, 2021, 6:41:07 AM
On Fri, 19 Mar 2021 08:18:06 +0100
Bonita Montero <Bonita....@gmail.com> wrote:
> double ns = (double)(int64_t)duration_cast<nanoseconds>(
> high_resolution_clock::now() - start ).count() / n;

A triple cast? Seriously?

Paavo Helde

Mar 19, 2021, 7:01:12 AM
MSVC2019 + tbbmalloc

256 383.594
512 17.3828
1024 13.0859
2048 7.08008
4096 6.56738
8192 6.26221
16384 4.03442
32768 4.76074
65536 4.80347
131072 3.35541
262144 4.07028
524288 5.70507
1048576 4.62179
2097152 4.47669
4194304 6.01001
8388608 7.32294
16777216 5.90116



Bonita Montero

Mar 19, 2021, 8:00:32 AM
> Bonita Montero <Bonita....@gmail.com> wrote:
>> double ns = (double)(int64_t)duration_cast<nanoseconds>(
>> high_resolution_clock::now() - start ).count() / n;

> A triple cast? Seriously?

I do this because the x86 FPUs don't have load instructions for unsigned
values. Furthermore, the compiler would complain that converting from an
int64_t to a double might drop information; having an explicit cast does
not cause such warnings.


Bonita Montero

Mar 19, 2021, 8:07:32 AM
Look at this code ...

#include <cstdint>

using namespace std;

double f( uint64_t x )
{
    return x;
}

It results in this ...

?f@@YAN_K@Z PROC ; f
; File C:\Users\Olli\x.cpp
; Line 7
xorps xmm0, xmm0
test rcx, rcx
js SHORT $LN3@f
cvtsi2sd xmm0, rcx
; Line 8
ret 0
$LN3@f:
; Line 7
mov rax, rcx
and ecx, 1
shr rax, 1
or rax, rcx
cvtsi2sd xmm0, rax
addsd xmm0, xmm0
; Line 8
ret 0
?f@@YAN_K@Z ENDP ; f

While this ...

#include <cstdint>

using namespace std;

double f( uint64_t x )
{
    return (int64_t)x;
}

... results in this ...

?f@@YAN_K@Z PROC ; f
; File C:\Users\Olli\x.cpp
; Line 6
xorps xmm0, xmm0
; Line 7
cvtsi2sd xmm0, rcx
; Line 8
ret 0
?f@@YAN_K@Z ENDP ; f

Bonita Montero

Mar 19, 2021, 8:16:25 AM
> Bonita Montero <Bonita....@gmail.com> wrote:
>> double ns = (double)(int64_t)duration_cast<nanoseconds>(
>> high_resolution_clock::now() - start ).count() / n;

> A triple cast? Seriously?

And if you want to really improve the above code to be perfect,
it would look like the following:

double ns = (double)(int64_t)duration_cast<nanoseconds>(
high_resolution_clock::now() - start ).count() / (ptrdiff_t)n;

I cast to ptrdiff_t because today it's safe to assume that it has the
same size as size_t, and ptrdiff_t is implicitly converted to double in
this statement, so I don't need an additional cast.

Ian Collins

Mar 19, 2021, 6:23:25 PM
Blimey!

> 512 17.3828
> 1024 13.0859
> 2048 7.08008
> 4096 6.56738
> 8192 6.26221
> 16384 4.03442
> 32768 4.76074
> 65536 4.80347
> 131072 3.35541
> 262144 4.07028
> 524288 5.70507
> 1048576 4.62179
> 2097152 4.47669
> 4194304 6.01001
> 8388608 7.32294
> 16777216 5.90116

AMD Ryzen 7 2700X, Ubuntu clang version 13.0.0

256 13.1914
512 5.32227
1024 4.36426
2048 2.48486
4096 2.66382
8192 2.16602
16384 2.32263
32768 2.64426
65536 3.06087
131072 3.24371
262144 3.36548
524288 3.67879
1048576 4.68726
2097152 5.22924
4194304 4.80952
8388608 5.18689
16777216 5.08674

--
Ian.

Chris M. Thomasson

Mar 19, 2021, 9:14:41 PM
I got:

256 26.5625
512 23.4375
1024 24.1211
2048 14.9414
4096 8.44727
8192 8.8623
16384 10.5225
32768 8.40759
65536 10.6934
131072 7.61719
262144 8.93364
524288 9.865
1048576 8.03614
2097152 9.17945
4194304 12.5399
8388608 11.4621
16777216 9.46465


and one more run:

256 51.1719
512 27.1484
1024 23.8281
2048 19.3359
4096 8.27637
8192 8.63037
16384 9.61304
32768 7.76978
65536 8.69446
131072 11.3541
262144 9.21021
524288 10.709
1048576 12.0368
2097152 13.9337
4194304 12.8456
8388608 11.6393
16777216 9.28596

Chris M. Thomasson

Mar 19, 2021, 9:15:13 PM
[...]

DAMN! That is wild! I wonder if the system was under load or something? wow.

red floyd

Mar 19, 2021, 9:20:47 PM
Here's mine.

256 22.2656
512 8.39844
1024 3.41797
2048 5.0293
4096 3.07617
8192 3.06396
16384 2.83813
32768 3.7323
65536 4.47998
131072 5.04456
262144 5.21202
524288 5.62763
1048576 6.04553
2097152 6.34499
4194304 6.82788
8388608 7.28205
16777216 7.19015

g++ 10.2.0, Cygwin, under Windows 10.


Paavo Helde

Mar 20, 2021, 12:48:42 PM
This was the first call of TBB scalable_malloc(), so this number
probably includes some setup. Indeed, when I first call it separately
a few times beforehand, the number becomes a bit better.

256 85.1563
512 18.1641
1024 13.5742
2048 9.7168
4096 8.39844
8192 7.55615
16384 3.83911
32768 4.36096
65536 4.50745
131072 2.98157
262144 3.80096
524288 5.81131
1048576 4.12359
2097152 5.42884
4194304 6.82046
8388608 7.73486
16777216 6.8723


Bonita Montero

Mar 20, 2021, 1:49:46 PM
> 256     85.1563
> 512     18.1641
> 1024    13.5742
> 2048    9.7168
> 4096    8.39844
> 8192    7.55615
> 16384   3.83911
> 32768   4.36096
> 65536   4.50745
> 131072  2.98157
> 262144  3.80096
> 524288  5.81131
> 1048576 4.12359
> 2097152 5.42884
> 4194304 6.82046
> 8388608 7.73486
> 16777216        6.8723

I guess the high overhead in the 2^24 * 8 (assuming 64-bit addressing,
i.e. size_t is 8 bytes) = 128MB case results from the high memory
latency when writing to the memory. Usually latency is only a
load-operation topic, but when you fill such a large amount of memory,
the memory interface, caches and write buffers stall.

wij

Mar 20, 2021, 2:08:12 PM
I think the memory growth factor of 2 is correct. After all, it is a C++
program, and lots of other library functions depend on this feature.
If concerned, use reserve(size_t) beforehand.

Bonita Montero

Mar 20, 2021, 2:36:37 PM
>> I guess the high overhead in the 2^24 * 8 (assuming 64-bit addressing,
>> i.e. size_t is 8 bytes) = 128MB case results from the high memory
>> latency when writing to the memory. Usually latency is only a
>> load-operation topic, but when you fill such a large amount of memory,
>> the memory interface, caches and write buffers stall.

> I think the memory growth factor of 2 is correct. After all, it is a C++
> program, and lots of other library functions depend on this feature.
> If concerned, use reserve(size_t) beforehand.

That's not related to what I said. Because the allocation overhead
becomes more negligible with a growing allocation size, the memory
latency rules, and pre-reserve()-ing wouldn't help much. Only if moving
the objects had a high overhead would reserve() gain something here.


wij

Mar 20, 2021, 2:57:27 PM
It seemed to me you were trying to re-implement the low-level function
void *malloc(size_t).

Chris M. Thomasson

Mar 20, 2021, 4:10:40 PM
Ahhhh! I think you are correct. It's been a while since I tested against
TBB. I remember conversing with the TBB implementers way back in some
Intel threading forums.

Bonita Montero

Mar 22, 2021, 1:53:48 PM
Java does the same:

import java.util.ArrayList;
import java.lang.reflect.Field;

public class ArrayListGrowth
{
    public static void main( String[] args )
    {
        ArrayList<Integer> al = new ArrayList<Integer>();
        int cap = getCapacity( al );
        for( int i = 0; i != 1000000; ++i )
        {
            al.add( i );
            int newCap = getCapacity( al );
            if( newCap != cap )
            {
                System.out.printf( "%d - %d\n", i, newCap );
                cap = newCap;
            }
        }
    }

    static int getCapacity( ArrayList<?> al )
    {
        try
        {
            if( field == null )
            {
                field = ArrayList.class.getDeclaredField( "elementData" );
                field.setAccessible( true );
            }
            return ((Object[])field.get( al )).length;
        }
        catch( NoSuchFieldException nsfe )
        {
            return -1;
        }
        catch( IllegalAccessException iae )
        {
            return -1;
        }
    }

    static Field field = null;
}

0 - 10
10 - 15
15 - 22
22 - 33
33 - 49
49 - 73
73 - 109
109 - 163
163 - 244
244 - 366
366 - 549
549 - 823
823 - 1234
1234 - 1851
1851 - 2776
2776 - 4164
4164 - 6246
6246 - 9369
9369 - 14053
14053 - 21079
21079 - 31618
31618 - 47427
47427 - 71140
71140 - 106710
106710 - 160065
160065 - 240097
240097 - 360145
360145 - 540217
540217 - 810325
810325 - 1215487