std::vector perf: (reserve + push_back) seems slower than (resize + (data / operator[]))

294 views

Skip to first unread message

Primiano Tucci

unread,

May 30, 2017, 9:24:21 AM5/30/17

to cxx

TL;DR

On Linux+clang-O2+official_build std::vector's push_back() seems slower (up to 2x for vector<int>) and quite binary inefficient than resize()+data() / operator[]. This has nothing to do with memory / allocator / expansion policies.

Bonus: data() seems to give better alignment hints vs operator[]. For vector<int> looping over .data() seems ~9% faster than operator[] due to the use of double-quadword registers.

In the context of trying to reproduce go/minimize-pointer-following (nothing secret but unfortunately the link is internal only [1]) with our toolchains, I ended up with the following:

https://gist.github.com/primiano/f468a8d77de9a56e347032adb2bf1fe3

That vector-pointer-deref-is-not-hoisted issue doesn't seem to repro for me on Linux with clang -O2. In both cases (caching the data() ptr vs using operator[] on the pointer) the vector ptr dereferencing is hoisted outside of the loop (good). There is only something funny going on with alignment: the former is 9% faster as the compiler realizes it can use double-quadword instructions.

The thing that jumped to my attention though is the 3rd experiment using push_back(). That:

- Generates a whole lot of code (related discussion on binary-size@ [2])

- For the specific case of inserting into a vector<int> push_back() seems 2X slower than the two other alternatives (resize() + data() and operator[])

I tried something similar with a larger struct of 4 ints [3], and reserve()+emplace_back() still seems 20% slower than the equivalent resize() + Initialize().

Probably it is just me, but until now I would have never expected push_back to be _that_ slow and binary-inefficient. I know that push_back() generally yields to safer and more readable code but, if I didn't do anything silly, this seems to suggest that .data() is faster for hot loops.

Talking about which, I am not sure how much of them we have. An codesearch attempt [4] suggests that we have ~3600 instances of push_back() within a loop.

[1] The TL;DR of that doc is that passing a pointer to std::vector<int>* to a function which reads/writes via .data()/operator[] can cause the vector ptr derefencing operation to _not_ be hoisted outside of the loop due to aliasing. Which doesn't seem to happen here.

[2] https://groups.google.com/a/chromium.org/d/msg/binary-size/ihlvm2utHIA/mc157s2LAAAJ

[3] https://gist.github.com/primiano/6dbe3929238063f7942f15c5078356d6

[4] https://cs.chromium.org/search/?q=for%5C+%5C(.*%5Cn(.*%5Cn)%7B0,3%7D.*push_back%5C(+pcre:yes+package:%5Echromium$&type=cs

Primiano Tucci

unread,

May 30, 2017, 10:00:03 AM5/30/17

to cxx, alexc...@chromium.org

Actually looks like the 9% difference between .data() and .operator[] (using XMM-based double-quadword movs vs just regular qword mov) is due to -fno-strict-aliasing.

Not sure why it makes a difference only for .operator() (.data() ends up using XMM in both cases).