std::vector perf: (reserve + push_back) seems slower than (resize + (data / operator[]))

294 views
Skip to first unread message

Primiano Tucci

unread,
May 30, 2017, 9:24:21 AM5/30/17
to cxx
TL;DR
On Linux+clang-O2+official_build std::vector's push_back() seems slower (up to 2x for vector<int>) and quite binary inefficient than resize()+data() / operator[]. This has nothing to do with memory / allocator / expansion policies.
Bonus: data() seems to give better alignment hints vs operator[]. For vector<int> looping over .data() seems ~9% faster than operator[] due to the use of double-quadword registers.
 
In the context of trying to reproduce go/minimize-pointer-following (nothing secret but unfortunately the link is internal only [1]) with our toolchains, I ended up with the following:


That vector-pointer-deref-is-not-hoisted issue doesn't seem to repro for me on Linux with clang -O2. In both cases (caching the data() ptr vs using operator[] on the pointer) the vector ptr dereferencing is hoisted outside of the loop (good). There is only something funny going on with alignment: the former is 9% faster as the compiler realizes it can use double-quadword instructions.

The thing that jumped to my attention though is the 3rd experiment using push_back(). That:
- Generates a whole lot of code (related discussion on binary-size@ [2])
- For the specific case of inserting into a vector<int> push_back() seems 2X slower than the two other alternatives (resize() + data() and operator[])

I tried something similar with a larger struct of 4 ints [3], and reserve()+emplace_back() still seems 20% slower than the equivalent resize() + Initialize().

Probably it is just me, but until now I would have never expected push_back to be _that_ slow and binary-inefficient. I know that push_back() generally yields to safer and more readable code but, if I didn't do anything silly, this seems to suggest that .data() is faster for hot loops.

Talking about which, I am not sure how much of them we have. An codesearch attempt [4] suggests that we have ~3600 instances of push_back() within a loop.

[1] The TL;DR of that doc is that passing a pointer to std::vector<int>* to a function which reads/writes via .data()/operator[] can cause the vector ptr derefencing operation to _not_ be hoisted outside of the loop due to aliasing. Which doesn't seem to happen here.

Primiano Tucci

unread,
May 30, 2017, 10:00:03 AM5/30/17
to cxx, alexc...@chromium.org
Actually looks like the 9% difference between .data() and .operator[] (using XMM-based double-quadword movs vs just regular qword mov) is due to -fno-strict-aliasing.
Not sure why it makes a difference only for .operator() (.data() ends up using XMM in both cases).
Reply all
Reply to author
Forward
0 new messages