Please avoid std::unordered_map

9,266 views
Skip to first unread message

Brett Wilson

unread,
Apr 25, 2017, 7:27:33 PM4/25/17
to Chromium-dev
We now have a guide on which of the various map and set containers to use (for non-Blink) in //base/containers/README.md.

Please check it out. I'm hoping to expand this for other containers over time.


Details relating to inflammatory click-bait title...

I've noticed a lot of code uses std::unordered_map because "hash tables are faster." But it has high memory overhead for small sizes which most of our maps are. It's also slightly slower than std::map to insert into, has poor cache coherency (every insert is a malloc), and isn't as much faster at querying over the alternatives than you might expect (it can even be slower!).

It's still a good option for the right work loads, but please deliberately select this container after considering your usage, rather than defaulting to use it or assuming it will be better.

Brett

Victor Costan

unread,
Apr 26, 2017, 2:40:49 AM4/26/17
to Brett Wilson, Chromium-dev
Have you also looked at Blink's types? I think there's an open question of whether it makes sense to move some of them in base/. (apologies if I missed threads discussing this)

This came to my mind because (assuming I read the code correctly) WTF::HashMap uses open addressing, so it doesn't have to do allocations for most insertions. I'd expect WTF::HashMap to do better than std::map for elements that are efficiently copyable / moveable. WTF::HashMap is still not very cache-friendly*, but then again, neither is the red-black tree behind std::map.

* I can think of a couple of tricks to make the double hashing trigger the prefetcher, but I wouldn't know how to analyze the implications on access runtime

Thank you,
    Victor


--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/CABiGVV9hnyt6fmf1sDtW325d_toeRXpCajzPs9yjYPa_H9kB9Q%40mail.gmail.com.

Yuta Kitamura

unread,
Apr 26, 2017, 3:01:06 AM4/26/17
to pwn...@chromium.org, Brett Wilson, Chromium-dev
On Wed, Apr 26, 2017 at 3:39 PM, Victor Costan <pwn...@chromium.org> wrote:
Have you also looked at Blink's types? I think there's an open question of whether it makes sense to move some of them in base/. (apologies if I missed threads discussing this)

[Wearing the WTF owner's hat] Yeah, that's precisely what I'm thinking now. I'm pretty eager for giving it a shot and seeing the impact, if people agree on this.
 

This came to my mind because (assuming I read the code correctly) WTF::HashMap uses open addressing, so it doesn't have to do allocations for most insertions. I'd expect WTF::HashMap to do better than std::map for elements that are efficiently copyable / moveable. WTF::HashMap is still not very cache-friendly*, but then again, neither is the red-black tree behind std::map.

Yes, that's a pretty accurate summary of the WTF::HashMap situation.
 

Brett Wilson

unread,
Apr 26, 2017, 11:23:41 AM4/26/17
to Victor Costan, Chromium-dev
Open addressed hash tables are an answer to lookup performance problems. But performance isn't really the problem:

I didn't actually run any code so somebody can check me: An empty WTF::HastMap is 24 bytes (it doesn't allocate a table to start out with), compared to a std::unordered_map at 128 bytes. That's good. But as soon as you insert something, WTF::HashMap will allocate enough storage for 8 pair<Key, Value>.

sizeof(std::string) is 32 bytes or so. So a one element WTF::HashMap<std::string, std::string> be 536 bytes, but the std::unordered_map<std::string, std::string> would be ~176 bytes and a std::map<std::string, std::string> would be ~80 bytes.

With this memory profile, I would definitely not want people to start using WTF::HashMap for random maps in Chrome! I think there is potentially a place for an open-addressed hash table for very specific use cases. But these are indeed very few.

If you are inspired to work on containers, adding a base::deque that's a circular buffer interface over a vector would be a much more clear win. I think it will be better for all workloads than std::deque. See

Brett

Brett Wilson

unread,
Apr 26, 2017, 11:36:23 AM4/26/17
to Victor Costan, Chromium-dev
I don't want to be too critical of WTF::HashMap or imply it sucks. The open addressing is clearly faster and the memory access patterns are also better.

But the real problem we have in most Chrome code is that we have a zillion almost empty maps. Hash tables of any type make this worse. I hacked std::map and put a histogram in the destructor of the map size. Here's the pattern for the browser process:

Histogram: Map.Size recorded 169828 samples, mean = 1.2 (flags = 0x1)

0     ------------------------------------------------------------------------O (110025 = 64.8%)

1     ----------O                                                               (15713 = 9.3%) {64.8%}

2     -----------------O                                                        (25289 = 14.9%) {74.0%}

3     -------O                                                                  (10520 = 6.2%) {88.9%}

4     -O                                                                        (1866 = 1.1%) {95.1%}

5     O                                                                         (1462 = 0.9%) {96.2%}

7     -O                                                                        (2424 = 1.4%) {97.1%}

9     O                                                                         (597 = 0.4%) {98.5%}

12    O                                                                         (508 = 0.3%) {98.9%}

16    O                                                                         (869 = 0.5%) {99.2%}

21    O                                                                         (118 = 0.1%) {99.7%}

28    O                                                                         (132 = 0.1%) {99.7%}

<long tail omitted>


Brett

Claudio deSouza

unread,
Apr 26, 2017, 11:50:04 AM4/26/17
to Chromium-dev
I just would like to chime in very briefly on this discussion to talk about base::flat_map. Recently a CL landed on base::values.h, changing the storage type for base::DictionaryValue from std::map (I think) to base::flat_map. I can understand the reasoning for that, but this change has a potential to make base::Values unusable in some cases, especially if you are loading base::Values from some JSON content, which doesn't understand the consideration involved in this choice.

I'm only giving a heads up because I've seen cases in which any performance gain with this change is lost due to the fact that base::flat_map uses brute force look up for uniqueness.

Cheers,

Brett Wilson

unread,
Apr 26, 2017, 12:43:53 PM4/26/17
to Claudio M. DeSouza Junior, Chromium-dev
On Wed, Apr 26, 2017 at 8:50 AM, Claudio deSouza <claudi...@gmail.com> wrote:
I just would like to chime in very briefly on this discussion to talk about base::flat_map. Recently a CL landed on base::values.h, changing the storage type for base::DictionaryValue from std::map (I think) to base::flat_map. I can understand the reasoning for that, but this change has a potential to make base::Values unusable in some cases, especially if you are loading base::Values from some JSON content, which doesn't understand the consideration involved in this choice.

The JSON parser knows about the dictionary storage algorithm in base::Value, and the parser got faster with this change. 


I'm only giving a heads up because I've seen cases in which any performance gain with this change is lost due to the fact that base::flat_map uses brute force look up for uniqueness.

base::flat_map uses binary search, not brute force.

Brett

Yuzhu Shen

unread,
Apr 26, 2017, 1:02:20 PM4/26/17
to Brett Wilson, Victor Costan, Chromium-dev
On Wed, Apr 26, 2017 at 8:35 AM, Brett Wilson <bre...@chromium.org> wrote:
I don't want to be too critical of WTF::HashMap or imply it sucks. The open addressing is clearly faster and the memory access patterns are also better.

But the real problem we have in most Chrome code is that we have a zillion almost empty maps. Hash tables of any type make this worse. I hacked std::map and put a histogram in the destructor of the map size. Here's the pattern for the browser process:

The contents of some maps may be cleaned up before the maps are destructed.

It might be useful to keep track of max_size and log it in the destructor.

Brett Wilson

unread,
Apr 26, 2017, 1:05:02 PM4/26/17
to Yuzhu Shen, Victor Costan, Chromium-dev
On Wed, Apr 26, 2017 at 10:00 AM, Yuzhu Shen <yzs...@chromium.org> wrote:


On Wed, Apr 26, 2017 at 8:35 AM, Brett Wilson <bre...@chromium.org> wrote:
I don't want to be too critical of WTF::HashMap or imply it sucks. The open addressing is clearly faster and the memory access patterns are also better.

But the real problem we have in most Chrome code is that we have a zillion almost empty maps. Hash tables of any type make this worse. I hacked std::map and put a histogram in the destructor of the map size. Here's the pattern for the browser process:

The contents of some maps may be cleaned up before the maps are destructed.

It might be useful to keep track of max_size and log it in the destructor.

That's true, but this change was pretty painful (the STL is not meant to be changed to call into base) and I was hoping the destructor would be a representative enough sample. I also didn't count unordered_maps which may have a different distribution because people use them in different places (but I suspect not).

Brett

Daniel Murphy

unread,
Apr 26, 2017, 4:10:48 PM4/26/17
to bre...@chromium.org, Yuzhu Shen, Victor Costan, Chromium-dev
Shameless plug for base::SmallMap! It turns all of those small maps into vectors, which (at that size) are superior in every way [citation needed] ESPECIALLY if the keys are in-lined memory in the array.

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.

Brett Wilson

unread,
Apr 26, 2017, 4:18:52 PM4/26/17
to Daniel Murphy, Yuzhu Shen, Victor Costan, Chromium-dev
On Wed, Apr 26, 2017 at 1:09 PM, Daniel Murphy <dmu...@google.com> wrote:
Shameless plug for base::SmallMap! It turns all of those small maps into vectors, which (at that size) are superior in every way [citation needed] ESPECIALLY if the keys are in-lined memory in the array.

Except code size which is noticeably larger (as discussed in the doc).

If you don't care about code size, it's probably always better. But we do care about code size so I'm a little reluctant to recommend it too strongly. If it was widely used some code may get folded or something. This is an area where somebody could contribute more data to help the guidance.

Brett

Claudio deSouza

unread,
Apr 27, 2017, 2:47:52 AM4/27/17
to Chromium-dev, claudi...@gmail.com
Hi Brett,

I must say I did not express myself well... But I had a code in which cDictionaryWould have around 10000 key values... and the drop in performance was noticeable. The current model leads to quadratic performance, that we didn't use to have with std::map.

Cheers,

Daniel Cheng

unread,
Apr 27, 2017, 3:05:36 AM4/27/17
to claudi...@gmail.com, Chromium-dev
If the code isn't doing bulk inserts and is inserting (or removing) items one by one over time, that would certainly lead to quadratic performance.

Is it possible for this code to use one of the bulk insertion methods of flat_map and construct the DictionaryValue from it?

Daniel

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.

Claudio deSouza

unread,
Apr 27, 2017, 3:52:16 AM4/27/17
to Chromium-dev, claudi...@gmail.com
Hello Daniel,

Yes, I can make specific changes to optimise this code... However, even if I inset them in a bulk, flap_map will be way slower for some operations after the bulk insertion (finding and inserting an extra one).

I must admit I have no knowledge on the size requirement that was taken into consideration here, but I think that base::DictionaryValue should use some kind of small map optimisation, with some similar idiom as used in clang, but I don't know how that would impact binary size.

Cheers

Brett Wilson

unread,
Apr 27, 2017, 1:45:42 PM4/27/17
to Claudio M. DeSouza Junior, Chromium-dev
We instrumented base::Value to catch code that was inserting into large lists. For the most part, this doesn't happen except in the base JSON parser and when copying them. These are both optimized for the new design. We could have missed some spots so please let me know if you find any. base::Value should be pretty efficient to move so most code won't notice it even for occasional large dictionaries.

It sounds like maybe you're doing something outside of Chromium so we didn't see that.

It's my opinion that flat_map is a better match for the workload we have than base::small_map. A different small_map implementation might be better though (Dana was suggesting one privately to me).

Somewhat related: In my opinion people use base::Value way too much. I was doing some memory analysis and found several files doing a shocking number of allocations. These were all doing things with base::Value that could have been avoided with a different design. In Chrome the main times you should really be needing to use it is for prefs and extensions, and I would encourage people to do something more efficient for other cases.

Brett

Jeffrey Yasskin

unread,
May 1, 2017, 12:19:20 PM5/1/17
to Brett Wilson, Chromium-dev, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
Adding two folks at Microsoft who might be able to improve their unordered_map situation.

Brett, can you link to the code for the microbenchmark that https://chromium.googlesource.com/chromium/src/+/master/base/containers/README.md#std_unordered_map-and-std_unordered_set used to determine that, "In a microbenchmark on Windows, inserts of 1M integers into a std::unordered_set took 1.07x the time of std::set, and queries took 0.67x the time of std::set."? At 1M integers, the extra log factor should make std::set slower, unless there's a bug or the benchmark did something that let the std::set implementation cheat, like inserting in order.

In the code size test, it looks like the comparison looked at the size for creating the first instantiation of a given template. However, there are techniques implementations can use that make the second and subsequent instantiations much smaller than the first. Could you confirm that you were actually looking at the incremental code size cost of the second or later instantiation?

Thanks,
Jeffrey

On Tue, Apr 25, 2017 at 4:26 PM, Brett Wilson <bre...@chromium.org> wrote:

--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
---
You received this message because you are subscribed to the Google Groups "Chromium-dev" group.

Brett Wilson

unread,
May 2, 2017, 2:30:08 AM5/2/17
to Billy O'Neal (VC LIBS), Jeffrey Yasskin, Chromium-dev, Stephan T. Lavavej
Thanks for the insightful input from everybody!

Brett

On Monday, May 1, 2017, Billy O'Neal (VC LIBS) <bi...@microsoft.com> wrote:

I’ve always told people to use trees instead of hash tables unless there was profiling evidence that hash tables would be faster for their application. The extra lg n factor only matters when n is large, and for a lot of use cases (such as a std::string for the key) a less than comparison can be a lot cheaper than a hash + equality comparison.

 

I don’t believe there is much that can be done on code size here (at least under our current ABI) – pretty much everything in a container has to depend on T, or at least on the allocator (which depends on T). (We do want to switch list to forward_list here, like libc++ does, but that’s an ABI break)


A robin-hood open-addressed hashed container might have better performance for small containers (since it would avoid per-element allocation), but the standard doesn’t have one (and my understanding is that unordered_map can’t be backed with one due to some of the iterator guarantees it provides).

 

Billy3

sabba...@gmail.com

unread,
Jan 30, 2021, 6:14:38 AM1/30/21
to Chromium-dev, Brett Wilson, Jeffrey Yasskin, Chromium-dev, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
I've seen an article http://jsteemann.github.io/blog/2016/06/14/how-much-memory-does-an-stl-container-use/ where it's said std::unordered_map has better memory usage than std::map has.

I've done some benchmarks from the article and it seems like that.

With libc++
clang++-11 -O2 -Wall -Wextra -pedantic -std=c++17 -DNDEBUG -stdlib=libc++ 
n =     0
---------
vector         => 0 bytes allocd at end - total: 0 bytes mallocd, 0 malloc(s), 0 free(s)
map            => 0 bytes allocd at end - total: 0 bytes mallocd, 0 malloc(s), 0 free(s)
set            => 0 bytes allocd at end - total: 0 bytes mallocd, 0 malloc(s), 0 free(s)
unordered_map  => 0 bytes allocd at end - total: 0 bytes mallocd, 0 malloc(s), 0 free(s)
unordered_set  => 0 bytes allocd at end - total: 0 bytes mallocd, 0 malloc(s), 0 free(s)
deque          => 0 bytes allocd at end - total: 0 bytes mallocd, 0 malloc(s), 0 free(s)

n =     1
---------
vector         => 8 bytes allocd at end - total: 8 bytes mallocd, 1 malloc(s), 0 free(s)
map            => 48 bytes allocd at end - total: 48 bytes mallocd, 1 malloc(s), 0 free(s)
set            => 40 bytes allocd at end - total: 40 bytes mallocd, 1 malloc(s), 0 free(s)
unordered_map  => 48 bytes allocd at end - total: 48 bytes mallocd, 2 malloc(s), 0 free(s)
unordered_set  => 40 bytes allocd at end - total: 40 bytes mallocd, 2 malloc(s), 0 free(s)
deque          => 4104 bytes allocd at end - total: 4104 bytes mallocd, 2 malloc(s), 0 free(s)

n =     4
---------
vector         => 32 bytes allocd at end - total: 56 bytes mallocd, 3 malloc(s), 2 free(s)
map            => 192 bytes allocd at end - total: 192 bytes mallocd, 4 malloc(s), 0 free(s)
set            => 160 bytes allocd at end - total: 160 bytes mallocd, 4 malloc(s), 0 free(s)
unordered_map  => 168 bytes allocd at end - total: 184 bytes mallocd, 6 malloc(s), 1 free(s)
unordered_set  => 136 bytes allocd at end - total: 152 bytes mallocd, 6 malloc(s), 1 free(s)
deque          => 4104 bytes allocd at end - total: 4104 bytes mallocd, 2 malloc(s), 0 free(s)

With libstdc++
clang++-11 -O2 -Wall -Wextra -pedantic -std=c++17 -DNDEBUG -stdlib=libstdc++ 
n =     0
---------
vector         => 0 bytes allocd at end - total: 0 bytes mallocd, 0 malloc(s), 0 free(s)
map            => 0 bytes allocd at end - total: 0 bytes mallocd, 0 malloc(s), 0 free(s)
set            => 0 bytes allocd at end - total: 0 bytes mallocd, 0 malloc(s), 0 free(s)
unordered_map  => 0 bytes allocd at end - total: 0 bytes mallocd, 0 malloc(s), 0 free(s)
unordered_set  => 0 bytes allocd at end - total: 0 bytes mallocd, 0 malloc(s), 0 free(s)
deque          => 576 bytes allocd at end - total: 576 bytes mallocd, 2 malloc(s), 0 free(s)

n =     1
---------
vector         => 8 bytes allocd at end - total: 8 bytes mallocd, 1 malloc(s), 0 free(s)
map            => 48 bytes allocd at end - total: 48 bytes mallocd, 1 malloc(s), 0 free(s)
set            => 40 bytes allocd at end - total: 40 bytes mallocd, 1 malloc(s), 0 free(s)
unordered_map  => 128 bytes allocd at end - total: 128 bytes mallocd, 2 malloc(s), 0 free(s)
unordered_set  => 120 bytes allocd at end - total: 120 bytes mallocd, 2 malloc(s), 0 free(s)
deque          => 576 bytes allocd at end - total: 576 bytes mallocd, 2 malloc(s), 0 free(s)

n =     4
---------
vector         => 32 bytes allocd at end - total: 56 bytes mallocd, 3 malloc(s), 2 free(s)
map            => 192 bytes allocd at end - total: 192 bytes mallocd, 4 malloc(s), 0 free(s)
set            => 160 bytes allocd at end - total: 160 bytes mallocd, 4 malloc(s), 0 free(s)
unordered_map  => 200 bytes allocd at end - total: 200 bytes mallocd, 5 malloc(s), 0 free(s)
unordered_set  => 168 bytes allocd at end - total: 168 bytes mallocd, 5 malloc(s), 0 free(s)
deque          => 576 bytes allocd at end - total: 576 bytes mallocd, 2 malloc(s), 0 free(s)

Am I missing something? It seems there's no memory usage at all in empty unordered_map. Also, it seems libc++'s unordered_map has strictly better memory usage than libc++'s std::map ().

Brett Wilson

unread,
Jan 30, 2021, 8:35:28 PM1/30/21
to sabba...@gmail.com, Chromium-dev, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
There is more detail at:

I think when you're looking at the std::unordered_* containers, it's not counting the load factor, 4 elements may be optimum while other sized arrays may be worse. There are numbers on the Chromium page assume a 0.5 load factor.

That article you linked is also allocating the object itself on the stack and not counting those bytes. Whether this is appropriate or not will depend on your use. And in any case if you only care about n ≤ 4 you should just use a vector.

Brett

Sergey Abbakumov

unread,
Jan 31, 2021, 2:08:38 AM1/31/21
to Chromium-dev, Brett Wilson, Chromium-dev, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS), Sergey Abbakumov
In libc++ sizeof map is 24 whereas sizeof unordered_map is 40. Not a big difference really.

Yes, I've read that Chromium article, and I'm still not convinced.

n =  2047
---------
map            => 98256 bytes allocd at end - total: 98256 bytes mallocd, 2047 malloc(s), 0 free(s)
unordered_map  => 91128 bytes allocd at end - total: 116512 bytes mallocd, 2058 malloc(s), 10 free(s)

These benchmarks track all allocations including rehashing. Yes, rehashing happens from time to time taking into account the loading factor.
But in libc++ implementation (which I think, the one used in Chromium on all platforms now) unordered_map has lower memory usage after almost all the insertions than std::map does.
And it doesn't depend on the number of insertions. Except for a few cases:
when N = 48, 98, 198, 398, 798, ...

n =   798
---------                                                                        
map            => 38304 bytes allocd at end - total: 38304 bytes mallocd, 798 malloc(s), 0 free(s)
unordered_map  => 38312 bytes allocd at end - total: 50920 bytes mallocd, 808 malloc(s), 9 free(s)

the difference is 8 bytes.

When N = 1598, ...
n =  1598
---------                                                                        
map            => 76704 bytes allocd at end - total: 76704 bytes mallocd, 1598 malloc(s), 0 free(s)
unordered_map  => 76760 bytes allocd at end - total: 102144 bytes mallocd, 1609 malloc(s), 10 free(s)

the difference is 56.

I've tested these benchmarks with N from 1 to 2048. And the maximum memory difference between unordered_map and std::map after all insertions is 56 bytes.

Yes, unordered_map performs slightly more allocations. But after all insertions, unordered_map has less memory usage in general. I don't see any 128 bytes allocations on empty unordered_map.

Sergey Abbakumov

unread,
Jan 31, 2021, 2:36:09 AM1/31/21
to Chromium-dev, Sergey Abbakumov, Brett Wilson, Chromium-dev, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
Note that std::map also "has poor cache coherency (every insert is a malloc)".

Also, I support Yuzhu Shen's opinion regarding counting the maximum number of elements in a map during the whole lifetime. That would be hard to count but we definitely want to do that. Otherwise, we lower the estimate.

K. Moon

unread,
Feb 1, 2021, 11:40:44 AM2/1/21
to sabba...@gmail.com, Chromium-dev, Brett Wilson, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
It looks like one of the points from the original discussion was about the performance of hashing keys vs. comparisons. The linked benchmark is using maps with uint64_t keys, which is going to be a best case scenario. A more realistic benchmark would use std::string keys with a realistic length distribution.

To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromium-dev/b2a73810-b0e3-49ee-a1e4-e25aedd3cd61n%40chromium.org.

Sergey Abbakumov

unread,
Feb 3, 2021, 9:08:29 AM2/3/21
to Chromium-dev, km...@chromium.org, Chromium-dev, Brett Wilson, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS), Sergey Abbakumov
The hashing vs operator< was just one of the considerations. The main one was that std::map has better memory usage. But it seems to be not true. It doesn't depend on key/value type. The result is the same with std::string key/values.

In libc++ unordered_map seems to have better memory usage in general than maps have. I think we need to revisit the default map type guidelines.

Peter Kasting

unread,
Feb 3, 2021, 9:34:57 AM2/3/21
to sabba...@gmail.com, Chromium-dev, km...@chromium.org, Brett Wilson, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
On Wed, Feb 3, 2021 at 6:08 AM Sergey Abbakumov <sabba...@gmail.com> wrote:
In libc++ unordered_map seems to have better memory usage in general than maps have. I think we need to revisit the default map type guidelines.

The current guideline is, basically, to use a flat_map, not a map.  The upcoming guideline will at some point likely be to use the Abseil swiss tables.

I don't think map vs. unordered_map is the interesting comparison to run at this point.

PK 

Alexei Svitkine

unread,
Feb 3, 2021, 9:47:28 AM2/3/21
to Peter Kasting, sabba...@gmail.com, Chromium-dev, km...@chromium.org, Brett Wilson, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
As someone who recently read base/containers/README.md, I didn't get the impression "current guideline is, basically, to use a flat_map, not a map" - in fact, my decision was not to change from std::map based on what I read there.

From the doc:

* Generally avoid `std::unordered_set` and `std::unordered_map`. In the common
case, query performance is unlikely to be sufficiently higher than
`std::map` to make a difference, insert performance is slightly worse, and
the memory overhead is high. This makes sense mostly for large tables where
you expect a lot of lookups.

<snip about guidance about some specific cases, like "small" maps with wording such as
"better or comparable to other containers even for several dozen items">

* Use `std::map` and `std::set` if you can't decide. Even if they're not
great, they're unlikely to be bad or surprising.

So per the above, there's still the recommendation to use std::map unless it fits some definition of small, where the example is "several dozen items". Due to vagueness of what "small" and the guidance for defaulting to std::map if it's not clear. I'd wager I'm not the only one to arrive at that case via the "flow chart" of "small" not explicitly defined. (In my case, I was thinking of scenarios like number of field trials, which is in the hundreds range, or histograms, which are in the thousands range. Neither case was obviously a "small" map given the definition in the README.md).
> --
> --
> Chromium Developers mailing list: chromi...@chromium.org
> View archives, change email options, or unsubscribe:
> http://groups.google.com/a/chromium.org/group/chromium-dev
> ---
> You received this message because you are subscribed to the Google Groups "Chromium-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

Peter Kasting

unread,
Feb 3, 2021, 10:08:07 AM2/3/21
to Alexei Svitkine, sabba...@gmail.com, Chromium-dev, km...@chromium.org, Brett Wilson, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
On Wed, Feb 3, 2021 at 6:45 AM Alexei Svitkine <asvi...@chromium.org> wrote:
<snip about guidance about some specific cases, like "small" maps with wording such as
"better or comparable to other containers even for several dozen items">

It's kind of critical that the section you start with says it applies to most maps and sets in Chromium and that in at least one common scenario flat_maps are "strictly better than std::map".

Saying, effectively, "For most maps... use a flat_map" makes the guidance be, effectively, to use flat_maps.

* Use `std::map` and `std::set` if you can't decide. Even if they're not
great, they're unlikely to be bad or surprising.

Yes, if you think you might fall into some uncommon case, the fact that map has not-terrible bounds on its perf makes it an OK fallback.  That does NOT make it the preferred default choice for arbitrary maps -- that, today, is flat_map.

So per the above, there's still the recommendation to use std::map unless it fits some definition of small, where the example is "several dozen items". Due to vagueness of what "small" and the guidance for defaulting to std::map if it's not clear. I'd wager I'm not the only one to arrive at that case via the "flow chart" of "small" not explicitly defined. (In my case, I was thinking of scenarios like number of field trials, which is in the hundreds range, or histograms, which are in the thousands range. Neither case was obviously a "small" map given the definition in the README.md).

"Small" varies by what you're doing.  If you're doing a ton of inserts and your type cannot be efficiently moved, "small" is dozens.  If your type can be efficiently moved, small is hundreds.  If you're not doing many inserts, small is larger; by the time you're not doing any, flat_maps are strictly better than maps.  This all assumes that your concern is runtime; if your concern is only memory cost, flat_maps are always a win over maps.

PK

Chris Hamilton

unread,
Feb 3, 2021, 10:10:58 AM2/3/21
to Alexei Svitkine, Peter Kasting, sabba...@gmail.com, Chromium-dev, km...@chromium.org, Brett Wilson, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
I would agree with Alexei, and also would specifically disagree with any guidance that encourages blanket usage of flat_map or flat_set. Those containers are only really suitable for small (where what small is depends on how often it's going to be modified, really) data sets.

From Sergey's comments it appears that that memory usage argument might no longer be true (at least depending on platform). Is it possible that the unordered container implementations have simply changed since the comparison was last done?

Another possibility is that while libc++ may be fine, since Chrome is cross-platform we really need to worry about all platforms at once. So if unordered containers are really bad on any one platform, in general we should simply avoid them. So we'd need to look at more than just libc++ (ie, we should also look at the MS ucrt implementation).

Cheers,

Chris

K. Moon

unread,
Feb 5, 2021, 3:32:26 PM2/5/21
to Chris Hamilton, Alexei Svitkine, Peter Kasting, sabba...@gmail.com, Chromium-dev, Brett Wilson, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
I instrumented the containers.cpp program to record more detailed allocation traces, and learned a few interesting things about the libc++ implementations of the map and set containers (on a 64-bit Linux machine with 8 byte pointers):
  1. Both map and unordered_map have similar memory usage as the corresponding set or unordered_set holding a pair<K, V> (that is, sizeof(K) + sizeof(V)), so there's no need to analyze them separately.
  2. set has a per item size of 32 + sizeof(V), while unordered_set has a per item size of 16 + sizeof(V).
  3. unordered_set has an additional overhead of 8n, where n = size of the hash table.
  4. By default, unordered_set seems to have a maximum load factor of 1 (size() <= n), and uses table sizes of 2^k+1 (k > 0).
  5. set always deallocates memory when an item is erased.
  6. unordered_set never shrinks the hash table, at least not automatically. (I tried adding and then deleting up to 8192 items; I don't know if the logic changes at even higher sizes. That probably requires looking at the source code.)
So yes, it's true that given (2) and (3), the worst case memory usage of unordered_set (load factor ~= 0.5) is still 8 fewer bytes than the corresponding set (and sometimes much less, at load factor = 1). This advantage disappears as soon as you deallocate just a few items, however, since unordered_set doesn't recover memory allocated for the hash table.

Given this disadvantage, I think the guidance to avoid the unordered containers for general use remains sound, as they seem to be optimized for peak memory usage. (This is also another lesson to beware conclusions drawn from a microbenchmark with only one kind of workload.)

K. Moon

unread,
Feb 5, 2021, 4:08:26 PM2/5/21
to Billy O'Neal (VC LIBS), Chris Hamilton, Alexei Svitkine, Peter Kasting, sabba...@gmail.com, Chromium-dev, Brett Wilson, Jeffrey Yasskin, Stephan T. Lavavej
This was a relatively minor point of my findings, but I think we have a mix-up in terminology: The maximum load factor is 1, but I was referring to the actual load factor, which varies depending on usage.

On Fri, Feb 5, 2021 at 12:33 PM Billy O'Neal (VC LIBS) <bi...@microsoft.com> wrote:
The default load factor for the standard containers is 1, not 0.5. Note that the standard containers are a separate chaining hash table, not an open addressed hash table, so relatively high load factors are much less ruinous for it (but of course it has worse constant factors). See http://eel.is/c++draft/unord#map.cnstr-1

From: K. Moon <km...@chromium.org>
Sent: Friday, February 5, 2021 12:30 PM
To: Chris Hamilton <chr...@chromium.org>
Cc: Alexei Svitkine <asvi...@chromium.org>; Peter Kasting <pkas...@chromium.org>; sabba...@gmail.com <sabba...@gmail.com>; Chromium-dev <chromi...@chromium.org>; Brett Wilson <bre...@chromium.org>; Jeffrey Yasskin <jyas...@chromium.org>; Stephan T. Lavavej <s...@exchange.microsoft.com>; Billy O'Neal (VC LIBS) <bi...@microsoft.com>
Subject: Re: [chromium-dev] Re: Please avoid std::unordered_map
 

K. Moon

unread,
Feb 5, 2021, 4:44:53 PM2/5/21
to Billy O'Neal (VC LIBS), Chris Hamilton, Alexei Svitkine, Peter Kasting, sabba...@gmail.com, Chromium-dev, Brett Wilson, Jeffrey Yasskin, Stephan T. Lavavej
Good point. 👍

One last detail that I left out of my previous post: rehash()/reserve() do seem to shrink the hash table, so there is a way to recover the extra memory usage. The design intent for std::unordered_set seems to be similar to std::vector, in that the programmer is expected to request recovery of this memory explicitly at a suitable time. Not recovering the hash table memory doesn't seem to be an implementation defect. That said, I think this behavior is more surprising in a set or map than in a vector.

On Fri, Feb 5, 2021 at 1:30 PM Billy O'Neal (VC LIBS) <bi...@microsoft.com> wrote:
Ah, makes sense. (I just wanted to underscore that a load factor of 1 makes sense here since lots of folks will be comparing with Google DenseHash or SwissTables or similar and wondering why things are different)

Thanks!

From: K. Moon <km...@chromium.org>
Sent: Friday, February 5, 2021 01:05 PM
To: Billy O'Neal (VC LIBS) <bi...@microsoft.com>
Cc: Chris Hamilton <chr...@chromium.org>; Alexei Svitkine <asvi...@chromium.org>; Peter Kasting <pkas...@chromium.org>; sabba...@gmail.com <sabba...@gmail.com>; Chromium-dev <chromi...@chromium.org>; Brett Wilson <bre...@chromium.org>; Jeffrey Yasskin <jyas...@chromium.org>; Stephan T. Lavavej <s...@exchange.microsoft.com>

erik...@chromium.org

unread,
Feb 8, 2021, 5:48:24 PM2/8/21
to Chromium-dev, K. Moon, Chris Hamilton, Alexei Svitkine, Peter Kasting, Sergey Abbakumov, Chromium-dev, Brett Wilson, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
base::flat_map has a subtle behavioral difference: "Iterators are invalidated across mutations." This is not the behavior that most STL users expect so I would caution against using base::flat_map as the recommended-by-default container. It means the following line of code will result in UaF, but only when the insertion causes a container resize, which is rare: 
  container["new element"] = it.second;

Daniel Cheng

unread,
Feb 8, 2021, 5:56:38 PM2/8/21
to Erik Chen, Chromium-dev, K. Moon, Chris Hamilton, Alexei Svitkine, Peter Kasting, Sergey Abbakumov, Brett Wilson, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
Note that (barring unforeseen circumstances), it's very likely that absl's flat_hash_map will become the recommended default container at some point, which would have the same issues mentioned here. While I wouldn't recommend flat_map as a default container for a number of reasons (incremental mutations can lead to O(n^2) performance), I do think this is just a pattern that people need to be careful of writing, e.g.

  vector.push_back(*it);

Is similarly problematic.

Perhaps there's a bugprone warning for this kind of thing that can be enabled?

Daniel

K. Moon

unread,
Feb 8, 2021, 7:36:24 PM2/8/21
to Daniel Cheng, Erik Chen, Chromium-dev, Chris Hamilton, Alexei Svitkine, Peter Kasting, Sergey Abbakumov, Brett Wilson, Jeffrey Yasskin, Stephan T. Lavavej, Billy O'Neal (VC LIBS)
+1; as an STL user, I don't assume anything about iterator validity after mutation until I've read the documentation. 🙂

Reply all
Reply to author
Forward
0 new messages