Allowing the use of Abseil hash maps

1,276 views
Skip to first unread message

John Admanski

unread,
Aug 3, 2023, 5:35:50 PM8/3/23
to cxx
So, this topic has come up a few times before, but some of those discussion threads are quite old now or had a much broader focus so I thought I'd start a new one: what would it take for us to allow the use of the Abseil hash-based containers (flat_hash_map and company)?

Prior discussion suggests that a lot of people do generally agree that the flat_hash structures are superior to the stdlib unordered containers. So what would be the blockers to allowing their use?

 - Extensive chromium-specific benchmarks demonstrating their impact/benefit?
 - Commitment to a large scale change to convert all existing uses of unordered containers to them?
 - A rewrite of base/containers/README.md to recommend them?
 - Something else I'm missing or not aware of at all?

-- John

Lei Zhang

unread,
Aug 3, 2023, 6:58:28 PM8/3/23
to John Admanski, cxx
Have you seen https://crbug.com/1121345 ? There's a field trial from
https://crrev.com/1133851 that is in progress. It's been a few months,
so maybe it is time to look at the results and see how that study
went.
> --
> You received this message because you are subscribed to the Google Groups "cxx" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to cxx+uns...@chromium.org.
> To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/cxx/e75f0a35-6153-48f2-acfe-1ebfa4cccc88n%40chromium.org.

John Admanski

unread,
Aug 7, 2023, 11:52:35 AM8/7/23
to cxx, Lei Zhang, cxx, John Admanski
I did actually see that bug from one of the earlier threads from 2020, but at a glance I thought it hadn't seen any significant updates after the initial flurry in 2020. Now I see that it got some attention this year, it would indeed be interesting to see the results of that study.

John Admanski

unread,
Aug 30, 2023, 5:51:29 PM8/30/23
to cxx, John Admanski, Lei Zhang, cxx
Who would be able to actually do the analysis of the results of the field trial? I tried taking a look myself but TBH I don't really know what to look at to see the impact of this kind of study on Chrome.

Daniel Cheng

unread,
Sep 1, 2023, 7:36:28 PM9/1/23
to John Admanski, cxx, Lei Zhang
The numbers generally seem to indicate statistically-insignificant movements in various metrics. absl's hash maps are generally competitive with libc++'s unordered_map
That being said, there is currently a known inefficiency where iterating a flat_hash_map is actually slower than iterating an unordered_map; I'll be sending out a CL to fix that in abseil. IMO, we may as well wait for that to roll into Chrome. Once that's done, we will allow usage.

I would also like to update the container guidance to default to flat_hash_map unless there are reasons it wouldn't work well (e.g. ordering is required), similar to the internal guidance. I really dislike that our default container recommendation has unfortunate performance cliffs. However, updating the default recommendations will require gathering a bit more numbers about the expected impact on memory.

Finally, this does not apply to absl's btree maps. While I would prefer to see us use them in Chrome where possible, the absl btree containers are much more heavyweight in terms of binary size than their STL equivalents, so those will remain disallowed for the time being.

Daniel

John Admanski

unread,
Apr 24, 2024, 2:15:00 PMApr 24
to Daniel Cheng, cxx, Lei Zhang
Just wondering if there was ever any update or movement on this. I get that it's obviously not a particularly high priority for anyone (not even myself, clearly) but I can't quite resist pulling at the threads which have not been tied off.

Daniel Cheng

unread,
Apr 29, 2024, 3:26:06 PMApr 29
to John Admanski, cxx, Lei Zhang
Fair point. I'll try to finish cleaning up the docs around this to allow using it (but won't update the default recommendations for maps yet).

... to start the discussion around default recommendations, my belief is that we should default to using absl::flat_hash_map everywhere.

Pros:
- decent default behaviour in most cases.
  - an empty collection is about the same size as a std::unordered_map.
  - insert/removal are O(1) instead of O(n) like base::flat_map (which is the current default recommendation; this has caused perf issues in a number of places)
  - special cases for small object sizes (1, 2, 3, 5, 7)
  - at small sizes and "reasonably" sized objects(i.e. std::string -> std::string), competitive or better than the STL containers
- has built-in protections against dereferencing end (which has been responsible for some security bugs). That being said, I think libc++ is slowly gaining these protections as well.
- codegen is comparable to STL types

Cons:
- base::flat_map is strictly better from a size perspective
  - an empty flat_map is 3 words; an empty flat_hash_map is 5 words.
  - the max load factor for flat_hash_map is 87.5%, so it's utilization will always be somewhat worse than base::flat_map
- Large key/value types are more of a drag on absl::flat_hash_map. That being said, base::flat_map suffers from this problem too—and the solution to this is to use absl::node_hash_map or wrap the value in std::unique_ptr.

Curious what people think about these points (or who would need to sign-off/approve these sorts of changes).

There is another problem too, which is that Mojo has settled on base::flat_map as its default container. While it's possible to use typemapping to use an alternate map type instead, this is fairly unwieldy.
In fact, Mojo maps have another problem: the Mojo `map` type is sorted in regular variants... and unsorted in the Blink variants. This isn't great, and I kind of feel like we should probably migrate explicitly to sorted_map (which would be uniformly sorted in both variants) and unsorted_map (which would be uniformly unsorted in both variants)...

Daniel

danakj

unread,
Apr 29, 2024, 3:37:55 PMApr 29
to Daniel Cheng, John Admanski, cxx, Lei Zhang
On Mon, Apr 29, 2024 at 3:26 PM Daniel Cheng <dch...@chromium.org> wrote:
Fair point. I'll try to finish cleaning up the docs around this to allow using it (but won't update the default recommendations for maps yet).

... to start the discussion around default recommendations, my belief is that we should default to using absl::flat_hash_map everywhere.

Pros:
- decent default behaviour in most cases.
  - an empty collection is about the same size as a std::unordered_map.
  - insert/removal are O(1) instead of O(n) like base::flat_map (which is the current default recommendation; this has caused perf issues in a number of places)
  - special cases for small object sizes (1, 2, 3, 5, 7)
  - at small sizes and "reasonably" sized objects(i.e. std::string -> std::string), competitive or better than the STL containers
- has built-in protections against dereferencing end (which has been responsible for some security bugs). That being said, I think libc++ is slowly gaining these protections as well.
- codegen is comparable to STL types

Cons:
- base::flat_map is strictly better from a size perspective
  - an empty flat_map is 3 words; an empty flat_hash_map is 5 words.
  - the max load factor for flat_hash_map is 87.5%, so it's utilization will always be somewhat worse than base::flat_map
- Large key/value types are more of a drag on absl::flat_hash_map. That being said, base::flat_map suffers from this problem too—and the solution to this is to use absl::node_hash_map or wrap the value in std::unique_ptr.

Curious what people think about these points (or who would need to sign-off/approve these sorts of changes).

I think your conclusions sound very reasonable.

It's hard to tell if it is going off the rails by changing default guidance though. Without actually rewriting everything in prod, I wonder if it's possible to do some hacky rewrite of all our base::flat* to absl types and look at the APK size and run speedometer locally (without PGO) just to make sure it's not super regrettable at scale?

 

Daniel Cheng

unread,
Apr 29, 2024, 4:09:22 PMApr 29
to danakj, John Admanski, cxx, Lei Zhang
On Mon, 29 Apr 2024 at 12:37, danakj <dan...@chromium.org> wrote:
On Mon, Apr 29, 2024 at 3:26 PM Daniel Cheng <dch...@chromium.org> wrote:
Fair point. I'll try to finish cleaning up the docs around this to allow using it (but won't update the default recommendations for maps yet).

... to start the discussion around default recommendations, my belief is that we should default to using absl::flat_hash_map everywhere.

Pros:
- decent default behaviour in most cases.
  - an empty collection is about the same size as a std::unordered_map.
  - insert/removal are O(1) instead of O(n) like base::flat_map (which is the current default recommendation; this has caused perf issues in a number of places)
  - special cases for small object sizes (1, 2, 3, 5, 7)
  - at small sizes and "reasonably" sized objects(i.e. std::string -> std::string), competitive or better than the STL containers
- has built-in protections against dereferencing end (which has been responsible for some security bugs). That being said, I think libc++ is slowly gaining these protections as well.
- codegen is comparable to STL types

Cons:
- base::flat_map is strictly better from a size perspective
  - an empty flat_map is 3 words; an empty flat_hash_map is 5 words.
  - the max load factor for flat_hash_map is 87.5%, so it's utilization will always be somewhat worse than base::flat_map
- Large key/value types are more of a drag on absl::flat_hash_map. That being said, base::flat_map suffers from this problem too—and the solution to this is to use absl::node_hash_map or wrap the value in std::unique_ptr.

Curious what people think about these points (or who would need to sign-off/approve these sorts of changes).

I think your conclusions sound very reasonable.

It's hard to tell if it is going off the rails by changing default guidance though. Without actually rewriting everything in prod, I wonder if it's possible to do some hacky rewrite of all our base::flat* to absl types and look at the APK size and run speedometer locally (without PGO) just to make sure it's not super regrettable at scale?

I already did this experiment with std::unordered_map -> absl::flat_hash_map. IIRC binary size didn't increase. I did not attempt any speedometer benchmarks.

flat_map -> flat_hash_map is harder because someone would have to implement a lot of hashing functions.

Also, I forgot: the other complexity with absl::flat_hash_map is we should probably use absl's customization point for hashing rather than std::hash. But that's something we could incrementally migrate.

Daniel

John Admanski

unread,
Apr 29, 2024, 6:10:24 PMApr 29
to Daniel Cheng, danakj, cxx, Lei Zhang
If the choice for a default map type is flat_hash_map vs std::map, it would seem to me that the flat hash maps are just generally better? Maybe I'm misreading the current container guidance, but I don't think the current guidance is really pushing flat_map as the default. I'd say it does fairly aggressively push it as a good choice if you know the map size is small, but that doesn't seem like something that would need to change.

To me, I think the minimal change to the recommendations would be to replace std::map (and std::set) as the default containers, and relegate them to the relatively rare cases where you absolutely need the container to be ordered. But don't mess with the recommendations to use base::flat_map when the max size of the container is known to be small.

Jeremy Roman

unread,
May 1, 2024, 3:19:34 PMMay 1
to John Admanski, Daniel Cheng, danakj, cxx, Lei Zhang
Also because some code may rely on the ordering guarantee that flat_map provides.
 

Peter Kasting

unread,
Jun 10, 2024, 7:03:34 PMJun 10
to Daniel Cheng, John Admanski, cxx, Lei Zhang
On Mon, Apr 29, 2024 at 12:26 PM Daniel Cheng <dch...@chromium.org> wrote:
Fair point. I'll try to finish cleaning up the docs around this to allow using it (but won't update the default recommendations for maps yet).

I haven't seen a cxx@ thread proposing allowing these yet. Are we ready to have one, separate from the discussion on defaults?

... to start the discussion around default recommendations, my belief is that we should default to using absl::flat_hash_map everywhere.

Pros:
- decent default behaviour in most cases.
  - an empty collection is about the same size as a std::unordered_map.
  - insert/removal are O(1) instead of O(n) like base::flat_map (which is the current default recommendation; this has caused perf issues in a number of places)
  - special cases for small object sizes (1, 2, 3, 5, 7)
  - at small sizes and "reasonably" sized objects(i.e. std::string -> std::string), competitive or better than the STL containers
- has built-in protections against dereferencing end (which has been responsible for some security bugs). That being said, I think libc++ is slowly gaining these protections as well.
- codegen is comparable to STL types

Cons:
- base::flat_map is strictly better from a size perspective
  - an empty flat_map is 3 words; an empty flat_hash_map is 5 words.
  - the max load factor for flat_hash_map is 87.5%, so it's utilization will always be somewhat worse than base::flat_map
- Large key/value types are more of a drag on absl::flat_hash_map. That being said, base::flat_map suffers from this problem too—and the solution to this is to use absl::node_hash_map or wrap the value in std::unique_ptr.

Curious what people think about these points (or who would need to sign-off/approve these sorts of changes).

Directionally, this seems OK.

Would MakeFixedFlatMap still return a base::flat_map (since insert/remove perf don't matter), or an absl::flat_hash_map (since I assume lookup would then be O(1) instead of O(lg n))?

I don't think any particular approval is needed, in part because there really isn't a "current default recommendation". https://chromium.googlesource.com/chromium/src/+/main/base/containers/README.md#Map-and-set-selection-Usage-advice is the closest we have, but that doesn't carry any particular weight, and I'd be surprised if most folks on the team have heard of it.

To me, the biggest win would be if we could decide to remove some container types because others are "good enough". No matter how much guidance we give and how accurate it is, just having to know it (or know where to find it) to answer Yet Another Design Question is a papercut for engineers. Can we kill base::small_map, for example? Could we kill base::flat_map with a combo of absl::flat_hash_map and other absl:: and std:: types?

There is another problem too, which is that Mojo has settled on base::flat_map as its default container. While it's possible to use typemapping to use an alternate map type instead, this is fairly unwieldy.

Should we introduce a type alias (currently assumed to be flat_map) that represents "the map type mojo uses for this type" (or at least "the map type mojo uses by default")? Then we could convert direct flat_map usage that corresponds to that to use that alias, which would make changing Mojo's default (either sooner or later) feasible.

In fact, Mojo maps have another problem: the Mojo `map` type is sorted in regular variants... and unsorted in the Blink variants. This isn't great, and I kind of feel like we should probably migrate explicitly to sorted_map (which would be uniformly sorted in both variants) and unsorted_map (which would be uniformly unsorted in both variants)...

That sounds like something worth filing a bug on.

PK
Reply all
Reply to author
Forward
0 new messages