PSA: ScopedPtrMap now available!

61 views
Skip to first unread message

Matt Giuca

unread,
Jun 25, 2015, 5:36:26 AM6/25/15
to chromium-dev, Dana Jansens
You know how ScopedVector lets you store a vector of raw pointers and automatically deletes them when they're removed from the vector or the vector is destroyed? But there's nothing quite like that for std::map?

Now there is!


And replace "std::map<X, Y*>" with "base::ScopedPtrMap<X, scoped_ptr<Y>>".

Just remember to remove any STLValueDeleters, or manual deletion of pointers (or you'll have a double-free)!

This beats manual pointer management, or the existing automated solutions (STLValueDeleter, STLDeleteValues, and friends). In fact, I just went through and updated (almost) every usage of STLValueDeleter with ScopedPtrMap, and it generally made things a lot nicer and harder to accidentally leak memory.

One day (when we get C++11 library support on Mac --- hopefully soon), we can drop this and just use std::map<X, scoped_ptr<Y>>, but in the mean time, this is the next best thing.

Some caveats:
  • The second type parameter must be a scoped_ptr<Y>. Why not just Y like ScopedVector? Because when we migrate over to std::map, the transition will be a lot smoother.
  • Keys that are pointers DO NOT get deleted automatically, only values.
  • There is no operator[]. Sorry, it wasn't possible to get this to work the way you would expect, so you should use .find() and .insert() instead (which are generally better anyway because they are explicit about failure and creation of default values).
  • Iterators are a bit weird, because they contain raw pointers, not scoped_ptrs. If you use "it.second->blah" then this is not a problem, but if you do anything else with it.second, the illusion of having a map of scoped_ptrs will be broken, and your code will have to be modified when we transition over to std::map. Also, there are no non-const iterators for this reason.
Otherwise, you should now be able to happily store scoped_ptrs in the ScopedPtrMap snug and safe in the knowledge that they'll be cleaned up when the map goes away, or when you erase() them.

If you're curious about what client code should look like, have a look at my CL that converts a bunch of existing clients over. Who knows, maybe your code has already been converted!

Cheers

Matt

(PS. Thanks Dana Jensens for many rounds of reviews and collaboration.)

Primiano Tucci

unread,
Jun 25, 2015, 6:23:30 AM6/25/15
to Matt Giuca, chromium-dev, Dana Jansens
I might be making a very stupid question but... what is the difference between the new base/containers/scoped_ptr_map.h and the existing base/containers/scoped_ptr_hash_map.h ?

In other words when am I expected to use  base::ScopedPtrHashMap<key_type, scoped_ptr<X>> vs base::ScopedPtrMap<key_type, scoped_ptr<X>> ?
The outer interface looks similar to me.


--
--
Chromium Developers mailing list: chromi...@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev

Primiano Tucci

unread,
Jun 25, 2015, 6:32:13 AM6/25/15
to Matt Giuca, chromium-dev, Dana Jansens
Ehm, I should get probably more coffee in the morning. hash != map, ordered != unordered ... facepalm

As a side note, some of the replacements in that CL that converted map -> ScopedPtrMap use strings as key. Assuming that lexicographic order doesn't matter, is ScopedPtrMap preferred to ScopedPtrHashMap ?

Matt Giuca

unread,
Jun 25, 2015, 6:48:29 AM6/25/15
to Primiano Tucci, chromium-dev, Dana Jansens
I don't think there is a specific preference for ScopedPtrMap or ScopedPtrHashMap (use your judgement same as std::map vs base::hash_map). I think std::map / ScopedPtrMap is generally the "default" just because it's been in C++ for approximately ever, whereas hash maps have not been in the standard library until C++11.

(Note: I just converted std::map to ScopedPtrMap, without consideration for whether it made sense to use an unordered map instead.)

Vincent Scheib

unread,
Jun 25, 2015, 4:28:02 PM6/25/15
to Matt Giuca, Primiano Tucci, chromium-dev, Dana Jansens
On a previous project we moved all maps to hash_maps due to performance.

Brett Wilson

unread,
Jun 25, 2015, 5:01:23 PM6/25/15
to Vincent Scheib, Matt Giuca, Primiano Tucci, chromium-dev, Dana Jansens
If you have a good reason, it's great to use a hash map. But we should default to regular maps without a good reason. Hashing functions are more commonly implemented on objects and much easier to screw up than operator<. And for some workloads they're actually slower and can use more memory.

Brett

Antoine Labour

unread,
Jun 25, 2015, 5:48:27 PM6/25/15
to Brett Wilson, Vincent Scheib, Matt Giuca, Primiano Tucci, chromium-dev, Dana Jansens
On Thu, Jun 25, 2015 at 2:00 PM, Brett Wilson <bre...@chromium.org> wrote:
If you have a good reason, it's great to use a hash map. But we should default to regular maps without a good reason. Hashing functions are more commonly implemented on objects and much easier to screw up than operator<. And for some workloads they're actually slower and can use more memory.

I disagree :) You should use hash maps by default, and only use maps if you actually need ordering of keys.
Hash maps use less memory almost always (the per-node overhead is 2 pointers instead of 3 pointers + a bool) and lookups are not only faster themselves, but also don't pollute your data cache because of the tree traversal required for the map. That's especially bad if the operator< is expensive (e.g. strings).

Antoine


Brett
To unsubscribe from this group and stop receiving emails from it, send an email to chromium-dev...@chromium.org.

Nico Weber

unread,
Jun 25, 2015, 6:23:07 PM6/25/15
to Antoine Labour, Brett Wilson, Vincent Scheib, Matt Giuca, Primiano Tucci, chromium-dev, Dana Jansens
On Thu, Jun 25, 2015 at 2:47 PM, 'Antoine Labour' via Chromium-dev <chromi...@chromium.org> wrote:


On Thu, Jun 25, 2015 at 2:00 PM, Brett Wilson <bre...@chromium.org> wrote:
If you have a good reason, it's great to use a hash map. But we should default to regular maps without a good reason. Hashing functions are more commonly implemented on objects and much easier to screw up than operator<. And for some workloads they're actually slower and can use more memory.

I disagree :) You should use hash maps by default, and only use maps if you actually need ordering of keys.
Hash maps use less memory almost always (the per-node overhead is 2 pointers instead of 3 pointers + a bool) and lookups are not only faster themselves, but also don't pollute your data cache because of the tree traversal required for the map. That's especially bad if the operator< is expensive (e.g. strings).

Meh, std::unsorted_map is pretty slow for a hash map, due to its generality. If perf matters, you probably want to roll your own, and if it doesn't matter it doesn't matter. 

Brett Wilson

unread,
Jun 25, 2015, 6:32:35 PM6/25/15
to Antoine Labour, Vincent Scheib, Matt Giuca, Primiano Tucci, chromium-dev, Dana Jansens
On Thu, Jun 25, 2015 at 2:47 PM, Antoine Labour <pi...@google.com> wrote:


On Thu, Jun 25, 2015 at 2:00 PM, Brett Wilson <bre...@chromium.org> wrote:
If you have a good reason, it's great to use a hash map. But we should default to regular maps without a good reason. Hashing functions are more commonly implemented on objects and much easier to screw up than operator<. And for some workloads they're actually slower and can use more memory.

I disagree :) You should use hash maps by default, and only use maps if you actually need ordering of keys.
Hash maps use less memory almost always (the per-node overhead is 2 pointers instead of 3 pointers + a bool) and lookups are not only faster themselves, but also don't pollute your data cache because of the tree traversal required for the map. That's especially bad if the operator< is expensive (e.g. strings).

If you've got a map of strings this is fine.

But I trust about three people on our team to write hash functions for random objects. The thought of everybody writing hash functions because they want to use something in a map terrifies me.

Brett

Antoine Labour

unread,
Jun 25, 2015, 6:43:31 PM6/25/15
to Brett Wilson, Vincent Scheib, Matt Giuca, Primiano Tucci, chromium-dev, Dana Jansens
If you have a id->something map (id being an int of some size or a pointer), the hash is trivial, and a hash map should work well. This is a very common pattern.

We have base::HashPair which is good enough to hash a pair of at-most-pointer-sized-things (including hashes), and by iteration should give something good for a value type (struct, etc.). If what you're hashing is not a value type, you're probably doing it wrong.

Antoine


Brett

Jeffrey Yasskin

unread,
Jun 25, 2015, 6:52:02 PM6/25/15
to Nico Weber, Antoine Labour, Brett Wilson, Vincent Scheib, Matt Giuca, Primiano Tucci, chromium-dev, Dana Jansens
On Thu, Jun 25, 2015 at 3:22 PM, Nico Weber <tha...@chromium.org> wrote:
> On Thu, Jun 25, 2015 at 2:47 PM, 'Antoine Labour' via Chromium-dev
> <chromi...@chromium.org> wrote:
>>
>>
>>
>> On Thu, Jun 25, 2015 at 2:00 PM, Brett Wilson <bre...@chromium.org> wrote:
>>>
>>> If you have a good reason, it's great to use a hash map. But we should
>>> default to regular maps without a good reason. Hashing functions are more
>>> commonly implemented on objects and much easier to screw up than operator<.
>>> And for some workloads they're actually slower and can use more memory.
>>
>>
>> I disagree :) You should use hash maps by default, and only use maps if
>> you actually need ordering of keys.
>> Hash maps use less memory almost always (the per-node overhead is 2
>> pointers instead of 3 pointers + a bool) and lookups are not only faster
>> themselves, but also don't pollute your data cache because of the tree
>> traversal required for the map. That's especially bad if the operator< is
>> expensive (e.g. strings).
>
>
> Meh, std::unsorted_map is pretty slow for a hash map, due to its generality.
> If perf matters, you probably want to roll your own, and if it doesn't
> matter it doesn't matter.

Clearly we should be using WTF::HashMap instead. ;)

Yuri Wiitala

unread,
Jun 25, 2015, 7:10:47 PM6/25/15
to jyas...@chromium.org, Nico Weber, Antoine Labour, Brett Wilson, Vincent Scheib, Matt Giuca, Primiano Tucci, chromium-dev, Dana Jansens
You can't generalize whether to use maps or hash_maps.  It depends on a number of factors.  OTOH, things that haven't already been mentioned on this thread:

1. How many entries?  Less than 1,000 and it may not matter much.  Less than 50 entries, and you might get better performance using a vector.

2. String keys: How long is a typical string key?  How random are the beginnings of the string keys?  Without considering this, operator<() with O(lg n) map look-ups may typically out-perform hash() with O(1+) hash_map look-ups.

3. Did you account for the caching behavior of the hardware platform?  Is the map frequently accessed in tight loops, or just one-off operations after lots of other code executes?

4. What is the expected rate of look-ups versus mutations?

5. If performance is critical, you must do benchmarking.  Carefully thinking through how each should perform for a particular use case is not enough, as actual run-time behavior will often prove you surprisingly wrong.


Antoine Labour

unread,
Jun 25, 2015, 7:28:18 PM6/25/15
to Yuri Wiitala, Jeffrey Yasskin, Nico Weber, Brett Wilson, Vincent Scheib, Matt Giuca, Primiano Tucci, chromium-dev, Dana Jansens
On Thu, Jun 25, 2015 at 4:10 PM, Yuri Wiitala <m...@chromium.org> wrote:
You can't generalize whether to use maps or hash_maps.  It depends on a number of factors.

Sure. But I would like us to come up with a good "do this by default unless you looked at the implications and something else is better in your case" advice.

In particular things such as cache footprint have very non-local effects. While it may have only limited (or at least proportional) impact on the local code, cache evictions impact a cost on the rest of the code (and beware of microbenchmarks that don't take this into account). Because cache is a shared resource (and quite limited at that) I would prefer if we advised towards algorithms/containers that are more cache-friendly by default.

Antoine
Reply all
Reply to author
Forward
0 new messages