Re: [std-discussion] unordered_map iterator validity and performance


Andrew Tomazos
Apr 10, 2019, 11:18:08 AM
to std-discussion
I don't think your analysis is correct. When there is a lookup miss (i.e. .find() returns .end()), each of the log N tables needs to be checked, which is an O(log N) operation. If we assume that a fixed proportion of lookups are misses, then lookup itself is O(log N), which violates the performance requirement.
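To make the arithmetic explicit, using the doubling table sizes described in the proposal below (roughly m, 2m, 4m, ...): a map holding N elements consists of about log2(N/m) tables, and a miss has to probe every one of them, so with a fixed miss fraction p the expected cost per lookup is at least

  p * log2(N/m) = Omega(log N),

not the constant average that the complexity requirements for find ask for.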

On Wed, Apr 10, 2019 at 1:26 AM <wolfgan...@gmail.com> wrote:
[attachment: proposed_unordered_map.png]
It is commonly accepted that a hashmap whose iterators stay valid even after other elements are inserted into or deleted from it needs to be implemented using pointers.
This is the reason, so I was told, why the unordered_map implementations in gcc and clang are built the way they are, and why they are slow. But I don't think that this is actually the case.
Consider the following:
Using a probing strategy that does not rely on moving elements after insertion or deletion, such as plain double hashing, one can construct a fixed-size hash table whose iterators stay valid while elements are inserted and deleted, as long as the table is not full. When it does get full, instead of rehashing that table, one constructs a new hash table of double the size for the new keys. The result is a data structure that actually achieves what the standard wants, but with open addressing.
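As an illustrative sketch of that layout (this is not my prototype; the class name, the power-of-two sizes and the 1/2 load factor are just assumptions for the example, and erase and iterators are omitted):

#include <cstddef>
#include <functional>
#include <memory>
#include <vector>

template <class K, class V>
class chained_open_table {                        // name made up for this sketch
    struct slot { bool used = false; K key{}; V value{}; };

    struct table {
        std::vector<slot> slots;
        std::size_t count = 0;
        explicit table(std::size_t n) : slots(n) {}

        // Plain double hashing over a power-of-two table: the step is made
        // odd so it is coprime to the table size and the probe sequence can
        // visit every slot.  Elements are never moved once placed.
        slot* probe(const K& key, bool for_insert) {
            const std::size_t n  = slots.size();
            const std::size_t h1 = std::hash<K>{}(key);
            const std::size_t h2 = (h1 >> 16) | 1;
            for (std::size_t i = 0; i < n; ++i) {
                slot& s = slots[(h1 + i * h2) % n];
                if (!s.used) return for_insert ? &s : nullptr;  // empty slot ends the probe
                if (s.key == key) return &s;
            }
            return nullptr;                                     // table exhausted
        }
    };

    std::vector<std::unique_ptr<table>> tables_;  // newest (largest) table first

public:
    chained_open_table() { tables_.push_back(std::make_unique<table>(8)); }

    // New keys go into the newest table; once it reaches a load factor of
    // 1/2, a fresh table of twice the size is added.  Older tables are never
    // rehashed, so addresses of stored elements stay stable.  (A key that
    // already lives in an older table is not deduplicated in this toy.)
    V* insert(const K& key, const V& value) {
        table* t = tables_.front().get();
        if (2 * t->count >= t->slots.size()) {
            tables_.insert(tables_.begin(),
                           std::make_unique<table>(2 * t->slots.size()));
            t = tables_.front().get();
        }
        slot* s = t->probe(key, /*for_insert=*/true);
        if (!s->used) { s->used = true; s->key = key; ++t->count; }
        s->value = value;
        return &s->value;
    }

    // Lookup probes the newest table first, then each older one in turn;
    // a miss falls through every table.
    V* find(const K& key) {
        for (const auto& t : tables_)
            if (slot* s = t->probe(key, /*for_insert=*/false)) return &s->value;
        return nullptr;
    }
};

For example, with chained_open_table<int, int> m, the pointer returned by m.insert(42, 1) stays valid across later insertions, because no table is ever rehashed; new elements simply go into newer tables.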
Now you might think that this could not guarantee O(1) access times on average, but finding the element in the first table checked (the newest and largest) has a probability of around 0.5, finding it in the next smaller table has a probability of around 0.25, and so on, which still yields O(1) on average (geometric series).
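To spell out the geometric series: if about half of the elements live in the newest table, a quarter in the next one, and so on, then the expected number of tables probed on a successful lookup is

  sum_{k >= 1} k * (1/2)^k = 1*(1/2) + 2*(1/4) + 3*(1/8) + ... = 2,

a constant, which is where the O(1) average for successful lookups comes from.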
You might also think that this implementation cannot possibly be really fast, because on average you are doubling the number of cache misses needed to retrieve an element. On the other hand, the additional pointers in std::unordered_map make it slow too. So I wrote a small prototype, and it runs quite well for random integer insertion, lookup and deletion, faster and more memory efficient than std::unordered_map. And all that with a consistent memory layout: pointers to elements that have not been deleted stay valid as long as the hashmap lives.

What do you think?

If you think this is a good idea, I'd be happy to put in the work to make this hash table implementation standard conforming; the version right now is just to showcase that it can be done.
