Lookup in unordered_map and unordered_set by hash

1,211 views
Skip to first unread message

DeadMG

unread,
Nov 8, 2012, 4:20:34 PM11/8/12
to std-pr...@isocpp.org
There's a fairly major deficiency in the interface of the Standard unordered containers. Firstly, you can't look up by types other than the key type. Instead, lookup should be viable for any type where hash(lookup) == hash(stored key), if lookup == stored key. For example, unique_ptr<T> or shared_ptr<T> should be able to be looked up as T*. Secondly, you can't cache the output of the hash function, because it's called internally. Thus, it should provide the ability to provide a different hash function for a specific argument if necessary.

Thus, I propose two new functions to be added to unordered_map, and one to unordered_set (and some const varieties)

template<typename T, typename Hash = std::hash<T>> iterator find(T t, Hash h = Hash());
template<typename T, typename Hash = std::hash<T>> const_iterator find(T t, Hash h = Hash()) const;

Finds the position which corresponds to the argument t. Requires that h(t) == container_hash_argument(key) if t == key for any key in the unordered_map or unordered_set.

template<typename T> value& operator[](T t);
template<typename T> const value& operator[](T t) const;

Returns a [const] reference to the value at the key t. Requires that std::hash<T>()(t) == container_hash_argument(key) if t == key, for any key. If no key is found, an exception must be thrown. Some similar erase() overloads may also be warranted.

This would allow some additional use cases which are currently (needlessly) not permitted as Standard, both of which I have run into in real application code. For example,

std::unordered_set<std::unique_ptr<T>> x;
// fill x
int* ptr = ...;
if (x.find(ptr) == x.end()) {
}

class X {
    std::unordered_map<std::string, int>* map;
    std::string value = "some value";
    bool dirty;
    std::size_t hash;
public:
    int& lookup() {
        // Currently- forcing a needless hash if !dirty
        auto it = map->find(value, [this](std::string&) {
            if (!dirty) return hash;
            else return hash = std::hash<std::string>()(value);
         });
         return it->second;
     }    
};

Whilst looking up into a hash container is O(k) in the length of the key, although often referred to as O(1) as it does not depend on the size of the container, this cache scheme can bring it to true O(1), potentially saving many cycles in applications that depend heavily in hash table performance.

Olaf van der Spek

unread,
Nov 9, 2012, 7:32:31 AM11/9/12
to std-pr...@isocpp.org
On Thursday, November 8, 2012 10:20:34 PM UTC+1, DeadMG wrote:
There's a fairly major deficiency in the interface of the Standard unordered containers. Firstly, you can't look up by types other than the key type. Instead, lookup should be viable for any type where hash(lookup) == hash(stored key), if lookup == stored key. For example, unique_ptr<T> or shared_ptr<T> should be able to be looked up as T*. Secondly, you can't cache the output of the hash function, because it's called internally. Thus, it should provide the ability to provide a different hash function for a specific argument if necessary.

Thus, I propose two new functions to be added to unordered_map, and one to unordered_set (and some const varieties)

template<typename T, typename Hash = std::hash<T>> iterator find(T t, Hash h = Hash());
template<typename T, typename Hash = std::hash<T>> const_iterator find(T t, Hash h = Hash()) const;

Does this work with the default std::equal_to<Key>?
Even if hashes match, the actual key has to be compared. And (unfortunately) std::equal_to requires both of it's arguments to be of the same type.


rhalb...@gmail.com

unread,
Nov 9, 2012, 10:29:53 AM11/9/12
to std-pr...@isocpp.org
On Thursday, November 8, 2012 10:20:34 PM UTC+1, DeadMG wrote:
Secondly, you can't cache the output of the hash function, because it's called internally. 

IIRC, the original SGI implementation of the unordered_xxx family, is implemented in terms of a hash_table container ( http://www.sgi.com/tech/stl/stl_hashtable.h ) that has an extra Extract template parameter. The unordered_map simply contains a hash_table<Key, T, GetFirst, Hash, KeyEqual, Allocator> object, where GetFirst extracts the first field from the std::pair<Key, T>. But Extract could of course be a user-defined function object to extract a possibly cached hash from the Key. Exposing the Extract paramet to the general unorderd_xxx family's interface would solve caching of hashes.

rhalb...@gmail.com

unread,
Nov 9, 2012, 10:31:56 AM11/9/12
to std-pr...@isocpp.org, rhalb...@gmail.com
Oh, and Boost.MultiIndex also uses key extraction template parameters. 

DeadMG

unread,
Nov 9, 2012, 2:11:02 PM11/9/12
to std-pr...@isocpp.org, rhalb...@gmail.com
No, that completely wouldn't work. The caching needs to be per object/lookup, not embedded in the key type or the hash table (presumably that already caches hashes for stored keys).

As for std::equal_to<T>, you're right that it's problematic. I thought that Stephan's proposal to make them polymorphic was accepted, in which case it would work. But if it doesn't work, then we will have to specify operator== directly rather than std::equal_to, then.

Olaf van der Spek

unread,
Nov 10, 2012, 7:45:16 AM11/10/12
to std-pr...@isocpp.org, rhalb...@gmail.com
Op vrijdag 9 november 2012 20:11:02 UTC+1 schreef DeadMG het volgende:

As for std::equal_to<T>, you're right that it's problematic. I thought that Stephan's proposal to make them polymorphic was accepted, in which case it would work.

Didn't know about that proposal, in that case I assume it's a non-issue.
 
But if it doesn't work, then we will have to specify operator== directly rather than std::equal_to, then.

As in removing the template parameter KeyEqual?

BTW, if you use maps (a lot), a function like my find_ptr might be handy:

template <class T, class U>
const typename T::value_type::second_type* find_ptr(const T& c, U v)
{
        typename T::const_iterator i = c.find(v);
        return i == c.end() ? NULL : &i->second;
} It simplifies the majority of use cases of find. http://code.google.com/p/xbt/source/browse/trunk/xbt/misc/xbt/find_ptr.h

DeadMG

unread,
Nov 13, 2012, 12:03:58 PM11/13/12
to std-pr...@isocpp.org, rhalb...@gmail.com
No- the equality parameter would still be used for all K-K comparisons, just not K-T comparisons. I'll add an equality comparator functor to the argument list, I think.

DeadMG

unread,
Nov 13, 2012, 6:38:30 PM11/13/12
to std-pr...@isocpp.org
A quick sample proposal is up at http://codepuppy.co.uk/HashProposal.aspx. Unstyled version attached.


hashprop.html
Reply all
Reply to author
Forward
0 new messages