N3573: Why heterogenous extensions to unordered containers will not work

475 views
Skip to first unread message

jgot...@gmail.com

unread,
Apr 8, 2013, 8:36:46 PM4/8/13
to std-pr...@isocpp.org
N3573 suggests allowing lookup into unordered sets and maps by alternative keys, as long as operator== can compare both types and the hashes of the two types are compatible.  The use case given is

 std::unordered_map<std::string, int> map;
 map
.find("foo"); // We don't have to construct string("foo") to do the find.


While this would be useful, there is too much chance of two types being equality-comparable but having incompatible hashes.  For example, under gcc 8.0,
std::hash<long>(-1L) == 18446744073709551615
and
std::hash<double>(-1.0) == 11078049357879903929


Thus, if N3573 were accepted, this code  would fail:

std:unordered_set<double> s;
s
.insert(-1L); // Internally converts -1L to -1.0 and hashes that
assert(s.find(-1L) != s.end()); // fails. find calls hash<long>(-1L) and gets the wrong bucket


Note that under C++11 rules this code succeeds, because find also converts its parameter to double before hashing.

Joe Gottman

DeadMG

unread,
Apr 9, 2013, 6:58:15 AM4/9/13
to std-pr...@isocpp.org, jgot...@gmail.com
Actually, I don't believe that exact use case *is* in the paper. And, whilst it's nice, it would depend on std::hash<const char*> treating it's argument like a C string rather than a generic pointer, and I don't know if it does. If so, then that would certainly be a nice use case- and you could certainly provide a custom hasher for one lookup that could perform this using another of the features in N3573.

Whilst I admit that I overlooked implicit conversions in this case, it's not a breaking problem, I'll just have to provide a new name, like, find_by or some such (and the same for erasure). That also means I'd have to use SFINAE to limit operator[] too, since that cannot be given another name, or perhaps some kind of trait. It would have been nice to have something like

template<typename T, typename U> void find(std::unordered_set<T> t, U&& u) {
    t
.find(u);
}

function for any U, I doubt any such code exists, so I'm not that bothered by losing it.

Secondly, there are a couple of other things I wanted to mention about N3573. The proposal is based on my own experience of the limitations of these Standard containers for move-only types, but in fact, there are a few others I failed to address simply because I didn't come across them in my experience. Specifically, std::map and std::set will need a similar treatment, and secondly, I did not consider a moving erase. Currently, it is impossible to move an element out of a map/set/unordered_map/unordered_set, which is a bit painful IYAM- my previous use cases simply did not require it, so I didn't consider it. Obviously this would have to be move-out-and-erase in one step to prevent violating the invariants of the container.

sghe...@gmail.com

unread,
Apr 9, 2013, 7:38:38 AM4/9/13
to std-pr...@isocpp.org, jgot...@gmail.com
On Tuesday, April 9, 2013 12:58:15 PM UTC+2, DeadMG wrote:
... and secondly, I did not consider a moving erase. Currently, it is impossible to move an element out of a map/set/unordered_map/unordered_set, which is a bit painful IYAM- my previous use cases simply did not require it, so I didn't consider it. Obviously this would have to be move-out-and-erase in one step to prevent violating the invariants of the container.

Have you had look at proposal n3586 "Splicing Maps and Sets"? Seems to address the same usecase area:

As technically a splice method is not possible for maps or sets, the goal is to add a new function called remove, which returns a unique pointer holding the node, and add a new overload to insert, allowing to insert the node into the new container 
 

Seth 

DeadMG

unread,
Apr 9, 2013, 7:40:25 AM4/9/13
to std-pr...@isocpp.org, jgot...@gmail.com, sghe...@gmail.com
That refers to moving a node out of one set and into another, whereas I am referring to moving the *value* out of a set and then just... it's a value.

Róbert Dávid

unread,
Apr 9, 2013, 9:11:14 AM4/9/13
to std-pr...@isocpp.org, jgot...@gmail.com


2013. április 9., kedd 2:36:46 UTC+2 időpontban jgot...@gmail.com a következőt írta:
While this would be useful, there is too much chance of two types being equality-comparable but having incompatible hashes.  For example, under gcc 8.0,
std::hash<long>(-1L) == 18446744073709551615
and
std::hash<double>(-1.0) == 11078049357879903929



If a == b, then I would expect that hash(a) == hash(b). If a language does not exhibit this, then there is something wrong - from a user's perspective hashing is useless if it produces different results for equal values... Implementer's perspective is a different question..

Regards, Robert

PS: What is gcc 8.0?

DeadMG

unread,
Apr 9, 2013, 9:21:39 AM4/9/13
to std-pr...@isocpp.org, jgot...@gmail.com
He probably means 4.8.0. The Standard doesn't currently require any such thing for cross-type hashes in the general case, only a few specific instances.

Olaf van der Spek

unread,
Apr 9, 2013, 2:56:53 PM4/9/13
to std-pr...@isocpp.org, jgot...@gmail.com


Op dinsdag 9 april 2013 15:11:14 UTC+2 schreef Róbert Dávid het volgende:
If a == b, then I would expect that hash(a) == hash(b). If a language does not exhibit this, then there is something wrong - from a user's perspective hashing is useless if it produces different results for equal values...


How'd you implement that for float vs int?

jgot...@gmail.com

unread,
Apr 9, 2013, 10:22:47 PM4/9/13
to std-pr...@isocpp.org, jgot...@gmail.com

On Tuesday, April 9, 2013 6:58:15 AM UTC-4, DeadMG wrote:
Actually, I don't believe that exact use case *is* in the paper. And, whilst it's nice, it would depend on std::hash<const char*> treating it's argument like a C string rather than a generic pointer, and I don't know if it does. If so, then that would certainly be a nice use case- and you could certainly provide a custom hasher for one lookup that could perform this using another of the features in N3573.

Whilst I admit that I overlooked implicit conversions in this case, it's not a breaking problem, I'll just have to provide a new name, like, find_by or some such (and the same for erasure). That also means I'd have to use SFINAE to limit operator[] too, since that cannot be given another name, or perhaps some kind of trait. It would have been nice to have something like

template<typename T, typename U> void find(std::unordered_set<T> t, U&& u) {
    t
.find(u);
}

function for any U, I doubt any such code exists, so I'm not that bothered by losing it.

 
Actually, this is possible.  You just need to overload operator() in your hash object.  Thus, for hash<unique_ptr<X>>, as well as defining operator()(const unique_ptr<X> &), you would also define operator()(X *).  Then, your templated find method would always call hash<unique_ptr<X>>()(u), but which overload it chooses would depend on the type of u.

Joe Gottman

Jim Porter

unread,
Apr 12, 2014, 2:29:23 AM4/12/14
to std-pr...@isocpp.org
Sorry for resurrecting a dead thread, but I just came across this issue doing something similar to the motivation in N3573.


On Monday, April 8, 2013 7:36:46 PM UTC-5, jgot...@gmail.com wrote:
N3573 suggests allowing lookup into unordered sets and maps by alternative keys, as long as operator== can compare both types and the hashes of the two types are compatible.  The use case given is

 std::unordered_map<std::string, int> map;
 map
.find("foo"); // We don't have to construct string("foo") to do the find.

[snip]

Thus, if N3573 were accepted, this code  would fail:

std:unordered_set<double> s;
s
.insert(-1L); // Internally converts -1L to -1.0 and hashes that
assert(s.find(-1L) != s.end()); // fails. find calls hash<long>(-1L) and gets the wrong bucket


Would it be sufficient for std::unordered_set to default to using std::hash<Key> instead of std::hash<void>? This would force you to explicitly enable this behavior, preventing bad things from happening by default (essentially like std::map and std::less<void>). I think you'd still get some degree of assurance that the key type is compatible with the type you're using in find() thanks to std::equal_to (again, much like std::less).

In any case, evil implementations of operator< could cause the same problem in std::map.

- Jim

DeadMG

unread,
Apr 16, 2014, 1:34:29 PM4/16/14
to std-pr...@isocpp.org
No, no it would not. Consider the original motivation in the paper- unique_ptr<T> and T*. Since there's already a hash specialization for unique_ptr, you could not add one for T* later. Obviously for a Standard type this is not a big deal but you cannot go back and enable it for, say, boost::shared_ptr after the fact. This would only work if the author of some given T knows all the possible types U which might be meaningful here, and that's just not a very good system.

Better would be to use a separate trait to control which types are and are not eligible.

Jim Porter

unread,
Apr 17, 2014, 6:19:18 PM4/17/14
to std-pr...@isocpp.org
On Wednesday, April 16, 2014 12:34:29 PM UTC-5, DeadMG wrote:
No, no it would not. Consider the original motivation in the paper- unique_ptr<T> and T*. Since there's already a hash specialization for unique_ptr, you could not add one for T* later.

Ah, yeah, that would be a problem. (I assume the issue here is divergent hash specializations

Better would be to use a separate trait to control which types are and are not eligible.

If N3573 dropped the hash<void> proposal, you could probably handle this by requiring people to write their own hashers that only accept the eligible types.

Maybe I should explain the use case I care about at the moment: I have a map whose key_type is a tuple of value types, but I want to be able to perform lookups on the map with a tuple of reference types. Implementing my own hash function that just combines the hashes of the tuple's members would be totally acceptable for me.

My original idea before seeing N3573 was to add a member function to unordered_map that retrieves the bucket associated with a particular hash value. Then I could just perform the hash myself and do the lookup of the value on my own. That's obviously not the ideal way to do this, although I do think that would be a (somewhat) useful function to have as a complement to unordered_map::bucket(const key_type&).

- Jim

Howard Hinnant

unread,
Apr 18, 2014, 10:52:41 AM4/18/14
to std-pr...@isocpp.org
On Apr 17, 2014, at 6:19 PM, Jim Porter <jvp...@g.rit.edu> wrote:

> Maybe I should explain the use case I care about at the moment: I have a map whose key_type is a tuple of value types, but I want to be able to perform lookups on the map with a tuple of reference types. Implementing my own hash function that just combines the hashes of the tuple's members would be totally acceptable for me.

With this proposal:

http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html

which has not yet been submitted, one can do this:

char a = 'a';
int i = 3;
double d = 2.5;
auto t1 = std::make_tuple(a, i, d);
auto t2 = std::tie (a, i, d);
assert(uhash<>{}(t1) == uhash<>{}(t2));

One can also easily change the assert to:

assert(uhash<fnv1a>{}(t1) == uhash<fnv1a>{}(t2));

or

assert(uhash<spooky>{}(t1) == uhash<spooky>{}(t2));

or

assert(uhash<siphash>{}(t1) == uhash<siphash>{}(t2));

Howard

Jim Porter

unread,
Apr 19, 2014, 11:46:48 AM4/19/14
to std-pr...@isocpp.org
On Friday, April 18, 2014 9:52:41 AM UTC-5, Howard Hinnant wrote:
On Apr 17, 2014, at 6:19 PM, Jim Porter <jvp...@g.rit.edu> wrote:

> Maybe I should explain the use case I care about at the moment: I have a map whose key_type is a tuple of value types, but I want to be able to perform lookups on the map with a tuple of reference types. Implementing my own hash function that just combines the hashes of the tuple's members would be totally acceptable for me.

With this proposal:

http://htmlpreview.github.io/?https://github.com/HowardHinnant/papers/blob/master/hashing.html

I suppose this would still have the problem mentioned in the original post, namely that uhash<>{}(1.0) != uhash<>{}(1). However, I think this would resolve DeadMG's issue by providing a specification for what data is hashed for each type. Thus, you wouldn't have divergent specializations of hash functions (or at least, it would be no more likely than writing an "evil" less-than operator).

There'd still need to be some way of explicitly enabling the heterogeneous extensions to unordered_map, though.

- Jim
Reply all
Reply to author
Forward
0 new messages