Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Hashing for unordered containers?

73 views
Skip to first unread message

Jorgen Grahn

unread,
Oct 11, 2020, 4:37:23 AM10/11/20
to
It strikes me I still don't know the best way to do this.

When I write small classes (e.g. a class Photo for the file name of a
photo) I tend to add an operator< because:

- It's easy.
- It's natural; it makes sense to show there's an ordering.
- I can then use Photo as a map or set key whenever I want.

However, when I want to use Photo as a key in unordered_map or
unordered_set (which is at least as reasonable as using it in a map or
set) what I end up doing is:

- Add a size_t Photo::hash() const member, implemented in terms
of funny things like std::hash<std::string> {} ().
- Specialize the std::hash class for Photo, and arrange for its
operator() to call Photo::hash().

That's a lot of boilerplate, compared to operator<. Is there a better
way?

I know I can do things to the std::unordered_set itself, but would
prefer not to (and I doubt there would be less boilerplate, even for
a single instantiation).

My context is C++11 or C++14; a C++17 or later solution would be
interesting, but of no immediate use to me.

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .

Bonita Montero

unread,
Oct 11, 2020, 7:20:32 AM10/11/20
to
> However, when I want to use Photo as a key in unordered_map or
> unordered_set (which is at least as reasonable as using it in a map or
> set) what I end up doing is:
> - Add a size_t Photo::hash() const member, implemented in terms
> of funny things like std::hash<std::string> {} ().
> - Specialize the std::hash class for Photo, and arrange for its
> operator() to call Photo::hash().
> That's a lot of boilerplate, compared to operator<. Is there a better
> way?

1. A unordered map is usually faster.
2. You can define the hashing-class as a lambda and use
decltype(hc) as the teplate-parameter.

Öö Tiib

unread,
Oct 11, 2020, 8:55:01 AM10/11/20
to
On Sunday, 11 October 2020 11:37:23 UTC+3, Jorgen Grahn wrote:
> It strikes me I still don't know the best way to do this.
>
> When I write small classes (e.g. a class Photo for the file name of a
> photo) I tend to add an operator< because:
>
> - It's easy.
> - It's natural; it makes sense to show there's an ordering.
> - I can then use Photo as a map or set key whenever I want.
>
> However, when I want to use Photo as a key in unordered_map or
> unordered_set (which is at least as reasonable as using it in a map or
> set) what I end up doing is:
>
> - Add a size_t Photo::hash() const member, implemented in terms
> of funny things like {} ().
> - Specialize the std::hash class for Photo, and arrange for its
> operator() to call Photo::hash().
>
> That's a lot of boilerplate, compared to operator<. Is there a better
> way?

If you always hash photos in same way then it is perhaps fine like that:

namespace std {
template <>
struct hash<Photo> {
size_t operator()(const Photo& p) const noexcept {
return std::hash<std::string>{}(p.name());
}
};
}

8 lines is not lot. The std::equal_to uses Photo::operator== already
so (unless your case needs some potentially confusing differences)
that isn't needed to specialize.

> I know I can do things to the std::unordered_set itself, but would
> prefer not to (and I doubt there would be less boilerplate, even for
> a single instantiation).

The plain std::unordered_set<Photo> is fine when std::hash<Photo>
and std::equal_to<Photo> compile.

> My context is C++11 or C++14; a C++17 or later solution would be
> interesting, but of no immediate use to me.

What I wrote was AFAIK so since C++11 ... C++17 did not touch it.
C++17 added std::pmr::unordered_set but that is of no concern in
current thread.

Brian Wood

unread,
Oct 11, 2020, 10:38:43 AM10/11/20
to
On Sunday, October 11, 2020 at 7:55:01 AM UTC-5, Öö Tiib wrote:
> On Sunday, 11 October 2020 11:37:23 UTC+3, Jorgen Grahn wrote:
> > It strikes me I still don't know the best way to do this.
> >
> > When I write small classes (e.g. a class Photo for the file name of a
> > photo) I tend to add an operator< because:
> >
> > - It's easy.
> > - It's natural; it makes sense to show there's an ordering.
> > - I can then use Photo as a map or set key whenever I want.
> >
> > However, when I want to use Photo as a key in unordered_map or
> > unordered_set (which is at least as reasonable as using it in a map or
> > set) what I end up doing is:
> >
> > - Add a size_t Photo::hash() const member, implemented in terms
> > of funny things like {} ().
> > - Specialize the std::hash class for Photo, and arrange for its
> > operator() to call Photo::hash().
> >
> > That's a lot of boilerplate, compared to operator<. Is there a better
> > way?
> If you always hash photos in same way then it is perhaps fine like that:
>
> namespace std {
> template <>
> struct hash<Photo> {
> size_t operator()(const Photo& p) const noexcept {
> return std::hash<std::string>{}(p.name());
> }
> };
> }
>

A little indentation is helpful in my opinion:

namespace std {
template <>
struct hash<Photo> {
size_t operator()(Photo const& p) const noexcept {
return ::std::hash<::std::string>{}(p.name());
}
};
}

Not sure if I'd use noexcept there.


Brian
Ebenezer Enterprises
https://github.com/Ebenezer-group/onwards

Jorgen Grahn

unread,
Oct 11, 2020, 10:51:34 AM10/11/20
to
On Sun, 2020-10-11, Öö Tiib wrote:
> On Sunday, 11 October 2020 11:37:23 UTC+3, Jorgen Grahn wrote:
>> It strikes me I still don't know the best way to do this.
>>
>> When I write small classes (e.g. a class Photo for the file name of a
>> photo) I tend to add an operator< because:
>>
>> - It's easy.
>> - It's natural; it makes sense to show there's an ordering.
>> - I can then use Photo as a map or set key whenever I want.
>>
>> However, when I want to use Photo as a key in unordered_map or
>> unordered_set (which is at least as reasonable as using it in a map or
>> set) what I end up doing is:
>>
>> - Add a size_t Photo::hash() const member, implemented in terms
>> of funny things like {} ().
>> - Specialize the std::hash class for Photo, and arrange for its
>> operator() to call Photo::hash().
>>
>> That's a lot of boilerplate, compared to operator<. Is there a better
>> way?
>
> If you always hash photos in same way then it is perhaps fine like that:
>
> namespace std {
> template <>
> struct hash<Photo> {
> size_t operator()(const Photo& p) const noexcept {
> return std::hash<std::string>{}(p.name());
> }
> };
> }
>
> 8 lines is not lot.

It's IMO a lot, compared to the operator< that std::map and friends
need. And (like I hinted at, but didn't explain) I can't hash this
class via its public interface (it doesn't have a name() method) so
I have to delegate to a Photo::hash() method.

"A lot" is subjective. Perhaps it's better to say I think it makes
people prefer std::set to std::unordered set, especially for
short-lived containers which maybe just help eliminate duplicates from
a sequence. And maybe use std::strings as keys rather than their own
types.

As you can tell, it's a bit frustrating to me that the better
containers (when you don't need ordering etc) mean more work.

But, at least it seems I haven't missed some nice shortcut that
everyone else uses.

Jorgen Grahn

unread,
Oct 11, 2020, 11:02:42 AM10/11/20
to
On Sun, 2020-10-11, Bonita Montero wrote:
>> However, when I want to use Photo as a key in unordered_map or
>> unordered_set (which is at least as reasonable as using it in a map or
>> set) what I end up doing is:
>> - Add a size_t Photo::hash() const member, implemented in terms
>> of funny things like std::hash<std::string> {} ().
>> - Specialize the std::hash class for Photo, and arrange for its
>> operator() to call Photo::hash().
>> That's a lot of boilerplate, compared to operator<. Is there a better
>> way?
>
> 1. A unordered map is usually faster.

Yes, that's the only reason I bother with it. (Another reason would be
if my type was easy to hash but hard to compare, but I don't think
that has happened to me yet.)

> 2. You can define the hashing-class as a lambda and use
> decltype(hc) as the teplate-parameter.

That /did/ occur to me this morning. It would make it possible to
write

std::unordered_set<Photo, decltype([] (const Photo& p) {...})> my_set;

but it seems perverse to (seemingly) create a lambda function just for
its type. And more importantly, I want the "hashability" to be a
property of class Photo so I can use it anywhere. I may e.g. want to
build on it when I make MugShot and std::pair<Photo, Photo> hashable.

Manfred

unread,
Oct 11, 2020, 11:19:14 AM10/11/20
to
From cppreference.com:

Named requirements: UnorderedAssociativeContainer (std::set belongs to
this set ;) )

"Unordered associative containers are parametrized by ... Hash, a Hash
function object which acts as hash function on Key ..."
https://en.cppreference.com/w/cpp/named_req/UnorderedAssociativeContainer

named requirements: Hash
"A Hash is a /function object/ ..."
https://en.cppreference.com/w/cpp/named_req/Hash

So, instead of decltype(...) you may use
std::function<size_t(const Photo&)>

>
> /Jorgen
>

Öö Tiib

unread,
Oct 11, 2020, 1:43:56 PM10/11/20
to
Yes. I've often been helping with code bases of hundreds of
thousands of lines and so it is non-issue for my subjective taste.
C++ is verbose, I use it not because it is terse. Some C-like
languages like Swift are way more laconic but these have their
own issues.
Program is put together of patterns and it is business of compiler
to reduce it to minimal binary code.

> Perhaps it's better to say I think it makes
> people prefer std::set to std::unordered set, especially for
> short-lived containers which maybe just help eliminate duplicates from
> a sequence. And maybe use std::strings as keys rather than their own
> types.

May be it is indeed better to use unordered_map<std::string, Photo>.
I think of such idiom as "dictionary idiom" as it is rather common.
That does compile when <string> is included. What you search
for photo in your set of photos with? Without file name in
external interface? Dummy photo?

> As you can tell, it's a bit frustrating to me that the better
> containers (when you don't need ordering etc) mean more work.
>
> But, at least it seems I haven't missed some nice shortcut that
> everyone else uses.

I have not felt like that is issue. Most searchable containers
I use are sorted vectors, hash-table based containers or
Boost.Intrusives. Sometimes I use my own implementation of
sorted deque (as std::deque does not let to specify block sizes).
The std::set or std::map I avoid because these either lose in
performance or lack characteristics that I need.

For example Boost.Intrusive sets are way more verbose than
std::sets to handle as these do not own the elements but when
used correctly then have great qualities that nothing else provides.
For example it is possible to have polymorphic objects in those
very efficiently, similarly to boost::poly_collection::base_collection.
Consider std::set<unique_ptr<Base>> ... boost::intrusive::set just
wipes floors with it in every benchmark.


Jorgen Grahn

unread,
Oct 11, 2020, 1:50:33 PM10/11/20
to
On Sun, 2020-10-11, Brian Wood wrote:
> On Sunday, October 11, 2020 at 7:55:01 AM UTC-5, Öö Tiib wrote:
...
>> > That's a lot of boilerplate, compared to operator<. Is there a better
>> > way?
>> If you always hash photos in same way then it is perhaps fine like that:
>>
>> namespace std {
>> template <>
>> struct hash<Photo> {
>> size_t operator()(const Photo& p) const noexcept {
>> return std::hash<std::string>{}(p.name());
>> }
>> };
>> }
>>
>
> A little indentation is helpful in my opinion:

Tiib's post /was/ indented; Google must have broken it for you.

Jorgen Grahn

unread,
Oct 14, 2020, 4:05:37 PM10/14/20
to
It's not how I see it, but I couldn't come up with a good explanation
of how I /do/ see it.

>> Perhaps it's better to say I think it makes
>> people prefer std::set to std::unordered set, especially for
>> short-lived containers which maybe just help eliminate duplicates from
>> a sequence. And maybe use std::strings as keys rather than their own
>> types.
>
> May be it is indeed better to use unordered_map<std::string, Photo>.
> I think of such idiom as "dictionary idiom" as it is rather common.

For me, anything that discourages use of domain-specific types is a
bad thing. I use them a lot to (the way I see it) make my designs
clearer; it's one of the main benefits of C++ IMO.

> That does compile when <string> is included. What you search
> for photo in your set of photos with? Without file name in
> external interface? Dummy photo?

In this application, a (name of a) photo isn't just a string: it's
either a file name matching a certain pattern, or it's invalid. I
establish that at the external interface, before this set or map gets
involved. Then it makes sense to represent it as a type, to show that
this invariant holds.

[snip]

Tim Rentsch

unread,
Nov 9, 2020, 2:17:52 AM11/9/20
to
I believe C++17 (or C++20?) offers constructions sufficient to
accomplish this. However I don't think any of those ways are
available in C++14 or earlier.

If I had this problem to deal with, probably I would do something
like this:

#include <unordered_set>
#include <unordered_map>

#define CLASS_HASHER( KeyType ) \
namespace std { \
template <> struct hash< class KeyType > { \
\
using result_type = size_t; \
using argument_type = KeyType; \
\
result_type \
operator()( const KeyType &it ) const { \
return it.me_hash_function(); \
} \
\
}; \
}

struct Whatsit {
size_t blah;
size_t me_hash_function() const { return blah; }
};

CLASS_HASHER( Whatsit )

void
check_it_out(){
std::unordered_map< Whatsit, int > whatsits_values;
std::unordered_set< Whatsit > whatsits;
}

Of course the point of the CLASS_HASHER macro is that it can be
reused for any class that needs it. The overhead is then one
extra line of code per self-hashing class, which in my view
is acceptable.

Jorgen Grahn

unread,
Nov 9, 2020, 3:38:19 PM11/9/20
to
Thanks. Yes, an extra line would be acceptable -- although I'm sure
people would find the macro odd, if they found it.

For now I've decided to use std::map and friends, unless I know I have
a performance problem. I'm a bit disappointed, because C++ doesn't
usually let me down. When things are hard to express well, I normally
understand why ... but I still don't understand why it's harder to use
a class with the unordered containers than with the ordered
equivalents.

Tim Rentsch

unread,
Nov 20, 2020, 12:24:41 PM11/20/20
to
Oops, I left out an important part of the code suggestion,
namely, this comment:

// Despite an extensive search, I am unable to find
// a better way to do this in C++14.

Also, with benefit of hindsight, I think I would also change the
macro to take two arguments, the second being the name of the
member function to use to compute the hash value.

> For now I've decided to use std::map and friends, unless I know I have
> a performance problem. I'm a bit disappointed, because C++ doesn't
> usually let me down. When things are hard to express well, I normally
> understand why ... but I still don't understand why it's harder to use
> a class with the unordered containers than with the ordered
> equivalents.

The rapid pace of change to the C++ language is a double-edged
sword. It's nice to get new features early, but sometimes they
aren't as complete or integrated as we would like them to be.
My understanding is, in C++17 and later you can do what you want
with one fixed overhead, not even needing an extra line per
class, but just by defining the relevant member function (along
with the aforementioned small fixed overhead, just once for all
the classes that need it). If you know that support will get
better if/when the project moves to a later version of C++, I
don't know why you wouldn't take advantage of this alternate
path. Admittely it is less than perfect, but that is pretty
much the nature of software engineering for products that are
expected to be delivered.
0 new messages