[boost] hash_combine vs hash_value

313 views
Skip to first unread message

Michael Goldshteyn

unread,
Mar 25, 2011, 9:57:38 AM3/25/11
to bo...@lists.boost.org
After reading the explanation of how hash_combine should be used, I am left
wondering. Why is the recommendation for hash_combine different from what
one would intuitively expect?

Namely, given:

struct MyStruct
{
int a;
int b;
int c;
};

The recommendation via examples and documentation is:

size_t hash(const MyStruct &s)
{
size_t hashValue=0;

// Combine hashes of all elements
hash_combine(hashValue,s.a);
hash_combine(hashValue,s.b);
hash_combine(hashValue,s.c);

return hashValue;
}

Rather than what I would think is the expected correct approach:

size_t hash(const MyStruct &s)
{
// For a composite hash, start by hashing the first element
size_t hashValue=hash_value(s.a);

// Then, combine with hashes of the remaining elements
hash_combine(hashValue,s.b);
hash_combine(hashValue,s.c);

return hashValue;
}

Thanks,

Michael Goldshteyn


_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost

Steven Watanabe

unread,
Mar 25, 2011, 10:47:25 AM3/25/11
to bo...@lists.boost.org
AMDG

On 03/25/2011 06:57 AM, Michael Goldshteyn wrote:
> After reading the explanation of how hash_combine should be used, I am
> left wondering. Why is the recommendation for hash_combine different
> from what one would intuitively expect?
>

Either one is okay.

> Namely, given:
>
> struct MyStruct
> {
> int a;
> int b;
> int c;
> };
>
> The recommendation via examples and documentation is:
>
> size_t hash(const MyStruct &s)
> {
> size_t hashValue=0;
>
> // Combine hashes of all elements
> hash_combine(hashValue,s.a);
> hash_combine(hashValue,s.b);
> hash_combine(hashValue,s.c);
>
> return hashValue;
> }
>
> Rather than what I would think is the expected correct approach:
>
> size_t hash(const MyStruct &s)
> {
> // For a composite hash, start by hashing the first element
> size_t hashValue=hash_value(s.a);
>
> // Then, combine with hashes of the remaining elements
> hash_combine(hashValue,s.b);
> hash_combine(hashValue,s.c);
>
> return hashValue;
> }
>

In Christ,
Steven Watanabe

Daniel James

unread,
Mar 25, 2011, 10:49:23 AM3/25/11
to bo...@lists.boost.org
On 25 March 2011 13:57, Michael Goldshteyn <mgold...@comcast.net> wrote:
> After reading the explanation of how hash_combine should be used, I am left
> wondering. Why is the recommendation for hash_combine different from what
> one would intuitively expect?

If you call hash_value directly you won't be taking advantage of the
mechanism for finding the correct overload. Obviously, in this case
that isn't a big deal but we'd rather encourage an implementation
style that always works. There are also some portability workarounds
that you'll be missing.

Daniel

Michael Goldshteyn

unread,
Mar 25, 2011, 11:32:47 AM3/25/11
to bo...@lists.boost.org
The thing that bothers me about the use "hash_combine" for even the first
hashed value of a bunch of values is the fact that:

size_t myHashVal=0;

boost::hash_combine(myHashVal,firstVal);
...

becomes

size_t seed=0;
seed ^= hash_value(myFirstHashVal) + 0x9e3779b9; // + (seed << 6) + (seed >>
2) does nothing for seed==0
...

The xor'ing of the first value doesn't bother me, but this displacement by
0x9e3779b9 of the first value does, since for 32-bit size_t values, we've
just lost a bunch of info. It seems like the formula will result in less of
the first value being applied to the overall hash, due to the "potential"
unsigned 32-bit overflow caused by the addition. Maybe I'm misinterpreting
the effect. If so, please correct me.

Steven Watanabe

unread,
Mar 25, 2011, 11:42:53 AM3/25/11
to bo...@lists.boost.org
AMDG

On 03/25/2011 08:32 AM, Michael Goldshteyn wrote:
> The xor'ing of the first value doesn't bother me, but this displacement
> by 0x9e3779b9 of the first value does, since for 32-bit size_t values,
> we've just lost a bunch of info. It seems like the formula will result
> in less of the first value being applied to the overall hash, due to the
> "potential" unsigned 32-bit overflow caused by the addition. Maybe I'm
> misinterpreting the effect. If so, please correct me.
>

You won't lose anything. Unsigned addition
wraps around.

In Christ,
Steven Watanabe

Topher Cooper

unread,
Mar 25, 2011, 2:01:13 PM3/25/11
to bo...@lists.boost.org
Your method implies a special status to the first element, which it
should not have. Of course, because ordering makes a difference the
first element is special because it is the first, but the second element
is special because it is the second, and so on. Every element should be
treated uniformly.

Code should always be written, to the extent possible, to reflect
concept and intent to the greatest extent possible. Is it really
conceptually necessary that any combining algorithm should be just as
good if the hash of the initial value can be treated as if it were the
combination of a sequence of values?

(My own preference would have been for a combiner that was a struct
which would be initialized, called on each value to be included and then
finalized when the final hash value is extracted -- a more general
interface which could be implemented to do precisely what this one
does. But I didn't speak up when this was being discussed, so I don't
have anything to complain about).

Topher

Scott McMurray

unread,
Mar 25, 2011, 10:26:28 PM3/25/11
to bo...@lists.boost.org
On Fri, Mar 25, 2011 at 06:57, Michael Goldshteyn
<mgold...@comcast.net> wrote:
> After reading the explanation of how hash_combine should be used, I am left
> wondering. Why is the recommendation for hash_combine different from what
> one would intuitively expect?
>

With combine, you can also use a non-zero initial value, so Point(x,y)
and BoundingBox(x,y) don't have the same hash, if you happen to mix
types in your tables.

Daniel James

unread,
Mar 26, 2011, 6:11:36 AM3/26/11
to bo...@lists.boost.org
On 26 March 2011 02:26, Scott McMurray <me22.ca+bo...@gmail.com> wrote:
>
> With combine, you can also use a non-zero initial value, so Point(x,y)
> and BoundingBox(x,y) don't have the same hash, if you happen to mix
> types in your tables.

That's a good point, but since STL containers aren't polymorphic it's
usually not the case. The boost hash function should always match the
equality operator, so that would be appropriate for writing
'hash_value' if you have a polymorphic equality operator that always
returns false for comparing a Point and a BoundingBox.

If you're using the library to write a hash function for another
purpose, that isn't really something I considered when writing the
documentation. I probably wasn't clear enough that I saw the library
as fairly single minded. I think it can work well if you're careful.
The design of your hash library could make a more appropriate base.

Daniel James

unread,
Mar 26, 2011, 6:13:40 AM3/26/11
to bo...@lists.boost.org
On 26 March 2011 10:11, Daniel James <dnl...@gmail.com> wrote:
>
> The design of your hash library could make a more appropriate base.

A few seconds after sending that I read the email where you said that
it's focussed on more secure hash functions. So ignore me there.

Reply all
Reply to author
Forward
0 new messages