Namely, given:
struct MyStruct
{
int a;
int b;
int c;
};
The recommendation via examples and documentation is:
size_t hash(const MyStruct &s)
{
size_t hashValue=0;
// Combine hashes of all elements
hash_combine(hashValue,s.a);
hash_combine(hashValue,s.b);
hash_combine(hashValue,s.c);
return hashValue;
}
Rather than what I would think is the expected correct approach:
size_t hash(const MyStruct &s)
{
// For a composite hash, start by hashing the first element
size_t hashValue=hash_value(s.a);
// Then, combine with hashes of the remaining elements
hash_combine(hashValue,s.b);
hash_combine(hashValue,s.c);
return hashValue;
}
Thanks,
Michael Goldshteyn
_______________________________________________
Unsubscribe & other changes: http://lists.boost.org/mailman/listinfo.cgi/boost
On 03/25/2011 06:57 AM, Michael Goldshteyn wrote:
> After reading the explanation of how hash_combine should be used, I am
> left wondering. Why is the recommendation for hash_combine different
> from what one would intuitively expect?
>
Either one is okay.
> Namely, given:
>
> struct MyStruct
> {
> int a;
> int b;
> int c;
> };
>
> The recommendation via examples and documentation is:
>
> size_t hash(const MyStruct &s)
> {
> size_t hashValue=0;
>
> // Combine hashes of all elements
> hash_combine(hashValue,s.a);
> hash_combine(hashValue,s.b);
> hash_combine(hashValue,s.c);
>
> return hashValue;
> }
>
> Rather than what I would think is the expected correct approach:
>
> size_t hash(const MyStruct &s)
> {
> // For a composite hash, start by hashing the first element
> size_t hashValue=hash_value(s.a);
>
> // Then, combine with hashes of the remaining elements
> hash_combine(hashValue,s.b);
> hash_combine(hashValue,s.c);
>
> return hashValue;
> }
>
In Christ,
Steven Watanabe
If you call hash_value directly you won't be taking advantage of the
mechanism for finding the correct overload. Obviously, in this case
that isn't a big deal but we'd rather encourage an implementation
style that always works. There are also some portability workarounds
that you'll be missing.
Daniel
size_t myHashVal=0;
boost::hash_combine(myHashVal,firstVal);
...
becomes
size_t seed=0;
seed ^= hash_value(myFirstHashVal) + 0x9e3779b9; // + (seed << 6) + (seed >>
2) does nothing for seed==0
...
The xor'ing of the first value doesn't bother me, but this displacement by
0x9e3779b9 of the first value does, since for 32-bit size_t values, we've
just lost a bunch of info. It seems like the formula will result in less of
the first value being applied to the overall hash, due to the "potential"
unsigned 32-bit overflow caused by the addition. Maybe I'm misinterpreting
the effect. If so, please correct me.
On 03/25/2011 08:32 AM, Michael Goldshteyn wrote:
> The xor'ing of the first value doesn't bother me, but this displacement
> by 0x9e3779b9 of the first value does, since for 32-bit size_t values,
> we've just lost a bunch of info. It seems like the formula will result
> in less of the first value being applied to the overall hash, due to the
> "potential" unsigned 32-bit overflow caused by the addition. Maybe I'm
> misinterpreting the effect. If so, please correct me.
>
You won't lose anything. Unsigned addition
wraps around.
In Christ,
Steven Watanabe
Code should always be written, to the extent possible, to reflect
concept and intent to the greatest extent possible. Is it really
conceptually necessary that any combining algorithm should be just as
good if the hash of the initial value can be treated as if it were the
combination of a sequence of values?
(My own preference would have been for a combiner that was a struct
which would be initialized, called on each value to be included and then
finalized when the final hash value is extracted -- a more general
interface which could be implemented to do precisely what this one
does. But I didn't speak up when this was being discussed, so I don't
have anything to complain about).
Topher
With combine, you can also use a non-zero initial value, so Point(x,y)
and BoundingBox(x,y) don't have the same hash, if you happen to mix
types in your tables.
That's a good point, but since STL containers aren't polymorphic it's
usually not the case. The boost hash function should always match the
equality operator, so that would be appropriate for writing
'hash_value' if you have a polymorphic equality operator that always
returns false for comparing a Point and a BoundingBox.
If you're using the library to write a hash function for another
purpose, that isn't really something I considered when writing the
documentation. I probably wasn't clear enough that I saw the library
as fairly single minded. I think it can work well if you're careful.
The design of your hash library could make a more appropriate base.
A few seconds after sending that I read the email where you said that
it's focussed on more secure hash functions. So ignore me there.