Fun with hash function analysis.

28 views
Skip to first unread message

Austin Appleby

unread,
Mar 7, 2008, 5:00:57 AM3/7/08
to Hash Functions
Just added some fun new tools to my bag of tricks - sparse-key
analysis, differential analysis, and integer key sequences (1,2,3),
(2,4,6...), (3,6,9...) etc.

Some interesting results -

Paul Hsieh's SuperFastHash is riddled with differentials - I found 10
two-bit differentials and 228 (!) three-bit differentials in 128-bit
keys (haven't tested 4-bit differentials on anything, but I might let
it run overnight just for the hell of it). Because of this it falls
down badly on sparse keys, producing any from 20x to 3700x the
expected number of collisions. It also produces about 9x the expected
collisions on the dictionary keys.

Bob Jenkins' lookup3 is nearly indistiguishable from random.

MurmurHash has 6 3-bit differentials in 128-bit keys (nothing
surprising there, they're the patterns I expected). It's weak on
sparse keys (also no surprise) but nowhere near as bad as
SuperFastHash - collisions are between 6x and 250x expected.
Interestingly, it does much better than random for integer sequences
and small numbers of buckets - a property it shares with FNV.

FNV does surprisingly well and is surprisingly fast, but it fails
_totally_ on sparse keys. It's practically non-random on integer
sequences though, which should be obvious since it's a multiplicative
hash and all its operations are linear.


Tomorrow I'm going to add some code to dump some of the data in .csv
format so I can build a few graphs in Excel - should make things
crystal clear. :)

-Austin

Austin Appleby

unread,
Mar 7, 2008, 5:55:20 AM3/7/08
to Hash Functions
Oh, FNV turned out only to be fast when inlined (1100 mb/s vs. 450 mb/
s not inlined).
Reply all
Reply to author
Forward
0 new messages