Austin Appleby
unread,Mar 7, 2008, 5:00:57 AM3/7/08Sign in to reply to author
Sign in to forward
You do not have permission to delete messages in this group
Either email addresses are anonymous for this group or you need the view member email addresses permission to view the original message
to Hash Functions
Just added some fun new tools to my bag of tricks - sparse-key
analysis, differential analysis, and integer key sequences (1,2,3),
(2,4,6...), (3,6,9...) etc.
Some interesting results -
Paul Hsieh's SuperFastHash is riddled with differentials - I found 10
two-bit differentials and 228 (!) three-bit differentials in 128-bit
keys (haven't tested 4-bit differentials on anything, but I might let
it run overnight just for the hell of it). Because of this it falls
down badly on sparse keys, producing any from 20x to 3700x the
expected number of collisions. It also produces about 9x the expected
collisions on the dictionary keys.
Bob Jenkins' lookup3 is nearly indistiguishable from random.
MurmurHash has 6 3-bit differentials in 128-bit keys (nothing
surprising there, they're the patterns I expected). It's weak on
sparse keys (also no surprise) but nowhere near as bad as
SuperFastHash - collisions are between 6x and 250x expected.
Interestingly, it does much better than random for integer sequences
and small numbers of buckets - a property it shares with FNV.
FNV does surprisingly well and is surprisingly fast, but it fails
_totally_ on sparse keys. It's practically non-random on integer
sequences though, which should be obvious since it's a multiplicative
hash and all its operations are linear.
Tomorrow I'm going to add some code to dump some of the data in .csv
format so I can build a few graphs in Excel - should make things
crystal clear. :)
-Austin