fast hash functions

Cesium

Oct 26, 2007, 8:50:50 PM
to Hash Functions
I spent a few months deconstructing Bob Jenkins' lookup3 function
trying to figure out why it works well. (See
http://www.burtleburtle.net/bob/c/lookup3.c)

I'm not sure I can shed light on that, but it does look like there are
a couple of minor optimizations that may be possible without reducing
the quality of the hashing function, assuming that the right constants
for the rotates can be recomputed.

The lookup3 mix (together with the rot macro it uses, defined in
lookup3.c) looks like:

#define rot(x,k) (((x)<<(k)) | ((x)>>(32-(k))))

#define mix(a,b,c) \
{ \
  a -= c; a ^= rot(c, 4); c += b; \
  b -= a; b ^= rot(a, 6); a += c; \
  c -= b; c ^= rot(b, 8); b += a; \
  a -= c; a ^= rot(c,16); c += b; \
  b -= a; b ^= rot(a,19); a += c; \
  c -= b; c ^= rot(b, 4); b += a; \
}

We first note that in the statement "a ^= rot(c, 4)", the compiler
will break this down to: t = rot(c, 4); a ^= t. On an x86 chip,
which has two-operand instructions, this turns into "t = c; t =
rot(t, 4); a ^= t". What we are doing here is preserving the original
value of 'c' for later use while also computing a rotated value of
'c'. However, there does not seem to be a theoretical need to both
preserve the original value and make use of a rotated value, and
preserving the original value tends to consume small amounts of
chip resources.

Thus, we suggest a mix routine that contains lines that look like:

a -= c; c = rot(c, const); a ^= c; c += b;
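
A minimal sketch of such a rotate-in-place mix (call it mix2; the name
is ours, and the rotate constants are copied from lookup3 purely as
placeholders, since as noted above the right constants would have to
be recomputed):

/* hypothetical rotate-in-place variant of mix(); constants are
   placeholders only */
#define mix2(a,b,c) \
{ \
  a -= c; c = rot(c, 4); a ^= c; c += b; \
  b -= a; a = rot(a, 6); b ^= a; a += c; \
  c -= b; b = rot(b, 8); c ^= b; b += a; \
  a -= c; c = rot(c,16); a ^= c; c += b; \
  b -= a; a = rot(a,19); b ^= a; a += c; \
  c -= b; b = rot(b, 4); c ^= b; b += a; \
}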


For a good mix function, we want each bit of output to interact with
each bit of input. A technique that we can use to estimate the
quality of a mix function is to follow the bit interactions as they
are computed. For example, after executing "a -= c", each bit of 'a'
contains an interaction between the unrotated value of 'a' and the
unrotated value of 'c'. We can write "a: a0 c0". After executing "c
= rot(c, 4)", we have "c: c4". After executing "a ^= c", we have "a:
a0 c0 c4". It's not a particularly accurate estimate, but it roughly
suggests how well we've mixed the bits.

In the original mix function, notice that the code looks like "a -=
c; ...; a += c". Although 'c' has changed somewhat between the two
operations, a bit in 'a' interacts with the same bit in 'c' twice.
This gives bit interactions a slight tendency to cancel out. The
modified 'mix' function has similar properties. This suggests
removing the rightmost column of operations. (This won't have much
effect on the speed, since the rightmost column of operations can use
an otherwise unused functional unit.)


In the mix function, groups of statements of the form "x op= y; y
rot= const; x op= y" increase the number of bit interactions in x
from whatever was there before to that amount plus twice whatever
was in y. In lookup3, each line takes the register with relatively
few interactions and combines in two copies of the register that
contains the most interactions. The interactions per bit for each
register over time (ignoring the rightmost column) look like:
A B C
0: 1 1 1
1: 3 1 1
2: 3 7 1
3: 3 7 15
4: 33 7 15
5: 33 73 15
6: 33 73 161

Of course, there are only 96 bits that can interact, and once there
are a sufficient number of interactions, adding more interactions is
as likely to cancel out previous interactions as it is to add new
interactions.
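
This bookkeeping is easy to mechanize. A minimal C sketch, using the
rule above that each sequence replaces a register's count with its old
count plus twice the count of the register it absorbs; running it
reproduces the table:

#include <stdio.h>

int main(void)
{
    /* interactions-per-bit estimates; start with one self-interaction */
    unsigned a = 1, b = 1, c = 1;
    printf("0: %u %u %u\n", a, b, c);
    for (int line = 1; line <= 6; line++) {
        switch (line % 3) {
        case 1: a += 2 * c; break;  /* a -= c; a ^= rot(c,k); */
        case 2: b += 2 * a; break;  /* b -= a; b ^= rot(a,k); */
        case 0: c += 2 * b; break;  /* c -= b; c ^= rot(b,k); */
        }
        printf("%d: %u %u %u\n", line, a, b, c);
    }
    return 0;
}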

So, the lookup3 approach can do a bit better than doubling the number
of interactions every two cycles. There is an alternate approach that
can use more functional units per cycle and that better balances the
number of bit interactions per register:

Instead of combining the best-mixed register into the least-mixed
register, we can combine the register we weren't using into the
register that we were reading. The lookup3 approach basically has a
data-flow bottleneck on the best-mixed register; the alternate
approach overlaps a couple of the computations much better.

For example:

#define threefast_mix(a,b,c) do { \
  a ^= b; b ^= c; c = rot(c, 20); \
  b += c; c += a; a = rot(a, 11); \
  c -= a; a -= b; b = rot(b, 24); \
  a ^= b; b ^= c; c = rot(c, 8); \
  b += c; c += a; a = rot(a, 16); \
  c -= a; a -= b; b = rot(b, 16); \
  a ^= b; b ^= c; c = rot(c, 20); \
  b += c; c += a; a = rot(a, 11); \
  c -= a; a -= b; b = rot(b, 24); \
} while(0)
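
For context, threefast_mix would slot into a hashing loop the same way
mix() does in lookup3's hashword(). A minimal sketch, with a
hypothetical name, that only handles whole 3-word blocks and omits the
tail handling and final() step a real implementation needs:

#include <stdint.h>
#include <stddef.h>

/* hypothetical driver; assumes the rot and threefast_mix macros above */
uint32_t threefast_hash(const uint32_t *k, size_t nwords, uint32_t initval)
{
    uint32_t a, b, c;
    a = b = c = 0xdeadbeef + ((uint32_t)nwords << 2) + initval;
    while (nwords >= 3) {               /* consume 3 words per mix */
        a += k[0]; b += k[1]; c += k[2];
        threefast_mix(a, b, c);
        nwords -= 3; k += 3;
    }
    return c;
}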

In the first two lines, "b ^= c; c rot= 20; b += c" is a standard
sequence of instructions, taking two cycles, that strongly interacts
the bits. This is followed in lines two and three by "c += a; a rot=
11; c -= a". The first two instructions of that sequence can execute
in parallel with the last instruction of the previous sequence.

What happens to our bit interaction estimates? Ignore the first
instruction which pads out the first line, and just look at the
standard sequences and the results after each sequence:

A B C
0: 1 1 1
1: 1 3 1
2: 1 3 3
3: 7 3 3
4: 7 9 3
5: 7 9 17
6: 25 9 17
7: 25 43 17
8: 25 43 67
9: 159 43 67

Although this uses 9 sequences of instructions instead of 6, the
sequences overlap to run twice as fast when sufficient functional
units are present.


Another approach is to temporarily spread the hash over 128 bits (4
registers), and to run two copies of the lookup3 approach in parallel:

A B C D
0: 1 1 1 1
1: 3 1 3 1
2: 3 7 3 7
3: 17 7 17 7
4: 17 41 17 41
5: 99 41 99 41

This might be able to mix the bits adequately about 2 cycles faster
than the original algorithm.
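
As an illustration only, such a four-register spread might look like
the following; the pairing (which crosses over between (a,b) and (c,d)
so that all 128 bits interact) and the rotate constants are our
guesses, not derived values:

/* hypothetical 128-bit spread: two lookup3-style chains in parallel,
   crossing over so all four registers interact; each row performs two
   independent sequences, matching the interaction counts above */
#define four_mix(a,b,c,d) \
{ \
  a -= d; a ^= rot(d, 4);  c -= b; c ^= rot(b, 9); \
  b -= c; b ^= rot(c, 6);  d -= a; d ^= rot(a,13); \
  a -= d; a ^= rot(d,16);  c -= b; c ^= rot(b,21); \
  b -= c; b ^= rot(c,19);  d -= a; d ^= rot(a, 7); \
  a -= d; a ^= rot(d, 8);  c -= b; c ^= rot(b,11); \
}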

bob_j...@burtleburtle.net

Nov 1, 2007, 9:41:02 PM
to Hash Functions
On Oct 26, 5:50 pm, Cesium <cesiu...@gmail.com> wrote:
> I spent a few months deconstructing Bob Jenkins' lookup3 function
> trying to figure out why it works well. (See http://www.burtleburtle.net/bob/c/lookup3.c)

The requirement I was testing for mix() was that every bit (and pair
of bits) in A, B, C affected at least 32 bits on average, and the same
for the reverse of mix(). It's final() that has the goal of affecting
every bit in C (and B) as much as possible.

Rotating a variable in place is a good idea; Chuck Simmons has
suggested it too, but I haven't explored it yet.

Blah Blah

Nov 3, 2007, 3:06:56 AM
to hash-fu...@googlegroups.com
"Chuck Simmons" (initials Cs) and "Cesium" (symbol Cs) are equivalent.
I was re-publishing the basic ideas to a broader audience.

Cheers, Chuck

Andres Valloud

Nov 11, 2007, 8:10:49 PM
to hash-fu...@googlegroups.com
Guys,

It is noted that for good hash functions, every bit of the input interacts with every bit of the output.  This is some sort of avalanche test definition.  However, the avalanche test is very weak: there exist hash functions that pass the test while producing exactly four different hash values.  I think a much better test is to do a chi-squared test mod several p to observe the distribution of the hash values.  The hash function is good if and only if it behaves like an RNG for every well-chosen p.
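
As a minimal sketch (assuming the hash values for a key set have
already been collected), such a test might look like:

#include <stdlib.h>
#include <stdint.h>

/* chi-squared statistic of hash values reduced mod p; for a good hash
   this should stay near the degrees of freedom (p - 1), as it would
   for a true RNG */
double chi_squared_mod_p(const uint32_t *hashes, size_t n, uint32_t p)
{
    double expected = (double)n / p, stat = 0.0;
    size_t *buckets = calloc(p, sizeof *buckets);
    for (size_t i = 0; i < n; i++)
        buckets[hashes[i] % p]++;
    for (uint32_t i = 0; i < p; i++) {
        double d = (double)buckets[i] - expected;
        stat += d * d / expected;
    }
    free(buckets);
    return stat;
}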

Thanks,
Andres.

Blah Blah

Nov 13, 2007, 10:54:54 PM
to hash-fu...@googlegroups.com
Yes, the avalanche test won't tell you if a function is good. But it
can be used to eliminate functions that are clearly bad.

However, it is not strictly true that the hash function must behave
like an RNG. One of the fascinating things about Jenkins' lookup3 is
that it is better than an RNG. If you have an input set that is
randomly distributed, you don't need a hash function. You need a good
hash function when you have input sets where the keys are relatively
similar in some fashion.

While it's nice to have a test you can use after the fact to evaluate
a hash function, what are the rules of thumb one can use when
constructing hash functions to probabilistically generate good
functions to feed to the evaluator?

Cs

Andres Valloud

Nov 13, 2007, 11:26:01 PM
to hash-fu...@googlegroups.com
Hello,

True enough, perfect hash functions do not need to behave like an RNG to be good.  However, excluding those, I find that behaving like an RNG is a useful criterion.

How is lookup3 better than an RNG?

What do you mean by the evaluator?

Thanks,
Andres.

Blah Blah

Nov 15, 2007, 2:06:32 AM
to hash-fu...@googlegroups.com
On Nov 13, 2007 8:26 PM, Andres Valloud <andres....@gmail.com> wrote:
> Hello,
>
> True enough, perfect hash functions do not need to behave like an RNG to be
> good. However, excluding those, I find that behaving like an RNG is a
> useful criterion.
>
> How is lookup3 better than an RNG?

You can simulate an RNG by taking an input set, generating a random
number for each input, and defining a hash function that returns the
assigned random number when given the associated input. Take the
same input set and generate a second set of hashes using lookup3().
Next, pick a table size and use the RNG hashes to put inputs in
buckets. Similarly, distribute the inputs using the lookup3 hashes.
You will find that the number of collisions for the lookup3() function
tends to be slightly lower than the number of collisions you'd get
from a random number generator.

The results will be about the same if your input set consists of
random bit strings. But if your input sets use natural language
strings, or strings that mostly contain zero bits, and presumably any
input set where there is relatively little difference between the
members, the lookup3() function will have a better
(lower) collision rate than the random number generator.
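
A sketch of the counting step in this comparison; the two sets of
hashes would come from lookup3's hashlittle() and from random
assignment respectively (the harness that generates them is omitted):

#include <stdlib.h>
#include <stdint.h>

/* count keys that land in an already-occupied bucket of a table with
   table_size slots; run once on RNG-assigned hashes and once on
   lookup3 hashes over the same keys, then compare */
size_t count_collisions(const uint32_t *hashes, size_t n, size_t table_size)
{
    size_t collisions = 0;
    unsigned char *used = calloc(table_size, 1);
    for (size_t i = 0; i < n; i++) {
        size_t slot = hashes[i] % table_size;
        if (used[slot]) collisions++;
        else used[slot] = 1;
    }
    free(used);
    return collisions;
}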

And the key observation here is that if you have an input set that
has lots of variance (a random input set), you don't need a good
hash function. So lookup3() is valuable exactly where it needs to
be valuable.

>
> What do you mean by the evaluator?

In order to decide if a hash function is of high quality, you'll have
to test it, or evaluate it, in some robust fashion. Your evaluator is
a chi-squared test. My evaluator simulates collisions in hash tables
of various sizes, which is probably functionally equivalent to a
chi-squared test.

Andres Valloud

Nov 16, 2007, 1:46:07 AM
to hash-fu...@googlegroups.com
Regarding the distribution of hash values, I have observed such phenomena.  However, I also think the sample sizes are usually not large enough to answer the question with certainty.  For example, lookup3 has 96 bits of internal state, so in theory one could argue that there first needs to be a proof that the period is 2^96.  That can't quite hold because the answer is constrained to 32 bits, so really one would have to take a sample of 2^32 or more objects at random.  That is hardly ever done, so I assume the sample sizes you take are smaller as well.

But maybe I am wrong... I did do that (2^32 samples) for Java's hash value post processing in HashMap, and the results were not pretty...

The hash analysis tool has collision rate, unbounded chi-squared, and chi-squared mod 25+ primes that are also well chosen (coprime to 256^k +/- a, 1 <= k <= 8, -32 < a < 32, and not close to any power of two).  The last part, as you say, is a simulator of hashed collections.
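
One reading of that prime-selection criterion (this is our
interpretation; the margin of 32 comes straight from the bounds above)
is that p must not divide any 256^k +/- a, which can be checked like
this:

#include <stdint.h>

/* reject candidate p if 256^k mod p falls within 32 of 0 or of p
   (so p would divide some 256^k +/- a with |a| < 32), or if p lies
   within 32 of a power of two; primality is assumed checked elsewhere */
int is_well_chosen(uint64_t p)
{
    uint64_t r = 1;
    for (int k = 1; k <= 8; k++) {
        r = (r * 256) % p;              /* r = 256^k mod p */
        if (r < 32 || p - r < 32)
            return 0;
    }
    for (uint64_t pow2 = 2; pow2 != 0; pow2 <<= 1)
        if (p + 32 > pow2 && p < pow2 + 32)
            return 0;
    return 1;
}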

Thanks,
Andres.