MurmurHash - non-cryptographic hash, faster than Hsieh and Jenkins on x86

134 views
Skip to first unread message

Austin Appleby

unread,
Mar 3, 2008, 6:13:59 PM3/3/08
to
Just finished testing this, and was asked to post it here - This is a
NON-CRYPTOGRAPHIC hash, suitable only for hash tables and such.

That said, it's about 30% faster than Hsieh's SuperFastHash and
Jenkins' lookup3 on my Intel Core 2 Duo, while retaining extremely
good distribution and collision resistance (only 3 collisions on a
list of 200k words vs. Hsieh's 41). Avalanche bias is also excellent,
at under 1% for all pairs of input-output bits tested for key lengths
of 3 to 16 bytes. It's also ridiculously simple. The only weak spot is
Jenkins' frog.c test, where it fails after 2^25 key pairs (besting
Hsieh's 2^17 but not Jenkins' 2^63).

Perf results from Hsieh's test app -

CRC32 : 4.7810s
oneAtATimeHash : 3.5470s
alphaNumHash : 2.2500s
FNVHash : 2.1870s
Jenkins lookup3 : 1.2650s
SuperFastHash : 1.2970s
MurmurHash : 0.9060s

There's also a bit more info on my blog here - http://tanjent.livejournal.com/756623.html
.

This is all public-domain - do whatever you want with it.

-Austin Appleby

//-------------------------------------------------------------------

unsigned int MurmurHash ( const unsigned char * data, int len,
unsigned int h )
{
const unsigned int m = 0x7fd652ad;
const int r = 16;

h += 0xdeadbeef;

while(len >= 4)
{
h += *(unsigned int *)data;
h *= m;
h ^= h >> r;

data += 4;
len -= 4;
}

switch(len)
{
case 3:
h += data[2] << 16;
case 2:
h += data[1] << 8;
case 1:
h += data[0];
h *= m;
h ^= h >> r;
};

h *= m;
h ^= h >> 10;
h *= m;
h ^= h >> 17;

return h;
}

ke...@bytebrothers.co.uk

unread,
Mar 4, 2008, 3:49:54 AM3/4/08
to
On 3 Mar, 23:13, Austin Appleby <tanj...@gmail.com> wrote:

>
> unsigned int MurmurHash ( const unsigned char * data, int len,
> unsigned int h )

<snip>


> h += *(unsigned int *)data;

I can think of platforms where dereferencing a pointer to char as a
pointer to int will give you a SIGBUS if it's not correctly aligned...

Vend

unread,
Mar 4, 2008, 5:26:57 AM3/4/08
to
On 4 Mar, 00:13, Austin Appleby <tanj...@gmail.com> wrote:
> Just finished testing this, and was asked to post it here - This is a
> NON-CRYPTOGRAPHIC hash, suitable only for hash tables and such.
>
> That said, it's about 30% faster than Hsieh's SuperFastHash and
> Jenkins' lookup3 on my Intel Core 2 Duo, while retaining extremely
> good distribution and collision resistance (only 3 collisions on a
> list of 200k words vs. Hsieh's 41). Avalanche bias is also excellent,
> at under 1% for all pairs of input-output bits tested for key lengths
> of 3 to 16 bytes. It's also ridiculously simple. The only weak spot is
> Jenkins' frog.c test, where it fails after 2^25 key pairs (besting
> Hsieh's 2^17 but not Jenkins' 2^63).
>
> Perf results from Hsieh's test app -
>
> CRC32 : 4.7810s
> oneAtATimeHash : 3.5470s
> alphaNumHash : 2.2500s
> FNVHash : 2.1870s
> Jenkins lookup3 : 1.2650s
> SuperFastHash : 1.2970s
> MurmurHash : 0.9060s
>
> There's also a bit more info on my blog here -http://tanjent.livejournal.com/756623.html

It looks similar the seeding algorithm of the Mersenne Twister.

yar...@gmail.com

unread,
Mar 4, 2008, 9:19:40 AM3/4/08
to
On Mar 4, 1:13 am, Austin Appleby <tanj...@gmail.com> wrote:
> Just finished testing this, and was asked to post it here - This is a
> NON-CRYPTOGRAPHIC hash, suitable only for hash tables and such.
>
> That said, it's about 30% faster than Hsieh's SuperFastHash and
> Jenkins' lookup3 on my Intel Core 2 Duo, while retaining extremely
> good distribution and collision resistance (only 3 collisions on a
> list of 200k words vs. Hsieh's 41). Avalanche bias is also excellent,
> at under 1% for all pairs of input-output bits tested for key lengths
> of 3 to 16 bytes. It's also ridiculously simple. The only weak spot is
> Jenkins' frog.c test, where it fails after 2^25 key pairs (besting
> Hsieh's 2^17 but not Jenkins' 2^63).
>
> Perf results from Hsieh's test app -
>
> CRC32 : 4.7810s
> oneAtATimeHash : 3.5470s
> alphaNumHash : 2.2500s
> FNVHash : 2.1870s
> Jenkins lookup3 : 1.2650s
> SuperFastHash : 1.2970s
> MurmurHash : 0.9060s
>
> There's also a bit more info on my blog here -http://tanjent.livejournal.com/756623.html

Quick analysis: Two messages with the two-block integer delta
0x80000000 0x80008000 result in the same hash with a 50% probability.
Suggestion: mix the upper bits into the lower ones before multiplying
as well (with a different shift amount of course).
You might also want to consider h *= 2*h+m; for mixing low bits into
upper ones, it's reversible and very nicely non-linear.

yar...@gmail.com

unread,
Mar 4, 2008, 9:25:20 AM3/4/08
to
On Mar 4, 4:19 pm, yarr...@gmail.com wrote:

> Quick analysis: Two messages with the two-block integer delta
> 0x80000000 0x80008000 result in the same hash with a 50% probability.

XOR-delta does work also by the way.

Austin Appleby

unread,
Mar 4, 2008, 12:34:11 PM3/4/08
to
Agreed, I'll put a caveat on the code and make a version that will
handle that case.

Austin Appleby

unread,
Mar 4, 2008, 12:44:01 PM3/4/08
to

Yeah, I tried a few things to fix that - Adding a second "h ^= h >> r"
before the multiply fixes some of the deltas, adding a second "h *= m"
fixes a few more, but both reduce the peformance down below that of
jenkins/hsieh. I decided to live with the three-bit delta issue in
favor of speed.

The "h *= 2*h + m" is interesting and should compile down to only 3-4
instructions - I'll give that a whirl.

-Austin

Austin Appleby

unread,
Mar 4, 2008, 1:06:00 PM3/4/08
to
On Mar 4, 6:19 am, yarr...@gmail.com wrote:
> Quick analysis: Two messages with the two-block integer delta
> 0x80000000 0x80008000 result in the same hash with a 50% probability.
> Suggestion: mix the upper bits into the lower ones before multiplying
> as well (with a different shift amount of course).
> You might also want to consider h *= 2*h+m; for mixing low bits into
> upper ones, it's reversible and very nicely non-linear.


Interesting - replacing "h *= m" with "h *= 2*h+m" doesn't slow it
down as much as I feared because the compiler is smart enough to
evaulate 2*h+m using a single 'lea' instruction.

Unfortunately it doesn't fare any better differential-wise - Jenkins'
frog.c fails after 2^23 keys (h*=m gives 2^25 keys).

-Austin

Vend

unread,
Mar 4, 2008, 6:43:39 PM3/4/08
to

If by reversible you mean bi-bijective then your formula isn't.
If h = 0, then for each m, h will stay 0.
And h = 0 can be reached from any h if m = -h / 2

The formula h = 2*h*m + h + m used in CryptoMT, instead, is bi-
bijective.

yar...@gmail.com

unread,
Mar 5, 2008, 12:10:23 AM3/5/08
to

Hmm? You are misunderstanding something. m is an odd non-zero
_constant_. The result is zero only precisely when h is zero.

yar...@gmail.com

unread,
Mar 5, 2008, 9:35:46 AM3/5/08
to
On 5 maalis, 01:43, Vend <ven...@virgilio.it> wrote:

Here's a function for reversing it, which I'm pretty sure should be
enough to prove bijectivity.

#define C 0x9e3779b9
uint32_t rev_nlmix(uint32_t x)
{
uint32_t r=0,z;
int i;
for(i=0;i<32;i++)
{
z =r|(1<<i);
z*=2*z+C;
if(((z^x)&(0xFFFFFFFF>>(32-i-1)))==0)
r|=(1<<i);
}
return r;
}

Running it forwards, then backwards for each possible x (and verifying
the result) can be used to verify it if you feel there's some room for
error.

Vend

unread,
Mar 5, 2008, 2:27:00 PM3/5/08
to

That should have been h = -m / 2

> > The formula h = 2*h*m + h + m used in CryptoMT, instead, is bi-
> > bijective.
>
> Hmm? You are misunderstanding something. m is an odd non-zero
> _constant_. The result is zero only precisely when h is zero.

Yes, I thought that m was the input word, sorry.

Vend

unread,
Mar 5, 2008, 2:35:47 PM3/5/08
to
On 4 Mar, 00:13, Austin Appleby <tanj...@gmail.com> wrote:
> Just finished testing this, and was asked to post it here - This is a
> NON-CRYPTOGRAPHIC hash, suitable only for hash tables and such.
>
> That said, it's about 30% faster than Hsieh's SuperFastHash and
> Jenkins' lookup3 on my Intel Core 2 Duo, while retaining extremely
> good distribution and collision resistance (only 3 collisions on a
> list of 200k words vs. Hsieh's 41). Avalanche bias is also excellent,
> at under 1% for all pairs of input-output bits tested for key lengths
> of 3 to 16 bytes. It's also ridiculously simple. The only weak spot is
> Jenkins' frog.c test, where it fails after 2^25 key pairs (besting
> Hsieh's 2^17 but not Jenkins' 2^63).
>
> Perf results from Hsieh's test app -
>
> CRC32 : 4.7810s
> oneAtATimeHash : 3.5470s
> alphaNumHash : 2.2500s
> FNVHash : 2.1870s
> Jenkins lookup3 : 1.2650s
> SuperFastHash : 1.2970s
> MurmurHash : 0.9060s
>
> There's also a bit more info on my blog here -http://tanjent.livejournal.com/756623.html

> .
>
> This is all public-domain - do whatever you want with it.
>
> -Austin Appleby
>
> //-------------------------------------------------------------------
>
> unsigned int MurmurHash ( const unsigned char * data, int len,
> unsigned int h )
> {
> const unsigned int m = 0x7fd652ad;
> const int r = 16;
>
> h += 0xdeadbeef;

Why do you add a constant instead of assigning a constant? Is this a
bug or is it intentional?

> while(len >= 4)
> {
> h += *(unsigned int *)data;
> h *= m;
> h ^= h >> r;
>
> data += 4;
> len -= 4;
> }
>
> switch(len)
> {
> case 3:
> h += data[2] << 16;
> case 2:
> h += data[1] << 8;
> case 1:
> h += data[0];
> h *= m;
> h ^= h >> r;
> };

The last block routine can cause collisions for some input patterns,
such as 0x00, 0x0000 and 0x000000.
You should consider combining the message length in the hash either at
the begining or at the end.

Vend

unread,
Mar 5, 2008, 2:37:16 PM3/5/08
to

Also the hash value will depend on the machine endian. This might be a
problem if the hashes should be consistent between different
architectures.

Austin Appleby

unread,
Mar 6, 2008, 1:37:29 AM3/6/08
to
The code's been fixed and the updates posted, should make a bit more
sense - http://murmurhash.googlepages.com

On Mar 5, 11:35 am, Vend <ven...@virgilio.it> wrote:
> On 4 Mar, 00:13, Austin Appleby <tanj...@gmail.com> wrote:
>
> Why do you add a constant instead of assigning a constant? Is this a
> bug or is it intentional?
>

Austin Appleby

unread,
Mar 6, 2008, 1:37:53 AM3/6/08
to

I've added a caveat to the code for that restriction.

Reply all
Reply to author
Forward
0 new messages