New hash function "MurmurHash" - beats Hsieh and Jenkins.


Austin Appleby

Mar 3, 2008, 5:54:13 PM
to Hash Functions
Faster than both by about 30% on my machine, with excellent
distribution and minimal collisions. Jenkins' lookup3 beats it at the
frog test (frog.c), but that's about it.

More discussion is available here - http://tanjent.livejournal.com/756623.html


unsigned int MurmurHash ( const unsigned char * data, int len,
                          unsigned int h )
{
    const unsigned int m = 0x7fd652ad;
    const int r = 16;

    h += 0xdeadbeef;

    /* mix four bytes at a time into the hash */
    while(len >= 4)
    {
        h += *(unsigned int *)data;
        h *= m;
        h ^= h >> r;

        data += 4;
        len -= 4;
    }

    /* handle the last few bytes; cases fall through deliberately */
    switch(len)
    {
    case 3:
        h += data[2] << 16;
    case 2:
        h += data[1] << 8;
    case 1:
        h += data[0];
        h *= m;
        h ^= h >> r;
    };

    /* a couple of final mixes to let the last bytes avalanche */
    h *= m;
    h ^= h >> 10;
    h *= m;
    h ^= h >> 17;

    return h;
}

Andres Valloud

Mar 3, 2008, 7:15:39 PM
to hash-fu...@googlegroups.com
Austin,

Assuming my Smalltalk translation of your hash function is correct, the Hash Analysis Tool shows that the iteration of the hash function you mention is vulnerable to sequences of null bytes.  More concretely, it can output colliding hash values for sequences of null bytes for sizes 4k, 4k+1, 4k+2 and 4k+3.  This appears reasonable when one considers what happens in the leftover bytes switch: if all bytes are zero, all the additions become no-ops regardless of the size of the sequence.
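
Something like this minimal driver (a sketch against the C listing above) shows the effect:

#include <stdio.h>

unsigned int MurmurHash ( const unsigned char * data, int len,
                          unsigned int h );  /* the listing above */

int main(void)
{
    const unsigned char zeros[4] = { 0, 0, 0, 0 };

    /* lengths 1 through 4 all print the same hash: the additions in
       the tail switch are no-ops on zero bytes, and one all-zero word
       in the main loop mixes to the same state as the tail's one mix */
    for (int len = 1; len <= 4; ++len)
        printf("len %d: %08x\n", len, MurmurHash(zeros, len, 0));

    return 0;
}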

I'd suggest changing

h += 0xdeadbeef;

to

h += 0xdeadbeef + len;


Note that this improves the behavior and addresses the vast majority of the collisions the Hash Analysis Tool found, but it is not enough because then these sequences collide:

#[0 0 0 0] #[3]

#[0 0] #[1]

#[0 0 0] #[2] #[255 255 255 255]


Using

h += 0xdeadbeef + len * m;


fixes these issues at the cost of a multiplication.  Perhaps there's a cheaper way to do this?

Other than that, the behavior seems to be very good as far as I've tested it.

Thanks,
Andres.

Paolo Bonzini

Mar 4, 2008, 8:06:51 AM
to hash-fu...@googlegroups.com

> switch(len)
> {
> case 3:
>     h += data[2] << 16;
> case 2:
>     h += data[1] << 8;
> case 1:
>     h += data[0];
>     h *= m;
>     h ^= h >> r;
> };

Here are two possible solutions to the problem Andres reported:

1)

case 3:
    h += data[2] << 16;

    h *= m;
    h ^= h >> 10;

case 2:
    h += data[1] << 8;

    h *= m;
    h ^= h >> 17;

case 1:
    h += data[0];

};

/* no mixing here. */

2)

case 3:
    h += (data[2] << 16) + 16777216;  /* 16777216 = 2^24 */
case 2:
    h += (data[1] << 8) + 16777216;
case 1:
    h += data[0] + 16777216;
};

/* mixing as in your function. */
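
Spelled out, option 2 drops into the original function like this (a sketch - the function name is made up, and everything else is the original code with the three 2^24 constants added):

unsigned int MurmurHashTail ( const unsigned char * data, int len,
                              unsigned int h )
{
    const unsigned int m = 0x7fd652ad;
    const int r = 16;

    h += 0xdeadbeef;

    while(len >= 4)
    {
        h += *(unsigned int *)data;
        h *= m;
        h ^= h >> r;

        data += 4;
        len -= 4;
    }

    /* each leftover byte also adds 2^24, so zero tails of different
       lengths now reach different states */
    switch(len)
    {
    case 3:
        h += (data[2] << 16) + 16777216;
    case 2:
        h += (data[1] << 8) + 16777216;
    case 1:
        h += data[0] + 16777216;
        h *= m;
        h ^= h >> r;
    };

    h *= m;
    h ^= h >> 10;
    h *= m;
    h ^= h >> 17;

    return h;
}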


Could you write a version that does unaligned accesses efficiently, as I
hinted in http://reddit.com/r/programming/info/6as1d/comments/c03chal ?

Thanks!

Paolo

Austin Appleby

Mar 5, 2008, 1:23:12 AM
to Hash Functions
I'll work on it. :)

Updates will be at http://murmurhash.googlepages.com

Austin Appleby

Mar 5, 2008, 4:49:58 AM
to Hash Functions
Aligned-read-only version posted to http://murmurhash.googlepages.com/murmurhashaligned.cpp



Andres Valloud

Mar 5, 2008, 5:00:23 AM
to hash-fu...@googlegroups.com
You know... I looked at what you did with h ^= h >> r... and I wondered what happens in the multiplication with the topmost byte of the 4-byte data chunk.  Darn... this is something I should look into.  For example, byte-oriented hash functions usually do:

uint32_t h;

h = (h + byte) * factor;


This is fine because each step expands into

h = h * factor + byte * factor;


and since usually 255 * factor < 2^32, there is no "bit loss" so to speak.  However, as the data chunk being combined grows, this limits the size of the factor one can choose without the hash function becoming biased in favor of the least significant bits of the data chunk being combined.
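
For reference, the byte-at-a-time shape described above looks like this (a sketch; "factor" stands in for whatever multiplier is chosen):

#include <stdint.h>

/* classic byte-at-a-time multiplicative hash: as long as
   255 * factor < 2^32, the byte * factor term never wraps */
static uint32_t byte_hash(const uint8_t *data, int len,
                          uint32_t h, uint32_t factor)
{
    for (int i = 0; i < len; i++)
        h = (h + data[i]) * factor;
    return h;
}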

So really... since at least on x86 it costs the same to do uint32_t = uint32_t * uint32_t and uint64_t = uint32_t * uint32_t, I wonder if this wouldn't be more effective (in x86 asm)...

mov eax, runningHashValue
add eax, nextFourBytesOfData
mov ebx, factor
mul ebx                      ; edx:eax = eax * ebx, the full 64-bit product
xor eax, edx                 ; fold the high half back into the low half
mov runningHashValue, eax

... thus folding all bits into a single register without necessarily introducing least/most significant bit bias (and assuming hashed collections will look at hash values mod p).
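
In C, one round of that asm would be something like the following sketch (names made up; the 64-bit cast is what buys the full-width product):

#include <stdint.h>

/* widen to 64 bits, then XOR the high half of the product back into
   the low half so no bits of the multiplication are discarded */
static uint32_t fold_round(uint32_t h, uint32_t data, uint32_t factor)
{
    uint64_t product = (uint64_t)(h + data) * factor;
    return (uint32_t)product ^ (uint32_t)(product >> 32);
}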

Andres.

Austin Appleby

Mar 5, 2008, 5:33:47 AM
to Hash Functions
Just plugged it into my tools - it seems to avalanche marvelously
well, but it's producing a lot of collisions on the dictionary test
(between 70 and 200 depending on the constant - lookup3 is around 25).
On the other hand, it passes frog.c just fine. Weird.

Can't comment on performance yet, but I'll throw together an asm
version and see what I get.

Austin Appleby

Mar 5, 2008, 5:49:57 AM
to Hash Functions
ASM was the wrong way to go, the compiler optimizes it just fine.

Even so, it's not that fast on my machine - ~1020 megs/sec vs. 1340
for murmur.

Good idea though, might be useful later on.

Paolo Bonzini

Mar 5, 2008, 5:52:33 AM
to hash-fu...@googlegroups.com
Austin Appleby wrote:
> ASM was the wrong way to go, the compiler optimizes it just fine.

Sure it does? All GCCs before 4.3.0 kinda suck at register allocation
for 64-bit values...

Paolo

Andres Valloud

Mar 5, 2008, 5:54:13 AM
to hash-fu...@googlegroups.com
Hmmmm... also, for when that becomes useful... doing such a thing should use a full 32-bit factor to make sure one actually gets 64-bit multiplication results... otherwise the xor only changes the lower (e.g.) 16 bits of eax...

Darn!  Homework, homework and more homework!!!

Andres.

Andres Valloud

Mar 5, 2008, 6:08:06 AM
to hash-fu...@googlegroups.com
I can't help thinking about this, and I feel the need to run more experiments... for example, if the factor is 32 bits, then the bits of the data will be present in edx without rotation (plus all the additions and carries... consider the extreme case of factor = 2^31)... so perhaps a rotation before combining into eax is worth considering...

I see it coming... the hash book will hit 500 pages any time now :).

Andres.

Austin Appleby

Mar 5, 2008, 8:04:37 PM
to Hash Functions
I got a bit of additional speed by tweaking the compiler's version -

__asm
{
    mov eax, h
    mov esi, data
    mov edi, blocks
    mov ecx, 7FD652ADh

start:
    mov ebx, dword ptr [esi]   ; load the next 4-byte block
    add eax, ebx
    mul ecx                    ; edx:eax = eax * ecx
    add esi, 4
    xor eax, edx               ; fold the high half of the product back in
    sub edi, 1
    jne start

    mov h, eax
};

which brings it up to 1280-1300 megs/sec. Still more collidey than
expected though.
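
For reference, the loop corresponds to roughly this C (a sketch, using the same folding step described earlier in the thread):

#include <stdint.h>

/* one widening multiply per 4-byte block, with the high half of the
   product folded back in via XOR */
static uint32_t fold_loop(const unsigned char *data, int blocks,
                          uint32_t h)
{
    const uint32_t m = 0x7FD652AD;

    while (blocks--)
    {
        uint64_t p = (uint64_t)(h + *(const uint32_t *)data) * m;
        h = (uint32_t)p ^ (uint32_t)(p >> 32);
        data += 4;
    }
    return h;
}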




Andrew M.

Mar 6, 2008, 2:14:43 AM
to Hash Functions
> which brings it up to 1280-1300 megs/sec. Still more collidey than
> expected though.

Which dataset are you seeing more collisions on? With Andres' folding
trick, I see a significant speedup on both of my machines (1GHz
Tbird / 3200+ AMD64), and the collisions are down and the distribution
is just a touch better on average.

Austin Appleby

Mar 6, 2008, 5:28:51 AM
to Hash Functions
I'm using a dictionary of 470k English words & variants from SCOWL.

Avalanche-wise it seems fine; the collisions are the weird bit -
the expected number of collisions is around 26, but I'm seeing 50-60,
which coincidentally is the same number I'm getting from a good ol'
multiplicative hash, so I'm a bit wary.

What sort of perf numbers are you seeing for the different versions?
Also, what sort of keys are you testing against?

Andres Valloud

Mar 6, 2008, 7:26:33 PM
to hash-fu...@googlegroups.com
Just to make sure... and from something that happened to me... I tested MD5 against a dictionary of words, and I found something like 46 collisions.  I couldn't believe it!  So I went in and checked: the word list had exactly 46 duplicates...

... ah.  So much for the word list being duplicate-free...

Just in case, you might want to run MD5 on the same dictionary :).

Andres.

Austin Appleby

Mar 6, 2008, 10:58:30 PM
to Hash Functions
I print out the collisions after I've found them, so that's not the
case. :)



Andres Valloud

Mar 6, 2008, 11:05:12 PM
to hash-fu...@googlegroups.com
Hehe :)... what was not fun was cleaning up the dictionaries, because they were encoded with a leading digit giving the number of characters repeated from the previous word, followed by the non-repeating characters... i.e.:

Small
5talk
9ers
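
A little decoder makes the format concrete (a sketch, assuming a single leading decimal digit and well-formed input):

#include <stdio.h>
#include <string.h>
#include <ctype.h>

/* expand the prefix compression: "Small", "5talk", "9ers"
   decodes to "Small", "Smalltalk", "Smalltalkers" */
int main(void)
{
    char line[256], word[256] = "";

    while (fgets(line, sizeof line, stdin))
    {
        line[strcspn(line, "\r\n")] = '\0';

        size_t keep = 0;
        const char *suffix = line;

        if (isdigit((unsigned char)line[0]))
        {
            keep = (size_t)(line[0] - '0');  /* chars reused from the previous word */
            suffix = line + 1;
        }

        word[keep] = '\0';     /* keep that many chars of the previous word */
        strcat(word, suffix);  /* append the new tail */
        puts(word);
    }

    return 0;
}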

Andres.

Andrew M.

Mar 7, 2008, 5:02:17 AM
to Hash Functions


I have a SCOWL dictionary in my tests as well, but I've got 650k words
in mine. Ignore the elapsed times; the app isn't written for speed,
but I include them anyway as a general overview:

elapsed for bj-lookup3/scowl-dict 469, collisions 60, chisq 0.953436, active bins 488640, items 658244
elapsed for SHA1/scowl-dict 859, collisions 69, chisq 0.952829, active bins 488491, items 658244
elapsed for murmur/scowl-dict 453, collisions 45, chisq 0.950456, active bins 488930, items 658244

As for the aggregate across all of the input files I test, murmur is
#1 with prime sized tables and is neck and neck with the Python hash
(FNV variant) with pow2 sized tables.

I just use Hsieh's app for the speed tests (not the greatest, I know).

1GHz Tbird Linux, so no SFHAsm:

SuperFastHash : 2.5100s
MurmurHash : 2.7600s
MurmurHash2 : 2.4800s


AMD64 3200+ WinXP:

SFHAsm : 0.9840s
SuperFastHash : 1.5310s
MurmurHash : 1.0310s
MurmurHash2 : 0.8750s

Andrew M.

Mar 7, 2008, 5:20:06 AM
to Hash Functions

> elapsed for bj-lookup3/scowl-dict 469, collisions 60, chisq 0.953436, active bins 488640, items 658244
> elapsed for SHA1/scowl-dict 859, collisions 69, chisq 0.952829, active bins 488491, items 658244
> elapsed for murmur/scowl-dict 453, collisions 45, chisq 0.950456, active bins 488930, items 658244


Those results were using the 64-bit folding trick; this is default murmur:

elapsed for murmur/scowl-dict 453, collisions 66, chisq 0.951742, active bins 488870, items 658244

Austin Appleby

Mar 7, 2008, 5:53:58 AM
to Hash Functions
The collision numbers you're seeing are all within reason for your
dictionary size (expected # of collisions for 650k words is ~50.5).
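
(That's the birthday estimate: assuming a uniform 32-bit hash, expected
collisions ≈ n(n-1)/2 / 2^32 = 658244 * 658243 / 2^33 ≈ 50.4.)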

The perf numbers are interesting - rather in line with what some other
folks have guessed: modern multipliers are fast enough to beat a
series of shifts, adds, and xors, but that wasn't the case a CPU
generation or two ago.

I'll look at the collisions on my end again tomorrow, I'm waaaay past
my bedtime.