Design discuss: Switch hashmap implementation to SipHash-2-4

Matthew Dempsky

unread,

Jan 3, 2013, 9:54:45 PM1/3/13

to golang-dev, David Symonds, ia...@golang.org

SipHash-2-4 is a cryptographic hash function designed by Jean-Philippe Aumasson and Daniel J. Bernstein. More information available at https://131002.net/siphash/.

I propose that Go's hashmaps switch from its current (apparently collision prone) hash function to SipHash-2-4, as suggested by its authors. (See section 7 of their research paper.) Alternatively, make it available as a build-time option like the race detector, or even a run-time option.

As listed on the SipHash website, other languages like Perl 5, Ruby, and Rust have already switched to using SipHash-2-4 for their hash table implementations, as well as some other software packages and vendors.

For performance measurements of SipHash on various platforms, you can see the eBACS website: http://bench.cr.yp.to/results-auth.html (NB: These measurements try a number of different implementations along with a number of different compilers and optimization settings, and report the best for each implementation. However, that said, SipHash is fairly simple, and the only notable discrepency on x86-64 platforms that I could see between GCC's and cc's outputs is that the latter doesn't emit ROTQ instructions. I haven't compared x86 output.)

Finally, I've put together a proof-of-concept patch for this at https://codereview.appspot.com/7029053/, but I don't claim that to be the best implementation approach.

Discuss.

Patrick Mylund Nielsen

unread,

Jan 3, 2013, 10:28:46 PM1/3/13

to Matthew Dempsky, golang-dev, David Symonds, ia...@golang.org

This is issue 4604: http://code.google.com/p/go/issues/detail?id=4604

I agree that we should switch if the benchmarks turn out to be competitive. Also agree with Russ that it's not super-urgent (i.e. not for Go 1.1.)

Matthew Dempsky

unread,

Jan 3, 2013, 10:52:22 PM1/3/13

to Patrick Mylund Nielsen, golang-dev, David Symonds, iant

Below is a quick benchmark program I wrote, and here are the results on my x86-64 desktop. There appears to be about a 30% slow down for short int-sized keys, but it decreases to about 5% slow down at 20-byte string keys, and actually a 10% speed up at 100-byte string keys.

Note that if cc could optimize the rotations into a ROTQ instruction, I expect SipHash-2-4's performance would improve some, but I don't know how much. Currently each rotation gets compiled into 6 instructions rather than just one. Asymptotically, SipHash-2-4 needs 1.5 rotations per byte hashed, as well as 24 rotations per invocation for finalization.

I tried changing the ROTATE() macro to something that cc could better optimize like just "x = x << 1" or "x = x >> 1" for benchmarking purposes, but the resulting hashes all seemed consistently slower. I presume this is because their distribution ends up worse.

default:

$ ./mapbench.test -test.bench=".*"

testing: warning: no tests to run

PASS

BenchmarkIntMap10 500000 4118 ns/op

BenchmarkIntMap100 50000 45364 ns/op

BenchmarkIntMap10000 500 5057292 ns/op

BenchmarkIntMap1000000 5 805614711 ns/op

BenchmarkStringMap10 500000 6785 ns/op

BenchmarkStringMap100 20000 77035 ns/op

BenchmarkStringMap10000 500 5319521 ns/op

BenchmarkStringMap1000000 5 495649046 ns/op

BenchmarkStringMap10Medium 200000 9991 ns/op

BenchmarkStringMap10Large 100000 18932 ns/op

BenchmarkStringMap100Medium 10000 100578 ns/op

BenchmarkStringMap100Large 10000 188233 ns/op

SipHash-2-4:

$ ./mapbench.test -test.bench=".*"

testing: warning: no tests to run

PASS

BenchmarkIntMap10 500000 5419 ns/op

BenchmarkIntMap100 50000 57732 ns/op

BenchmarkIntMap10000 500 6132868 ns/op

BenchmarkIntMap1000000 5 905450211 ns/op

BenchmarkStringMap10 200000 8178 ns/op

BenchmarkStringMap100 20000 89061 ns/op

BenchmarkStringMap10000 500 6330882 ns/op

BenchmarkStringMap1000000 5 614827912 ns/op

BenchmarkStringMap10Medium 200000 10484 ns/op

BenchmarkStringMap10Large 100000 17030 ns/op

BenchmarkStringMap100Medium 10000 105176 ns/op

BenchmarkStringMap100Large 10000 168942 ns/op

The percentage differences respectively are: +31.6%, +27.3%, +21.3%, +12.4%, +20.5%, +15.6%, +19.0%, +24.0%, +4.9%, -10.0%, +4.6%, -10.2%

(Positive means SipHash-2-4 is slower than default, while negative means it's faster.)

package mapbench_test

import (

"testing"

)

func benchmarkIntMapN(b *testing.B, m int) {

for i := 0; i < b.N; i++ {

mup := make(map[int]bool)

for j := 0; j < m; j++ {

mup[j] = true

}

for j := 0; j < m; j++ {

if !mup[j] {

b.Fatalf("key %d missing", j)

}

for j := 0; j < m; j++ {

delete(mup, j)

}

func BenchmarkIntMap10(b *testing.B) {

benchmarkIntMapN(b, 10)

}

func BenchmarkIntMap100(b *testing.B) {

benchmarkIntMapN(b, 100)

}

func BenchmarkIntMap10000(b *testing.B) {

benchmarkIntMapN(b, 10000)

}

func BenchmarkIntMap1000000(b *testing.B) {

benchmarkIntMapN(b, 1000000)

}

func benchmarkStringMapN(b *testing.B, f int, m int) {

for i := 0; i < b.N; i++ {

buf := make([]byte, f)

mup := make(map[string]bool)

for j := 0; j < m; j++ {

for k := 0; k < f; {

if buf[k]++; buf[k] != 0 {

break

}

mup[string(buf)] = true

}

buf = make([]byte, f)

for j := 0; j < m; j++ {

for k := 0; k < f; {

if buf[k]++; buf[k] != 0 {

break

}

if !mup[string(buf)] {

b.Fatalf("key %d missing", j)

}

buf = make([]byte, f)

for j := 0; j < m; j++ {

for k := 0; k < f; {

if buf[k]++; buf[k] != 0 {

break

}

delete(mup, string(buf))

}

func BenchmarkStringMap10(b *testing.B) {

benchmarkStringMapN(b, 2, 10)

}

func BenchmarkStringMap100(b *testing.B) {

benchmarkStringMapN(b, 4, 100)

}

func BenchmarkStringMap10000(b *testing.B) {

benchmarkStringMapN(b, 6, 10000)

}

func BenchmarkStringMap1000000(b *testing.B) {

benchmarkStringMapN(b, 8, 1000000)

}

func BenchmarkStringMap10Medium(b *testing.B) {

benchmarkStringMapN(b, 20, 10)

}

func BenchmarkStringMap10Large(b *testing.B) {

benchmarkStringMapN(b, 100, 10)

}

func BenchmarkStringMap100Medium(b *testing.B) {

benchmarkStringMapN(b, 20, 100)

}

func BenchmarkStringMap100Large(b *testing.B) {

benchmarkStringMapN(b, 100, 100)

}

David Symonds

unread,

Jan 3, 2013, 10:54:07 PM1/3/13

to Matthew Dempsky, Patrick Mylund Nielsen, golang-dev, iant

On Fri, Jan 4, 2013 at 2:52 PM, Matthew Dempsky <mdem...@google.com> wrote:

> The percentage differences respectively are: +31.6%, +27.3%, +21.3%, +12.4%,
> +20.5%, +15.6%, +19.0%, +24.0%, +4.9%, -10.0%, +4.6%, -10.2%

Use $GOROOT/misc/benchcmp to render the benchmark results.

Dave Cheney

unread,

Jan 3, 2013, 10:57:38 PM1/3/13

to David Symonds, Matthew Dempsky, Patrick Mylund Nielsen, golang-dev, iant

As an initial observation 30% is a very high penalty to pay. Hopefully
this could be improved, but I suspect the similar improvement would
flow through to the existing hashmap code.

https://code.google.com/p/go/issues/detail?id=3885

> Use $GOROOT/misc/benchcmp to render the benchmark results.

This is the rough procedure that I use.

go test -c
mv $PKG.test $PKG.golden
./$PKG.golden -test.run=XXX -test.bench=. > old.txt // this is your baseline
// make changes
go test -run=XXX -bench=. > new.txt
$GOROOT/misc/benchcmp {old,new}.txt

Matthew Dempsky

unread,

Jan 3, 2013, 11:48:17 PM1/3/13

to Dave Cheney, David Symonds, Patrick Mylund Nielsen, golang-dev, iant

On Thu, Jan 3, 2013 at 7:57 PM, Dave Cheney <da...@cheney.net> wrote:

As an initial observation 30% is a very high penalty to pay. Hopefully
this could be improved, but I suspect the similar improvement would
flow through to the existing hashmap code.

Agreed. I'm hopeful now though that if cc could optimize ROTQ instructions, then hash performance would be more competitive. E.g., below I added an explicit optimization for 8-byte keys (just unrolling the loops) and also benchmarked switching to SipHash-1-1. SipHash-1-1 is actually about the same performance as the current int hashing code.

For comparison: When hashing an 8-byte key, SipHash-A-B needs 2A+B rounds. Each round has 6 rotations, 4 adds, and 4 xors. cc compiles a rotation into 6 instructions, but it should be doable in 1. So SipHash-1-1 currently compiled needs 3*(6*6+4+4)==132 instructions, whereas if cc emitted ROTQ instructions, SipHash-2-4 should only need 8*(6+4+4)==112 instructions (i.e., even faster than sip11 below).

I haven't tried optimizing any other key sizes yet though.

https://code.google.com/p/go/issues/detail?id=3885

> Use $GOROOT/misc/benchcmp to render the benchmark results.

This is the rough procedure that I use.

go test -c
mv $PKG.test $PKG.golden
./$PKG.golden -test.run=XXX -test.bench=. > old.txt // this is your baseline
// make changes
go test -run=XXX -bench=. > new.txt
$GOROOT/misc/benchcmp {old,new}.txt

Thanks.

SipHash-2-4 with optimization for 8-byte keys:

$ ~/wd/go/misc/benchcmp {old,sip24}.txt

benchmark old ns/op new ns/op delta

BenchmarkIntMap10 4121 5005 +21.45%

BenchmarkIntMap100 45266 53820 +18.90%

BenchmarkIntMap10000 5054063 5898136 +16.70%

BenchmarkIntMap1000000 791751871 874390644 +10.44%

BenchmarkStringMap10 6728 7982 +18.64%

BenchmarkStringMap100 76587 86836 +13.38%

BenchmarkStringMap10000 5342200 6163102 +15.37%

BenchmarkStringMap1000000 501381800 581276514 +15.93%

BenchmarkStringMap10Medium 9992 10223 +2.31%

BenchmarkStringMap10Large 18920 16733 -11.56%

BenchmarkStringMap100Medium 100566 102723 +2.14%

BenchmarkStringMap100Large 190037 166530 -12.37%

$ ~/wd/go/misc/benchcmp {old,sip11}.txt

benchmark old ns/op new ns/op delta

BenchmarkIntMap10 4121 4193 +1.75%

BenchmarkIntMap100 45266 45343 +0.17%

BenchmarkIntMap10000 5054063 5019093 -0.69%

BenchmarkIntMap1000000 791751871 795529325 +0.48%

BenchmarkStringMap10 6728 8093 +20.29%

BenchmarkStringMap100 76587 88515 +15.57%

BenchmarkStringMap10000 5342200 6347534 +18.82%

BenchmarkStringMap1000000 501381800 504984998 +0.72%

BenchmarkStringMap10Medium 9992 10465 +4.73%

BenchmarkStringMap10Large 18920 17184 -9.18%

BenchmarkStringMap100Medium 100566 105245 +4.65%

BenchmarkStringMap100Large 190037 169741 -10.68%

Matthew Dempsky

unread,

Jan 3, 2013, 11:49:17 PM1/3/13

to Dave Cheney, David Symonds, Patrick Mylund Nielsen, golang-dev, iant

Oops, hopefully it was implied well enough, but sip11.txt is SipHash-1-1 with 8-byte key optimizations too.

Dave Cheney

unread,

Jan 3, 2013, 11:52:52 PM1/3/13

to Matthew Dempsky, David Symonds, Patrick Mylund Nielsen, golang-dev, iant

Please remember that Go supports more than just x86-64.

Patrick Mylund Nielsen

unread,

Jan 4, 2013, 12:04:44 AM1/4/13

to Dave Cheney, Matthew Dempsky, David Symonds, golang-dev, iant

FWIW Perl uses SipHash on x64 and Jenkins one-at-a-time (http://en.wikipedia.org/wiki/Jenkins_hash_function#one-at-a-time) elsewhere (http://perl5.git.perl.org/perl.git/commit/3db6cbfca39da94d152d3e860e2aa79b9c6bb161 / http://perl5.git.perl.org/perl.git/blob/HEAD:/hv.h) suggesting that SipHash had unacceptable performance consequences on 32-bit.

It would be interesting to see how Siphash performs on x86/ARM vs. one-at-a-time. Jenkins' isn't nearly as resilient as Siphash, of course, but beats FNV.

Matthew Dempsky

unread,

Jan 4, 2013, 12:05:56 AM1/4/13

to Dave Cheney, David Symonds, Patrick Mylund Nielsen, golang-dev, iant

On Thu, Jan 3, 2013 at 8:52 PM, Dave Cheney <da...@cheney.net> wrote:

Please remember that Go supports more than just x86-64.

Yep, which seems to lend credence to the idea of making it a build flag to indicate which runtime hashing implementation should be linked in.

David Symonds

unread,

Jan 4, 2013, 12:11:37 AM1/4/13

to Matthew Dempsky, Dave Cheney, Patrick Mylund Nielsen, golang-dev, iant

That would only be true if it needs to change for different builds.
There's no big problem having different hash algorithms for different
targets be chosen a priori, just as different syscalls are used to
implement various OS services.

Patrick Mylund Nielsen

unread,

Jan 4, 2013, 12:16:53 AM1/4/13

to Matthew Dempsky, Dave Cheney, David Symonds, golang-dev, iant

IMO Go should make that choice depending on the arch, e.g. if SipHash only reaches an acceptable level on x64, only use it on x64. This seems like something where there's only really one right choice per arch, so exposing toggles will just enable a "best optimal compile time options for Go" culture vs. "Go makes the right choice."

I guess a user might actually choose to have much slower maps and be better protected against collision attacks, but these are probably a minority. *shrug* Seeding FNV does make it a fair bit harder, after all.

Matthew Dempsky

unread,

Jan 4, 2013, 2:13:53 AM1/4/13

to Patrick Mylund Nielsen, Dave Cheney, David Symonds, golang-dev, iant

FWIW, I updated https://codereview.appspot.com/7029053/ to conditionally use SipHash-2-4 only on amd64 targets, and the existing hash code on !amd64.

Here's the previous benchmark ran comparing amd64 performance using alg_simple.c vs alg_siphash.c.

$ ~/wd/go/misc/benchcmp {simple,siphash}.txt

benchmark old ns/op new ns/op delta

BenchmarkIntMap10 4283 5150 +20.24%

BenchmarkIntMap100 45561 54138 +18.83%

BenchmarkIntMap10000 5058276 5986396 +18.35%

BenchmarkIntMap1000000 793643784 877129665 +10.52%

BenchmarkStringMap10 6868 8034 +16.98%

BenchmarkStringMap100 76513 88068 +15.10%

BenchmarkStringMap10000 5215703 6295222 +20.70%

BenchmarkStringMap1000000 499790742 583745043 +16.80%

BenchmarkStringMap10Medium 9985 10352 +3.68%

BenchmarkStringMap10Large 19047 16894 -11.30%

BenchmarkStringMap100Medium 100201 103100 +2.89%

BenchmarkStringMap100Large 189018 166839 -11.73%

(simple's performance looks comparable to unpatched)

Russ Cox

unread,

Jan 4, 2013, 10:09:37 AM1/4/13

to Matthew Dempsky, golang-dev, David Symonds, Ian Taylor

Thank you for looking into this.

As we discussed on issue 4604, I believe there is no urgent need to
change the hash function we're using for maps today. I would be happy
to revisit this later this year, after Go 1.1 is out, but the current
hash function is (modulo a bug that Ian has sent a fix for) working
fine. Changing the hash function is something I would want significant
experience with, and Go 1.1 is O(months) away. It is more likely that
rewriting the hash code will introduce bugs than it is that the
rewrite will fix an actual bug, especially since the attacks we're
talking about are theoretical.

Russ

Matthew Dempsky

unread,

Jan 4, 2013, 3:20:19 PM1/4/13

to Russ Cox, golang-dev, David Symonds, Ian Taylor

Fair enough.

One last benchmark before I move on for now, I hacked 6c to compile left shifts by 0x52000000+x as a left rotate by x (it's undefined behavior anyway ;)), and patched alg_siphash.c to use "#define ROTATE(x,b) x <<= (('R'<<24)+b)". The resulting hash performance seems reasonably competitive now:

$ ~/wd/go/misc/benchcmp {golden,siphash}.txt

benchmark old ns/op new ns/op delta

BenchmarkIntMap10 4114 4292 +4.33%

BenchmarkIntMap100 45748 45552 -0.43%

BenchmarkIntMap10000 5011986 5056786 +0.89%

BenchmarkIntMap1000000 796641724 798138767 +0.19%

BenchmarkStringMap10 6759 7184 +6.29%

BenchmarkStringMap100 76554 80031 +4.54%

BenchmarkStringMap10000 5282694 5498818 +4.09%

BenchmarkStringMap1000000 493529322 491760921 -0.36%

BenchmarkStringMap10Medium 10015 9511 -5.03%

BenchmarkStringMap10Large 18949 14088 -25.65%

BenchmarkStringMap100Medium 100896 91225 -9.59%

BenchmarkStringMap100Large 188706 139422 -26.12%

Patrick Mylund Nielsen

unread,

Jan 4, 2013, 3:25:24 PM1/4/13

to Matthew Dempsky, Russ Cox, golang-dev, David Symonds, Ian Taylor

Wow, that's much better. These were the kinds of numbers I was hoping for. Is that a change that could go into the gc compiler?

minux

unread,

Jan 4, 2013, 3:26:47 PM1/4/13

to Matthew Dempsky, Russ Cox, golang-dev, David Symonds, Ian Taylor

On Sat, Jan 5, 2013 at 4:20 AM, Matthew Dempsky <mdem...@google.com> wrote:

One last benchmark before I move on for now, I hacked 6c to compile left shifts by 0x52000000+x as a left rotate by x (it's undefined behavior anyway ;)), and patched alg_siphash.c to use "#define ROTATE(x,b) x <<= (('R'<<24)+b)". The resulting hash performance seems reasonably competitive now:

i think this result is much more acceptable.

Several months ago, Russ has made gc recognize the rotate pattern and use rotate instructions if available,

and we can do the same for cc (or just add a new ROT intrinsic if that's too big a change for cc?).

Matthew Dempsky

unread,

Jan 4, 2013, 3:38:07 PM1/4/13

to Patrick Mylund Nielsen, Russ Cox, golang-dev, David Symonds, Ian Taylor

On Fri, Jan 4, 2013 at 12:25 PM, Patrick Mylund Nielsen <pat...@patrickmylund.com> wrote:

Is that a change that could go into the gc compiler?

As a quirky "worse-is-better" implementation detail, I don't see anything wrong with hacking 6c to act this way, but it's a pretty ugly hack just the same IMO so it's not something I'd really feel good about recommending.

It would probably be slightly better if cgen.c could recognize expressions like "(x << 3) | (x >> 61)" and rewrite them as "(x << 0x52000003)" so that alg_siphash.c didn't have to depend on the implementation detail.

One more issue though is that 6c doesn't peep hole optimize MOV instructions around shift or rotate instructions. E.g., if x is allocated to register DX, "x = x << 3;" will emit:

(tweet.c:4) MOVL DX,!!AX

(tweet.c:4) SALL $3,!!AX

(tweet.c:4) MOVL AX,!!DX

You need to instead write "x <<= 3" to avoid the extra register shuffling, so cgen.c might actually have to instead recognize "x = (x << 3) | (x >> 61)" as a whole, or the peep hole optimizer needs to be smarter.

Reply all

Reply to author

Forward