Design discuss: Switch hashmap implementation to SipHash-2-4

580 views
Skip to first unread message

Matthew Dempsky

unread,
Jan 3, 2013, 9:54:45 PM1/3/13
to golang-dev, David Symonds, ia...@golang.org
SipHash-2-4 is a cryptographic hash function designed by Jean-Philippe Aumasson and Daniel J. Bernstein.  More information available at https://131002.net/siphash/.

I propose that Go's hashmaps switch from its current (apparently collision prone) hash function to SipHash-2-4, as suggested by its authors.  (See section 7 of their research paper.)  Alternatively, make it available as a build-time option like the race detector, or even a run-time option.

As listed on the SipHash website, other languages like Perl 5, Ruby, and Rust have already switched to using SipHash-2-4 for their hash table implementations, as well as some other software packages and vendors.

For performance measurements of SipHash on various platforms, you can see the eBACS website: http://bench.cr.yp.to/results-auth.html  (NB: These measurements try a number of different implementations along with a number of different compilers and optimization settings, and report the best for each implementation.  However, that said, SipHash is fairly simple, and the only notable discrepency on x86-64 platforms that I could see between GCC's and cc's outputs is that the latter doesn't emit ROTQ instructions.  I haven't compared x86 output.)

Finally, I've put together a proof-of-concept patch for this at https://codereview.appspot.com/7029053/, but I don't claim that to be the best implementation approach.

Discuss.

Patrick Mylund Nielsen

unread,
Jan 3, 2013, 10:28:46 PM1/3/13
to Matthew Dempsky, golang-dev, David Symonds, ia...@golang.org
This is issue 4604: http://code.google.com/p/go/issues/detail?id=4604

I agree that we should switch if the benchmarks turn out to be competitive. Also agree with Russ that it's not super-urgent (i.e. not for Go 1.1.)

Matthew Dempsky

unread,
Jan 3, 2013, 10:52:22 PM1/3/13
to Patrick Mylund Nielsen, golang-dev, David Symonds, iant
Below is a quick benchmark program I wrote, and here are the results on my x86-64 desktop.  There appears to be about a 30% slow down for short int-sized keys, but it decreases to about 5% slow down at 20-byte string keys, and actually a 10% speed up at 100-byte string keys.

Note that if cc could optimize the rotations into a ROTQ instruction, I expect SipHash-2-4's performance would improve some, but I don't know how much.  Currently each rotation gets compiled into 6 instructions rather than just one.  Asymptotically, SipHash-2-4 needs 1.5 rotations per byte hashed, as well as 24 rotations per invocation for finalization.

I tried changing the ROTATE() macro to something that cc could better optimize like just "x = x << 1" or "x = x >> 1" for benchmarking purposes, but the resulting hashes all seemed consistently slower.  I presume this is because their distribution ends up worse.

default:

$ ./mapbench.test -test.bench=".*"
testing: warning: no tests to run
PASS
BenchmarkIntMap10   500000       4118 ns/op
BenchmarkIntMap100    50000      45364 ns/op
BenchmarkIntMap10000      500    5057292 ns/op
BenchmarkIntMap1000000        5  805614711 ns/op
BenchmarkStringMap10   500000       6785 ns/op
BenchmarkStringMap100    20000      77035 ns/op
BenchmarkStringMap10000      500    5319521 ns/op
BenchmarkStringMap1000000        5  495649046 ns/op
BenchmarkStringMap10Medium   200000       9991 ns/op
BenchmarkStringMap10Large   100000      18932 ns/op
BenchmarkStringMap100Medium    10000     100578 ns/op
BenchmarkStringMap100Large    10000     188233 ns/op

SipHash-2-4:

$ ./mapbench.test -test.bench=".*"
testing: warning: no tests to run
PASS
BenchmarkIntMap10  500000      5419 ns/op
BenchmarkIntMap100   50000     57732 ns/op
BenchmarkIntMap10000     500   6132868 ns/op
BenchmarkIntMap1000000       5 905450211 ns/op
BenchmarkStringMap10  200000      8178 ns/op
BenchmarkStringMap100   20000     89061 ns/op
BenchmarkStringMap10000     500   6330882 ns/op
BenchmarkStringMap1000000       5 614827912 ns/op
BenchmarkStringMap10Medium  200000     10484 ns/op
BenchmarkStringMap10Large  100000     17030 ns/op
BenchmarkStringMap100Medium   10000    105176 ns/op
BenchmarkStringMap100Large   10000    168942 ns/op


The percentage differences respectively are: +31.6%, +27.3%, +21.3%, +12.4%, +20.5%, +15.6%, +19.0%, +24.0%, +4.9%, -10.0%, +4.6%, -10.2%

(Positive means SipHash-2-4 is slower than default, while negative means it's faster.)


package mapbench_test

import (
"testing"
)

func benchmarkIntMapN(b *testing.B, m int) {
for i := 0; i < b.N; i++ {
mup := make(map[int]bool)
for j := 0; j < m; j++ {
mup[j] = true
}
for j := 0; j < m; j++ {
if !mup[j] {
b.Fatalf("key %d missing", j)
}
}
for j := 0; j < m; j++ {
delete(mup, j)
}
}
}

func BenchmarkIntMap10(b *testing.B) {
benchmarkIntMapN(b, 10)
}

func BenchmarkIntMap100(b *testing.B) {
benchmarkIntMapN(b, 100)
}

func BenchmarkIntMap10000(b *testing.B) {
benchmarkIntMapN(b, 10000)
}

func BenchmarkIntMap1000000(b *testing.B) {
benchmarkIntMapN(b, 1000000)
}

func benchmarkStringMapN(b *testing.B, f int, m int) {
for i := 0; i < b.N; i++ {
buf := make([]byte, f)
mup := make(map[string]bool)
for j := 0; j < m; j++ {
for k := 0; k < f; {
if buf[k]++; buf[k] != 0 {
break
}
}
mup[string(buf)] = true
}
buf = make([]byte, f)
for j := 0; j < m; j++ {
for k := 0; k < f; {
if buf[k]++; buf[k] != 0 {
break
}
}
if !mup[string(buf)] {
b.Fatalf("key %d missing", j)
}
}
buf = make([]byte, f)
for j := 0; j < m; j++ {
for k := 0; k < f; {
if buf[k]++; buf[k] != 0 {
break
}
}
delete(mup, string(buf))
}
}
}

func BenchmarkStringMap10(b *testing.B) {
benchmarkStringMapN(b, 2, 10)
}

func BenchmarkStringMap100(b *testing.B) {
benchmarkStringMapN(b, 4, 100)
}

func BenchmarkStringMap10000(b *testing.B) {
benchmarkStringMapN(b, 6, 10000)
}

func BenchmarkStringMap1000000(b *testing.B) {
benchmarkStringMapN(b, 8, 1000000)
}

func BenchmarkStringMap10Medium(b *testing.B) {
benchmarkStringMapN(b, 20, 10)
}

func BenchmarkStringMap10Large(b *testing.B) {
benchmarkStringMapN(b, 100, 10)
}

func BenchmarkStringMap100Medium(b *testing.B) {
benchmarkStringMapN(b, 20, 100)
}

func BenchmarkStringMap100Large(b *testing.B) {
benchmarkStringMapN(b, 100, 100)
}

David Symonds

unread,
Jan 3, 2013, 10:54:07 PM1/3/13
to Matthew Dempsky, Patrick Mylund Nielsen, golang-dev, iant
On Fri, Jan 4, 2013 at 2:52 PM, Matthew Dempsky <mdem...@google.com> wrote:

> The percentage differences respectively are: +31.6%, +27.3%, +21.3%, +12.4%,
> +20.5%, +15.6%, +19.0%, +24.0%, +4.9%, -10.0%, +4.6%, -10.2%

Use $GOROOT/misc/benchcmp to render the benchmark results.

Dave Cheney

unread,
Jan 3, 2013, 10:57:38 PM1/3/13
to David Symonds, Matthew Dempsky, Patrick Mylund Nielsen, golang-dev, iant
As an initial observation 30% is a very high penalty to pay. Hopefully
this could be improved, but I suspect the similar improvement would
flow through to the existing hashmap code.

https://code.google.com/p/go/issues/detail?id=3885

> Use $GOROOT/misc/benchcmp to render the benchmark results.

This is the rough procedure that I use.

go test -c
mv $PKG.test $PKG.golden
./$PKG.golden -test.run=XXX -test.bench=. > old.txt // this is your baseline
// make changes
go test -run=XXX -bench=. > new.txt
$GOROOT/misc/benchcmp {old,new}.txt

Matthew Dempsky

unread,
Jan 3, 2013, 11:48:17 PM1/3/13
to Dave Cheney, David Symonds, Patrick Mylund Nielsen, golang-dev, iant
On Thu, Jan 3, 2013 at 7:57 PM, Dave Cheney <da...@cheney.net> wrote:
As an initial observation 30% is a very high penalty to pay. Hopefully
this could be improved, but I suspect the similar improvement would
flow through to the existing hashmap code.

Agreed.  I'm hopeful now though that if cc could optimize ROTQ instructions, then hash performance would be more competitive.  E.g., below I added an explicit optimization for 8-byte keys (just unrolling the loops) and also benchmarked switching to SipHash-1-1.  SipHash-1-1 is actually about the same performance as the current int hashing code.

For comparison: When hashing an 8-byte key, SipHash-A-B needs 2A+B rounds.  Each round has 6 rotations, 4 adds, and 4 xors.  cc compiles a rotation into 6 instructions, but it should be doable in 1.  So SipHash-1-1 currently compiled needs 3*(6*6+4+4)==132 instructions, whereas if cc emitted ROTQ instructions, SipHash-2-4 should only need 8*(6+4+4)==112 instructions (i.e., even faster than sip11 below).

I haven't tried optimizing any other key sizes yet though.

https://code.google.com/p/go/issues/detail?id=3885

> Use $GOROOT/misc/benchcmp to render the benchmark results.

This is the rough procedure that I use.

go test -c
mv $PKG.test $PKG.golden
./$PKG.golden -test.run=XXX -test.bench=. > old.txt // this is your baseline
// make changes
go test -run=XXX -bench=. > new.txt
$GOROOT/misc/benchcmp {old,new}.txt

Thanks.

SipHash-2-4 with optimization for 8-byte keys:

$ ~/wd/go/misc/benchcmp {old,sip24}.txt
benchmark                      old ns/op    new ns/op    delta
BenchmarkIntMap10                   4121         5005  +21.45%
BenchmarkIntMap100                 45266        53820  +18.90%
BenchmarkIntMap10000             5054063      5898136  +16.70%
BenchmarkIntMap1000000         791751871    874390644  +10.44%
BenchmarkStringMap10                6728         7982  +18.64%
BenchmarkStringMap100              76587        86836  +13.38%
BenchmarkStringMap10000          5342200      6163102  +15.37%
BenchmarkStringMap1000000      501381800    581276514  +15.93%
BenchmarkStringMap10Medium          9992        10223   +2.31%
BenchmarkStringMap10Large          18920        16733  -11.56%
BenchmarkStringMap100Medium       100566       102723   +2.14%
BenchmarkStringMap100Large        190037       166530  -12.37%

$ ~/wd/go/misc/benchcmp {old,sip11}.txt
benchmark                      old ns/op    new ns/op    delta
BenchmarkIntMap10                   4121         4193   +1.75%
BenchmarkIntMap100                 45266        45343   +0.17%
BenchmarkIntMap10000             5054063      5019093   -0.69%
BenchmarkIntMap1000000         791751871    795529325   +0.48%
BenchmarkStringMap10                6728         8093  +20.29%
BenchmarkStringMap100              76587        88515  +15.57%
BenchmarkStringMap10000          5342200      6347534  +18.82%
BenchmarkStringMap1000000      501381800    504984998   +0.72%
BenchmarkStringMap10Medium          9992        10465   +4.73%
BenchmarkStringMap10Large          18920        17184   -9.18%
BenchmarkStringMap100Medium       100566       105245   +4.65%
BenchmarkStringMap100Large        190037       169741  -10.68%

Matthew Dempsky

unread,
Jan 3, 2013, 11:49:17 PM1/3/13
to Dave Cheney, David Symonds, Patrick Mylund Nielsen, golang-dev, iant
Oops, hopefully it was implied well enough, but sip11.txt is SipHash-1-1 with 8-byte key optimizations too.

Dave Cheney

unread,
Jan 3, 2013, 11:52:52 PM1/3/13
to Matthew Dempsky, David Symonds, Patrick Mylund Nielsen, golang-dev, iant
Please remember that Go supports more than just x86-64.

Patrick Mylund Nielsen

unread,
Jan 4, 2013, 12:04:44 AM1/4/13
to Dave Cheney, Matthew Dempsky, David Symonds, golang-dev, iant
FWIW Perl uses SipHash on x64 and Jenkins one-at-a-time (http://en.wikipedia.org/wiki/Jenkins_hash_function#one-at-a-time) elsewhere (http://perl5.git.perl.org/perl.git/commit/3db6cbfca39da94d152d3e860e2aa79b9c6bb161 / http://perl5.git.perl.org/perl.git/blob/HEAD:/hv.h) suggesting that SipHash had unacceptable performance consequences on 32-bit.

It would be interesting to see how Siphash performs on x86/ARM vs. one-at-a-time. Jenkins' isn't nearly as resilient as Siphash, of course, but beats FNV.

Matthew Dempsky

unread,
Jan 4, 2013, 12:05:56 AM1/4/13
to Dave Cheney, David Symonds, Patrick Mylund Nielsen, golang-dev, iant
On Thu, Jan 3, 2013 at 8:52 PM, Dave Cheney <da...@cheney.net> wrote:
Please remember that Go supports more than just x86-64.

Yep, which seems to lend credence to the idea of making it a build flag to indicate which runtime hashing implementation should be linked in.

David Symonds

unread,
Jan 4, 2013, 12:11:37 AM1/4/13
to Matthew Dempsky, Dave Cheney, Patrick Mylund Nielsen, golang-dev, iant
That would only be true if it needs to change for different builds.
There's no big problem having different hash algorithms for different
targets be chosen a priori, just as different syscalls are used to
implement various OS services.

Patrick Mylund Nielsen

unread,
Jan 4, 2013, 12:16:53 AM1/4/13
to Matthew Dempsky, Dave Cheney, David Symonds, golang-dev, iant
IMO Go should make that choice depending on the arch, e.g. if SipHash only reaches an acceptable level on x64, only use it on x64. This seems like something where there's only really one right choice per arch, so exposing toggles will just enable a "best optimal compile time options for Go" culture vs. "Go makes the right choice."

I guess a user might actually choose to have much slower maps and be better protected against collision attacks, but these are probably a minority. *shrug* Seeding FNV does make it a fair bit harder, after all.

Matthew Dempsky

unread,
Jan 4, 2013, 2:13:53 AM1/4/13
to Patrick Mylund Nielsen, Dave Cheney, David Symonds, golang-dev, iant
FWIW, I updated https://codereview.appspot.com/7029053/ to conditionally use SipHash-2-4 only on amd64 targets, and the existing hash code on !amd64.

Here's the previous benchmark ran comparing amd64 performance using alg_simple.c vs alg_siphash.c.

$ ~/wd/go/misc/benchcmp {simple,siphash}.txt
benchmark                      old ns/op    new ns/op    delta
BenchmarkIntMap10                   4283         5150  +20.24%
BenchmarkIntMap100                 45561        54138  +18.83%
BenchmarkIntMap10000             5058276      5986396  +18.35%
BenchmarkIntMap1000000         793643784    877129665  +10.52%
BenchmarkStringMap10                6868         8034  +16.98%
BenchmarkStringMap100              76513        88068  +15.10%
BenchmarkStringMap10000          5215703      6295222  +20.70%
BenchmarkStringMap1000000      499790742    583745043  +16.80%
BenchmarkStringMap10Medium          9985        10352   +3.68%
BenchmarkStringMap10Large          19047        16894  -11.30%
BenchmarkStringMap100Medium       100201       103100   +2.89%
BenchmarkStringMap100Large        189018       166839  -11.73%

(simple's performance looks comparable to unpatched)

Russ Cox

unread,
Jan 4, 2013, 10:09:37 AM1/4/13
to Matthew Dempsky, golang-dev, David Symonds, Ian Taylor
Thank you for looking into this.

As we discussed on issue 4604, I believe there is no urgent need to
change the hash function we're using for maps today. I would be happy
to revisit this later this year, after Go 1.1 is out, but the current
hash function is (modulo a bug that Ian has sent a fix for) working
fine. Changing the hash function is something I would want significant
experience with, and Go 1.1 is O(months) away. It is more likely that
rewriting the hash code will introduce bugs than it is that the
rewrite will fix an actual bug, especially since the attacks we're
talking about are theoretical.

Russ

Matthew Dempsky

unread,
Jan 4, 2013, 3:20:19 PM1/4/13
to Russ Cox, golang-dev, David Symonds, Ian Taylor
Fair enough.

One last benchmark before I move on for now, I hacked 6c to compile left shifts by 0x52000000+x as a left rotate by x (it's undefined behavior anyway ;)), and patched alg_siphash.c to use "#define ROTATE(x,b) x <<= (('R'<<24)+b)". The resulting hash performance seems reasonably competitive now:

$ ~/wd/go/misc/benchcmp {golden,siphash}.txt
benchmark                      old ns/op    new ns/op    delta
BenchmarkIntMap10                   4114         4292   +4.33%
BenchmarkIntMap100                 45748        45552   -0.43%
BenchmarkIntMap10000             5011986      5056786   +0.89%
BenchmarkIntMap1000000         796641724    798138767   +0.19%
BenchmarkStringMap10                6759         7184   +6.29%
BenchmarkStringMap100              76554        80031   +4.54%
BenchmarkStringMap10000          5282694      5498818   +4.09%
BenchmarkStringMap1000000      493529322    491760921   -0.36%
BenchmarkStringMap10Medium         10015         9511   -5.03%
BenchmarkStringMap10Large          18949        14088  -25.65%
BenchmarkStringMap100Medium       100896        91225   -9.59%
BenchmarkStringMap100Large        188706       139422  -26.12%

Patrick Mylund Nielsen

unread,
Jan 4, 2013, 3:25:24 PM1/4/13
to Matthew Dempsky, Russ Cox, golang-dev, David Symonds, Ian Taylor
Wow, that's much better. These were the kinds of numbers I was hoping for. Is that a change that could go into the gc compiler?

minux

unread,
Jan 4, 2013, 3:26:47 PM1/4/13
to Matthew Dempsky, Russ Cox, golang-dev, David Symonds, Ian Taylor
On Sat, Jan 5, 2013 at 4:20 AM, Matthew Dempsky <mdem...@google.com> wrote:
One last benchmark before I move on for now, I hacked 6c to compile left shifts by 0x52000000+x as a left rotate by x (it's undefined behavior anyway ;)), and patched alg_siphash.c to use "#define ROTATE(x,b) x <<= (('R'<<24)+b)". The resulting hash performance seems reasonably competitive now:
i think this result is much more acceptable.

Several months ago, Russ has made gc recognize the rotate pattern and use rotate instructions if available,
and we can do the same for cc (or just add a new ROT intrinsic if that's too big a change for cc?).

Matthew Dempsky

unread,
Jan 4, 2013, 3:38:07 PM1/4/13
to Patrick Mylund Nielsen, Russ Cox, golang-dev, David Symonds, Ian Taylor
On Fri, Jan 4, 2013 at 12:25 PM, Patrick Mylund Nielsen <pat...@patrickmylund.com> wrote:
Is that a change that could go into the gc compiler?

As a quirky "worse-is-better" implementation detail, I don't see anything wrong with hacking 6c to act this way, but it's a pretty ugly hack just the same IMO so it's not something I'd really feel good about recommending.

It would probably be slightly better if cgen.c could recognize expressions like "(x << 3) | (x >> 61)" and rewrite them as "(x << 0x52000003)" so that alg_siphash.c didn't have to depend on the implementation detail.

One more issue though is that 6c doesn't peep hole optimize MOV instructions around shift or rotate instructions.  E.g., if x is allocated to register DX, "x = x << 3;" will emit:

(tweet.c:4) MOVL DX,!!AX
(tweet.c:4) SALL $3,!!AX
(tweet.c:4) MOVL AX,!!DX

You need to instead write "x <<= 3" to avoid the extra register shuffling, so cgen.c might actually have to instead recognize "x = (x << 3) | (x >> 61)" as a whole, or the peep hole optimizer needs to be smarter.
Reply all
Reply to author
Forward
0 new messages