Snappy

278 views
Skip to first unread message

Jan Mercl

unread,
Jan 17, 2013, 12:05:11 PM1/17/13
to golang-dev, nige...@golang.org
Hi.

Played with go-snappy (http://code.google.com/p/snappy-go/)
performance improving using sparse arrays
(http://research.swtch.com/sparse). Got mixed results, some numbers
are better while some others are worse:

(17:14) jnml@fsc-r550:~/go/misc$ ./benchcmp ~/tmp/old-tip.txt ~/tmp/new-tip.txt
benchmark old ns/op new ns/op delta
BenchmarkDecodeWords1e3 6887 6891 +0.06%
BenchmarkDecodeWords1e4 67504 67037 -0.69%
BenchmarkDecodeWords1e5 691593 692624 +0.15%
BenchmarkDecodeWords1e6 6108067 6020448 -1.43%
BenchmarkEncodeWords1e3 24139 16070 -33.43%
BenchmarkEncodeWords1e4 157504 148469 -5.74%
BenchmarkEncodeWords1e5 1464519 1683214 +14.93%
BenchmarkEncodeWords1e6 11771157 13947030 +18.48%

benchmark old MB/s new MB/s speedup
BenchmarkDecodeWords1e3 98.58 98.52 1.00x
BenchmarkDecodeWords1e4 89.24 89.86 1.01x
BenchmarkDecodeWords1e5 89.77 89.64 1.00x
BenchmarkDecodeWords1e6 85.36 86.60 1.01x
BenchmarkEncodeWords1e3 41.43 62.23 1.50x
BenchmarkEncodeWords1e4 63.49 67.35 1.06x
BenchmarkEncodeWords1e5 68.28 59.41 0.87x
BenchmarkEncodeWords1e6 84.95 71.70 0.84x

benchmark old allocs new allocs delta
BenchmarkDecodeWords1e3 0 0 n/a%
BenchmarkDecodeWords1e4 0 0 n/a%
BenchmarkDecodeWords1e5 0 0 n/a%
BenchmarkDecodeWords1e6 0 0 n/a%
BenchmarkEncodeWords1e3 1 0 -100.00%
BenchmarkEncodeWords1e4 1 0 -100.00%
BenchmarkEncodeWords1e5 1 0 -100.00%
BenchmarkEncodeWords1e6 1 0 -100.00%

benchmark old bytes new bytes delta
BenchmarkDecodeWords1e3 0 0 n/a%
BenchmarkDecodeWords1e4 0 5 n/a%
BenchmarkDecodeWords1e5 0 59 n/a%
BenchmarkDecodeWords1e6 0 598 n/a%
BenchmarkEncodeWords1e3 135168 2 -100.00%
BenchmarkEncodeWords1e4 135168 29 -99.98%
BenchmarkEncodeWords1e5 135168 299 -99.78%
BenchmarkEncodeWords1e6 135168 2990 -97.79%
(17:16) jnml@fsc-r550:~/go/misc$

Additionally, there is an API change. The sparse array (used for the
hash table) is now an Encoder struct and Encode is its method. Reason:
No need to (re)initialize the (reusable) Encoder (less allocations,
zero additional initialization overhead). Docs:

----

type Encoder struct {
// contains filtered or unexported fields
}

Encoder provides snappy encoding. A zero value of Encoder is ready for
use. After an invocation of Encode, Encoder is reusable for another
invocation of Encode.

func (e *Encoder) Encode(dst, src []byte) ([]byte, error)

Encode returns the encoded form of src. The returned slice may be a
sub- slice of dst if dst was large enough to hold the entire encoded
block. Otherwise, a newly allocated slice will be returned. It is
valid to pass a nil dst.

----

Not able to evaluate it properly, seeking opinions of others. Is it worth a CL?

Thanks.

-j

Nigel Tao

unread,
Jan 17, 2013, 8:54:23 PM1/17/13
to Jan Mercl, golang-dev
On Fri, Jan 18, 2013 at 4:05 AM, Jan Mercl <0xj...@gmail.com> wrote:
> Not able to evaluate it properly, seeking opinions of others. Is it worth a CL?

I'd hesistate to diverge from the C++ Snappy implementation [0] unless
there's good reason to, and the benchmarks seem inconclusive.

[0] https://code.google.com/p/snappy/source/browse/trunk/snappy.cc

Jan Mercl

unread,
Jan 18, 2013, 5:09:27 AM1/18/13
to Nigel Tao, golang-dev
On Fri, Jan 18, 2013 at 2:54 AM, Nigel Tao <nige...@golang.org> wrote:
> I'd hesistate to diverge from the C++ Snappy implementation [0] unless
> there's good reason to, and the benchmarks seem inconclusive.

Agreed. Round 2: No API changes, no slowdowns (above a 0.5% noise
level), minor speedups from small tweaks.

(10:44) jnml@fsc-r550:~$ benchcmp ~/tmp/old-tip.txt ~/tmp/new-tip.txt
benchmark old ns/op new ns/op delta
BenchmarkDecodeWords1e3 6835 6868 +0.48%
BenchmarkDecodeWords1e4 66808 66886 +0.12%
BenchmarkDecodeWords1e5 688943 689114 +0.02%
BenchmarkDecodeWords1e6 5991699 5998360 +0.11%
BenchmarkEncodeWords1e3 24150 23089 -4.39%
BenchmarkEncodeWords1e4 157335 138771 -11.80%
BenchmarkEncodeWords1e5 1464949 1437098 -1.90%
BenchmarkEncodeWords1e6 11773207 11611877 -1.37%

benchmark old MB/s new MB/s speedup
BenchmarkDecodeWords1e3 99.33 98.86 1.00x
BenchmarkDecodeWords1e4 90.17 90.06 1.00x
BenchmarkDecodeWords1e5 90.11 90.09 1.00x
BenchmarkDecodeWords1e6 87.01 86.92 1.00x
BenchmarkEncodeWords1e3 41.41 43.31 1.05x
BenchmarkEncodeWords1e4 63.56 72.06 1.13x
BenchmarkEncodeWords1e5 68.26 69.58 1.02x
BenchmarkEncodeWords1e6 84.94 86.12 1.01x

benchmark old allocs new allocs delta
BenchmarkDecodeWords1e3 0 0 n/a%
BenchmarkDecodeWords1e4 0 0 n/a%
BenchmarkDecodeWords1e5 0 0 n/a%
BenchmarkDecodeWords1e6 0 0 n/a%
BenchmarkEncodeWords1e3 1 1 0.00%
BenchmarkEncodeWords1e4 1 1 0.00%
BenchmarkEncodeWords1e5 1 1 0.00%
BenchmarkEncodeWords1e6 1 1 0.00%

benchmark old bytes new bytes delta
BenchmarkDecodeWords1e3 0 0 n/a%
BenchmarkDecodeWords1e4 0 0 n/a%
BenchmarkDecodeWords1e5 0 0 n/a%
BenchmarkDecodeWords1e6 0 0 n/a%
BenchmarkEncodeWords1e3 135168 135168 0.00%
BenchmarkEncodeWords1e4 135168 135168 0.00%
BenchmarkEncodeWords1e5 135168 135168 0.00%
BenchmarkEncodeWords1e6 135168 135168 0.00%
(10:46) jnml@fsc-r550:~$

(10:58) jnml@fsc-r550:~/src/code.google.com/p/snappy-go/snappy$ diff
-u encode.go ../../snappy-go-fork/snappy/encode.go
--- encode.go 2013-01-17 17:05:40.307321778 +0100
+++ ../../snappy-go-fork/snappy/encode.go 2013-01-18 10:43:35.530802616 +0100
@@ -110,9 +110,6 @@
tableSize *= 2
}
var table [maxTableSize]int
- for i := 0; i < tableSize; i++ {
- table[i] = -1
- }

// Iterate over the source bytes.
var (
@@ -123,8 +120,8 @@
for s+3 < len(src) {
// Update the hash table.
h := uint32(src[s]) | uint32(src[s+1])<<8 | uint32(src[s+2])<<16 |
uint32(src[s+3])<<24
- h = (h * 0x1e35a7bd) >> shift
- t, table[h] = table[h], s
+ p := &table[(h*0x1e35a7bd)>>shift]
+ t, *p = *p-1, s+1
// If t is invalid or src[s:s+4] differs from src[t:t+4], accumulate
a literal byte.
if t < 0 || s-t >= maxOffset || !equal4(src, t, s) {
s++
(10:59) jnml@fsc-r550:~/src/code.google.com/p/snappy-go/snappy$

-j

Nigel Tao

unread,
Feb 9, 2013, 11:27:13 PM2/9/13
to Jan Mercl, Marc-Antoine Ruel, golang-dev
CC'ing mar...@chromium.org, who was looking recently at snappy-go
benchmarks. Marc-Antoine, if you missed the earlier conversation, it's
at https://groups.google.com/d/topic/golang-dev/LS7ynr4dhPY/discussion

The diff below looks sane (albeit no longer anything to do with sparse
arrays), but the -1/+1 will need some comments somewhere. Feel free to
send out a CL after https://codereview.appspot.com/7271049/ lands.

Jan Mercl

unread,
Feb 10, 2013, 4:06:40 AM2/10/13
to Nigel Tao, Marc-Antoine Ruel, golang-dev
On Sun, Feb 10, 2013 at 5:27 AM, Nigel Tao <nige...@golang.org> wrote:
> The diff below looks sane (albeit no longer anything to do with sparse
> arrays), but the -1/+1 will need some comments somewhere. Feel free to
> send out a CL after https://codereview.appspot.com/7271049/ lands.

Ack. Will do.

-j

Dmitry Chestnykh

unread,
Feb 10, 2013, 11:29:18 AM2/10/13
to golan...@googlegroups.com, Nigel Tao
On Friday, January 18, 2013 11:09:27 AM UTC+1, Jan Mercl wrote:
  // Iterate over the source bytes.
  var (
@@ -123,8 +120,8 @@
  for s+3 < len(src) {
  // Update the hash table.
  h := uint32(src[s]) | uint32(src[s+1])<<8 | uint32(src[s+2])<<16 |
uint32(src[s+3])<<24
- h = (h * 0x1e35a7bd) >> shift
- t, table[h] = table[h], s
+ p := &table[(h*0x1e35a7bd)>>shift]
+ t, *p = *p-1, s+1
  // If t is invalid or src[s:s+4] differs from src[t:t+4], accumulate
a literal byte.
  if t < 0 || s-t >= maxOffset || !equal4(src, t, s) {
  s++
(10:59) jnml@fsc-r550:~/src/code.google.com/p/snappy-go/snappy$

You can also eliminate duplicate loads from src, which equal4 does:

                b0 := src[s+0]
                b1 := src[s+1]
                b2 := src[s+2]
                b3 := src[s+3]
                h := ((uint32(b0) | uint32(b1)<<8 | uint32(b2)<<16 | uint32(b3)<<24)
                p := &table[(h*0x1e35a7bd)>>shift] 
                t, *p = *p-1, s+1 
                // If t is invalid or src[s:s+4] differs from src[t:t+4],       accumulate a literal byte.
                if t < 0 || s-t >= maxOffset || b0 != src[t+0] || b1 != src[t+1] || b2 != src[t+2] || b3 != src[t+3] {
                        s++
                        continue
                }

Jan Mercl

unread,
Feb 18, 2013, 8:10:42 AM2/18/13
to Nigel Tao, dch...@gmail.com, Marc-Antoine Ruel, golang-dev
On Sun, Feb 10, 2013 at 5:27 AM, Nigel Tao <nige...@golang.org> wrote:
CC'ing mar...@chromium.org, who was looking recently at snappy-go
benchmarks. Marc-Antoine, if you missed the earlier conversation, it's
at https://groups.google.com/d/topic/golang-dev/LS7ynr4dhPY/discussion

The diff below looks sane (albeit no longer anything to do with sparse
arrays), but the -1/+1 will need some comments somewhere. Feel free to
send out a CL after https://codereview.appspot.com/7271049/ lands.

Reply all
Reply to author
Forward
0 new messages