snappy compression really slow


Eric Z

Nov 19, 2013, 7:36:32 PM
to golan...@googlegroups.com
Hello there,

I need a good compression library, but it turns out that the Go implementation of snappy is really slow: encode speed is on the order of 100 MB/s, and decode speed is on the order of 200 MB/s. This is way below the wiki's claim that "Compression speed is 250 MB/s and decompression speed is 500 MB/s".

I read an earlier post in the forum regarding this issue, https://groups.google.com/forum/#!topic/golang-dev/g2m2v081nxU , but it didn't reach any clear conclusion. I tried to use snappy to minimize disk and network overhead, but 100 MB/s is slower than even an HDD.
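
For reference, here is roughly how I'm calling the package (a minimal sketch, assuming the code.google.com/p/snappy-go/snappy API, where Encode and Decode both take an optional destination buffer and return ([]byte, error)):

    package main

    import (
        "fmt"
        "log"

        "code.google.com/p/snappy-go/snappy"
    )

    func main() {
        src := []byte("an example payload, repeated repeated repeated")

        // Passing nil as dst lets the package allocate a buffer of
        // up to snappy.MaxEncodedLen(len(src)) bytes.
        enc, err := snappy.Encode(nil, src)
        if err != nil {
            log.Fatal(err)
        }

        // The block header stores the uncompressed length, so Decode
        // can size its output buffer up front.
        dec, err := snappy.Decode(nil, enc)
        if err != nil {
            log.Fatal(err)
        }

        fmt.Printf("%d bytes -> %d compressed -> %d decoded\n",
            len(src), len(enc), len(dec))
    }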

Does anyone have ideas about which compression library I should use in Go for better speed?

Thank you so much for your help!

Cheers,
Eric


Here are the single-threaded benchmark results on my Intel E5-2620 @ 2.00GHz:

go version go1.2rc5 linux/amd64


snappy]$ go test -bench=.
PASS
BenchmarkWordsDecode1e3-24      500000          3369 ns/op     296.77 MB/s
BenchmarkWordsDecode1e4-24       50000         58917 ns/op     169.73 MB/s
BenchmarkWordsDecode1e5-24        2000        752878 ns/op     132.82 MB/s
BenchmarkWordsDecode1e6-24         500       7454509 ns/op     134.15 MB/s
BenchmarkWordsEncode1e3-24      100000         17999 ns/op      55.56 MB/s
BenchmarkWordsEncode1e4-24       10000        123028 ns/op      81.28 MB/s
BenchmarkWordsEncode1e5-24        2000       1476329 ns/op      67.74 MB/s
BenchmarkWordsEncode1e6-24         100      14422396 ns/op      69.34 MB/s
Benchmark_UFlat0-24        5000        403891 ns/op     253.53 MB/s
Benchmark_UFlat1-24         500       3294375 ns/op     213.12 MB/s
Benchmark_UFlat2-24      100000         17165 ns/op    7396.28 MB/s
Benchmark_UFlat3-24       10000        109116 ns/op     864.49 MB/s
Benchmark_UFlat4-24        1000       1631508 ns/op     251.06 MB/s
Benchmark_UFlat5-24       10000        117094 ns/op     210.11 MB/s
Benchmark_UFlat6-24       50000         51696 ns/op     215.68 MB/s
Benchmark_UFlat7-24      200000         12573 ns/op     295.94 MB/s
Benchmark_UFlat8-24         500       4613290 ns/op     223.21 MB/s
Benchmark_UFlat9-24        2000       1162495 ns/op     130.83 MB/s
Benchmark_UFlat10-24        2000        974933 ns/op     128.40 MB/s
Benchmark_UFlat11-24         500       3138411 ns/op     135.98 MB/s
Benchmark_UFlat12-24         500       3909131 ns/op     123.27 MB/s
Benchmark_UFlat13-24        1000       1823203 ns/op     281.49 MB/s
Benchmark_UFlat14-24       10000        193647 ns/op     197.47 MB/s
Benchmark_UFlat15-24      200000         14538 ns/op     290.74 MB/s
Benchmark_UFlat16-24        5000        397661 ns/op     298.21 MB/s
Benchmark_UFlat17-24        2000       1128084 ns/op     163.39 MB/s
Benchmark_ZFlat0-24        5000        675178 ns/op     151.66 MB/s
Benchmark_ZFlat1-24         200       9186338 ns/op      76.43 MB/s
Benchmark_ZFlat2-24        1000       2216938 ns/op      57.27 MB/s
Benchmark_ZFlat3-24        2000       1448770 ns/op      65.11 MB/s
Benchmark_ZFlat4-24        1000       2635193 ns/op     155.43 MB/s
Benchmark_ZFlat5-24       10000        290330 ns/op      84.74 MB/s
Benchmark_ZFlat6-24       10000        101738 ns/op     109.60 MB/s
Benchmark_ZFlat7-24       50000         36105 ns/op     103.06 MB/s
Benchmark_ZFlat8-24         200       7907748 ns/op     130.22 MB/s
Benchmark_ZFlat9-24        1000       2254462 ns/op      67.46 MB/s
Benchmark_ZFlat10-24        1000       1962675 ns/op      63.78 MB/s
Benchmark_ZFlat11-24         500       6062985 ns/op      70.39 MB/s
Benchmark_ZFlat12-24         200       7805806 ns/op      61.73 MB/s
Benchmark_ZFlat13-24         500       3007613 ns/op     170.64 MB/s
Benchmark_ZFlat14-24        5000        481279 ns/op      79.45 MB/s
Benchmark_ZFlat15-24       50000         45099 ns/op      93.73 MB/s
Benchmark_ZFlat16-24        5000        685679 ns/op     172.95 MB/s
Benchmark_ZFlat17-24        1000       1862138 ns/op      98.98 MB/s
ok      code.google.com/p/snappy-go/snappy    105.189s



Dave Cheney

Nov 19, 2013, 8:00:06 PM
to Eric Z, golang-nuts
I've added the snappy benchmarks to autobench. For the record, a single sample on a linux/amd64 Core i5 host shows at least a 10-15% improvement moving from Go 1.1 to 1.2.

#snappy
benchmark old ns/op new ns/op delta
BenchmarkWordsDecode1e3 4290 2600 -39.39%
BenchmarkWordsDecode1e4 49997 38101 -23.79%
BenchmarkWordsDecode1e5 583521 484093 -17.04%
BenchmarkWordsDecode1e6 5154234 4584456 -11.05%
BenchmarkWordsEncode1e3 15456 13532 -12.45%
BenchmarkWordsEncode1e4 102882 83080 -19.25%
BenchmarkWordsEncode1e5 1173426 1039936 -11.38%
BenchmarkWordsEncode1e6 9550965 8671430 -9.21%

benchmark old MB/s new MB/s speedup
BenchmarkWordsDecode1e3 233.08 384.49 1.65x
BenchmarkWordsDecode1e4 200.01 262.46 1.31x
BenchmarkWordsDecode1e5 171.37 206.57 1.21x
BenchmarkWordsDecode1e6 194.02 218.13 1.12x
BenchmarkWordsEncode1e3 64.70 73.90 1.14x
BenchmarkWordsEncode1e4 97.20 120.36 1.24x
BenchmarkWordsEncode1e5 85.22 96.16 1.13x
BenchmarkWordsEncode1e6 104.70 115.32 1.10x

Andrew Gerrand

Nov 19, 2013, 8:18:06 PM
to Eric Z, golang-nuts

On 20 November 2013 11:36, Eric Z <hadoo...@gmail.com> wrote:
Does anyone have ideas about which compression library I should use in Go for better speed?

Measure the performance of the available codecs and choose the one that makes the best tradeoff between size and speed.

If you find the snappy package inefficient, you could try profiling it to see if there are any easy performance gains to be had.
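
For example, with the standard test flags and pprof:

    go test -bench=. -cpuprofile=cpu.prof
    go tool pprof snappy.test cpu.prof
    (pprof) top10

(snappy.test is the test binary that go test leaves behind when profiling is enabled.)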

Andrew

Eric Z

Nov 19, 2013, 8:23:32 PM
to golan...@googlegroups.com, Eric Z
Thank you Dave.

Yes, I did notice that 1.2 is faster, but that's still significantly slower than the C implementation.

Eric.

Eric Z

Nov 19, 2013, 8:32:54 PM
to golan...@googlegroups.com, Eric Z
Thank you Andrew.

Any intuition about which one would be faster? The compression codecs that ship with Go favor compression ratio over speed.

Eric

Jian Zhen

Nov 19, 2013, 9:14:21 PM
to Eric Z, golang-nuts
Eric,

I ran a similar test about a month and a half ago, specifically on compressing integers. I tested gzip, lzw, and snappy. Snappy is always the fastest, but always the worst compression-wise, which is probably to be expected given its design goals.

For snappy compression, I got anywhere from 61 MB/s to 470 MB/s, depending on how the integer list is sorted (in my case at least). For decompression, I got between 60 MB/s and 223 MB/s. I haven't profiled it to see where the bottleneck is.

You can find my numbers at the end of this post: http://zhen.org/blog/benchmarking-integer-compression-in-go/

The numbers in blue are MiS (millions of integers per second). These are 32-bit integers, so just multiply those numbers by 4 to get MB/s (e.g., 100 MiS = 400 MB/s).

Not sure if it helps but at least it’s another data point for you.

Jian

P.S. These numbers are from Go 1.2rc1. I suspect gccgo would be somewhat faster, but I didn't test it.

james4k

Nov 19, 2013, 10:01:23 PM
to golan...@googlegroups.com, Eric Z
lz4 is another designed-for-speed compression algorithm (maybe faster than snappy), but I'm not sure how good the Go packages are: http://godoc.org/?q=lz4

Eric Z

Nov 20, 2013, 2:21:18 AM
to golan...@googlegroups.com, Eric Z
Thank you, Jian. I didn't know there were so many impressively fast compression techniques; let me give them a try. I was looking for in-place compression, since the data could take up almost all of the memory. From a rough look at your code, it seems I'd probably have to write something myself to do the compression in place. Thanks a lot for sharing your experiments.

Eric

Eric Z

Nov 20, 2013, 2:22:36 AM
to golan...@googlegroups.com, Eric Z
Thank you James.

I've heard lz4 is pretty fast; let me see whether any of these Go packages can deliver decent speed.

Eric

Jian Zhen

Nov 20, 2013, 4:50:53 AM
to Eric Z, golang-nuts
I ran a quick test with the snappy test data set and a few of my own files.


Note that the LZ4 decompression columns are not from the current GitHub repo. I applied a quick optimization to the writer.go cp() function, as it was taking a long time. For small files the performance increase is not as noticeable; for large files (see the last 4 lz4 lines in the spreadsheet), decompression performance at least doubled. I'll be sending a PR to the original author.

Jan Mercl

Nov 20, 2013, 5:28:34 AM
to Jian Zhen, Eric Z, golang-nuts
On Wed, Nov 20, 2013 at 10:50 AM, Jian Zhen <zhe...@gmail.com> wrote:
> I ran a quick test with the snappy test data set and a few of my own files.

You might also want to check http://godoc.org/github.com/cznic/zappy

-j

Jian Zhen

Nov 20, 2013, 6:11:36 AM
to Jan Mercl, Eric Z, golang-nuts
Interesting. zappy is faster than snappy for both compression and decompression in most cases.

Compression ratio is comparable for both, but for a couple of files (html_x_4, ts.txt) zappy's was far better.

Eric Z

Nov 20, 2013, 12:43:20 PM
to golan...@googlegroups.com, Jan Mercl, Eric Z
Hi Jian,

What is the size of your input file? I found the zappy code buggy: it does not properly handle input sizes larger than MaxInt32. Besides, in a lot of cases zappy is slow at decompression.

Eric

Jan Mercl

Nov 20, 2013, 12:47:03 PM
to Eric Z, golang-nuts

Please report an issue on GitHub. If there's a repro case/data, please include it as well. Thanks a lot.

-j

Jian Zhen

Nov 20, 2013, 1:06:39 PM
to Eric Z, golang-nuts, Jan Mercl
Eric,

These are the original sizes. Nothing over MaxInt32.

152864K dstip.txt.gz
151140K latency.txt.gz
152996K srcip.txt.gz
  3028K ts.txt.gz

In which cases did you find zappy slow in decompression? I'm just curious.

thx

Jian


Damian Gryski

Nov 21, 2013, 7:47:55 PM
to golan...@googlegroups.com, Eric Z


On Wednesday, November 20, 2013 10:50:53 AM UTC+1, Jian Zhen wrote:
I ran a quick test with the snappy test data set and a few of my own files.

  • go-lz4 turned out to be slower than snappy-go in all cases, compression or decompression.

I think the performance issues with lz4 can be addressed without too much hassle. First, some routines need to be inlined or eliminated. Second, the destination buffer could be preallocated, as with snappy and zappy (they store the uncompressed length along with the compressed content). As it is, we have to keep using append instead of a simple dst[i] = srcbyte.

The other problem is that appending a whole slice is slower than unrolling the loop and appending byte-by-byte. That is,

        for ii := uint32(0); ii < length; ii++ {
                d.dst = append(d.dst, d.src[d.spos+ii])
        }

is faster than

        d.dst = append(d.dst, d.src[d.spos:d.spos+length]...)

Again, this wouldn't be an issue if we could preallocate d.dst in its entirety.
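
If anyone wants to reproduce the comparison outside the decoder, a standalone micro-benchmark along these lines should show the effect (a sketch; the 8-byte run length is just a stand-in for typical short literal runs):

    package bench

    import "testing"

    var src = make([]byte, 1<<20)

    const length = 8 // short runs, typical for compressible input

    func BenchmarkAppendLoop(b *testing.B) {
        dst := make([]byte, 0, length)
        for i := 0; i < b.N; i++ {
            dst = dst[:0]
            // byte-by-byte copy, as in the decoder's literal path
            for ii := uint32(0); ii < length; ii++ {
                dst = append(dst, src[ii])
            }
        }
    }

    func BenchmarkAppendBulk(b *testing.B) {
        dst := make([]byte, 0, length)
        for i := 0; i < b.N; i++ {
            dst = dst[:0]
            // single bulk append of the whole run
            dst = append(dst, src[:length]...)
        }
    }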

Damian

minux

Nov 21, 2013, 8:01:18 PM
to Damian Gryski, Eric Z, golang-nuts


On Nov 21, 2013 7:47 PM, "Damian Gryski" <dgr...@gmail.com> wrote:
> The other problem is that appending a whole slice is slower than unrolling the loop and appending byte-by-byte. That is,
>
>
>         for ii := uint32(0); ii < length; ii++ {
>                 d.dst = append(d.dst, d.src[d.spos+ii])
>         }
>
> is faster than
>          d.dst = append(d.dst, d.src[d.spos:d.spos+length]...)

I don't expect the latter to be slower: append knows the length of the final dst slice, so it could potentially save some allocation and copying compared to the loop.

Could you please explain why the former is faster? Is length small?

Damian Gryski

Nov 22, 2013, 2:09:45 AM
to golan...@googlegroups.com, Damian Gryski, Eric Z
Yes, this loop is in the part that deals with copying "literals" (things that were not found in the lookup window). For compressible documents, the lengths will be small. I was hoping to avoid code like if length > appendThreshold { d.dst = append(...) } else { for ii := 0; ii < length; ii++ { ... } }, but it seems there will always be a tradeoff there.

The larger win will be preallocation, and I'm going to implement that tonight (moving away from append() entirely) to see what the numbers look like. Of course, I'll probably still have a copyThreshold variable to determine when to call copy() versus using a for loop.
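
Roughly the shape of the change (a sketch with hypothetical names, not the actual go-lz4 decoder):

    package lz4

    // decoder holds the state for one block; the field names here are
    // hypothetical, not the real go-lz4 ones.
    type decoder struct {
        src, dst   []byte
        spos, dpos int
    }

    // newDecoder preallocates dst from the uncompressed length stored in
    // the block header, so decoding never needs append or a regrow.
    func newDecoder(src []byte, uncompressedLen int) *decoder {
        return &decoder{src: src, dst: make([]byte, uncompressedLen)}
    }

    // literal copies a run of literal bytes straight into the
    // preallocated output buffer.
    func (d *decoder) literal(length int) {
        n := copy(d.dst[d.dpos:], d.src[d.spos:d.spos+length])
        d.dpos += n
        d.spos += n
    }

The point is that once dst has its final size, literal and match copies become plain copy() calls into a fixed buffer.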

Damian 

Damian Gryski

Nov 22, 2013, 5:47:49 AM
to golan...@googlegroups.com, Damian Gryski, Eric Z

And indeed, this optimization gives a nice speed increase:
benchmark                  old ns/op    new ns/op    delta
BenchmarkLZ4Decode           4480230      3876438  -13.48%
BenchmarkWordsDecode1e3         6045         4528  -25.10%
BenchmarkWordsDecode1e4        68623        56984  -16.96%
BenchmarkWordsDecode1e5       740566       654634  -11.60%
BenchmarkWordsDecode1e6      6577977      5991773   -8.91%

benchmark                   old MB/s     new MB/s  speedup
BenchmarkWordsDecode1e3       165.42       220.82    1.33x
BenchmarkWordsDecode1e4       145.72       175.49    1.20x
BenchmarkWordsDecode1e5       135.03       152.76    1.13x
BenchmarkWordsDecode1e6       152.02       166.90    1.10x

I'll clean it up and file a pull request tonight.

Damian

Damian Gryski

Nov 22, 2013, 5:40:05 PM
to golan...@googlegroups.com, Eric Z
For those of you who are still interested in LZ4 support, my pull request has just been merged: https://github.com/bkaradzic/go-lz4/pull/8

I was able to make a number of decoding speedups. It's still slower than snappy, but 30-40% faster than it was before.
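
Usage mirrors snappy-go; a minimal sketch (assuming the repo's Encode/Decode signatures, which take an optional dst and return ([]byte, error)):

    package main

    import (
        "fmt"
        "log"

        lz4 "github.com/bkaradzic/go-lz4"
    )

    func main() {
        src := []byte("hello hello hello hello hello")

        enc, err := lz4.Encode(nil, src)
        if err != nil {
            log.Fatal(err)
        }

        dec, err := lz4.Decode(nil, enc)
        if err != nil {
            log.Fatal(err)
        }

        fmt.Println(string(dec)) // round-trips back to the original
    }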

Damian

Jian Zhen

Nov 22, 2013, 5:57:02 PM
to Damian Gryski, golang-nuts, Eric Z
Good work Damian!

Peter Waller

Nov 24, 2013, 12:49:55 PM
to golan...@googlegroups.com
If you're interested in raw speed, I've had some success calling the C implementations of zlib and lz4. I'm afraid I don't have recent benchmarks handy, but I recall at least a factor of 2 speed improvement when I last tried.

If you want to do your own benchmark, here's my go at clz4 bindings:

http://godoc.org/github.com/pwaller/go-clz4

Damian Gryski

Nov 24, 2013, 6:19:10 PM
to golan...@googlegroups.com
Indeed, running against /usr/share/dict/words we see a considerable difference: 3.75x faster on encode, 5.7x faster on decode.

BenchmarkLZ4Encode     100  14027833 ns/op
BenchmarkCLZ4Encode     500   3723171 ns/op

BenchmarkLZ4Decode     500   6194326 ns/op
BenchmarkCLZ4Decode    2000   1079959 ns/op

Damian