Why is Go gzip performance so bad?


Giang Tran

Aug 24, 2015, 8:41:06 PM
to golang-nuts
Hi!

I'm using Go for an HTTP server that serves dynamically generated JavaScript (~10 KB per response). Performance drops sharply when I enable gzip compression: a request rate of 4,000 req/s saturates all 8 cores of my server.
I have tried https://github.com/youtube/vitess/tree/master/go/cgzip and https://github.com/klauspost/compress/gzip as alternatives to the built-in package; they are better, but not by much.
Is there any tuning that would give better compression performance?

Thanks.

Matt Silverlock

Aug 24, 2015, 8:43:38 PM
to golang-nuts
Can you post:

1. Your code as it stands with your gzip middleware
2. Your benchmarking strategy - e.g. the tool you're using, the methodology ("how") and the results?

Giang Tran

Aug 24, 2015, 9:03:04 PM
to golang-nuts
Hello
My test code is at http://pastebin.com/XUYzPcSN
I benchmark the server with wrk: wrk -d20s -c256 -t4 -H "Connection: Keep-Alive" "http://localhost:8080/"
It gives ~4k req/s, vs 65k req/s for the non-compression version.

Dave Cheney

Aug 24, 2015, 9:27:00 PM
to golang-nuts
Which version of go are you using?

Giang Tran

Aug 24, 2015, 9:44:14 PM
to golang-nuts
I tested with both Go 1.4.2 and the 1.5 release; 1.4.2 gives slightly better results.

Aug 25, 2015, 12:57:40 AM
to golang-nuts
gzip.NewWriter() seems to be allocating big chunks of memory that are zeroed by the Go runtime. Try reusing the writers with http://golang.org/pkg/compress/gzip/#Writer.Reset.

Giang Tran

Aug 25, 2015, 1:59:54 AM
to golang-nuts
I made a small change to reuse the gzip.Writer (http://pastebin.com/Hz7nYQMq), but performance is still the same.

Aug 25, 2015, 3:34:12 AM
to golang-nuts
There are too many cache misses. Results from "perf stat":

Performance counter stats for './server2':

     74376.626319      task-clock (msec)         #    3.391 CPUs utilized           
          105,900      context-switches          #    0.001 M/sec                   
              470      cpu-migrations            #    0.006 K/sec                   
           74,791      page-faults               #    0.001 M/sec                   
  251,419,189,095      cycles                    #    3.380 GHz                     
  347,387,118,878      instructions              #    1.38  insns per cycle         
  <not supported>      stalled-cycles-frontend   
  <not supported>      stalled-cycles-backend    
    2,381,801,767      cache-references          #   32.024 M/sec                   
    1,113,542,089      cache-misses              #   46.752 % of all cache refs     
   61,745,751,447      branches                  #  830.177 M/sec                   
    1,249,929,295      branch-misses             #    2.02% of all branches         

     21.936505912 seconds time elapsed

You can lower cache misses by limiting the number of gzip.Writers to GOMAXPROCS. This helps - IPC improves to 1.76-1.82 on my machine - but unfortunately there are still many cache misses (29.949 % of all cache refs).

On Tuesday, August 25, 2015 at 7:59:54 AM UTC+2, Giang Tran wrote:
I have a small change to reuse gzip Writer http://pastebin.com/Hz7nYQMq , performance still the same.

On my machine, 'server2' improves on 'server' by 24%, to 4372 req/s. With at most 8 gzip.Writers, it improves to 5520 req/s (+57%).

Giang Tran

Aug 25, 2015, 4:29:54 AM
to golang-nuts
A simple test of the gzip package compressing the same text in a Go console app, single goroutine and single gzip.Writer, GOMAXPROCS=1, gives me ~1.5k ops/s. This is very poor compared to the Java or C++ version. ;(

func testCompression() {
	w := gzip.NewWriter(ioutil.Discard)
	for i := 0; i < 10000; i++ {
		w.Reset(ioutil.Discard)
		w.Write(data)
		w.Close()
	}
}

func main() {
	runtime.GOMAXPROCS(1)
	testCompression()
}

time ./testgzip

real    0m6.612s
user    0m6.608s
sys     0m0.000s

Naoki INADA

Aug 25, 2015, 5:13:00 AM
to golang-nuts
How fast is your Java / C++ version?

package main

import (
	"compress/gzip"
	kgzip "github.com/klauspost/compress/gzip"
	"io/ioutil"
	"testing"
)

var StdGzipWriter = gzip.NewWriter(ioutil.Discard)
var KlausGzipWriter = kgzip.NewWriter(ioutil.Discard)

func BenchmarkStdGzip(b *testing.B) {
	for i := 0; i < b.N; i++ {
		StdGzipWriter.Reset(ioutil.Discard)
		StdGzipWriter.Write(data)
		StdGzipWriter.Close()
	}
}

func BenchmarkKlausGzip(b *testing.B) {
	for i := 0; i < b.N; i++ {
		KlausGzipWriter.Reset(ioutil.Discard)
		KlausGzipWriter.Write(data)
		KlausGzipWriter.Close()
	}
}

var data = ...

$ GOMAXPROCS=1 go test -bench=.
testing: warning: no tests to run
PASS
BenchmarkStdGzip      2000    594023 ns/op
BenchmarkKlausGzip    5000    359110 ns/op



Giang Tran

Aug 25, 2015, 5:58:09 AM
to golang-nuts
This is on my machine (Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz):

GOMAXPROCS=1 go test -bench=.
2015/08/25 16:54:50 second init
testing: warning: no tests to run
PASS
BenchmarkStdGzip      2000    680051 ns/op
BenchmarkKlausGzip    2000    828096 ns/op
ok   github.com/secmask/gget 3.181s



Java version

public static void main(String[] args) throws IOException {
	byte[] data = Files.readAllBytes(Paths.get("data.txt"));
	ByteArrayOutputStream bout = new ByteArrayOutputStream(20 * 1024);
	final int N = 20000;
	long start = System.nanoTime();
	for (int i = 0; i < N; i++) {
		GZIPOutputStream gzout = new GZIPOutputStream(bout);
		gzout.write(data);
		gzout.close();
		bout.reset();
	}
	System.out.format("%d ns/op", (System.nanoTime() - start) / N);
}

$ > 227196 ns/op

Naoki INADA

Aug 25, 2015, 6:31:48 AM
to golang-nuts
On my machine (MacBook Pro 2013, Core i5 2.6GHz), Java speed is close to yours, but the Go version is much faster:

Java: 211536 ns/op
Go (std): 568859 ns/op
Go (klaus): 351132 ns/op



It's worse than Java, but better than your score.
What is your environment?
Could you install a fresh Go 1.5 and klauspost/compress?

Giang Tran

Aug 25, 2015, 6:41:13 AM
to golang-nuts
OK, my previous result was on Go 1.4.2; this is on 1.5:

$ GOMAXPROCS=1 go test -bench=.
2015/08/25 17:39:37 second init
testing: warning: no tests to run
PASS
BenchmarkStdGzip      2000    581464 ns/op
BenchmarkKlausGzip    5000    358567 ns/op
ok   github.com/secmask/gget 3.064s

Naoki INADA

Aug 25, 2015, 6:53:35 AM
to golang-nuts
I'm happy to hear about the huge performance improvement in Go 1.5!!

FYI, youtube/vitess/go/cgzip performance is the same as Java.


$ GOMAXPROCS=1 ./t.test -test.bench .

testing: warning: no tests to run
PASS
BenchmarkStdGzip   11243 -> 3389 byte
11243 -> 3389 byte
11243 -> 3389 byte
    3000    563957 ns/op
BenchmarkKlausGzip 11243 -> 3436 byte
11243 -> 3436 byte
11243 -> 3436 byte
    5000    353031 ns/op
BenchmarkCGzip     11243 -> 3382 byte
11243 -> 3382 byte
11243 -> 3382 byte
    10000    215544 ns/op


Cloudflare's zlib [1] may make cgzip a bit faster.

Giang Tran

Aug 25, 2015, 11:17:50 AM
to golang-nuts
OK, cgzip makes it good enough; combined with the HTTP component it gives me ~10k req/s (on the same input data). The Java version gives me 15k req/s (https://github.com/secmask/jvertweb).
I tried Cloudflare's zlib; it's good, the Go HTTP server can serve up to 12k req/s, but some of our servers don't have SSE4.2.

James Aguilar

Aug 25, 2015, 11:40:15 AM
to golang-nuts
This discussion has been interesting, but have you considered modifying your strategy regarding how and what to serve? What proportion of that JavaScript is actually dynamic? Can you remove the dynamism from most of it, serve it statically with caching, and then serve up a small config.js that is dynamic and uncompressed?

Giang Tran

Aug 25, 2015, 12:31:03 PM
to golang-nuts
@James: all of the tests here are based on static content. In production we receive dynamic content from other services, and most of it is really dynamic (I built a cache for it, but the hit ratio was very low, so I removed it).

Benjamin Measures

Aug 25, 2015, 5:06:52 PM
to golang-nuts
On Tuesday, 25 August 2015 17:31:03 UTC+1, Giang Tran wrote:
@James: all of the tests here are based on static content. In production we receive dynamic content from other services, and most of it is really dynamic (I built a cache for it, but the hit ratio was very low, so I removed it).

If you're receiving content from "other services" via HTTP and merely forwarding it on, you can skip decompression/compression altogether: set DisableCompression on the Transport [1], send Accept-Encoding: gzip to the "other services", and forward the response on (setting headers appropriately).

Klaus Post

Aug 25, 2015, 5:53:23 PM
to golang-nuts
Hi!

I wrote a blog article on webserver optimization, but I assume you have already read it: http://blog.klauspost.com/gzip-performance-for-go-webservers/

Some things that could help you:

* Reduce the compression level to 3 or 4. You lose very little compression and it runs at approximately 1.3x the speed of level 6 (the default).
* If you really need speed, use the "extra" constant time compression in my library. Set level to '-2'. It is more than 4x faster than level 1, but content is typically about twice the size.

I am writing a blog entry on the constant time compression, which will be ready in a few days.

/Klaus

Giang Tran

Aug 25, 2015, 11:20:42 PM
to golang-nuts
@Benjamin: No, I'm not receiving the content over HTTP. I can ask them to provide a compressed version of the service, but it's interesting to see how Go does the job : )
@Klaus: yep, setting the compression level is a good idea. Actually I have many solutions for this (e.g. compression on HAProxy), but I think Go (compiler & runtime) could be further optimized for CPU-sensitive jobs.