Benchmarking Go 1.6beta2 and possible GC scalability problem


Konstantin Shaposhnikov

Jan 16, 2016, 2:05:08 AM
to golang-dev
Hi,

I've been experimenting with the Go application (https://github.com/TechEmpower/FrameworkBenchmarks/blob/master/frameworks/Go/go-raw/src/hello/hello.go) for TechEmpower benchmark (https://github.com/TechEmpower/FrameworkBenchmarks) and when comparing Go 1.5.3 and Go 1.6beta2 I found a performance regression that I believe is GC-related.

In my tests the application compiled with Go 1.6beta2 doesn't scale as well as the same application compiled with Go 1.5.3. The problem becomes more visible on servers with a larger number of cores. However, with GOGC=800 both versions scale similarly. This makes me think that some changes to the GC in 1.6 are responsible.

I benchmarked the following TechEmpower tests:
- /json that serializes a small struct to JSON
- /fortune that queries ~15 records from a database and renders them using html/template (http://frameworkbenchmarks.readthedocs.org/en/latest/Project-Information/Framework-Tests/#fortunes)

For benchmarking I used three m4.4xlarge (16 cores, 60GB of RAM) EC2 instances in the same datacenter. The database (MySQL), the load testing program (wrk and wrk2) and the application itself were running on separate servers.

/json results were similar for 1.6beta2 and 1.5.3 up to 64 wrk threads/connections (wrk -t 64 -c 64). With 128 threads 1.6beta2 showed 144k requests per second compared to 166k requests per second with Go 1.5.3. Full results can be found here: https://gist.github.com/kostya-sh/c6cd6adc377659a6c963#file-json-txt

/fortune showed a similar difference with just 64 wrk threads (41k reqs/sec with 1.5.3 vs 30k reqs/sec with 1.6beta2). Full results: https://gist.github.com/kostya-sh/c6cd6adc377659a6c963#file-fortune-txt

Additionally, I recorded the latency distribution for /fortune at 20k reqs/sec and 30k reqs/sec. At 20k reqs/sec the distributions are roughly the same. However, at 30k reqs/sec Go 1.6beta2 performs worse:

Latency     1.5.3    1.6beta2
 50.000%   2.03ms     3.55ms
 75.000%   2.76ms     5.38ms
 90.000%   3.66ms     8.49ms
 99.000%   5.27ms    18.01ms
 99.900%   6.44ms    27.06ms
 99.990%   7.46ms    32.72ms
 99.999%   8.30ms    35.84ms
100.000%   8.77ms    36.83ms


After that I repeated the same test on three m4.10xlarge (40 cores, 60 GB of RAM) servers. The results were similar.

Is it expected that Go 1.6 GC doesn't scale as well as Go 1.5 GC with small heap size on servers with large number of CPUs? Or perhaps there is something wrong with my benchmarking approach?

Regards,
Konstantin

Ian Lance Taylor

Jan 16, 2016, 3:28:23 PM
to Konstantin Shaposhnikov, golang-dev, Rick Hudson, Austin Clements
[ +austin, +rlh ]

Konstantin Shaposhnikov

Jan 16, 2016, 9:04:13 PM
to Ian Lance Taylor, golang-dev, Rick Hudson, Austin Clements
It looks like the GC runs more often with Go 1.6.

I ran a load test (wrk -t 32 -c 32 -d 60 http://localhost:8080/json)
on my laptop (4 cores) a few times and collected the total allocations
and number of GCs after each run:

1.5

TotalAlloc NumGC
10130419048 2763
9851675824 2690
10127972312 2767
10000730680 2728

1.6

TotalAlloc NumGC
10197172728 3142
10522884384 3239
9880468792 3045
10021569672 3084

Brad Fitzpatrick

Jan 26, 2016, 10:49:10 AM
to Ian Lance Taylor, Konstantin Shaposhnikov, golang-dev, Rick Hudson, Austin Clements
ping Austin, Rick. Is this expected?

Rick Hudson

Jan 26, 2016, 12:26:36 PM
to Brad Fitzpatrick, Ian Lance Taylor, Konstantin Shaposhnikov, golang-dev, Austin Clements
Is it possible to run these benchmarks with GODEBUG=gctrace=1 ?

These results will give us some indication of whether it is a GC issue, and some sense of the size of the heap.

Also, the benchmarks use the term "threads": are these goroutines, or GOMAXPROCS values?

- Rick

Austin Clements

Jan 26, 2016, 2:28:44 PM
to Rick Hudson, Brad Fitzpatrick, Ian Lance Taylor, Konstantin Shaposhnikov, golang-dev
Hi Konstantin. In addition to what Rick mentioned, it would be useful if you could record and send profiles of the tests on 1.5.3 and 1.6 (unfortunately, pprof can't diff profiles from different binaries, but they're usually easy enough to compare by hand).

Konstantin Shaposhnikov

Jan 26, 2016, 6:59:13 PM
to Austin Clements, Rick Hudson, Brad Fitzpatrick, Ian Lance Taylor, golang-dev
Hi,

Threads in my original email referred to the number of threads used by
the load testing program (wrk -t NumberOfThreads .....).

For both Go 1.5 and 1.6 GOMAXPROCS was set to NumCPU at the start of
the program: runtime.GOMAXPROCS(runtime.NumCPU()). I wonder if NumCPU
returns the same value for Go 1.5 and 1.6?

I will re-run the tests with gctrace and profiling enabled and post
results here.

K

Dave Cheney

Jan 26, 2016, 7:01:31 PM
to Konstantin Shaposhnikov, Austin Clements, Rick Hudson, Brad Fitzpatrick, Ian Lance Taylor, golang-dev
In Go 1.5 the default value of GOMAXPROCS changed,
https://tip.golang.org/doc/go1.5#introduction

> runtime.GOMAXPROCS(runtime.NumCPU())

Please remove this and use env GOMAXPROCS=... to set the value
externally to your program.


Konstantin Shaposhnikov

Jan 29, 2016, 11:10:48 PM
to Dave Cheney, Austin Clements, Rick Hudson, Brad Fitzpatrick, Ian Lance Taylor, golang-dev
Hi,

I've collected profiling and gctrace output for the json benchmark that
I described in my original email. I used two m4.10xlarge EC2
instances for testing (40 cores, 60GB of RAM).

All results are here:
https://drive.google.com/file/d/0B9Oy8xv00g4Db2lta0h1WHVJN3c/view?usp=sharing.
In the archive you can find:

- two binaries: hello15 (compiled with Go 1.5.3) and hello16 (compiled
with Go 1.6rc1)
- *.log files with GODEBUG=gctrace=1,schedtrace=1000 output
- *.prof file with profiling data
- *.log and *.prof files are named go<VERSION>_<GOGC>_<NUM_WRK_CONNECTIONS>.

The corresponding benchmarking results:
https://gist.github.com/kostya-sh/7ac7e52b4519694f5f4a and a chart:
https://docs.google.com/spreadsheets/d/1MfB_-lfvyXOXTKS0mdqabPgkHjXgMKFawwQbHOWfP4M/pubchart?oid=1386599290&format=image

I hope this data helps to identify the reason for the decreased throughput.

Konstantin

Caleb Spare

Jan 30, 2016, 12:16:34 AM
to Konstantin Shaposhnikov, Dave Cheney, Austin Clements, Rick Hudson, Brad Fitzpatrick, Ian Lance Taylor, golang-dev
BTW m4.10xlarge instances have 160GB of ram, not 60GB.

Konstantin Shaposhnikov

Jan 30, 2016, 12:27:16 AM
to Caleb Spare, Dave Cheney, Austin Clements, Rick Hudson, Brad Fitzpatrick, Ian Lance Taylor, golang-dev
You're right, sorry for the typo. Not that it matters much: the
benchmarked application uses at most ~40MB ;) I chose m4.10xlarge
because of its 40 CPUs.

Brad Fitzpatrick

Jan 30, 2016, 10:51:34 AM
to Konstantin Shaposhnikov, Dave Cheney, Austin Clements, Rick Hudson, Ian Lance Taylor, golang-dev
I moved this to a bug: https://github.com/golang/go/issues/14161

Let's discuss there.

Bugs get tracked. Emails get lost.
