go1 benchmarks show a large degree of variance

Dave Cheney

Apr 2, 2017, 11:41:13 PM
to golang-dev
Hello,

I'm concerned that the go1 benchmarks have a large amount of variation
between runs. I'm used to the variance being in the range of +/- 1-3%,
but the numbers comparing Go 1.8 to tip show variances in the range of
+/- 10-15% on amd64 and larger on arm (see below).

linux/amd64 https://perf.golang.org/search?q=upload:20170403.4
linux/arm (raspberry pi 3) https://perf.golang.org/search?q=upload:20170403.5
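
To make the "+/- N%" figures concrete: one simple way to quantify run-to-run spread is the largest relative deviation of any single run from the mean. The sketch below computes that for one benchmark from a file of raw benchmark output. The function name spread_pct and the file layout are assumptions for illustration, and it does not apply any outlier filtering, so its numbers need not match what a tool like benchstat reports.

```shell
# spread_pct FILE NAME
# Hypothetical helper: prints the largest relative deviation from the mean
# (as a percentage) of the ns/op values recorded for benchmark NAME in FILE.
# Assumes standard Go benchmark lines: "<name>-<procs> <iters> <value> ns/op ...".
spread_pct() {
    awk -v name="$2" '
        $4 == "ns/op" && $1 ~ ("^" name "(-[0-9]+)?$") {
            n++
            sum += $3
            if (n == 1 || $3 < min) min = $3
            if (n == 1 || $3 > max) max = $3
        }
        END {
            if (n == 0) exit 1          # no matching samples
            mean = sum / n
            lo = 1 - min / mean         # how far the fastest run is below the mean
            hi = max / mean - 1         # how far the slowest run is above the mean
            printf "%.1f%%\n", 100 * (hi > lo ? hi : lo)
        }' "$1"
}
```

For example, `spread_pct go1.go1.8.test.txt BenchmarkGzip` would report the spread for the Gzip benchmark across all runs appended to that file.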

Questions:

1. is this something I should be worried about?
2. is anyone else worried about this?

Thanks

Dave

Keith Randall

Apr 2, 2017, 11:53:04 PM
to Dave Cheney, golang-dev
Yes, this worries me.  Variance makes it really hard to tell whether something that should give you a 1% improvement actually did or not.

The obvious suspect is GC, but I have no evidence to confirm.  Any chance you can do some reruns with GOGC=off and see what happens?


--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Michael Jones

Apr 3, 2017, 12:33:38 AM
to Dave Cheney, Keith Randall, golang-dev
I have seen variance too... but I suspected it was from my use of mmap and the mystery of what's cached by the OS.

--
Michael T. Jones
michae...@gmail.com

Dave Cheney

Apr 3, 2017, 12:47:07 AM
to Keith Randall, golang-dev
Didn't seem to make much difference (gc was definitely off based on
the memory consumption during the benchmark)

for t in go1.go1.8.test go1.8775747.test ; do for i in $(seq 1 20);
do env GOGC=off ./${t} -test.bench=. >> ${t}.txt; done; done

https://perf.golang.org/search?q=upload:20170403.6

Dan Kortschak

Apr 3, 2017, 2:52:37 AM
to Keith Randall, golang-dev
When Austin wrote the significance tests for benchstat I noticed that
the time distribution for inputs was fairly highly positively skewed
(which is why the test used is Mann-Whitney U rather than Student's t).
It might be worth looking at whether broadly the distribution has
changed in skew - particularly whether the high tail has lengthened.
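
A rough way to eyeball the skew described above, without any statistics package, is to compare the mean and the median of the samples for one benchmark: a mean noticeably above the median hints at a long high tail. The sketch below assumes raw benchmark output collected in a text file; the function name skew_hint is hypothetical, and this is only a heuristic, not the test benchstat performs.

```shell
# skew_hint FILE NAME
# Hypothetical helper: prints sample count, mean and median of the ns/op
# values for benchmark NAME in FILE. mean > median suggests positive skew.
# Assumes standard Go benchmark lines: "<name>-<procs> <iters> <value> ns/op ...".
skew_hint() {
    awk -v name="$2" '$4 == "ns/op" && $1 ~ ("^" name "(-[0-9]+)?$") { print $3 }' "$1" |
        sort -n |
        awk '
            { v[NR] = $1; sum += $1 }
            END {
                if (NR == 0) exit 1     # no matching samples
                mean = sum / NR
                # Input is sorted, so the median is the middle element
                # (or the average of the two middle elements).
                median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
                printf "n=%d mean=%.1f median=%.1f\n", NR, mean, median
            }'
}
```

Running it against the result files from two Go versions gives a quick sense of whether the high tail has lengthened between them.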


Quentin Smith

Apr 3, 2017, 10:05:57 AM
to Dave Cheney, golang-dev
Is your Raspberry Pi quiescent when running those benchmarks? I have found that linux-arm shows far less variance than you are seeing. (In fact, it shows far less variance than any other platform I've tested - virtual or physical.)


If you tell me what commits you used for your uploaded results, I can run them on the same hardware.

--Quentin


Russ Cox

Apr 3, 2017, 11:38:51 AM
to Dave Cheney, golang-dev, Keith Randall
Yes, absolutely, those benchmark results are no good. As Keith said, you have to get down to lower variances before any of the data becomes meaningful.

However, I regularly see results with significantly lower variance than the ones you posted, especially on linux/amd64. Is the machine running something else too? Is CPU frequency scaling or governing on? From the data files you uploaded it looks like you ran the tests 20X in a shell loop. Another thing to try is to use -test.count=20 instead, but I would not expect the variances you are seeing in either mode.

If the linux/amd64 machine is quiet and you still can't get low variance, another thing to try is to run the benchmarks under https://github.com/aclements/perflock. The main thing perflock does is make sure that multiple programs run under perflock run separately, not simultaneously (like a real lock). But a secondary thing it does is change the CPU governing and frequency scaling settings to try to make the machine behave more consistently.
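
As a quick way to answer the frequency-scaling question on a Linux machine, the sketch below summarises which cpufreq governor each CPU is using. It assumes the usual Linux sysfs layout under /sys/devices/system/cpu, which may be absent in virtual machines or on kernels without cpufreq support; the function name governor_summary and its optional directory argument are hypothetical.

```shell
# governor_summary [CPU_SYSFS_DIR]
# Hypothetical helper: prints one line per distinct cpufreq governor with the
# number of CPUs using it. Prints nothing if no cpufreq files are found.
governor_summary() {
    cat "${1:-/sys/devices/system/cpu}"/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null |
        sort | uniq -c | awk '{ print $2 ": " $1 " CPU(s)" }'
}
```

A load-dependent governor such as "ondemand" or "powersave" on the benchmark machine is one plausible source of run-to-run differences worth ruling out.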

I just ran these on my Linux workstation, using go1.8, and I got more like <1% variance:

./go1.test -test.count=20 -test.run=XXX -test.bench=.

perflock ./go1.test -test.count=20 -test.run=XXX -test.bench=.
https://perf.golang.org/search?q=upload:20170403.8

Austin is running the tests in a shell loop on his workstation (instead of -test.count), at tip, and not seeing anything like 10% variance either.

Russ

Austin Clements

Apr 3, 2017, 11:45:21 AM
to Russ Cox, Dave Cheney, golang-dev, Keith Randall
On Mon, Apr 3, 2017 at 11:38 AM, Russ Cox <r...@golang.org> wrote:
> Austin is running the tests in a shell loop on his workstation (instead of -test.count), at tip, and not seeing anything like 10% variance either.

Here are my results on linux/amd64, run in a shell loop and with perflock: https://perf.golang.org/search?q=upload:20170403.11. Most of the variances are 0% or 1%.

Dave Cheney

Apr 3, 2017, 4:07:47 PM
to Quentin Smith, golang-dev
It is idle, I've even uninstalled the desktop environment and access it via SSH. 

On Tue, 4 Apr 2017, 00:05 Quentin Smith <que...@golang.org> wrote:
> If you tell me what commits you used for your uploaded results, I can run them on the same hardware.

I don't have the exact revisions; it was 1.8 vs. a build from last week.

Dave Cheney

Apr 3, 2017, 4:13:42 PM
to Russ Cox, Keith Randall, golang-dev
On Tue, 4 Apr 2017, 01:38 Russ Cox <r...@golang.org> wrote:
> However, I regularly see results with significantly lower variance than the ones you posted, especially on linux/amd64. Is the machine running something else too?

Nope, I reboot it and access it remotely before running benchmarks.

This is the same machine I've been using for years to run benchmarks, a Sandy Bridge ThinkPad X220.

> Is CPU frequency scaling or governing on? From the data files you uploaded it looks like you ran the tests 20X in a shell loop. Another thing to try is to use -test.count=20 instead, but I would not expect the variances you are seeing in either mode.

I can try that, but David Chase did caution me that he suspects it makes the tests less reliable.

> If the linux/amd64 machine is quiet and you still can't get low variance, another thing to try is to run the benchmarks under https://github.com/aclements/perflock.

Thanks, I'll give it a try.

> Austin is running the tests in a shell loop on his workstation (instead of -test.count), at tip, and not seeing anything like 10% variance either.

I'd be happy to write this machine off as unreliable, except for the numbers from the rpi.

If nobody else is seeing these high variances I'll happily put this down to a local problem and try to find other benchmarking hardware.

Thanks

Dave

Russ Cox

Apr 4, 2017, 1:12:34 PM
to Dave Cheney, Keith Randall, golang-dev
On Mon, Apr 3, 2017 at 4:13 PM, Dave Cheney <da...@cheney.net> wrote:
> Nope, I reboot it and access it remotely before running benchmarks.
>
> This is the same machine as I've been using for years to run benchmarks, a sandy bridge x220 thinkpad

It really sounds like something changed on your end, although I can't guess what. We've been running benchmarks all the time for various things and basically never see the variance you're seeing. One way to check would be to go back to Go 1.7 or Go 1.6 or Go 1.5 and see if you get reliable numbers with those. 

Russ

Dave Cheney

Apr 4, 2017, 4:16:58 PM
to Russ Cox, Keith Randall, golang-dev

Thanks Russ, I'll keep trying to diagnose the volatility on my systems.

Dave Cheney

Apr 4, 2017, 9:20:41 PM
to Russ Cox, Keith Randall, golang-dev
Hello,

Ubuntu 16.04 ships with a service called thermald which purports to do
CPU throttling. This wasn't present on my 14.04 install, which I was
running until, I guess, mid to late last year.

With thermald disabled the results are more stable on this machine:

https://perf.golang.org/search?q=upload:20170405.1

Hopefully this will be useful for others.
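
For anyone who wants to guard against this, here is a minimal sketch of a wrapper that refuses to run a benchmark command while thermald is active. It assumes a systemd-based system where the service unit is named thermald; the function name bench_without_thermald and the SYSTEMCTL override variable are hypothetical and exist only to make the sketch easy to try out.

```shell
# bench_without_thermald CMD [ARGS...]
# Hypothetical wrapper: runs CMD only if the thermald unit is not active.
# SYSTEMCTL may be set to substitute another command for systemctl.
bench_without_thermald() {
    if [ "$("${SYSTEMCTL:-systemctl}" is-active thermald 2>/dev/null)" = "active" ]; then
        echo "thermald is active; stop it first (e.g. sudo systemctl stop thermald)" >&2
        return 1
    fi
    "$@"
}
```

Usage would look like `bench_without_thermald ./go1.test -test.count=20 -test.run=XXX -test.bench=.`; stopping the service itself is left as a manual step because it needs root.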

Austin Clements

Apr 4, 2017, 9:23:38 PM
to Dave Cheney, golang-dev
Out of curiosity, does github.com/aclements/perflock defeat thermald or does thermald interfere with it? (If it's the latter, I could make perflock at least warn, or perhaps temporarily disable thermald.)


Dave Cheney

Apr 4, 2017, 9:24:56 PM
to Austin Clements, golang-dev
Not sure tbh, I haven't gotten as far as trying perflock; turning off
thermald reverts (I guess) to the BIOS's thermal management. I haven't
tweaked the CPU governor yet either.

Russ Cox

Apr 6, 2017, 11:33:21 AM
to Dave Cheney, Keith Randall, golang-dev
Great, thanks for tracking that down Dave. Was thermald also the problem on the Raspberry Pi?


Dave Cheney

Apr 6, 2017, 5:20:37 PM
to Russ Cox, Keith Randall, golang-dev

Sadly no, thermald was only an issue for my linux/amd64 system running Ubuntu 16.04.

I'm still looking for the root cause on the rpi3 (playing with the cpufreq governor has so far not paid off).

Bakul Shah

Apr 6, 2017, 5:59:11 PM
to Dave Cheney, Russ Cox, Keith Randall, golang-dev
You may have already tried this but… RPis these days change the CPU frequency based on load. You can disable dynamic clocking by adding "force_turbo=1" to config.txt on the DOS partition and rebooting.
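
A minimal sketch for checking whether the setting is already in place, assuming the DOS partition is mounted so that the file is at /boot/config.txt (the path can be passed explicitly if it lives elsewhere). The function name has_force_turbo is hypothetical, and the check only looks for an uncommented line; it does not confirm that the firmware has applied it.

```shell
# has_force_turbo [CONFIG_FILE]
# Hypothetical helper: succeeds if CONFIG_FILE contains an uncommented
# "force_turbo=1" line. Defaults to /boot/config.txt (an assumed location).
has_force_turbo() {
    grep -Eq '^force_turbo=1[[:space:]]*$' "${1:-/boot/config.txt}"
}
```

For example, `has_force_turbo || echo "force_turbo=1 not set"` before a benchmark run gives a cheap reminder; the change only takes effect after a reboot.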

Brad Fitzpatrick

Apr 6, 2017, 6:11:44 PM
to Bakul Shah, Dave Cheney, Russ Cox, Keith Randall, golang-dev
I remember that button.

Good times. Certain games were unplayable with it on, though.



Dave Cheney

Apr 24, 2017, 1:00:38 AM
to Russ Cox, Keith Randall, golang-dev
Hello again,

Just to close the loop on this, adding force_turbo=1 to
/boot/config.txt on the rpi3 stabilised the benchmark numbers.
Sample below.

pi@raspberrypi:~/go/test/bench/go1 $ benchstat go1.8.1.test.txt go.tip.test.txt
name                      old time/op    new time/op    delta
BinaryTree17-4            25.2s ± 1%     26.4s ± 0%     +4.42%   (p=0.000 n=9+9)
Fannkuch11-4              16.5s ± 0%     16.4s ± 0%     -0.65%   (p=0.000 n=10+10)
FmtFprintfEmpty-4         560ns ± 0%     537ns ± 0%     -4.22%   (p=0.000 n=9+9)
FmtFprintfString-4        1.41µs ± 0%    0.90µs ± 0%    -36.11%  (p=0.000 n=9+10)
FmtFprintfInt-4           1.15µs ± 0%    0.91µs ± 0%    -20.53%  (p=0.000 n=8+10)
FmtFprintfIntInt-4        1.81µs ± 0%    1.36µs ± 0%    -25.14%  (p=0.000 n=9+10)
FmtFprintfPrefixedInt-4   1.78µs ± 1%    1.47µs ± 1%    -17.56%  (p=0.000 n=10+10)
FmtFprintfFloat-4         3.13µs ± 0%    2.65µs ± 0%    -15.38%  (p=0.000 n=10+10)
FmtManyArgs-4             7.64µs ± 0%    5.06µs ± 0%    -33.74%  (p=0.000 n=8+10)
GobDecode-4               58.3ms ± 0%    54.7ms ± 1%    -6.17%   (p=0.000 n=10+9)
GobEncode-4               55.2ms ± 1%    48.6ms ± 1%    -11.98%  (p=0.000 n=10+10)
Gzip-4                    2.88s ± 0%     2.79s ± 1%     -3.45%   (p=0.000 n=8+10)
Gunzip-4                  383ms ± 0%     391ms ± 0%     +1.96%   (p=0.000 n=10+9)
HTTPClientServer-4        294µs ± 3%     295µs ± 2%     ~        (p=0.305 n=10+9)
JSONEncode-4              157ms ± 0%     160ms ± 0%     +2.18%   (p=0.000 n=10+9)
JSONDecode-4              522ms ± 1%     511ms ± 1%     -2.13%   (p=0.000 n=10+10)
Mandelbrot200-4           25.3ms ± 0%    24.4ms ± 0%    -3.23%   (p=0.000 n=10+10)
GoParse-4                 25.5ms ± 1%    24.3ms ± 1%    -4.71%   (p=0.000 n=10+10)
RegexpMatchEasy0_32-4     723ns ± 0%     717ns ± 0%     -0.87%   (p=0.000 n=10+10)
RegexpMatchEasy0_1K-4     4.95µs ± 0%    4.94µs ± 0%    -0.25%   (p=0.009 n=10+10)
RegexpMatchEasy1_32-4     785ns ± 0%     795ns ± 0%     +1.17%   (p=0.000 n=10+10)
RegexpMatchEasy1_1K-4     6.58µs ± 0%    6.61µs ± 0%    +0.42%   (p=0.002 n=10+8)
RegexpMatchMedium_32-4    1.17µs ± 0%    1.21µs ± 3%    +3.89%   (p=0.000 n=10+10)
RegexpMatchMedium_1K-4    336µs ± 0%     333µs ± 0%     -0.93%   (p=0.000 n=10+10)
RegexpMatchHard_32-4      19.4µs ± 0%    19.3µs ± 0%    ~        (p=0.446 n=10+8)
RegexpMatchHard_1K-4      588µs ± 0%     584µs ± 0%     -0.70%   (p=0.000 n=10+9)
Revcomp-4                 52.5ms ± 1%    52.8ms ± 1%    +0.57%   (p=0.001 n=10+10)
Template-4                597ms ± 1%     577ms ± 1%     -3.34%   (p=0.000 n=10+10)
TimeParse-4               3.92µs ± 0%    4.12µs ± 0%    +5.19%   (p=0.000 n=10+10)
TimeFormat-4              7.61µs ± 0%    7.77µs ± 0%    +2.17%   (p=0.000 n=10+10)

name                      old speed      new speed      delta
GobDecode-4               13.2MB/s ± 0%  14.0MB/s ± 1%  +6.56%   (p=0.000 n=10+9)
GobEncode-4               13.9MB/s ± 1%  15.8MB/s ± 1%  +13.60%  (p=0.000 n=10+10)
Gzip-4                    6.73MB/s ± 0%  6.97MB/s ± 1%  +3.59%   (p=0.000 n=8+10)
Gunzip-4                  50.7MB/s ± 0%  49.7MB/s ± 0%  -1.93%   (p=0.000 n=10+9)
JSONEncode-4              12.4MB/s ± 0%  12.1MB/s ± 0%  -2.17%   (p=0.000 n=10+8)
JSONDecode-4              3.71MB/s ± 1%  3.80MB/s ± 0%  +2.18%   (p=0.000 n=10+10)
GoParse-4                 2.27MB/s ± 1%  2.38MB/s ± 1%  +5.03%   (p=0.000 n=10+10)
RegexpMatchEasy0_32-4     44.2MB/s ± 0%  44.6MB/s ± 0%  +0.88%   (p=0.000 n=10+10)
RegexpMatchEasy0_1K-4     207MB/s ± 0%   207MB/s ± 0%   +0.25%   (p=0.009 n=10+10)
RegexpMatchEasy1_32-4     40.7MB/s ± 0%  40.3MB/s ± 0%  -1.13%   (p=0.000 n=10+10)
RegexpMatchEasy1_1K-4     156MB/s ± 0%   155MB/s ± 0%   -0.42%   (p=0.002 n=10+8)
RegexpMatchMedium_32-4    856kB/s ± 1%   823kB/s ± 3%   -3.86%   (p=0.000 n=10+10)
RegexpMatchMedium_1K-4    3.05MB/s ± 0%  3.08MB/s ± 0%  +1.02%   (p=0.000 n=10+9)
RegexpMatchHard_32-4      1.65MB/s ± 0%  1.65MB/s ± 1%  ~        (p=1.000 n=10+10)
RegexpMatchHard_1K-4      1.74MB/s ± 0%  1.75MB/s ± 0%  +0.57%   (p=0.000 n=9+9)
Revcomp-4                 48.4MB/s ± 1%  48.1MB/s ± 1%  -0.57%   (p=0.001 n=10+10)
Template-4                3.25MB/s ± 1%  3.36MB/s ± 1%  +3.47%   (p=0.000 n=10+10)