Comparing the performance of Go on Windows and Linux.


Avalon

Sep 24, 2011, 9:27:23 AM
to golang-nuts
Hello everyone,

I measured the performance of a simple Go channel benchmarking program
on Windows versus Linux on the same system: an Intel Core 2 Duo
(T9300, 2.5 GHz) with 4 GB of memory.
The Windows OS is 64-bit Windows 7 Professional; the Linux OS is
64-bit Ubuntu 11.04 (Natty).
The Linux Go release was 6g release .r60 (9481).
The Windows Go releases were 8g release .r60 (9684) and 6g version
weekly.2011-07-07 (9153+).
I used the following program:

package main

import (
	"fmt"
	"runtime"
	"testing"
)

func main() {
	runtime.GOMAXPROCS(1) // runtime.GOMAXPROCS(2)
	fmt.Println(" sync",
		testing.Benchmark(BenchmarkChannelSync).String())
	fmt.Println("buffered",
		testing.Benchmark(BenchmarkChannelBuffered).String())
}

func BenchmarkChannelSync(b *testing.B) {
	ch := make(chan int)
	go func() {
		for i := 0; i < b.N; i++ {
			ch <- i
		}
		close(ch)
	}()
	for range ch {
	}
}

func BenchmarkChannelBuffered(b *testing.B) {
	ch := make(chan int, 128)
	go func() {
		for i := 0; i < b.N; i++ {
			ch <- i
		}
		close(ch)
	}()
	for range ch {
	}
}
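For anyone repeating this on a current Go toolchain, here is a sketch of the same measurement that tries both GOMAXPROCS settings in one run. It times a fixed number of sends by hand with `time.Now` instead of going through `testing.Benchmark`, so it does not depend on how the testing package sizes b.N. The helper name `channelNsPerOp` and the count of 1,000,000 operations are assumptions made for illustration, and the ns/op figures it prints will not match the 2011 numbers below.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// channelNsPerOp sends n integers from a producer goroutine to the calling
// goroutine through a channel of the given capacity (0 = unbuffered). It
// returns the average wall-clock nanoseconds per element and the sum of the
// received values, so a caller can check that every element arrived.
func channelNsPerOp(capacity, n int) (float64, int) {
	ch := make(chan int, capacity)
	start := time.Now()
	go func() {
		for i := 0; i < n; i++ {
			ch <- i
		}
		close(ch)
	}()
	sum := 0
	for v := range ch {
		sum += v
	}
	elapsed := time.Since(start)
	return float64(elapsed.Nanoseconds()) / float64(n), sum
}

func main() {
	const n = 1000000 // fixed operation count (an assumption, not b.N)
	for _, procs := range []int{1, 2} {
		runtime.GOMAXPROCS(procs)
		syncNs, _ := channelNsPerOp(0, n)
		bufNs, _ := channelNsPerOp(128, n)
		fmt.Printf("GOMAXPROCS=%d  sync %8.1f ns/op  buffered %8.1f ns/op\n", procs, syncNs, bufNs)
	}
}
```

Alternatively, if the two benchmark functions are placed in a `_test.go` file, `go test -bench . -cpu 1,2` runs each of them once per listed GOMAXPROCS value, without changing the setting by hand.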

These properties were compared:
1) A synchronous channel versus a buffered channel
2) The value of GOMAXPROCS (1 versus 2; with 2, both cores should
be used)
3) Windows performance versus Linux performance
4) On Windows: a 32-bit (8g) versus a 64-bit (6g) compiled program.

The number of measurements for each result was between 4 and 5 x
10^6; each result gives the time one operation took, in nanoseconds.

Here are the results in ns/op:
              GOMAXPROCS  synchronous  buffered
Windows (8g)      1               428       180
                  2              3577      3762
Windows (6g)      1               426       179
                  2              3603      4000
Linux (6g)        1             16642       210
                  2             17625       212
We can see the following:
1) A buffered channel performs better than a nonbuffered channel as is
to be expected:
2.4x better on Windows (8g and 6g)
81x better on Linux

2) Influence of GOMAXPROCS 1 versus 2:
On Windows (8g and 6g) this did not behave as expected: the
unbuffered channel performed 8.4x worse with GOMAXPROCS=2 than with 1,
and the buffered channel was even 21.5x worse. Moreover, with
GOMAXPROCS=2 the buffered channel performs slightly worse than the
synchronous channel, in contrast to 1)!
On Linux, too, the results for GOMAXPROCS=2 are almost the same as for
1 (in fact very slightly worse). Increasing GOMAXPROCS brings no
improvement.

3) Linux versus Windows:
At GOMAXPROCS=1, Windows performs slightly better for buffered channels
and much better (39x) for synchronous channels; I don't know the reason
for the latter.

4) Windows 8g versus Windows 6g: they have the same performance.
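The ratios quoted in points 1) to 3) can be recomputed directly from the table. The small program below does only that arithmetic; the `row` type and helper names are made up for illustration. Worked out this way, the Linux buffered-versus-synchronous ratio is about 79x at GOMAXPROCS=1 and about 83x at GOMAXPROCS=2, so the 81x above looks like a rounded average of the two. Likewise, the 21.5x buffered slowdown on Windows lies between 20.9x (8g) and 22.3x (6g).

```go
package main

import "fmt"

// row holds one line of the results table in ns/op: synchronous and
// buffered timings at GOMAXPROCS=1 and at GOMAXPROCS=2.
type row struct {
	name        string
	sync1, buf1 float64
	sync2, buf2 float64
}

// bufferedGain returns how many times faster the buffered channel is than the
// synchronous one, at GOMAXPROCS=1 and at GOMAXPROCS=2 (below 1 means slower).
func bufferedGain(r row) (g1, g2 float64) {
	return r.sync1 / r.buf1, r.sync2 / r.buf2
}

// slowdown2 returns how many times slower each channel kind gets when
// GOMAXPROCS goes from 1 to 2.
func slowdown2(r row) (syncX, bufX float64) {
	return r.sync2 / r.sync1, r.buf2 / r.buf1
}

func main() {
	rows := []row{
		{"Windows (8g)", 428, 180, 3577, 3762},
		{"Windows (6g)", 426, 179, 3603, 4000},
		{"Linux (6g)", 16642, 210, 17625, 212},
	}
	for _, r := range rows {
		g1, g2 := bufferedGain(r)
		s, b := slowdown2(r)
		fmt.Printf("%-13s buffered gain %5.1fx / %5.1fx   GOMAXPROCS 1->2 slowdown: sync %5.1fx, buffered %5.1fx\n",
			r.name, g1, g2, s, b)
	}
	// Point 3: Linux versus Windows (8g) synchronous time at GOMAXPROCS=1.
	fmt.Printf("Linux/Windows sync ratio: %.1fx\n", rows[2].sync1/rows[0].sync1)
}
```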

We can conclude for this kind of problem (filling and reading a
channel):
- Buffered channels perform better than synchronous channels (much
better on Linux)
- Increasing GOMAXPROCS is not useful here; dividing the task over
the 2 cores creates an immense overhead.
- Windows performs on par with Linux, and even much better for
synchronous channels.
- Windows 8g performs the same as Windows 6g.

So perhaps contrary to popular belief, Go on Linux does not perform
better than Windows in this case.

Justin Israel

Sep 24, 2011, 1:04:41 PM
to Avalon, golang-nuts
I ran the same test on Snow Leopard 10.6.8 and on my Ubuntu 11.04 64-bit VMware image and got:
          GOMAXPROCS    SYNC  BUFFERED
OSX           1          295       122
              2         6445     10733
Linux VM      1          300       120
              2         2005       158

I'm not sure how much running the Linux test in a VMware image influences the result.
Also, I'm still really new to Go, but I thought that GOMAXPROCS controls how many simultaneous user-code goroutines can run in parallel? In your test you only have a single goroutine per test, so it would seem that increasing GOMAXPROCS is worthless in this case. You might have been expecting that it would spread the workload of the single goroutine across multiple cores? Am I right in thinking that is not what would be happening here?
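One detail worth checking: each benchmark actually involves two goroutines, the one running the benchmark function (which receives) and the one it starts with `go func()` (which sends). With GOMAXPROCS=2 the scheduler may therefore run sender and receiver on different cores, so every hand-off can cross cores. The sketch below (the function name is made up for illustration) counts goroutines while such a transfer is in progress and prints the current settings; `runtime.GOMAXPROCS` with an argument below 1 only reports the value without changing it.

```go
package main

import (
	"fmt"
	"runtime"
)

// goroutinesDuringTransfer mirrors the shape of BenchmarkChannelSync: the
// calling goroutine receives while a second goroutine sends. It returns the
// goroutine count observed while that sender is still alive.
func goroutinesDuringTransfer() int {
	ch := make(chan int) // unbuffered, as in BenchmarkChannelSync
	go func() {
		for i := 0; i < 3; i++ {
			ch <- i
		}
		close(ch)
	}()
	<-ch // after the first value arrives, the sender still has two sends left
	n := runtime.NumGoroutine()
	for range ch { // drain the rest so the sender can finish
	}
	return n
}

func main() {
	fmt.Println("NumCPU:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0)) // argument < 1: query only
	fmt.Println("goroutines during transfer:", goroutinesDuringTransfer())
}
```

In a plain program the last line should report at least 2, one receiver and one sender.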

Dmitry Vyukov

Sep 24, 2011, 2:44:15 PM
to Avalon, golang-nuts
On Sat, Sep 24, 2011 at 6:27 AM, Avalon <ivo.ba...@gmail.com> wrote:
9853:41d5abd18c57 tip
Linux, Intel(R) Xeon(R) CPU E5620 @ 2.40GHz

$ GOMAXPROCS=1 ./6.out
 sync 10000000       244 ns/op
buffered 20000000        87.9 ns/op
$ GOMAXPROCS=2 ./6.out 
 sync   500000      4277 ns/op
buffered 10000000       246 ns/op


andrey mirtchovski

Sep 24, 2011, 7:53:41 PM
to Dmitry Vyukov, Avalon, golang-nuts
I've been looking at this for the past few days trying to explain it,
but I can't. Basically, with GOMAXPROCS=2, sync chan communication
(unbuffered) is faster than buffered on OS X Lion. The following is
with the code from the first message, with the call to GOMAXPROCS disabled:

9885:41d5abd18c57
2.2GHz Intel Core i7

$ GOMAXPROCS=1 ./6.out
sync 10000000 188 ns/op
buffered 50000000 70.4 ns/op
$ GOMAXPROCS=2 ./6.out
sync 1000000 2785 ns/op
buffered 500000 5046 ns/op

I wish I knew enough DTrace to be able to give more information. I
only have Instruments to examine the running program; if anyone is
interested, I can send those traces.

andrey

Marc-Antoine Ruel

Sep 24, 2011, 8:42:08 PM
to Avalon, golang-nuts
On 24 September 2011 at 09:27, Avalon <ivo.ba...@gmail.com> wrote:
> Hello everyone,
>
> I measured the performance of a simple Go channel benchmarking program
> on Windows versus Linux on the same system with as processor an Intel
> Core 2 Duo (T9300 2.5GHz) and 4 GB of memory.

For a start, the T9300 is a laptop processor. Windows 7 has much deeper power management than most other OSes, certainly more than Natty, so this is not an apples-to-apples comparison even on the same hardware. Win7 disables cores on the fly, which can impose a 'dynamic implicit core affinity' on threads.

I'd recommend trying again with all BIOS settings at maximum performance, SpeedStep and Turbo Boost disabled, running on AC power (I assume you did), and the equivalent settings in the OS.

M-A

Dmitry Vyukov

Sep 24, 2011, 9:28:10 PM
to andrey mirtchovski, Avalon, golang-nuts
On Sat, Sep 24, 2011 at 4:53 PM, andrey mirtchovski <mirtc...@gmail.com> wrote:
> I've been looking at this for the past few days trying to explain it,
> but I can't. basically, sync chan communication (un-buffered) is
> faster than buffered in OSX Lion.

I think it's due to the mutex implementation. Currently we have a good mutex implementation for Linux (spinning and no kernel objects, thus no finalizers), and a nice optimization on Windows (no per-mutex kernel event allocation). We need to extend these optimizations to all platforms.
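To make the idea of spinning before blocking concrete, here is a toy lock written in Go itself. It is only a sketch of the general technique, not the runtime's mutex: as far as I know, the runtime's Linux lock spins briefly and then sleeps on a futex, whereas this version retries with non-blocking channel operations and `runtime.Gosched`, and then parks on a channel. `hybridLock`, `spinTries` and `countWith` are hypothetical names, and the spin count of 4 is an arbitrary assumption.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// spinTries bounds how many non-blocking attempts Lock makes before it blocks.
const spinTries = 4

// hybridLock is a hypothetical spin-then-block lock. A channel of capacity 1
// holds a single token: whoever takes the token holds the lock.
type hybridLock struct {
	token chan struct{}
}

func newHybridLock() *hybridLock {
	l := &hybridLock{token: make(chan struct{}, 1)}
	l.token <- struct{}{} // start unlocked: the token is available
	return l
}

// Lock first tries to take the token without blocking, yielding the processor
// between attempts, and only then parks until the token is returned.
func (l *hybridLock) Lock() {
	for i := 0; i < spinTries; i++ {
		select {
		case <-l.token:
			return // acquired on the fast path, without parking
		default:
			runtime.Gosched()
		}
	}
	<-l.token // slow path: block until Unlock puts the token back
}

// Unlock returns the token; only the current holder may call it.
func (l *hybridLock) Unlock() {
	l.token <- struct{}{}
}

// countWith has `workers` goroutines each increment a shared counter `per`
// times under the lock and returns the final count.
func countWith(workers, per int) int {
	l := newHybridLock()
	var wg sync.WaitGroup
	counter := 0
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < per; i++ {
				l.Lock()
				counter++
				l.Unlock()
			}
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(countWith(8, 1000)) // prints 8000
}
```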



peterGo

Sep 25, 2011, 12:11:56 AM
to golang-nuts
Avalon,

Intel Quad CPU Q8300 @ 2.50GHz

Xubuntu 11.04 64-bit: linux/amd64
GOMAXPROCS = 1
sync: 5000000 352 ns/op
buff: 10000000 147 ns/op
GOMAXPROCS = 2
sync: 500000 4842 ns/op
buff: 10000000 437 ns/op

Windows 8 64-bit: windows/386
GOMAXPROCS = 1
sync: 5000000 441 ns/op
buff: 10000000 187 ns/op
GOMAXPROCS = 2
sync: 500000 3512 ns/op
buff: 500000 5942 ns/op

Peter

j...@webmaster.ms

Sep 25, 2011, 3:09:56 PM
to golan...@googlegroups.com
Windows-7, i7-2600K.
6g is faster than 8g

6g, GOMAXPROCS=1
    sync 10000000    149 ns/op
buffered 50000000     58.5 ns/op

6g, GOMAXPROCS=2
    sync  1000000   1731 ns/op
buffered   500000   3240 ns/op

8g, GOMAXPROCS=1
    sync 10000000    151 ns/op
buffered 50000000     61.7 ns/op

8g, GOMAXPROCS=2
    sync  1000000   2166 ns/op
buffered   500000   4136 ns/op

Kyle Lemons

Sep 26, 2011, 12:38:28 PM
to golan...@googlegroups.com
My very unscientific benchmark:
Go: Fast enough

And it will only get faster.

André Moraes

Sep 26, 2011, 1:03:59 PM
to golan...@googlegroups.com
Usually, setting GOMAXPROCS > 1 adds overhead, since the program will be
running on two cores that need to communicate with each other.

Your code does almost nothing in the loop, so the only thing that
takes time is the communication between cores; with GOMAXPROCS = 1 the
overhead is smaller and the program looks faster.
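This point can be checked by giving each element some real work. In the sketch below the per-element CPU cost is set by a `work` parameter; the names `spin` and `pipeline`, the LCG constants and the element count are arbitrary choices for illustration, not anything from the thread. With `work=0` the loop is pure channel traffic, like the original benchmark; with a larger `work` the producer and consumer each have computation that GOMAXPROCS=2 can, in principle, overlap. Where the crossover happens depends on hardware and Go version, so no expected timings are given.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// spin applies `work` steps of a 64-bit linear congruential recurrence to x,
// standing in for a tunable amount of per-element CPU work.
func spin(x uint64, work int) uint64 {
	for i := 0; i < work; i++ {
		x = x*6364136223846793005 + 1442695040888963407
	}
	return x
}

// pipeline sends n values through a buffered channel, doing `work` steps on
// the producer side and again on the consumer side, and returns a checksum.
func pipeline(n, work int) uint64 {
	ch := make(chan uint64, 128)
	go func() {
		for i := 0; i < n; i++ {
			ch <- spin(uint64(i), work) // producer-side work
		}
		close(ch)
	}()
	var sum uint64
	for v := range ch {
		sum += spin(v, work) // consumer-side work
	}
	return sum
}

func main() {
	const n = 100000
	for _, work := range []int{0, 1000} {
		for _, procs := range []int{1, 2} {
			runtime.GOMAXPROCS(procs)
			start := time.Now()
			sum := pipeline(n, work)
			perOp := time.Since(start) / n
			fmt.Printf("work=%4d GOMAXPROCS=%d  %v/op  (checksum %d)\n", work, procs, perOp, sum)
		}
	}
}
```

The checksum does not depend on GOMAXPROCS, which makes it easy to confirm that both runs did the same work.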

And like Kyle said:

--
André Moraes
http://andredevchannel.blogspot.com/
