Hi there!
TL;DR: a simple test program appears to show that Go's (*os.File).Write is 10x slower than C's fputs (on macOS).
While benchmarking my Lua implementation in Go [1], I found very large differences between C Lua and golua on benchmarks that do a lot of output to stdout. Using pprof, I found that my implementation spends a lot of its time in syscall. I couldn't see an obvious reason why, so I decided to make a minimal example: a program that writes the string "Hello There\n" ten million times to stdout:
-------- test.go --------
package main

import "os"

func main() {
    hello := []byte("Hello There\n") // convert to []byte once, to make it fairer
    for i := 0; i < 10000000; i++ {
        os.Stdout.Write(hello)
    }
}
-------- /test.go --------
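(For reference, this is roughly how a CPU profile of such a small program can be collected with the standard runtime/pprof package; the exact setup I used on golua differs, and the file names here are just for illustration.)
-------- test_pprof.go --------
package main

import (
    "os"
    "runtime/pprof"
)

func main() {
    // Write the CPU profile to a file we can inspect afterwards
    // with `go tool pprof cpu.prof`.
    f, err := os.Create("cpu.prof")
    if err != nil {
        panic(err)
    }
    if err := pprof.StartCPUProfile(f); err != nil {
        panic(err)
    }

    hello := []byte("Hello There\n")
    for i := 0; i < 10000000; i++ {
        os.Stdout.Write(hello)
    }

    pprof.StopCPUProfile()
    f.Close()
}
-------- /test_pprof.go --------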
For comparison, here is what I think is the equivalent in C:
-------- test.c --------
#include <stdio.h>

int main() {
    for (int i = 0; i < 10000000; i++) {
        fputs("Hello There\n", stdout);
    }
    return 0;
}
-------- /test.c --------
I compared those using multitime [2], with both go 1.15.6 and the beta1 release of go 1.16, following these steps (I use gvm to switch between Go versions).
- Compile the Go version with go 1.15 and go 1.16, and the C version with clang:
$ gvm use go1.16beta1
Now using version go1.16beta1
$ go version && go build -o test-go1.16 test.go
go version go1.16beta1 darwin/amd64
$ gvm use go1.15.6
Now using version go1.15.6
$ go version && go build -o test-go1.15 test.go
go version go1.15.6 darwin/amd64
$ clang -o test-c test.c
- Check that the C version and the Go version output the same amount of data to stdout:
$ ./test-c | wc -c
120000000
$ ./test-go1.15 | wc -c
120000000
- Run each executable 5 times:
$ cat >cmds <<EOF
> -q ./test-c
> -q ./test-go1.15
> -q ./test-go1.16
> EOF
$ multitime -b cmds -n 5
===> multitime results
1: -q ./test-c
            Mean        Std.Dev.    Min         Median      Max
    real    0.524       0.070       0.476       0.492       0.662
    user    0.475       0.011       0.465       0.472       0.495
    sys     0.011       0.002       0.009       0.011       0.014
2: -q ./test-go1.15
            Mean        Std.Dev.    Min         Median      Max
    real    5.986       0.125       5.861       5.947       6.186
    user    3.717       0.040       3.677       3.715       3.788
    sys     2.262       0.034       2.221       2.260       2.314
3: -q ./test-go1.16
            Mean        Std.Dev.    Min         Median      Max
    real    5.958       0.160       5.781       5.941       6.213
    user    3.706       0.094       3.624       3.638       3.855
    sys     2.258       0.069       2.200       2.215       2.373
There is no significant difference between 1.15 and 1.16, but both are more than 10 times slower than the C version. Why is that? Is there something I can do to overcome this performance penalty? Any insights would be appreciated.
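My only guess so far: since the profile points at syscall, perhaps every os.Stdout.Write becomes its own write(2), whereas C's stdio buffers fputs and flushes in much larger chunks. If that is right, wrapping stdout in a bufio.Writer should close most of the gap. Here is a minimal sketch of that variant (the file name is just for illustration):
-------- test_buffered.go --------
package main

import (
    "bufio"
    "os"
)

func main() {
    hello := []byte("Hello There\n")
    // Buffer writes in user space, the way stdio buffers fputs in
    // the C version, so the kernel sees a few thousand large writes
    // instead of ten million 12-byte ones.
    w := bufio.NewWriter(os.Stdout)
    defer w.Flush() // flush whatever is left in the buffer on exit
    for i := 0; i < 10000000; i++ {
        w.Write(hello)
    }
}
-------- /test_buffered.go --------
With bufio's default 4096-byte buffer that would be roughly 30,000 write calls instead of 10,000,000, but I haven't confirmed that this accounts for the whole difference.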
FWIW, I am running these on macOS Catalina:
$ uname -v
Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64
(sorry, I haven't got easy access to a Linux box to run this on).
--
Arnaud Delobelle