Hi there!
TL;DR: a simple test program appears to show that Go's (*os.File).Write is 10x slower than C's fputs (on macOS).
While benchmarking my Lua implementation in Go [1], I found very large differences between C Lua and golua on benchmarks that do a lot of output to stdout. Using pprof, I found that my implementation spends a lot of its time in syscall. I couldn't see an obvious reason why, so I decided to make a minimal example: a program that writes the string "Hello There\n" ten million times to stdout:
-------- test.go --------
package main

import "os"

func main() {
    hello := []byte("Hello There\n") // convert to []byte once, to make it fairer
    for i := 0; i < 10000000; i++ {
        os.Stdout.Write(hello)
    }
}
-------- /test.go --------
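(For reference, this is roughly how a CPU profile of such a small program can be collected with the standard runtime/pprof package; the exact setup I used on golua differs, and the file names here are just for illustration.)
-------- test_pprof.go --------
package main

import (
    "os"
    "runtime/pprof"
)

func main() {
    // Write the CPU profile to a file we can inspect afterwards
    // with `go tool pprof cpu.prof`.
    f, err := os.Create("cpu.prof")
    if err != nil {
        panic(err)
    }
    if err := pprof.StartCPUProfile(f); err != nil {
        panic(err)
    }

    hello := []byte("Hello There\n")
    for i := 0; i < 10000000; i++ {
        os.Stdout.Write(hello)
    }

    pprof.StopCPUProfile()
    f.Close()
}
-------- /test_pprof.go --------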
For comparison, here is what I think is the equivalent in C:
-------- test.c --------
#include <stdio.h>

int main() {
    for (int i = 0; i < 10000000; i++) {
        fputs("Hello There\n", stdout);
    }
    return 0;
}
-------- /test.c --------
I compared those using multitime [2], with both go 1.15.6 and the beta1 release of go 1.16, following these steps (I use gvm to switch between Go versions).
- Compile the Go version with go 1.15 and go 1.16, and the C version with clang:
$ gvm use go1.16beta1
Now using version go1.16beta1
$ go version && go build -o test-go1.16 test.go
go version go1.16beta1 darwin/amd64
$ gvm use go1.15.6
Now using version go1.15.6
$ go version && go build -o test-go1.15 test.go
go version go1.15.6 darwin/amd64
$ clang -o test-c test.c
- Check that the C version and the Go version output the same amount of data to stdout:
$ ./test-c | wc -c
120000000
$ ./test-go1.15 | wc -c
120000000
- Run each executable 5 times:
$ cat >cmds <<EOF
> -q ./test-c
> -q ./test-go1.15
> -q ./test-go1.16
> EOF
$ multitime -b cmds -n 5
===> multitime results
1: -q ./test-c
            Mean        Std.Dev.    Min         Median      Max
    real    0.524       0.070       0.476       0.492       0.662
    user    0.475       0.011       0.465       0.472       0.495
    sys     0.011       0.002       0.009       0.011       0.014
2: -q ./test-go1.15
            Mean        Std.Dev.    Min         Median      Max
    real    5.986       0.125       5.861       5.947       6.186
    user    3.717       0.040       3.677       3.715       3.788
    sys     2.262       0.034       2.221       2.260       2.314
3: -q ./test-go1.16
            Mean        Std.Dev.    Min         Median      Max
    real    5.958       0.160       5.781       5.941       6.213
    user    3.706       0.094       3.624       3.638       3.855
    sys     2.258       0.069       2.200       2.215       2.373
There is no significant difference between 1.15 and 1.16, but both are more than 10 times slower than the C version. Why is that? Is there something I can do to overcome this performance penalty? Any insights would be appreciated.
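My only guess so far: since the profile points at syscall, perhaps every os.Stdout.Write becomes its own write(2), whereas C's stdio buffers fputs and flushes in much larger chunks. If that is right, wrapping stdout in a bufio.Writer should close most of the gap. Here is a minimal sketch of that variant (the file name is just for illustration):
-------- test_buffered.go --------
package main

import (
    "bufio"
    "os"
)

func main() {
    hello := []byte("Hello There\n")
    // Buffer writes in user space, the way stdio buffers fputs in
    // the C version, so the kernel sees a few thousand large writes
    // instead of ten million 12-byte ones.
    w := bufio.NewWriter(os.Stdout)
    defer w.Flush() // flush whatever is left in the buffer on exit
    for i := 0; i < 10000000; i++ {
        w.Write(hello)
    }
}
-------- /test_buffered.go --------
With bufio's default 4096-byte buffer that would be roughly 30,000 write calls instead of 10,000,000, but I haven't confirmed that this accounts for the whole difference.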
FWIW, I am running these on macOS Catalina:
$ uname -v
Darwin Kernel Version 19.6.0: Thu Oct 29 22:56:45 PDT 2020; root:xnu-6153.141.2.2~1/RELEASE_X86_64
(sorry, I haven't got easy access to a Linux box to run this on).
--
Arnaud Delobelle