Poor performance reading stdin

853 views
Skip to first unread message

Phillip Walters

unread,
Jan 12, 2015, 9:16:39 PM1/12/15
to golan...@googlegroups.com
Hello,

I'm having a problem with the performance of Go programs reading data via stdin. The Go program as well as an awk script that I'm using for comparison are both attached. Note that the actual problem I'm working on is more complicated than these scripts, however it's for work and unfortunately I can't attach that code. These scripts do show the problem that I'm having, though, and show me that my problem is indeed with I/O.

I've also attached the results of an strace -c -f with a large file (247M) piped into both. Both programs are simple: print exactly what's received from stdin to stdout. The results are pretty crazy: the Go program takes 11 seconds to run, vs 0.3 seconds for the awk script. As you can see in the output from strace -c -f, the Go program spends a considerable amount of time (99%) on select and wait4. I ran strace without -c on the Go program to show some more detail, but that can't be attached. Here's a snippet:

[pid 32325] select(0, NULL, NULL, NULL, {0, 20} <unfinished ...>
[pid 32324] <... write resumed> )       = 49

This shows up quite a lot with the Go program, but not at all with the awk script. So, questions:

1) Am I correct in assuming that for scanner.Scan() is the root cause of the problem here?
2) What can I do to fix this? I'm assuming there's some better way to handle this to prevent all of these select() calls, but I don't know where to look and a Google search got me nowhere.

Any help is greatly appreciated!

-Phil
print.go
print.awk
strace_go.out
strace_awk.out

Rob Pike

unread,
Jan 12, 2015, 9:24:28 PM1/12/15
to Phillip Walters, golan...@googlegroups.com
I suspect the main problem is that printing to stdout is unbuffered in Go by default.

-rob


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Brad Fitzpatrick

unread,
Jan 12, 2015, 9:25:58 PM1/12/15
to Rob Pike, Phillip Walters, golan...@googlegroups.com
The strace numbers agree:

awk:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.73    0.304019      152010         2         1 wait4
  0.14    0.000412           8        51        29 stat
  0.12    0.000379           4        94        58 open
  0.01    0.000031           0     63241           read
  0.01    0.000016           0     63226           write
  0.00    0.000000           0        37           close

go:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 50.02    5.868367     2934184         2         1 wait4
 49.97    5.862993        1359      4313           select
  0.01    0.000970           0   3580726           write
  0.00    0.000571          10        57        34 open
  0.00    0.000089           0     63771           read
  0.00    0.000000           0        24           close

minux

unread,
Jan 12, 2015, 9:27:15 PM1/12/15
to Phillip Walters, golang-nuts
The select is not the problem, it's used as usleep for the sysmon thread (this is runtime
internal, if you don't know what I'm talking about, please just ignore it.)

You should have used bufio to buffer stdout instead, otherwise every fmt.Print will turn
into at least one write syscall.

os.Stdin, Stdout, Stderr are all unbuffered in Go, unlike C. If your programs uses of a lot of
IO to these files, consider using bufio.New{Reader,Writer} to wrap them. (all io.Reader/Writer
created by the os package are unbuffered.)

If you used the /usr/bin/time to run your program, you will notice that system time is very
high, generally this is an indication of this kind of program.

PS: please consider using play.golang.org to share the Go code, rather than sending them
as attachments next time, even if it won't run correctly on the playground. It will make life
easier for us to read the code.

andrewc...@gmail.com

unread,
Jan 12, 2015, 9:27:53 PM1/12/15
to golan...@googlegroups.com
Does bufio.Scanner use buffering? Try converting stdio to a buffered reader first then passing it in.

minux

unread,
Jan 12, 2015, 9:31:26 PM1/12/15
to Andrew Chambers, golang-nuts
On Mon, Jan 12, 2015 at 9:27 PM, <andrewc...@gmail.com> wrote:
Does bufio.Scanner use buffering? Try converting stdio to a buffered reader first then passing it in.
bufio.Scanner does have its own buffering, and the problem is not stdin anyway.

Phillip Walters

unread,
Jan 12, 2015, 9:53:04 PM1/12/15
to golan...@googlegroups.com, phillip...@gmail.com
os.Stdin, Stdout, Stderr are all unbuffered in Go, unlike C. If your programs uses of a lot of
IO to these files, consider using bufio.New{Reader,Writer} to wrap them. (all io.Reader/Writer
created by the os package are unbuffered.)

Okay. Thanks for the suggestion. I'll give this a try tomorrow when I get in and post results.
 
PS: please consider using play.golang.org to share the Go code, rather than sending them
as attachments next time, even if it won't run correctly on the playground. It will make life
easier for us to read the code.

My apologies! I'll make sure I do this in the future.

Phillip Walters

unread,
Jan 13, 2015, 1:01:58 PM1/13/15
to golan...@googlegroups.com
So I've switched to using a buffered readers and writers to wrap stdout and stdin. Code here: http://play.golang.org/p/-3FurHMuGR

Results using time:

$ time sh -c './print < bigfile.txt > /dev/null'
2.856u 0.508s 0:03.37 99.4%     0+0k 0+0io 0pf+0w
$ time sh -c 'awk -f print.awk < bigfile.txt > /dev/null'
0.640u 0.056s 0:00.70 98.5%     0+0k 0+0io 0pf+0w

strace -c -f attached again as well.

It looks better, but Go is still lagging behind awk a bit. Is there something else I can do here? Would it make sense to flush the output buffer less frequently? Would that help? Can you tell I'm a bit of a noob to all of this?
strace_go.txt
strace_awk.txt

Konstantin Khomoutov

unread,
Jan 13, 2015, 1:36:31 PM1/13/15
to Phillip Walters, golan...@googlegroups.com
On Tue, 13 Jan 2015 10:01:58 -0800 (PST)
Phillip Walters <phillip...@gmail.com> wrote:

> So I've switched to using a buffered readers and writers to wrap
> stdout and stdin. Code here: http://play.golang.org/p/-3FurHMuGR
>
> Results using time:
>
> $ time sh -c './print < bigfile.txt > /dev/null'
> 2.856u 0.508s 0:03.37 99.4% 0+0k 0+0io 0pf+0w
> $ time sh -c 'awk -f print.awk < bigfile.txt > /dev/null'
> 0.640u 0.056s 0:00.70 98.5% 0+0k 0+0io 0pf+0w
[...]
> It looks better, but Go is still lagging behind awk a bit. Is there
> something else I can do here? Would it make sense to flush the output
> buffer less frequently? Would that help? Can you tell I'm a bit of a
> noob to all of this?

You should stop doing buffering at all since your AWK script does not
do it as well -- please do [1] and learn about its full buffering mode
when running non-interactively.

From there you might also learn about the fflush() function (available
at least in GNU AWK), so you can add "fflush()" after the "print $0"
in your AWK script and see if it will slow down to match the Go's
performance.

Note that generally you should only flush output when it has sense, and
I know of only few cases for this:

* The stream is used for error reporting and hence it is sensible to
flush it as soon as you wrote something there because that tries to
guarantee at least *that* information will hit the error log/user's
terminal or whatever other medum that stream is directed to.

* The stream is being piped to a program which would benefit from
getting the information as soon as possible or even relies on this.

Filtering a file's contents is definitely not one of the cases above so
just stop flushing the stdout -- your AWK script does not do that
either.

Oh, and you might find learning about the --line-buffered command-line
option of GNU grep insightful: this program does not flush its output
when run non-interactively in the same manner AWK behaves; and this
might cause unexpected behaviour in some corner cases like in

$ tail --follow /var/log/some.log | grep ERROR | my-cool-script

where the my-cool-script will seriously lag behind new entries appearing
in the log file being monitored by tail because of the full buffering
grep will enable in this case unless instructed otherwise using
--line-buffered.

1. https://www.google.com/search?q=awk+output+buffering
2. http://www.gnu.org/software/grep/manual/grep.html

Rob Pike

unread,
Jan 13, 2015, 2:49:19 PM1/13/15
to Konstantin Khomoutov, Phillip Walters, golan...@googlegroups.com
Flush at the end of the program, not on every loop iteration. Otherwise you're wasting the advantage of using an output buffer.

Also check the returns from scanner.Err and output.Flush when the program is done.

-rob


Phillip Walters

unread,
Jan 13, 2015, 4:04:04 PM1/13/15
to golan...@googlegroups.com, flat...@users.sourceforge.net
Flushing at the end of the program helped, but didn't completely fix things. I was still off by a factor of 5. Konstantin's first link proved pretty helpful, though. By setting the go program to fully buffer both the output and input, I was able to get the go code to run slightly faster than awk.


Aaaaaaaaaaaaaaaand run times:

$ time sh -c './print < bigfile.txt > /dev/null'
0.564u 0.416s 0:01.31 74.0%     0+0k 448+8io 0pf+0w
$ time sh -c 'awk -f print.awk < bigfile.txt > /dev/null'
0.604u 0.084s 0:00.69 98.5%     0+0k 0+0io 0pf+0w

This code is a bit on the ugly side, since I use /dev/stdin to read from stdin (I don't know another way to fully buffer the input), but being new to Go, I couldn't think of a better way :)

You have all been a great help. Thanks everyone!

Rob Pike

unread,
Jan 13, 2015, 4:33:04 PM1/13/15
to Phillip Walters, golan...@googlegroups.com, flat...@users.sourceforge.net
I find it hard to believe that removing flush from the loop and doing it only at the end would not have a dramatic effect. You didn't show the code so I can't tell what you did there, but it seems unlikely it would still be five times slower than awk.

-rob


--

Phillip Walters

unread,
Jan 13, 2015, 4:44:50 PM1/13/15
to golan...@googlegroups.com, phillip...@gmail.com, flat...@users.sourceforge.net
Sorry Rob, I should have shown more on that step. Here's the code:


And here are the run times:

$ time sh -c './print2 < bigfile.txt > /dev/null'
2.056u 0.104s 0:02.17 99.0%     0+0k 0+0io 0pf+0w
$ time sh -c 'awk -f print.awk < bigfile.txt > /dev/null'
0.628u 0.048s 0:00.71 92.9%     0+0k 8+0io 0pf+0w

It's better than with a flush, and not 5 times slower than awk, now it's more like 3 times slower, but still slower. Do you think awk's input buffering has something to do with this? I've attached strace info below just in case that's helpful.

$ strace -c -f sh -c 'awk -f print.awk < bigfile.txt > /dev/null'
Process 27700 attached
Process 27699 suspended
Process 27699 resumed
Process 27700 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 99.33    0.200013      100007         2         1 wait4
  0.63    0.001263          13        95        58 open
  0.04    0.000079           0      3967           read
  0.00    0.000000           0     63226           write
  0.00    0.000000           0        39           close
  0.00    0.000000           0        51        29 stat
  0.00    0.000000           0        41           fstat
  0.00    0.000000           0        53           mmap
  0.00    0.000000           0        18           mprotect

$ strace -c -f sh -c './print2 < bigfile.txt > /dev/null'
Process 27705 attached
Process 27704 suspended
Process 27706 attached
Process 27707 attached
Process 27707 detached
Process 27704 resumed
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.74    0.924058      462029         2         1 wait4
  2.01    0.018986           0     42223           select
  0.16    0.001490           0     31887           futex
  0.06    0.000580          10        58        34 open
  0.03    0.000290           0     63234           read
  0.01    0.000049           0     63226           write
  0.00    0.000000           0        26           close
  0.00    0.000000           0        27        15 stat
  0.00    0.000000           0        21           fstat
  0.00    0.000000           0        38           mmap
  0.00    0.000000           0        10           mprotect

Uli Kunitz

unread,
Jan 13, 2015, 4:59:38 PM1/13/15
to golan...@googlegroups.com, andrewc...@gmail.com
I have  a real big text file with chess games in PGN notation.

$ wc -lc bigfile.pgn 
  35543278 1183828308 bigfile.pgn
$ time awk -f print.awk <bigfile.pgn >/dev/null

real 0m6.055s
user 0m5.860s
sys 0m0.188s

At first I cheat a little: https://play.golang.org/p/Te6Cwvrxi7

$ time cheat <bigfile.pgn >/dev/null

real 0m0.163s
user 0m0.016s
sys 0m0.144s

Wow Go is 37 times faster than awk! Fireworks!

But as a fair sportsmen I do a little less cheating: https://play.golang.org/p/2dbAdhbJMW

$ time myprint <bigfile.pgn >/dev/null

real 0m2.778s
user 0m2.604s
sys 0m0.172s

Go is still two times faster! 

Uli Kunitz

unread,
Jan 13, 2015, 5:05:44 PM1/13/15
to golan...@googlegroups.com, andrewc...@gmail.com
While the print program would handle a scanner error correctly, I actually intended to use the err value in the log.Fatal call:

Phillip Walters

unread,
Jan 13, 2015, 5:13:08 PM1/13/15
to golan...@googlegroups.com


On Tuesday, January 13, 2015 at 1:59:38 PM UTC-8, Uli Kunitz wrote:
But as a fair sportsmen I do a little less cheating: https://play.golang.org/p/2dbAdhbJMW

$ time myprint <bigfile.pgn >/dev/null

real 0m2.778s
user 0m2.604s
sys 0m0.172s

Go is still two times faster! 


 I'll have to give your non-cheating approach a try, since that looks a whole lot like what my code is trying to do (though with a bit more processing!)

nicolas...@gmail.com

unread,
Jan 14, 2015, 4:00:58 AM1/14/15
to golan...@googlegroups.com
On 13/01/15 21:44, Phillip Walters wrote:

[...]


$ strace -c -f sh -c './print2 < bigfile.txt > /dev/null'

[...]


% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.74    0.924058      462029         2         1 wait4
  2.01    0.018986           0     42223           select
  0.16    0.001490           0     31887           futex
  0.06    0.000580          10        58        34 open
  0.03    0.000290           0     63234           read
  0.01    0.000049           0     63226           write
  0.00    0.000000           0        26           close
  0.00    0.000000           0        27        15 stat
  0.00    0.000000           0        21           fstat
  0.00    0.000000           0        38           mmap
  0.00    0.000000           0        10           mprotect

I've highlighted the largest differences in the strace log:

* Go seems to be making many more calls to open. I would explore if reducing the number to read calls improves the speed of the GO program.

* I read the calls to select and futex as Go wasting time in synchronisation. To test this hypothesis I've rerun the go program with different values of GOMAXPROCS (see below). Using GOMAXPROCS other than 1 has an adverse effect.

# awk
time sh -c 'awk "{print $0}" < bigfile.txt > /dev/null'
7.64user 0.26system 0:07.90elapsed 100%CPU (0avgtext+0avgdata 1164maxresident)k
0inputs+0outputs (0major+586minor)pagefaults 0swaps


# Go GOMAXPROCS=1
time sh -c 'GOMAXPROCS=1 ./print < bigfile.txt > /dev/null'   
25.31user 0.61system 0:25.16elapsed 102%CPU (0avgtext+0avgdata 1348maxresident)k
0inputs+0outputs (0major+610minor)pagefaults 0swaps


# Go GOMAXPROCS=2
time sh -c 'GOMAXPROCS=2 ./print < bigfile.txt > /dev/null'   
41.68user 3.09system 0:44.84elapsed 99%CPU (0avgtext+0avgdata 1384maxresident)k
0inputs+0outputs (0major+622minor)pagefaults 0swaps


# Go GOMAXPROCS=4
time sh -c 'GOMAXPROCS=4 ./print < bigfile.txt > /dev/null'   
47.33user 6.36system 0:44.32elapsed 121%CPU (0avgtext+0avgdata 1472maxresident)k
0inputs+0outputs (0major+642minor)pagefaults 0swaps


# Go GOMAXPROCS=8
time sh -c 'GOMAXPROCS=8 ./print < bigfile.txt > /dev/null'   
54.63user 10.13system 0:44.33elapsed 146%CPU (0avgtext+0avgdata 1600maxresident)k
0inputs+0outputs (0major+675minor)pagefaults 0swaps

nicolas...@gmail.com

unread,
Jan 14, 2015, 4:21:26 AM1/14/15
to golan...@googlegroups.com
And here are the results using Kunitz's version (i.e. using
NewReaderSize and NewWriterSize). In this case increasing GOMAXPROCS
doesn't have an adverse effect:

# awk
time sh -c 'awk "{print $0}" < bigfile.txt > /dev/null'
7.70user 0.22system 0:07.92elapsed 100%CPU (0avgtext+0avgdata
1164maxresident)k
0inputs+0outputs (0major+588minor)pagefaults 0swaps


# Go Walters GOMAXPROCS=1
time sh -c 'GOMAXPROCS=1 ./printWalters < bigfile.txt > /dev/null'
25.14user 0.72system 0:25.11elapsed 102%CPU (0avgtext+0avgdata
1364maxresident)k
0inputs+0outputs (0major+616minor)pagefaults 0swaps


# Go Walters GOMAXPROCS=2
time sh -c 'GOMAXPROCS=2 ./printWalters < bigfile.txt > /dev/null'
41.57user 3.20system 0:44.90elapsed 99%CPU (0avgtext+0avgdata
1420maxresident)k
0inputs+0outputs (0major+631minor)pagefaults 0swaps


# Go Kunitz GOMAXPROCS=1
time sh -c 'GOMAXPROCS=1 ./printKunitz < bigfile.txt > /dev/null'
4.36user 0.18system 0:04.53elapsed 100%CPU (0avgtext+0avgdata
3260maxresident)k
0inputs+0outputs (0major+1080minor)pagefaults 0swaps


# Go Kunitz GOMAXPROCS=2
time sh -c 'GOMAXPROCS=2 ./printKunitz < bigfile.txt > /dev/null'
4.37user 0.17system 0:04.53elapsed 100%CPU (0avgtext+0avgdata
3264maxresident)k
0inputs+0outputs (0major+1082minor)pagefaults 0swaps

Rob Pike

unread,
Jan 14, 2015, 4:32:25 AM1/14/15
to nicolas...@gmail.com, golan...@googlegroups.com
You are needlessly allocating a string for every line. Scanner.Text allocates, scanner.Bytes does not.  This is the code you want. It's even shorter and I bet much faster:


// print2
package main

import (
"bufio"
"os"
)

func main() {
// Start going through each line
in := bufio.NewReader(os.Stdin)
scanner := bufio.NewScanner(in)
output := bufio.NewWriter(os.Stdout)

for scanner.Scan() {
output.Write(scanner.Bytes())
}
output.Flush()
}

Notice that Kunitz's version also uses the Bytes method, not Text.

This program is single-threaded so there is no point in increasing GOMAXPROCS. Maybe I missed the part where you parallelize, but you can't parallelize I/O like this effectively because the output will need to be joined in a single-threaded way anyway.

-rob



--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts+unsubscribe@googlegroups.com.

Nico

unread,
Jan 14, 2015, 4:45:34 AM1/14/15
to Rob Pike, golang-nuts@googlegroups.com >> golang-nuts
On 14/01/15 09:31, Rob Pike wrote:
> Notice that Kunitz's version also uses the Bytes method, not Text.
>

I missed that! Your version is as fast as Kunitz's

# Go Pike GOMAXPROCS=1
time sh -c 'GOMAXPROCS=1 ./printPike < bigfile.txt > /dev/null'
4.49user 0.28system 0:04.77elapsed 100%CPU (0avgtext+0avgdata
1088maxresident)k
0inputs+0outputs (0major+538minor)pagefaults 0swaps


# Go Pike GOMAXPROCS=2
time sh -c 'GOMAXPROCS=2 ./printPike < bigfile.txt > /dev/null'
4.44user 0.36system 0:04.78elapsed 100%CPU (0avgtext+0avgdata
1100maxresident)k
0inputs+0outputs (0major+538minor)pagefaults 0swaps


> This program is single-threaded so there is no point in increasing
> GOMAXPROCS. Maybe I missed the part where you parallelize, but you
> can't parallelize I/O like this effectively because the output will
> need to be joined in a single-threaded way anyway.

This was my understanding. When I saw select and futex in the strace
log, I thought trying to parallelise this program would actually slow it
down.

I will check again the strace outputs in my computer.

Nico

unread,
Jan 14, 2015, 5:14:48 AM1/14/15
to Rob Pike, golang-nuts@googlegroups.com >> golang-nuts
For completeness I'm copying below the results of running strace.

I've compared the percentage of time spent in wait4 + select + futex.
All the versions seem to spend roughly the same proportion of time, with your version running under GOMAXPROCS=1 spending the least.
This shows that my concern for wasting time in synchronisation is actually negligible.


awk wait4 97.72%
Walters1 wait4+futex+select 99.33%
Walters2 wait4+futex+select 98.88%
Kunitz1 wait4+futex+select 98.15%
Kunitz2 wait4+futex+select 98.90%
Pike1 wait4+futex+select 96.92%
Pike2 wait4+futex+select 98.43%



# awk
time sh -c 'awk "{print $0}" < bigfile.txt > /dev/null'
7.70user 0.26system 0:07.96elapsed 100%CPU (0avgtext+0avgdata 1164maxresident)k
0inputs+0outputs (0major+587minor)pagefaults 0swaps
strace -c -f sh -c 'awk "{print $0}" < bigfile.txt > /dev/null'
Process 17745 attached

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 97.72    1.893230     1893230         1           wait4
  2.14    0.041376           0    439835           read
  0.09    0.001732           0     21148           write
  0.01    0.000219           9        25           mmap
  0.01    0.000167          13        13           open
  0.01    0.000143          10        14           mprotect
  0.00    0.000095           6        17           close
  0.00    0.000084           9         9         9 access
  0.00    0.000082           6        14           fstat
  0.00    0.000069           6        11           rt_sigaction
  0.00    0.000060           9         7         4 stat
  0.00    0.000047           8         6           brk
  0.00    0.000046          23         2           munmap
  0.00    0.000024           6         4           fcntl
  0.00    0.000023          12         2           execve
  0.00    0.000018           4         5           dup2
  0.00    0.000013           7         2           getgroups
  0.00    0.000013           7         2           arch_prctl
  0.00    0.000007           2         3         3 ioctl
  0.00    0.000007           7         1           sigaltstack
  0.00    0.000006           6         1           getpid
  0.00    0.000006           6         1           clone
  0.00    0.000006           6         1           getppid
  0.00    0.000005           5         1           rt_sigreturn
  0.00    0.000005           5         1           geteuid

------ ----------- ----------- --------- --------- ----------------
100.00    1.937483                461126        16 total


# Go Walters GOMAXPROCS=1
time sh -c 'GOMAXPROCS=1 ./printWalters < bigfile.txt > /dev/null'   
25.30user 0.67system 0:25.23elapsed 102%CPU (0avgtext+0avgdata 1364maxresident)k
0inputs+0outputs (0major+617minor)pagefaults 0swaps
strace -c -f sh -c 'GOMAXPROCS=1 ./printWalters < bigfile.txt > /dev/null'
Process 17937 attached
Process 17938 attached
Process 17939 attached

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 49.89    6.282862     6282862         1           wait4
 42.55    5.358752          35    151968      6079 futex
  6.89    0.867736           2    449130           select
  0.34    0.042767           0    439828           read
  0.25    0.032070           0    439826           write
  0.07    0.008992           0     46729           sched_yield
  0.00    0.000148           2        65           rt_sigaction
  0.00    0.000050           4        14           mmap
  0.00    0.000022           6         4           open
  0.00    0.000021           3         8           close
  0.00    0.000018           5         4           mprotect
  0.00    0.000016           2         7           rt_sigprocmask
  0.00    0.000016           4         4           dup2
  0.00    0.000016           5         3           clone
  0.00    0.000015           5         3         3 access
  0.00    0.000012           6         2           munmap
  0.00    0.000012           3         4           arch_prctl
  0.00    0.000010           3         3           brk
  0.00    0.000010           5         2           execve
  0.00    0.000010           3         4           fcntl
  0.00    0.000008           4         2           stat
  0.00    0.000008           3         3           sigaltstack
  0.00    0.000006           3         2           fstat
  0.00    0.000005           5         1           rt_sigreturn
  0.00    0.000005           3         2           gettid
  0.00    0.000003           3         1           getpid
  0.00    0.000003           3         1           geteuid
  0.00    0.000003           3         1           sched_getaffinity
  0.00    0.000002           2         1           getppid

------ ----------- ----------- --------- --------- ----------------
100.00   12.593598               1527623      6082 total


# Go Walters GOMAXPROCS=2
time sh -c 'GOMAXPROCS=2 ./printWalters < bigfile.txt > /dev/null'   
41.81user 2.96system 0:44.97elapsed 99%CPU (0avgtext+0avgdata 1420maxresident)k
0inputs+0outputs (0major+630minor)pagefaults 0swaps
strace -c -f sh -c 'GOMAXPROCS=2 ./printWalters < bigfile.txt > /dev/null'
Process 18278 attached
Process 18279 attached
Process 18280 attached
Process 18281 attached

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 52.17    9.978744     9978744         1           wait4
 37.75    7.221550          11    638286     24889 futex
  8.96    1.714025           4    480140           select
  0.41    0.079150           0    210129           sched_yield
  0.41    0.078424           0    439828           read
  0.29    0.055326           0    439826           write
  0.00    0.000390           6        65           rt_sigaction
  0.00    0.000120           9        14           mmap
  0.00    0.000103          10        10           rt_sigprocmask
  0.00    0.000049          12         4           open
  0.00    0.000046          12         4           mprotect
  0.00    0.000045           6         8           close
  0.00    0.000042          11         4           clone
  0.00    0.000041          10         4           sigaltstack
  0.00    0.000031          16         2           munmap
  0.00    0.000029          10         3         3 access
  0.00    0.000028           9         3           brk
  0.00    0.000028           7         4           fcntl
  0.00    0.000028           6         5           arch_prctl
  0.00    0.000026          13         2           execve
  0.00    0.000021           5         4           dup2
  0.00    0.000018           9         2           stat
  0.00    0.000015           8         2           fstat
  0.00    0.000012           4         3           gettid
  0.00    0.000008           8         1           sched_getaffinity
  0.00    0.000006           6         1           getpid
  0.00    0.000006           6         1           geteuid
  0.00    0.000006           6         1           getppid
  0.00    0.000004           4         1           rt_sigreturn

------ ----------- ----------- --------- --------- ----------------
100.00   19.128321               2208358     24892 total


# Go Kunitz GOMAXPROCS=1
time sh -c 'GOMAXPROCS=1 ./printKunitz < bigfile.txt > /dev/null'   
4.27user 0.24system 0:04.51elapsed 100%CPU (0avgtext+0avgdata 3264maxresident)k
0inputs+0outputs (0major+1080minor)pagefaults 0swaps
strace -c -f sh -c 'GOMAXPROCS=1 ./printKunitz < bigfile.txt > /dev/null'
Process 18514 attached
Process 18515 attached
Process 18516 attached

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 49.96    0.238406      238406         1           wait4
 48.07    0.229384         368       623           select
  1.65    0.007855           5      1721           read
  0.12    0.000558          29        19         3 futex
  0.04    0.000196           3        65           rt_sigaction
  0.03    0.000137          17         8           close
  0.02    0.000119           6        19           mmap
  0.02    0.000087           0      1636           write
  0.01    0.000052          13         4           open
  0.01    0.000044          11         4           mprotect
  0.01    0.000039          13         3         3 access
  0.01    0.000033           5         7           rt_sigprocmask
  0.01    0.000028           9         3           clone
  0.01    0.000027          14         2           munmap
  0.01    0.000024           6         4           fcntl
  0.00    0.000021           7         3           brk
  0.00    0.000020           5         4           dup2
  0.00    0.000019          10         2           stat
  0.00    0.000018           5         4           arch_prctl
  0.00    0.000016           8         2           fstat
  0.00    0.000015           8         2           execve
  0.00    0.000014           5         3           sigaltstack
  0.00    0.000013           3         5           sched_yield
  0.00    0.000011           6         2           gettid
  0.00    0.000006           6         1           getpid
  0.00    0.000006           6         1           geteuid
  0.00    0.000006           6         1           getppid
  0.00    0.000005           5         1           rt_sigreturn
  0.00    0.000005           5         1           sched_getaffinity

------ ----------- ----------- --------- --------- ----------------
100.00    0.477164                  4151         6 total


# Go Kunitz GOMAXPROCS=2
time sh -c 'GOMAXPROCS=2 ./printKunitz < bigfile.txt > /dev/null'   
4.34user 0.19system 0:04.52elapsed 100%CPU (0avgtext+0avgdata 3272maxresident)k
0inputs+0outputs (0major+1084minor)pagefaults 0swaps
strace -c -f sh -c 'GOMAXPROCS=2 ./printKunitz < bigfile.txt > /dev/null'
Process 18565 attached
Process 18566 attached
Process 18567 attached
Process 18568 attached

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 33.17    0.263213      263213         1           wait4
 33.12    0.262834         570       461         3 futex
 32.61    0.258794         486       532           select
  1.07    0.008469           5      1721           read
  0.01    0.000059           0      1636           write
  0.01    0.000049           3        18           mmap
  0.00    0.000023           6         4           open
  0.00    0.000021           3         8           close
  0.00    0.000019           0        65           rt_sigaction
  0.00    0.000018           5         4           mprotect
  0.00    0.000015           5         3         3 access
  0.00    0.000014           7         2           munmap
  0.00    0.000014           4         4           dup2
  0.00    0.000010           3         4           fcntl
  0.00    0.000009           5         2           stat
  0.00    0.000009           5         2           execve
  0.00    0.000008           3         3           brk
  0.00    0.000006           3         2           fstat
  0.00    0.000006           1         5           arch_prctl
  0.00    0.000004           4         1           rt_sigreturn
  0.00    0.000004           1         4           clone
  0.00    0.000004           4         1           sched_getaffinity
  0.00    0.000002           2         1           getpid
  0.00    0.000002           2         1           geteuid
  0.00    0.000002           2         1           getppid
  0.00    0.000000           0        10           rt_sigprocmask
  0.00    0.000000           0         9           sched_yield
  0.00    0.000000           0         4           sigaltstack
  0.00    0.000000           0         3           gettid

------ ----------- ----------- --------- --------- ----------------
100.00    0.793608                  4512         6 total



# Go Pike GOMAXPROCS=1
time sh -c 'GOMAXPROCS=1 ./printPike < bigfile.txt > /dev/null'   
4.45user 0.32system 0:04.77elapsed 100%CPU (0avgtext+0avgdata 1088maxresident)k
0inputs+0outputs (0major+536minor)pagefaults 0swaps
strace -c -f sh -c 'GOMAXPROCS=1 ./printPike < bigfile.txt > /dev/null'
Process 18616 attached
Process 18617 attached

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 48.51    5.051889     5051889         1           wait4
 48.41    5.041850        2331      2163           select
  1.70    0.176679           0    439828           read
  1.37    0.143186           0    418679           write
  0.00    0.000185           3        65           rt_sigaction
  0.00    0.000051           4        14           mmap
  0.00    0.000030          10         3         3 access
  0.00    0.000025           6         4           open
  0.00    0.000024           3         8           close
  0.00    0.000019           5         4           mprotect
  0.00    0.000018           5         4           rt_sigprocmask
  0.00    0.000017           4         4           dup2
  0.00    0.000015           8         2           execve
  0.00    0.000014           7         2           munmap
  0.00    0.000014           5         3           arch_prctl
  0.00    0.000011           4         3           brk
  0.00    0.000011           3         4           fcntl
  0.00    0.000009           5         2           sigaltstack
  0.00    0.000008           4         2           stat
  0.00    0.000007           7         1           rt_sigreturn
  0.00    0.000006           3         2           fstat
  0.00    0.000006           3         2           clone
  0.00    0.000005           5         1           gettid
  0.00    0.000004           4         1           sched_getaffinity
  0.00    0.000002           2         1           getpid
  0.00    0.000002           2         1           geteuid
  0.00    0.000002           2         1           getppid
  0.00    0.000000           0         4           futex

------ ----------- ----------- --------- --------- ----------------
100.00   10.414089                860809         3 total



# Go Pike GOMAXPROCS=2
time sh -c 'GOMAXPROCS=2 ./printPike < bigfile.txt > /dev/null'   
4.52user 0.27system 0:04.77elapsed 100%CPU (0avgtext+0avgdata 1096maxresident)k
0inputs+0outputs (0major+540minor)pagefaults 0swaps
strace -c -f sh -c 'GOMAXPROCS=2 ./printPike < bigfile.txt > /dev/null'
Process 18770 attached
Process 18771 attached
Process 18772 attached
Process 18773 attached

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 49.14   10.084456        8641      1167        80 futex
 24.69    5.067420     5067420         1           wait4
 24.60    5.049329        2346      2152           select
  0.86    0.176492           0    439828           read
  0.70    0.143627           0    418679           write
  0.00    0.000174           1       183           sched_yield
  0.00    0.000166           3        65           rt_sigaction
  0.00    0.000054           5        10           rt_sigprocmask
  0.00    0.000050           4        14           mmap
  0.00    0.000024           6         4           open
  0.00    0.000023           3         8           close
  0.00    0.000021           5         4           sigaltstack
  0.00    0.000017           4         4           dup2
  0.00    0.000017           4         4           clone
  0.00    0.000016           4         4           mprotect
  0.00    0.000014           7         2           munmap
  0.00    0.000014           5         3         3 access
  0.00    0.000013           3         5           arch_prctl
  0.00    0.000012           4         3           gettid
  0.00    0.000010           5         2           stat
  0.00    0.000010           3         4           fcntl
  0.00    0.000009           5         2           execve
  0.00    0.000008           3         3           brk
  0.00    0.000006           6         1           rt_sigreturn
  0.00    0.000005           3         2           fstat
  0.00    0.000003           3         1           getpid
  0.00    0.000003           3         1           getppid
  0.00    0.000003           3         1           sched_getaffinity
  0.00    0.000002           2         1           geteuid

------ ----------- ----------- --------- --------- ----------------
100.00   20.521998                862158        83 total

minux

unread,
Jan 14, 2015, 5:23:32 AM1/14/15
to Nico, Rob Pike, golang-nuts@googlegroups.com >> golang-nuts
On Wed, Jan 14, 2015 at 5:14 AM, Nico <nicolas...@gmail.com> wrote:
# awk
time sh -c 'awk "{print $0}" < bigfile.txt > /dev/null'
7.70user 0.26system 0:07.96elapsed 100%CPU (0avgtext+0avgdata 1164maxresident)k
0inputs+0outputs (0major+587minor)pagefaults 0swaps
strace -c -f sh -c 'awk "{print $0}" < bigfile.txt > /dev/null'
Why are you using sh -c to run awk here? the wait4 is issued by sh to wait for the awk
process to finish.

If you want to measure how awk performs, just invoke awk directly.
strace -c -f awk '{print $0}' < bigfile.txt >/dev/null
Same for the Go versions.

Go runtime has several background threads, so it uses select (for usleep), and
futex. The mere presence of these synchronization syscalls doesn't indicate that
synchronization cost is high.

Nico

unread,
Jan 14, 2015, 5:25:31 AM1/14/15
to minux, Rob Pike, golang-nuts@googlegroups.com >> golang-nuts
I think I need the "sh -c" to be able to set GOMAXPROCS.

minux

unread,
Jan 14, 2015, 5:29:39 AM1/14/15
to Nico, Rob Pike, golang-nuts@googlegroups.com >> golang-nuts

On Wed, Jan 14, 2015 at 5:25 AM, Nico <nicolas...@gmail.com> wrote:
I think I need the "sh -c" to be able to set GOMAXPROCS.
No.

This is fine:
GOMAXPROCS=1 strace -c -f goVersion < bigfile.txt >/dev/null

Or, if you don't want to set the environment variable for strace process,
use:
strace -c -f -E GOMAXPROCS=1 goVersion < bigfile.txt >/dev/null

Nico

unread,
Jan 14, 2015, 5:51:19 AM1/14/15
to minux, Rob Pike, golang-nuts@googlegroups.com >> golang-nuts
How about the redirection of stdin and stdout?

I find the awk version much slower when run in this manner.

# awk

time sh -c 'awk "{print $0}" < bigfile.txt > /dev/null'
7.77user 0.23system 0:08.00elapsed 100%CPU (0avgtext+0avgdata
1164maxresident)k
0inputs+0outputs (0major+587minor)pagefaults 0swaps

time awk "{print $0}" < bigfile.txt > /dev/null
24.39user 0.20system 0:24.58elapsed 100%CPU (0avgtext+0avgdata
1252maxresident)k
0inputs+0outputs (0major+380minor)pagefaults 0swaps

Nico

unread,
Jan 14, 2015, 5:55:59 AM1/14/15
to minux, Rob Pike, golang-nuts@googlegroups.com >> golang-nuts
On 14/01/15 10:22, minux wrote:
> Go runtime has several background threads, so it uses select (for
> usleep), and
> futex. The mere presence of these synchronization syscalls doesn't
> indicate that
> synchronization cost is high.

I'm learning this today, but how do you interpret that the go versions
are spending a large proportion of the time in select and futex? Is it
something avoidable?

minux

unread,
Jan 14, 2015, 5:58:37 AM1/14/15
to Nico, Rob Pike, golang-nuts@googlegroups.com >> golang-nuts
It is a multithread program, so one thread sleeping for 100% of their time doesn't mean
anything bad.

Nico

unread,
Jan 14, 2015, 6:09:14 AM1/14/15
to minux, Rob Pike, golang-nuts@googlegroups.com >> golang-nuts
Oh! I missed that strace is actually reporting system times, not real
times. This explains your answer.

Konstantin Khomoutov

unread,
Jan 14, 2015, 10:18:24 AM1/14/15
to Nico, minux, Rob Pike, golang-nuts@googlegroups.com >> golang-nuts
This *might* have something to do with your /bin/sh not being the same
as your interactive shell (and on recent Debians and Debian-based
systems /bin/sh links to dash while an interactive shell is bash by
default). The `time` command is a bashism, so you most probably run
all these tests in bash.

For instance, I have Debian 7.x here with /bin/sh being dash and my
interactive shell being bash, now let's do a no-brainer test:

~% # press C-x, C-v to get the bash version...
GNU bash, version 4.2.37(1)-release (x86_64-pc-linux-gnu)

~% time for i in `seq 1 1000000`; do echo test; done </dev/zero >/dev/null

real 0m8.124s
user 0m5.788s
sys 0m0.236s

~% time /bin/sh -c 'for i in `seq 1 1000000`; do echo test; done </dev/zero >/dev/null'

real 0m1.971s
user 0m1.304s
sys 0m0.112s

~% time /bin/bash -c 'for i in `seq 1 1000000`; do echo test; done </dev/zero >/dev/null'

real 0m7.445s
user 0m5.748s
sys 0m0.264s

As you can see, plain pipeline run in directly in bash takes about
8 seconds to complete, the same shell code run by dash takes about
2 seconds to complete (including the time to start dash up), and the same
code run by another copy of bash again takes some 8 seconds to
complete.

I'm lazy to check if `time` actually spawns a subshell for a pipeline
which does not start with an external program, so I can't really say
if bash is *that* slow at doing echo, or the first `time` invocation
starts a subshell, and most of these seconds account for that new
bash instance executing its initialization files.

Nico

unread,
Jan 14, 2015, 10:53:09 AM1/14/15
to Konstantin Khomoutov, minux, Rob Pike, golang-nuts@googlegroups.com >> golang-nuts
I'm not sure that explains it.

This is what I see on Ubuntu 14.04

$ time dash -c 'awk "{print $0}" < bigfile.txt > /dev/null'

real 0m7.939s
user 0m7.657s
sys 0m0.285s


$ time bash -c 'awk "{print $0}" < bigfile.txt > /dev/null'

real 0m7.878s
user 0m7.642s
sys 0m0.238s


$ time awk "{print $0}" < bigfile.txt > /dev/null

real 0m23.475s
user 0m23.243s
sys 0m0.238s

Nico

unread,
Jan 14, 2015, 11:21:01 AM1/14/15
to Konstantin Khomoutov, minux, Rob Pike, golang-nuts@googlegroups.com >> golang-nuts
I've taken on board minux's idea (got an strace without creating an additional shell) and Konstantin's idea (ensured all the tests are run in bash). The results are below.

I've noticed that:

- the main difference between awk's and Go's strace logs are the calls to futex and select.

- Walters' version (that creates unnecessary strings) is not only the slowest, but also its performance deteriorates with increasing GOMAXPROCS.

- Walter's version also shows a large number of errors in the calls to futex


Could anyone explain what's the meaning of this larger number of errors when calling futex?




# awk
bash -c 'time -p awk "{print $0}" < bigfile.txt > /dev/null'
real 7.98
user 7.75
sys 0.23
bash -c 'strace -c -f awk "{print $0}" < bigfile.txt > /dev/null'

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.22    0.048003           0    439834           read
  3.90    0.001986           0     21148           write
  0.47    0.000242          13        18           mmap
  0.30    0.000151          17         9           open
  0.27    0.000136          14        10           mprotect
  0.23    0.000117          10        12           fstat
  0.16    0.000082          14         6         6 access
  0.14    0.000070           8         9           close
  0.06    0.000032          11         3           brk
  0.05    0.000028          28         1           munmap
  0.05    0.000027           9         3         3 ioctl
  0.05    0.000026           7         4           rt_sigaction
  0.04    0.000018           9         2           getgroups
  0.03    0.000014          14         1           execve
  0.02    0.000009           9         1           arch_prctl
  0.01    0.000007           7         1           sigaltstack
  0.00    0.000000           0         1           dup2

------ ----------- ----------- --------- --------- ----------------
100.00    0.050948                461063         9 total


# Go Walters GOMAXPROCS=1
bash -c 'GOMAXPROCS=1 time -p ./printWalters < bigfile.txt > /dev/null'   
real 25.27
user 25.31
sys 0.71
bash -c 'strace -c -f -E GOMAXPROCS=1 ./printWalters < bigfile.txt > /dev/null'
Process 25931 attached
Process 25971 attached

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 81.75    4.157976          37    112321      5572 futex
 16.82    0.855519           2    448900           select
  0.79    0.040194           0    439827           read
  0.59    0.029880           0    439826           write
  0.05    0.002592           0     13384           sched_yield
  0.00    0.000222           4        58           rt_sigaction
  0.00    0.000036           5         7           mmap
  0.00    0.000018           3         7           rt_sigprocmask
  0.00    0.000012           4         3           sigaltstack
  0.00    0.000011           4         3           arch_prctl
  0.00    0.000009           5         2           gettid
  0.00    0.000006           6         1           munmap
  0.00    0.000005           5         1           execve

  0.00    0.000005           5         1           sched_getaffinity
  0.00    0.000004           2         2           clone

------ ----------- ----------- --------- --------- ----------------
100.00    5.086489               1454343      5572 total


# Go Walters GOMAXPROCS=2
bash -c 'GOMAXPROCS=2 time -p ./printWalters < bigfile.txt > /dev/null'   
real 45.13
user 42.33
sys 3.15
bash -c 'strace -c -f -E GOMAXPROCS=2 ./printWalters < bigfile.txt > /dev/null'
Process 26326 attached
Process 26327 attached
Process 26328 attached
Process 26529 attached

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 79.91    7.893508          12    647857     29259 futex
 17.90    1.767814           4    482288           select
  0.81    0.080021           0    215456           sched_yield
  0.81    0.079730           0    439827           read
  0.58    0.056927           0    439826           write
  0.00    0.000184           3        58           rt_sigaction
  0.00    0.000092           7        13           rt_sigprocmask
  0.00    0.000055          11         5           arch_prctl
  0.00    0.000030           4         7           mmap
  0.00    0.000027           5         5           sigaltstack
  0.00    0.000019           5         4           gettid
  0.00    0.000014           4         4           clone
  0.00    0.000012          12         1           execve
  0.00    0.000006           6         1           munmap
  0.00    0.000006           6         1           sched_getaffinity

------ ----------- ----------- --------- --------- ----------------
100.00    9.878445               2225353     29259 total


# Go Kunitz GOMAXPROCS=1
bash -c 'GOMAXPROCS=1 time -p ./printKunitz < bigfile.txt > /dev/null'   
real 4.52
user 4.31
sys 0.21
bash -c 'strace -c -f -E GOMAXPROCS=1 ./printKunitz < bigfile.txt > /dev/null'
Process 26581 attached
Process 26582 attached

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 96.34    0.195853         347       565           select
  3.17    0.006442           4      1720           read
  0.32    0.000646          59        11           futex
  0.07    0.000134           2        58           rt_sigaction
  0.02    0.000036           3        11           mmap
  0.02    0.000035           5         7           rt_sigprocmask
  0.02    0.000032           0      1636           write
  0.01    0.000030          10         3           arch_prctl
  0.01    0.000018          18         1           sched_getaffinity
  0.01    0.000016           5         3           sched_yield
  0.01    0.000014           7         2           clone
  0.01    0.000014           5         3           sigaltstack
  0.01    0.000012           6         2           gettid
  0.00    0.000010          10         1           execve
  0.00    0.000004           4         1           munmap

------ ----------- ----------- --------- --------- ----------------
100.00    0.203296                  4024           total


# Go Kunitz GOMAXPROCS=2
bash -c 'GOMAXPROCS=2 time -p ./printKunitz < bigfile.txt > /dev/null'   
real 4.53
user 4.34
sys 0.20
bash -c 'strace -c -f -E GOMAXPROCS=2 ./printKunitz < bigfile.txt > /dev/null'
Process 26634 attached

Process 26635 attached
Process 26636 attached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 65.63    0.316982         665       477         4 futex
 33.08    0.159785         300       532           select
  1.20    0.005783           3      1720           read
  0.03    0.000133           2        58           rt_sigaction
  0.01    0.000044           4        10           sched_yield
  0.01    0.000039           0      1636           write
  0.01    0.000039           4        11           mmap
  0.01    0.000039          13         3           clone
  0.01    0.000037           4        10           rt_sigprocmask
  0.01    0.000026           7         4           sigaltstack
  0.01    0.000025           6         4           arch_prctl
  0.00    0.000013           4         3           gettid
  0.00    0.000005           5         1           munmap

  0.00    0.000005           5         1           sched_getaffinity
  0.00    0.000004           4         1           execve

------ ----------- ----------- --------- --------- ----------------
100.00    0.482959                  4471         4 total


# Go Pike GOMAXPROCS=1
bash -c 'GOMAXPROCS=1 time -p ./printPike < bigfile.txt > /dev/null'   
real 4.77
user 4.42
sys 0.36
bash -c 'strace -c -f -E GOMAXPROCS=1 ./printPike < bigfile.txt > /dev/null'
Process 26686 attached

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 94.01    5.295772        2413      2195           select
  3.32    0.186891           0    439827           read
  2.67    0.150386           0    418679           write
  0.00    0.000007           4         2           arch_prctl
  0.00    0.000007           7         1           sched_getaffinity
  0.00    0.000003           3         1           execve
  0.00    0.000000           0         7           mmap
  0.00    0.000000           0         1           munmap
  0.00    0.000000           0        58           rt_sigaction
  0.00    0.000000           0         4           rt_sigprocmask
  0.00    0.000000           0         2           sched_yield
  0.00    0.000000           0         1           clone
  0.00    0.000000           0         2           sigaltstack
  0.00    0.000000           0         1           gettid
  0.00    0.000000           0         8           futex

------ ----------- ----------- --------- --------- ----------------
100.00    5.633066                860789           total


# Go Pike GOMAXPROCS=2
bash -c 'GOMAXPROCS=2 time -p ./printPike < bigfile.txt > /dev/null'   
real 4.78
user 4.49
sys 0.31
bash -c 'strace -c -f -E GOMAXPROCS=2 ./printPike < bigfile.txt > /dev/null'

Process 26815 attached
Process 26816 attached
Process 26817 attached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 64.97   10.171503        8664      1174        82 futex
 32.95    5.158679        2468      2090           select
  1.15    0.179769           0    439827           read
  0.93    0.145487           0    418679           write
  0.00    0.000353           6        58           rt_sigaction
  0.00    0.000137           1       145           sched_yield
  0.00    0.000092           9        10           rt_sigprocmask
  0.00    0.000060          20         3           gettid
  0.00    0.000055           8         7           mmap
  0.00    0.000042          11         4           sigaltstack
  0.00    0.000030          10         3           clone
  0.00    0.000028           7         4           arch_prctl
  0.00    0.000014          14         1           execve
  0.00    0.000010          10         1           munmap

  0.00    0.000008           8         1           sched_getaffinity
------ ----------- ----------- --------- --------- ----------------
100.00   15.656267                862007        82 total

Michael Jones

unread,
Jan 14, 2015, 4:00:34 PM1/14/15
to Nico, Konstantin Khomoutov, minux, Rob Pike, golang-nuts@googlegroups.com >> golang-nuts
Minor change to Rob's program so that it will copy newlines in the input to the output. This is imperfect though in the case that the input did not have any newlines to begin with. 


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Michael T. Jones | Chief Technology Advocate  | m...@google.com |  +1 650-335-5765

Rob Pike

unread,
Jan 14, 2015, 4:20:52 PM1/14/15
to Michael Jones, Nico, Konstantin Khomoutov, minux, golang-nuts@googlegroups.com >> golang-nuts
I think WriteByte would do. :)

In any case, if there are no newlines in the input Scanner will fail, since it has a maximum token size and the default behavior splits tokens on newlines. So your code is correct if it runs to completion.

-rob

Michael Jones

unread,
Jan 14, 2015, 4:36:37 PM1/14/15
to Rob Pike, Nico, Konstantin Khomoutov, minux, golang-nuts@googlegroups.com >> golang-nuts
I stand corrected. My error was thinking that I'd made an error.

Since this comes up all the time, maybe "how to copy efficiently" should be a code lab.

v1: copy bytes
v2: use buffering
v3: use scanner
v4: use Copy
:

Michael Jones

unread,
Jan 14, 2015, 4:41:21 PM1/14/15
to Rob Pike, Nico, Konstantin Khomoutov, minux, golang-nuts@googlegroups.com >> golang-nuts

roger peppe

unread,
Jan 15, 2015, 4:07:47 AM1/15/15
to Michael Jones, Rob Pike, Nico, Konstantin Khomoutov, minux, golang-nuts@googlegroups.com >> golang-nuts
On 14 January 2015 at 21:40, 'Michael Jones' via golang-nuts
<golan...@googlegroups.com> wrote:
> ...like this:
>
> http://play.golang.org/p/s03tlcc3I9

If you're using io.Copy, I suspect that using bufio will only
slow things down by copying the data more times - io.Copy does its
own buffering.

Michael Jones

unread,
Jan 15, 2015, 7:31:33 AM1/15/15
to roger peppe, Rob Pike, Nico, Konstantin Khomoutov, minux, golang-nuts@googlegroups.com >> golang-nuts
then we need to have some best-practice examples...

Uli Kunitz

unread,
Jan 15, 2015, 3:31:37 PM1/15/15
to golan...@googlegroups.com, rogp...@gmail.com, r...@golang.org, nicolas...@gmail.com, flat...@users.sourceforge.net, mi...@golang.org
I think a few rules are sufficient:

1) Use buffered IO if a performance problem must be solved and there is output to an unbuffered writer or input from an unbuffered reader.
2) io.Copy uses an internal buffer and doesn't require buffered IO.
3) iobuf.Writer requires flushing, but Flush returns information from preceding calls saving error checks.
4) The default buffer size of bufio.NewReader and bufio.NewWriter will be sufficient in most cases.

This thread teached me rule 4.
 
Uli

Matt Harden

unread,
Jan 16, 2015, 9:13:08 PM1/16/15
to Uli Kunitz, golan...@googlegroups.com, rogp...@gmail.com, r...@golang.org, nicolas...@gmail.com, flat...@users.sourceforge.net, mi...@golang.org
If the source implements io.WriterTo, io.Copy uses it. bufio.Reader implements io.WriterTo. So there is no wasted buffering if you use io.Copy with bufio (on either side of the copy). Creating a bufio.Reader allows you to set the buffer size instead of settling for the (undocumented) 32KB buffer built in to io.Copy.

It is true that buffering on both sides of io.Copy is wasteful since it adds a memory copy in the middle of things.

--

Michael Jones

unread,
Jan 16, 2015, 11:45:39 PM1/16/15
to Matt Harden, minux, Rob 'Commander' Pike, nicolas...@gmail.com, golan...@googlegroups.com, flat...@users.sourceforge.net, rogp...@gmail.com, Uli Kunitz

The collected wisdom here would make a great go blog post on the ways and whys of copying bytes.

Reply all
Reply to author
Forward
0 new messages