[Performance Problem] fmt.Scanf is around 50 times slower than c's scanf

1,193 views
Skip to first unread message

s.dhil...@gmail.com

unread,
Apr 21, 2014, 2:20:34 AM4/21/14
to golan...@googlegroups.com
Hi Go Team,

Good Morning. 

I stumbled upon this performance difference when i was trying to use 'go' for a simple program that i had originally written in C.  It appears that 'go' in this scenario is extremely slow in comparison with my C implementation. 

$go version
go version go1.2 linux/amd64

Here is a simple re-creation procedure. 

when we continuously read from the console using fmt.Scanf the performance is considerably slower in comparison to its 'C' counter part.  This is visible especially for very large input set.

Here is a simple program in C that prints numbers until 'n' 

$ cat pnum.c
#include <stdio.h>
#include <string.h>


int main (int argc, char **argv)
{
        int i,n;
        if (argc <=1)
        {
                n=10;
        }
        else
        {
                n=atoi(argv[1]);
        }

        for (i=0; i<n; i++)
        printf("%d ",i);
}

so it runs like below

$pnum 20
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19

A go program that will Scanf from console for 'n' times.

$cat scanf.go
package main

import (

"fmt"
"os"
"strconv"

)


func main () {

        var i,n,tmp int
        if len(os.Args) <= 1 {
                n = 10
        } else {
                n,_ = strconv.Atoi(os.Args[1])
        }

        for i=0; i<n; i++ {
                fmt.Scanf ("%d", &tmp)
        }

        fmt.Printf("last num =%d\n", tmp)
}

its 'C' counter part is below

$cat scanfc.c
#include <stdio.h>
#include <string.h>

int main (int argc, char **argv)
{
        int tmp,i, n;
        if (argc <= 1)
        {
                n = 10;
        }
        else
        {
                n = atoi(argv[1]);
        }

        for (i=0; i<n; i++)
        scanf("%d", &tmp);

        printf("last num = %d\n", tmp);

        return 0;
}

Output of go program when we Scanf 100 times, 1000 times, etc until 1 Million times

output of Go program
$for i in 100 1000 10000 100000 1000000^Jdo^Jecho "n="${i}^Jpnum ${i} | time scanf ${i}^Jdone
n=100
last num =99

real    0m0.00s
user    0m0.00s
sys     0m0.01s
n=1000
last num =999

real    0m0.01s
user    0m0.01s
sys     0m0.00s
n=10000
last num =9999

real    0m0.06s
user    0m0.04s
sys     0m0.02s
n=100000
last num =99999

real    0m0.68s
user    0m0.39s
sys     0m0.30s
n=1000000
last num =999999

real    0m7.64s
user    0m4.22s
sys     0m3.57s


output of C program
$for i in 100 1000 10000 100000 1000000^Jdo^Jecho "n="${i}^Jpnum ${i} | time scanfc ${i}^Jdone
n=100
last num = 99

real    0m0.00s
user    0m0.00s
sys     0m0.00s
n=1000
last num = 999

real    0m0.00s
user    0m0.00s
sys     0m0.00s
n=10000
last num = 9999

real    0m0.01s
user    0m0.01s
sys     0m0.00s
n=100000
last num = 99999

real    0m0.02s
user    0m0.04s
sys     0m0.00s
n=1000000
last num = 999999

real    0m0.17s
user    0m0.33s
sys     0m0.01s

Difference is 7 secs vs 0.17 secs.  Is there a way to improve this? 

If i added some profiling code to scanf.go like below

$diff scanf.go scanf_prof.go
7a8
> "runtime/pprof"
14a16,20
>
>       f, err := os.Create("scanf.prof")
>       if err != nil {
>               fmt.Printf ("Error: Profiling %v\n", err)
>       }
20a27,28
>       pprof.StartCPUProfile(f)
>       defer pprof.StopCPUProfile ()

then the top10 shows syscall.Syscall being called maximum number of times.

$go tool pprof scanf_prof scanf.prof
Welcome to pprof!  For help, type 'help'.
(pprof) top10
Total: 669 samples
     319  47.7%  47.7%      352  52.6% syscall.Syscall
      35   5.2%  52.9%       35   5.2% runtime.stringiter2
      27   4.0%  57.0%       34   5.1% runtime.entersyscall
      22   3.3%  60.2%       57   8.5% fmt.indexRune
      16   2.4%  62.6%      168  25.1% fmt.(*ss).consume
      15   2.2%  64.9%       64   9.6% io.ReadAtLeast
      12   1.8%  66.7%       14   2.1% itab
      12   1.8%  68.5%       12   1.8% runtime.atomicstore
      11   1.6%  70.1%       81  12.1% fmt.(*readRune).readByte
      11   1.6%  71.7%      102  15.2% fmt.(*ss).getRune
(pprof) quit

Does it mean that every single fmt.Scanf go runtime goes to 'syscall.Syscall' ? will it hold true for frequently used functions like reading from a socket a million times or reading from a file a million times? 

Is this expected? is there a way to improve it?

Thanks in Advance,
Dhilip


Jackman

unread,
Apr 21, 2014, 11:33:47 AM4/21/14
to s.dhil...@gmail.com, golan...@googlegroups.com
I am curious if this is more of an initialization issue or a side effect from a language feature. I have heard that garbage collection is a bit of an issue and is scheduled to be rewritten in the next release or two, for example.


--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Andrew Jackman
kd7...@gmail.com

Tamás Gulácsi

unread,
Apr 21, 2014, 11:48:41 AM4/21/14
to golan...@googlegroups.com
Use bufio.NewReader(os.Stdin), not directly os.Stdin.
Message has been deleted

voidlogic

unread,
Apr 21, 2014, 12:05:58 PM4/21/14
to golan...@googlegroups.com
Additionally, tip may be a lot faster than 1.2 now. I believe tip uses sync.Pool for internal fmt buffers now.

On Monday, April 21, 2014 10:48:41 AM UTC-5, Tamás Gulácsi wrote:
Use bufio.NewReader(os.Stdin), not directly os.Stdin.

Dmitry Vyukov

unread,
Apr 21, 2014, 12:16:03 PM4/21/14
to s.dhil...@gmail.com, golang-nuts
Try to add bufio.Reader around os.Stdin

DhilipKumar Sankaranarayanan

unread,
Apr 21, 2014, 12:17:40 PM4/21/14
to golan...@googlegroups.com, s.dhil...@gmail.com
In the example code there is only one variable that's being repetitively used.  I don't understand why garbage collection has to suffer during its run-time?

Hotei

unread,
Apr 21, 2014, 12:26:20 PM4/21/14
to golan...@googlegroups.com, s.dhil...@gmail.com
I suspect it was scanf that created the garbage.  There's a lot going on behind the scenes when you call a function.

DhilipKumar Sankaranarayanan

unread,
Apr 21, 2014, 12:31:50 PM4/21/14
to golan...@googlegroups.com, s.dhil...@gmail.com
Can i know why is this slower? and how will buffio improve the performance?

My real question are all the system calls wrapped around syscall.Syscall?  

there was a situation where i had used io.ReadAtLeast to read arbitrary number of bytes from a socket. when number of times the application called io.ReadAtLeast increased then it appears that the performance deteriorated. 

The go profiler showed it again spent a lot of time on syscall.Syscall.

When i

minux

unread,
Apr 21, 2014, 12:53:34 PM4/21/14
to DhilipKumar Sankaranarayanan, golang-nuts
On Mon, Apr 21, 2014 at 12:31 PM, DhilipKumar Sankaranarayanan <s.dhil...@gmail.com> wrote:
Can i know why is this slower? and how will buffio improve the performance?
Because os.Stdin in Go is not buffered as C's stdin is.
Your C programs is using buffered stdin, so to be fair, you should add buffer to
the Go version, otherwise almost every character read will incur a syscall overhead.

My real question are all the system calls wrapped around syscall.Syscall?  
to some extent, yes. You can't call a syscall directly in Go (no way to write the
syscall/int $0x80, etc in Go), so every syscall has to go through assembly wrapper.

Carlos Castillo

unread,
Apr 21, 2014, 10:27:20 PM4/21/14
to golan...@googlegroups.com
On most platforms, IO is usually a OS feature you must make a system call to perform. When you read data from a file or pipe, you make a system call to tell the OS to move data into your program from the data source.

In go, IO by default is not buffered, which means that every time Read is called by fmt.Scanf to get bytes, a system call is made to do the work. When you use buffered IO, a large buffer is filled to capacity in a single system call, and then Read requests to the buffered stream are satisfied by the contents of the buffer it's emptied, at which point it is then re-filled. Therefore one system call is usually made per BUFFERSIZE bytes read, instead of at least once per fmt.Scanf.

You can make a more apples to apples comparison by either using bufio.NewReader to wrap os.Stdin and fmt.Fscanf to read it: http://play.golang.org/p/OL7TC7I3Cv

Or you can disable buffering in C by using the setvbuf function to disable buffering (see: http://www.cplusplus.com/reference/cstdio/setvbuf/).

Also, non-trivial go code should be posted through a go playground link.

Kevin Gillette

unread,
Apr 21, 2014, 10:34:31 PM4/21/14
to golan...@googlegroups.com
In addition to buffered io, Go's fmt package uses reflection, while C's just (unsafely) assumes you provided the right format specifiers. Reflection is always going to be slower than blind assumptions, but it won't lead to segfaults, which is a very reasonable tradeoff. Also, it means that you can specify %d for all integers, without having to worry about size specifiers.

Kevin Gillette

unread,
Apr 21, 2014, 10:45:13 PM4/21/14
to golan...@googlegroups.com
That is to say, fmt doesn't use buffered io (you have to do that explicitly), but it is a performance consideration.

DhilipKumar Sankaranarayanan

unread,
Apr 22, 2014, 1:57:04 AM4/22/14
to golan...@googlegroups.com
Thank you all for your kind explanation. 
Reply all
Reply to author
Forward
0 new messages