gccgo vs 6g benchmark

Albert Strasheim

Dec 16, 2010, 3:26:18 AM
to golang-nuts
Hello all

I've finally managed to get a significant part of our codebase
compiled with gccgo, so I ran a quick benchmark.

The code processes some binary data from a file. Lots of slices, maps,
and moving bytes around.

gccgo r167898 -O0 runs in about 26 seconds.

gccgo r167898 -O2 or -Ofast runs in about 17.7 seconds.

6g tip runs in about 14.5 seconds.

This code spends a good amount of time in GC, so we expect a speedup
when Russ's GC work lands.
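
One way to put a rough number on the GC share is sketched below. It assumes a current Go toolchain with `runtime.ReadMemStats`; `gcCost` and `churn` are hypothetical names, and `churn` is only a stand-in for the real workload. Note that `PauseTotalNs` counts stop-the-world pause time, which is not the same as total CPU time spent collecting.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// gcCost runs work and reports how many garbage collections completed
// while it ran, plus the total stop-the-world pause time they added.
func gcCost(work func()) (collections uint32, pause time.Duration) {
	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	work()
	runtime.ReadMemStats(&after)
	collections = after.NumGC - before.NumGC
	pause = time.Duration(after.PauseTotalNs - before.PauseTotalNs)
	return collections, pause
}

// churn is a hypothetical stand-in workload: it keeps replacing
// byte slices stored in a map so that garbage accumulates.
func churn() {
	m := make(map[int][]byte)
	for i := 0; i < 100000; i++ {
		m[i%1024] = make([]byte, 1024)
	}
	runtime.GC() // force one collection so the sketch always has something to report
}

func main() {
	n, pause := gcCost(churn)
	fmt.Printf("collections: %d, total pause: %v\n", n, pause)
}
```

Comparing the reported pause against the wall-clock time of the same run gives a first estimate of how much a faster collector could help.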

Any thoughts on where we can poke and measure and test to provide some
useful information for improving gccgo and/or 6g?

Is there some kind of link-time optimization we can try with gccgo?

Regards

Albert

Serge Hulne

Dec 16, 2010, 12:51:52 PM
to golang-nuts
At the risk of making a candid request, here is a link to a piece of
code which ranks the frequency of bigrams occurring in a large text
file.

http://groups.google.com/group/golang-dev/browse_thread/thread/de2125f164a1273/01396bc34cc63505?lnk=gst&q=bigram#01396bc34cc63505

The item which, I think, could be worth optimizing is the storage and
retrieval of items in a map, especially when said storage is indirect
because the items are not integers or strings and therefore have to be
converted one way or another into one of those two types (as is the
case in this example).

Motivation for suggesting this: this kind of operation (or something
equivalent) is bound to occur a lot in any application intended for
natural language processing in Go.
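
For illustration, here is a minimal sketch of the kind of map access described above. It is not the linked program. It assumes a Go version from Go 1 onward, where comparable struct types can be used directly as map keys, so no conversion to a string or integer is needed. The names `bigram`, `countBigrams`, and `ranked` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// bigram is a hypothetical key type: a pair of adjacent words.
type bigram struct {
	first, second string
}

// countBigrams counts every pair of adjacent whitespace-separated words.
func countBigrams(text string) map[bigram]int {
	words := strings.Fields(text)
	counts := make(map[bigram]int)
	for i := 0; i+1 < len(words); i++ {
		counts[bigram{words[i], words[i+1]}]++
	}
	return counts
}

// ranked returns the bigrams ordered by descending count,
// with ties broken alphabetically so the order is deterministic.
func ranked(counts map[bigram]int) []bigram {
	keys := make([]bigram, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		a, b := keys[i], keys[j]
		if counts[a] != counts[b] {
			return counts[a] > counts[b]
		}
		if a.first != b.first {
			return a.first < b.first
		}
		return a.second < b.second
	})
	return keys
}

func main() {
	counts := countBigrams("the cat sat on the cat mat")
	for _, b := range ranked(counts) {
		fmt.Printf("%s %s: %d\n", b.first, b.second, counts[b])
	}
}
```

Every increment hashes both strings of the key, so a loop like the one in `countBigrams` is a reasonable place to start when measuring map performance.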

Serge.

Ian Lance Taylor

Dec 16, 2010, 1:03:46 PM
to Albert Strasheim, golang-nuts
Albert Strasheim <ful...@gmail.com> writes:

> I've finally managed to get a significant part of our codebase
> compiled with gccgo, so I ran a quick benchmark.
>
> The code processes some binary data from a file. Lots of slices, maps,
> and moving bytes around.

How many goroutines? I would expect 6g to do better than gccgo on
programs with lots of goroutines. gccgo should generally do better in
straight line code.

> gccgo r167898 -O0 runs in about 26 seconds.
>
> gccgo r167898 -O2 or -Ofast runs in about 17.7 seconds.
>
> 6g tip runs in about 14.5 seconds.
>
> This code spends a good amount of time in GC, so we expect a speedup
> when Russ's GC work lands.
>
> Any thoughts on where we can poke and measure and test to provide some
> useful information for improving gccgo and/or 6g?

With gccgo, one fairly easy approach is to use the -pg option when you
compile and link; the program will then dump a gmon.out file. You can
then use gprof to get a CPU profile. That will point at the slow areas.
http://sourceware.org/binutils/docs-2.21/gprof/index.html

The corresponding operation for 6g is 6prof, but you don't get quite as
much information. http://golang.org/cmd/prof/ .

> Is there some kind of link-time optimization we can try with gccgo?

You can try -flto and perhaps -fwhole-program. You would be the first
person to try it, though.

Ian

Albert Strasheim

Dec 16, 2010, 1:28:19 PM
to golang-nuts
Hello

On Dec 16, 8:03 pm, Ian Lance Taylor <i...@google.com> wrote:
> Albert Strasheim <full...@gmail.com> writes:
> > I've finally managed to get a significant part of our codebase
> > compiled with gccgo, so I ran a quick benchmark.
> > The code processes some binary data from a file. Lots of slices, maps,
> > and moving bytes around.
> How many goroutines?  I would expect 6g to do better than gccgo on
> programs with lots of goroutines.  gccgo should generally do better in
> straight line code.

This is straight-line code only (no goroutines), which is why I was
surprised that 6g did better than gccgo.

> > Any thoughts on where we can poke and measure and test to provide some
> > useful information for improving gccgo and/or 6g?
> With gccgo, one fairly easy approach is to use the -pg option when you
> compile and link, the program will dump a gmon.out file.  You can then
> use gprof to get a CPU profile.  That will point at the slow areas.
> The corresponding operation for 6g is 6prof, but you don't get quite as
> much information.  http://golang.org/cmd/prof/.

Thanks. We've been using 6prof extensively and it's been very useful.
I will give gprof a try and will post some results.

> > Is there some kind of link-time optimization we can try with gccgo?
> You can try -flto and perhaps -fwhole-program.  You would be the first
> person to try it, though.

I tried -flto and it didn't break anything, but didn't make it faster
either. -fwhole-program causes a link error, but I think it's
something that could easily be fixed. I sent you an email about it.

Regards

Albert

Albert Strasheim

Dec 17, 2010, 10:50:09 AM
to golang-nuts
Hello

On Dec 16, 10:26 am, Albert Strasheim <full...@gmail.com> wrote:
> The code processes some binary data from a file. Lots of slices, maps,
> and moving bytes around.
> gccgo r167898 -O0 runs in about 26 seconds.
> gccgo r167898 -O2 or -Ofast runs in about 17.7 seconds.
> 6g tip runs in about 14.5 seconds.

After including your patches

http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01353.html
http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01356.html

the gccgo version now runs in 14.6 seconds!

I also played with -fstack-protector (Fedora enables this for C code),
which slows it down to about 20 seconds.

Regards

Albert

Ian Lance Taylor

Dec 17, 2010, 11:05:54 AM
to Albert Strasheim, golang-nuts
Albert Strasheim <ful...@gmail.com> writes:

> On Dec 16, 10:26 am, Albert Strasheim <full...@gmail.com> wrote:
>> The code processes some binary data from a file. Lots of slices, maps,
>> and moving bytes around.
>> gccgo r167898 -O0 runs in about 26 seconds.
>> gccgo r167898 -O2 or -Ofast runs in about 17.7 seconds.
>> 6g tip runs in about 14.5 seconds.
>
> After including your patches
>
> http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01353.html
> http://gcc.gnu.org/ml/gcc-patches/2010-12/msg01356.html
>
> the gccgo version now runs in 14.6 seconds!

Glad to hear it. I can think of one more improvement along these lines,
which should cause the garbage collector to split the stack less often.


> I also played with -fstack-protector (Fedora enables this for C code),
> which slows it down to about 20 seconds.

Personally I wouldn't bother with -fstack-protector for Go. Go doesn't
permit buffer overruns anyhow. A compiler bug permitting stack smashing
in Go is about as likely as a compiler bug breaking -fstack-protector
and making it useless.
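
A minimal sketch of what that looks like in practice, assuming safe Go code that does not use package unsafe or call into C: an out-of-range write is stopped by a runtime check before any memory is touched. The helper `writeAt` is a hypothetical name.

```go
package main

import "fmt"

// writeAt stores v at buf[i]. If i is out of range, the runtime's
// bounds check panics before the write happens; the deferred function
// converts that panic into an ordinary error.
func writeAt(buf []byte, i int, v byte) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("write rejected: %v", r)
		}
	}()
	buf[i] = v
	return nil
}

func main() {
	buf := make([]byte, 4)
	fmt.Println(writeAt(buf, 2, 0xff)) // index within the slice
	fmt.Println(writeAt(buf, 8, 0xff)) // index past the end: rejected
}
```

Because the check runs before the store, there is no overwritten return address for a stack protector to detect.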

Using -fstack-protector in conjunction with -fsplit-stack and gold is
going to slow the program down a lot, because every function that gets
a stack-protector check is going to wind up splitting the stack.

Ian

David Roundy

Dec 18, 2010, 9:39:00 AM
to Ian Lance Taylor, Albert Strasheim, golang-nuts
On Fri, Dec 17, 2010 at 11:05 AM, Ian Lance Taylor <ia...@google.com> wrote:
> Personally I wouldn't bother with -fstack-protector for Go.  Go doesn't
> permit buffer overruns anyhow.  A compiler bug permitting stack smashing
> in Go is about as likely as a compiler bug breaking -fstack-protector
> and making it useless.

But wouldn't it potentially be helpful for Go code that links with C
code? Not that I would be keen on using it, but I wonder if this might
be a reasonable corner case, particularly for non-CPU-intensive code
that might call large C libraries using input from the network.
--
David Roundy

Ian Lance Taylor

Dec 20, 2010, 1:15:00 PM
to David Roundy, Albert Strasheim, golang-nuts
David Roundy <rou...@physics.oregonstate.edu> writes:

You can use -fstack-protector while compiling C code, and you can then
link that C code with Go code which is compiled without
-fstack-protector. There is no requirement that -fstack-protector be
used for all code linked into a binary.

Ian
