[go-nuts] Garbage collection and performance in gccgo

1,966 views
Skip to first unread message

a.m...@abdn.ac.uk

unread,
Apr 17, 2010, 4:40:43 PM4/17/10
to golang-nuts
Hi,

I heard somewhere (I think it was Rob Pike in his google tech talk on
go) that gccgo achieves about 20% of the performance of C. This is
pretty impressive for a garbage-collected language. But looking at the
gccgo site "Setting up and using gccgo", I read that garbage
collection is still not implemented in gccgo. When that happens, I can
only imagine that the performance gap between gccgo and C will
increase. Is that true? If so, does anybody have any idea by how much?

Thanks,

A.


--
Subscription settings: http://groups.google.com/group/golang-nuts/subscribe?hl=en

Ian Lance Taylor

unread,
Apr 17, 2010, 4:49:14 PM4/17/10
to a.m...@abdn.ac.uk, golang-nuts
a.m...@abdn.ac.uk writes:

> I heard somewhere (I think it was Rob Pike in his google tech talk on
> go) that gccgo achieves about 20% of the performance of C.

To be clear, the goal is to be within 20% of C performance or closer,
not to be 20% of the performance.

> This is
> pretty impressive for a garbage-collected language. But looking at the
> gccgo site "Setting up and using gccgo", I read that garbage
> collection is still not implemented in gccgo. When that happens, I can
> only imagine that the performance gap between gccgo and C will
> increase. Is that true? If so, does anybody have any idea by how much?

As it happens, gccgo is currently paying some of the price for garbage
collection even though it is not implemented. However, it is true
that we do not know how much garbage collection will affect overall
performance. Ongoing experience with Java suggests that very clever
garbage collection does not carry a significant performance penalty.
It remains to be seen whether that can be realized in Go. Go benefits
by comparison with Java by allocating less memory on the heap in
general. However, it is hurt by the fact that Go permits a pointer to
point into the middle of an object, which can make it more costly to
determine the object to which a pointer refers.

Ian

Hans Stimer

unread,
Apr 17, 2010, 9:51:35 PM4/17/10
to Ian Lance Taylor, a.m...@abdn.ac.uk, golang-nuts
On Sat, Apr 17, 2010 at 1:49 PM, Ian Lance Taylor <ia...@google.com> wrote:
To be clear, the goal is to be within 20% of C performance or closer,
not to be 20% of the performance.

How is that coming? Right now the flawed benchmarks (for what they are worth) are showing it at over 7x slower than C, which is pretty far from 1.2x.


Any idea where the performance improvements are going to come from? How big of effort is this going to be?

Will we see 1.2x this year?


 

Rob 'Commander' Pike

unread,
Apr 17, 2010, 10:52:23 PM4/17/10
to Hans Stimer, Ian Lance Taylor, a.m...@abdn.ac.uk, golang-nuts
On Apr 17, 2010, at 6:51 PM, Hans Stimer wrote:

> On Sat, Apr 17, 2010 at 1:49 PM, Ian Lance Taylor <ia...@google.com> wrote:
> To be clear, the goal is to be within 20% of C performance or closer,
> not to be 20% of the performance.
>
> How is that coming? Right now the flawed benchmarks (for what they are worth) are showing it at over 7x slower than C, which is pretty far from 1.2x.

A dominant fraction of that ratio is caused by a couple of tests that depend critically on libraries for which equivalent performance versions are not available in Go. For instance, pidigits depends on a multiprecision math package, and the C versions all use GNU's, which is written in very tight assembler. There's a similar story for regular expressions (regex-dna for instance). Even Python beats Go for pure regexp performance, but then Python's regexps (like almost everyone else's) are implemented in C, by PCRE, whereas Go's are in a slow but simple starter implementation. It's not an apples-to-apples comparison.

Plus, if you look at those C benchmarks, the highest scorers are often tainted with #pragmas, assembler tweaks, and require particular GCC flags to make them perform that well. If you compare a straight C program to a roughly-equivalent Go program, you'll find the two languages are much closer in performance than this benchmark suite would have you believe.

> http://shootout.alioth.debian.org/u32q/compare.php?lang=go

That particular comparison is pretty misleading. As I said, it doesn't really compare one programming language against another in any meaningful way.

Still, there's no question Go isn't always performing at the level we want yet.

> Any idea where the performance improvements are going to come from? How big of effort is this going to be?
>
> Will we see 1.2x this year?

If you try a straight but carefully written C program and a straight but carefully written Go program, we're there already, even for some of these benchmarks. The reverse-complement benchmark in our tests compares a clean but fast C version and a Go version and they're almost exactly the same speed. Have a look at

http://code.google.com/p/go/source/browse/test/bench/timing.log?r=release

for a fair and honest performance comparison as expressed by this particular benchmark suite. Also some of our internal comparisons between C++ and Go have been encouraging.

Our overall take: The compilers are OK (gc) to good (gccgo) but have room to improve. The libraries, particularly those people like to use in benchmarks, can be slow. Garbage collection isn't fast enough yet, but even if it were, taking care not to generate unnecessary garbage can have a huge effect.

-rob

P.S. Someone should write a FAQ entry about this particular benchmark suite.

Rob 'Commander' Pike

unread,
Apr 17, 2010, 10:55:58 PM4/17/10
to Hans Stimer, Ian Taylor, a.m...@abdn.ac.uk, golang-nuts Nuts
And by the way, if someone wants to lavish the kind of care on the Go benchmark entries that the C entries have had, it's bound to make a big difference. Most of them have hardly been tuned at all.

-rob

Russ Cox

unread,
Apr 17, 2010, 10:58:48 PM4/17/10
to Hans Stimer, Ian Lance Taylor, a.m...@abdn.ac.uk, golang-nuts
Following up on what Rob said, here's what the bottom of timing.log shows:

1.0x reverse-complement
1.2x mandelbrot
1.4x k-nucleotide
1.4x nbody
1.5x fasta
1.7x meteor
2.0x spectral-norm
2.1x fannkuch
(better array bounds checks would get to 1.5x)
5.6x binary-tree
(comparing malloc+free against garbage collector)
10.5x regex-dna
(different algorithms; Go has better worst case)
14.8x pidigits
(different bignum packages)
0.7x chameneos
(different thread libraries)
1.0x threadring
(different thread libraries)

Only the first eight are close to fair comparisons, and Go is doing
reasonably well. I don't believe it would take much to get those to 1.2x.

Russ

Isaac Gouy

unread,
Apr 18, 2010, 1:36:44 PM4/18/10
to golang-nuts


On Apr 17, 7:52 pm, "Rob 'Commander' Pike" <r...@google.com> wrote:
> On Apr 17, 2010, at 6:51 PM, Hans Stimer wrote:
>
> > On Sat, Apr 17, 2010 at 1:49 PM, Ian Lance Taylor <i...@google.com> wrote:
> > To be clear, the goal is to be within 20% of C performance or closer,
> > not to be 20% of the performance.
>
> > How is that coming? Right now the flawed benchmarks (for what they are worth) are showing it at over 7x slower than C, which is pretty far from 1.2x.
>
> A dominant fraction of that ratio is caused by a couple of tests that depend critically on libraries for which equivalent performance versions are not available in Go.


I can't figure out how Hans Stimer calculated 7x but let's take the
median ratio for those x86 quad-core measurements 9.44


> For instance, pidigits depends on a multiprecision math package, and the C versions all use GNU's, which is written in very tight assembler.  There's a similar story for regular expressions (regex-dna for instance).  Even Python beats Go for pure regexp performance, but then Python's regexps (like almost everyone else's) are implemented in C, by PCRE, whereas Go's are in a slow but simple starter implementation.  It's not an apples-to-apples comparison.


Throw out pidigits. Throw out regex-dna.

The median ratio for the remaining measurements is approximately 6x

Throw out binary-trees - the median ratio remains approximately 6x

It doesn't seem that PCRE and GMP use explains much in this story.


> Plus, if you look at those C benchmarks, the highest scorers are often tainted with #pragmas, assembler tweaks, and require particular GCC flags to make them perform that well.  If you compare a straight C program to a roughly-equivalent Go program, you'll find the two languages are much closer in performance than this benchmark suite would have you believe.


To be clear, is the intention that Go be able to follow C in using
#pragmas, assembler tweaks, and particular GCC flags to achieve that
performance?

In other words, will Go have rough equivalents to those "highest
scorers"?


> >http://shootout.alioth.debian.org/u32q/compare.php?lang=go
>
> That particular comparison is pretty misleading.   As I said, it doesn't really compare one programming language against another in any meaningful way.
>
> Still, there's no question Go isn't always performing at the level we want yet.
>
> > Any idea where the performance improvements are going to come from? How big of effort is this going to be?
>
> > Will we see 1.2x this year?
>
> If you try a straight but carefully written C program and a straight but carefully written Go program, we're there already, even for some of these benchmarks.  The reverse-complement benchmark in our tests compares a clean but fast C version and a Go version and they're almost exactly the same speed.  Have a look at
>
>        http://code.google.com/p/go/source/browse/test/bench/timing.log?r=rel...
>
> for a fair and honest performance comparison as expressed by this particular benchmark suite.  Also some of our internal comparisons between C++ and Go have been encouraging.


Once upon a time, a GNAT Ada programmer opined that the comparisons
would be fair and honest when the assembler produced for the C
programs and the assembler produced for the Ada programs was the
same - a this apple to this apple comparison.

At least that would be unambiguous.

Isaac Gouy

unread,
Apr 18, 2010, 1:43:30 PM4/18/10
to golang-nuts


On Apr 17, 7:58 pm, Russ Cox <r...@golang.org> wrote:
> Following up on what Rob said, here's what the bottom of timing.log shows:
>
>  1.0x reverse-complement
>  1.2x mandelbrot
>  1.4x k-nucleotide
>  1.4x nbody
>  1.5x fasta
>  1.7x meteor
>  2.0x spectral-norm
>  2.1x fannkuch
>       (better array bounds checks would get to 1.5x)
>  5.6x binary-tree
>       (comparing malloc+free against garbage collector)
> 10.5x regex-dna
>       (different algorithms; Go has better worst case)
> 14.8x pidigits
>       (different bignum packages)
>  0.7x chameneos
>       (different thread libraries)
>  1.0x threadring
>       (different thread libraries)
>
> Only the first eight are close to fair comparisons, and Go is doing
> reasonably well.  I don't believe it would take much to get those to 1.2x.


Seems as though "fair" means not using something that works better
than whatever Go uses - someone once opined that the world isn't fair.

Russ Cox

unread,
Apr 18, 2010, 2:39:37 PM4/18/10
to Isaac Gouy, golang-nuts
>>  1.0x reverse-complement
>>  1.2x mandelbrot
>>  1.4x k-nucleotide
>>  1.4x nbody
>>  1.5x fasta
>>  1.7x meteor
>>  2.0x spectral-norm
>>  2.1x fannkuch
>>       (better array bounds checks would get to 1.5x)
>>  5.6x binary-tree
>>       (comparing malloc+free against garbage collector)
>> 10.5x regex-dna
>>       (different algorithms; Go has better worst case)
>> 14.8x pidigits
>>       (different bignum packages)
>>  0.7x chameneos
>>       (different thread libraries)
>>  1.0x threadring
>>       (different thread libraries)
>>
>> Only the first eight are close to fair comparisons, and Go is doing
>> reasonably well.  I don't believe it would take much to get those to 1.2x.
>
> Seems as though "fair" means not using something that works better
> than whatever Go uses

My definition of fair above is that you're comparing programs that say
approximately the same thing and use the same algorithms, so that the
real difference is the choice of implementation language and not
implementation technique. The first eight fit that criteria, even though
most of them are still faster than Go's implementation, and the final
five do not, even though in two of them Go wins.

I'm not trying to attack the shootout site at all. What you're measuring
is simply not what I'm interested in measuring. You're measuring
what can be done with a particular tool by pulling out all the stops,
even ones (like inline assembly or magic pragmas) that another tool
might make a conscious decision to omit. I'm interested in measuring
the quality of the generated code when both compilers are presented
with what amounts to the same program.

Russ

Hans Stimer

unread,
Apr 18, 2010, 2:53:07 PM4/18/10
to Isaac Gouy, golang-nuts


I can't figure out how Hans Stimer calculated 7x but let's take the
median ratio for those x86 quad-core measurements 9.44


Fair question. I just referred to this chart which lists a median value of 7.44


To be clear, is the intention that Go be able to follow C in using
#pragmas, assembler tweaks, and particular GCC flags to achieve that
performance?

In other words, will Go have rough equivalents to those "highest
scorers"?



If Go can get to 1.2x without the gcc heroics then we have the successor to C except for anything that requires specific memory layout (I still have a complaint about that). Most of the gcc performance flags and pragmas go unused for a variety of reasons. The more useful comparison for most people is gcc -O3 with no pragmas and no benchmark specific assembly.

Isaac Gouy

unread,
Apr 18, 2010, 4:20:56 PM4/18/10
to golang-nuts
Not that it would be difficult to do ;-)


> What you're measuring
> is simply not what I'm interested in measuring.  You're measuring
> what can be done with a particular tool by pulling out all the stops,
> even ones (like inline assembly or magic pragmas) that another tool
> might make a conscious decision to omit.  I'm interested in measuring
> the quality of the generated code when both compilers are presented
> with what amounts to the same program.


Which is the sensible focus for a language implementor, but probably
not what potential language users understand by talk of performance.

To paraphrase, someone should write a FAQ entry about "20% of the
performance of C" :-)

John Joyus

unread,
Apr 19, 2010, 12:39:23 PM4/19/10
to golang-nuts

On Apr 17, 4:40 pm, a.mo...@abdn.ac.uk wrote:
> I heard somewhere (I think it was Rob Pike in his google tech talk on
> go) that gccgo achieves about 20% of the performance of C.

I think he said the perforamcne difference is "within 20%". What that
means is "80% of the C's performance". The 20% being the cost of
garbage collection.

I wish the garbage collection in Go is optional. It will be worth
even if that needs a different set of compilers.

Thanks,
JJ

chris dollin

unread,
Apr 19, 2010, 12:41:48 PM4/19/10
to John Joyus, golang-nuts
On 19 April 2010 17:39, John Joyus <john....@gmail.com> wrote:

On Apr 17, 4:40 pm, a.mo...@abdn.ac.uk wrote:
> I heard somewhere (I think it was Rob Pike in his google tech talk on
> go) that gccgo achieves about 20% of the performance of C.

I think he said the perforamcne difference is "within 20%". What that
means is "80% of the C's performance". The 20% being the cost of
garbage collection.

I wish  the garbage collection in Go is optional. It will be worth
even if that needs a different set of compilers.

I think that more importantly it would need a different set of
programs. And maybe programmers. Or t'other way about.

Chris

--
Chris "allusive" Dollin

Jessta

unread,
Apr 19, 2010, 2:56:10 PM4/19/10
to John Joyus, golang-nuts
On Tue, Apr 20, 2010 at 2:39 AM, John Joyus <john....@gmail.com> wrote:
> I wish  the garbage collection in Go is optional. It will be worth
> even if that needs a different set of compilers.
>

Go without garbage collection would be terrifying
and close to impossible to write anything correct in.

- jessta


--
=====================
http://jessta.id.au

Paolo 'Blaisorblade' Giarrusso

unread,
May 8, 2010, 11:39:38 PM5/8/10
to John Joyus, golang-nuts
On 19 Apr, 18:39, John Joyus <john.jo...@gmail.com> wrote:
> On Apr 17, 4:40 pm, a.mo...@abdn.ac.uk wrote:
>
> > I heard somewhere (I think it was Rob Pike in his google tech talk on
> > go) that gccgo achieves about 20% of the performance of C.
>
> I think he said the perforamcne difference is "within 20%". What that
> means is "80% of the C's performance". The 20% being the cost ofgarbagecollection.

Please, not again. The performance myth that "garbage collection is so
slow" "Java is slow" is old and should be dead. About Java, it has
just bad startup performance. So please don't draw uneducated
conclusions on GC.

Maybe that's also the cost of bounds checking on arrays, vtable calls
through interfaces, other dynamic features, and missing optimizations.
You know that with garbage collection, memory allocation (i.e. the
malloc() part) is almost as cheap as stack allocation? Not with
current Go's GC, however.
However, it is true that GC tends to trade more memory for CPU time.
But with enough memory it can be faster.
And some concurrent algorithms (like immutable data structures) become
unmanageable without GC (unless GC is emulated through atomic
refcounting - that's easy in C++, but much slower).

> I wish  thegarbagecollectionin Go is optional. It will be worth

Jesper Louis Andersen

unread,
May 9, 2010, 4:36:53 AM5/9/10
to Paolo 'Blaisorblade' Giarrusso, John Joyus, golang-nuts
On Sun, May 9, 2010 at 5:39 AM, Paolo 'Blaisorblade' Giarrusso
<p.gia...@gmail.com> wrote:

> Please, not again. The performance myth that "garbage collection is so
> slow" "Java is slow" is old and should be dead. About Java, it has
> just bad startup performance. So please don't draw uneducated
> conclusions on GC.

Two historic reasons pester the myth. One, most functional languages
at that time were garbage collected, but comparatively little was
known about making functional languages go fast, cross pollinating the
idea that it might as well be the garbage collector. Two, when Java
came out, it had an awful garbage collector. Three, Java programmers,
the bad ones, tend to like wrapping their objects in objects. If this
happens several times, cache memory quickly becomes a scarce resource
which in turn hurts performance. Four, GC is forgiving. If you write a
bad program with a horrendous allocation rate, and bad reference
management, the GC saves the day. The program runs, and it is slow.

> And some concurrent algorithms (like immutable data structures) become
> unmanageable without GC (unless GC is emulated through atomic
> refcounting - that's easy in C++, but much slower).

Right. GC is a productivity balance. With GC in Go, you spend more
time on your algorithm and problem than tedious memory management.
When memory become a problem, garbage collectors normally allow for
heap inspection in some way. You can then draw statistics from the GC
and optimize the bottleneck. I will claim that your time is much
better spent with this approach.

There is another interesting GC-strategy for a highly concurrent
language which is the one Erlang uses. In Erlang, a process is a
light-weight, userland scheduled beast, much akin to a goroutine. Two
differences however: They are completely isolated from each other,
hence warranting the process term, and each have their own garbage
collector attached. The consequence is that all communication between
processes must happen by copying. It makes the "Share data by
communicating" mantra extremely important. Don't hand a 16Mb binary
search tree to another process! You will not be passing by the
reference. The one-gc-per-process yields some nice soft-realtime
properties in the system. Also, it means that a process-crash is
non-fatal by construction. The rest of the program can be sure it does
not garbled data so it can keep running. The strategy of limiting the
impact of errors is central to Erlang programming.

On the other hand, the design choice of Erlang forces other things to
be in certain ways. To facilitate Erlangs main purpose, protocol
handling, bytes of binary data are handled in a seperate, ref-counted
heap. Each process only sits with the slice-structure in its heap
space. Also, the system support pervasive hash-tables essentially,
where any process can go look up information.

--
J.

Paolo Giarrusso

unread,
May 9, 2010, 7:36:58 AM5/9/10
to Jesper Louis Andersen, John Joyus, golang-nuts
Thanks for your enlightening message! Some comments below.

On Sun, May 9, 2010 at 10:36, Jesper Louis Andersen
<jesper.lou...@gmail.com> wrote:
> On Sun, May 9, 2010 at 5:39 AM, Paolo 'Blaisorblade' Giarrusso
> <p.gia...@gmail.com> wrote:
>
>> Please, not again. The performance myth that "garbage collection is so
>> slow" "Java is slow" is old and should be dead. About Java, it has
>> just bad startup performance. So please don't draw uneducated
>> conclusions on GC.
>
> Two historic reasons pester the myth. One, most functional languages
> at that time were garbage collected, but comparatively little was
> known about making functional languages go fast, cross pollinating the
> idea that it might as well be the garbage collector. Two, when Java
> came out, it had an awful garbage collector. Three, Java programmers,
> the bad ones, tend to like wrapping their objects in objects.

The recent paper "The causes of bloat, the limits of health" was
illuminating for me (even if I still must read it fully).

> If this
> happens several times, cache memory quickly becomes a scarce resource
> which in turn hurts performance. Four, GC is forgiving. If you write a
> bad program with a horrendous allocation rate, and bad reference
> management, the GC saves the day. The program runs, and it is slow.
>
>> And some concurrent algorithms (like immutable data structures) become
>> unmanageable without GC (unless GC is emulated through atomic
>> refcounting - that's easy in C++, but much slower).
>
> Right. GC is a productivity balance. With GC in Go, you spend more
> time on your algorithm and problem than tedious memory management.
> When memory become a problem, garbage collectors normally allow for
> heap inspection in some way. You can then draw statistics from the GC
> and optimize the bottleneck. I will claim that your time is much
> better spent with this approach.

I tend to agree with your claim, even if I'm more believing than else
(I haven't yet heap-profiled anything for real).

> There is another interesting GC-strategy for a highly concurrent
> language which is the one Erlang uses. In Erlang, a process is a
> light-weight, userland scheduled beast, much akin to a goroutine. Two
> differences however: They are completely isolated from each other,
> hence warranting the process term, and each have their own garbage
> collector attached. The consequence is that all communication between
> processes must happen by copying. It makes the "Share data by
> communicating" mantra extremely important. Don't hand a 16Mb binary
> search tree to another process! You will not be passing by the
> reference. The one-gc-per-process yields some nice soft-realtime
> properties in the system. Also, it means that a process-crash is
> non-fatal by construction. The rest of the program can be sure it does
> not garbled data so it can keep running. The strategy of limiting the
> impact of errors is central to Erlang programming.

> On the other hand, the design choice of Erlang forces other things to
> be in certain ways. To facilitate Erlangs main purpose, protocol
> handling, bytes of binary data are handled in a seperate, ref-counted
> heap. Each process only sits with the slice-structure in its heap
> space. Also, the system support pervasive hash-tables essentially,
> where any process can go look up information.

Ah-ah! They needed to have exceptions! Seriously speaking, the idea of
pass-by-copy-among-processes is so crazy that if they've found a
paradigm to make it work efficiently they must be geniuses.
Thanks again!
--
Paolo Giarrusso
Reply all
Reply to author
Forward
0 new messages