Google paper comparing performance of C++, Java, Scala, and Go

3,807 views
Skip to first unread message

bsr

unread,
Jun 3, 2011, 9:23:48 AM6/3/11
to golang-nuts
Just saw this post on hacker news https://days2011.scala-lang.org/sites/days2011/files/ws3-1-Hundt.pdf
.. hope u may interested... I wasn't too impressed with the analysis,
as some of the factors stands out for me
1. complexity. It may take years to get to Scala Pro standard, until
then the program would be a port of Java to Scala syntax. Also not
sure how maintainable is it.
2. Binary size. Of course, it is not linear as the size of runtime is
a constant overhead.
3. I am not sure what they mean by "a Go program is only valid if it
comes unmodified out of the automatic formatter gofmt". I think you
could always write with semicolon, spacing and compile withour gofmt.
thanks,
bsr.

Ian Lance Taylor

unread,
Jun 3, 2011, 10:53:05 AM6/3/11
to bsr, golang-nuts
bsr <bsr...@gmail.com> writes:

Despite the name, the "Go Pro" code was never intended to be an example
of idiomatic or efficient Go. Robert asked me to take a look at his
code and I hacked on it for an hour to make a little bit nicer. If I
had realized that he was going to publish it externally I would have put
a lot more time into making it nicer.

I'm told that the program as compiled by 6g is significantly faster now
than it was when Robert did his measurements.

Ian

Jeroen Dirks

unread,
Jun 3, 2011, 12:47:01 PM6/3/11
to golang-nuts
The paper just hit the Reddit front page. I was a bit surprised about
the amount of memory used
by the Go solution. Would it have been possible to create less garbage
to start with?


On Jun 3, 10:53 am, Ian Lance Taylor <i...@google.com> wrote:
> bsr <bsr...@gmail.com> writes:
> > Just saw this post on hacker newshttps://days2011.scala-lang.org/sites/days2011/files/ws3-1-Hundt.pdf

Russ Cox

unread,
Jun 3, 2011, 1:29:14 PM6/3/11
to Jeroen Dirks, golang-nuts
On Fri, Jun 3, 2011 at 12:47, Jeroen Dirks <jeroen....@gmail.com> wrote:
> The paper just hit the Reddit front page. I was a bit surprised about
> the amount of memory used
> by the Go solution. Would it have been possible to create less garbage
> to start with?

The program is not using appreciably more
memory than C++ dbg or Java.

Russ

Jeroen Dirks

unread,
Jun 3, 2011, 5:06:16 PM6/3/11
to golang-nuts
When I was looking at the benchmark I noticed the huge virtual memory
for the Go version:

Look at the Fig 7 data

Virtual memory:
Go 16.2 GB
C++ Dbg 0.474 GB
Java 1.1 GB
Scala 1.1 GB.

It looks like the GC is not keeping up, there is a memory leak or
fragmentation.

If the OS has spend a lot of time swapping out pages then this memory
could lower the overall
runtime performance too.

Jeroen

On Jun 3, 1:29 pm, Russ Cox <r...@golang.org> wrote:

Namegduf

unread,
Jun 3, 2011, 5:12:25 PM6/3/11
to golan...@googlegroups.com
On Fri, 3 Jun 2011 14:06:16 -0700 (PDT)
Jeroen Dirks <jeroen....@gmail.com> wrote:

> When I was looking at the benchmark I noticed the huge virtual memory
> for the Go version:
>
> Look at the Fig 7 data
>
> Virtual memory:
> Go 16.2 GB
> C++ Dbg 0.474 GB
> Java 1.1 GB
> Scala 1.1 GB.
>
> It looks like the GC is not keeping up, there is a memory leak or
> fragmentation.
>
> If the OS has spend a lot of time swapping out pages then this memory
> could lower the overall
> runtime performance too.
>
> Jeroen

Virtual memory is not essentially actual memory usage, swapped out; it
can be just things mapped into the address space. It itself is free, and
Go uses a lot of it in general, without impact on actual RAM usage.

While swap is one way of getting high VIRT usage, I doubt 16.2GB of it
would go unnoticed, or be doable without much, much poorer performance
in every other respect.

unread,
Jun 4, 2011, 6:55:19 AM6/4/11
to golang-nuts
Has somebody been able to identify the reason for the high memory
usage? If I run the benchmark on i386, it terminates with an out-of-
memory error after allocating some 500 MB of memory, while completing
only 10% of the benchmark. 32-bit Java is also using approximately 500
MB of memory, but it is able to finish the benchmark. If the Go
version was able to continue, it would probably consume gigabytes of
memory.

Why is the 6g version (used in the article) able to finish? Is garbage
collection in 6g different from garbage collection in 8g?

On Jun 3, 7:29 pm, Russ Cox <r...@golang.org> wrote:

John Beisley

unread,
Jun 4, 2011, 7:03:27 AM6/4/11
to ⚛, golang-nuts
What do you mean by "allocating some 500 MB of memory"? Where did the
number come from? (i.e there are multiple measures of memory "usage",
including virtual, resident, shared).

Bear in mind that 64-bit systems (6g) can address more than 4GiB of
memory, but 32-bit systems (8g) are limited to less than that in
addressable space.

unread,
Jun 4, 2011, 7:46:56 AM6/4/11
to golang-nuts
By "500 MB" I mean: the amount of modified user-space memory used by
the Linux process. That is:

1. The benchmark starts
2. It requests memory from the OS
3. It modifies the memory pages which were given to it by the OS. From
the viewpoint of the OS, these pages are filled with random values
(and thus the OS is unable use some clever algorithm to merge multiple
pages into 1 page). The total size of these pages is approximately 500
MB.

On Jun 4, 1:03 pm, John Beisley <great...@gmail.com> wrote:
> What do you mean by "allocating some 500 MB of memory"? Where did the
> number come from? (i.e there are multiple measures of memory "usage",
> including virtual, resident, shared).
>
> Bear in mind that 64-bit systems (6g) can address more than 4GiB of
> memory, but 32-bit systems (8g) are limited to less than that in
> addressable space.
>

unread,
Jun 4, 2011, 7:52:11 AM6/4/11
to golang-nuts
On Jun 4, 1:03 pm, John Beisley <great...@gmail.com> wrote:
> Where did the number come from?

The C++/Java/Go/Scale source codes from the article:
http://code.google.com/p/multi-language-bench/source/browse/

Ziad Hatahet

unread,
Jun 4, 2011, 8:59:04 PM6/4/11
to golang-nuts
I found this to be sort of weird:

"E. Java Tunings Jeremy Manson brought the performance of Java on par with the original C++ version. This version is kept in the java_pro directory. Note that Jeremy deliberately refused to optimize the code further, many of the C++ optimizations would apply to the Java version as well."

Then they eventually conclude that C++ wins out by "a large margin". Although I'm not a fan of Java, but I do find this pretty unfair. Given that the sources are available, maybe someone is willing to apply "many of the C++ optimizations" to the Java version and see what the new results are.

--
Ziad

Jeroen Dirks

unread,
Jun 6, 2011, 4:03:38 PM6/6/11
to golang-nuts
I just tried with the code there on windows 7 with 8g and find that it
either leaks memory or the GC is not keeping up with all the
allocations going on. (There are a lot of maps getting created)

The first iterations of the loop of 50 are fast (about a second or
two). Around iteration 10-15 my 2GB box runs out of resources and
starts to thrash. I did not want to wait for it to finish at that
point.

Are there any ways to get statistics from the garbage collector? Or
instruct it to try harder? Could it have trouble collecting cyclical
graphs?

I wonder if there was a regression on the GC code since the author ran
his benchmarks or if the test box he used had a very large swap
volume?

Jeroen

Jeroen Dirks

unread,
Jun 6, 2011, 4:53:34 PM6/6/11
to golang-nuts
The following code causes similar behavior: Maybe a nice test for the
people working on the GC?

package main

type Node struct {
Name int
InEdges []*Node
OutEdges []*Node
}

func breakgc() {
m1 := make(map[int]*Node)
for i := 0; i < 30000; i++ {
m1[i] = new(Node)
}
for i := 0; i < 30000; i++ {
m1[i].InEdges = append(m1[i].InEdges, m1[i + 5 % 30000])
m1[i].OutEdges = append(m1[i].OutEdges, m1[i + 5 % 30000])
}
}

func main() {
for i := 0; i < 5000; i++ {
breakgc()

Tais Plougmann Hansen

unread,
Jun 7, 2011, 3:41:02 AM6/7/11
to Jeroen Dirks, golang-nuts
Did I miss something here? Seems like GC works perfectly.

$ /usr/bin/time ./breakgc 
244.24user 0.25system 4:04.85elapsed 99%CPU (0avgtext+0avgdata 36912maxresident)k
0inputs+0outputs (0major+2673minor)pagefaults 0swaps

$ ps v 1118 1119
  PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
 1118 pts/1    S+     0:00      0    11  3976   324  0.0 /usr/bin/time ./breakgc
 1119 pts/1    R+     4:00      0   143 44684  9196  0.1 ./breakgc
--
Regards,
Tais Plougmann Hansen

OSD Consulting ApS
Tel: +45 78101078
CVR: DK31332737


unread,
Jun 7, 2011, 4:27:37 AM6/7/11
to golang-nuts
On Jun 7, 9:41 am, Tais Plougmann Hansen <tai...@osd.dk> wrote:
> Did I miss something here? Seems like GC works perfectly.
>
> $ /usr/bin/time ./breakgc
> 244.24user 0.25system 4:04.85elapsed 99%CPU (0avgtext+0avgdata
> 36912maxresident)k
> 0inputs+0outputs (0major+2673minor)pagefaults 0swaps

I suppose you are using the 64-bit compiler (6g).

The missing piece is that the current GC is a conservative GC. Its
behavior is probabilistic. It may fail to deallocate unused memory.
The problem on a 32-bit machine (compiler: 8g) is that in certain
usage scenarios there is a high probability for the GC to misinterpret
an integer value for a pointer. The benchmark from the paper, and the
program posted by Jeroen Dirks, both suffer from this phenomenon.

The combination (interconnected structures, i.e. graphs)+(32-bit
address space)+(conservative GC that can misinterpret an integer for a
pointer)+(having integer values in the data create by the program, or
in data created by the Go runtime)+(program uses some 100 MB of
memory) yields (high probability of failing to deallocate unused
memory).

On 64-bit machines, the probability that an integer value is
misinterpreted is much smaller. This is due to the fact that e.g.
100MB/(2^64) is a much smaller number than 100MB/(2^32). Which is
exactly why the benchmark from the article is able to finish on 64-bit
machines, but fails on 32-bit.

Tais Plougmann Hansen

unread,
Jun 7, 2011, 6:27:35 AM6/7/11
to ⚛, golang-nuts
Ok. Thank you for clearing that up. I'm still learning go and it is always useful to have a basic understanding of the inner workings.

Jeroen Dirks

unread,
Jun 7, 2011, 2:09:45 PM6/7/11
to golang-nuts
That is a great answer! And I guess I found one more reason to prefer
64 bit platforms:

Better GC performance for conservative GC because integers and
pointers do not look alike!

Would JVM languages have a similar issues or is the garbage collector
there able to tell the
difference between a pointer and a data value?

Gustavo Niemeyer

unread,
Jun 7, 2011, 2:24:52 PM6/7/11
to Jeroen Dirks, golang-nuts
> Better GC performance for conservative GC because integers and
> pointers do not look alike!

They actually do look exactly alike. It's just less probable to find
a conflict.

> Would JVM languages have a similar issues or is the garbage collector
> there able to tell the difference between a pointer and a data value?

I believe Sun's JVM has exact GC.

--
Gustavo Niemeyer
http://niemeyer.net
http://niemeyer.net/blog
http://niemeyer.net/twitter

Bobby Powers

unread,
Jun 7, 2011, 10:27:32 PM6/7/11
to Gustavo Niemeyer, Jeroen Dirks, golang-nuts
yes, I think most JVMs have exact GCs. I know hotspot, jrockit and
azul's stuff are all exact. LLVM has an interesting article on
creating systems with accurate GCs:
http://llvm.org/docs/GarbageCollection.html

Bobby

Russ Cox

unread,
Jun 7, 2011, 10:36:21 PM6/7/11
to Bobby Powers, Gustavo Niemeyer, Jeroen Dirks, golang-nuts
On Tue, Jun 7, 2011 at 22:27, Bobby Powers <bobby...@gmail.com> wrote:
> yes, I think most JVMs have exact GCs.  I know hotspot, jrockit and
> azul's stuff are all exact.  LLVM has an interesting article on
> creating systems with accurate GCs:
> http://llvm.org/docs/GarbageCollection.html

Only lazy people write conservative GCs.

Russ

Martin Capitanio

unread,
Jun 8, 2011, 1:48:52 AM6/8/11
to golan...@googlegroups.com, Gustavo Niemeyer, Jeroen Dirks
> creating systems with accurate GCs:
http://llvm.org/docs/GarbageCollection.html


My favored paper on this topic is still ;)

Martin
 

Elazar Leibovich

unread,
Jun 27, 2011, 6:39:11 AM6/27/11
to golang-nuts
On Jun 7, 11:27 am, ⚛ <0xe2.0x9a.0...@gmail.com> wrote:
> The missing piece is that the current GC is a conservative GC. Its
> behavior is probabilistic. It may fail to deallocate unused memory.
> The problem on a 32-bit machine (compiler: 8g) is that in certain
> usage scenarios there is a high probability for the GC to misinterpret
> an integer value for a pointer.

I really know nothing about garbage collection, but, didn't Hans Bohem
prove[1] that memory leak due to misidentified pointers in
conservative garbage collection can be bounded?

[1] http://www.hpl.hp.com/techreports/2001/HPL-2001-251.pdf

Alexey Gokhberg

unread,
Jun 27, 2011, 6:57:47 AM6/27/11
to golang-nuts
By the way, with my Express Go system I was able to run without
problems the "breakgc" code from the above post of Jeroen Dirks. I use
64-bit Windows but Express Go generates only 32-bit instructions and
manipulates only 32-bit pointers. I use Boehm-Demers-Weiser
conservative garbage collector, version 6.8

peter

unread,
Jun 28, 2011, 1:56:33 PM6/28/11
to golang-nuts
On 6 jun, 22:53, Jeroen Dirks <jeroen.j.di...@gmail.com> wrote:
> The following code causes similar behavior: Maybe a nice test for the
> people working on the GC?

I tried your program on linux 386. To make sure it wouldn't interfere
with other processes I ran ulimit first, set quite high:
ulimit -v 500000

When I run the test program I immediately got a segmentation fault. In
fact, all Go programs segfaulted at start up.

No such problem on linux amd64. Go programs work fine with only a
tenth of the memory available.

Could it be the problem with garbage collection is part of a general
memory problem?

--
Peter

Bobby Powers

unread,
Jun 28, 2011, 2:21:47 PM6/28/11
to peter, golang-nuts
If all go programs segfault on startup, it sounds like you have a bad install :)

unread,
Jun 28, 2011, 4:48:22 PM6/28/11
to golang-nuts
On Jun 28, 7:56 pm, peter <pklei...@xs4all.nl> wrote:
> On 6 jun, 22:53, Jeroen Dirks <jeroen.j.di...@gmail.com> wrote:
>
> > The following code causes similar behavior: Maybe a nice test for the
> > people working on the GC?
>
> I tried your program on linux 386. To make sure it wouldn't interfere
> with other processes I ran ulimit first, set quite high:
> ulimit -v 500000

All Go programs require more than 750 MB of virtual memory to run on
i386. If the virtual memory is not available, the program prints:

throw: runtime: cannot reserve arena virtual address space

Try running with ulimit -v 800000

manas.d...@gmail.com

unread,
Oct 2, 2013, 10:44:43 AM10/2/13
to golan...@googlegroups.com
HI,
the link on this post has expired and i couldnt find the paper could you please send me a copy of the paper should you have it.
thank you
Manas Dadheech

On Friday, 3 June 2011 18:53:48 UTC+5:30, bsr wrote:
Just saw this post on hacker news https://days2011.scala-lang.org/sites/days2011/files/ws3-1-Hundt.pdf
.. hope u may interested... I wasn't too impressed with the analysis,
as some of the factors stands out for me
1. complexity. It may take years to get to Scala Pro standard, until
then the program would be a port of Java to Scala syntax. Also not
sure how maintainable is it.
2. Binary size. Of course, it is not linear as the size of runtime is
a constant overhead.
3. I am not sure what they mean by "a Go program is only valid if it
comes unmodified out of the automatic formatter gofmt". I think you
could always write with semicolon, spacing  and compile withour gofmt.
thanks,
bsr.

Bienlein

unread,
Oct 2, 2013, 11:40:35 AM10/2/13
to golan...@googlegroups.com, manas.d...@gmail.com

Ian Lance Taylor

unread,
Oct 2, 2013, 11:44:21 AM10/2/13
to manas.d...@gmail.com, golang-nuts
On Wed, Oct 2, 2013 at 7:44 AM, <manas.d...@gmail.com> wrote:
>
> the link on this post has expired and i couldnt find the paper could you
> please send me a copy of the paper should you have it.

The paper is not a good analysis of Go.

If you are interested in Go you would do better to read this blog
entry, which discusses the paper:
http://blog.golang.org/profiling-go-programs

Ian

vel...@gmail.com

unread,
Oct 4, 2013, 2:34:47 PM10/4/13
to golan...@googlegroups.com, manas.d...@gmail.com
Really nice article/paper from Russ Cox and Shenghou Ma.

Did not knew about the:

import _ "net/http/pprof"

Nice! thnx!!!

Op woensdag 2 oktober 2013 17:44:21 UTC+2 schreef Ian Lance Taylor:

Bienlein

unread,
Oct 4, 2013, 4:41:54 PM10/4/13
to golan...@googlegroups.com, manas.d...@gmail.com


Am Mittwoch, 2. Oktober 2013 17:44:21 UTC+2 schrieb Ian Lance Taylor:

The paper is not a good analysis of Go.
Reply all
Reply to author
Forward
0 new messages