Another microbenchmark: why is Go slower than Python?

bbulkow

21 Nov 2009, 17:42:31

to golang-nuts

I haven't seen my previous microbenchmark post show up, but I've
ginned up a couple more in the time since I've posted. Please see my
disclaimer about microbenchmarks there.

This microbenchmark pokes at 'map' functionality, with a little
'string' thrown in:

Here's the Python:

import time

def t5():
    start = float(time.clock())
    for x in xrange(1000000):
        a = {}
        for a1 in xrange(50):
            a[a1] = str(a1)
    print "t5 %f" % (float(time.clock()) - start)

Here's the Go:

func t5() {
    start := time.Nanoseconds();
    for x := 0; x < 1000000; x++ {
        a := make(map[int] string);
        for a1 := 0; a1 < 50; a1++ {
            a[a1] = strconv.Itoa(a1);
        }
    }
    delta := time.Nanoseconds() - start;
    // %03d zero-pads the milliseconds, so 2 s 56 ms prints as 2.056
    fmt.Printf("t5: time consumed %d.%03d\n", delta / 1000000000,
        (delta % 1000000000) / 1000000);
}
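The time.Nanoseconds call in the post belongs to the 2009 pre-release library and is not available in current Go. For anyone rerunning the test on a modern toolchain, here is a minimal sketch of the same loop, assuming time.Now and time.Since as the timer. The fill helper and the outer/inner parameters are names introduced here for illustration; they are not part of the original post.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// fill builds a fresh map with n int->string entries (hypothetical helper).
func fill(n int) map[int]string {
	a := make(map[int]string)
	for i := 0; i < n; i++ {
		a[i] = strconv.Itoa(i)
	}
	return a
}

// t5 repeats fill(inner) outer times and returns the elapsed wall-clock time.
func t5(outer, inner int) time.Duration {
	start := time.Now()
	for x := 0; x < outer; x++ {
		fill(inner)
	}
	return time.Since(start)
}

func main() {
	// 1000000 x 50 matches the sizes used in the post.
	fmt.Printf("t5: time consumed %.3f\n", t5(1000000, 50).Seconds())
}
```

Timings from this version are not comparable with the 2009 figures in the thread, since both the compiler and the runtime have changed.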

(and I have a version in Ruby that's very similar)

Timings on an IBM T61p (Intel T9300 @ 2.5 GHz), 64-bit Windows 7, 8 GB
memory, VMware 6.5.x, Xubuntu 9.04 64-bit:

Python: 17.3 seconds (CPython 2.6.2 as included in distro)
Go: 67.7 seconds (hg release from a few days ago, standard compile opts)
Ruby 1.8: 52.3 seconds (ruby 1.8.7 as included in distro)
Ruby 1.9: 23.4 seconds (ruby 1.9.0 as included in distro)

Is there a good explanation for this kind of result?
Is there anything I'm doing in my code that's not idiomatic Go?

Has anyone else done microbenchmarks of these languages, and found
similar or divergent results?

Cheers,
-brianb

Rob 'Commander' Pike

21 Nov 2009, 17:59:07

to bbulkow, golang-nuts

There is now a nice benchmark capability added by Trevor Strohman. See the docs for gotest for details.
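In current Go that capability lives in the testing package and is normally driven with `go test -bench`. The sketch below assumes a modern toolchain rather than the 2009 gotest tool; it uses testing.Benchmark so the benchmark can be run from a plain main for illustration. fillMap, sink, and BenchmarkFillMap are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink keeps the result reachable so the work is not trivially discarded.
var sink map[int]string

// fillMap builds a fresh 50-entry map, the body of the outer loop in t5.
func fillMap() map[int]string {
	a := make(map[int]string)
	for i := 0; i < 50; i++ {
		a[i] = strconv.Itoa(i)
	}
	return a
}

// BenchmarkFillMap lets the testing package choose b.N and reports
// time and allocations per iteration.
func BenchmarkFillMap(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		sink = fillMap()
	}
}

func main() {
	r := testing.Benchmark(BenchmarkFillMap)
	fmt.Println(r.String(), r.MemString())
}
```

In a real project the Benchmark function would sit in a _test.go file and main would be unnecessary.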

I suspect the great majority of the time in your benchmark is due to Go's current rudimentary garbage collector. Tests like this generate a lot of garbage that is collected slowly. From experiments I've done, a better implementation can make a huge difference. Profiling this test shows at least 50% of the time is in the allocator and collector, as opposed to about 5% printing the string and less than 15% in the map code. A better allocator and collector would make a dramatic change.

The short answer: the Go runtime is new and completely untuned. The libraries need work too.

-rob

MKoistinen

21 Nov 2009, 18:05:46

to golang-nuts

I have submitted a Go implementation of the Mandelbrot test on
Shootout.

http://shootout.alioth.debian.org/u64q/benchmark.php?test=mandelbrot&lang=all&box=1

This version beats out at least one submission in C and another in
Java, not too shabby.

Interestingly, my code even has a call to Sleep() in one of its loops
to work around an alloc runtime issue, and I suspect that once issues
like this are fixed and the general performance tuning of the runtime
is complete, this code will scream.

bbulkow

21 Nov 2009, 19:00:22

to golang-nuts

If the allocator is 50% of the time, removing the allocator time
entirely would still leave Go about 2x slower than Python.
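That estimate can be checked against the figures posted earlier in the thread: 67.7 s for Go, 17.3 s for Python, and an allocator and collector share of 50% (the profile said "at least 50%", so this is the conservative case). A minimal sketch of the arithmetic; withoutShare is a hypothetical helper name.

```go
package main

import "fmt"

// withoutShare returns the run time that remains after removing the
// given fraction of the total (hypothetical helper).
func withoutShare(total, fraction float64) float64 {
	return total * (1 - fraction)
}

func main() {
	goTime, pyTime := 67.7, 17.3 // seconds, from the timings in the thread
	rest := withoutShare(goTime, 0.5)
	fmt.Printf("remaining: %.2f s, %.2fx the Python time\n", rest, rest/pyTime)
}
```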

Regarding "newness", there's nothing on the Go page that says "alpha",
"beta", or "untuned". It claims "fast" "safe" "concurrent" - and I
came in with a certain level of belief. If I was told this is an alpha
implementation that's not ready for prime time, I'd be willing to give
it more slack --- and let me know when it's released.

Out of my 5 microbenchmarks, the only time Go beat Python and Ruby 1.9
was in a simple computation loop with no object references or function
calls. In that test, Go was about 100x faster, implying a Mandelbrot
benchmark would run fast.

That's pretty swank.

Regrettably, little of my real-world code is numerical.

I think it's a little disingenuous to create a firewall between the
speed of the language and the speed of the runtime libraries. They
*are* different, but when we're talking about the language builtins
the distinction is thin. It's like arguing that the language might be
fast, but the implementation is slow because the compiler is bad. The
speed of a language is defined by its best implementation.

Go's designers made a choice to force all object allocation through a
GC system, with the bold statement that modern GCs are nearly
equivalent to explicit-free systems. I would expect, even at this
early stage in a language's development, to see strong runtime
performance from the core builtin collection classes. Otherwise, I
might get the impression this bet is wrong.

-brianb

Rob 'Commander' Pike

21 Nov 2009, 19:09:45

to MKoistinen, golang-nuts

Nice.

Three things could make this program faster.

1) Don't double-synchronize. Just have the slave process send you the
row using a chan []byte. Actually, since you know the row, just have
it signal when it's done and you can write the memory in main. See
next point.
2) Allocate the memory all at once. The slaves can slice out the
piece to use.
3) Calling WriteByte once per byte is too slow. out.Write(row)
avoids a lot of overhead.

No idea how the I/O compares to the computation in this program but
these might help. Also I bet the inner loop will be better in gccgo.
I see about 10% in my simpleminded version.

If you haven't already, please file a bug report about the malloc
problem you saw.

-rob

baldmountain

21 Nov 2009, 19:47:07

to golang-nuts

On Nov 21, 5:59 pm, "Rob 'Commander' Pike" <r...@google.com> wrote:
> I suspect the great majority of the time in your benchmark is due to Go's current rudimentary garbage collector. Tests like this generate a lot of garbage that is collected slowly. From experiments I've done, a better implementation can make a huge difference. Profiling this test shows at least 50% of the time is in the allocator and collector, as opposed to about 5% printing the string and less than 15% in the map code. A better allocator and collector would make a dramatic change.
>

Which is why toy benchmarks are silly. Writing a proper benchmark is
REALLY hard. And to be honest, they often don't mean much in the real
world.

I worked at a company called SavaJe. We made a full Java 2 SE
compliant OS platform for the Compaq iPAQ, (and eventually a phone.)
Internally the OS used the basic Sun Java interpreter. We spent a lot
of time pushing stuff into native code and optimizing the libraries.
Benchmarks of the OS looked horrible. Based on the benchmarks the OS
appeared REALLY slow. But from a user standpoint Swing ran as fast as
on a desktop. Even the SwingSet demo looked good. The benchmarks were
not a good measure of how well the platform worked.

Let's not waste any more time on these ad-hoc benchmarks and take the
language itself for a drive and comment on the language design based
on actually trying to accomplish something with it.

In a year or two we can revisit benchmarks once the compilers have
matured and the library has been tuned.

geoff

warmfuzzykitten

21 Nov 2009, 21:19:28

to golang-nuts

I'm sure you noticed that the languages that beat your time by nearly
4:1 use approx. 100% of all 4 CPUs, not just 1.

Bob

On Nov 21, 3:05 pm, MKoistinen <mkoisti...@gmail.com> wrote:
> I have submitted a Go implementation of the Mandlebrot test on
> Shootout.
>

> http://shootout.alioth.debian.org/u64q/benchmark.php?test=mandelbrot&...

konrad

21 Nov 2009, 21:22:02

to golang-nuts

At the moment the message is: write your code so that it avoids
allocating new memory where possible, as allocating and dropping
memory is an expensive proposition. Having gone through a similar
experience on one of the Project Euler problems, and having encountered
the same issue, I have learned my lesson on this one. We are still
dealing with a low-level language, so treating it as a high-level
language is a mistake.
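Applied to the map benchmark from the start of the thread, that advice might look like the sketch below: keep one long-lived map and overwrite its entries instead of building a new map on every pass. This changes what the benchmark measures, so it illustrates the technique rather than fixing the test. fresh, reused, and shared are hypothetical names, and testing.AllocsPerRun is used only to make the difference visible on a current toolchain.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// fresh allocates a new map on every call, as the t5 loop does.
func fresh() {
	a := make(map[int]string)
	for i := 0; i < 50; i++ {
		a[i] = strconv.Itoa(i)
	}
}

// shared is allocated once and reused across calls.
var shared = make(map[int]string, 50)

// reused overwrites the entries of the long-lived map instead.
func reused() {
	for i := 0; i < 50; i++ {
		shared[i] = strconv.Itoa(i)
	}
}

func main() {
	reused() // populate once so later runs only overwrite existing keys
	fmt.Println("fresh allocs/run: ", testing.AllocsPerRun(100, fresh))
	fmt.Println("reused allocs/run:", testing.AllocsPerRun(100, reused))
}
```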

That said, eventually Go will end up with a better garbage collector
that is more efficient, and then it may be possible to write Go in a
higher-level way. I believe this is a curve that a lot of Common Lisp
implementations went through. In early versions the advice was: don't
produce garbage, because it is slow. Eventually the situation flipped,
so that it was often cheaper to allocate new memory than to reuse an
existing structure (I am relying on second-hand accounts for this
statement; I apologise in advance that I can't specifically say which
implementations this applied to).

As was pointed out to me the existing documentation does acknowledge
that the garbage collector is slow and up for replacement.

warmfuzzykitten

22 Nov 2009, 03:51:35

to golang-nuts

Which Go compiler are you using? The gcc-based compiler is said to be
freakishly faster for some code.

Bob

bbulkow

22 Nov 2009, 05:24:57

to golang-nuts

It sounds like these are known problems and I'm not "doing something
wrong".

The Go designers put a firm stake in the ground: GC is OK for
performance-critical code. Having made that bold choice, I expect to
see a bit more proof even in early days. Out of the box I expected an
allocator about as good as Java 1.6, which is open source and has been
available for several years.

What I've seen so far is a bit in the opposite direction, and the
language is designed with no alternative possible.

That's why the speed of language-intrinsic collections matters. They're
hard to code around and replace, especially in a language that doesn't
support operator overloading (not being able to '+' on 'big' causes my
stomach to sink - I would vote to promote 'big' into the core
classes).

(Regarding my choice of compiler, I cloned the mercurial repository
and typed 'make' as specified, so I'm using whatever that does. Ubuntu
9.04 comes with GCC 4.3.3 these days. The instructions for using GCC
as the compiler seemed a little awkward; I wanted to travel the well-
trod path - as I have for these python tests, instead of recompiling
with more optimizations, or running Stackless)

-brian

MKoistinen

22 Nov 2009, 05:59:22

to golang-nuts

Yeah, I noticed. I kept hitting brick walls when attempting to submit
code to Shootout (like the alloc-related issue I reported above). My
code was rewritten a few times just to work around issues (Issue #250
was another one). I felt like I had achieved something special just
getting it to run on the Shootout box(es), as many times, I'd have
fast-running, perfectly running code here on my Mac, only to have it
recorded as a failure on the Shootout box.

On my Mac, that submission pegs my CPU at 100% (both cores) and spews
out a perfect PBM file about 7X faster than the original submission,
whereas on the Shootout box, it's only a little more than 2X as fast.
Go figure.

Regardless, I will definitely implement the suggestions that Mr. Pike
made above.

Jessta

22 Nov 2009, 06:19:56

to warmfuzzykitten, golang-nuts

On 22/11/2009, warmfuzzykitten <bobf...@gmail.com> wrote:
> Which Go compiler are you using? The gcc-based compiler is said to be
> freakishly faster for some code.
>
> Bob
>

yeah, the gcc-based compiler doesn't currently do any garbage
collection... which would explain the freakish speed.

--
=====================
http://jessta.id.au

Charlie

22 Nov 2009, 05:33:56

to golang-nuts

Hmmm...
$ ./6.out
t2: time consumed 37.759
$ export GOGC=off # no GC
$ ./6.out
t2: time consumed 2.156
(on an AMD BE2300 + Fedora 11)

I also get different results for the Python benchmark shown (slower
than Go). I wonder if Go running under VMware tickles a VMware oddity
(page table access costs?).
/cck

MKoistinen

23 Nov 2009, 07:56:30

to golang-nuts

I've made all 3 optimisations that you mentioned, Rob, and this gave
me about 3%. Nice.
I've also made each goroutine render a chunk of lines, to reduce the
overhead of spawning new goroutines, setting up the loops, etc., and
this gave me back another 1-2% on my system. I just hope all these
changes will increase the throughput on the Shootout box. Fingers
crossed!

Antoine Chavasse

23 Nov 2009, 08:05:49

to Jessta, warmfuzzykitten, golang-nuts

On Sun, Nov 22, 2009 at 12:19 PM, Jessta <jes...@gmail.com> wrote:

> yeah, the gcc-based compiler doesn't currently do any garbage
> collection... which would explain the freakish speed.
>

Actually, gccgo does emit some early, unoptimized reference counting
code which adds some significant overhead (see
http://groups.google.com/group/golang-nuts/browse_thread/thread/a4ac0c713314098f/258f11dbbbb13eb1)

So gccgo isn't exactly cheating by not having a functional garbage collector.

MKoistinen

23 Nov 2009, 14:01:06

to golang-nuts

Wow, the overall impact was substantially greater on the Shootout
hardware than on my Mac.

http://shootout.alioth.debian.org/u64q/benchmark.php?test=mandelbrot&lang=all&box=1

You can see these 4 improvements have nearly doubled the performance
of the Mandelbrot benchmark on the quad-core system. Sweet.

Also, the memory usage has been slashed to nearly 1/4th!

Isaac Gouy

24 Nov 2009, 01:15:01

to golang-nuts

http://shootout.alioth.debian.org/u32q/benchmark.php?test=mandelbrot&lang=all&box=1

On Nov 23, 11:01 am, MKoistinen <mkoisti...@gmail.com> wrote:
> Wow, the overall impact was substantially greater on the Shootout
> hardware than on my Mac.
>

> http://shootout.alioth.debian.org/u64q/benchmark.php?test=mandelbrot&...

dlin

25 Nov 2009, 09:49:29

to golang-nuts

Look at this web site:
http://shootout.alioth.debian.org/

In the general benchmarks, Go is still very, very slow compared to Java.
But the comparison lacks gccgo.

On Nov 24, 2:15 pm, Isaac Gouy <igo...@yahoo.com> wrote:
> http://shootout.alioth.debian.org/u32q/benchmark.php?test=mandelbrot&...
>
> On Nov 23, 11:01 am, MKoistinen <mkoisti...@gmail.com> wrote:
> > Wow, the overall impact was substantially greater on the Shootout
> > hardware than on my Mac.
> >
> > http://shootout.alioth.debian.org/u64q/benchmark.php?test=mandelbrot&...
> >
> > You can see these 4 improvements have nearly doubled the performance
> > of the Mandelbrot benchmark on the quad-core system. Sweet.

Oliver Mason

25 Nov 2009, 12:13:33

to golang-nuts

I remember when Java came out, everybody was complaining how slow it
was compared to C++ or C. Nowadays, Java is used as a fast comparison
to many other languages. I'm sure Go will be the same, once the
language and its tools mature and more optimisations are built into
the compiled code.

For that reason I don't worry too much about performance yet, as long
as it is 'good enough'.

Oliver

On Nov 25, 2:49 pm, dlin <dlin...@gmail.com> wrote:
> look at this web site. http://shootout.alioth.debian.org/

Jon Harrop

25 Nov 2009, 14:38:21

to golan...@googlegroups.com

On Wednesday 25 November 2009 17:13:33 Oliver Mason wrote:
> I remember when Java came out, everybody was complaining how slow it
> was compared to C++ or C. Nowadays, Java is used as a fast comparison
> to many other languages.

I recently benchmarked Java. If you add 10M double->double mappings into a
hash table, Java is 32x slower than F# and C#, 13x slower than C++ and even
2.2x slower than OCaml.

> I'm sure Go will be the same, once the language and its tools mature and
> more optimisations are built into the compiled code.

Interestingly, Java's type erasure and boxing are largely to blame yet many
people are advocating that Go adopt this inefficient approach to generics
that goes directly against Go's goal of being "fast".

--
Dr Jon Harrop, Flying Frog Consultancy Ltd.
http://www.ffconsultancy.com/?e

The message has been deleted

Jon Harrop

25 Nov 2009, 19:18:55

to golan...@googlegroups.com

On Wednesday 25 November 2009 19:29:16 inspector_jouve wrote:
> >I recently benchmarked Java. If you add 10M double->double mappings into a
> >hash table, Java is 32x slower than F# and C#, 13x slower than C++ and
> > even 2.2x slower than OCaml.
>

> Java implementation of map is very inefficient in terms of memory use.

Yes, in this case because it boxes every double.

> It could be that hashtable with 10M entries consumed the whole memory,

No, it only consumes 13% of my memory.

> and then, for the most part, you were running GC.

Most of the time is spent in the GC but only because the JVM has boxed every
floating-point number, creating massive numbers of allocations and pointers
for the GC to follow and forcing it to perform dozens of collections (when
the CLR only performs a single collection on the same benchmark). The elapsed
CPU time is actually over 100x worse than F# because the JVM's GC burns all
eight of my cores.

> It might not be a fair test of performance.

It is a fair test of performance: polymorphic data structures like hash tables
are very slow in Java because it boxes rather than generating type
specialized data structures. Indeed, the JVM cannot even represent many type
specialized data structures because it lacks value types.
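The boxing cost described here can be illustrated in Go as well, though only by analogy: the sketch below is not the Java benchmark from the post and makes no claim about the JVM. It stores the same float64 values in an interface-typed slice (each value converted to an interface and placed behind a pointer) and in a type-specialized slice, then counts allocations with testing.AllocsPerRun. boxed, unboxed, n, and the sink variables are hypothetical names.

```go
package main

import (
	"fmt"
	"testing"
)

const n = 1000

// Package-level sinks force the slices to escape to the heap.
var (
	sinkBoxed   []interface{}
	sinkUnboxed []float64
)

// boxed stores each float64 as an interface value, so every element
// refers to separately allocated storage.
func boxed() {
	s := make([]interface{}, n)
	for i := range s {
		s[i] = float64(i) + 0.5
	}
	sinkBoxed = s
}

// unboxed stores the values inline in a type-specialized slice.
func unboxed() {
	s := make([]float64, n)
	for i := range s {
		s[i] = float64(i) + 0.5
	}
	sinkUnboxed = s
}

func main() {
	fmt.Println("boxed allocs/run:  ", testing.AllocsPerRun(10, boxed))
	fmt.Println("unboxed allocs/run:", testing.AllocsPerRun(10, unboxed))
}
```

Every extra allocation is also an extra pointer for a collector to follow, which is the effect the post attributes to the JVM's handling of doubles in hash tables.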

> Tried to test it with 1M mappings?

F# is still 13x faster than Java.

Esko Luontola

25 Nov 2009, 18:39:57

to golang-nuts

Now that you have identified the bottleneck, it's time to optimize the
application. Try a map implementation which is optimized for
primitives: http://fastutil.dsi.unimi.it/

After that, if GC is still the bottleneck, then try also the other GC
algorithms and fine tune their parameters.

Jon Harrop

25 Nov 2009, 20:52:14

to golan...@googlegroups.com

On Wednesday 25 November 2009 23:39:57 Esko Luontola wrote:
> Now that you have identified the bottleneck, it's time to optimize the
> application. Try a map implementation which is optimized for
> primitives: http://fastutil.dsi.unimi.it/
>
> After that, if GC is still the bottleneck, then try also the other GC
> algorithms and fine tune their parameters.

Better to use a VM that can express the efficient solution (= has value types)
and automates the generation of efficient solutions using generics.

Linker

26 Nov 2009, 12:23:35

to Jon Harrop, golan...@googlegroups.com

Can Go stop the GC module temporarily?

--
Regards,
Linker Lin
linker...@gmail.com

Ian Lance Taylor

26 Nov 2009, 14:47:20

to Linker, Jon Harrop, golan...@googlegroups.com

Linker <linker...@gmail.com> writes:

> Can Go stop the GC module temporarily?

We plan to rewrite the garbage collector, so discussion of garbage
collector details is premature.

That said, today the answer is yes, it's easy to temporarily disable
the garbage collector, although I don't think there is any interface
to do so from a Go program.

Ian
