benchmark among Go, C++, Java for recursive function and threading

Z Lin

unread,

Aug 21, 2014, 2:01:54 AM8/21/14

to golan...@googlegroups.com

Hi, Golang/C++/Java experts,

I just did a benchmark among Go, C++, Java for recursive function and threading, I admit this benchmark may be totally pointless for the real life production performance, which would be much more complicated, but the point is that these tests can still show some observation about the programming languages' performance themselves.

See conclusions in bold fonts below, and more details in the brief code in github. I'm not fully sure I was doing something incorrectly in the code (including C++ and Java as well), feel free to leave comments there, any feedback (including multithread tricks in C++ and Java) is very welcome!

* Go
https://github.com/movelikeriver/random/blob/master/vector_reallocation/vector_reallocation.go
// For FIBONACCI_RECUR, RECUR_N=40, num_tasks=9:
// 5416 ms * 9 sequentially, 2.3495 x
// 20746 ms in parallel    <--- loser... Go recursive function is terrible...
//
// For FIBONACCI_FAST, RECUR_N=90, num_tasks=9:
// 10680 ms * 9 sequentially, 4.7191 x
// 20368 ms in parallel    <--- winner!
//
// For PRIME_NUM:
// 48082 ms * 9 sequentially, 2.4753 x
// 174820 ms in parallel   <--- winner!

* C++

https://github.com/movelikeriver/random/blob/master/vector_reallocation/vector_reallocation.cpp
// For FIBONACCI_RECUR, RECUR_N=40, num_tasks=9:
// 2118 ms * 9 sequentially, 1.9584 x
// 9733 ms in parallel
//
// For FIBONACCI_FAST, RECUR_N=90, num_tasks=9:
// 35980 ms * 9 sequentially, 2.4342 x
// 133028 ms in parallel
//
// For PRIME_NUM:
// 50546 ms * 9 sequentially, 2.3556 x
// 193120 ms in parallel

* Java
https://github.com/movelikeriver/random/blob/master/vector_reallocation/VectorReallocation/VectorReallocation/src/base/Report.java
// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9:
// 1283 ms * 9 sequentially, 1.9614 x
// 5887 ms in parallel <--- winner, Java is so good at recursive function??
//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9:
// 20413 ms * 9 sequentially, ??
// 217103 ms in parallel, ?? <--- unbelievable, parallel mode is so terrible, worse than sequential mode..
//
// For PRIME_NUM:
// 49654 ms * 9 sequentially, 2.0662 x
// 216279 ms in parallel

egon

unread,

Aug 21, 2014, 2:13:43 AM8/21/14

to golan...@googlegroups.com

On Thursday, 21 August 2014 09:01:54 UTC+3, Z Lin wrote:

Hi, Golang/C++/Java experts,

I just did a benchmark among Go, C++, Java for recursive function and threading, I admit this benchmark may be totally pointless for the real life production performance, which would be much more complicated, but the point is that these tests can still show some observation about the programming languages' performance themselves.

Compare these http://play.golang.org/p/8YQQNMP4Z5

Also, your code is way too complicated for a simple thing. You seem to have Java background, so remember that you can have functions in Go.

+ egon

Z Lin

unread,

Aug 21, 2014, 2:33:41 AM8/21/14

to golan...@googlegroups.com

On Wednesday, August 20, 2014 11:13:43 PM UTC-7, egon wrote:

On Thursday, 21 August 2014 09:01:54 UTC+3, Z Lin wrote:
Hi, Golang/C++/Java experts,

I just did a benchmark among Go, C++, Java for recursive function and threading, I admit this benchmark may be totally pointless for the real life production performance, which would be much more complicated, but the point is that these tests can still show some observation about the programming languages' performance themselves.

Compare these http://play.golang.org/p/8YQQNMP4Z5
Also, your code is way too complicated for a simple thing. You seem to have Java background, so remember that you can have functions in Go.

any difference between this one and my code in https://github.com/movelikeriver/random/blob/master/vector_reallocation/vector_reallocation.go ?

func (this TaskManager) recur(n int) int {

if n <= 2 {

return n

}

return this.recur(n-1) + this.recur(n-2)

}

Other functions are used for supporting other slow algorithm (to waste some CPU resource, like some redundent for-loop in fibonacciRecur() ) for testing, and also to make the code more configurable for sequtial and parallel modes.

egon

unread,

Aug 21, 2014, 2:45:02 AM8/21/14

to golan...@googlegroups.com

On Thursday, 21 August 2014 09:33:41 UTC+3, Z Lin wrote:

On Wednesday, August 20, 2014 11:13:43 PM UTC-7, egon wrote:

On Thursday, 21 August 2014 09:01:54 UTC+3, Z Lin wrote:
Hi, Golang/C++/Java experts,

I just did a benchmark among Go, C++, Java for recursive function and threading, I admit this benchmark may be totally pointless for the real life production performance, which would be much more complicated, but the point is that these tests can still show some observation about the programming languages' performance themselves.

Compare these http://play.golang.org/p/8YQQNMP4Z5
Also, your code is way too complicated for a simple thing. You seem to have Java background, so remember that you can have functions in Go.

any difference between this one and my code in https://github.com/movelikeriver/random/blob/master/vector_reallocation/vector_reallocation.go ?

Run the code and you should see a difference in performance. It shouldn't be too difficult to deduce the reason why.

Z Lin

unread,

Aug 21, 2014, 3:18:56 AM8/21/14

to golan...@googlegroups.com

OMG!! now I got what did you mean by "Java background"...

thanks.

I moved all the static functions out of class, the recursive function performance got improved by a lot.

https://github.com/movelikeriver/random/commit/b3a56b2faaaf87d0584592538bb91fef139a042b

-// 5416 ms * 9 sequentially, 2.3495 x

-// 20746 ms in parallel

+// 1758 ms * 9 sequentially, 1.9492 x

+// 8117 ms in parallel

It beat than C++ version.

Gokhan Gokturk

unread,

Aug 22, 2014, 9:29:51 AM8/22/14

to golan...@googlegroups.com

Are any optimization parameters enabled?

Some level of optimizations are enabled by default in java and go tests, but it seems g++ test runs without any.
I only tested PRIME_NUM benchmark, but on my computer g++ -Ofast -march=native runs 5% faster than golang.

Z Lin

unread,

Aug 24, 2014, 1:36:27 AM8/24/14

to golan...@googlegroups.com

ah... you're right, with "-Ofast -march=native", C++ became the fastest one. :) Just updated the code. thanks!

Brian Dellisanti

unread,

Aug 24, 2014, 9:19:23 AM8/24/14

to golan...@googlegroups.com

The way you wait for threads to terminate in the java implantation of runInParallel is not good. It is busy waiting. I suggest using a CountDownLatch to wait for the workers to terminate.

gerald...@gmail.com

unread,

Aug 24, 2014, 4:59:45 PM8/24/14

to golan...@googlegroups.com

@Z Lin: can you update the code and upload it to gist or somewhere else? whats the final conclusion?

Z Lin

unread,

Aug 25, 2014, 1:27:44 AM8/25/14

to golan...@googlegroups.com, gerald...@gmail.com

I updated the Java code to use CountDownLatch, it makes the multithread code in better performance.

I haven't tried gist yet, will do it tomrrow. No clear conclusion yet, rough conclusion is like: speed: C++ > Go > Java, but Java does best job in recursive function. Will do further investigation later.

Gokhan Gokturk

unread,

Aug 25, 2014, 1:13:00 PM8/25/14

to Z Lin, golan...@googlegroups.com, gerald...@gmail.com

@Z Lin: iirc JVM can inline short functions(10-15 instructions). In C++ and Go, functions are not inlined implicitly. You can try inlining C++ functions explicitly. However, I am not sure if it makes any difference.

--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/shZUZ9ZeVqs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

egon

unread,

Aug 25, 2014, 1:18:01 PM8/25/14

to golan...@googlegroups.com, mars....@gmail.com, gerald...@gmail.com

On Monday, 25 August 2014 20:13:00 UTC+3, Gokhan Gokturk wrote:

@Z Lin: iirc JVM can inline short functions(10-15 instructions). In C++ and Go, functions are not inlined implicitly.

Go inlines functions as well, but it's probably not as advanced.

+ egon

br...@aerospike.com

unread,

Aug 25, 2014, 1:36:24 PM8/25/14

to golan...@googlegroups.com, mars....@gmail.com, gerald...@gmail.com

C++ does a huge amount of inlining and a lot more, although it depends on your compiler.

GCC is the most popular, Clang from the LLVM suite is strong, the renowned intel C compiler. All inline where it makes sense (and avoid where cache coherency makes inlining a lose).

As an example of the maturity of the GCC suite, GCC has a great vectorizer, and will look for loops that can be executed in a single instruction, and cost out the time of setting up the more complex op, vs doing a simpler loop. Here's the (now ancient and fully part of GCC for quite a few years) info on vectorization:
https://gcc.gnu.org/projects/tree-ssa/vectorization.html

This is why the 'march' flags really matter in C. Early 64-bit CPUs had very few frills, and if you want to be compatible with the earliest ones, you run some very slow ops. I tend to set -march=nocona , which gives you access to CPUs through about 7 years ago (my 2008 laptop had a Core 2 Duo, which was nocona). 'native' of course is the winner in a speed test but you ship binaries about at your own risk.

Thanks for posting real code, it sharpens the discussion.

-brian

Gokhan Gokturk

unread,

Aug 25, 2014, 2:15:09 PM8/25/14

to br...@aerospike.com, golan...@googlegroups.com, Z Lin, gerald...@gmail.com

@egon: I see, Go indeed inlines some functions, but only really simple ones.

@brian: Ok, I was wrong. gcc does automatic inlining.

Only explanation I could think for Java being faster than C++ was function inlining.

Z Lin

unread,

Aug 25, 2014, 8:05:34 PM8/25/14

to golan...@googlegroups.com, br...@aerospike.com, mars....@gmail.com, gerald...@gmail.com

Here is the conclusion (again, just based on these test cases, it could be pointless in some real production case):
C++ is the overall winner, Java has good performance in recursive function, Go is 20-50% slower than C++

see details below:

Go