benchmark among Go, C++, Java for recursive function and threading

1,130 views
Skip to first unread message

Z Lin

unread,
Aug 21, 2014, 2:01:54 AM8/21/14
to golan...@googlegroups.com
Hi, Golang/C++/Java experts,

I just did a benchmark among Go, C++, Java for recursive function and threading, I admit this benchmark may be totally pointless for the real life production performance, which would be much more complicated, but the point is that these tests can still show some observation about the programming languages' performance themselves.

See conclusions in bold fonts below, and more details in the brief code in github.  I'm not fully sure I was doing something incorrectly in the code (including C++ and Java as well), feel free to leave comments there, any feedback (including multithread tricks in C++ and Java) is very welcome!


* Go
https://github.com/movelikeriver/random/blob/master/vector_reallocation/vector_reallocation.go
// For FIBONACCI_RECUR, RECUR_N=40, num_tasks=9:
// 5416 ms * 9 sequentially, 2.3495 x
// 20746 ms in parallel    <--- loser... Go recursive function is terrible...
//
// For FIBONACCI_FAST, RECUR_N=90, num_tasks=9:
// 10680 ms * 9 sequentially, 4.7191 x
// 20368 ms in parallel    <--- winner!
//
// For PRIME_NUM:
// 48082 ms * 9 sequentially, 2.4753 x
// 174820 ms in parallel   <--- winner!


* C++
https://github.com/movelikeriver/random/blob/master/vector_reallocation/vector_reallocation.cpp
// For FIBONACCI_RECUR, RECUR_N=40, num_tasks=9:
// 2118 ms * 9 sequentially, 1.9584 x
// 9733 ms in parallel
//
// For FIBONACCI_FAST, RECUR_N=90, num_tasks=9:
// 35980 ms * 9 sequentially, 2.4342 x
// 133028 ms in parallel
//
// For PRIME_NUM:
// 50546 ms * 9 sequentially, 2.3556 x
// 193120 ms in parallel


* Java
https://github.com/movelikeriver/random/blob/master/vector_reallocation/VectorReallocation/VectorReallocation/src/base/Report.java
// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9:
// 1283 ms * 9 sequentially, 1.9614 x
// 5887 ms in parallel          <--- winner, Java is so good at recursive function??
//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9:
// 20413 ms * 9 sequentially, ??
// 217103 ms in parallel, ??    <--- unbelievable, parallel mode is so terrible, worse than sequential mode..
//
// For PRIME_NUM:
// 49654 ms * 9 sequentially, 2.0662 x
// 216279 ms in parallel

egon

unread,
Aug 21, 2014, 2:13:43 AM8/21/14
to golan...@googlegroups.com


On Thursday, 21 August 2014 09:01:54 UTC+3, Z Lin wrote:
Hi, Golang/C++/Java experts,

I just did a benchmark among Go, C++, Java for recursive function and threading, I admit this benchmark may be totally pointless for the real life production performance, which would be much more complicated, but the point is that these tests can still show some observation about the programming languages' performance themselves.

Also, your code is way too complicated for a simple thing. You seem to have Java background, so remember that you can have functions in Go.

+ egon

Z Lin

unread,
Aug 21, 2014, 2:33:41 AM8/21/14
to golan...@googlegroups.com


On Wednesday, August 20, 2014 11:13:43 PM UTC-7, egon wrote:


On Thursday, 21 August 2014 09:01:54 UTC+3, Z Lin wrote:
Hi, Golang/C++/Java experts,

I just did a benchmark among Go, C++, Java for recursive function and threading, I admit this benchmark may be totally pointless for the real life production performance, which would be much more complicated, but the point is that these tests can still show some observation about the programming languages' performance themselves.

Also, your code is way too complicated for a simple thing. You seem to have Java background, so remember that you can have functions in Go.

any difference between this one and my code in https://github.com/movelikeriver/random/blob/master/vector_reallocation/vector_reallocation.go ?

func (this TaskManager) recur(n int) int {
if n <= 2 {
return n
}
return this.recur(n-1) + this.recur(n-2)
}


Other functions are used for supporting other slow algorithm (to waste some CPU resource, like some redundent for-loop in fibonacciRecur() ) for testing, and also to make the code more configurable for sequtial and parallel modes.

egon

unread,
Aug 21, 2014, 2:45:02 AM8/21/14
to golan...@googlegroups.com


On Thursday, 21 August 2014 09:33:41 UTC+3, Z Lin wrote:


On Wednesday, August 20, 2014 11:13:43 PM UTC-7, egon wrote:


On Thursday, 21 August 2014 09:01:54 UTC+3, Z Lin wrote:
Hi, Golang/C++/Java experts,

I just did a benchmark among Go, C++, Java for recursive function and threading, I admit this benchmark may be totally pointless for the real life production performance, which would be much more complicated, but the point is that these tests can still show some observation about the programming languages' performance themselves.

Also, your code is way too complicated for a simple thing. You seem to have Java background, so remember that you can have functions in Go.

Run the code and you should see a difference in performance. It shouldn't be too difficult to deduce the reason why.

Z Lin

unread,
Aug 21, 2014, 3:18:56 AM8/21/14
to golan...@googlegroups.com
OMG!!  now I got what did you mean by "Java background"...

thanks.

I moved all the static functions out of class, the recursive function performance got improved by a lot.

https://github.com/movelikeriver/random/commit/b3a56b2faaaf87d0584592538bb91fef139a042b
-// 5416 ms * 9 sequentially, 2.3495 x
-// 20746 ms in parallel
+// 1758 ms * 9 sequentially, 1.9492 x
+// 8117 ms in parallel

It beat than C++ version.

Gokhan Gokturk

unread,
Aug 22, 2014, 9:29:51 AM8/22/14
to golan...@googlegroups.com
Are any optimization parameters enabled? 
Some level of optimizations are enabled by default in java and go tests, but it seems g++ test runs without any.
I only tested PRIME_NUM benchmark, but on my computer g++ -Ofast -march=native runs 5% faster than golang.

Z Lin

unread,
Aug 24, 2014, 1:36:27 AM8/24/14
to golan...@googlegroups.com
ah... you're right, with "-Ofast -march=native", C++ became the fastest one.  :)  Just updated the code.  thanks!

Brian Dellisanti

unread,
Aug 24, 2014, 9:19:23 AM8/24/14
to golan...@googlegroups.com
The way you wait for threads to terminate in the java implantation of runInParallel is not good. It is busy waiting. I suggest using a CountDownLatch to wait for the workers to terminate.

gerald...@gmail.com

unread,
Aug 24, 2014, 4:59:45 PM8/24/14
to golan...@googlegroups.com
@Z Lin: can you update the code and upload it to gist or somewhere else? whats the final conclusion?

Z Lin

unread,
Aug 25, 2014, 1:27:44 AM8/25/14
to golan...@googlegroups.com, gerald...@gmail.com
I updated the Java code to use CountDownLatch, it makes the multithread code in better performance.

I haven't tried gist yet, will do it tomrrow.  No clear conclusion yet, rough conclusion is like:  speed: C++ > Go > Java, but Java does best job in recursive function.  Will do further investigation later.

Gokhan Gokturk

unread,
Aug 25, 2014, 1:13:00 PM8/25/14
to Z Lin, golan...@googlegroups.com, gerald...@gmail.com
@Z Lin: iirc JVM can inline short functions(10-15 instructions). In C++ and Go, functions are not inlined implicitly. You can try inlining C++  functions explicitly. However, I am not sure if it makes any difference.


--
You received this message because you are subscribed to a topic in the Google Groups "golang-nuts" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/golang-nuts/shZUZ9ZeVqs/unsubscribe.
To unsubscribe from this group and all its topics, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

egon

unread,
Aug 25, 2014, 1:18:01 PM8/25/14
to golan...@googlegroups.com, mars....@gmail.com, gerald...@gmail.com

On Monday, 25 August 2014 20:13:00 UTC+3, Gokhan Gokturk wrote:
@Z Lin: iirc JVM can inline short functions(10-15 instructions). In C++ and Go, functions are not inlined implicitly.

Go inlines functions as well, but it's probably not as advanced.

+ egon

br...@aerospike.com

unread,
Aug 25, 2014, 1:36:24 PM8/25/14
to golan...@googlegroups.com, mars....@gmail.com, gerald...@gmail.com
C++ does a huge amount of inlining and a lot more, although it depends on your compiler.

GCC is the most popular, Clang from the LLVM suite is strong, the renowned intel C compiler. All inline where it makes sense (and avoid where cache coherency makes inlining a lose).

As an example of the maturity of the GCC suite, GCC has a great vectorizer, and will look for loops that can be executed in a single instruction, and cost out the time of setting up the more complex op, vs doing a simpler loop. Here's the (now ancient and fully part of GCC for quite a few years) info on vectorization:
https://gcc.gnu.org/projects/tree-ssa/vectorization.html

This is why the 'march' flags really matter in C. Early 64-bit CPUs had very few frills, and if you want to be compatible with the earliest ones, you run some very slow ops. I tend to set -march=nocona , which gives you access to CPUs through about 7 years ago (my 2008 laptop had a Core 2 Duo, which was nocona). 'native' of course is the winner in a speed test but you ship binaries about at your own risk.

Thanks for posting real code, it sharpens the discussion.

-brian

Gokhan Gokturk

unread,
Aug 25, 2014, 2:15:09 PM8/25/14
to br...@aerospike.com, golan...@googlegroups.com, Z Lin, gerald...@gmail.com
@egon: I see, Go indeed inlines some functions, but only really simple ones. 
@brian: Ok, I was wrong. gcc does automatic inlining.  

Only explanation I could think for Java being faster than C++ was function inlining. 

Z Lin

unread,
Aug 25, 2014, 8:05:34 PM8/25/14
to golan...@googlegroups.com, br...@aerospike.com, mars....@gmail.com, gerald...@gmail.com
Here is the conclusion (again, just based on these test cases, it could be pointless in some real production case):
C++ is the overall winner, Java has good performance in recursive function, Go is 20-50% slower than C++

see details below:

Go

// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9
// RUN_IN_PARALLEL=false, total in ms: 24319
//

// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9
// RUN_IN_PARALLEL=true, total in ms: 8068
// 3.0143 x

//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=false, total in ms: 106144

//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=true, total in ms: 52843
// 2.0087 x
//
// For PRIME_NUM, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=false, total in ms: 482560   <--- winner (but almost the same among three)
//
// For PRIME_NUM, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=true, total in ms: 177735
// 2.7151 x


C++  (winner)


// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9
// RUN_IN_PARALLEL=0, total in ms: 13317
//

// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9
// RUN_IN_PARALLEL=1, total in ms: 6394
// 2.0827 x

//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=0, total in ms: 75175        <--- winner

//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=1, total in ms: 33619        <--- winner
// 2.2361 x
//
// For PRIME_NUM, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=0, total in ms: 503682
//
// For PRIME_NUM, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=1, total in ms: 175475       <--- winner (but almost the same among three)
// 2.8704 x


Java  (good performance in recursive function)


// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9
// RUN_IN_PARALLEL=false, total in ms: 12552      <--- winner
//

// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9
// RUN_IN_PARALLEL=true, total in ms: 5583        <--- winner
// 2.2483 x

//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=false, total in ms: 212962     <--- terrible...
//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=true, total in ms: 103223      <--- terrible...
// 2.0631 x
//
// For PRIME_NUM, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=false, total in ms: 520514
//
// For PRIME_NUM, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=true, total in ms: 185111
// 2.8119 x



BTW, I didn't put the code to Gists because it's just duplicated all the files from https://github.com/movelikeriver/random/tree/master/vector_reallocation , it's easier to just leave a comment there directly.

Luan Cestari

unread,
Aug 27, 2014, 8:43:14 AM8/27/14
to golan...@googlegroups.com, br...@aerospike.com, mars....@gmail.com, gerald...@gmail.com
I took a quick look over the Java code, I would suggest you to take account that Java VM is JIT (not AOT like C++ and Go), so it would be fair to run each test 1500 (defaul value of XX:CompileThreshold parameter) times before start the benchmark (so Java would turn the bytecode to native). There is also other parameters that could help to tune the JVM (like XX:MaxInlineSize ,  XX:FreqInlineSize , and many others http://jvm-options.tech.xebia.fr/? ). I think it would be very interesting to comparison tuning all side, so we could find some cool stuff on the way  =)

Gerald Stan

unread,
Aug 27, 2014, 9:03:34 AM8/27/14
to Luan Cestari, golang-nuts, br...@aerospike.com, mars....@gmail.com
whats a fair test...all optimization parameters on?

Luan Cestari

unread,
Aug 27, 2014, 9:30:53 AM8/27/14
to Gerald Stan, golang-nuts, br...@aerospike.com, mars....@gmail.com
I had the same thought, my conclusion would be yes if the comparison is testing how good each VM (or native code) can handle those tasks , using the full potential of VM (or the compiler, like -O3 option in C++ and other things). We could find some useful options digging this way and maybe even create some RFE for C++/GO/Java =) 

Luan Cestari

"All the gold which is under or upon the earth is not enough to give in exchange for virtue."
Plato
"At his best, man is the noblest of all animals; separated from law and justice he is the worst."
"A true friend is one soul in two bodies."
Aristotle
"The only limit to your impact is your imagination and commitment."
Tony Robbins

egon

unread,
Aug 27, 2014, 9:47:12 AM8/27/14
to golan...@googlegroups.com, gerald...@gmail.com, br...@aerospike.com, mars....@gmail.com
BTW. there's already language shootout (http://benchmarksgame.alioth.debian.org/) that has multiple examples and compares different implementations for speed/memory etc.

+ egon

Gerard

unread,
Aug 27, 2014, 10:04:31 AM8/27/14
to golan...@googlegroups.com
Does the benchmarksgame really matter? It is not that Go is 10x slower than C (well, usually anyway). Just profile your application. An application is so much more than a couple of routines.

Luan Cestari

unread,
Aug 27, 2014, 10:16:10 AM8/27/14
to Gerard, golang-nuts
Well, I think it nice as it implements cool algorithms which we could analyze many different aspects such as CPU,memory,LOC, etc, but it is not what we probably will find in many different projects. I think more than seeing which is faster (for example) we can focus on how to make it faster (or lower memory footprint, or etc), I mean, the language you are going to use in a project depends on many different things (the team, your manager, your company, the customer, you and etc).

Another nice benchmark web site is http://www.techempower.com/benchmarks/ (there is a github account that you can contribute, you can create even different versions of the same frameworks or language to be tested on the next round). 

Luan Cestari

"All the gold which is under or upon the earth is not enough to give in exchange for virtue."
Plato
"At his best, man is the noblest of all animals; separated from law and justice he is the worst."
"A true friend is one soul in two bodies."
Aristotle
"The only limit to your impact is your imagination and commitment."
Tony Robbins


Jesper Louis Andersen

unread,
Aug 27, 2014, 10:40:50 AM8/27/14
to Gerard, golang-nuts

On Wed, Aug 27, 2014 at 4:04 PM, Gerard <gvds...@gmail.com> wrote:
Does the benchmarksgame really matter?

For real world projects? Almost never! The benchmarks in there are:

* way too infected with problems for which C is fast.
* with small tight cpu intensive computation kernels.
* and no basis in reality for typical workloads.

Most programming languages that fare well do so by writing C-in-X. Case in point: Haskell and OCaml solutions.

Real systems lose performance mainly based on two factors. One of the choice of data structures and algorithms. The other is the inherent complexity in the task at hand. Rarely is performance lost due to a missing sub-level optimization in a tight loop. There are problems for which this is important and then you should pick BER-MetaOCaml and derive an assembly kernel for your problem out of metaocaml macro-expansions :)

Since the two main factors is not addressed at all in the benchmarksgame, you run the risk of picking a wrong technology based on the wrong assumptions.

In the same vein: don't pick Go for its performance characteristics, but rather make the choice in order to improve your programs expressibility in the areas of interfaces and channel-based-concurrency. Those are the strenghts of Go and you should play those strengths all the time.


--
J.

egon

unread,
Aug 27, 2014, 10:58:13 AM8/27/14
to golan...@googlegroups.com


On Wednesday, 27 August 2014 17:04:31 UTC+3, Gerard wrote:
Does the benchmarksgame really matter? It is not that Go is 10x slower than C (well, usually anyway). Just profile your application. An application is so much more than a couple of routines.


Totally agree. Most of the time it doesn't matter.

As the top of the benchmarksgame says:

Measurement is highly specific -- the time taken for this benchmark task, by this toy program, with this programming language implementation, with these options, on this computer, with these workloads.

My goal of adding this link to this discussion was to stop people wasting time of writing even more micro benchmarks; the benchmarksgame has a good set of micro benchmarks already.

+ egon
Reply all
Reply to author
Forward
0 new messages