Here is the conclusion (again, it is based only on these test cases, so it may not carry over to a real production workload):
C++ is the overall winner, Java performs well on the recursive function, and Go is roughly 25-85% slower than C++ on the two Fibonacci tests while being about even on the prime number test.
See the details below:
Go
// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9
// RUN_IN_PARALLEL=false, total in ms: 24319
//
// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9
// RUN_IN_PARALLEL=true, total in ms: 8068
// 3.0143 x
//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=false, total in ms: 106144
//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=true, total in ms: 52843
// 2.0087 x
//
// For PRIME_NUM, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=false, total in ms: 482560 <--- winner (but almost the same among three)
//
// For PRIME_NUM, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=true, total in ms: 177735
// 2.7151 x
C++ (winner)
// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9
// RUN_IN_PARALLEL=0, total in ms: 13317
//
// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9
// RUN_IN_PARALLEL=1, total in ms: 6394
// 2.0827 x
//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=0, total in ms: 75175 <--- winner
//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=1, total in ms: 33619 <--- winner
// 2.2361 x
//
// For PRIME_NUM, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=0, total in ms: 503682
//
// For PRIME_NUM, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=1, total in ms: 175475 <--- winner (but almost the same among three)
// 2.8704 x
Java (good performance on the recursive function)
// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9
// RUN_IN_PARALLEL=false, total in ms: 12552 <--- winner
//
// For FIBONACCI_RECUR, RECUR_N=40 and NUM_TASKS=9
// RUN_IN_PARALLEL=true, total in ms: 5583 <--- winner
// 2.2483 x
//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=false, total in ms: 212962 <--- terrible...
//
// For FIBONACCI_FAST, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=true, total in ms: 103223 <--- terrible...
// 2.0631 x
//
// For PRIME_NUM, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=false, total in ms: 520514
//
// For PRIME_NUM, RECUR_N=90 and NUM_TASKS=9
// RUN_IN_PARALLEL=true, total in ms: 185111
// 2.8119 x
BTW, I didn't put the code in a Gist because it would just duplicate all the files from
https://github.com/movelikeriver/random/tree/master/vector_reallocation (it's easier to just leave a comment there directly).