Performance comparison of Go, C++, and Java for biological sequencing tool


Isaac Gouy

Feb 28, 2019, 12:05:55 PM
to golang-nuts
"We reimplemented elPrep in all three languages and benchmarked their runtime performance and memory use. Results: The Go implementation performs best, yielding the best balance between runtime performance and memory use. While the Java benchmarks report a somewhat faster runtime than the Go benchmarks, the memory use of the Java runs is significantly higher."




Louki Sumirniy

Feb 28, 2019, 2:59:02 PM
to golang-nuts
It shouldn't really be surprising. Go and Java share the use of interfaces, but Go's concurrency is far lighter weight, and on top, Java has the extra burden of a virtual machine before it actually hits the CPU as binary code. I suspect also that the Go version could handle a much greater level of concurrency and then the advantage of compilation would be more visible.

oju...@gmail.com

Mar 6, 2019, 8:07:03 AM
to golang-nuts
That doesn't surprise me at all.

A couple of years ago I worked for a company where I created prototypes in Go and production code in C++, using the same architecture and algorithms. The Go version usually ran 15% faster. After some work both versions could be tuned to run faster, but it amazed me to find that just plain Go code was faster than the corresponding C++ code.

Haddock

Mar 6, 2019, 8:17:00 AM
to golang-nuts
Benchmarks are always limited, I know. But this might indicate some direction:

Robert Engels

Mar 6, 2019, 8:44:21 AM
to Haddock, golang-nuts
As I pointed out long ago on stackoverflow the benchmark games are seriously flawed and should not be used for language performance comparisons. 

As a simple example, look at binary trees. In all of the “fast” implementations, they resort to specialized memory pools that wouldn’t be useable in a highly concurrent system. The Go and Java versions use off the shelf memory management so the code complexity comparisons are not even close. I’m sure you could replicate the performance using off heap structures in Go/Java but who would want to?
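For anyone who hasn't looked at that benchmark, here is a minimal sketch of the "off the shelf memory management" style in Go (my own illustration with made-up names, not the actual benchmark entry): every node comes straight from the garbage-collected heap, with no arena or pool.

```go
package main

import "fmt"

type node struct {
	left, right *node
}

// bottomUpTree builds a complete binary tree of the given depth,
// relying entirely on Go's garbage-collected allocator -- no arena, no pool.
func bottomUpTree(depth int) *node {
	if depth <= 0 {
		return &node{}
	}
	return &node{left: bottomUpTree(depth - 1), right: bottomUpTree(depth - 1)}
}

// check walks the tree and counts its nodes.
func (n *node) check() int {
	if n.left == nil {
		return 1
	}
	return 1 + n.left.check() + n.right.check()
}

func main() {
	t := bottomUpTree(10)
	fmt.Println(t.check()) // a depth-10 tree built this way has 2^11 - 1 = 2047 nodes
}
```

The pooled C/C++ entries do the same tree construction but carve nodes out of a preallocated arena, which is exactly the extra complexity being objected to here.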
--
You received this message because you are subscribed to the Google Groups "golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-nuts...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

bradlee...@gmail.com

Mar 6, 2019, 3:41:21 PM
to golang-nuts
As the saying goes, "there are lies, damned lies and benchmarks."

Java is particularly hard to benchmark well because of the nature of HotSpot and JIT compilation. I was listening to James Gosling recently, and he pointed out that there are over 100 flavors of JIT compilation on the JVM for x86 alone; essentially, the JVM can tweak the compilation parameters for the specific processor it is running on. HotSpot also needs warm-up time to observe sections of code that run repeatedly, which it then inlines and otherwise optimizes, JIT-compiling the optimized code as well. That doesn't even get into the JVM version or the type of code being run. Where the JVM really hits its stride is long-running code with a lot of libraries, where it can profile the code and optimize it over time.

I'm a big fan of Go and prefer coding in it, but that the JVM ran their code a bit faster doesn't surprise me, nor does it surprise me that the footprint was larger. That's a big concern in the Dockerized world: if every container has to have its own JVM before you even start, the memory, thread, and other resource overhead can get daunting.

Like any geek, I like benchmarks but just have to take them with a grain of salt.

Isaac Gouy

Mar 6, 2019, 3:54:35 PM
to golang-nuts
On Wednesday, March 6, 2019 at 12:41:21 PM UTC-8, Bradlee Johnson wrote:
As the saying goes, "there are lies, damned lies and benchmarks."

After all, facts are facts, and although we may quote one to another with a chuckle the words of the Wise Statesman, 'Lies--damned lies--and statistics,' still there are some easy figures the simplest must understand, and the astutest cannot wriggle out of.

Leonard Henry Courtney, 1895

Dan Kortschak

Mar 6, 2019, 4:22:56 PM
to Isaac Gouy, golang-nuts
It should be pointed out that these three implementations have close to zero testing. In the absence of that, there is little that should be drawn from the integration benchmarks this suggests.

If we relax correctness requirements we can get answers in O(1) with small constants.

On Thu, 2019-02-28 at 09:05 -0800, 'Isaac Gouy' via golang-nuts wrote:
> "We reimplemented elPrep in all three languages and benchmarked their runtime performance and memory use. Results: *The Go implementation performs best*, yielding the best balance between runtime performance and memory use. While the Java benchmarks report a somewhat faster runtime than the Go benchmarks, the memory use of the Java runs is significantly higher."
>
> proggit discussion <https://www.reddit.com/r/programming/comments/avsfc6/performance_comparison_of_go_c_and_java_for/>
>
> article <https://doi.org/10.1101/558056>

Isaac Gouy

Mar 6, 2019, 6:29:42 PM
to golang-nuts
On Wednesday, March 6, 2019 at 5:44:21 AM UTC-8, Robert Engels wrote:
As I pointed out long ago on stackoverflow the benchmark games are seriously flawed and should not be used for language performance comparisons. 

As a simple example, look at binary trees. In all of the “fast” implementations, they resort to specialized memory pools that wouldn’t be useable in a highly concurrent system. The Go and Java versions use off the shelf memory management so the code complexity comparisons are not even close. I’m sure you could replicate the performance using off heap structures in Go/Java but who would want to?


Definition of flawed: having a defect or imperfection ("a flawed diamond")

Please share a perfect alternative comparison ;-)


> all of the “fast” implementations, they resort to specialized memory pools

I doubt `Apache Portable Runtime Pools` were designed to make those tiny programs fast :-)

robert engels

Mar 6, 2019, 6:37:22 PM
to Isaac Gouy, golang-nuts
Ask yourself: if the pools are that important for performance, why are they external? Because sometimes (often?) the pool provides worse performance. In this particular small benchmark it does not, but a large application using tons of different pools in the name of "being fast" just might see worse performance over the whole application's malloc/free behavior.

Similar reasoning applies to why most Java code should not pool objects (anymore): in most cases the allocator is actually more efficient than a pool - allocation is just a simple pointer bump, and deallocation is amortized/concurrent.

Bakul Shah

Mar 6, 2019, 7:03:52 PM
to Isaac Gouy, golan...@googlegroups.com
Thanks for an interesting read!

Curious to know if you guys have any estimates on the number of lines, development time and number of bugs for each language implementation? I realize this is subjective but this comparison may be quite meaningful given that the authors had an existing reference implementation of a sort done in CL. It is not often one sees real world examples of multiple implementations done by a small team with the same goals.

Thanks!

Isaac Gouy

Mar 6, 2019, 7:32:26 PM
to golang-nuts
On Wednesday, March 6, 2019 at 4:03:52 PM UTC-8, Bakul Shah wrote:
Thanks for an interesting read!

Curious to know if you guys have any estimates on the number of lines, development time and number of bugs for each language implementation? I realize this is subjective but this comparison may be quite meaningful given that the authors had an existing reference implementation of a sort done in CL. It is not often one sees real world examples of multiple implementations done by a small team with the same goals.


One of the authors, Pascal Costanza, showed up on the programming reddit to remedy some misconceptions.

You could ask him in that discussion or email him directly.

Michael Jones

Mar 6, 2019, 10:22:41 PM
to Isaac Gouy, golang-nuts
There is another problem with these microbenchmarks as well--they are often ports of an original C version.

I just implemented the Rabbit stream cipher, and after reading the papers I reviewed several versions online in Java and two in Go, as well as the now open-source C version. It seems they all started with the C version and did a "C-to-Java" or "C-to-Go" translation of it. The results are not flattering for the languages. My code took me a day and runs at 830 MB/s vs. 620 MB/s peak for any of these. Now, this is not exactly a microbenchmark, and I may have more experience than some, but it still shows that language X would seem to do at 620 a job it could do at 830 with cleaner code. That kind of difference is probably more than the difference between the upper half of the performance reports.

The base code has the C-ism "... + (a<b)", where the boolean becomes 0/1 for false/true. That's a nice thing and I understood it in 1975 at BTL in Guilford Center NC. But it is not in Go or whatever else. So these port versions all have a function that takes a boolean, has an if statement, and returns 1 or 0 as appropriate. That works, but not anywhere near fast enough. I reorganized the code to get rid of that, and that made it much faster than any compiler mode or special language-design benefit could.

A core computational element of Rabbit is its custom, highly non-linear G-operator. Every version I can find online does this with four 32-bit multiplies, 2 shifts, 2 adds, and an XOR. I did it with one 64-bit squaring, a shift, and an XOR. It is 20% faster on laptop/desktop/server CPUs. It is no big deal, but the reason for the universal implementation was lost on all who ported without understanding and revisiting the needs of the algorithm. This is a common outcome. It makes me look at all such benchmarks and imagine such huge error bars on performance/memory/size/... that direct comparisons are not too useful.
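To make the G-operator point concrete, here is a hedged Go sketch of the two formulations. It is my own reconstruction from the Rabbit specification (RFC 4503), not anyone's actual ported source: the four-multiply style splits the input into 16-bit halves and assembles the high word of the square by hand, while the 64-bit style just squares once.

```go
package main

import "fmt"

// gClassic: the four-multiply formulation in the style of the reference C
// code -- split x into 16-bit halves and build the high word of the 64-bit
// square by hand. (My reconstruction, for illustration only.)
func gClassic(x uint32) uint32 {
	a := x & 0xFFFF
	b := x >> 16
	h := ((a*a)>>17+a*b)>>15 + b*b // high 32 bits of x*x as a 64-bit product
	l := x * x                     // low 32 bits of the same product
	return h ^ l
}

// g64: the same value with one 64-bit squaring, a shift, and an XOR.
func g64(x uint32) uint32 {
	sq := uint64(x) * uint64(x)
	return uint32(sq>>32) ^ uint32(sq)
}

func main() {
	for _, x := range []uint32{0, 3, 0x1FFFF, 0xDEADBEEF, 0xFFFFFFFF} {
		fmt.Printf("%08X: %v\n", x, gClassic(x) == g64(x)) // the two formulations agree
	}
}
```

The four-multiply form only makes sense on machines without a fast 32x32-to-64 multiply, which is exactly the context the ports lost.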

On the other hand, I absolutely love such suites because they are probably the best teacher of how programming language concepts work, help/hurt, and are expressed across languages. I've a long history in programming (Fortran II on PDP-8i, BTL CARDIAC, DG Nova, CDC 3300, ...) but learn new and helpful things whenever I see how people answer Project Euler questions or port benchmarks in languages I don't really know well.

My advice is to imagine there is always somebody who could use any computer/language and double whatever you can do. Figure that in when you see that some code is 4% faster in some 20-option GCC compile mode (no frame pointer, no ...). OTOH, if Go is 20x slower than language X, that is like waving a red flag at a bull; I must figure out why. So far, there has never been a meaningful reason. It is most often problem redefinition, softness in specification, and sometimes a very clever data structure or coding technique. Rarely is it that X is just unfixably slow and Y is inherently fast. (There are some of those, but few, and even then it is usually slightly unhelpful, like places where Python is really fast because the whole benchmark is a giant matrix SVD done by one Python call to BLAS/LINPACK/LAPACK under the hood. That is a valid result, but does not say much about Python vs anything else for actual code-in-Python.)

A hidden surprise of these kinds of suites (and my coding of Rabbit) is that I'm looking at the generated code and asking, "how can this be really faster?" That made me realize a way to make Go code really faster for 32-bit-integer-intensive code like this: make the compiler's rewrite rules and register assignments able to express that two 32-bit variables are "roommates" in a 64-bit register. That's what I'll need to do by hand to get this over 1 GB/s, and it would not be difficult. But if the compiler could do it, that would make everything in this realm faster. Now, I don't yet understand how to do that, and I may not be able to do it myself, but a day of porting and benchmarking taught me what the next barrier to generated-code performance may be. That was a surprise, and generally a valuable one. If 100 people here have varied realizations like this, we'd really be able to make things fly.
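The "roommates" idea done by hand looks something like this. The names and the toy operation are mine, purely illustrative of the packing, not what a compiler would emit: bitwise operations like XOR act on each 32-bit lane independently, while additions would need masking to keep carries from crossing the lane boundary.

```go
package main

import "fmt"

// pack puts two 32-bit "roommates" into one 64-bit word.
func pack(hi, lo uint32) uint64 { return uint64(hi)<<32 | uint64(lo) }

// unpack splits the word back into its two 32-bit tenants.
func unpack(p uint64) (hi, lo uint32) { return uint32(p >> 32), uint32(p) }

func main() {
	p := pack(0xAAAA0000, 0x0000BBBB)
	q := pack(0x0F0F0F0F, 0xF0F0F0F0)
	hi, lo := unpack(p ^ q) // one 64-bit XOR updates both variables at once
	fmt.Printf("%08X %08X\n", hi, lo)
}
```

Doing this manually clutters the source, which is why having the compiler's register allocator express it would be the real win.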

So nothing against small benchmarks and puzzle suites...just don't take the numbers too literally.

Michael



--
Michael T. Jones
michae...@gmail.com

Robert Engels

Mar 6, 2019, 10:29:32 PM
to Michael Jones, Isaac Gouy, golang-nuts
I wholeheartedly agree, and would add an important point: ease of development/understanding leads to easier refactoring, allowing for improvements in the algorithm (which are usually far more important to performance), which is exactly what you've demonstrated.

Isaac Gouy

Mar 7, 2019, 12:40:55 PM
to golang-nuts
On Wednesday, March 6, 2019 at 7:22:41 PM UTC-8, Michael Jones wrote:
There is another problem about these microbenchmarks as well--they often are ports of an originating C-version.

Which microbenchmarks?

You quoted a reply to a question about "Performance comparison of Go, C++, and Java for biological sequencing tool".

For those who haven't looked, that is about an evaluation (done by the authors of the elPrep tool) to select a new implementation language for the particular case of elPrep.

Michael Jones

Mar 7, 2019, 3:04:30 PM
to Isaac Gouy, golang-nuts
I'm sorry Isaac, I meant multi-language benchmarking generally, nothing about the specific case you mention, so I was slightly tangential to your original post.


Isaac Gouy

Mar 11, 2019, 6:25:11 PM
to golang-nuts
On Wednesday, March 6, 2019 at 5:44:21 AM UTC-8, Robert Engels wrote:
As I pointed out long ago on stackoverflow the benchmark games are seriously flawed and should not be used for language performance comparisons. 

As a simple example, look at binary trees. In all of the “fast” implementations, they resort to specialized memory pools that wouldn’t be useable in a highly concurrent system. The Go and Java versions use off the shelf memory management so the code complexity comparisons are not even close. I’m sure you could replicate the performance using off heap structures in Go/Java but who would want to?



Does that difference show between the Go and Java programs which use "off the shelf memory management"?


Isaac Gouy

Mar 11, 2019, 7:05:04 PM
to golang-nuts
On Wednesday, March 6, 2019 at 7:22:41 PM UTC-8, Michael Jones wrote: 
…make the compiler's rewrite rules and register assignments be able to express that two 32-bit variables are "roommates" in a 64-bit register.


 
So nothing against small benchmarks and puzzle suites...just don't take the numbers too literally.

On the benchmarks game website, you should take the numbers literally.

They are what they say they are — but they are no more than that.

In particular, there's no claim that those measurements, of a few tiny programs, somehow define the relative performance of programming languages.

Isaac Gouy

Mar 12, 2019, 2:12:44 PM
to golang-nuts
On Monday, March 11, 2019, 3:54:56 PM PDT, Robert Engels wrote:

> Yes, so use Java - for this synthetic benchmark. I’m not sure what the point is you are trying to make.
> Both Java and Go outperform the C and C++ solutions using off the shelf memory management in the binary tree tests.

In what way do Java and Go outperform the C and C++ solutions there?

Those Java and Go programs are slower?


> As the real world application demonstrates both Java and Go offer superior performance to C++ in standard use cases.

The authors are admirably specific in their recommendation — "Based on our positive experiences, we recommend authors of other bioinformatics tools for processing SAM/BAM data, and potentially also other sequencing data formats, to also consider Go as an implementation language."


======

On Monday, March 11, 2019, 4:10:21 PM PDT, Robert Engels wrote:

> You are 100% correct - that is why they have exactly 0 value. Nothing to see here, please move on...

On the contrary, there is value and there is plenty to see.


======

On Monday, March 11, 2019, 4:17:02 PM PDT, Robert Engels wrote:

> Also, you realize that Java  has implemented auto vectorization for a long time...

I do realize that. (I was using an analogy.)


> But If you want to spend your time coding and debugging C++ or C no one here is stopping you.

I won't spend time coding and debugging C++ or C.

Isaac Gouy

Mar 13, 2019, 2:13:00 PM
to golang-nuts
On Tuesday, March 12, 2019, 12:54:47 PM PDT, Robert Engels wrote:

I swear you are just trolling now or looking at the wrong things. Review https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-gpp-2.html and https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-gcc-1.html

They are some of the slowest. Not surprisingly, they are also the most plain-Jane solutions, the easiest to understand, and they don't use specialized memory pools.

 
By "looking at the wrong things" do you mean looking at a C++ program that uses a boost library ?

Doesn't that boost library provide "off the shelf memory management" ?