I don't have much to contribute on where CoralReactor and Netty stand in comparison to each other. But I do have some stuff to contribute on measurement techniques and on what is actually being compared... I was a bit bored sitting in my hotel room, so thanks for providing me with some amusement. Here are some observations:

Apples to Apples? This appears to be a comparison of an apple at rest with a clockwork orange under stress:

1. The CoralReactor client code recycles a single ByteBuffer pre-allocated and pre-initialized (with 'x's) during construction, and doesn't allocate any new buffer during sends. In contrast, the Netty client code allocates a new ByteBuf in each sendMsg() call, and initializes each message with 'x's in each sendMsg() call. Why is the Netty client implementation NOT doing similar recycling of a single ByteBuf created at construction time?
2. Similarly, the Netty client and server code both go to great lengths to convert to and use NIO ByteBuffers instead of Netty's nice and fast ByteBuf: instead of using ByteBuf.getLong() directly in the channelRead() method, it hops through extracting a ByteBuffer, going through two method calls (that are not there in the CoralReactor client), and hopping into a CoralReactor-styled handleMessage() call to process the byte buffer to a... ByteBuffer.getLong(). [Why didn't they put a sleep in there while they were at it?] The CoralReactor implementation, in contrast, handles its input directly in the buffer form it came in, and doesn't have any s̶l̶e̶e̶p̶s̶ extra and unneeded conversion steps in the code. I bet that if you rewrote the CoralReactor client and server code to convert to Netty ByteBufs before processing, and to allocate a new ByteBuf for each send, while switching the Netty code to using (and reusing) its ByteBufs directly, the roles and the numbers would roughly reverse...
Measurement:

3. The latency measured is one-way latency from client to server, measured using System.nanoTime() on each. WTF? [System.nanoTime() can't be safely used this way. Even on the same box on the same day.]
But mainly, those black-box CoralReactor Benchmarker classes do not inspire confidence:
4. To start with, why is there a custom Benchmarker built (just for Netty) under com.coralblocks.nettybenchmarks.util.Benchmarker? And why is only the Netty code using it for measurement? Why are the two tests not using the same benchmarker (com.coralblocks.coralbits.bench.Benchmarker)?
5. But mainly the confidence is highly degraded by output lines like "99.999% = [avg: 21.146 micros, max: 91.416 micros]". 99.999%'iles don't have averages. The 99.999%'ile is the 99.999%'ile. Period. Over each period. Period. If you want some more ranting discussion on the subject of averaging percentiles you can find it here. When someone reports percentile averages coming out of a black box (whose code you can't find or read to understand how it makes up its numbers), you have to assume the black box is running on crystal meth.
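For concreteness, the allocation asymmetry described in point 1 can be sketched with plain java.nio. This is not code from either benchmark; the class and method names are illustrative only:

```java
import java.nio.ByteBuffer;

// Minimal sketch of the two send strategies being compared. Neither
// class comes from the actual benchmarks; names are illustrative.
public class BufferStrategies {

    static final int MSG_SIZE = 256;

    // CoralReactor-client style: one buffer, allocated and filled with
    // 'x' once at construction time, then rewound for every send.
    static final ByteBuffer REUSED = createFilled();

    static ByteBuffer createFilled() {
        ByteBuffer buf = ByteBuffer.allocateDirect(MSG_SIZE);
        for (int i = 0; i < MSG_SIZE; i++) buf.put((byte) 'x');
        buf.flip();
        return buf;
    }

    // Zero allocation per send: just reset the position and reuse.
    static ByteBuffer recycledSend() {
        REUSED.rewind();
        return REUSED;
    }

    // Netty-client-as-benchmarked style: a fresh buffer, filled from
    // scratch, on every sendMsg() call.
    static ByteBuffer allocatingSend() {
        return createFilled();
    }

    public static void main(String[] args) {
        // Same instance every send on the recycled path:
        System.out.println(recycledSend() == recycledSend());       // true
        // A new object (i.e. new garbage) every send on the other:
        System.out.println(allocatingSend() == allocatingSend());   // false
    }
}
```

The recycled path does no allocation in steady state; the allocating path creates garbage on every message, which is exactly the per-send cost the two clients do not share.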
I could keep going and point to coordinated omission, and explain that percentiles are meaningless when measured this way, but I think there are enough nails in this coffin already.
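Coordinated omission, for readers unfamiliar with the term: when a load generator that intends to send at a fixed rate blocks on a slow response, the requests that should have been sent during the stall are delayed, and naively measured latencies for them look fine. A toy illustration with invented numbers (the interval and timestamps are made up, not from these benchmarks):

```java
// Toy illustration of coordinated omission. All numbers are invented.
public class CoordinatedOmission {

    // Hypothetical load generator that intends one request per 1 ms.
    static final long INTERVAL_NS = 1_000_000;

    // Naive latency: response time minus the (possibly delayed) actual send.
    static long naive(long actualSendNs, long responseNs) {
        return responseNs - actualSendNs;
    }

    // Corrected latency: response time minus when the request SHOULD
    // have been sent under the intended constant rate.
    static long corrected(int seq, long responseNs) {
        return responseNs - seq * INTERVAL_NS;
    }

    public static void main(String[] args) {
        // Request 0 stalls the system for 10 ms, so requests 1 and 2 go
        // out late, back to back, and each looks "fast" when measured
        // from its actual (delayed) send time.
        long[] actualSendNs = {0, 10_000_000, 10_000_100};
        long[] responseNs   = {10_000_000, 10_000_050, 10_000_150};
        for (int i = 0; i < actualSendNs.length; i++) {
            System.out.println("req " + i
                + " naive=" + naive(actualSendNs[i], responseNs[i]) + "ns"
                + " corrected=" + corrected(i, responseNs[i]) + "ns");
        }
    }
}
```

Request 1 measures 50 ns naively but roughly 9 ms against its intended send time; percentiles computed from the naive numbers simply never see the stall.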
Hi Gil,

I am one of the developers of CoralReactor. We appreciate your feedback. We believe the best way to improve a benchmark is to hear criticism from smart people and offer our arguments. We have some clients that are also clients of Azul, and from their feedback we believe our components run well in the Zing VM. Please see my comments below:
On Wednesday, May 13, 2015 at 5:30:29 PM UTC-4, Gil Tene wrote:

I don't have much to contribute on where CoralReactor and Netty stand in comparison to each other. But I do have some stuff to contribute on measurement techniques and on what is actually being compared... I was a bit bored sitting in my hotel room, so thanks for providing me with some amusement. Here are some observations:

Apples to Apples? This appears to be a comparison of an apple at rest with a clockwork orange under stress:

1. The CoralReactor client code recycles a single ByteBuffer pre-allocated and pre-initialized (with 'x's) during construction, and doesn't allocate any new buffer during sends. In contrast, the Netty client code allocates a new ByteBuf in each sendMsg() call, and initializes each message with 'x's in each sendMsg() call. Why is the Netty client implementation NOT doing similar recycling of a single ByteBuf created at construction time?

2. Similarly, the Netty client and server code both go to great lengths to convert to and use NIO ByteBuffers instead of Netty's nice and fast ByteBuf: instead of using ByteBuf.getLong() directly in the channelRead() method, it hops through extracting a ByteBuffer, going through two method calls (that are not there in the CoralReactor client), and hopping into a CoralReactor-styled handleMessage() call to process the byte buffer to a... ByteBuffer.getLong(). [Why didn't they put a sleep in there while they were at it?] The CoralReactor implementation, in contrast, handles its input directly in the buffer form it came in, and doesn't have any s̶l̶e̶e̶p̶s̶ extra and unneeded conversion steps in the code.
I bet that if you rewrote the CoralReactor client and server code to convert to Netty ByteBufs before processing, and to allocate a new ByteBuf for each send, while switching the Netty code to using (and reusing) its ByteBufs directly, the roles and the numbers would roughly reverse...

We did not purposely choose to write that Netty benchmark to make it slower, as your response might suggest. Netty makes it hard to re-use things and forces you to do reference counting on its ByteBufs. I may be mistaken here, but I don't think there is an easy/natural way to write a Netty benchmark using the techniques you described, techniques that, as you noticed, are incorporated from the ground up in CoralReactor.
Anyone who dislikes the quality of that Netty benchmark code is encouraged to make it better, and that's exactly the reason why we included the full Netty benchmark source code in our article.
The Benchmark class is exactly the same for both tests. They were only placed in different packages to make it easier to distribute the Netty code without any Coral Blocks dependencies.
If you or anyone can come up with a better Netty benchmark code that outputs better latency numbers, then that would be a great contribution to the Netty / Low-Latency community.
Our personal opinion is that using ByteBuf from Netty is not a good idea, making things not only slower but more complex.
Are you aware of any benchmarks / comparisons that show that Netty's ByteBuf is faster / better than java.nio.ByteBuffer? I am asking because our benchmarks suggest exactly the opposite.
Again if you or anyone can write a simple Netty benchmark that measures latency and performs around 2 micros per 256-byte message over TCP one-way (or round-trip if you prefer) I would be more than happy to run it on the same machine I am currently running the CoralReactor benchmarks and post the results here.
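A minimal starting point for anyone taking up that challenge: the skeleton below is a plain blocking java.nio ping-pong over loopback, not a tuned benchmark and not based on either library. It reuses a single pre-filled 256-byte buffer and times round trips on one clock; real runs would need warmup, CPU pinning, and far more samples:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Bare-bones ping-pong latency skeleton over loopback. Illustrative
// only: no warmup discipline, no pinning, no histogram.
public class PingPong {
    static final int MSG_SIZE = 256;
    static final int ITERATIONS = 1_000;

    // Runs an echo server on an ephemeral loopback port and returns the
    // round-trip time, in nanos, of the last of ITERATIONS ping-pongs.
    public static long lastRoundTripNanos() throws Exception {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        int port = ((InetSocketAddress) server.getLocalAddress()).getPort();

        // Echo side: read one full message, write it straight back.
        Thread echo = new Thread(() -> {
            try (SocketChannel peer = server.accept()) {
                ByteBuffer buf = ByteBuffer.allocateDirect(MSG_SIZE);
                while (true) {
                    buf.clear();
                    while (buf.hasRemaining()) {
                        if (peer.read(buf) < 0) return; // client hung up
                    }
                    buf.flip();
                    while (buf.hasRemaining()) peer.write(buf);
                }
            } catch (IOException ignored) {
            }
        });
        echo.start();

        long lastRtt = -1;
        try (SocketChannel client = SocketChannel.open(
                new InetSocketAddress("127.0.0.1", port))) {
            client.socket().setTcpNoDelay(true);
            // One pre-filled, recycled buffer; zero allocation per send.
            ByteBuffer msg = ByteBuffer.allocateDirect(MSG_SIZE);
            for (int i = 0; i < MSG_SIZE; i++) msg.put((byte) 'x');

            for (int i = 0; i < ITERATIONS; i++) {
                msg.rewind();
                long t0 = System.nanoTime();
                while (msg.hasRemaining()) client.write(msg);
                msg.clear();
                while (msg.hasRemaining()) client.read(msg);
                lastRtt = System.nanoTime() - t0; // both stamps, one clock
            }
        }
        echo.join();
        server.close();
        return lastRtt;
    }

    public static void main(String[] args) throws Exception {
        System.out.println("last rtt = " + lastRoundTripNanos() + " ns");
    }
}
```

Because both timestamps come from the same process, this shape avoids the cross-clock problem discussed below, at the cost of measuring round trip rather than one way.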
I would also be willing to write the equivalent of the Netty benchmark code using CoralReactor and present the code and the numbers here for a comparison. A simple ping-pong benchmark test for measuring latency should be a simple program for any network library.

We are making available for download the complete Netty benchmark we used, which was already listed in the article, including the Benchmark class that was missing. You can download it from here: http://www.coralblocks.com/NettyBench.zip. Refer to the README.txt file included for the complete command lines on how to execute the client and the server.

The reasons why we are confident that CoralReactor is much faster are:

1. We are getting very positive feedback from our clients. Like you, they are skeptical and prefer to do their own independent benchmarks, running CoralReactor and CoralFIX in their own environments to come up with their own latency numbers. That's a good thing and we fully encourage them to do it during their free full-version trial. Fortunately they have been reporting numbers close to the ones from our own benchmarks.

2. CoralReactor is single-threaded by design, from the ground up. There is only one pinned selector thread doing all operations, re-using and pooling all objects. That does not mean you can't add a second selector thread to scale your architecture, but that's completely different from adding a second thread sharing state with other threads. When that happens you start having to use multithreading techniques that introduce not only complexity but a lot of latency.
3. CoralReactor produces zero garbage. That's zero, not little, garbage. We wrote a super-optimized NIO reactor and rewrote the EPoll selector implementation for Linux, optimizing and cleaning it to the last bit for performance and zero garbage creation. That allows for the development of ultra-low-latency servers and clients with very little variance.
4. We are using Java as a syntax language and avoiding the JDK completely, at least the classes that do not perform well or that produce garbage. We provide tools for our clients (CoralBits) so that they can do the same.
5. CoralReactor makes it much easier (and that's a subjective matter, but we have been receiving positive feedback from clients about simplicity) to write asynchronous, non-blocking, single-threaded network clients and servers, over TCP and UDP, including broadcast and multicast.
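The pooling idea behind points 2 and 3 can be sketched generically. This is not CoralReactor code (its internals aren't shown here); it only illustrates the technique of allocating up front and recycling, so that the steady state allocates nothing:

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Generic sketch of the pooling technique behind "zero garbage"
// designs. Single-threaded by assumption, like the reactor it
// illustrates, so no synchronization is needed.
public class BufferPool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    public BufferPool(int count, int bufferSize) {
        this.bufferSize = bufferSize;
        // Pay the allocation cost once, at construction time.
        for (int i = 0; i < count; i++) {
            free.push(ByteBuffer.allocateDirect(bufferSize));
        }
    }

    // Steady state: pops a pre-allocated buffer; allocates only if the
    // pool is exhausted (i.e. only if it was sized too small).
    public ByteBuffer get() {
        ByteBuffer buf = free.poll();
        return buf != null ? buf : ByteBuffer.allocateDirect(bufferSize);
    }

    // Returning a buffer makes it immediately reusable; nothing ever
    // becomes garbage, so the GC has nothing to collect or pause for.
    public void release(ByteBuffer buf) {
        buf.clear();
        free.push(buf);
    }
}
```

A correctly sized pool hands back the same few buffer instances forever, which is what keeps allocation, and therefore GC-induced latency variance, out of the hot path.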
Measurement:

3. The latency measured is one-way latency from client to server, measured using System.nanoTime() on each. WTF? [System.nanoTime() can't be safely used this way. Even on the same box on the same day.]

We have found System.nanoTime() to be fairly reliable and monotonic on the same Linux box without NTP servers. Moreover, System.nanoTime() is being used on both the Netty and CoralReactor benchmarks, so it should influence/affect both benchmarks equally. We have also used native RDTSC as a timestamper and the numbers measured were very similar.
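Worth spelling out the underlying point: System.nanoTime()'s Javadoc only guarantees meaningful differences within a single JVM instance; its origin is arbitrary per process, so subtracting a server-side stamp from a client-side stamp is undefined even on one box. The usual workaround is to keep both timestamps on one clock by timing the round trip. A trivial sketch, where echo() is a hypothetical stand-in for send + remote echo + receive:

```java
// Sketch of the single-clock workaround for nanoTime's per-JVM origin.
public class RoundTrip {

    // Hypothetical placeholder for a real network hop.
    static byte[] echo(byte[] msg) {
        return msg.clone();
    }

    // Both timestamps come from the same JVM's System.nanoTime(), so
    // their difference is a well-defined elapsed time. One-way latency
    // is then estimated as rtt / 2, with the usual caveat that the two
    // directions are assumed symmetric.
    static long roundTripNanos(byte[] msg) {
        long t0 = System.nanoTime();
        echo(msg);
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        long rtt = roundTripNanos(new byte[256]);
        System.out.println("rtt = " + rtt + " ns, one-way ~ " + (rtt / 2) + " ns");
    }
}
```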
But mainly, those black-box CoralReactor Benchmarker classes do not inspire confidence:

As mentioned above, these classes are the same, and we are providing the source code together with the Netty benchmark source code for download.
4. To start with, why is there a custom Benchmarker built (just for Netty) under com.coralblocks.nettybenchmarks.util.Benchmarker? And why is only the Netty code using it for measurement? Why are the two tests not using the same benchmarker (com.coralblocks.coralbits.bench.Benchmarker)?

Explained above, but for completeness: "The Benchmark class is exactly the same for both tests. They were only placed in different packages to make it easier to distribute the Netty code without any Coral Blocks dependencies." Source code for the Benchmarker class will be provided from now on.

5. But mainly the confidence is highly degraded by output lines like "99.999% = [avg: 21.146 micros, max: 91.416 micros]". 99.999%'iles don't have averages. The 99.999%'ile is the 99.999%'ile. Period. Over each period. Period. If you want some more ranting discussion on the subject of averaging percentiles you can find it here. When someone reports percentile averages coming out of a black box (whose code you can't find or read to understand how it makes up its numbers), you have to assume the black box is running on crystal meth.

Perhaps when you see the code of the Benchmarker class, this will become clearer. We are storing every measurement in a sorted list, then calculating the percentiles on top of it. For example, 99.999%'ile means: if you take the best 99.999% of measurements in the whole dataset, you will find that the average is X and the max time (biggest outlier) is Y. That's important because your average might be great while some terrible outliers are hidden in there. By presenting the worst outlier you can at least have an idea of the worst-case scenario for your latency, up to the 99.999%'ile, without having to calculate the standard deviation. Our opinion is that the average and worst outlier, up to a percentile, give enough information to evaluate latency / performance.
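To make the disagreement concrete: the two quantities really are different statistics. A sketch with invented data (this is not the Benchmarker source, only an illustration of the two computations being discussed):

```java
import java.util.Arrays;

// Contrasts a conventional percentile with the "average of the best p%"
// that the reported output line appears to describe. Data is invented.
public class Percentiles {

    // Conventional percentile: sort, then rank-select a single value.
    static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.ceil(p * sorted.length) - 1;
        return sorted[Math.max(idx, 0)];
    }

    // What "99.999% = [avg: ...]" seems to report: the mean of the best
    // p% of samples, which is not the percentile itself.
    static double avgOfBest(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int n = (int) Math.ceil(p * sorted.length);
        long sum = 0;
        for (int i = 0; i < n; i++) sum += sorted[i];
        return (double) sum / n;
    }

    public static void main(String[] args) {
        long[] micros = {1, 2, 3, 4, 5, 6, 7, 8, 9, 1000};
        System.out.println(percentile(micros, 0.90)); // 9: the 90th %'ile value
        System.out.println(avgOfBest(micros, 0.90));  // 5.0: mean of best 90%
        System.out.println(percentile(micros, 1.0));  // 1000: the true max
    }
}
```

On this data the "avg up to the 90th %'ile" is 5.0 while the 90th percentile is 9: the averaged figure understates the percentile it is labeled with, which is the crux of the objection.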
I could keep going and point to coordinated omission, and explain that percentiles are meaningless when measured this way, but I think there are enough nails in this coffin already.

Thanks for your feedback. Even if it can sometimes be interpreted by some as harsh, we respect it and understand that this is just your personal style. Hopefully the arguments I presented above will offer some balance to this great discussion.