Hi all,
As part of a study, I'm measuring the performance of the Disruptor using the tests provided with the source code, but I'm having trouble understanding the nature of the tests, and I'm getting unexpected results that I hope you can clarify for me.
First of all, I wanted to reproduce the Disruptor performance results shown on GitHub (https://github.com/LMAX-Exchange/disruptor/wiki/Performance-Results). When I searched the source code for those tests, I couldn't find any document explaining the package hierarchy or the nature of the tests, but I assume that the results posted on GitHub come from the tests under the packages com.lmax.disruptor.queue and com.lmax.disruptor.sequenced.
Working on the assumption that my choice is correct, I ran the tests and got the following results for the classes OneToOneQueueThroughputTest and OneToOneSequencedThroughputTest, after pinning them to specific CPU cores (dual-socket Intel Xeon E5-2660, Sandy Bridge):
Starting Queue tests
Run 0, BlockingQueue=3,494,060 ops/sec
Run 1, BlockingQueue=3,624,501 ops/sec
Run 2, BlockingQueue=3,671,071 ops/sec
Run 3, BlockingQueue=3,805,175 ops/sec
Run 4, BlockingQueue=3,824,091 ops/sec
Run 5, BlockingQueue=3,762,227 ops/sec
Run 6, BlockingQueue=3,796,507 ops/sec
Starting Disruptor tests
Run 0, Disruptor=46,232,085 ops/sec
Run 1, Disruptor=40,816,326 ops/sec
Run 2, Disruptor=49,358,341 ops/sec
Run 3, Disruptor=46,598,322 ops/sec
Run 4, Disruptor=46,576,618 ops/sec
Run 5, Disruptor=49,504,950 ops/sec
Run 6, Disruptor=49,554,013 ops/sec
The real trouble comes when I compare the results of the classes PingPongQueueLatencyTest and PingPongSequencedLatencyTest. My results do not match those on GitHub, even when I use CPU binding, although the results on GitHub were reportedly obtained without it.
Queue PingPong:
Without CPU binding:
PingPongQueueLatencyTest run 0 BlockingQueue
#[Mean = 6.2295, StdDeviation = 120.1229]
#[Max = 17671.1680, Total count = 182200356]
PingPongQueueLatencyTest run 1 BlockingQueue
#[Mean = 5.8534, StdDeviation = 65.7652]
#[Max = 8415.2320, Total count = 197990795]
PingPongQueueLatencyTest run 2 BlockingQueue
#[Mean = 5.0861, StdDeviation = 36.0260]
#[Max = 3637.1200, Total count = 181156206]
taskset 0-7:
PingPongQueueLatencyTest run 0 BlockingQueue
#[Mean = 18.8887, StdDeviation = 493.1312]
#[Max = 40284.1600, Total count = 181431589]
PingPongQueueLatencyTest run 1 BlockingQueue
#[Mean = 22.5575, StdDeviation = 498.5724]
#[Max = 29723.6480, Total count = 181499795]
PingPongQueueLatencyTest run 2 BlockingQueue
#[Mean = 43.7720, StdDeviation = 805.3321]
#[Max = 35377.1520, Total count = 181796565]
------------------------------------------------------------------------------------
Disruptor PingPong:
Without CPU binding:
PingPongSequencedLatencyTest run 0 Disruptor
#[Mean = 1.2853, StdDeviation = 7.0978]
#[Max = 283.2640, Total count = 30188405]
PingPongSequencedLatencyTest run 1 Disruptor
#[Mean = 1.3400, StdDeviation = 7.2893]
#[Max = 288.1120, Total count = 30184346]
PingPongSequencedLatencyTest run 2 Disruptor
#[Mean = 1.3658, StdDeviation = 7.3930]
#[Max = 313.9840, Total count = 30180547]
taskset 0-7:
PingPongSequencedLatencyTest run 0 Disruptor
#[Mean = 0.7346, StdDeviation = 5.7340]
#[Max = 414.8800, Total count = 30242574]
PingPongSequencedLatencyTest run 1 Disruptor
#[Mean = 0.7633, StdDeviation = 6.4105]
#[Max = 450.0960, Total count = 30183025]
PingPongSequencedLatencyTest run 2 Disruptor
#[Mean = 0.8277, StdDeviation = 14.0454]
#[Max = 2438.0160, Total count = 30196642]
Reviewing the code, I stepped into the instruction histogram.recordValue(t1 - t0, pauseTimeNs); and, looking up the definition of that method in the HdrHistogram source code, I found:
* To compensate for the loss of sampled values when a recorded value is larger than the expected
* interval between value samples, Histogram will auto-generate an additional series of decreasingly-smaller
* (down to the expectedIntervalBetweenValueSamples) value records.
My question is: why do you auto-generate an additional series of values when the latency is greater than 1000 ns?
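To check that I'm reading the comment quoted above correctly, here is a minimal sketch of how I understand the compensation. It is not the HdrHistogram code itself: the class name ExpectedIntervalSketch and the method recordWithExpectedInterval are names I made up for illustration, and I'm assuming the extra records step down by one expected interval at a time until they would fall below that interval.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the behaviour described in the HdrHistogram comment;
// the names here are illustrative and not part of the library.
public class ExpectedIntervalSketch {

    // Returns every value that would end up recorded for one measured latency.
    static List<Long> recordWithExpectedInterval(long value, long expectedInterval) {
        List<Long> recorded = new ArrayList<>();
        recorded.add(value); // the measured value is always recorded
        if (expectedInterval <= 0) {
            return recorded; // no compensation without a positive interval
        }
        // Assumption: extra records decrease by one interval per step,
        // down to (and including) the expected interval.
        for (long missing = value - expectedInterval;
             missing >= expectedInterval;
             missing -= expectedInterval) {
            recorded.add(missing);
        }
        return recorded;
    }

    public static void main(String[] args) {
        long pauseTimeNs = 1000; // the 1000 ns interval mentioned in my question
        System.out.println(recordWithExpectedInterval(800, pauseTimeNs));  // [800]
        System.out.println(recordWithExpectedInterval(5000, pauseTimeNs)); // [5000, 4000, 3000, 2000, 1000]
    }
}
```

If this reading is right, a single 5000 ns sample adds five records instead of one, which would explain why the Total count in my runs is far above 30,000,001 (for example 182,200,356 in the first Queue run).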
I replaced this instruction with the simple variant that only records the measured value, and these are the results I get:
Queue PingPong:
Without CPU binding:
PingPongQueueLatencyTest run 0 BlockingQueue
#[Mean = 7.1360, StdDeviation = 5.5498]
#[Max = 8796.1600, Total count = 30000001]
PingPongQueueLatencyTest run 1 BlockingQueue
#[Mean = 6.9327, StdDeviation = 4.6201]
#[Max = 8462.8480, Total count = 30000001]
PingPongQueueLatencyTest run 2 BlockingQueue
#[Mean = 6.7062, StdDeviation = 4.7566]
#[Max = 7864.3200, Total count = 30000001]
taskset 0-7:
PingPongQueueLatencyTest run 0 BlockingQueue
#[Mean = 6.7305, StdDeviation = 13.4514]
#[Max = 39100.4160, Total count = 30000001]
PingPongQueueLatencyTest run 1 BlockingQueue
#[Mean = 6.6670, StdDeviation = 15.2062]
#[Max = 38625.2800, Total count = 30000001]
PingPongQueueLatencyTest run 2 BlockingQueue
#[Mean = 6.6649, StdDeviation = 20.9045]
#[Max = 32719.8720, Total count = 30000001]
------------------------------------------------------------------------------------
Disruptor PingPong:
Without CPU binding:
PingPongSequencedLatencyTest run 0 Disruptor
#[Mean = 1.0289, StdDeviation = 0.8593]
#[Max = 322.6560, Total count = 30000001]
PingPongSequencedLatencyTest run 1 Disruptor
#[Mean = 1.0397, StdDeviation = 0.8429]
#[Max = 327.5680, Total count = 30000001]
PingPongSequencedLatencyTest run 2 Disruptor
#[Mean = 1.0365, StdDeviation = 0.7842]
#[Max = 288.4000, Total count = 30000001]
taskset 0-7:
PingPongSequencedLatencyTest run 0 Disruptor
#[Mean = 0.5200, StdDeviation = 0.6922]
#[Max = 1800.7680, Total count = 30000001]
PingPongSequencedLatencyTest run 1 Disruptor
#[Mean = 0.5055, StdDeviation = 0.5834]
#[Max = 318.7840, Total count = 30000001]
PingPongSequencedLatencyTest run 2 Disruptor
#[Mean = 0.5127, StdDeviation = 0.5935]
#[Max = 319.2160, Total count = 30000001]
In this run, the Queue results with CPU binding look more consistent and believable, and the Disruptor results are better across the board. Even so, they are still very different from the ones posted on GitHub.
In the Queue test I ran, the average latency is around 6-7 microseconds, whereas the GitHub page reports over 32 microseconds.
In the Disruptor test I ran, the average latency is more than 500 nanoseconds in the best run, even with CPU binding, whereas GitHub reports 52 nanoseconds.
I hope you can help me understand whether I'm taking the measurements correctly, why values are auto-generated in the PingPong test, and what the rest of the tests in the source code are for. Any help will be appreciated.