I'm using the disruptor.net port of Disruptor. I have significant performance problems and I don't know why:
- am I just measuring performance the wrong way?
- am I using Disruptor the wrong way?
- is the .NET port of Disruptor that bad?
Probably someone can help me. In my real application I've measured that with Disruptor I spend ~50 microseconds on average to deliver one message. The code below is a very simplified version of my real application. The difference is that in the real application I have several consumers and I pass a "reference" instead of an "int".
However, even the simplified test below produced this output:
4 microseconds for Disruptor is a HUGE delay. I expect it to always be much less than 1 microsecond. Waiting for suggestions and hints (also, probably someone can test the same code with the Java version; it would be easy to port, I assume :)
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using Disruptor;
using Disruptor.Dsl;

namespace DisruptorTest
{
    public sealed class ValueEntry
    {
        public int Value { get; set; }

        public ValueEntry()
        {
            // Console.WriteLine("New ValueEntry created");
        }
    }

    public class ValueAdditionHandler : IEventHandler<ValueEntry>
    {
        public void OnNext(ValueEntry data, long sequence, bool endOfBatch)
        {

    class Program
    {
        public const int length = 10000;
        public static Stopwatch[] sw = new Stopwatch[length];
        public static long[] results = new long[length];

        private static readonly int _ringSize = 1048576; // Must be a power of 2

        public static int factorial;

        static void Main(string[] args)
        {
            for (int i = 0; i < length; i++)
            {
                sw[i] = Stopwatch.StartNew();
            }

            var disruptor = new Disruptor.Dsl.Disruptor<ValueEntry>(() => new ValueEntry(), _ringSize, TaskScheduler.Default);
On Friday, November 16, 2012 11:51:18 AM UTC+4, Oleg Vazhnev wrote:
> average /= (length - 100);
> should be
> average /= (length / 2)
> Everything else is still correct.
> New output (average a little bit more):
> average = 6 minimum = 3 0-5 = 4040, 5-10 = 625, 10-30 = 191, >30 = 144
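The correction above comes down to dividing the sum by the number of samples actually summed. A minimal sketch of that arithmetic, where the helper name `AverageOfSecondHalf` and the sample data are made up for illustration and are not part of the posted test:

```csharp
using System;

static class AverageFix
{
    // Mean of the second half of the samples: the divisor is the number
    // of values that were actually added up, not (length - 100).
    public static long AverageOfSecondHalf(long[] results)
    {
        int start = results.Length / 2;
        int count = results.Length - start;
        long sum = 0;
        for (int i = start; i < results.Length; i++)
        {
            sum += results[i];
        }
        return sum / count;
    }

    static void Main()
    {
        // Made-up data: 10000 samples, every one equal to 6.
        var results = new long[10000];
        for (int i = 0; i < results.Length; i++)
        {
            results[i] = 6;
        }

        Console.WriteLine(AverageOfSecondHalf(results)); // prints 6

        // The original divisor (length - 100 = 9900) applied to the same
        // sum of 5000 samples: 30000 / 9900 = 3 in integer arithmetic.
        Console.WriteLine((5000L * 6) / (results.Length - 100)); // prints 3
    }
}
```

With the wrong divisor the reported average is roughly half the true value, which is consistent with the average going up after the fix.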
These results are from my laptop, a Lenovo W530 (Ivy Bridge processor).
I've also just run my test on an HP DL360p Gen8 (2 * Xeon E5-2640); the results are:
average = 4 minimum = 4 0-5 = 2579, 5-10 = 2388, 10-30 = 32, >30 = 1
This is a Release build with no debugger attached. No, I haven't tried running the perf tests; I probably should. Do you think my test is correct and the problem is with the hardware or the software?
On Friday, November 16, 2012 1:39:11 PM UTC+4, Olivier Deheurles wrote:
> Could you describe the hardware used?
> Have you tried to run the latency perf test in the perf test suite, in release mode, without debugger attached?
> Olivier
> On 16 nov. 2012, at 07:56, Oleg Vazhnev <ovaz...@gmail.com> wrote:
> Bug in code
> average /= (length - 100);
> should be
> average /= (length / 2)
> Everything else is still correct.
> New output (average a little bit more):
> average = 6 minimum = 3 0-5 = 4040, 5-10 = 625, 10-30 = 191, >30 = 144
Operating System: Microsoft Windows 8 Pro with Media Center
- Version: 6.2.9200
- ServicePack: 0
Number of Processors: 1
- Name: Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
- Description: Intel64 Family 6 Model 58 Stepping 9
- ClockSpeed: 2601 MHz
- Number of cores: 4
- Number of logical processors: 8
- Hyperthreading: ON
On Friday, November 16, 2012 2:03:01 PM UTC+4, Oleg Vazhnev wrote:
>>> // let's simulate some work
>>> factorial = 1;
>>> for (int j = 1; j < 100000; j++)
>>> {
>>>     factorial *= j;
>>> }
>>> }
>>> // be absolutely sure that all events are delivered
>>> Thread.Sleep(1000);
>>> long average = 0;
>>> long minimum = 10000000000;
>>> int firstFive = 0;
>>> int fiveToTen = 0;
>>> int tenToThirty = 0;
>>> int moreThenThirty = 0;
>>> // count only second half because first half might be too slow by some "start-up" reason
>>> for (int i = length / 2; i < length; i++)
>>> {
>>>     average += results[i];
>>>     if (results[i] < minimum)
>>>     {
>>>         minimum = results[i];
>>>     }
>>>     if (results[i] < 5)
>>>     {
>>>         firstFive++;
>>>     }
>>>     else if (results[i] < 10)
>>>     {
>>>         fiveToTen++;
>>>     }
>>>     else if (results[i] < 30)
>>>     {
>>>         tenToThirty++;
>>>     }
>>>     else
>>>     {
>>>         moreThenThirty++;
>>>     }
>>> }
>>> average /= (length - 100);
>>> Console.WriteLine("average = {0} minimum = {1} 0-5 = {2}, 5-10 = {3}, 10-30 = {4}, >30 = {5}", average, minimum, firstFive, fiveToTen, tenToThirty, moreThenThirty);
>>> }
I had a closer look at your code:
1. There is no warm-up phase, so you're measuring JIT time.
2. You're not configuring the Disruptor to use the right claim strategy; it should be the single-threaded one since you have only one producer.
3. You're doing computations on the hot path; you should just take raw measurements and compute once the test is complete (for instance, store Stopwatch.GetTimestamp()).
4. Your array used to store stopwatches is accessed by both threads in sequence, the first one writing, the other one reading, which likely causes lots of false sharing.
5. Your final measurements include the instrumentation (stopwatch calls do not come for free). To do this properly you would have to measure the time spent on instrumentation and subtract it from your final results.
Hope this helps..
Olivier
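Points 1, 3 and 5 above can be sketched with nothing but the BCL. This is only an illustration under stated assumptions, not the Disruptor perf test suite: the names `LatencyHarness`, `Measure` and `TimestampOverheadNanos` are hypothetical, and the measured operation is an arbitrary delegate standing in for "publish one message".

```csharp
using System;
using System.Diagnostics;

static class LatencyHarness
{
    // Converts raw Stopwatch ticks to nanoseconds using the timer frequency.
    public static double TicksToNanos(long ticks)
    {
        return ticks * 1e9 / Stopwatch.Frequency;
    }

    // Point 5: estimate what one Stopwatch.GetTimestamp() call costs
    // by timing a long run of back-to-back calls.
    public static double TimestampOverheadNanos(int iterations)
    {
        long start = Stopwatch.GetTimestamp();
        for (int i = 0; i < iterations; i++)
        {
            Stopwatch.GetTimestamp();
        }
        long end = Stopwatch.GetTimestamp();
        return TicksToNanos(end - start) / iterations;
    }

    public static double[] Measure(Action operation, int warmup, int samples)
    {
        // Point 1: warm-up runs so the JIT has compiled the measured path.
        for (int i = 0; i < warmup; i++)
        {
            operation();
        }

        // Point 3: store raw tick deltas only; no arithmetic on the hot path.
        var raw = new long[samples];
        for (int i = 0; i < samples; i++)
        {
            long t0 = Stopwatch.GetTimestamp();
            operation();
            raw[i] = Stopwatch.GetTimestamp() - t0;
        }

        // Convert and subtract the instrumentation cost after the run.
        double overhead = TimestampOverheadNanos(1000000);
        var nanos = new double[samples];
        for (int i = 0; i < samples; i++)
        {
            nanos[i] = Math.Max(0.0, TicksToNanos(raw[i]) - overhead);
        }
        return nanos;
    }

    static void Main()
    {
        int sink = 0;
        double[] nanos = Measure(() => sink++, 10000, 10000);
        Console.WriteLine("timer resolution = {0} ns", 1e9 / Stopwatch.Frequency);
        Console.WriteLine("first sample = {0} ns", nanos[0]);
    }
}
```

It is worth printing the timer resolution (1e9 / Stopwatch.Frequency) first: if one tick is a large fraction of a microsecond on the test machine, sub-microsecond latencies cannot be resolved per message and would have to be measured over batches instead.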
On 16 nov. 2012, at 13:25, Oleg Vazhnev <ovazh...@gmail.com> wrote:
> Operating System: Microsoft Windows 8 Pro with Media Center
> - Version: 6.2.9200
> - ServicePack: 0
> Number of Processors: 1
> - Name: Intel(R) Core(TM) i7-3720QM CPU @ 2.60GHz
> - Description: Intel64 Family 6 Model 58 Stepping 9
> - ClockSpeed: 2601 Mhz
> - Number of cores: 4
> - Number of logical processors: 8
> - Hyperthreading: ON
1. Even in the real application, when measuring Disruptor, I see that it takes 2-3 times longer to process data compared to BlockingCollection. As the real application runs for several hours, I'm sure I'm not measuring JIT there, so I think JIT is likely not the reason why my test above is so slow.
3. Agreed, I'd better store Stopwatch.ElapsedTicks; however, my computations are trivial and cannot affect the test result significantly.
4. OK, but I don't know how to make things better... How can I measure delivery of a message from one thread to another without accessing Stopwatch from different threads?
5. Agreed, but this cannot significantly affect the measurement. Stopwatch is able to measure with "microsecond precision" without introducing significant overhead.
2. Can this slow Disruptor down by a factor of 10-100? I would appreciate it if you or someone else could point me to a "right" example that covers my needs.
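One possible answer to question 4, sketched under assumptions: Stopwatch.GetTimestamp() is static, so no Stopwatch instance has to be shared at all. The producer can stamp the message itself just before publishing, and the consumer can write the latency into an array that only it touches. In the sketch below a BlockingCollection stands in for the transport so that the example is self-contained; with Disruptor the same stamp would live on the ring buffer entry and be read in OnNext. The names `TimedEntry`, `PublishedTicks` and `CrossThreadLatency` are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

// Hypothetical message type: the timestamp travels with the message.
sealed class TimedEntry
{
    public int Value;
    public long PublishedTicks; // set by the producer just before publishing
}

static class CrossThreadLatency
{
    // Returns one raw tick delta per message (publish -> receive).
    public static long[] Run(int count)
    {
        var queue = new BlockingCollection<TimedEntry>();
        var latencies = new long[count]; // written by the consumer thread only

        var consumer = new Thread(() =>
        {
            int i = 0;
            foreach (TimedEntry entry in queue.GetConsumingEnumerable())
            {
                // Raw ticks only; conversion happens after the run.
                latencies[i++] = Stopwatch.GetTimestamp() - entry.PublishedTicks;
            }
        });
        consumer.Start();

        for (int n = 0; n < count; n++)
        {
            var entry = new TimedEntry { Value = n };
            entry.PublishedTicks = Stopwatch.GetTimestamp();
            queue.Add(entry);
        }

        queue.CompleteAdding();
        consumer.Join(); // results are safe to read once the consumer has exited
        return latencies;
    }

    static void Main()
    {
        long[] ticks = Run(10000);
        Console.WriteLine("samples = {0}, ticks per second = {1}", ticks.Length, Stopwatch.Frequency);
    }
}
```

Because the producer here publishes as fast as it can, the deltas include time spent queued behind earlier messages; pacing the producer, as the posted test does with its "simulate some work" loop, isolates the hand-off cost better.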
On Friday, November 16, 2012 9:59:22 PM UTC+4, Olivier Deheurles wrote:
> On 16 nov. 2012, at 13:25, Oleg Vazhnev <ovaz...@gmail.com> wrote:
> Adding perf tests result (not sure if this is important).
> So the question is still why my original test is so slow.