In my experience, pinning both to the same processor will perform best. As the
On Mon, Apr 16, 2012 at 10:20 PM, Sergio <sergio.oliveira...
> Thanks Mike. I am curious to see which one is better: both consumer
> and producer pinned to the same processor, or pinned to different
> processors. Since they have to share sequences, I am curious whether
> the cache misses will be outweighed by the extra processor horsepower.
> Any guesses or conclusions for that question?
> On Apr 16, 4:07 pm, Michael Barker <mike...@gmail.com> wrote:
>> You are generally correct. Scheduling can cause issues with
>> performance, and the degree to which it hurts performance depends on
>> which OS and version of said OS you are running. Some of the 2.6
>> versions of Linux tend to be particularly bad, i.e. they tend to move
>> the process from CPU to CPU quite frequently. I've seen better results on
>> Windows and Mac OS X. I've heard anecdotal evidence that the 3.3
>> kernel has some significant improvements in this area.
>> One of the best ways to test this on Linux is to make use of the
>> taskset command to restrict the set of cores that the threads can be
>> scheduled on. I've seen significant performance improvements in
>> applications using this technique (not just the Disruptor). One of
>> the other problems that can occur is the OS attempting to schedule
>> application threads onto cores that are also being used to service OS
>> level interrupt requests. For improved performance on Linux you can
>> use irqbalance or the smp_affinity field in the /proc file system to
>> restrict the CPUs that are used to service interrupts and separate
>> them from the application threads.
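
A minimal sketch of the affinity idea above, in Python rather than Java because the JVM has no built-in call for this. The helper names cpu_mask and pin_current_process are made up for illustration. cpu_mask builds the hexadecimal bitmask form (one bit per CPU index) that smp_affinity and taskset take; it assumes a machine with at most 32 CPUs, since larger masks are written as comma-separated 32-bit groups, which this does not handle. pin_current_process relies on os.sched_setaffinity, which is only available on some platforms (Linux among them).

```python
import os


def cpu_mask(cpus):
    """Return a hex bitmask (no 0x prefix) with one bit set per CPU index."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def pin_current_process(cpus):
    """Restrict the caller to the given CPUs; hypothetical helper."""
    if not hasattr(os, "sched_setaffinity"):
        raise NotImplementedError("os.sched_setaffinity is not available here")
    os.sched_setaffinity(0, set(cpus))  # pid 0 means "the caller"
    return os.sched_getaffinity(0)      # the affinity set now in effect
```

For a process that is launched from the shell, the equivalent is to start it under taskset, for example with "taskset -c 0,1" in front of the usual java command line.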
>> To summarise, thread affinity can improve performance.
>> On Mon, Apr 16, 2012 at 9:25 PM, Sergio <sergio.oliveira...@gmail.com> wrote:
>> > I have run the same test on two very similar machines, but one with
>> > two physical processors (8 logical ones) and another one with one
>> > physical processor (6 logical ones).
>> > To my surprise, the one with just one processor beats the one with
>> > two:
>> > Two Processors:
>> > AtomicQueue => 90,909,090 messages/sec
>> > VolatileQueue => 62,500,000 messages/sec
>> > One Processor:
>> > AtomicQueue => 138,133,212 messages/sec
>> > VolatileQueue => 90,336,277 messages/sec
>> > I suspect it has to do with my lack of thread PINNING, which makes the
>> > same thread bounce between the two processors, messing up the cache.
>> > Or it could be that the sequences (producer and consumer) must be
>> > shared between the two processors (for checking against full and empty
>> > queue). That does not happen when you have the two threads running on
>> > the same physical processor.
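
To make the sequence sharing described above concrete, here is a rough sketch of a single-producer, single-consumer ring buffer. It is not the AtomicQueue or VolatileQueue from the test, nor the Disruptor itself; the class and field names are invented, and Python has no volatile or atomic fields, so memory ordering is left out entirely. It only shows which side reads which sequence: the full check makes the producer read the consumer's sequence, and the empty check makes the consumer read the producer's. When the two threads sit on different physical processors, those are the reads that have to cross between caches.

```python
class SpscRing:
    """Single-producer, single-consumer ring buffer (illustrative only)."""

    def __init__(self, capacity):
        if capacity <= 0 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two")
        self._slots = [None] * capacity
        self._mask = capacity - 1
        self.producer_seq = 0  # next slot to write; only the producer updates it
        self.consumer_seq = 0  # next slot to read; only the consumer updates it

    def offer(self, item):
        # Full check: the producer has to read the consumer's sequence.
        if self.producer_seq - self.consumer_seq == len(self._slots):
            return False
        self._slots[self.producer_seq & self._mask] = item
        self.producer_seq += 1
        return True

    def poll(self):
        # Empty check: the consumer has to read the producer's sequence.
        if self.consumer_seq == self.producer_seq:
            return None
        item = self._slots[self.consumer_seq & self._mask]
        self.consumer_seq += 1
        return item
```

The sketch uses None to signal an empty queue, so None itself cannot be stored as a message.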
>> > Has anyone done any research on that? In other words, in a two-thread
>> > scenario, is it better to have the Disruptor run on a single processor
>> > or to let it use both?
>> > I am about to dive into this to find out:
>> > However if someone has already done this research and has already
>> > reached a conclusion it would be nice. :)
>> > -Sergio