One physical processor better than two !?

Sergio

Apr 16, 2012, 4:25:50 PM
to Disruptor
I have run the same test on two very similar machines, but one with
two physical processors (8 logical ones) and another one with one
physical processor (6 logical ones).

To my surprise, the one with just one processor beats the one with
two:

Two Processors:
AtomicQueue => 90,909,090 messages/sec
VolatileQueue => 62,500,000 messages/sec

One Processor:
AtomicQueue => 138,133,212 messages/sec
VolatileQueue => 90,336,277 messages/sec

I suspect it has to do with my lack of thread PINNING, which lets the
same thread bounce between the two processors, messing up the cache.
Or it could be that the sequences (producer and consumer) must be
shared between the two processors (to check whether the queue is full
or empty). That does not happen when the two threads run on the
same physical processor.
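To make the second hypothesis concrete, here is a minimal single-producer/single-consumer ring buffer sketched in Java. It is not MentaQueue's or the Disruptor's actual code, and the class and method names are hypothetical. It shows why the two sequences end up shared: the producer reads the consumer's sequence to detect a full queue, and the consumer reads the producer's sequence to detect an empty one, so those cache lines travel between whichever cores (or sockets) run the two threads.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongConsumer;

/**
 * Hypothetical single-producer/single-consumer ring buffer (illustration only).
 * Exactly one thread may call offer() and exactly one other thread may call drain().
 */
public final class SpscRingBuffer {
    private final long[] slots;
    private final int mask;
    // Next sequence to write; updated only by the producer, read by both threads.
    private final AtomicLong producerSeq = new AtomicLong();
    // Next sequence to read; updated only by the consumer, read by both threads.
    private final AtomicLong consumerSeq = new AtomicLong();

    public SpscRingBuffer(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        slots = new long[capacity];
        mask = capacity - 1;
    }

    /** Producer side: returns false instead of blocking when the queue is full. */
    public boolean offer(long value) {
        long p = producerSeq.get();
        // "Full" check: reads the consumer's sequence, typically last written on the consumer's core.
        if (p - consumerSeq.get() == slots.length) {
            return false;
        }
        slots[(int) (p & mask)] = value;
        // Release-store: the slot write becomes visible before the new sequence does.
        producerSeq.lazySet(p + 1);
        return true;
    }

    /** Consumer side: hands every available value to the handler and returns how many. */
    public int drain(LongConsumer handler) {
        long c = consumerSeq.get();
        // "Empty" check: reads the producer's sequence, typically last written on the producer's core.
        long available = producerSeq.get();
        for (long s = c; s < available; s++) {
            handler.accept(slots[(int) (s & mask)]);
        }
        if (available != c) {
            // Release-store: frees the drained slots for the producer.
            consumerSeq.lazySet(available);
        }
        return (int) (available - c);
    }
}
```

Cache-line padding around the two sequences is deliberately left out to keep the sketch short; without it, the two AtomicLong objects may land on the same cache line and add false sharing on top of the true sharing described above.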

Has anyone researched this? In other words, in a two-thread scenario,
is it better to have the Disruptor run on a single processor or to let
it use both?

I am about to dive into this to find out:
http://stackoverflow.com/questions/2238272/java-thread-affinity

However if someone has already done this research and has already
reached a conclusion it would be nice. :)

-Sergio



Michael Barker

Apr 16, 2012, 5:07:58 PM
to lmax-di...@googlegroups.com
You are generally correct. Scheduling can cause performance issues,
and how much it hurts depends on which OS, and which version of that
OS, you are running. Some 2.6 versions of Linux tend to be
particularly bad, i.e. they tend to move the process from CPU to CPU
quite frequently. I've seen better results on Windows and Mac OS X.
I've heard anecdotal evidence that the 3.3 kernel has some significant
improvements in this area.

One of the best ways to test this on Linux is to make use of the
taskset command to restrict the set of cores that the threads can be
scheduled on. I've seen significant performance improvements in
applications using this technique (not just the Disruptor). One of
the other problems that can occur is the OS attempting to schedule
application threads onto cores that are also being used to service OS
level interrupt requests. For improved performance on Linux you can
use irqbalance or the smp_affinity field in the /proc file system to
restrict the CPUs that are used to service interrupts and separate
them from the application threads.
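taskset (in its default mode) and the /proc/irq/<N>/smp_affinity files both describe a CPU set as a hexadecimal bitmask in which bit i set means logical CPU i is allowed; smp_affinity may split long masks into comma-separated 32-bit groups. The hypothetical helper below converts between such a mask and a list of CPU ids, which can help when deciding which cores go to the application threads and which are left for interrupts. It is a sketch, not part of any library.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for hex CPU masks as used by taskset and /proc/irq/<N>/smp_affinity. */
public final class CpuMask {
    /** Parses a hex mask (commas and an optional 0x prefix allowed) into logical CPU ids. */
    public static List<Integer> toCpuList(String hexMask) {
        String digits = hexMask.trim().replace(",", "");
        if (digits.startsWith("0x") || digits.startsWith("0X")) {
            digits = digits.substring(2);
        }
        BigInteger bits = new BigInteger(digits, 16);
        List<Integer> cpus = new ArrayList<>();
        for (int cpu = 0; cpu < bits.bitLength(); cpu++) {
            if (bits.testBit(cpu)) {
                cpus.add(cpu);
            }
        }
        return cpus;
    }

    /** Builds a hex mask with bit i set for every logical CPU id i. */
    public static String toHexMask(int... cpus) {
        BigInteger bits = BigInteger.ZERO;
        for (int cpu : cpus) {
            bits = bits.setBit(cpu);
        }
        return bits.toString(16);
    }

    public static void main(String[] args) {
        System.out.println(toCpuList("f"));                  // [0, 1, 2, 3]
        System.out.println(toCpuList("00000000,00000003"));  // [0, 1]
        System.out.println(toHexMask(2, 3));                 // c
    }
}
```

For example, the mask for CPUs 2 and 3 is `c`, so `taskset 0xc java ...` restricts the JVM to those two CPUs, the same as `taskset -c 2,3 java ...`.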

To summarise, thread affinity can improve performance.

Mike.

Sergio

Apr 16, 2012, 5:20:58 PM
to Disruptor
Thanks Mike. I am curious to see which one is better. Both consumer
and producer pinned to the same processor OR pinned to different
processors.

Since they have to share sequences, I am curious whether the cache
misses will be outweighed by the extra processor horsepower.

Any guesses or conclusions for that question?

-Sergio

Michael Barker

Apr 16, 2012, 5:23:50 PM
to lmax-di...@googlegroups.com
In my experience, running both on the same processor will perform
best, as the data will be shared via the L3 cache rather than over
the QPI link or through main memory.

Mike.

Martin Thompson

Apr 16, 2012, 5:25:57 PM
to Disruptor
Hi

Microbenchmarks look good via hyperthreading on the same processor,
but this is an unrealistic test.

Real tests need to be between processors as real work is typically
involved in processing an event. This requires CPU time for the
cycles to do that work.
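To follow this advice in a benchmark, the consumer can be given a tunable amount of CPU work per event. The helper below is a hypothetical stand-in for business logic, not part of the Disruptor API: it runs a xorshift mixing loop `iterations` times and returns the result so the JIT cannot discard the loop as dead code. Sweeping `iterations` lets you see whether, and at what per-event cost, running producer and consumer on separate cores or sockets starts to pay off.

```java
/** Hypothetical per-event workload for benchmarks. */
public final class SimulatedWork {
    /** Burns roughly `iterations` xorshift steps of CPU time for one event. */
    public static long process(long event, int iterations) {
        long x = event | 1; // xorshift needs a non-zero state
        for (int i = 0; i < iterations; i++) {
            x ^= x << 13;
            x ^= x >>> 7;
            x ^= x << 17;
        }
        return x; // the caller must consume this so the loop is not optimized away
    }

    public static void main(String[] args) {
        long events = 1_000_000;
        for (int iterations : new int[] {0, 10, 100, 1000}) {
            long sink = 0;
            long start = System.nanoTime();
            for (long e = 0; e < events; e++) {
                sink += process(e, iterations);
            }
            double nsPerEvent = (double) (System.nanoTime() - start) / events;
            // Printing sink keeps the work observable.
            System.out.printf("iterations=%d: %.1f ns/event (sink=%d)%n", iterations, nsPerEvent, sink);
        }
    }
}
```

In a queue benchmark, the consumer's handler would call `process(value, iterations)` and accumulate the result instead of only checking the sequence.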

Martin...

Sergio

Apr 16, 2012, 5:34:08 PM
to Disruptor
That makes sense, Martin. My test just checks whether the sequence is
correct, but in a real application something more complex and
time-consuming would probably need to be done.

Anyway, it looks like thread affinity will play an important role
when you have more than one physical processor. My next goal is to
understand and integrate the work Peter has done here:

https://github.com/peter-lawrey/Java-Thread-Affinity

So that at least the threads stick to the same core and you can have
an easy option to choose between virtual or real parallelism.

-Sergio

Michael Barker

Apr 16, 2012, 5:38:02 PM
to lmax-di...@googlegroups.com
Just for a little clarity, I'm assuming that when you say physical
processor, you mean socket.

On Mon, Apr 16, 2012 at 10:20 PM, Sergio <sergio.ol...@gmail.com> wrote:

Martin Thompson

Apr 16, 2012, 5:38:33 PM
to Disruptor
Thread affinity is so important to performance. Peter's library is a
great start.

It is worth noting that the latest versions of the Linux kernels are
much better at this. My tests on the 3.3 kernel show it is a huge
step forward.

Martin Thompson

Apr 16, 2012, 5:41:37 PM
to Disruptor
Good point, Mike. I'd assumed a processor core, which can have
hyperthreading. Both are valid but different interpretations.

Sergio

Apr 16, 2012, 5:54:04 PM
to Disruptor
> Just for a little clarity, I'm assuming that when you say physical processor, you mean socket.

That can be confusing, I know. Just to make sure I understand this
correctly:

Physical processor = chip = socket

One socket can have one or more cores. cores = logical processors

One core can run one or more threads.

Can two threads running in the same core share L1 and L2? I guess they
must, right?

Can two threads running in two separate cores in the same socket share
L1 and L2? Probably not because L1 and L2 are per core, right?

Where does hyperthreading come into play here? Is it the ability for a
core to run more than one thread? That was just a quick guess...

-Sergio

Martin Thompson

Apr 16, 2012, 6:32:33 PM
to Disruptor

On Apr 16, 10:54 pm, Sergio <sergio.oliveira...@gmail.com> wrote:
> > Just for a little clarity, I'm assuming that when you say physical processor, you mean socket.
>
> That can be confusing, I know. Just to make sure I understand this
> correctly:
>
> Physical processor = chip = socket
>
> One socket can have one or more cores. cores = logical processors

On Intel a core can have two logical processors. SMT allows the
pipeline to have two threads on the same core.

> One core can run one or more threads.

Only one at a time.

> Can two threads running in the same core share L1 and L2? I guess they
> must, right?

Only if they share the same core when they execute.

> Can two threads running in two separate cores in the same socket share
> L1 and L2? Probably not because L1 and L2 are per core, right?

No. L1 and L2 are core local.

> Where does hyperthreading come into play here? Is it the ability for a
> core to run more than one thread? That was just a quick guess...

Hyperthreading comes into play because each core has multiple
execution units that can be shared by two threads. This is especially
effective when cache misses occur or the PAUSE instruction is used.
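On Linux, the socket/core/logical-processor hierarchy discussed in this exchange can be read from sysfs: each /sys/devices/system/cpu/cpuN/topology directory holds a physical_package_id (the socket) and a core_id (the core within that socket). The sketch below groups logical CPUs by (socket, core); logical CPUs that land in the same group are hyperthread siblings sharing that core's L1 and L2. It assumes Linux and Java 11+ (for Files.readString), and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical helper: groups Linux logical CPUs by (socket, core) using sysfs. */
public final class CpuTopology {
    public static SortedMap<String, List<Integer>> read(Path cpuRoot) throws IOException {
        SortedMap<String, List<Integer>> cores = new TreeMap<>();
        // Matches cpu0, cpu1, ... but not cpufreq or cpuidle.
        try (DirectoryStream<Path> dirs = Files.newDirectoryStream(cpuRoot, "cpu[0-9]*")) {
            for (Path dir : dirs) {
                Path topology = dir.resolve("topology");
                if (!Files.isDirectory(topology)) {
                    continue; // skip entries without topology information
                }
                int cpu = Integer.parseInt(dir.getFileName().toString().substring(3));
                String socket = Files.readString(topology.resolve("physical_package_id")).trim();
                String core = Files.readString(topology.resolve("core_id")).trim();
                // core_id is only unique within a socket, so key on both.
                cores.computeIfAbsent("socket " + socket + " / core " + core, k -> new ArrayList<>())
                     .add(cpu);
            }
        }
        for (List<Integer> siblings : cores.values()) {
            Collections.sort(siblings);
        }
        return cores;
    }

    public static void main(String[] args) throws IOException {
        // Each line lists the logical CPUs (hyperthreads) that share one physical core.
        read(Paths.get("/sys/devices/system/cpu"))
            .forEach((core, cpus) -> System.out.println(core + " -> logical CPUs " + cpus));
    }
}
```

The keys sort as strings, so `core 10` prints before `core 2`; only the print order is affected.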

Sergio

Apr 20, 2012, 3:37:02 PM
to Disruptor
To run more tests with MentaQueue, I created a new Java thread
affinity project => http://mentaaffinity.soliveirajr.com

Now I have to find a machine with enough logical processors to test
this.

-Sergio