Is there a C++ port of the Disruptor pattern in the works?


Gravitas

Aug 4, 2011, 11:44:54 AM
to Disruptor
Hi,

There are already .NET and Java versions of the Disruptor pattern - is
there a C++ version in the works?

If there is no C++ version - what are your thoughts on the performance
gains from switching to C++?

Martin Thompson

Aug 4, 2011, 4:31:10 PM
to Disruptor
Hi Shane,

We are considering a port of the Disruptor to C++. There are a few
performance gains to be made on specific platforms but I think they will
be marginal. I'm considering a port for more specialised tasks like
replacing part of the network stack.

Martin...

Gravitas

Aug 6, 2011, 2:20:00 PM
to Disruptor
Hi,

Excellent. I'd be happy to help (I have 10 years of experience in C++).

As the Disruptor code is so clean and compiles into efficient code, I
think the performance gains might be minimal (but you never know until
you try). The Intel C++ compiler is reported to have 15% to 30% speed
gains over the Microsoft C++ compiler.

For support of existing high performance C++ programs, a C++ version
would be absolutely superb.

Martin Thompson

Aug 7, 2011, 6:10:34 PM
to Disruptor
Thanks Shane,

I have a new version of the Disruptor that can give a 15%-20%
performance improvement, but I'm battling with Hotspot. Hotspot often
re-compiles to a worse-performing version after a few runs. I plan
to blog on this shortly. However, I don't think it will make the same
"mistakes" when it is not running a micro-benchmark.

I spent most of the 90s writing C++ and miss some of the control it
offers. If memory serves me right the Watcom guys had a great C++
compiler that got acquired by Intel. This is probably why they have
better output than the Microsoft C++ compiler. I do love Java, but
C/C++ can give more predictable results. It is not so much about
outright performance as about predictability.

For reference I think 30m+ messages per second is possible on a
standard machine with C++ at 30-40ns latency between threads for the
1P1C case. If using C/C++ we can also specialise by platform to avoid
false sharing and use the most efficient memory barriers available.
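A rough sketch of the kind of platform-specific specialisation meant here: a sequence counter padded onto its own cache line (64-byte line size and GCC alignment attribute assumed; the struct name is made up):

#include <cstdint>

// Hypothetical padded counter: keeps each hot sequence on its own
// 64-byte cache line (line size assumed) so producer and consumer
// counters never false-share.
struct PaddedSequence
{
    volatile int64_t value;                  // the hot counter
    char padding[64 - sizeof(int64_t)];      // fill out the rest of the line
} __attribute__((aligned(64)));              // GCC/Clang-specific alignment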

Watch this space.

Martin...

Fil Mackay

Aug 7, 2011, 10:26:08 PM
to Disruptor
> For reference I think 30m+ messages per second is possible on a
> standard machine with C++ at 30-40ns latency between threads for the
> 1P1C case.  If using C/C++ we can also specialise by platform to avoid
> false sharing and use the most efficient memory barriers available.

Do you really mean 30-40 nanoseconds between threads? What technique
do you use to pass data between threads to achieve this? Shared data,
presumably, but I'm not sure how latency in this ballpark can be achieved.

Regards, Fil.

Martin Thompson

Aug 8, 2011, 1:39:02 AM
to Disruptor
Yes I do mean 30-40ns to exchange a single word message between two
threads. Memory barriers are the technique to do this.

http://mechanical-sympathy.blogspot.com/2011/07/memory-barriersfences.html

The current version of the Disruptor can do ~50ns between threads at a
rate of 1m messages per second. Check the following performance test.

http://code.google.com/p/disruptor/source/browse/trunk/code/src/perf/com/lmax/disruptor/Pipeline3StepLatencyPerfTest.java

This test does not work on Windows because of clock resolution. I've
measured it successfully on Linux with the latest JVMs on hardware
with an invariant TSC. To do this effectively you need to know your
implementation is executing the "rdtsc" asm instruction.
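A minimal sketch of reading the TSC from C++ for this kind of measurement (GCC/Clang intrinsic; read_tsc is a made-up helper, and converting cycles to nanoseconds still needs the measured TSC frequency):

#include <cstdint>
#include <cstdio>
#include <x86intrin.h>                 // __rdtsc on GCC/Clang, x86/x64 only

static inline uint64_t read_tsc()
{
    return __rdtsc();                  // compiles down to the rdtsc instruction
}

int main()
{
    const uint64_t start = read_tsc();
    // ... code under test ...
    const uint64_t end = read_tsc();
    std::printf("elapsed cycles: %llu\n",
                static_cast<unsigned long long>(end - start));
    return 0;
}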

Fil Mackay

Aug 8, 2011, 1:49:25 AM
to lmax-di...@googlegroups.com
Right.. I had written a small app to do a minimalist implementation and could not get this - but it was the Windows time measurement screwing with me :)

What I'm doing now is getting two threads to pass messages back and forth, then measuring a whole batch of them.

Unfortunately I can't use the Disruptor (yet) since I need cross-process queues.. I'm also playing with a new queue-ish structure that stores the state of an object (not messages). Then you can use it as a cache to get the latest value at any time. Thinking of using a sequence number which would be checked pre- and post-read to ensure the read was consistent.
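A minimal sketch of that pre-/post-read sequence check - essentially a seqlock-style "latest value" holder (names are made up; assumes C++11 <atomic> or boost::atomic, a single writer, and x86-friendly ordering rather than a fully portable seqlock):

#include <atomic>
#include <cstdint>

template <typename T>
class LatestValue
{
public:
    void write(const T& v)                                  // single writer assumed
    {
        seq_.fetch_add(1, std::memory_order_relaxed);       // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);
        value_ = v;
        seq_.fetch_add(1, std::memory_order_release);       // even: stable again
    }

    T read() const
    {
        T copy;
        uint64_t before, after;
        do
        {
            before = seq_.load(std::memory_order_acquire);
            copy = value_;
            std::atomic_thread_fence(std::memory_order_acquire);
            after = seq_.load(std::memory_order_relaxed);
        } while ((before & 1) != 0 || before != after);     // retry torn/in-progress reads
        return copy;
    }

private:
    std::atomic<uint64_t> seq_{0};
    T value_{};
};

Readers never block the writer; they simply retry if a write overlapped their copy.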

Regards, Fil.

Gravitas

Aug 9, 2011, 1:10:19 PM
to Disruptor
> For reference I think 30m+ messages per second is possible on a
> standard machine with C++ at 30-40ns latency between threads for the
> 1P1C case.  If using C/C++ we can also specialise by platform to avoid
> false sharing and use the most efficient memory barriers available.
>
> Watch this space.

Would I be correct in saying that if you can do 30m+ messages per
second on standard Intel hardware, it's a new world record?

Martin Thompson

Aug 9, 2011, 3:40:11 PM
to Disruptor

cde1537

Aug 12, 2011, 10:26:57 AM
to Disruptor
So how does this ~10% performance gain to be had on all memory
exchanges using C++ affect your decision on whether to port the
Disruptor over to C++? I haven't programmed in Java in a few years
now, but as was mentioned earlier I would have to think that the
control you have in C++ would provide other ways to either reduce
jitter or lower latency.

I would personally love to try integrating the Disruptor into some of
our production processes and see how it fares against some of our
queue-based architectures. I've used ZeroMQ in cases where the 1P-1C
model allows it, but there are a lot of problems in our system that are
many-P/many-C that the Disruptor would be useful for.

~Chris

On Aug 9, 2:40 pm, Martin Thompson <mjpt...@gmail.com> wrote:
> Check out my latest blog :-)
>
> http://mechanical-sympathy.blogspot.com/2011/08/inter-thread-latency....

Gravitas

Aug 13, 2011, 6:45:33 AM
to Disruptor
I suspect there may be a much higher performance gain when switching
to C++. C++ allows you to allocate a perfectly contiguous area of
memory for the ring buffer, which means the memory subsystem can
pre-fetch the next item in the ring buffer more efficiently.
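For illustration, the sort of layout meant here, with every entry in one contiguous pre-allocated block (the Entry layout, names and 64-byte line size are assumptions, not disruptor-cpp code):

#include <cstddef>
#include <cstdint>
#include <vector>

struct Entry
{
    int64_t sequence;
    char    payload[56];                          // pad each entry to one 64-byte line (assumed)
};

class RingBuffer
{
public:
    explicit RingBuffer(std::size_t size)         // size must be a power of two
        : mask_(size - 1), entries_(size) {}

    Entry& at(int64_t sequence)                   // sequence -> slot, no modulo needed
    {
        return entries_[static_cast<std::size_t>(sequence) & mask_];
    }

private:
    const std::size_t  mask_;
    std::vector<Entry> entries_;                  // one contiguous allocation
};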

On a high-end Intel i7, the maximum bandwidth between two CPUs is
93 Gbyte/second. The .NET version of the Disruptor uses 5 Gbyte/second
of this bandwidth when transferring packets that are 1,500 bytes in
size between two CPUs. It runs at 4 million ops per second.

Why do the C++ version? Because ultimately, if you really want high
performance computing, it's the only way to squeeze that extra order of
magnitude of performance out of the code. And you also get to set new
world records - think of your CV :)

RobotTwo

Aug 13, 2011, 10:20:57 PM
to Disruptor
Hi all,

I see there's interest here, so I thought you should know that I
ported the Disruptor to C++ and started a new Google Code project for
it at http://code.google.com/p/disruptor-cpp/. See
http://www.2robots.com/2011/08/13/a-c-disruptor/ for some background.

I think the main value isn't so much that the C++ version would be
faster than the Java one (although that may indeed be possible), but
that you can use it from new and existing C++ code.

The main implementation is complete, and I've run some testing
configurations which seem to be functioning fine. However, I'd welcome
anyone who can help port the unit and performance testing framework
from the Java version to C++. This will ensure correctness, and also
allow us to produce a more apples-to-apples performance benchmark.

Thanks,
-Dan

Xin Wang

Aug 14, 2011, 9:58:37 AM
to lmax-di...@googlegroups.com
Have you considered using templates instead of dynamic polymorphism? I think the Disruptor is a perfect example for exercising policy-based design, and you could get rid of the virtual functions (some performance gain in theory).
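A sketch of that idea - the wait strategy as a template (policy) parameter so the call is resolved at compile time rather than through a vtable (all names here are illustrative, not the real Disruptor API):

#include <atomic>
#include <cstdint>

class Sequence
{
public:
    int64_t get() const    { return value_.load(std::memory_order_acquire); }
    void    set(int64_t v) { value_.store(v, std::memory_order_release); }
private:
    std::atomic<int64_t> value_{-1};
};

struct BusySpinWaitStrategy                       // a policy, not a virtual interface
{
    int64_t wait_for(int64_t sequence, const Sequence& cursor) const
    {
        int64_t available;
        while ((available = cursor.get()) < sequence)
        {
            // busy spin
        }
        return available;
    }
};

template <typename Event, typename WaitStrategy>
class RingBuffer
{
public:
    int64_t wait_for(int64_t sequence) const
    {
        return wait_strategy_.wait_for(sequence, cursor_);   // inlineable, no virtual call
    }
    // ... claim/publish omitted ...
private:
    WaitStrategy wait_strategy_;
    Sequence     cursor_;
};

// Usage: RingBuffer<MyEvent, BusySpinWaitStrategy> ring;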

Hart

Aug 19, 2011, 9:49:55 AM
to Disruptor
I was working on a C++ port myself. One area that I'm not so sure
about is the volatile sequence values in the RingBuffer and
BatchConsumer. Volatile semantics are not the same between Java and
C++. I think in the C++ version you'll need to use read/write memory
barriers around the sequences to provide the same semantics as under
Java.


Martin Thompson

Aug 19, 2011, 10:55:06 AM
to Disruptor
Java has had a defined, cross-platform memory model since 1.5.
Unfortunately C++ does not.

For C++ you are OK if the variables for the sequences are qualified as
volatile *and* you are on x86/x64 *and* you use a software compiler
memory barrier after the write, e.g. barrier() on Linux. You have to
be sure your code is safe for "loads can be reordered with older
stores". For the Java implementation this is the case. If you have
any doubt you should use a primitive that generates a store memory
barrier, eventually resulting in a "lock" or "mfence/sfence"
instruction. The GNU atomic builtins or Boost are good examples of
this.

For C++ you need to consider all the platforms it can be compiled on,
plus how it gets compiled.
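One reading of that advice in code (x86/x64 only; the buffer and helper names are made up, and barrier() is the user-space equivalent of the Linux kernel macro - a compiler-only fence):

#include <cstdint>

#define barrier() __asm__ __volatile__("" ::: "memory")   // compiler fence, emits no instruction

static int64_t          entries[1024];      // ring buffer storage (illustrative)
static volatile int64_t cursor = -1;        // published sequence

void publish(int64_t sequence, int64_t value)
{
    entries[sequence & 1023] = value;       // write the entry
    barrier();                              // stop the compiler reordering the two stores
    cursor = sequence;                      // x86 store ordering keeps them in order in hardware
}

If in doubt, a full fence such as the GNU __sync_synchronize() builtin trades a little throughput for the stronger guarantee.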


RobotTwo

Aug 19, 2011, 11:00:07 AM
to Disruptor
For the atomic variables I used boost::atomic, which has the same
semantics as the std::atomic soon to appear in C++0x (and is
basically the same as Java's atomics).

I didn't use any extra barriers around the volatiles -- it's not clear
to me from the comments in this thread if you meant that barrier() is
necessary for the volatiles as well as the atomics.

Hart -- I encourage you to contribute to http://code.google.com/p/disruptor-cpp/
rather than start a new project. Given the same amount of effort,
everyone will end up ahead.


-Dan

Michael Barker

Aug 19, 2011, 11:06:55 AM
to lmax-di...@googlegroups.com
The key semantic that should be maintained is that the write to the
ring buffer event/entry must happen before the update to the sequence
number and be made visible to other threads in that order. Recently
we (and by we I mean Martin) discovered that in Java a full volatile
variable is not required to preserve this. AtomicLong.lazySet is
sufficient in this case and is significantly faster, as it does not
require the store buffer to be flushed, which is necessary when using
a volatile variable.

Mike.

Martin Thompson

Aug 19, 2011, 11:13:52 AM
to Disruptor
The boost atomic store with memory_order_release for the write would
be sufficient if paired with a load using memory_order_acquire. The
compiler should then not reorder around them and will generate the
appropriate memory fences. On x86/x64 the stores and loads are just
simple mov instructions issued in the right order for the full qword
and not cached in registers. I'd expect a C++ implementation taking
this approach to be very high performance. Many libraries for these
primitives exist: Linux has its own, GNU has the atomic builtins, Boost
has the C++0x support, Intel has macros, etc. The threads in this group
have touched on many of them, so it is probably a little confusing. Pick
your compiler, hardware, OS, and libs to suit your preferences/needs.
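A minimal sketch of that pairing with C++0x-style atomics (boost::atomic exposes the same interface; the Event type and names are illustrative):

#include <atomic>
#include <cstdint>

struct Event { int64_t value; };

static Event                entries[1024];
static std::atomic<int64_t> cursor(-1);

void publish(int64_t sequence, const Event& e)
{
    entries[sequence & 1023] = e;                        // 1. write the entry
    cursor.store(sequence, std::memory_order_release);   // 2. release-store the sequence
}

int64_t wait_for(int64_t sequence)
{
    int64_t available;
    while ((available = cursor.load(std::memory_order_acquire)) < sequence)
    {
        // busy spin or yield
    }
    return available;                                    // entries up to 'available' are now visible
}

On x86/x64 both the release store and the acquire load compile to plain mov instructions, as described above.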


Hart

Aug 20, 2011, 10:18:02 AM
to Disruptor
I started my C++ port before I discovered that you had already posted
a port several days earlier. Regardless, I don't work at a company
that would allow me to post my port anyway, but I'd be happy to
contribute to the existing port. Another issue with any C++ port, as
Martin pointed out, is that the low-level details are going to vary
based on the hardware you're running on, the compiler you're using,
and personal preferences and/or corporate standards for external
libraries (Boost, ACE, TBB, etc.). I see that you used Boost, but
we're unfortunately stuck on an old version of Boost. I went the
direction of using a pthread mutex and condition variable as they're
more standard and don't add any additional third-party library
dependencies. I also used the GNU atomic builtins, but then you need
a more recent version of the compiler to have access to those.


Hart

Aug 20, 2011, 10:36:20 AM
to Disruptor
So, I didn't mark my C++ variables as volatile but instead inserted an
LFENCE before reading them and an SFENCE after writing them. I think
this is analogous to a Java volatile, based on some other Disruptor
documentation I found. Is this not sufficient, or is it possibly
overkill? Do I still need to mark the variables as volatile?

Martin Thompson

Aug 20, 2011, 10:56:22 AM
to Disruptor
Depends on your compiler and options. If you try the C++ example I
use in my blog entry on inter-thread latency the issue can be seen.

http://mechanical-sympathy.blogspot.com/2011/08/inter-thread-latency.html

Try removing the volatile keywords and compile the code with and
without the -O3 option and see what happens. The compiler can cache
the variable in a register and spin forever. Preserving order and
flushing the store buffer is achieved via fences; they don't change the
usage of registers. You have to make sure the compiler generates the
right instructions in the right order.

My understanding is that the fences are a better option on AMD and
lock-prefixed instructions on Intel for performance with the same
semantics. I have not measured this extensively though.
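A cut-down illustration of that pitfall (not the blog code itself): without volatile, g++ -O3 is entitled to hoist the load out of the loop, cache it in a register, and spin forever.

#include <pthread.h>
#include <unistd.h>
#include <cstdio>

static /* volatile */ int flag = 0;      // uncomment volatile and the hang goes away

void* waiter(void*)
{
    while (flag == 0)
    {
        // with -O3 and no volatile, this load may be hoisted out of the loop
    }
    std::printf("saw the flag\n");
    return 0;
}

int main()                               // g++ -O3 spin.cpp -lpthread
{
    pthread_t t;
    pthread_create(&t, 0, waiter, 0);
    usleep(100000);                      // let the waiter start spinning
    flag = 1;                            // may never be observed without volatile
    pthread_join(t, 0);
    return 0;
}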

Hart

Aug 29, 2011, 12:01:04 PM
to Disruptor
So I've updated my port to include all of the 2.0 interface changes
and enhancements. I still have some questions / concerns about the
Sequence atomic I'm using, since there's no direct equivalent to the
atomic array you're using. I switched from TBB atomic to Boost atomic
as it provides more low-level control over memory ordering options.

That aside, I started porting the test cases and found the results to
be a bit strange. The test case performance results seem to line up
with the numbers published for Java except for the Sequencer 3P/1C and
Multicast 1P/3C test cases. For some reason the Sequencer test case
never seems to get much above 1 million TPS and the Multicast test
case peaks at about 3.5 million TPS. Here are the numbers I'm seeing
for the different test cases.

Unicast 1P/1C; Yielding; 14,039,811 TPS Mean; 14,894,143 TPS Peak
Unicast 1P/1C; Busy Spin; 20,120,166 TPS Mean; 28,091,542 TPS Peak
Unicast 1P/1C Batched; Yielding; 19,630,858 TPS Mean; 20,666,422 TPS Peak
Unicast 1P/1C Batched; Busy Spin; 32,227,806 TPS Mean; 33,053,706 TPS Peak
Sequencer 3P/1C; Yielding; 1,547,915 TPS Mean; 1,771,610 TPS Peak
Pipeline 1P/3C; Yielding; 18,067,570 TPS Mean; 27,891,303 TPS Peak
Pipeline 1P/3C Batched; Yielding; 31,466,469 TPS Mean; 36,902,241 TPS Peak
Multicast 1P/3C; Yielding; 5,307,468 TPS Mean; 5,706,506 TPS Peak
Multicast 1P/3C Batched; Yielding; 3,248,642 TPS Mean; 3,503,955 TPS Peak
Diamond 1P/3C; Yielding; 18,595,194 TPS Mean; 26,639,389 TPS Peak

It seems strange that the Sequencer and Multicast test cases are so
far below the Java numbers when the rest of the test cases seem to meet
or exceed the Java numbers (at least the old ones). Any ideas what might
be going on here or what I should look at?

Related to this, I'm having problems running the latency performance
test. Given that the latency test case is based on the 3-step pipeline
case, which generated my best numbers (30 million plus TPS), I would
expect some really low latencies. But when I run the latency test I
get mean latencies well over 1000 nsec. I'm using the RDTSC operation
to generate my timestamps. On my test machine it appears to take about
20 nsec to generate the rdtsc timestamp, but the latency performance
test case factors this cost out. It appears that the act of trying to
measure the latency is completely altering the latency profile. I have
a feeling that calling the RDTSC instruction is perhaps flushing
registers / cache lines, or for whatever reason isn't being
"processor sympathetic". Maybe it's also synchronization issues
between the RDTSC values on the different processors? My understanding
is that there's no guarantee that these values will be in sync between
processors, but then I don't seem to have a better nanosecond-resolution
timer option?
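One way to sanity-check the cross-core concern: rdtscp also returns the IA32_TSC_AUX value, which Linux fills with an identifier for the CPU the read executed on, so a sample can be discarded if the thread migrated between the two reads (GCC/Clang intrinsic; helper names are made up):

#include <cstdint>
#include <x86intrin.h>                       // __rdtscp on GCC/Clang, x86/x64 only

struct TscSample
{
    uint64_t tsc;
    unsigned cpu;
};

static inline TscSample read_tscp()
{
    TscSample s;
    s.tsc = __rdtscp(&s.cpu);                // waits for earlier instructions, unlike plain rdtsc
    return s;
}

// Only trust (end.tsc - start.tsc) when start.cpu == end.cpu, and ideally
// pin each thread to a core so migration cannot happen at all.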

Thanks, Ryan




cde1537

Aug 29, 2011, 12:30:50 PM
to Disruptor
Hart,

Is there a project started that contains your ported code (much like
RobotTwo's disruptor-cpp) for review?

~Chris

Dan Eisner

Aug 30, 2011, 7:16:14 AM
to lmax-di...@googlegroups.com


If there's a C++ implementation people prefer more than disruptor-cpp, I'd be happy to update the code with that version. I'd rather maintain a repository people use than not.

Sent from my iPhone

cde1537

Sep 27, 2011, 11:25:19 AM
to Disruptor
Dan,

Your port was a great contribution to the community. It was the
perfect entry point for getting the Disruptor worked into existing C++
projects. I've been following the Java changes, and some of the
changes to the API made in the past month have been very interesting
(the worker pool, and event processors vs consumers, among others). I
started porting to 2.0 when that came out, but before I was able to
finish there were 2.0.1, 2.0.2, 2.5, etc.

Sometimes my professional workload allows me to spend time on side
projects like this and other times not so much. The last month has
been fairly hectic and I had to put all this aside. As far as I can
tell there is no publicly released port of the Disruptor that matches
API 2.0 or higher. If anyone has found one, could they please point me
to it? If not, perhaps Martin or someone else at LMAX can comment on
the possibility of an official port being in the works? If I recall,
there were comments made around the 2.0 release that the changes to
the API were in part made to simplify the porting process.

Thanks,
Chris


Daniel Eisner

Sep 27, 2011, 12:40:55 PM
to lmax-di...@googlegroups.com

Thanks -- yeah, I ran into the same issue: the official release is moving too fast for me to keep up on the C++ side (which is good!). Having said that, anyone who can contribute patches to bring it up to speed is welcome!

Martin Thompson

Sep 28, 2011, 6:51:17 AM
to Disruptor
I realise there was a flurry of changes lately but this should be
settling down now. A lot of the reason for the change was to make the
code more generic and less Java-specific, for example by reducing the
use of inner classes. This should make a port to C++, or similar
imperative OO languages, much easier.

I have considered doing a C++ port myself many times but it tends to
be driven by whether I have a need for it professionally. This may still
happen before the end of the year.

Do people have a preference for OS, hardware architecture, concurrent
primitives library, and compiler for a C++ port?

Martin...

cde1537

Sep 28, 2011, 2:35:43 PM
to Disruptor
Martin,

Those flurries of change were all positive. You've done a lot of great
work and have implemented a whole slew of requests that people have
put forth. I took the approach that there was enough other work on my
plate already that there would be time to let this project shake out
and come back to it later when time permitted. I see that possibility
for me in the near future.

To answer your closing question, my preferences would be as follows:
OS - Red Hat
Architecture - 64-bit Intel
Concurrent primitives library - GNU builtins or Boost
Compiler - gcc

~Chris

Carl Cook

Sep 30, 2011, 9:41:29 AM
to Disruptor
Hi all,

Not to offend anyone, but I'm quite curious as to why the Disruptor
(and LMAX for that matter) were written in Java. I appreciate that you
get cross-platform support for free (excluding the occasional
incompatibility), and Java is a higher-level language (making it
somewhat easier not to worry about low-level concepts such as
pointer aliasing, memory barriers, etc), but then again, for the
Disruptor you probably don't want non-deterministic garbage collection,
and you do want very low-level control. Or is the Disruptor written in
Java primarily to help speed up Java-based applications? I ask because
I too would be interested in a C++ port - it seems like an interesting
project to bring concepts such as lock-free queues, pre-allocated
memory, and cache-friendliness together and compare performance
against more traditional approaches.

Martin Thompson

Sep 30, 2011, 6:31:53 PM
to Disruptor
The Disruptor could be written in many languages and benefit any of
them. The goal is not to have the absolute best possible
implementation. If it was, even in Java I'd not use any polymorphic
calls and would make everything absolutely business-case specific. The
goal for LMAX in releasing the Disruptor is to stimulate the trading
community, many of whom may write clients that could trade against
our exchange. :-) After surveying retail trading customers we found
Java and .NET are the most common platforms.

Now if you want to get the absolute best possible performance with the
lowest and most predictable latency you probably will not even be
using a mainstream OS. Most likely a C/C++ application running on an
embedded-type OS on an over-clocked and liquid-cooled system, paired
with some FPGA support for specific tasks. Not exactly mainstream :-)

C++ is a great platform for the Disruptor; however, it is a significant
effort to create a version that beats the Java performance and is
portable. A pure Linux/g++/Boost/x64 version is not a major effort,
but others will want Windows or BSD, then there is the range of
compilers followed by the range of ways to manage the memory model
depending on CPU architecture. All possible to work with but
non-trivial. I might just pick a combination that suits me and release
it, then let others work on the portability.

At the end of the day, in my experience, the biggest issues will
still be found in the algorithms used in the business logic. To keep
something as tight as the Disruptor and apply similar thinking to
every line of code you cannot work with teams bigger than 4-6 people.
Java and .NET are not bad approaches when you do not need the
absolute best performance but have a lot of business logic to develop
for a moderate-sized team. Sure there are some gotchas but they can
be managed with the right blend of skills in a team.

I've written a lot of code in C++, yet I wrote a matching engine in
Java that was faster than any others in C/C++ I've heard of. I'm
sure I could do much better, even in Java, having done it a few times
already. Side by side I can make a C++ algorithm beat Java, but it
does take more work. Languages have certain features, but in the end,
when things are more than micro-benchmarks, I find it is the algorithms
that matter the most.

"What is commercially good enough?" is the question we must
always ask.

Gravitas

Oct 2, 2011, 6:57:36 PM
to Disruptor
Hi Martin,

> Side by side I can make a C++ algorithm beat Java but it
> does take more work.  Languages have certain features but in the end
> when things are more than micro benchmarks I find it is the algorithms
> that matter the most.

I appreciate your comment; from someone of your experience this is
useful to know.

In addition, I also have to thank you for one of the comments you made
in the talk at TradeTech 2011. You mentioned a few times (and I
paraphrase) "one class, one thing - the complexity of a class grows as
the square of its concerns." Over the past month, I've been cutting
down the number of concerns in each class, and creating smaller
classes that can easily be tested. This approach certainly makes testing
a snap, and it really does reduce the complexity of the program from
an overall maintenance point of view.

Shane.

Francois Saint-Jacques

Oct 5, 2011, 10:34:09 PM
to Disruptor
Hello,

I've made a port called disruptor--, it's quite different from Dan's
disruptor-cpp.

- no dependency on Boost, only C++11 features
- no compilation needed, 100% templated
- autotools integrated for tests/benchmarks
- almost up to date with LMAX's latest Disruptor 2.6
- uses the Google C++ style guide

This is an early port, probably not production-ready. The next phase is
testing and benchmarks.

Now for the number crunchers, benchmarks were run on an (old) Core 2 Duo
E8400.

For comparison, I've run Martin's inter-thread latency test
(http://mechanical-sympathy.blogspot.com/2011/08/inter-thread-latency.html)
50 times, getting a mean of 32,451,655 ops/sec with a standard dev. of
55,616. The only test supported ATM is 1P-1EP (1 publisher, 1 batch
event processor), getting a mean of 29,785,805 ops/sec, standard dev.
of 1,475,971, and maximum of 32,211,178 (pretty close to the
inter-thread ops/sec).

I have also run the benchmark on a Core i7 930, but there's a really
high variance (12m-60m ops/sec). I'm still trying to find out why
there is such a high variance before posting numbers.

Francois

mikeb01

Oct 6, 2011, 2:43:07 AM
to lmax-di...@googlegroups.com
> I've made a port called disruptor--, it's quite different from Dan's
> disruptor-cpp.
>
> - no dependency on Boost, only C++11 features
> - no compilation needed, 100% templated
> - autotools integrated for tests/benchmarks
> - almost up to date with LMAX's latest Disruptor 2.6
> - uses the Google C++ style guide

Sounds interesting, do you have a link to the source?

Mike. 

Francois Saint-Jacques

Oct 6, 2011, 8:30:43 AM
to lmax-di...@googlegroups.com
https://github.com/fsaintjacques/disruptor--

François

--
Sent from my jetpack.

Francois Saint-Jacques

Oct 11, 2011, 1:54:16 PM
to Disruptor
Is there any information on the exact procedure to replicate the
performance tests you guys are doing?

- I'd like to know how you deal with CPU affinity, taskset vs cpuset.
Do you exploit HyperThreading?
- Which JVM do you use, and with what flags?
François


Hart

Oct 11, 2011, 3:19:34 PM
to Disruptor
I'm in the same boat. I was working on the 2.X ports and having
problems keeping up. I've since created a 2.6 port and my initial
performance numbers didn't look good with the 2.6 changes and I
haven't had a chance to go back and figure out why yet.

Martin Thompson

Oct 12, 2011, 7:44:33 AM
to Disruptor
Is the question "how do we run the Java performance tests"? Or is
this related to the C++ ports?


Francois Saint-Jacques

Oct 12, 2011, 8:34:13 AM
to lmax-di...@googlegroups.com
This is not related to the C++ port. I should have put that question
in a different post.

I want to replicate the performance tests.


Francois

--
Sent from my jetpack.

Martin Thompson

Oct 13, 2011, 8:25:43 AM
to Disruptor
The results posted are average runs on the standard Hotspot JVM with
default settings on the hardware listed. Runs can be significantly
better if using CPU affinity as can be seen in the article posted
below:

http://java.dzone.com/articles/java-threads-steroids

I'm planning on open sourcing my ExecutorService that allows for
thread affinity on Linux in the next few weeks, and will then republish
the current performance results, which have improved significantly.
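For the C++ ports discussed in this thread the same effect can be had directly; a sketch of pinning the calling thread to one core on Linux (function name and core number are arbitrary):

#include <pthread.h>
#include <sched.h>
#include <cstdio>

// Pin the calling thread to a single core (Linux; g++ defines _GNU_SOURCE
// by default, otherwise define it before the includes; link with -lpthread).
static bool pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set) == 0;
}

int main()
{
    if (!pin_to_core(2))                              // arbitrary core number
        std::fprintf(stderr, "failed to set affinity\n");
    // ... run the pinned producer or consumer loop here ...
    return 0;
}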


Francois Saint-Jacques

Oct 13, 2011, 9:25:00 AM
to lmax-di...@googlegroups.com
My next question is about processor topology. I've seen a huge
performance boost by exploiting HyperThreading in the unicast
benchmark. But I feel like it's cheating, since it's on the same core.

This is the kind of details I'm also interested in.

Francois

Martin Thompson

unread,
Oct 13, 2011, 10:04:34 AM10/13/11
to Disruptor
We do not use HT to get good results. I often turn it off in the BIOS
when testing, or I bind to particular cores to avoid it. In the real
world HT has advantages for extra concurrent threads when doing a lot
of IO or incurring cache misses, allowing the pipeline to keep
going for a second thread on that core. We try to avoid both, so it is
not a useful real-world win, because typical uses of the Disruptor are
CPU bound.
