I'm not sure what causes the degradation at 9 cores. It could be
related to having to move data across sockets, and there could also
be some false sharing involved. Running perf stat might give more
information.
I haven't looked at the Abstraktor in enough detail yet to
understand the difference, but I can think of some possible
optimisations to the Disruptor implementation. For example, it is
possible to replace the Disruptor's backing data store with something
that is more cache friendly.
If I were passing the results out of each of the handlers using the
Disruptor, I'd probably have a small Disruptor per handler and
have a single thread polling multiple ring buffers. This is not
particularly well supported in the Disruptor yet, but it is the next
major change I'm looking at. If you have a single Disruptor for the
results, you will probably introduce contention. One other
optimisation, if you are taking this approach: you don't need to
publish the current sum on every incoming message, only when the
endOfBatch flag is true. Each event handler could cache the sum value
and publish it once the batch is finished.
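
A rough sketch of the endOfBatch idea, in case it helps. The names ValueEvent, BatchEventHandler and SummingHandler are made up for illustration, and the LongConsumer stands in for whatever you use to publish results (for example, a per-handler ring buffer). BatchEventHandler is a local stand-in that mirrors the shape of the Disruptor's EventHandler.onEvent(event, sequence, endOfBatch) callback, so the sketch compiles without the library; in real code you would implement EventHandler instead.

```java
import java.util.function.LongConsumer;

// Hypothetical event type: carries one value to add to the running sum.
final class ValueEvent {
    long value;
}

// Local stand-in with the same shape as the Disruptor's EventHandler callback
// (an assumption made so this sketch is self-contained).
interface BatchEventHandler<T> {
    void onEvent(T event, long sequence, boolean endOfBatch);
}

// Keeps the sum in a plain field and only hands it to the publisher
// when the last event of the current batch has been processed.
final class SummingHandler implements BatchEventHandler<ValueEvent> {
    private final LongConsumer resultPublisher; // hypothetical result sink
    private long sum;                           // cached, handler-local state

    SummingHandler(LongConsumer resultPublisher) {
        this.resultPublisher = resultPublisher;
    }

    @Override
    public void onEvent(ValueEvent event, long sequence, boolean endOfBatch) {
        sum += event.value;
        if (endOfBatch) {
            // One publish per batch instead of one per incoming message.
            resultPublisher.accept(sum);
        }
    }
}
```

With this shape, a batch of three events results in a single call to the publisher rather than three, so the contention on the results side should drop roughly in proportion to the batch size.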
Mike.