The most efficient way to partition and process many events.


박성우

Aug 31, 2022, 3:36:16 PM
to Disruptor
I recently came across the Disruptor and am trying to introduce it into a project that needs to process many events per second.

From searching, I understand that events can be sharded (or partitioned) when they can be grouped by a specific key.

But I have a question. (Of course, I'll have to do some performance testing, but...)
As I understand it, the Disruptor multicasts every event to all of its consumers, so when multiple event handlers are sharded by a hash key, every handler receives every event and ignores it if the hash key is not its own.

Wouldn't it be more efficient to partition the events in the first place, so that there are multiple disruptors, each consumed by a single consumer?

Sam Barker

Aug 31, 2022, 8:56:40 PM
to lmax-di...@googlegroups.com
> Wouldn't it be more efficient to partition the events in the first place, so that there are multiple disruptors, each consumed by a single consumer?
Efficient in what way? Each disruptor instance implies a number of threads, often busy-spinning, to process events. So while having multiple disruptor instances (which is what I think your question implies) might seem more efficient, in the sense that each instance only processes its own events, it's less efficient overall because there are more threads contending for the underlying CPU resources.

Hopefully I've understood your question correctly...

Sam


박성우

Aug 31, 2022, 9:41:28 PM
to Disruptor
Thanks for your comments.
Many sharding examples show the following pattern:

if (hash == hash(event.key)) {
  // event processing
} else {
  // ignore
}
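
For readers following along, here is a minimal, self-contained sketch of that pattern in Java. It is not taken from the Disruptor examples: `KeyedEvent`, `ShardedHandler`, `shardOf`, and the `ordinal`/`shardCount` fields are names assumed for illustration, and the class only mirrors the shape of a Disruptor event handler's `onEvent(event, sequence, endOfBatch)` callback without depending on the library.

```java
// Hypothetical event type; the field name is an assumption for illustration.
final class KeyedEvent {
    String key;
}

// Stand-alone sketch of one shard's handler. It mirrors the shape of a
// Disruptor event handler but does not depend on the library.
final class ShardedHandler {
    private final int ordinal;     // this handler's shard index, 0..shardCount-1
    private final int shardCount;  // number of handlers reading the same events
    private long handled;          // events this shard actually processed

    ShardedHandler(int ordinal, int shardCount) {
        this.ordinal = ordinal;
        this.shardCount = shardCount;
    }

    // floorMod keeps the shard index non-negative even if hashCode() is negative.
    static int shardOf(String key, int shardCount) {
        return Math.floorMod(key.hashCode(), shardCount);
    }

    public void onEvent(KeyedEvent event, long sequence, boolean endOfBatch) {
        if (shardOf(event.key, shardCount) == ordinal) {
            handled++;             // event processing would go here
        }
        // otherwise: ignore, another handler owns this key
    }

    long handled() {
        return handled;
    }
}
```

Every handler sees every event, and exactly one of them acts on it. The per-event cost for the other handlers is one hash and one comparison.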

It may be a stupid question, as I don't yet fully understand the internal implementation of the Disruptor.
Assuming that the number of shards (consumer groups) is large, I am concerned that multicasting to all shards would be wasteful.

On Thursday, September 1, 2022 at 9:56:40 AM UTC+9, s...@quadrocket.co.uk wrote:

Sam Barker

Aug 31, 2022, 10:00:58 PM
to lmax-di...@googlegroups.com
Ahhh, I understand your question now. 

Yes, as you say, it's not "optimal" to pass every event to every consumer, but it's still a cheap option, assuming a relatively low-cost hash function. Generally one wouldn't use large numbers of consumers. For the lowest-latency use cases, the number of consumers needs to be lower than the CPU core count so that each thread is able to fully utilise a core.

Which connects back to my earlier reply: the alternative is to have one disruptor instance for each shard. Therefore one still needs a mechanism to choose a shard to publish to so there is still.
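
As a rough illustration of the one-instance-per-shard alternative, the sketch below routes each event to a shard on the publishing side. It is only a sketch under stated assumptions: `ShardRouter`, `laneFor`, and `publish` are hypothetical names, and each "lane" is a plain list standing in for a separate disruptor instance with a single consumer, so that the example runs without the library.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of publisher-side routing. Each lane stands in for one disruptor
// instance with a single consumer (an assumption made for illustration).
final class ShardRouter {
    private final List<List<String>> lanes = new ArrayList<>();

    ShardRouter(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            lanes.add(new ArrayList<>());
        }
    }

    // The key is still hashed, but once per event, by the publisher.
    int laneFor(String key) {
        return Math.floorMod(key.hashCode(), lanes.size());
    }

    // Publishes the key to exactly one lane instead of multicasting it.
    void publish(String key) {
        lanes.get(laneFor(key)).add(key);
    }

    List<String> lane(int index) {
        return lanes.get(index);
    }

    int laneCount() {
        return lanes.size();
    }
}
```

The shard selection has not gone away. It has moved from the consumers to the publisher, and each lane would bring its own ring buffer and consumer thread.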

 Sam

Faraz Babar

Aug 31, 2022, 10:18:30 PM
to lmax-di...@googlegroups.com
If this use case fits within a single CPU with x cores, one way to optimize this would be to compute the hash once and include it in the message.

If it does not fit a single CPU, you absolutely must pre-partition the messages, because the savings in terms of I/O and even compute would be huge.
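
The first suggestion (computing the hash once and carrying it in the message) might look something like the sketch below. `HashedEvent`, `keyHash`, and `PrehashedHandler` are hypothetical names chosen for illustration. The sketch is self-contained and does not use the Disruptor library.

```java
// Hypothetical event that carries a hash computed once by the producer.
final class HashedEvent {
    String key;
    int keyHash;

    // Called by the producer when it fills in the event.
    void set(String key) {
        this.key = key;
        this.keyHash = key.hashCode(); // stand-in for a possibly costlier hash
    }
}

// Each handler only does an integer modulo and compare per event.
final class PrehashedHandler {
    private final int ordinal;     // this handler's shard index
    private final int shardCount;  // number of handlers reading the same events
    private long handled;

    PrehashedHandler(int ordinal, int shardCount) {
        this.ordinal = ordinal;
        this.shardCount = shardCount;
    }

    public void onEvent(HashedEvent event, long sequence, boolean endOfBatch) {
        if (Math.floorMod(event.keyHash, shardCount) != ordinal) {
            return;                // not this shard's event, skip it
        }
        handled++;                 // event processing would go here
    }

    long handled() {
        return handled;
    }
}
```

With `String` keys the gain is probably small, since Java caches a string's hash code after the first call. The approach matters more when the hash is expensive or the key is a composite of several fields.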


On Aug 31, 2022, at 7:01 PM, Sam Barker <s...@quadrocket.co.uk> wrote:



박성우

Sep 6, 2022, 2:40:10 AM
to Disruptor
Thanks for the good advice.

On Thursday, September 1, 2022 at 11:18:30 AM UTC+9, inappinstore wrote: