Calling (slow) external services

Calling (slow) external services Magnus Holm 10/6/11 12:38 AM
Hey folks,

I'm wondering how the Disruptor would perform when you need to call
slow external services. Because of the sequential nature of Disruptor,
it seems like one slow call would make all the other dependent
consumers wait. Not all applications need to be strictly sequential.

How would you make the Disruptor perform well under such conditions?

Thanks,
Magnus Holm
Re: Calling (slow) external services Bill 10/6/11 12:42 AM
Generally, farm slow or blocking work off to a different thread. You might then consider having the slow thread write its results back to the Disruptor and run a basic state machine through it (using the Disruptor as a basic scheduler). This should keep the queue live, and it is good programming practice too.
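
A minimal sketch of that hand-off, not taken from the thread: a LinkedBlockingQueue stands in for the ring buffer, and OffloadSketch, Event and slowCall are hypothetical names invented for the illustration. The handler loop never blocks on the slow call; it passes the work to a pool, and the pool thread writes the result back into the same queue as a new event.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch: a plain queue stands in for the Disruptor ring buffer.
final class OffloadSketch {
    // An event is either a new request or the result of a finished slow call.
    static final class Event {
        final long id;
        final boolean isResult;
        final String payload;

        Event(long id, boolean isResult, String payload) {
            this.id = id;
            this.isResult = isResult;
            this.payload = payload;
        }
    }

    // Stand-in for the slow external service (assumption: it sleeps, then upper-cases).
    static String slowCall(String input) {
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return input.toUpperCase();
    }

    // Runs all requests through the queue; returns results in completion order.
    static List<String> run(List<String> requests) {
        BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
        ExecutorService slowPool = Executors.newFixedThreadPool(4);
        List<String> completed = new ArrayList<>();
        try {
            long nextId = 0;
            for (String request : requests) {
                queue.add(new Event(nextId++, false, request));
            }
            int outstanding = requests.size();
            while (outstanding > 0) {
                final Event e = queue.take();
                if (!e.isResult) {
                    // State 1: new request. Hand the slow work off and move on at once.
                    slowPool.execute(() -> queue.add(new Event(e.id, true, slowCall(e.payload))));
                } else {
                    // State 2: the slow thread has written its result back.
                    completed.add(e.payload);
                    outstanding--;
                }
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        } finally {
            slowPool.shutdown();
        }
        return completed;
    }
}
```

With four pool threads, several slow calls are in flight at once, so results may come back in a different order from the requests.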

Cheers,
Bill
Re: Calling (slow) external services mikeb01 10/6/11 12:59 AM
The Disruptor also supports efficient batching to help out slow event
processors.  If you are writing to a file or database you can use the
endOfBatch flag on the onEvent method to determine when to flush.
We've found that this write coalescing allows most event processors to
keep up with quite high throughput rates.
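
A rough sketch of that flush-on-endOfBatch pattern, under stated assumptions: BatchAwareHandler is a local stand-in that only mirrors the shape of the onEvent callback (it is not the library interface), and CoalescingJournal with its in-memory "disk" is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in mirroring the shape of the onEvent callback (not the library type).
interface BatchAwareHandler<T> {
    void onEvent(T event, long sequence, boolean endOfBatch);
}

// Hypothetical journaller: buffers events and performs one write per batch.
final class CoalescingJournal implements BatchAwareHandler<String> {
    private final List<String> pending = new ArrayList<>();
    // Each inner list represents one physical write (assumption: stands in for a file or database).
    private final List<List<String>> writes = new ArrayList<>();

    @Override
    public void onEvent(String event, long sequence, boolean endOfBatch) {
        pending.add(event);
        if (endOfBatch) {
            // Last event currently available: pay the expensive flush once for the whole batch.
            writes.add(new ArrayList<>(pending));
            pending.clear();
        }
    }

    int writeCount() {
        return writes.size();
    }

    int eventsWritten() {
        int total = 0;
        for (List<String> write : writes) {
            total += write.size();
        }
        return total;
    }

    // Simulates the processor delivering one batch, flagging only the final event.
    static void deliver(BatchAwareHandler<String> handler, List<String> batch, long firstSequence) {
        for (int i = 0; i < batch.size(); i++) {
            handler.onEvent(batch.get(i), firstSequence + i, i == batch.size() - 1);
        }
    }
}
```

The slower the handler, the larger the batch that builds up behind it, so the cost of each flush is spread over more events.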

Mike.

Re: Calling (slow) external services Magnus Holm 10/6/11 3:40 AM
On Oct 6, 9:59 am, Michael Barker <mike...@gmail.com> wrote:
> The Disruptor also supports efficient batching to help out slow event
> processors.  If you are writing to a file or database you can use the
> endOfBatch flag on the onEvent method to determine when to flush.
> We've found that this write coalescing allows most event processors to
> keep up with quite high throughput rates.
>
> Mike.

I still don't understand how the Disruptor is able to process other
events while one thread is waiting on an external service.

Consider this example:

  Input -> RingBuffer <- A <- B -> RingBuffer -> Output

Input comes into the ring buffer, A is a consumer which calls an
external service, B does a lot of the business logic and passes it on
to another ring buffer (which is then sent out). Let's say there's one
event in A that's taking a while. I understand that you can easily run
several A's in different threads, but I thought that B was not able to
continue until the slow event in A is done.

What have I misunderstood?
Re: Calling (slow) external services mikeb01 10/6/11 3:59 AM
I'm assuming that A & B are separate EventProcessors.  B only needs to
wait on A if B needs the results of A in order to do its business
logic.  If A & B are independent tasks then they can run in parallel.
Re: Calling (slow) external services Magnus Holm 10/6/11 4:13 AM
On Thu, Oct 6, 2011 at 12:59, Michael Barker <mik...@gmail.com> wrote:
> I'm assuming that A & B are separate EventProcessors.  B only needs to
> wait on A if B needs the results of A in order to do its business
> logic.  If A & B are independent tasks then they can run in parallel.

The case is that B must wait on A in order to do its business logic.
Also, note that in this case it's not very important that all events
are processed sequentially.

Re: Calling (slow) external services mikeb01 10/6/11 5:27 AM
Okay, I see now.  You could use the approach that Bill suggested or
have a separate ring buffer for responses from A, with B as an
EventProcessor on it.  You may also want to look at the WorkerPool
class that was added recently.

Within our system we try to avoid slow, synchronous remote calls and
have built most of our remoting to work off asynchronous events.
Therefore we'd send a message to the remote service then continue
processing the remaining events.  At some point later the remote
service would send an event back which would appear as a new event in
the input ring buffer.  The logic that deals with events does end up
working more like a state machine (as Bill suggests) and would invoke
B when the response event arrives.
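
A small sketch of that request/response state machine, assuming a correlation id ties a response to its request. AsyncStateMachine, its Event type and the two lists are hypothetical names for the illustration; the lists stand in for the outbound message channel and for the output of B.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-threaded handler: it never waits for the remote service.
final class AsyncStateMachine {
    enum Type { REQUEST, RESPONSE }

    static final class Event {
        final Type type;
        final long correlationId;
        final String payload;

        Event(Type type, long correlationId, String payload) {
            this.type = type;
            this.correlationId = correlationId;
            this.payload = payload;
        }
    }

    // Requests that have been sent but not yet answered, keyed by correlation id.
    private final Map<Long, String> awaitingResponse = new HashMap<>();
    // Stand-in for the asynchronous send to the remote service.
    final List<String> sentToRemote = new ArrayList<>();
    // Stand-in for the output of the business logic (B).
    final List<String> businessResults = new ArrayList<>();

    void onEvent(Event e) {
        switch (e.type) {
            case REQUEST:
                // Remember the request, fire the message, and return immediately.
                awaitingResponse.put(e.correlationId, e.payload);
                sentToRemote.add(e.correlationId + ":" + e.payload);
                break;
            case RESPONSE:
                // The reply arrived as an ordinary event: now B can run.
                String original = awaitingResponse.remove(e.correlationId);
                if (original != null) {
                    businessResults.add(original + "->" + e.payload);
                }
                break;
        }
    }

    int pendingCount() {
        return awaitingResponse.size();
    }
}
```

Responses may arrive in any order; each one completes only its own request, and unrelated events keep flowing in between.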

Mike.

Re: Calling (slow) external services DaveF 10/6/11 5:48 AM
Fundamentally the Disruptor minimizes overheads; there is no magic. So if task B needs the results of task A then, you are quite correct, it must wait until task A is complete.

What the Disruptor does is reduce to a minimum the cost of that wait while maximizing the opportunity to do other work in parallel. This is, from our measurements, very different to most other approaches where the cost of the overheads is significantly more than even reasonably efficient code, let alone good code. 

So for example, let's assume that you are receiving messages, journalling them to disk, sending them to a cluster pair, receiving acknowledgements back from the cluster pair and translating them to a usable form all before processing them in your business logic. All of these tasks must be complete before the business logic can process the event, but each of these tasks can operate in parallel, independently of one another. So the tasks can be carried out efficiently in parallel and when they are all complete the business logic can process the message. For most common approaches the cost of the multi-thread re-join is so vast that it can outweigh the other costs. The rejoin with the disruptor incurs no locks, so is cheap and there are big efficiency gains there. 
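
To make the lock-free rejoin concrete, here is a simplified sketch rather than the Disruptor's actual implementation: each upstream stage publishes the last sequence it has finished in an AtomicLong, and the business logic may process up to the minimum of those counters. JoinGateSketch and its method names are hypothetical, and the three stages do no real work.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of gating one consumer on several independent stages.
final class JoinGateSketch {
    // Highest sequence that every upstream stage has completed.
    // With no upstream stages this returns Long.MAX_VALUE (nothing to wait for).
    static long highestSafeSequence(AtomicLong... upstream) {
        long min = Long.MAX_VALUE;
        for (AtomicLong stage : upstream) {
            min = Math.min(min, stage.get());
        }
        return min;
    }

    // Three stages advance in parallel; the caller plays the business logic.
    // Returns the number of events the business logic processed.
    static long runPipeline(int events) {
        AtomicLong journalled = new AtomicLong(-1);
        AtomicLong replicated = new AtomicLong(-1);
        AtomicLong unmarshalled = new AtomicLong(-1);

        for (AtomicLong stage : new AtomicLong[] {journalled, replicated, unmarshalled}) {
            Thread worker = new Thread(() -> {
                for (long seq = 0; seq < events; seq++) {
                    stage.set(seq); // publish progress; no lock is taken
                }
            });
            worker.setDaemon(true);
            worker.start();
        }

        long next = 0;
        while (next < events) {
            long available = highestSafeSequence(journalled, replicated, unmarshalled);
            if (next <= available) {
                // Everything up to 'available' is safe: consume it as one batch.
                next = available + 1;
            } else {
                Thread.yield(); // the slowest stage has not caught up yet
            }
        }
        return next;
    }
}
```

The join is a handful of volatile reads and a comparison, which is why the rejoin stays cheap even with several stages upstream.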

Further, as Mike said, the batching effect allows catching up between stages. Put simply, the Disruptor moves the costs to where they belong, your functional code rather than the plumbing.

The effects that we measured early in the life of the Disruptor showed that the problem is not really "time of A" + "time of B" but the disproportionate costs of the gaps between A and B when using common concurrency approaches like locks and CAS. To maximize throughput you still need to optimise the performance of A and B, but that is only reasonable. The huge win is that the Disruptor moves the limiting factor to where you are doing useful work rather than where you are preparing to do work.

Of course if you are doing slow things like writing to disk or communicating over a network then writing code to be as efficient as possible will minimize the straight-line cost of the task. That is a question of mechanical sympathy and good design, but is not strictly related to the Disruptor itself and is very much dependent on the nature of the slow service.

I'd offer a couple of pieces of general, high-level advice for the tasks themselves, but as I said this is really a separate issue from the Disruptor itself:

1) Disks and networks are block devices; treat them as such.
2) Asynchrony is your friend; make all external interactions, particularly with other business services, asynchronous.

Hope this is not too vague and helps,

   Dave