Journaler Question?

476 views
Skip to first unread message

Carlus Henry

unread,
Apr 29, 2012, 8:50:31 PM4/29/12
to lmax-di...@googlegroups.com
I have been following the Disruptor for a while now, watching various presentations and reading many different blogs.  There are many questions that I have, but the one that I have been wrestling with for a while now has to do with Journaling.

It is my understanding that a producer puts a message on the ring buffer.  Once the message arrives into the buffer, prior to any business logic being invoked, that message is journaled.  However, the journaler is one of the many asynchronous consumers on the ring buffer.  This would mean to me that the producer is not necessarily guaranteed that their message will end up in a durable event store, and messages could be potentially lost.  Am I understanding the architecture correctly, or is there something that I am missing?

If this is the case, how are people using the Disruptor accounting for this risk?

Thanks everyone,
Carlus

Adrian Sutton

unread,
Apr 30, 2012, 1:47:42 AM4/30/12
to lmax-di...@googlegroups.com
On 30/04/2012, at 10:50 AM, Carlus Henry wrote:
> It is my understanding that a producer puts a message on the ring buffer. Once the message arrives into the buffer, prior to any business logic being invoked, that message is journaled. However, the journaler is one of the many asynchronous consumers on the ring buffer. This would mean to me that the producer is not necessarily guaranteed that their message will end up in a durable event store, and messages could be potentially lost. Am I understanding the architecture correctly, or is there something that I am missing?

With the disruptor, any consumer (or set of consumers) can be gated on another consumer (or set of consumers). So if consumer B is gated on consumer A, the disruptor guarantees that consumer B never processes an event that hasn't already been processed by consumer A.

In the DSL style setup this would look like:

disruptor.handleEventsWith(A).then(B);

Regards,

Adrian Sutton
http://www.symphonious.net




Carlus Henry

unread,
Apr 30, 2012, 7:35:51 AM4/30/12
to lmax-di...@googlegroups.com
Adrian,

Thanks for your quick response.

The question that I am really asking is not so much how to guarantee ordering and creating consumer dependency hierarchies, but more to the idea of journaling, and event sourcing.  I am more concerned with understanding how the messages are guaranteed to be stored in a durable event store without being lost.  Perhaps an example will help illustrate:

A -> M1 -> Ring Buffer -> B

A - Producer
M1 - Message
B - Journaler

Producer A puts message M1 onto the Ring Buffer, asynchronously.  Later, journaler B comes along and stores that message into the durable event store.  If this is correct, then there seems to be an amount of time where M1 could be lost - specifically the time between placing the message on the ring buffer and when the journaler picks it up and stores it to the durable event store.  Is this correct?  How is this risk managed?

Thanks
Carlus

Adrian Sutton

unread,
Apr 30, 2012, 7:46:34 AM4/30/12
to lmax-di...@googlegroups.com
Ah I see.  The key to making this work is deciding at which point a message is considered received.  For LMAX it's received when it reaches the journaller, not when it's read off the wire.  There is no business logic performed at all before journalling so there's no real difference between a message getting lost on the wire and getting lost between the producer and journaller - the producer is essentially just part of the network transfer.

If however your producer is doing actual work, you should split it so that you have a very simple producer that reads off the wire and publishes to the ring buffer, then the journaller writes the raw bytes and then the second half of your current producer does whatever work it needs to do.

Cheers,

Adrian Sutton
http://www.symphonious.net
Spelling by iPad.

Carlus Henry

unread,
Apr 30, 2012, 8:04:31 AM4/30/12
to lmax-di...@googlegroups.com
Adrian,

I think that answers my question.  And of course with many answers, I have a couple more.  Perhaps it is just a mind shift for me, but I am still struggling a bit and hopefully you can help me.

I have been working with JMS Queues and Topics for a number of years now.  In that environment, the producer is guaranteed that the message is durable and persisted once the message arrives on the queue.  If the message doesn't arrive on the queue, an exception is thrown and the producer can react to that exception.  In disruptor, this does not seem to be the case.  The producer would never know if a message arrived successfully or not, since there is not an "ack" back to the producer.

Is there any recourse?  Is there any design pattern that can be used to guarantee that messages are "received" by disruptor?  Off of the top of my head, the only thing I can think of is fronting it with a queue, but that kind of defeats the whole purpose of the architecture, :D.

Thoughts?

Thanks
Carlus

On Monday, April 30, 2012 7:46:34 AM UTC-4, Adrian Sutton wrote:
Ah I see.  The key to making this work is deciding at which point a message is considered received.  For LMAX it's received when it reaches the journaller, not when it's read off the wire.  There is no business logic performed at all before journalling so there's no real difference between a message getting lost on the wire and getting lost between the producer and journaller - the producer is essentially just part of the network transfer.

If however your producer is doing actual work, you should split it so that you have a very simple producer that reads off the wire and publishes to the ring buffer, then the journaller writes the raw bytes and then the second half of your current producer does whatever work it needs to do.

Cheers,

Adrian Sutton
http://www.symphonious.net
Spelling by iPad.

On 30/04/2012, at 9:35 PM, Carlus Henry 

Adrian Sutton

unread,
Apr 30, 2012, 7:48:33 PM4/30/12
to lmax-di...@googlegroups.com
Hi Carlus,

> I have been working with JMS Queues and Topics for a number of years now. In that environment, the producer is guaranteed that the message is durable and persisted once the message arrives on the queue. If the message doesn't arrive on the queue, an exception is thrown and the producer can react to that exception. In disruptor, this does not seem to be the case. The producer would never know if a message arrived successfully or not, since there is not an "ack" back to the producer.
>
> Is there any recourse? Is there any design pattern that can be used to guarantee that messages are "received" by disruptor? Off of the top of my head, the only thing I can think of is fronting it with a queue, but that kind of defeats the whole purpose of the architecture, :D.

There are multiple levels of guarantee that you should separate here:

1. Under normal operation
2. Under a software failure (eg an exception in one of the consumers)
3. Under hardware failure (eg pulling the plug unexpectedly)

Under normal operation we get the guarantee that every event is processed by every consumer (almost by definition).

Under a software failure, you need to start making policy decisions. If a consumer encounters an exception it will defer to the ExceptionHandler it has been set up with (disruptor.handleExceptionsWith is the method that sets this up I think). By default this uses a FatalException handler which will stop the consumer and no further events will be processed. You can however use any exception handling policy you want - so it could either continue with the next message, or use custom code to notify the producer etc.

Under hardware failure you can only preserve messages in one of two ways:

1. Consider a message received when it is journaled to disk (and flushed!). You may still lose messages that were in flight across the network or read from the network and not yet journalled.

2. Build a network protocol that supports replay. For example, at LMAX each event sent over the network includes the ring buffer sequence number - if the receiver gets message 20 but hasn't yet received message 19, it will NAK back to the sender requesting that 19 be sent. Using this system, when the hardware failure is resolved the service starts back up and automatically requests a replay of any messages it missed.

There's nothing really built into the disruptor that does all this stuff for you (because the approach you should take varies so much depending on your exact requirements) though the ring buffer sequence number is a very useful unique ID for messages - LMAX uses it all over the place for this kind of thing.

Carlus Henry

unread,
May 1, 2012, 8:27:41 AM5/1/12
to lmax-di...@googlegroups.com
Adrian,

....that was awesome.  You answered my questions.

I definitely like the approach that LMAX has taken, with the ability to NAK back to the sender requesting that the missing message be resent.  That is an approach that makes sense to me, and allows for the system to recover missed messages.  Perhaps it is time to start another thread on this topic in particular, but I do have a couple of questions about this architecture:

1.)  When a consumer NAKs back that a missing message needs to be resent, what does the consumer do in the meantime?  I would think that it couldn't continue on the buffer (since this would result in messages being consumed out of order).  However, the producer, I am assuming, must not put it back onto the ring buffer, because if it did, the consumer that is waiting would never read it, since it is waiting.  Is there some kind of point to point communication happening?

2.)  Does this mean that LMAX not only has a Journal internal to the Disruptor architecture, but also external which gives the producer the ability to replay any message?

Thanks
Carlus
Reply all
Reply to author
Forward
0 new messages