Chronicle Queue HA with multiple replicas

Ben

Oct 7, 2015, 2:50:23 PM
to Chronicle
I've been trying to understand how to best use Chronicle's TCP Tailer to provide HA.

I've written a small sample that uses the TCP tailer and found that the index can be set to control the replay start point, assuming you've persisted this index on a host.  Very nice how it seamlessly flips between reading local and remote messages in AbstractStatefulExcerpt.nextIndex() :-)

The picture at http://chronicle.software/wp-content/uploads/2014/07/Screen-Shot-2014-09-30-at-11.24.53.png shows how the TCP source/sink can be used between a master and backup host to ensure you have msgs replicated before generating a response.

I was wondering how you could extend this so that responses aren't generated until messages are replicated across N slaves, or if this is even possible/desirable with a library like Chronicle Queue.

Adding new slaves to hop across and back again would progressively add to response times, so presumably you'd need some additional protocol between each host/process to determine the last msg index which is sufficiently replicated before "releasing" a response message?  I appreciate that anything you do in this space would ultimately impact response times, but for some applications, guaranteeing that the data is stored AND being relatively fast is desirable.
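
The "last sufficiently replicated index" idea can be sketched in plain Java as below. This is only an illustration under stated assumptions: `ReplicationWatermark`, `onAck` and `releasableIndex` are hypothetical names, not part of Chronicle Queue, and the acknowledgement transport between hosts is left out entirely. Each replica reports the highest index it has stored, and a response is released once its request index is at or below the watermark.

```java
import java.util.Arrays;

/** Hypothetical helper (not a Chronicle Queue class): tracks per-replica acks. */
class ReplicationWatermark {
    private final long[] ackedIndex; // highest index each replica has acknowledged
    private final int required;      // how many replicas must hold a message

    ReplicationWatermark(int replicas, int required) {
        if (required < 1 || required > replicas)
            throw new IllegalArgumentException("required must be in 1..replicas");
        this.ackedIndex = new long[replicas];
        Arrays.fill(ackedIndex, -1L); // -1 means "nothing acknowledged yet"
        this.required = required;
    }

    /** Record an acknowledgement; stale (lower) acks are ignored. */
    synchronized void onAck(int replica, long index) {
        ackedIndex[replica] = Math.max(ackedIndex[replica], index);
    }

    /** Highest index stored on at least `required` replicas, or -1 if none. */
    synchronized long releasableIndex() {
        long[] sorted = ackedIndex.clone();
        Arrays.sort(sorted); // ascending
        return sorted[sorted.length - required];
    }

    public static void main(String[] args) {
        ReplicationWatermark w = new ReplicationWatermark(3, 2);
        w.onAck(0, 10);
        w.onAck(1, 7);
        w.onAck(2, 12);
        System.out.println(w.releasableIndex()); // prints 10
    }
}
```

Raising `required` makes a response wait for slower replicas, which is the response-time cost described above.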

Interested in this group's thoughts on this. 

Peter Lawrey

Oct 7, 2015, 4:35:17 PM
to java-ch...@googlegroups.com

You can minimise latency if you need one replica like this.

Server A receives a message.
Server A persists and replicates the message
Server B receives a copy of the message.
Server B *processes* the message. At this point there has to be a copy on two machines.
Server B writes the response, which is replicated back to Server A.
Server A replies.

This means there is one RTT of latency between servers and no additional protocol.
At least one of our clients does this for real messages in under 25 microseconds 99% of the time.
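
As a rough illustration of the ordering in the steps above, here is a plain Java sketch. It assumes in-memory lists standing in for the persisted queues on each host, and `OneReplicaFlow`, `handle` and `process` are made-up names; no Chronicle Queue API is used and no real network hop takes place.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the flow; lists stand in for persisted queues. */
class OneReplicaFlow {
    final List<String> requestsOnA = new ArrayList<>();
    final List<String> requestsOnB = new ArrayList<>();
    final List<String> responsesOnB = new ArrayList<>();
    final List<String> responsesOnA = new ArrayList<>();

    String handle(String request) {
        requestsOnA.add(request);  // Server A persists the message
        requestsOnB.add(request);  // replication A -> B (first half of the RTT)
        // Server B processes its own copy, so two machines already hold the request.
        String response = process(requestsOnB.get(requestsOnB.size() - 1));
        responsesOnB.add(response); // Server B writes the response
        responsesOnA.add(response); // replication B -> A (second half of the RTT)
        return responsesOnA.get(responsesOnA.size() - 1); // Server A replies
    }

    /** Placeholder business logic (assumption for the example). */
    static String process(String request) {
        return "ack:" + request;
    }

    public static void main(String[] args) {
        OneReplicaFlow flow = new OneReplicaFlow();
        System.out.println(flow.handle("order-1")); // prints ack:order-1
    }
}
```

The point of the sketch is that processing happens only on the replicated copy, so no separate acknowledgement message is needed.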

Regards, Peter.

Harnit Bakshi

Dec 18, 2015, 6:14:42 AM
to Chronicle
Hi Peter, I have just started looking at this library. In the scenario you just described, how would you recover if, let's say, one of the servers crashes?
Especially in the case where the processing engine was trying to replicate its state back to the Backup in the diagram, which finally becomes the response.

How does the Backup know that the Processing engine is dead? You might want to reject the stale order in this case.

Thanks,
Harnit

Peter Lawrey

Dec 18, 2015, 8:42:23 AM
to java-ch...@googlegroups.com

The Backup needs to monitor where the processing engine is up to. Because the Backup can see the requests / inbound events, it knows which responses to expect; if an expected response takes too long, it can stop replication and take over.
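
A minimal sketch of that monitoring, in plain Java. `ResponseMonitor` and its methods are hypothetical names, timestamps are passed in by the caller, and the actual take-over (stopping replication, switching roles) is out of scope here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical watchdog run on the Backup (not a Chronicle Queue class). */
class ResponseMonitor {
    private final long timeoutNanos;
    // request index -> time at which the Backup first saw the request
    private final Map<Long, Long> pending = new HashMap<>();

    ResponseMonitor(long timeoutNanos) {
        this.timeoutNanos = timeoutNanos;
    }

    /** Called when the Backup sees an inbound request. */
    void onRequest(long index, long nowNanos) {
        pending.put(index, nowNanos);
    }

    /** Called when the matching response arrives from the processing engine. */
    void onResponse(long index) {
        pending.remove(index);
    }

    /** True if any request has waited longer than the timeout for its response. */
    boolean shouldTakeOver(long nowNanos) {
        for (long seenAt : pending.values()) {
            if (nowNanos - seenAt > timeoutNanos) {
                return true;
            }
        }
        return false;
    }
}
```

In use, the Backup would call `shouldTakeOver(System.nanoTime())` periodically; the timeout value is a tuning decision that depends on the expected processing time.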

Regards, Peter.

Harnit Bakshi

Dec 18, 2015, 11:26:32 PM
to Chronicle
Thanks for the answer Peter

Harnit Bakshi

Dec 20, 2015, 1:09:33 AM
to Chronicle
I guess we would still need some sort of heartbeat mechanism between the servers. This is because if the Backup engine dies, the processing engine cannot rely on a simple time-out strategy: it would not know whether the Backup engine has died or is just idle because no order or request came through during that period.

This, I assume, would require some bespoke development on top of the existing Chronicle software.
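
To make the idle-versus-dead distinction concrete, here is a generic heartbeat sketch in plain Java. It assumes the peer sends a heartbeat at a regular interval even when it has no messages to send; `PeerLiveness` and its methods are made-up names and do not describe any Chronicle Queue API.

```java
/** Hypothetical liveness tracker for one peer (not a Chronicle Queue class). */
class PeerLiveness {
    private final long maxSilenceNanos;
    private long lastHeardNanos;

    PeerLiveness(long maxSilenceNanos, long startNanos) {
        this.maxSilenceNanos = maxSilenceNanos;
        this.lastHeardNanos = startNanos;
    }

    /** An idle peer still sends these, so silence is meaningful. */
    void onHeartbeat(long nowNanos) {
        lastHeardNanos = nowNanos;
    }

    /** Real traffic also proves the peer is alive. */
    void onMessage(long nowNanos) {
        lastHeardNanos = nowNanos;
    }

    /** An idle peer stays alive via heartbeats; a dead one goes silent. */
    boolean isAlive(long nowNanos) {
        return nowNanos - lastHeardNanos <= maxSilenceNanos;
    }
}
```

With heartbeats in place, a gap in orders no longer looks like a failure, because only total silence beyond `maxSilenceNanos` marks the peer as dead.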

Harnit


Harnit Bakshi

Dec 20, 2015, 1:11:43 AM
to Chronicle
In the above scenario you would still want the Processing engine to become the master and start processing requests.

Peter Lawrey

Dec 20, 2015, 9:27:07 AM
to java-ch...@googlegroups.com

Chronicle Queue has built-in heartbeats to check that a connection is working, and a connection event listener interface. However, it wouldn't detect that the engine has locked up, for example.

Peter.
