How to respond to RPC not intended for specific instance

Brian Hulse

Sep 8, 2015, 1:12:32 PM

to raft-dev

The way I'm implementing RPCs means that each Raft instance will receive all RPCs, no matter what state it is in. I don't think this is intended behavior for Raft, but I have a good reason for it; it's an implementation detail I won't go into unless it becomes relevant. My question is: should I react to messages that are not intended for my specific instance? For example, say server B is a follower in term 2 and receives a replication response RPC intended for a leader in term 3. Should I immediately transition server B to a follower in term 3, since it has inspected an RPC carrying a later term, even though the RPC wasn't directed to server B? Or should server B ignore all message types that it should not receive (vote responses, replication responses) and wait for the leader in term 3 to introduce itself with a replication heartbeat?

Oren Eini (Ayende Rahien)

Sep 8, 2015, 1:19:03 PM

to raft...@googlegroups.com

Brian,

I would actually like to know why you got into this scenario, but that is just my curiosity.

This is going to be pretty easy and pretty hard at the same time. Raft allows for message reordering, so it is fine for you to receive messages in T10, T5, T9, T100, T24, etc.

Leave aside complex failover modes...

Take a 3-node cluster, with A being the leader, at term 5, index 100.

B was just restarted, and it is at term 5, index 74.

A sends AppendEntries (to C, which is up to date), and it gets to B as well.

B can't process that, so it sends a "not valid for me, I'm still at 74".

A sends it the entries from 74, which C also gets, and C replies "it isn't valid for me".

Basically, the algorithm works, but you have a lot more processing to do.
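The exchange above can be sketched as a follower-side consistency check. This is a minimal illustration, not anyone's actual implementation: the `Follower` class, its field names, and the choice to store only each entry's term are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Follower:
    """Hypothetical follower; log[i] holds the term of entry i + 1."""
    current_term: int
    log: list = field(default_factory=list)

    def append_entries(self, term, prev_index, prev_term, entries):
        """Return (success, last_index); entries is a list of entry terms."""
        # Reject anything from a stale leader.
        if term < self.current_term:
            return False, len(self.log)
        self.current_term = term
        # Consistency check: we must already hold the entry at prev_index.
        if prev_index > len(self.log):
            return False, len(self.log)
        if prev_index > 0 and self.log[prev_index - 1] != prev_term:
            return False, len(self.log)
        # Append new entries, truncating only where an existing entry conflicts.
        for offset, entry_term in enumerate(entries):
            pos = prev_index + offset  # zero-based position in self.log
            if pos < len(self.log):
                if self.log[pos] != entry_term:
                    del self.log[pos:]
                    self.log.append(entry_term)
            else:
                self.log.append(entry_term)
        return True, len(self.log)


# B restarted at index 74; the leader first assumes it is at index 99.
b = Follower(current_term=5, log=[5] * 74)
print(b.append_entries(5, 99, 5, [5]))       # rejected, B reports index 74
print(b.append_entries(5, 74, 5, [5] * 26))  # accepted, B catches up to 100
```

In this sketch a follower that already holds the entries accepts the repeat as a no-op, which is one way to absorb the extra traffic.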

You haven't mentioned whether you are using RPC (so the reply goes only to the one who initiated the request) or whether replies go to all of them.

But having to process _response_ messages from everyone is really hard.

If I sent you a message, and I get a message from Joe in reply to the same message, how do I know if it was you or Joe that replied?

You need to have some way to distinguish between them.

And that leads you to roughly RPC semantics.
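One way to get those semantics over a broadcast transport is to tag every message with sender, recipient, and a request id, and drop anything that does not match. The sketch below is only an assumption about how that could look; `Envelope`, `accept`, and the `pending` map are hypothetical names, not part of Raft or of any library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Envelope:
    """Hypothetical message wrapper; recipient None means broadcast."""
    kind: str
    term: int
    sender: str
    recipient: Optional[str]
    request_id: int


def accept(my_id, pending, msg):
    """Decide whether the node my_id should process msg.

    pending maps request_id -> the peer the request was sent to.
    """
    if msg.kind.endswith("response"):
        # A response counts only if it is addressed to us and comes from
        # the peer we actually sent that request to.
        return msg.recipient == my_id and pending.get(msg.request_id) == msg.sender
    # Requests are accepted when broadcast or addressed to us.
    return msg.recipient in (None, my_id)


pending = {7: "B"}  # A sent request 7 to B
reply_from_b = Envelope("append_response", 5, "B", "A", 7)
reply_from_joe = Envelope("append_response", 5, "C", "A", 7)
print(accept("A", pending, reply_from_b))    # True
print(accept("A", pending, reply_from_joe))  # False
```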

Brian Hulse

Sep 8, 2015, 1:41:40 PM

to raft-dev

Thanks! I think I got my answer in that it seems overly complex to try and act upon messages not intended for the current instance.

To satisfy your curiosity, the way I'm implementing communication between instances is using AMQP queueing software called RabbitMQ to distribute messages. In designing my implementation I wanted to minimize opening connections and channels to Rabbit from my domain. Therefore, each instance will create a single connection to RabbitMQ on startup that will be used for the entire lifetime of the service. Each instance will also create a single queue in Rabbit to receive messages from other instances. To reduce complexity, I intend the bindings for this queue to be fairly generic, and I do not want to have to modify them when transitioning between states. I accepted receiving messages not intended for a specific state as a consequence of this. Receiving and discarding messages that I don't want to process is a lightweight procedure, so I can accept the couple of ms of overhead.

Oren Eini (Ayende Rahien)

Sep 8, 2015, 3:40:34 PM

to raft...@googlegroups.com

Can't you use a separate topic per node?

Brian Hulse

Sep 9, 2015, 11:17:36 AM

to raft...@googlegroups.com

Yeah, that is an option. However, I would have to use multiple exchanges and exchange bindings per queue, mainly to handle fanning out messages to all followers, as topics can't be generic like exchange bindings can. I may move there if I run into issues with all messages going to all instances, but I don't think I'll need to go there right now.

I should mention that my implementation is very stripped down. I'm not using Raft to do log replication; I'm using it as more of a smart cache manager with consensus and forced expiration. Basically, I have a set of responsibilities that need to be distributed to separate servers, and these responsibilities are kept in a distributed cache. I need someone to manage distributing responsibilities across some landscape, to be in charge of updates to responsibilities, and to force the landscape to stay up to date with changes. I'm going to have the leader be responsible for compiling the initial list of distributed responsibilities based on the landscape configuration (i.e. distribute evenly based on the number of followers). The leader will then send heartbeats in which the log index lets the followers know they need to update their list of responsibilities. The followers will handle their responsibilities and, on receiving an incremented index, will update their list of responsibilities and handle any new ones.
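The scheme described here could be sketched roughly as follows. Everything in it is an assumption made for illustration: round-robin is just one reading of "distribute evenly", and `distribute`, `FollowerState`, and the `fetch` callback (standing in for a read from the distributed cache) are hypothetical names.

```python
def distribute(responsibilities, followers):
    """Assign responsibilities to followers round-robin (one 'even' policy)."""
    if not followers:
        raise ValueError("need at least one follower")
    assignment = {follower: [] for follower in followers}
    for i, responsibility in enumerate(responsibilities):
        assignment[followers[i % len(followers)]].append(responsibility)
    return assignment


class FollowerState:
    """Follower that refreshes its list when a heartbeat carries a newer index."""

    def __init__(self):
        self.index = 0
        self.responsibilities = []

    def on_heartbeat(self, index, fetch):
        """fetch() stands in for reading this follower's list from the cache."""
        if index > self.index:
            self.responsibilities = fetch()
            self.index = index
            return True   # list was refreshed
        return False      # stale or repeated heartbeat, nothing to do


plan = distribute(["r1", "r2", "r3", "r4", "r5"], ["B", "C"])
follower_b = FollowerState()
follower_b.on_heartbeat(1, lambda: plan["B"])
print(follower_b.responsibilities)
```

Because the follower only compares indexes, a heartbeat that arrives late or twice is ignored rather than triggering another refresh.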

So basically it's not necessary for me to manage a local list of Log Entries so I can reduce a lot of the complexity around consensus. In my implementation consensus is "have I refreshed my responsibility list and taken action on it".

All these are guesses though as I'm just getting into implementation, so as usual we'll see what I got wrong when I can start playing with it :)

Kijana Woodard

Sep 9, 2015, 12:36:15 PM

to raft...@googlegroups.com

"So basically it's not necessary for me to manage a local list of Log Entries..."

...earlier...

"I'm going to have the leader be responsible for compiling the initial list of distributed responsibilities..."

Doesn't the "list of responsibilities" constitute the log entries?

"..topics can't be generic..."

Why would you need generic topics?

Using rabbit topics and queues has some interesting possibilities.

Oren Eini (Ayende Rahien)

Sep 9, 2015, 1:43:50 PM

to raft...@googlegroups.com

I'm not sure that I follow.

a) A RabbitMQ queue per node (should be very few of those).

b) RPC is handled via sending a message to the node's queue.

I don't think that there is complex binding here, it is about as simple as it can get.

Also, log replication is a fancy term for making sure that all servers agree on the list of steps that are required to do something.

In a cache, you need to agree on the order of put/delete/get commands, and you need to have some way to sync time across all nodes (as part of the heartbeat, probably).

But that is consensus, and getting the log entries in the right order ensures that you get it working.

Brian Hulse

Sep 9, 2015, 4:00:22 PM

to raft...@googlegroups.com

You're right that the responsibilities could constitute log entries, but for my specific scenario the order of commands is not important (I know, hard to believe, but true), so I've decided to simplify my implementation at the expense of losing some of the Raft guarantees that I don't believe are necessary for my needs. Once I start testing this I may find out my assumption was incorrect, but until then I'm going to keep on with my current design.

As for the bindings, after thinking about it for a bit I think both of you are right: I was overcomplicating it. The way RabbitMQ works (sorry if you know all of this already) is that you publish a message to an exchange, and the exchange then delivers the message to the queues that are bound to it. With the queue's exchange binding you can filter messages based on the topic of the message. You can use wildcards in your filter so that your bindings can be generic about the message topics that match.

Usually, if you want to deliver a message to multiple queues, you publish a single message to the exchange and let the exchange handle delivering multiple messages. So for a heartbeat, the leader would publish a single message to the exchange with a topic that looks something like "raft.heartbeat.[term].[leaderid]", and all queues that want heartbeats should have a binding to that exchange with a topic filter something like "raft.heartbeat#" (# is a wildcard).

So I think the way I'm going to use Rabbit is that each instance will create its queue with the following bindings to an exchange I'm only using for Raft messaging:

- raft.heartbeat#

- raft.voterequest#

- raft.heartbeatresponse.[myinstanceid]#

- raft.voteresponse.[myinstanceid]#

This means responses to RPCs meant for specific instances will only be delivered to the intended queue, and since I respond to messages that include the leader or client id, I can put this information in the topic of my response message. For client commands I will probably have the leader open a channel to another queue that clients drop commands into, so if no leader is currently operating, the commands will queue up to be processed in order once a new leader is elected and opens a channel to the queue.

Thanks for the responses!

Oren Eini (Ayende Rahien)

Sep 9, 2015, 4:22:42 PM

to raft...@googlegroups.com

If you don't have ordered log entries, you pretty much don't _have_ raft (or any other consensus).

You might want to look at distributed gossip protocols, if you care about that kind of behavior.

Holger Hoffstätte

Sep 11, 2015, 7:15:21 PM

to raft...@googlegroups.com

On 09/08/15 19:41, Brian Hulse wrote:
> To satisfy your curiosity, the way i'm implementing communication
> between instances is using AMQP Queueing software called RabbitMQ
> <https://www.rabbitmq.com/> to distribute messages. In designing my
> implementation I wanted to minimize opening connections and channels
> to Rabbit from my domain. Therefore, each instance will create a

But channels in AMQP 0.9 are extremely lightweight and already multiplexed
over a single connection; their whole purpose is to avoid having to do
what you're trying to do. So..huh? :)

-h

Alvaro Videla

Sep 22, 2015, 5:50:53 AM

to raft-dev

Hi,

On Wednesday, September 9, 2015 at 10:00:22 PM UTC+2, Brian Hulse wrote:

> So I think the way I'm going to use Rabbit is that each instance will create its queue with the following bindings to an exchange I'm only using for Raft messaging:
> - raft.heartbeat#
> - raft.voterequest#
> - raft.heartbeatresponse.[myinstanceid]#
> - raft.voteresponse.[myinstanceid]#

Note that in AMQP the dot '.' is significant when creating patterns that use wildcards, so you probably meant, for example, raft.voterequest.# instead. The dot is the word separator, so if you don't include it, the hash is just part of the previous word.
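The difference can be seen with a small stand-alone matcher. This is a sketch of the topic-exchange matching rules ('.' separates words, '*' matches exactly one word, '#' matches zero or more words), written for illustration; `topic_matches` is a hypothetical helper, not a RabbitMQ API, and the routing keys are made-up examples.

```python
def topic_matches(pattern, routing_key):
    """Return True if routing_key matches an AMQP-style topic pattern."""
    p = pattern.split(".")
    k = routing_key.split(".") if routing_key else []

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may consume zero or more of the remaining words.
            return any(match(i + 1, n) for n in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)


# With the dot, '#' is its own word and acts as a wildcard.
print(topic_matches("raft.voterequest.#", "raft.voterequest.3.A"))  # True
# Without the dot, 'voterequest#' is a literal word and nothing matches.
print(topic_matches("raft.voterequest#", "raft.voterequest.3.A"))   # False
```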

If you want to use wildcards at the publishing end, instead of inside bindings, perhaps use the "reverse topic exchange" https://github.com/videlalvaro/rabbitmq-rtopic-exchange

About RabbitMQ's ordering guarantees, you may want to read https://www.rabbitmq.com/semantics.html#ordering first. In short, using many channels to publish messages to the same exchange might make you lose the intended order of messages: RabbitMQ processes publishes on different channels concurrently, so it can't guarantee order across them.

Finally, mandatory messages or publisher confirms could be useful for your project: https://www.rabbitmq.com/confirms.html