Best/suggested algorithm to detect failover of master node in client API?

fj

May 22, 2015, 9:52:11 AM

to redi...@googlegroups.com

Hi all,

I'm implementing a cluster-aware client API and I'm wondering what the best way is to implement a master-to-slave failover algorithm.
Let's say I have 3 masters (M1, M2, M3), each with 2 slaves (M1-S1, M1-S2, M2-S1, M2-S2, M3-S1, M3-S2).
If M1 dies (e.g. I kill it), either M1-S1 or M1-S2 will be elected as the new master within 5-10 seconds.

My question is: what is the best way to detect the failover and find out which node is the new master?
The socket connection to the dead master will break and then I have to figure out who is the new master as quickly as possible.
I see at least the following approaches:

1) Call CLUSTER NODES periodically (e.g. every 500 ms) on a remaining connection and wait for a change in the cluster topology, e.g. M1-S1 or M1-S2 being marked as the new master, then update my slot->node map.
2) Call CLUSTER SLOTS periodically (e.g. every 500 ms) on a remaining connection and wait for a change in the cluster topology, e.g. M1-S1 or M1-S2 being marked as the new master, then update my slot->node map.
3) Call ROLE repeatedly on the two slave connections (M1-S1 and M1-S2) until one of them changes role from slave to master. This won't work if none of the original slaves are left, so it won't detect a slave that arrived via replica migration.
4) Issue my normal commands on one of the remaining slave connections (if any are left) and keep getting redirected to the old, dead master until the cluster topology has been updated.
When the redirection changes to one of the previous slaves that is now master, I update my slot->node map.
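
To make approach 1 concrete, here is a rough Python sketch of the polling idea. It is only an illustration under some assumptions: the CLUSTER NODES reply is taken to be the usual space-separated text format (id, address, flags, master id, ping-sent, pong-recv, config epoch, link state, then slots), and `fetch_nodes` is a hypothetical callable that returns the raw reply from whichever connection is still alive. The names `parse_cluster_nodes`, `changed_slots` and `wait_for_failover` are made up for this example.

```python
import time
from typing import Callable, Dict, Optional


def parse_cluster_nodes(text: str) -> Dict[int, str]:
    """Map each slot to the address of the master currently serving it."""
    owners: Dict[int, str] = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a complete node line
        # Some server versions append "@<cluster-bus-port>" to the address.
        addr = fields[1].split("@")[0]
        flags = fields[2].split(",")
        # Skip slaves and masters flagged as failed; "fail?" (suspected only)
        # is deliberately not treated as a failure here.
        if "master" not in flags or "fail" in flags:
            continue
        for token in fields[8:]:
            if token.startswith("["):
                continue  # slot in the middle of a migration/import
            start, _, end = token.partition("-")
            for slot in range(int(start), int(end or start) + 1):
                owners[slot] = addr
    return owners


def changed_slots(old: Dict[int, str], new: Dict[int, str]) -> Dict[int, str]:
    """Return the slots whose owner in `new` differs from the one in `old`."""
    return {slot: addr for slot, addr in new.items() if old.get(slot) != addr}


def wait_for_failover(
    fetch_nodes: Callable[[], str],  # hypothetical: returns raw CLUSTER NODES text
    known: Dict[int, str],
    interval: float = 0.5,
    timeout: float = 15.0,
) -> Optional[Dict[int, str]]:
    """Poll until some slot has a new owner; return the changes or None on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        changes = changed_slots(known, parse_cluster_nodes(fetch_nodes()))
        if changes:
            return changes
        time.sleep(interval)
    return None
```

The slots of a failed master simply vanish from the parsed map until a slave takes over, so the loop keeps waiting during the election window and returns only once a new owner shows up. The caller would then merge the returned changes into its slot->node map.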

Any other/better ways?

Ideally I would like Redis to publish a message on a channel when the topology changes so that polling is not required at all, but that's a completely different discussion :-)

Regards,

Flemming

fj

Jun 12, 2015, 4:56:08 AM

to redi...@googlegroups.com

@antirez, any chance you could comment on this or, even better, add a section to the cluster specification explaining the recommended way for a client API to implement this?

Regards,

Flemming
