Hi,
I've been working on a new monitor mode in Sentinel which enables it to monitor a read-only relay master in cross-datacenter (DC) replication setups.
A relay master is a Redis slave that replicates from the writable master in the master DC. It in turn has slaves in the local DC connected to it for read scaling, and it optionally has slave-priority set to 0 to prevent it from participating in a failover of the master DC's master.
When such a relay master goes down, it would be nice to switch its connected slaves over to the other relay master in the local DC in a staggered manner, similar to what parallel-syncs=1 does in Sentinel, instead of having them all sync from a cross-DC master at the same time.
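For context, the staggered reconfiguration mentioned above comes from Sentinel's existing parallel-syncs option, which limits how many slaves are re-pointed at the new master at once. A minimal sketch (the master name and address are placeholders):

```
# Monitor a master; a quorum of 2 Sentinels must agree it is down
sentinel monitor mymaster 10.0.0.1 6379 2
# During a failover, reconfigure at most one slave at a time
sentinel parallel-syncs mymaster 1
```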
The changes are here:
https://github.com/antirez/redis/pull/2584
They introduce a new Sentinel config directive: 'sentinel candidate-slave my-master hostname port'.
The default Sentinel failover mode chooses one of the connected slaves of the old master and promotes it to master during failover. This new config specifies an explicit candidate slave to be chosen as the new master for the connected slaves of the old relay master. Also, if this config is set, Sentinel does not send 'slaveof no one' to the promoted slave, as the promoted slave is expected to be another relay master replicating from a cross-DC master/slave instance.
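A hedged sentinel.conf sketch of the proposed directive (the addresses are placeholders; the directive name and argument order follow the description above):

```
sentinel monitor my-master 10.0.0.1 6379 2
# Proposed in this PR: on failover of my-master, promote exactly this
# instance. Sentinel will not send it 'slaveof no one', since it is
# expected to be another relay master replicating from a cross-DC
# instance.
sentinel candidate-slave my-master 10.0.0.2 6379
```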
Consider a setup with two DCs, DC1 and DC2, and the replication topology below.
DC1 has Redis instances m1, m2, s1, and s2; DC2 likewise has m1, m2, s1, and s2. DC1 is the master DC where all writes happen, and DC2 is the slave DC that just replicates the whole data set from DC1:
DC1.m1 (active write master)
|- DC1.m2
|- DC1.s1
|- DC1.s2
|- DC2.m2
|- DC2.m1
   |- DC2.s1
   |- DC2.s2
With this change, it is possible to set up Sentinel in DC2 to monitor DC2.m1 and add DC2.m2 as the explicit candidate slave.
When DC2.m1 becomes unavailable, the new Sentinel monitor makes DC2.s1 and DC2.s2 replicate from DC2.m2. When DC2.m1 comes back, Sentinel will add it as the candidate slave for DC2.m2.
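Under the topology above, the DC2 Sentinel configuration might look roughly like this (the IPs are hypothetical, and 'dc2-relay' is just an illustrative master name):

```
# Monitor the DC2 relay master (DC2.m1)
sentinel monitor dc2-relay 10.2.0.1 6379 2
# Explicit candidate: the other relay master in DC2 (DC2.m2)
sentinel candidate-slave dc2-relay 10.2.0.2 6379
# Stagger slave reconfiguration during failover
sentinel parallel-syncs dc2-relay 1
```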