Hi,
I understand that Cluster and Sentinel are for different purposes. I need the benefits of Cluster but I also need an automated failover.
My setup has two hosts, each running three masters and three slaves. Each slave replicates a master on the other host.
So something like this:
Host A       Host B
M1     <=>   S4
M2     <=>   S5
M3     <=>   S6
S1     <=>   M4
S2     <=>   M5
S3     <=>   M6
I'm running more than one instance on a host because Redis is single-threaded and my hosts are multi-core.
I'm able to get systemd to automatically try to restart an instance.
If I prevent a master from being restarted, the cluster promotes its slave to master and I can keep using the cluster.
So I have automated failover for when a master goes down and systemd can't resuscitate it.
The problem I have is that if I shut down Host A, or otherwise just disconnect it from Host B, the cluster does not fail over.
The cluster can only promote a slave to master if a majority of the masters agree to it.
When Host A is on fire, the cluster never marks the masters on it as "fail". They stay as "pfail" indefinitely.
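To make the arithmetic concrete, here is a minimal Python sketch of the majority rule as I understand it. It is a simplified model, not Redis's actual implementation, and the function names are made up for illustration. It assumes a strict majority of all masters is needed, which for six masters means four.

```python
def majority(total_masters: int) -> int:
    """Smallest number of masters that forms a strict majority."""
    return total_masters // 2 + 1


def can_fail_over(total_masters: int, reachable_masters: int) -> bool:
    """True if the masters still reachable can form a majority on their own."""
    return reachable_masters >= majority(total_masters)


if __name__ == "__main__":
    total = 6      # M1..M6 in the layout above
    per_host = 3   # three masters on each host

    # Votes needed out of six masters.
    print("votes needed:", majority(total))

    # One master down: five masters remain, which is enough.
    print("one master lost:", can_fail_over(total, total - 1))

    # A whole host down: only three masters remain, which is not enough.
    print("whole host lost:", can_fail_over(total, total - per_host))
```

With three masters on each of two hosts, losing either host always leaves exactly half of the masters, so the surviving side can never reach a majority by itself.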
I've tried to fix this by using Sentinel (at first just a single instance on a third VM). That machine is able to connect to the hosts with redis-cli, but Sentinel always reports in INFO that the masters have no slaves. It also insists that the masters are objectively down (I have quorum 1 because I'm testing with a single Sentinel for now), even though I have set 'auth-pass' and verified that I can connect from the shell with redis-cli. I can confirm that Sentinel is receiving a reply to its PING because the last-ping-reply is significantly less than the last-ok-ping-reply (and keeps moving when I check).
How could I do either of the following?
- Get Redis to promote the slaves on a host to master if my other host is unavailable
- Get Sentinel to recognize that my cluster masters actually do have a slave, and promote it when the master is genuinely down
Any ideas would be much appreciated.