Failover Fail


Israel Brewster

Oct 7, 2016, 4:07:46 PM
to Redis DB
I'm trying to get Redis set up in a two-machine HA configuration, using redis-sentinel to control failover. Redis is version 3.2.3, running on CentOS 6.8. The initial setup goes smoothly, and I can get the two machines set up as master/slave, with an instance of redis-sentinel running on each one. When I test failover by shutting down redis-server on the master machine, failover works properly, as does failing back once I restart redis-server (which properly comes up as a slave) and shut down the new master. So that part is good.

However, once I move to more "real world" testing by shutting down the master machine completely, Redis quickly gets confused and stops working - what it looks like is that I end up with both machines configured as slaves. How long this takes varies: I've seen it happen immediately, but I've also seen it survive one or two "failovers". I have never seen it work reliably.

Detail of testing: two machines, at 10.211.55.100 and 10.211.55.101, with (initially) 100 being master and 101 being slave. Looking at the redis-sentinel log on 101 (the slave machine), I see normal operation:

2785:X 07 Oct 10:59:38.425 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
2785:X 07 Oct 10:59:38.425 # Sentinel ID is 49843b0d832503d8f31fc80f1439479ae8f26934
2785:X 07 Oct 10:59:38.425 # +monitor master mymaster 10.211.55.100 6379 quorum 1
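
(For context, here is a minimal sketch of the kind of configuration in play. My actual files aren't reproduced in this post, so apart from the monitor line - which matches the quorum of 1 shown in the log above - treat the values as illustrative.)

# sentinel.conf on both machines (illustrative sketch)
port 26379
sentinel monitor mymaster 10.211.55.100 6379 1
sentinel down-after-milliseconds mymaster 30000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1

# redis.conf on 10.211.55.101 (the initial slave) additionally has:
slaveof 10.211.55.100 6379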

Then I shut down the current master (100), which shows this:

2785:X 07 Oct 11:00:08.563 * +slave slave 10.211.55.101:6379 10.211.55.101 6379 @ mymaster 10.211.55.100 6379
2785:X 07 Oct 11:01:39.018 # +sdown master mymaster 10.211.55.100 6379
2785:X 07 Oct 11:01:39.019 # +odown master mymaster 10.211.55.100 6379 #quorum 1/1
2785:X 07 Oct 11:01:39.019 # +new-epoch 1
2785:X 07 Oct 11:01:39.019 # +try-failover master mymaster 10.211.55.100 6379
2785:X 07 Oct 11:01:39.022 # +vote-for-leader 49843b0d832503d8f31fc80f1439479ae8f26934 1
2785:X 07 Oct 11:01:39.022 # +elected-leader master mymaster 10.211.55.100 6379
2785:X 07 Oct 11:01:39.022 # +failover-state-select-slave master mymaster 10.211.55.100 6379
2785:X 07 Oct 11:01:39.074 # +selected-slave slave 10.211.55.101:6379 10.211.55.101 6379 @ mymaster 10.211.55.100 6379
2785:X 07 Oct 11:01:39.075 * +failover-state-send-slaveof-noone slave 10.211.55.101:6379 10.211.55.101 6379 @ mymaster 10.211.55.100 6379
2785:X 07 Oct 11:01:39.146 * +failover-state-wait-promotion slave 10.211.55.101:6379 10.211.55.101 6379 @ mymaster 10.211.55.100 6379
2785:X 07 Oct 11:01:39.999 # +promoted-slave slave 10.211.55.101:6379 10.211.55.101 6379 @ mymaster 10.211.55.100 6379
2785:X 07 Oct 11:01:39.999 # +failover-state-reconf-slaves master mymaster 10.211.55.100 6379
2785:X 07 Oct 11:01:40.072 # +failover-end master mymaster 10.211.55.100 6379
2785:X 07 Oct 11:01:40.072 # +switch-master mymaster 10.211.55.100 6379 10.211.55.101 6379
2785:X 07 Oct 11:01:40.072 * +slave slave 10.211.55.100:6379 10.211.55.100 6379 @ mymaster 10.211.55.101 6379
2785:X 07 Oct 11:01:41.098 # +sdown slave 10.211.55.100:6379 10.211.55.100 6379 @ mymaster 10.211.55.101 6379

...which appears to me to be a good failover to the slave. So far, so good. Then I bring the old "master" back up:

2785:X 07 Oct 11:08:18.130 # -sdown slave 10.211.55.100:6379 10.211.55.100 6379 @ mymaster 10.211.55.101 6379
2785:X 07 Oct 11:08:28.077 * +convert-to-slave slave 10.211.55.100:6379 10.211.55.100 6379 @ mymaster 10.211.55.101 6379

...which also works, apparently converting 100 to a slave. Ok, let's try again. This time, since 101 is now master, we'll shut down 101 and watch the logs on 100:

1757:X 07 Oct 11:15:38.030 # +sdown master mymaster 10.211.55.101 6379
1757:X 07 Oct 11:15:38.030 # +odown master mymaster 10.211.55.101 6379 #quorum 1/1
1757:X 07 Oct 11:15:38.030 # +new-epoch 3
1757:X 07 Oct 11:15:38.030 # +try-failover master mymaster 10.211.55.101 6379
1757:X 07 Oct 11:15:38.034 # +vote-for-leader 49843b0d832503d8f31fc80f1439479ae8f26934 3
1757:X 07 Oct 11:15:38.034 # +elected-leader master mymaster 10.211.55.101 6379
1757:X 07 Oct 11:15:38.034 # +failover-state-select-slave master mymaster 10.211.55.101 6379
1757:X 07 Oct 11:15:38.087 # +selected-slave slave 10.211.55.100:6379 10.211.55.100 6379 @ mymaster 10.211.55.101 6379
1757:X 07 Oct 11:15:38.087 * +failover-state-send-slaveof-noone slave 10.211.55.100:6379 10.211.55.100 6379 @ mymaster 10.211.55.101 6379
1757:X 07 Oct 11:15:38.153 * +failover-state-wait-promotion slave 10.211.55.100:6379 10.211.55.100 6379 @ mymaster 10.211.55.101 6379
1757:X 07 Oct 11:15:38.984 # +promoted-slave slave 10.211.55.100:6379 10.211.55.100 6379 @ mymaster 10.211.55.101 6379
1757:X 07 Oct 11:15:38.984 # +failover-state-reconf-slaves master mymaster 10.211.55.101 6379
1757:X 07 Oct 11:15:39.032 # +failover-end master mymaster 10.211.55.101 6379
1757:X 07 Oct 11:15:39.032 # +switch-master mymaster 10.211.55.101 6379 10.211.55.100 6379
1757:X 07 Oct 11:15:39.032 * +slave slave 10.211.55.101:6379 10.211.55.101 6379 @ mymaster 10.211.55.100 6379
1757:X 07 Oct 11:15:40.081 # +sdown slave 10.211.55.101:6379 10.211.55.101 6379 @ mymaster 10.211.55.100 6379

Again, looks good. Failed back to 100 as the new master, at least apparently. But look what happens when we bring 101 back up again:

1757:X 07 Oct 11:18:46.495 # -sdown slave 10.211.55.101:6379 10.211.55.101 6379 @ mymaster 10.211.55.100 6379
1757:X 07 Oct 11:18:56.402 * +convert-to-slave slave 10.211.55.101:6379 10.211.55.101 6379 @ mymaster 10.211.55.100 6379
1757:X 07 Oct 11:19:17.579 # +sdown master mymaster 10.211.55.100 6379
1757:X 07 Oct 11:19:17.579 # +odown master mymaster 10.211.55.100 6379 #quorum 1/1
1757:X 07 Oct 11:19:17.579 # +new-epoch 4
1757:X 07 Oct 11:19:17.580 # +try-failover master mymaster 10.211.55.100 6379
1757:X 07 Oct 11:19:17.635 # +vote-for-leader 49843b0d832503d8f31fc80f1439479ae8f26934 4
1757:X 07 Oct 11:19:17.635 # +elected-leader master mymaster 10.211.55.100 6379
1757:X 07 Oct 11:19:17.635 # +failover-state-select-slave master mymaster 10.211.55.100 6379
1757:X 07 Oct 11:19:17.725 # -failover-abort-no-good-slave master mymaster 10.211.55.100 6379
1757:X 07 Oct 11:19:17.826 # Next failover delay: I will not start a failover before Fri Oct  7 11:20:18 2016
1757:X 07 Oct 11:20:18.086 # +new-epoch 5
1757:X 07 Oct 11:20:18.086 # +try-failover master mymaster 10.211.55.100 6379
1757:X 07 Oct 11:20:18.090 # +vote-for-leader 49843b0d832503d8f31fc80f1439479ae8f26934 5
1757:X 07 Oct 11:20:18.090 # +elected-leader master mymaster 10.211.55.100 6379
1757:X 07 Oct 11:20:18.090 # +failover-state-select-slave master mymaster 10.211.55.100 6379
1757:X 07 Oct 11:20:18.154 # -failover-abort-no-good-slave master mymaster 10.211.55.100 6379
1757:X 07 Oct 11:20:18.254 # Next failover delay: I will not start a failover before Fri Oct  7 11:21:18 2016

The first two lines look good - it sees 101 come back up and converts it to a slave, as it should. But then it apparently sees *itself* go down (which is odd, since it was running fine as master before I brought 101 back up), starts a new epoch, elects itself as leader (again, since it already was the master), aborts the failover with a no-good-slave error, and repeats indefinitely. At least, that is how I read the logs. What I can say for sure is that at this point *both* redis-server instances are marked as slaves, and there *is* no master.
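
(For what it's worth, the roles can be confirmed directly with redis-cli; the addresses below are the ones from this setup.)

redis-cli -h 10.211.55.100 -p 6379 INFO replication    # reports role:slave
redis-cli -h 10.211.55.101 -p 6379 INFO replication    # also reports role:slave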

The sentinel log on the other machine (101) shows a nearly identical pattern - except it seems to think it is still the master, and tries to convert 100 to a slave:

1423:X 07 Oct 11:18:46.363 # Sentinel ID is 49843b0d832503d8f31fc80f1439479ae8f26934
1423:X 07 Oct 11:18:46.363 # +monitor master mymaster 10.211.55.101 6379 quorum 1
1423:X 07 Oct 11:18:56.372 * +convert-to-slave slave 10.211.55.100:6379 10.211.55.100 6379 @ mymaster 10.211.55.101 6379
1423:X 07 Oct 11:19:27.446 # +sdown master mymaster 10.211.55.101 6379
1423:X 07 Oct 11:19:27.446 # +odown master mymaster 10.211.55.101 6379 #quorum 1/1
1423:X 07 Oct 11:19:27.446 # +new-epoch 3
1423:X 07 Oct 11:19:27.446 # +try-failover master mymaster 10.211.55.101 6379
1423:X 07 Oct 11:19:27.463 # +vote-for-leader 49843b0d832503d8f31fc80f1439479ae8f26934 3
1423:X 07 Oct 11:19:27.463 # +elected-leader master mymaster 10.211.55.101 6379
1423:X 07 Oct 11:19:27.463 # +failover-state-select-slave master mymaster 10.211.55.101 6379
1423:X 07 Oct 11:19:27.519 # -failover-abort-no-good-slave master mymaster 10.211.55.101 6379
1423:X 07 Oct 11:19:27.603 # Next failover delay: I will not start a failover before Fri Oct  7 11:20:28 2016
1423:X 07 Oct 11:20:28.382 # +new-epoch 4
1423:X 07 Oct 11:20:28.382 # +try-failover master mymaster 10.211.55.101 6379
1423:X 07 Oct 11:20:28.385 # +vote-for-leader 49843b0d832503d8f31fc80f1439479ae8f26934 4
1423:X 07 Oct 11:20:28.385 # +elected-leader master mymaster 10.211.55.101 6379
1423:X 07 Oct 11:20:28.385 # +failover-state-select-slave master mymaster 10.211.55.101 6379
1423:X 07 Oct 11:20:28.452 # -failover-abort-no-good-slave master mymaster 10.211.55.101 6379
1423:X 07 Oct 11:20:28.507 # Next failover delay: I will not start a failover before Fri Oct  7 11:21:28 2016

So what's going on here? It looks like both sentinels try to convert the other redis-server to a slave, which works, leaving no master. Is redis-sentinel simply not reliable? Or am I doing something wrong? Keep in mind, all I've done above is shut down the master server a couple of times. Any feedback would be appreciated.

Israel Brewster

Oct 14, 2016, 2:59:57 PM
to Redis DB
So my plan B, since apparently no one has any ideas as to why Sentinel would be unreliable, is to look at switching to a Pacemaker/Corosync solution, if possible, which I've had good luck with for other services. Does anyone know if there are any Redis resource agents written for that? Thanks.

Tuco

Oct 14, 2016, 11:55:01 PM
to Redis DB
You need at least 3 sentinels for a robust deployment, as mentioned here. Having 2 is not enough. Also make sure they are set up so that whenever a machine is down, you still have a majority of sentinels (more than N/2) up, so that they can start and agree on a failover.
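
As a sketch, with three sentinels (for example one per Redis machine plus one on a third box - the third host is an assumption, not part of the original setup), the monitor line on each sentinel would set a quorum of 2:

sentinel monitor mymaster 10.211.55.100 6379 2

With three sentinels and a quorum of 2, the loss of any single machine still leaves two sentinels up - enough to reach the quorum and a majority able to authorize the failover.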

Israel Brewster

Oct 15, 2016, 11:40:50 AM
to redi...@googlegroups.com, Tuco
On Oct 14, 2016, at 7:55 PM, Tuco <rahul....@gmail.com> wrote:

> You need at least 3 sentinels for a robust deployment, as mentioned here. Having 2 is not enough. Also make sure they are set up so that whenever a machine is down, you still have a majority of sentinels (more than N/2) up, so that they can start and agree on a failover.

That only says you need three for a “robust” deployment (for some definition of “robust”). It says nothing about needing three for a *functioning* deployment. On the other hand, it may well be that you *do* need three for a reliable, functioning deployment - in which case, using Sentinel is not an option for me. The good news is that it does appear I will be able to use Pacemaker and Corosync to manage Redis master/slave - there is a resource agent available for it here: https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/redis. In initial testing it *occasionally* fails to start Redis on the first attempt, but by simply setting Pacemaker to retry on a failed start it *always* works (Redis ends up starting on the second or third attempt).

So plan B works, and has the added bonus of being able to tie a single IP to the redis master, so I don’t need to jump through hoops of querying a sentinel to get the master IP and then connecting to that in my client apps. Still need to test what happens to a connected client during failover, however - I might need to put in some code to try reconnecting or something.
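
For anyone following along, here is a rough sketch of the kind of Pacemaker setup described above. The resource names, the VIP address, and the exact pcs syntax (which varies by version) are illustrative assumptions rather than my actual cluster configuration:

# Master/slave Redis resource using the ClusterLabs ocf:heartbeat:redis agent
pcs resource create redis ocf:heartbeat:redis op monitor interval=10s
pcs resource master redis-master redis notify=true

# Virtual IP that follows the Redis master (address is a placeholder)
pcs resource create redis-vip ocf:heartbeat:IPaddr2 ip=10.211.55.200 cidr_netmask=24
pcs constraint colocation add redis-vip with master redis-master INFINITY
pcs constraint order promote redis-master then start redis-vip

# The "retry on a failed start" behavior mentioned above
pcs property set start-failure-is-fatal=false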


George Chilumbu

Oct 18, 2016, 5:28:31 AM
to Redis DB
Let me share a report I wrote some time back, when we originally set up our Redis with Sentinel:

The purpose of using Sentinel was to take advantage of its ability to manage failover between masters and their replicas in the Redis cluster. But since Redis Cluster can manage master/slave failover on its own, it is redundant and costly to run Sentinel in addition. So we decided to stop using Sentinel, as we found it was causing intermittent connection problems for clients connecting to the Redis servers. I also saw a weird situation in which one of the slaves turned another master and that master's slave both into its own slaves. I can't confirm it for sure, but I suspect Sentinel played a hand in that behavior.

We got rid of Sentinel and set up a Redis Cluster instead. We also use Consul by HashiCorp to handle service discovery and DNS. Our Redis has since been running smoothly for over a year with no further issues.
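
(As a rough sketch only - the hosts below are placeholders, not our actual topology - a minimal cluster of three masters with one replica each is created on Redis 3.x along these lines, with cluster-enabled yes set in each node's redis.conf:)

redis-trib.rb create --replicas 1 \
    10.0.0.1:6379 10.0.0.2:6379 10.0.0.3:6379 \
    10.0.0.4:6379 10.0.0.5:6379 10.0.0.6:6379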

Salvatore Sanfilippo

Oct 18, 2016, 5:41:53 AM
to redi...@googlegroups.com
Hello George, at some point we had a bug in the Sentinel implementation (not the protocol) that could cause issues during reconfiguration. That bug is fixed now.

As for the connection problems experienced by the clients: the client library used must follow the Redis Sentinel protocol, and must be smart enough to switch to the next Sentinel as soon as the one it is currently talking to is down. Adhering to the Sentinel protocol also means always asking Sentinel who the current master is after each disconnection. We have the protocol documented, however not all client libraries implement it properly. It's trivial to check, by the way - it's not a complex protocol to follow.
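
(Concretely, the "ask Sentinel who the current master is" step is a single command; the address and master name below are the ones from the original post.)

redis-cli -h 10.211.55.100 -p 26379 SENTINEL get-master-addr-by-name mymaster
# returns the current master's IP and port as a two-element reply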

Salvatore



--
Salvatore 'antirez' Sanfilippo
open source developer - Redis Labs https://redislabs.com

"If a system is to have conceptual integrity, someone must control the
concepts."
— Fred Brooks, "The Mythical Man-Month", 1975.

Salvatore Sanfilippo

Oct 18, 2016, 5:45:43 AM
to redi...@googlegroups.com
Hello, your failover is failing for the reason shown in the Sentinel log:

> 1757:X 07 Oct 11:19:17.725 # -failover-abort-no-good-slave master mymaster 10.211.55.100 6379

The slave is considered not reliable for some reason (how slave reliability is evaluated is outlined in the Sentinel documentation); a common cause is that the slave was found not to have been recently connected to the master. For instance, a slave that is disconnected from the master and then reboots is no longer considered a good candidate for the failover. This typically happens when you simulate many failovers one after the other.
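
(A quick way to see how Sentinel is judging the slave is to ask it directly; the field names below are from the SENTINEL slaves reply in 3.2.)

redis-cli -p 26379 SENTINEL slaves mymaster
# check the "flags", "master-link-status" and "master-link-down-time"
# fields for the slave that Sentinel refuses to promote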

Redis Cluster lets you configure this aspect; you can even choose to consider slaves always reliable, even if they have been disconnected from the (failing) master for a long time. This is not yet possible with Sentinel, but there are plans to add the option, so that a failover can happen, if the user wants it to, even when it is not clear the slave is a good candidate.
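
(For reference, the Redis Cluster setting being described appears to be cluster-slave-validity-factor in redis.conf; setting it to zero disables the disconnection-time check entirely.)

# redis.conf on a cluster node: slaves will attempt a failover no matter
# how long their link to the failing master has been down
cluster-slave-validity-factor 0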

Salvatore

George Chilumbu

Oct 18, 2016, 6:22:23 AM
to Redis DB
Hi Salvatore. Thanks for providing more insight on this matter.