+vote-for-leader failed during the failover process.


dennis bing leng

Feb 21, 2014, 2:41:06 PM
to redi...@googlegroups.com
Here is my current Redis Sentinel configuration.
Current master IP: 172.16.2.154
Slave IP: 172.16.2.153

I have set the quorum to one on both Sentinels. The master and one Sentinel run on one physical box; the slave and the other Sentinel run on the other. When I sudo reboot the master box (172.16.2.154), here is the log from the slave box (172.16.2.153).

[1510] 14 Feb 02:44:25.406 # Sentinel runid is 44d0bc3d3dc5dd7e9417c5d567344db8fbe298e8
[1510] 14 Feb 02:46:47.934 * +sentinel sentinel 172.16.2.154:26379 172.16.2.154 26379 @ mymaster 172.16.2.154 6379
[1510] 14 Feb 02:47:23.121 # +sdown master mymaster 172.16.2.154 6379
[1510] 14 Feb 02:47:23.121 # +odown master mymaster 172.16.2.154 6379 #quorum 1/1
[1510] 14 Feb 02:47:23.121 # +new-epoch 9
[1510] 14 Feb 02:47:23.121 # +try-failover master mymaster 172.16.2.154 6379
[1510] 14 Feb 02:47:23.121 # +vote-for-leader 44d0bc3d3dc5dd7e9417c5d567344db8fbe298e8 9
[1510] 14 Feb 02:47:33.201 # -failover-abort-not-elected master mymaster 172.16.2.154 6379
[1510] 14 Feb 02:47:36.302 # +sdown sentinel 172.16.2.154:26379 172.16.2.154 26379 @ mymaster 172.16.2.154 6379
[1510] 14 Feb 02:49:33.431 * +reboot master mymaster 172.16.2.154 6379
[1510] 14 Feb 02:49:33.493 * -dup-sentinel master mymaster 172.16.2.154 6379 #duplicate of 172.16.2.154:26379 or 5985170d841ec37c229aa209edd762e96ddb81a6
[1510] 14 Feb 02:49:33.493 * +sentinel sentinel 172.16.2.154:26379 172.16.2.154 26379 @ mymaster 172.16.2.154 6379
[1510] 14 Feb 02:49:33.578 # -sdown master mymaster 172.16.2.154 6379
[1510] 14 Feb 02:49:33.578 # -odown master mymaster 172.16.2.154 6379
[1510] 14 Feb 02:50:28.722 # +new-epoch 10
[1510] 14 Feb 02:50:28.722 # +vote-for-leader 5985170d841ec37c229aa209edd762e96ddb81a6 10
[1510] 14 Feb 02:50:30.680 # +switch-master mymaster 172.16.2.154 6379 172.16.2.153 6379
[1510] 14 Feb 02:50:30.680 * +slave slave 172.16.2.154:6379 172.16.2.154 6379 @ mymaster 172.16.2.153 6379


I have the same configuration in another environment, which works fine. When I sudo reboot the master (172.16.2.152), here is the log from the slave (172.16.2.151); you can see that the election succeeded.

[1564] 07 Feb 09:59:11.013 # Sentinel runid is 69d2ccaa0d323e2ed6b8ccabacc875a8b50e12b0
[1564] 07 Feb 10:04:55.602 * -dup-sentinel master mymaster 172.16.2.152 6379 #duplicate of 172.16.2.152:26379 or d032886da3b0e46721debdd06e793e97a2b96c2c
[1564] 07 Feb 10:04:55.602 * +sentinel sentinel 172.16.2.152:26379 172.16.2.152 26379 @ mymaster 172.16.2.152 6379
[1564] 07 Feb 10:05:01.190 # +sdown master mymaster 172.16.2.152 6379
[1564] 07 Feb 10:05:01.190 # +odown master mymaster 172.16.2.152 6379 #quorum 1/1
[1564] 07 Feb 10:05:01.190 # +new-epoch 11
[1564] 07 Feb 10:05:01.190 # +try-failover master mymaster 172.16.2.152 6379
[1564] 07 Feb 10:05:01.190 # +vote-for-leader 69d2ccaa0d323e2ed6b8ccabacc875a8b50e12b0 11
[1564] 07 Feb 10:05:01.191 # 172.16.2.152:26379 voted for 69d2ccaa0d323e2ed6b8ccabacc875a8b50e12b0 11
[1564] 07 Feb 10:05:01.261 # +elected-leader master mymaster 172.16.2.152 6379
[1564] 07 Feb 10:05:01.261 # +failover-state-select-slave master mymaster 172.16.2.152 6379
[1564] 07 Feb 10:05:01.317 # +selected-slave slave 172.16.2.151:6379 172.16.2.151 6379 @ mymaster 172.16.2.152 6379
[1564] 07 Feb 10:05:01.317 * +failover-state-send-slaveof-noone slave 172.16.2.151:6379 172.16.2.151 6379 @ mymaster 172.16.2.152 6379
[1564] 07 Feb 10:05:01.393 * +failover-state-wait-promotion slave 172.16.2.151:6379 172.16.2.151 6379 @ mymaster 172.16.2.152 6379
[1564] 07 Feb 10:05:02.236 # +promoted-slave slave 172.16.2.151:6379 172.16.2.151 6379 @ mymaster 172.16.2.152 6379
[1564] 07 Feb 10:05:02.236 # +failover-state-reconf-slaves master mymaster 172.16.2.152 6379
[1564] 07 Feb 10:05:02.317 # +failover-end master mymaster 172.16.2.152 6379
[1564] 07 Feb 10:05:02.317 # +switch-master mymaster 172.16.2.152 6379 172.16.2.151 6379
[1564] 07 Feb 10:05:02.317 * +slave slave 172.16.2.152:6379 172.16.2.152 6379 @ mymaster 172.16.2.151 6379


I was wondering how they could behave differently, as I have configured Redis and Sentinel the same way in both environments.
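
One way to compare the two logs is to pull out just the Sentinel event names and check which election outcome each log contains. The following is a minimal sketch, assuming the log line layout shown above (`[pid] day month time level event ...`); the function names are made up for illustration and are not part of Redis. Note that only the second log contains a line showing the other Sentinel's vote being received (`172.16.2.152:26379 voted for ...`).

```python
import re

# Matches lines such as:
# [1510] 14 Feb 02:47:23.121 # +try-failover master mymaster 172.16.2.154 6379
# and captures the event name (a token starting with '+' or '-').
LINE_RE = re.compile(
    r"^\[\d+\] \d{2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3} [#*] ([+-][\w-]+)"
)


def sentinel_events(log_text: str) -> list[str]:
    """Return the event names found in a Sentinel log, in order."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            events.append(match.group(1))
    return events


def election_result(log_text: str) -> str:
    """Summarize whether this Sentinel was elected leader for the failover."""
    events = sentinel_events(log_text)
    if "+elected-leader" in events:
        return "elected"
    if "-failover-abort-not-elected" in events:
        return "not elected"
    return "no election attempt found"


# Excerpts copied from the two logs above.
failing_log = """\
[1510] 14 Feb 02:47:23.121 # +try-failover master mymaster 172.16.2.154 6379
[1510] 14 Feb 02:47:23.121 # +vote-for-leader 44d0bc3d3dc5dd7e9417c5d567344db8fbe298e8 9
[1510] 14 Feb 02:47:33.201 # -failover-abort-not-elected master mymaster 172.16.2.154 6379
"""

working_log = """\
[1564] 07 Feb 10:05:01.190 # +try-failover master mymaster 172.16.2.152 6379
[1564] 07 Feb 10:05:01.191 # 172.16.2.152:26379 voted for 69d2ccaa0d323e2ed6b8ccabacc875a8b50e12b0 11
[1564] 07 Feb 10:05:01.261 # +elected-leader master mymaster 172.16.2.152 6379
"""

print(election_result(failing_log))  # not elected
print(election_result(working_log))  # elected
```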

Cheers

Bing 




Salvatore Sanfilippo

Feb 21, 2014, 5:39:43 PM
to Redis DB
Hello, the quorum is just the informal agreement needed to enter the ODOWN state.
This is only the trigger for a Sentinel to start an election and try to get voted as the one that changes the configuration of the system.
The actual failover requires that Sentinel to be voted for by the majority.

If you have two physical hosts, Sentinel is mostly useless: when one box breaks, the majority, defined as N/2+1, is still 2, and there is no way for the other Sentinel to get elected.
In this configuration it only protects you from software errors: if, for example, the master Redis process crashes while the two Sentinels are still alive, the failover will be performed.
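
The arithmetic can be sketched in a few lines of Python. This is only an illustration of the rule described above (quorum to reach ODOWN, then a majority of N/2+1 to get elected), not Sentinel's actual code; the function names and return labels are made up for the example.

```python
def majority(total_sentinels: int) -> int:
    """Votes needed to authorize a failover: N/2+1 with integer division."""
    return total_sentinels // 2 + 1


def failover_outcome(total_sentinels: int, reachable_sentinels: int, quorum: int) -> str:
    """Classify what the surviving Sentinels can do after a failure.

    reachable_sentinels counts the Sentinels that are still up and agree
    that the master is down, including the one attempting the failover.
    """
    if reachable_sentinels < quorum:
        return "no-odown"           # not enough agreement to flag ODOWN
    if reachable_sentinels < majority(total_sentinels):
        return "odown-not-elected"  # ODOWN reached, but the vote cannot succeed
    return "failover"


# Two Sentinels, quorum 1, master box (and its Sentinel) rebooted:
print(failover_outcome(total_sentinels=2, reachable_sentinels=1, quorum=1))
# prints "odown-not-elected"

# Three Sentinels, quorum 2, one box lost:
print(failover_outcome(total_sentinels=3, reachable_sentinels=2, quorum=2))
# prints "failover"
```

The first case corresponds to the sequence in the original log: `+odown ... #quorum 1/1` followed by `-failover-abort-not-elected`.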

Having three distinct boxes makes a lot more sense. If you have another box where you run your clients, you can install a third Sentinel there and set the quorum to two.
This way, the current master must be down from the point of view of both the clients and the slave in order to trigger a failover, and you will have a majority, since with three Sentinels you need two votes.
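
As a rough sketch of the three-box layout, the snippet below builds the `sentinel monitor <name> <ip> <port> <quorum>` line that each of the three Sentinels would carry in its sentinel.conf, and checks that losing any single box still leaves enough Sentinels to vote. The master name and address come from the original post; the box labels and helper names are assumptions made for the example.

```python
def sentinel_monitor_line(master_name: str, ip: str, port: int, quorum: int) -> str:
    """Build a `sentinel monitor <name> <ip> <port> <quorum>` directive."""
    return f"sentinel monitor {master_name} {ip} {port} {quorum}"


# Hypothetical labels for the three boxes that each run one Sentinel.
SENTINEL_BOXES = ["master-box", "slave-box", "client-box"]
QUORUM = 2


def can_still_fail_over(lost_box: str) -> bool:
    """True if the Sentinels left after losing one box reach quorum and majority.

    This only models the voting; it does not check whether a slave is
    still available to be promoted.
    """
    survivors = [box for box in SENTINEL_BOXES if box != lost_box]
    majority = len(SENTINEL_BOXES) // 2 + 1
    return len(survivors) >= max(QUORUM, majority)


# The same line goes into sentinel.conf on all three Sentinels.
print(sentinel_monitor_line("mymaster", "172.16.2.154", 6379, QUORUM))

for box in SENTINEL_BOXES:
    print(box, can_still_fail_over(box))
```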

Salvatore


--
Salvatore 'antirez' Sanfilippo
open source developer - GoPivotal
http://invece.org

To "attack a straw man" is to create the illusion of having refuted a proposition by replacing it with a superficially similar yet unequivalent proposition (the "straw man"), and to refute it
       — Wikipedia (Straw man page)

dennis bing leng

Feb 21, 2014, 7:53:37 PM
to redi...@googlegroups.com
Thank you, Salvatore

From your perspective, what's the best practice for configuring Redis with failover in mind via Sentinel?

Cheers

Bing 

S.Kras

Feb 11, 2015, 12:25:14 PM
to redi...@googlegroups.com

Thanks a lot for this explanation.
The reasoning for the behavior of Redis makes sense to me. But the idea
behind it does not seem correct to me, nor does it meet the requirements in general.
Why should it not be possible to run a high-availability setup on two boxes?
The idea is to run a Redis MASTER on one box and a Redis SLAVE on the other
box. High availability comes from running Redis Sentinel on both boxes and
setting the quorum to 1. When the master box goes down
(including its Sentinel process), the other box should be able to decide on its own
that the master is no longer available and should be able to elect itself
(without any other Sentinel) as the new master.
What argues against such behavior?

I do not understand why Redis Sentinel distinguishes between the
configured quorum and the majority that is required for a successful election.
Especially with two Sentinel instances, this behavior seems strange to me.

Is there no way to run a high-availability setup with Redis on two boxes?

Matt Stancliff

Feb 11, 2015, 12:37:12 PM
to redi...@googlegroups.com

> On Feb 11, 2015, at 09:21, S.Kras <kr...@cosma-consult.de> wrote:
>
> Thanks a lot for this explanation.
> The reasoning for the behavior of Redis makes sense to me. But the idea
> behind it does not seem to be correct and meet the requirements in general.
> Why should it be not possible to run a high availability approach on two boxes?

This confuses people all the time.

There are two issues here: network problems and server problems.

If your network is okay and you *only* deal with server crashes, then, yes, you could rely on one server to detect that another server has crashed.

But, your network will never be okay. Your monitoring system has to detect network failures *and* server failures, and your data can’t tell the difference between a server crashing and a server becoming unreachable.

> The idea is to run a Redis MASTER on one box and a Redis SLAVE on the other
> box. The high availability results by running Redis Sentinel on both boxes and
> setting the quorum to 1. When the master box goes down
> (including sentinel process) the other box should be able to decide on its own
> that the master is not available anymore and should be able to elect
> (without any other sentinel) itself as the new master.
> What does argue against such behavior?

[Server A] ============ [Server B]

If you cut the connection between Server A and Server B, each server now thinks it is the only remaining server. Both will become master, and your data will be out of sync if both servers accept writes.

So, the Redis solution is to have more monitors. Add Sentinel watchers to your Redis servers and _outside_ of your Redis servers. Your Sentinels will, hopefully, have a more complete picture of your network than just your two database servers. Then your Sentinels can vote to take better corrective actions (ideally).
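
A toy model makes the difference visible. The sketch below is not how Sentinel is implemented; it only counts which sides of a network partition would be allowed to act, first when every side may decide alone, then when a side needs a majority of all Sentinels. A and B stand for the two servers in the diagram above; C is a hypothetical third Sentinel outside the Redis servers.

```python
def sides_able_to_elect(sides, total_sentinels, require_majority):
    """Return the partition sides that would go ahead and pick a master.

    sides: list of sets of Sentinel names, one set per side of the partition.
    require_majority: if False, every side acts on its own view of the network.
    """
    needed = total_sentinels // 2 + 1
    able = []
    for side in sides:
        if not require_majority or len(side) >= needed:
            able.append(sorted(side))
    return able


# Two boxes, each deciding alone: both sides act, so there are two masters.
print(sides_able_to_elect([{"A"}, {"B"}], 2, require_majority=False))
# prints [['A'], ['B']]

# Two boxes with a majority rule: neither side has 2 votes, so no failover.
print(sides_able_to_elect([{"A"}, {"B"}], 2, require_majority=True))
# prints []

# Three Sentinels, A cut off from B and C: only the B/C side can elect.
print(sides_able_to_elect([{"A"}, {"B", "C"}], 3, require_majority=True))
# prints [['B', 'C']]
```

Because two disjoint sides can never both hold a majority, at most one side can authorize a failover.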

> I did not get the point why Redis sentinel makes a difference between the
> configured quorum and the majority that is required for a successful reelection.
> Especially in case of two Sentinel instances this behavior seems strange to me.
>
> Is there no possibility to run a high availability setup with Redis on two boxes?

You can certainly configure HA with two boxes (set up additional Sentinels on the same machines, use HAProxy with a custom failover method, etc.), but it all depends on your needs.


-Matt