[redis-db] Redis/Sentinel cluster failover results in “failover-abort-not-elected master”


Abhay Kumar

Jan 8, 2019, 5:55:57 AM
to Redis DB
I am using Redis 4.0.11. My setup consists of three nodes: one master VM and two slave VMs. A Sentinel is co-deployed on each of these three VMs, and the quorum value is set to 2.

Sometimes, when Sentinel tries to perform a failover, the failover does not happen. In the Sentinel log I can see the following reason for aborting it:
-failover-abort-not-elected master mymaster 10.11.17.176 6379

The same error was seen in Redis 2.8.1, where it seems the issue was fixed later on. Please check the link below:

Does anyone know about this and how to fix it?

Best regards,
Abhay

Abhay Kumar

Jan 12, 2019, 12:21:06 AM
to Redis DB

Has anyone run into the same problem in Redis 4? If so, is there any way to fix it?

Best regards,
Abhay

sys-ops

Jan 31, 2019, 3:56:45 PM
to redi...@googlegroups.com
Hi Y'All,

Placing the sentinels on separate hosts did not solve this problem in my case. I
have 3 redis instances and 3 sentinels. The master is up and 2 slaves are
connected, yet the sentinels want to perform a failover every minute.

Restarting all sentinels and removing their temp configs at the same time
did not solve the problem.

I have updated to the latest stable version 5.0.3, but it did not fix my
problem.

Also, I found a similar problem here:
http://redis-db.2338650.n4.nabble.com/Sentinel-problem-td4410.html
but that thread has no resolution.

Additional note: there is no firewall between redis and sentinel hosts.

redis1.blue

# Replication
role:slave
master_host:redis2.blue
master_port:6380
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0

redis2.blue

# Replication
role:master
connected_slaves:2
min_slaves_good_slaves:2
slave0:ip=172.24.23.123,port=6380,state=online,offset=89955552,lag=1
slave1:ip=172.24.23.121,port=6380,state=online,offset=89955466,lag=1
master_replid:29cd21012b52cda54c999ff70da7432abe6077e7

redis3.blue

# Replication
role:slave
master_host:redis2.blue
master_port:6380
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0


sentinel1.blue

master0:name=rds-6380,status=odown,address=172.24.23.121:6380,slaves=0,sentinels=6

sentinel2.blue

master0:name=rds-6380,status=odown,address=172.24.23.121:6380,slaves=0,sentinels=6

sentinel3.blue

master0:name=rds-6380,status=odown,address=172.24.23.121:6380,slaves=0,sentinels=6
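
The `sentinels=6` field looks odd for a deployment with three sentinels. As far as I understand the election rule, a sentinel only becomes failover leader if it collects votes from a majority of the sentinels it knows about for that master, and at least `quorum` votes. Below is a minimal Python sketch of that arithmetic. The helper names are made up for illustration, `quorum=2` is an assumption (the quorum for rds-6380 is not shown here), and I am assuming the `sentinels=` count includes the reporting sentinel itself.

```python
def parse_master_line(line):
    """Parse a 'masterN:key=value,...' line from Sentinel's INFO output."""
    # Split only on the first ':' so the ':' inside address=ip:port survives.
    _, _, fields = line.partition(":")
    return dict(pair.split("=", 1) for pair in fields.split(","))


def votes_needed(known_sentinels, quorum):
    """Votes a candidate needs: a majority of known sentinels, and at least quorum."""
    majority = known_sentinels // 2 + 1
    return max(majority, quorum)


line = "master0:name=rds-6380,status=odown,address=172.24.23.121:6380,slaves=0,sentinels=6"
info = parse_master_line(line)

# quorum=2 is an assumed value, not taken from the rds-6380 configuration.
needed = votes_needed(int(info["sentinels"]), quorum=2)
print(info["status"], info["slaves"], needed)
```

With six known sentinels the sketch asks for 4 votes, which three live sentinels cannot supply. If the extra entries are stale, that would be consistent with the repeated aborts; the Sentinel documentation describes `SENTINEL RESET` as the way to drop sentinels that no longer exist.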


sentinel1.blue

19869:X 30 Jan 2019 13:15:44.779 # +new-epoch 10026
19869:X 30 Jan 2019 13:15:44.779 # +try-failover master rds-6380 172.24.23.121 6380
19869:X 30 Jan 2019 13:15:44.779 # +vote-for-leader 2d642654e88b852b7952a85682def21c51f8b714 10026
19869:X 30 Jan 2019 13:15:44.782 # fed493d9c17d8419b4c8f99ae5885714976cd522 voted for 2d642654e88b852b7952a85682def21c51f8b714 10026
19869:X 30 Jan 2019 13:15:44.783 # 025b20a18f6770d1e8532302bcf7394dba1bad0a voted for 2d642654e88b852b7952a85682def21c51f8b714 10026
19869:X 30 Jan 2019 13:15:55.165 # -failover-abort-not-elected master rds-6380 172.24.23.121 6380
19869:X 30 Jan 2019 13:15:55.220 # Next failover delay: I will not start a failover before Wed Jan 30 13:16:45 2019

sentinel2.blue

21469:X 30 Jan 2019 13:15:44.782 # +new-epoch 10026
21469:X 30 Jan 2019 13:15:44.782 # +vote-for-leader 2d642654e88b852b7952a85682def21c51f8b714 10026
21469:X 30 Jan 2019 13:15:44.833 # Next failover delay: I will not start a failover before Wed Jan 30 13:16:45 2019

sentinel3.blue

21970:X 30 Jan 2019 13:15:44.781 # +new-epoch 10026
21970:X 30 Jan 2019 13:15:44.782 # +vote-for-leader 2d642654e88b852b7952a85682def21c51f8b714 10026
21970:X 30 Jan 2019 13:15:44.861 # Next failover delay: I will not start a failover before Wed Jan 30 13:16:45 2019
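
To see how many votes the candidate actually collected, the sentinel1.blue log can be tallied mechanically. This is a rough Python sketch: the regular expression is my own and only covers the two line shapes seen in this log (`+vote-for-leader` for the sentinel's own vote and `<run-id> voted for <run-id> <epoch>` for replies), and it tolerates the line wrapping introduced by mail clients.

```python
import re
from collections import Counter

# Matches the candidate's own vote and the replies it received, even when
# the log line was wrapped after the event name.
VOTE_RE = re.compile(
    r"# (?:\+vote-for-leader|[0-9a-f]{40}\s+voted for)\s+([0-9a-f]{40})\s+(\d+)"
)


def tally_votes(log_text):
    """Return a Counter keyed by (epoch, candidate run id)."""
    tally = Counter()
    for match in VOTE_RE.finditer(log_text):
        candidate, epoch = match.group(1), int(match.group(2))
        tally[(epoch, candidate)] += 1
    return tally


SENTINEL1_LOG = """\
19869:X 30 Jan 2019 13:15:44.779 # +try-failover master rds-6380
172.24.23.121 6380
19869:X 30 Jan 2019 13:15:44.779 # +vote-for-leader
2d642654e88b852b7952a85682def21c51f8b714 10026
19869:X 30 Jan 2019 13:15:44.782 # fed493d9c17d8419b4c8f99ae5885714976cd522
voted for 2d642654e88b852b7952a85682def21c51f8b714 10026
19869:X 30 Jan 2019 13:15:44.783 # 025b20a18f6770d1e8532302bcf7394dba1bad0a
voted for 2d642654e88b852b7952a85682def21c51f8b714 10026
19869:X 30 Jan 2019 13:15:55.165 # -failover-abort-not-elected master
rds-6380 172.24.23.121 6380
"""

votes = tally_votes(SENTINEL1_LOG)
for (epoch, candidate), count in votes.items():
    print(epoch, candidate[:8], count)
```

The tally comes to three votes for epoch 10026, so every sentinel that answered voted for the same candidate, and the abort followed anyway. Whether three votes are enough depends on how many sentinels the candidate believes exist for this master.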


Manual failover does not work either. I suspect this is because the sentinel
reports the master with status=odown and slaves=0.

# redis-cli -h sentinel1.blue -p 26380 SENTINEL failover rds-6380
(error) NOGOODSLAVE No suitable replica to promote

# redis-cli -h sentinel1.blue -p 26380 SENTINEL get-master-addr-by-name rds-6380
1) "172.24.23.121"
2) "6380"

# redis-cli -h sentinel1.blue -p 26380 SENTINEL slaves rds-6380
(empty list or set)


[ro...@redis1.blue] [2019-01-31 09:24:46] ~
# redis-cli -h redis3.blue -p 6380 subscribe __sentinel__:hello
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "__sentinel__:hello"
3) (integer) 1
^C

[ro...@redis1.blue] [2019-01-31 09:24:58] ~
# redis-cli -h redis2.blue -p 6380 subscribe __sentinel__:hello
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "__sentinel__:hello"
3) (integer) 1
^C

[ro...@redis1.blue] [2019-01-31 09:25:09] ~
# redis-cli -h redis1.blue -p 6380 subscribe __sentinel__:hello
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "__sentinel__:hello"
3) (integer) 1
1) "message"
2) "__sentinel__:hello"
3) "172.24.23.203,26380,fde792596cb11c017abd77df018148a55074620b,1103,rds-6380,172.24.23.121,6380,0"
1) "message"
2) "__sentinel__:hello"
3) "172.24.23.201,26380,825d65e0e56816cc8c666400341fb6c186741c54,1103,rds-6380,172.24.23.121,6380,0"
1) "message"
2) "__sentinel__:hello"
3) "172.24.23.202,26380,7459706aa08c2de1516fb862ec4030581f5dc48b,1103,rds-6380,172.24.23.121,6380,0"
1) "message"
2) "__sentinel__:hello"
3) "172.24.23.203,26380,fde792596cb11c017abd77df018148a55074620b,1103,rds-6380,172.24.23.121,6380,0"
1) "message"
2) "__sentinel__:hello"
3) "172.24.23.201,26380,825d65e0e56816cc8c666400341fb6c186741c54,1103,rds-6380,172.24.23.121,6380,0"


[ro...@redis1.blue] [2019-01-31 09:43:41] ~
# redis-cli -p 6380 client list | grep sentinel
id=186741 addr=172.24.23.203:46902 fd=30 name=sentinel-fde79259-pubsub age=67640 idle=0 flags=P db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=subscribe
id=186737 addr=172.24.23.201:47722 fd=7 name=sentinel-825d65e0-cmd age=67640 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=publish
id=186738 addr=172.24.23.201:57516 fd=28 name=sentinel-825d65e0-pubsub age=67640 idle=0 flags=P db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=subscribe
id=186744 addr=172.24.23.202:38082 fd=31 name=sentinel-7459706a-cmd age=67640 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=publish
id=186745 addr=172.24.23.202:54021 fd=32 name=sentinel-7459706a-pubsub age=67640 idle=0 flags=P db=0 sub=1 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=subscribe
id=186740 addr=172.24.23.203:55403 fd=29 name=sentinel-fde79259-cmd age=67640 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=ping

[ro...@redis2.blue] [2019-01-31 09:45:08] ~
# redis-cli -p 6380 client list | grep sentinel

[ro...@redis3.blue] [2019-01-31 09:45:13] ~
# redis-cli -p 6380 client list | grep sentinel


[ro...@redis1.blue] [2019-01-31 09:44:15] ~
# redis-cli -p 6380 pubsub channels
1) "__sentinel__:hello"

[ro...@redis2.blue] [2019-01-31 09:47:59] ~
# redis-cli -p 6380 pubsub channels
(empty list or set)

[ro...@redis3.blue] [2019-01-31 09:45:19] ~
# redis-cli -p 6380 pubsub channels
(empty list or set)


[ro...@redis1.blue] [2019-01-31 09:49:26] ~
# redis-cli -p 6380 monitor | grep sentinel
1548924655.315747 [0 172.24.23.201:47722] "PUBLISH" "__sentinel__:hello" "172.24.23.201,26380,825d65e0e56816cc8c666400341fb6c186741c54,1129,rds-6380,172.24.23.121,6380,0"
1548924655.723286 [0 172.24.23.203:55403] "PUBLISH" "__sentinel__:hello" "172.24.23.203,26380,fde792596cb11c017abd77df018148a55074620b,1129,rds-6380,172.24.23.121,6380,0"
1548924656.006182 [0 172.24.23.202:38082] "PUBLISH" "__sentinel__:hello" "172.24.23.202,26380,7459706aa08c2de1516fb862ec4030581f5dc48b,1129,rds-6380,172.24.23.121,6380,0"

[ro...@redis2.blue] [2019-01-31 09:49:47] ~
# redis-cli -p 6380 monitor | grep sentinel
^C

[ro...@redis3.blue] [2019-01-31 09:50:32] ~
# redis-cli -p 6380 monitor | grep sentinel
^C
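
The `__sentinel__:hello` payloads seen in the `subscribe` and `monitor` output above are comma-separated records, so they are easy to pick apart. A small Python sketch follows. The field names are my own labels for what I believe Sentinel publishes (sentinel address, run id, current epoch, then the master's name, address and config epoch); treat them as an assumption rather than an official schema.

```python
from typing import NamedTuple


class Hello(NamedTuple):
    # Field names are illustrative labels, not an official schema.
    sentinel_ip: str
    sentinel_port: int
    run_id: str
    current_epoch: int
    master_name: str
    master_ip: str
    master_port: int
    master_config_epoch: int


def parse_hello(payload):
    """Split one hello payload into its eight comma-separated fields."""
    ip, port, run_id, epoch, name, m_ip, m_port, cfg_epoch = payload.split(",")
    return Hello(ip, int(port), run_id, int(epoch), name, m_ip, int(m_port), int(cfg_epoch))


PAYLOADS = [
    "172.24.23.203,26380,fde792596cb11c017abd77df018148a55074620b,1103,rds-6380,172.24.23.121,6380,0",
    "172.24.23.201,26380,825d65e0e56816cc8c666400341fb6c186741c54,1103,rds-6380,172.24.23.121,6380,0",
    "172.24.23.202,26380,7459706aa08c2de1516fb862ec4030581f5dc48b,1103,rds-6380,172.24.23.121,6380,0",
]

hellos = [parse_hello(p) for p in PAYLOADS]
run_ids = {h.run_id for h in hellos}
masters = {(h.master_ip, h.master_port) for h in hellos}
print(len(run_ids), masters)
```

Read this way, three distinct sentinel run ids are announcing themselves, and all of them advertise 172.24.23.121:6380 as the master of rds-6380 with config epoch 0.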


[ro...@redis1.blue] [2019-01-31 09:54:33] ~
# redis-cli -p 6380 config get slaveof
1) "slaveof"
2) "redis2.blue 6380"

[ro...@redis2.blue] [2019-01-31 09:57:08] ~
# redis-cli -p 6380 config get slaveof
1) "slaveof"
2) ""


[ro...@redis1.blue] BLUE/WRO [2019-01-31 09:56:01] ~
# cat /var/run/redis/6380.conf
daemonize yes
pidfile /var/run/redis/6380.pid
logfile /var/log/redis/6380.log
dir /var/lib/redis/6380/
port 6380
bind 0.0.0.0
timeout 0
loglevel notice
slave-serve-stale-data yes
maxclients 1024
maxmemory 256mb
maxmemory-policy volatile-lru
appendonly no
appendfsync no
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 128mb
slowlog-log-slower-than 10000
slowlog-max-len 1024
list-max-ziplist-entries 512
list-max-ziplist-value 64
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
activerehashing yes
min-slaves-to-write 1
slave-priority 313

I use the same config on redis2 and redis3 (except for the slave-priority).

In addition to that redis instance, I have more instances on ports
6381-6384. Three of those can properly perform a failover, and the sentinels
talk to all redis instances on all redis hosts. One instance (on port 6382)
has the same failover problem as the 6380 instance.

*Why do the sentinels not subscribe to the 6380 instances on redis2 and redis3?

Why is the master status "odown" in the sentinel info, with no slaves, while the
redis master is up and has 2 slaves connected?*
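
One way to approach both questions is to compare what the sentinels monitor with what that instance says about itself. The sentinels advertise 172.24.23.121:6380 as the master, and they only connect to redis1.blue, which reports `role:slave`. The sketch below assumes 172.24.23.121 is redis1.blue; that mapping is never stated explicitly here and is only inferred from the replica list on redis2.blue and from where the sentinel connections show up. The function names are hypothetical.

```python
def parse_info_section(text):
    """Turn 'key:value' lines of INFO output into a dict, skipping '# ...' headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def role_mismatch(info_text):
    """Describe the problem if a monitored 'master' does not report role:master."""
    info = parse_info_section(info_text)
    if info.get("role") == "master":
        return None
    return "monitored instance reports role:{} (replicating from {}:{})".format(
        info.get("role"), info.get("master_host"), info.get("master_port")
    )


# INFO replication as reported by redis1.blue, assumed to be 172.24.23.121.
REDIS1_INFO = """\
# Replication
role:slave
master_host:redis2.blue
master_port:6380
master_link_status:up
"""

print(role_mismatch(REDIS1_INFO))
```

If the assumption holds, the sentinels are treating a replica as the master. A replica lists no replicas of its own, which would fit `slaves=0` and the `NOGOODSLAVE` error, and as far as I can tell Sentinel flags a monitored master as down when it keeps reporting itself as a slave, which would fit `odown`. It would also fit the sentinels never subscribing on redis2 and redis3, since they learn about other instances from the monitored master's INFO output.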

Regards,

Daniel Andrzejewski


