Why did a slave change to master without failover in the cluster?


cwh...@gmail.com

Aug 11, 2014, 11:45:48 PM
to redi...@googlegroups.com
The cluster has six nodes. Their information, taken from the 'cluster nodes' command, is as follows:
6f1a37d66aacbcb0128c6306b5b989ad81c160c1 192.168.1.157:6379 master - 0 1407808913042 13 connected 0-5460
876f04626f806226755c98f15a8475c950a8446b 192.168.1.155:6379 master - 0 1407808915045 5 connected 5461-10922
a9ce649f167b83dc477d195e62ca3fe9034db5e6 192.168.1.151:6379 slave 3e6c92b2137397573846ca7ce3385de19b033c13 0 1407808912040 9 connected
f7bbdafd8baee642981062877e51a7a47f85acdd 192.168.1.153:6379 slave 6f1a37d66aacbcb0128c6306b5b989ad81c160c1 0 1407808911039 13 connected
fbb5b1d8d5433be8f5a98d8854fc87b3b266cdff 192.168.1.152:6379 slave 876f04626f806226755c98f15a8475c950a8446b 0 1407808914042 5 connected
3e6c92b2137397573846ca7ce3385de19b033c13 :6379 myself,master - 0 0 9 connected 10923-16383
The IP of the 'myself' node above is 192.168.1.154.
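For reference, each node's current role can also be confirmed with a short script. This is only a sketch, assuming the redis-py client is installed; the addresses are the six nodes listed above:

# Sketch: print the replication role reported by every node (assumes redis-py).
import redis

HOSTS = ["192.168.1.151", "192.168.1.152", "192.168.1.153",
         "192.168.1.154", "192.168.1.155", "192.168.1.157"]

for host in HOSTS:
    r = redis.StrictRedis(host=host, port=6379, socket_timeout=2)
    role = r.info("replication")["role"]  # 'master' or 'slave'
    print("%s:6379 -> %s" % (host, role))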

After writing some k/v pairs into the cluster, I executed the 'flushall' command on every master node. Finally, I ran the 'cluster nodes' command again:
6f1a37d66aacbcb0128c6306b5b989ad81c160c1 192.168.1.157:6379 master - 0 1407809551053 13 connected 0-5460
876f04626f806226755c98f15a8475c950a8446b 192.168.1.155:6379 master - 0 1407809548048 5 connected 5461-10922
a9ce649f167b83dc477d195e62ca3fe9034db5e6 192.168.1.151:6379 master - 0 1407809551053 16 connected 10923-16383
f7bbdafd8baee642981062877e51a7a47f85acdd 192.168.1.153:6379 slave 6f1a37d66aacbcb0128c6306b5b989ad81c160c1 0 1407809550051 13 connected
fbb5b1d8d5433be8f5a98d8854fc87b3b266cdff 192.168.1.152:6379 slave 876f04626f806226755c98f15a8475c950a8446b 0 1407809549049 5 connected
3e6c92b2137397573846ca7ce3385de19b033c13 :6379 myself,slave a9ce649f167b83dc477d195e62ca3fe9034db5e6 0 0 9 connected
Obviously, the previous master '192.168.1.154' and its slave '192.168.1.151' have swapped roles.

I read the log of '192.168.1.154', but did not understand why this change could happen. The log fragment is as follows:
39690:M 12 Aug 09:54:56.156 # Server started, Redis version 2.9.56
39690:M 12 Aug 09:54:56.158 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
39690:M 12 Aug 09:54:56.159 * The server is now ready to accept connections on port 6379
39690:M 12 Aug 09:54:59.164 # Cluster state changed: ok
39690:M 12 Aug 09:55:27.900 * Clear FAIL state for node fbb5b1d8d5433be8f5a98d8854fc87b3b266cdff: slave is reachable again.
39690:M 12 Aug 09:55:34.521 * Clear FAIL state for node f7bbdafd8baee642981062877e51a7a47f85acdd: slave is reachable again.
39690:M 12 Aug 09:56:07.015 * Clear FAIL state for node a9ce649f167b83dc477d195e62ca3fe9034db5e6: slave is reachable again.
39690:M 12 Aug 09:56:08.010 * Slave asks for synchronization
39690:M 12 Aug 09:56:08.010 * Full resync requested by slave.
39690:M 12 Aug 09:56:08.010 * Starting BGSAVE for SYNC
39690:M 12 Aug 09:56:08.011 * Background saving started by pid 39829
39829:C 12 Aug 09:56:08.028 * DB saved on disk
39829:C 12 Aug 09:56:08.029 * RDB: 4 MB of memory used by copy-on-write
39690:M 12 Aug 09:56:08.115 * Background saving terminated with success
39690:M 12 Aug 09:56:08.115 * Synchronization with slave succeeded
39690:M 12 Aug 10:09:22.110 * Marking node 876f04626f806226755c98f15a8475c950a8446b as failing (quorum reached).
39690:M 12 Aug 10:09:22.111 # Cluster state changed: fail
39690:M 12 Aug 10:09:48.081 # Failover auth granted to fbb5b1d8d5433be8f5a98d8854fc87b3b266cdff for epoch 15
39690:M 12 Aug 10:09:48.082 * FAIL message received from 876f04626f806226755c98f15a8475c950a8446b about fbb5b1d8d5433be8f5a98d8854fc87b3b266cdff
39690:M 12 Aug 10:09:48.082 # Failover auth denied to a9ce649f167b83dc477d195e62ca3fe9034db5e6: its master is up
39690:M 12 Aug 10:09:48.082 # Configuration change detected. Reconfiguring myself as a replica of a9ce649f167b83dc477d195e62ca3fe9034db5e6
39690:S 12 Aug 10:09:48.183 * Clear FAIL state for node fbb5b1d8d5433be8f5a98d8854fc87b3b266cdff: slave is reachable again.
39690:S 12 Aug 10:09:49.085 * Connecting to MASTER 192.168.1.151:6379
39690:S 12 Aug 10:09:49.085 * MASTER <-> SLAVE sync started
39690:S 12 Aug 10:09:49.085 * Non blocking connect for SYNC fired the event.
39690:S 12 Aug 10:09:49.085 * Master replied to PING, replication can continue...
39690:S 12 Aug 10:09:49.085 * Partial resynchronization not possible (no cached master)
39690:S 12 Aug 10:09:49.086 * Full resync from master: 57beb0e3305365d9576deb5b4afd8000362655d0:514204110
39690:S 12 Aug 10:09:54.933 * MASTER <-> SLAVE sync: receiving 314225183 bytes from master
39690:S 12 Aug 10:09:56.006 * Clear FAIL state for node 876f04626f806226755c98f15a8475c950a8446b: is reachable again and nobody is serving its slots after some time.
39690:S 12 Aug 10:09:56.006 # Cluster state changed: ok
39690:S 12 Aug 10:09:57.823 * MASTER <-> SLAVE sync: Flushing old data
39690:S 12 Aug 10:09:57.824 * MASTER <-> SLAVE sync: Loading DB in memory
39690:S 12 Aug 10:11:24.815 * MASTER <-> SLAVE sync: Finished with success

I understand that the node 876f04626f806226755c98f15a8475c950a8446b (192.168.1.155) was detected as failed, but even if that triggered a failover, it should have affected 192.168.1.155. Why was a failover needed for 192.168.1.154?
Maybe I misunderstand the log. Please help me, thanks!
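(For context, the window within which an unresponsive master can be flagged as failing by its peers is controlled by the cluster-node-timeout setting. A minimal sketch for reading it on each node, again assuming the redis-py client:)

# Sketch: read the failure-detection window on every node (assumes redis-py).
import redis

for host in ["192.168.1.151", "192.168.1.152", "192.168.1.153",
             "192.168.1.154", "192.168.1.155", "192.168.1.157"]:
    r = redis.StrictRedis(host=host, port=6379, socket_timeout=2)
    timeout_ms = r.config_get("cluster-node-timeout")["cluster-node-timeout"]
    print("%s: cluster-node-timeout = %s ms" % (host, timeout_ms))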

赵方远

Jul 23, 2015, 1:15:43 PM
to Redis DB, cwh...@gmail.com
I ran into the same issue with flushall. Can anyone help? I think this is definitely not a rare case!

On Tuesday, August 12, 2014 at 11:45:48 AM UTC+8, cwh...@gmail.com wrote:

赵方远

Jul 23, 2015, 2:08:06 PM
to Redis DB, cwh...@gmail.com
Got the answer! Check it out here:
https://github.com/antirez/redis/issues/2691
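If the root cause is a flushall that blocks the master long enough for its peers to mark it as failing and promote its slave, one possible workaround is to empty each master in small batches so the node keeps answering cluster pings in between. A minimal sketch, assuming the redis-py client; the host below is only illustrative:

# Sketch: empty one master in small batches instead of a single blocking
# FLUSHALL, so the event loop stays responsive between deletions.
# Assumes the redis-py client; the host is just an example.
import redis

r = redis.StrictRedis(host="192.168.1.154", port=6379)
cursor = 0
while True:
    cursor, keys = r.scan(cursor=cursor, count=500)
    if keys:
        # Single-key DELs in a non-transactional pipeline avoid the
        # cluster's cross-slot restriction on multi-key commands.
        pipe = r.pipeline(transaction=False)
        for key in keys:
            pipe.delete(key)
        pipe.execute()
    if cursor == 0:
        break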
By the way, are you Chinese?


On Tuesday, August 12, 2014 at 11:45:48 AM UTC+8, cwh...@gmail.com wrote: