After restarting one of the master nodes in a Redis cluster, "(error) CLUSTERDOWN The cluster is down" is reported when trying to do a "get"


D Wu

Jul 2, 2015, 1:40:13 AM
to redi...@googlegroups.com
I have set up a Redis cluster with 9 nodes: 3 masters with a replication factor of 2, that is, each master has 2 slaves.

The master to slave configuration is like this:

127.0.0.1:50001 -> 127.0.0.1:50004 & 127.0.0.1:50005
127.0.0.1:50002 -> 127.0.0.1:50006 & 127.0.0.1:50007
127.0.0.1:50003 -> 127.0.0.1:50008 & 127.0.0.1:50009

I was able to successfully load 65 million key-value records into each of these 9 nodes by copying a pre-existing .rdb file from another standalone instance. That's 9 copies of the exact same set of data.

I was able to query for keys from each of these 9 nodes, no problem.

I shut down one of my master nodes, 50001. One of its two slaves, 50005, got promoted to be the new master. So far so good.

I then restarted 50001 and checked the cluster nodes. No problem there, although 50001 is now a "slave":

127.0.0.1:50001> cluster nodes
8cce17e49471613440cd6f3b08ce3d542097e2e8 192.168.140.108:50002 master - 0 1435790812260 2 connected 5461-10922
d869eed04c5315c62a3442935236c04ff7ce094c 192.168.140.108:50001 myself,slave cf41213b634dde9c8a62fdb07f313d45eed59ef4 0 0 1 connected
cf41213b634dde9c8a62fdb07f313d45eed59ef4 192.168.140.108:50005 master - 0 1435790811761 10 connected 0-5460
01c5ba957e31c5fff5363fffc34c3069d2fabace 192.168.140.108:50004 slave cf41213b634dde9c8a62fdb07f313d45eed59ef4 0 1435790811761 10 connected
6c80b3e416c119a86d190c5b2d918e1e5b56a97a 192.168.140.108:50007 slave 8cce17e49471613440cd6f3b08ce3d542097e2e8 0 1435790811761 7 connected
1f71755627da96eabb60873f1a302f5aa8766786 192.168.140.108:50006 slave cf41213b634dde9c8a62fdb07f313d45eed59ef4 0 1435790811760 10 connected
ef3f22c8e828fc46f9e587daa04e979d26af3006 192.168.140.108:50003 master - 0 1435790811660 3 connected 10923-16383
5dd0eba1ebe6651d70a20b00aa57e3474b885656 192.168.140.108:50008 slave ef3f22c8e828fc46f9e587daa04e979d26af3006 0 1435790811460 8 connected
083dd9ea0a31d1976e02686940527f428f58f7f6 192.168.140.108:50009 slave cf41213b634dde9c8a62fdb07f313d45eed59ef4 0 1435790811460 10 connected
127.0.0.1:50001>


When I tried to get the value for a given key AFTER the restart of 50001, I was redirected to 50003, a master node, and got this error message:

127.0.0.1:50001> get 000018846E1B05F3BB3DFF6E5847C4E7
-> Redirected to slot [14297] located at 192.168.140.108:50003
(error) CLUSTERDOWN The cluster is down
192.168.140.108:50003>
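
For reference, the slot number in the redirect is derived from the key itself: Redis Cluster maps a key to CRC16(key) mod 16384, and slots 10923-16383 belong to 50003 in the listing above. Below is a minimal Python sketch of that mapping. It assumes the CRC16 variant is the XMODEM one, which binascii.crc_hqx computes when given an initial value of 0. The name key_slot is only illustrative and not part of any Redis client.

```python
import binascii

TOTAL_SLOTS = 16384  # number of hash slots in a Redis cluster


def key_slot(key: bytes) -> int:
    """Return the cluster hash slot for a key (sketch of CRC16 mod 16384)."""
    # Hash tags: if the key has a non-empty {...} section, only the text
    # between the first '{' and the first '}' after it is hashed.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return binascii.crc_hqx(key, 0) % TOTAL_SLOTS


# The key from the failing "get" above.
print(key_slot(b"000018846E1B05F3BB3DFF6E5847C4E7"))
```

The printed value can be compared with what "cluster keyslot <key>" returns on any node.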


Running the "cluster nodes" command on 50003 gives me a HUGE list like this:

127.0.0.1:50003> cluster nodes
6c80b3e416c119a86d190c5b2d918e1e5b56a97a 192.168.140.108:50007 slave 8cce17e49471613440cd6f3b08ce3d542097e2e8 0 1435790069525 7 connected
8cce17e49471613440cd6f3b08ce3d542097e2e8 192.168.140.108:50002 master - 0 1435790069124 2 connected 5461-10922
cf41213b634dde9c8a62fdb07f313d45eed59ef4 192.168.140.108:50005 master - 0 1435790069425 10 connected
ef3f22c8e828fc46f9e587daa04e979d26af3006 192.168.140.108:50003 myself,master - 0 0 3 connected 10923-16383 [0-<-d869eed04c5315c62a3442935236c04ff7ce094c] [1-<-d869eed04c5315c62a3442935236c04ff7ce094c] [2-<-d869eed04c5315c62a3442935236c04ff7ce094c] [3-<-d869eed04c5315c62a3442935236c04ff7ce094c] [4-<-d869eed04c5315c62a3442935236c04ff7ce094c] [5-<-d869eed04c5315c62a3442935236c04ff7ce094c] [6-<-d869eed04c5315c62a3442935236c04ff7ce094c] [7-<-d869eed04c5315c62a3442935236c04ff7ce094c] [8-<-d869eed04c5315c62a3442935236c04ff7ce094c] [9-<-d869eed04c5315c62a3442935236c04ff7ce094c] [10-<-d869eed04c5315c62a3442935236c04ff7ce094c] [11-<-d869eed04c5315c62a3442935236c04ff7ce094c] [12-<-d869eed04c5315c62a3442935236c04ff7ce094c] [13-<-d869eed04c5315c62a3442935236c04ff7ce094c] [14-<-d869eed04c5315c62a3442935236c04ff7ce094c] [15-<-d869eed04c5315c62a3442935236c04ff7ce094c] [16-<-d869eed04c5315c62a3442935236c04ff7ce094c] [17-<-d869eed04c5315c62a3442935236c04ff7ce094c] [18-<-d869eed04c5315c62a3442935236c04ff7ce094c] [19-<-d869eed04c5315c62a3442935236c04ff7ce094c] [20-<-d869eed04c5315c62a3442935236c04ff7ce094c] [21-<-d869eed04c5315c62a3442935236c04ff7ce094c] [22-<-d869eed04c5315c62a3442935236c04ff7ce094c] [23-<-d869eed04c5315c62a3442935236c04ff7ce094c] [24-<-d869eed04c5315c62a3442935236c04ff7ce094c] [25-<-d869eed04c5315c62a3442935236c04ff7ce094c] [26-<-d869eed04c5315c62a3442935236c04ff7ce094c] [27-<-d869eed04c5315c62a3442935236c04ff7ce094c] [28-<-d869eed04c5315c62a3442935236c04ff7ce09
....
[10920-<-8cce17e49471613440cd6f3b08ce3d542097e2e8] [10921-<-8cce17e49471613440cd6f3b08ce3d542097e2e8] [10922-<-8cce17e49471613440cd6f3b08ce3d542097e2e8]
083dd9ea0a31d1976e02686940527f428f58f7f6 192.168.140.108:50009 slave cf41213b634dde9c8a62fdb07f313d45eed59ef4 0 1435790069625 10 connected
5dd0eba1ebe6651d70a20b00aa57e3474b885656 192.168.140.108:50008 slave ef3f22c8e828fc46f9e587daa04e979d26af3006 0 1435790069024 8 connected
01c5ba957e31c5fff5363fffc34c3069d2fabace 192.168.140.108:50004 slave cf41213b634dde9c8a62fdb07f313d45eed59ef4 0 1435790069625 10 connected
1f71755627da96eabb60873f1a302f5aa8766786 192.168.140.108:50006 slave cf41213b634dde9c8a62fdb07f313d45eed59ef4 0 1435790069525 10 connected
d869eed04c5315c62a3442935236c04ff7ce094c 192.168.140.108:50001 slave cf41213b634dde9c8a62fdb07f313d45eed59ef4 0 1435790069024 10 connected
127.0.0.1:50003>
192.168.140.108:50003> info keyspace
# Keyspace
db0:keys=65129392,expires=0,avg_ttl=0
192.168.140.108:50003>
192.168.140.108:50003> cluster info
cluster_state:fail
cluster_slots_assigned:10923
cluster_slots_ok:10923
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:9
cluster_size:2
cluster_current_epoch:10
cluster_my_epoch:3
cluster_stats_messages_sent:144086
cluster_stats_messages_received:141301
192.168.140.108:50003>
192.168.140.108:50003>
192.168.140.108:50003> get 000018846E1B05F3BB3DFF6E5847C4E7
(error) CLUSTERDOWN The cluster is down
192.168.140.108:50003>
192.168.140.108:50003>
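
As a cross-check of the "cluster info" output above, the slot ranges that 50003 lists for each master can be added up. This is a rough Python sketch, not a diagnosis. covered_slots is a hypothetical helper, and the sample is the three master lines from 50003's listing with the long run of bracketed [slot-<-node] entries cut down to a single one.

```python
TOTAL_SLOTS = 16384  # number of hash slots in a Redis cluster


def covered_slots(cluster_nodes_output: str) -> set:
    """Collect the slots that masters own according to 'cluster nodes' text."""
    covered = set()
    for line in cluster_nodes_output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # blank or malformed line
        if "master" not in fields[2].split(","):
            continue  # slaves do not own slots
        for token in fields[8:]:
            if token.startswith("["):
                continue  # importing/migrating marker, not an owned slot
            start, _, end = token.partition("-")
            covered.update(range(int(start), int(end or start) + 1))
    return covered


# Master lines as seen by 50003 (bracketed entries shortened to one).
sample = """\
8cce17e49471613440cd6f3b08ce3d542097e2e8 192.168.140.108:50002 master - 0 1435790069124 2 connected 5461-10922
cf41213b634dde9c8a62fdb07f313d45eed59ef4 192.168.140.108:50005 master - 0 1435790069425 10 connected
ef3f22c8e828fc46f9e587daa04e979d26af3006 192.168.140.108:50003 myself,master - 0 0 3 connected 10923-16383 [0-<-d869eed04c5315c62a3442935236c04ff7ce094c]
"""

covered = covered_slots(sample)
missing = sorted(set(range(TOTAL_SLOTS)) - covered)
print(len(covered), "slots covered;", len(missing), "slots without an owner")
if missing:
    print("unowned range:", missing[0], "-", missing[-1])
```

With this input the covered count comes to 10923, which matches cluster_slots_assigned, and the slots without an owner in 50003's view are 0-5460, the range that 50005 holds according to 50001's listing. As far as I understand, with the default cluster-require-full-coverage yes a node reports the cluster as down when it does not see all 16384 slots covered, so this may be related to the error.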


Any help would be appreciated.

Thanks.

