Why does Redis Cluster become unavailable when a majority or all of the masters are down?

Manusha Wijekoon

Feb 23, 2017, 6:38:44 AM

to Redis DB

I have a cluster composed of three partitions, each with one master and one slave. In one of our FT tests, I kill two masters at approximately the same time. This leaves the cluster unresponsive, and 'cluster nodes' returns the following:

96a099e3de2f5c5c3ba2ac61bba3114f1b83bcd7 XXX:7102 myself,slave 1d57ae8d694531c3040c1323dfb5ad416eba14f8 0 0 16 connected
c918e62e16e5cf4dcc964ab6e4471f42c4fedfbb XXX:7103 master,fail? - 1487606447218 1487606445097 11 disconnected 0-5460
730e5ccaf05a51bf1503228cd0381388db2a2606 XXX:7002 slave d35168af1ee9d535d0a96f349e719c2942de811a 0 1487607059329 21 connected
b6af4b7f2993f35014538097df71d2ddcc728472 XXX:7001 slave c918e62e16e5cf4dcc964ab6e4471f42c4fedfbb 0 1487607060342 11 connected
d35168af1ee9d535d0a96f349e719c2942de811a XXX:7101 master - 0 1487607061353 21 connected 5461-10922
1d57ae8d694531c3040c1323dfb5ad416eba14f8 XXX:7003 master,fail? - 1487606447218 1487606444085 17 disconnected 10923-16383

I wish to understand why it is in this state. It does not seem to recover either, even though 4 instances are still running. It could have promoted the slaves of the two killed masters to become new masters (assuming they were in sync).
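For readers following along, here is a minimal Python sketch that reads the 'cluster nodes' output quoted above. It assumes the standard field layout (id, address, flags, master id, ping-sent, pong-recv, config epoch, link state, then slots). The `Node` class and `parse_cluster_nodes` function are hypothetical helpers written for illustration, not part of Redis or any client library.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# The output quoted in the post, copied verbatim.
SAMPLE = """\
96a099e3de2f5c5c3ba2ac61bba3114f1b83bcd7 XXX:7102 myself,slave 1d57ae8d694531c3040c1323dfb5ad416eba14f8 0 0 16 connected
c918e62e16e5cf4dcc964ab6e4471f42c4fedfbb XXX:7103 master,fail? - 1487606447218 1487606445097 11 disconnected 0-5460
730e5ccaf05a51bf1503228cd0381388db2a2606 XXX:7002 slave d35168af1ee9d535d0a96f349e719c2942de811a 0 1487607059329 21 connected
b6af4b7f2993f35014538097df71d2ddcc728472 XXX:7001 slave c918e62e16e5cf4dcc964ab6e4471f42c4fedfbb 0 1487607060342 11 connected
d35168af1ee9d535d0a96f349e719c2942de811a XXX:7101 master - 0 1487607061353 21 connected 5461-10922
1d57ae8d694531c3040c1323dfb5ad416eba14f8 XXX:7003 master,fail? - 1487606447218 1487606444085 17 disconnected 10923-16383
"""


@dataclass
class Node:
    node_id: str
    addr: str
    flags: Set[str]
    master_id: Optional[str]  # None when the field is "-"
    link_state: str
    slots: List[str]


def parse_cluster_nodes(text: str) -> List[Node]:
    """Split each line into the fields listed in the lead-in."""
    nodes = []
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue  # skip anything that is not a full node line
        nodes.append(Node(
            node_id=parts[0],
            addr=parts[1],
            flags=set(parts[2].split(",")),
            master_id=None if parts[3] == "-" else parts[3],
            link_state=parts[7],
            slots=parts[8:],
        ))
    return nodes


nodes = parse_cluster_nodes(SAMPLE)
masters = [n for n in nodes if "master" in n.flags]
# "fail?" marks a node this instance only suspects of being down.
suspected = [n for n in masters if "fail?" in n.flags]
print(len(masters), "masters,", len(suspected), "flagged fail?")
```

Two of the three masters carry only the `fail?` flag: the reporting node suspects they are down, but they have not been marked with the confirmed `fail` flag.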

andyh

Feb 23, 2017, 7:45:31 AM

to redi...@googlegroups.com

This is by design: Redis Cluster can only work when a majority of the masters are alive (see https://redis.io/topics/cluster-tutorial). Please read the tutorial.

If you want to know more about why it is designed this way, you can read the cluster spec: https://redis.io/topics/cluster-spec

Andy

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+unsubscribe@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at https://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

--

andyh

Andy Huang (Huangkejun)

AlexanderB

Feb 23, 2017, 1:36:01 PM

to Redis DB

The basics of it are that the masters are the only nodes that coordinate and vote on slave promotions. When two of your three masters fail at the same time, only one master is left, which is short of the two votes needed for a majority, so there is no quorum to promote any slaves. I strongly suggest reading through the cluster spec docs iandyh posted; they are a great resource for learning about these sorts of cluster behaviors.
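The arithmetic behind that can be sketched in a few lines of Python. This is only an illustration of the majority rule described above, not code from Redis; the function names are made up for the example, and "total masters" is assumed to mean the masters that were part of the cluster before the failure.

```python
def failover_quorum(total_masters: int) -> int:
    """Smallest number of master votes that forms a strict majority."""
    return total_masters // 2 + 1


def can_promote_slave(total_masters: int, reachable_masters: int) -> bool:
    """True when enough masters remain to vote a slave into a master role."""
    return reachable_masters >= failover_quorum(total_masters)


# The situation from this thread: 3 masters, 2 killed, 1 left.
print(can_promote_slave(total_masters=3, reachable_masters=1))
# Killing only one master leaves 2 of 3, which is a majority.
print(can_promote_slave(total_masters=3, reachable_masters=2))
```

With three masters the cluster tolerates the loss of one master at a time, since the remaining two can still elect a replacement.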

Manusha Wijekoon

Feb 24, 2017, 3:37:49 AM

to Redis DB

OK, I think we are good with that. Just wanted to understand if this is normal.

One more related question.

Does Redis Cluster automatically guarantee that masters and slaves run on different boxes? Let's say we deploy 66 instances on three boxes (each box having 28 cores) and create a cluster. To tolerate single-box failures, we want every master on each box to have a slave running on a different box. Is this guaranteed by Redis Cluster?

Andy Huang

Feb 24, 2017, 10:22:03 AM

to redi...@googlegroups.com

Sort of. When you use redis-trib.rb, it handles the node allocation for you so that a master goes to one box and its slave goes to another. I have heard that the Ruby script will be folded into standard Redis in the future.
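Since this is "sort of" rather than a hard guarantee, it may be worth verifying the layout yourself after the cluster is created. Below is a minimal sketch in Python; the node names (`m1`, `s1`, ...), the box names, and the `colocated_pairs` helper are all hypothetical, and you would fill the two mappings from your own deployment.

```python
from typing import Dict, List, Tuple


def colocated_pairs(host_of: Dict[str, str],
                    master_of: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (master, slave) pairs that live on the same box.

    host_of:   node name -> box it runs on
    master_of: slave name -> name of the master it replicates
    """
    bad = []
    for slave, master in master_of.items():
        if host_of[slave] == host_of[master]:
            bad.append((master, slave))
    return sorted(bad)


# Hypothetical layout: three boxes, three master/slave pairs.
host_of = {
    "m1": "box-a", "m2": "box-b", "m3": "box-c",
    "s1": "box-b", "s2": "box-c", "s3": "box-c",
}
master_of = {"s1": "m1", "s2": "m2", "s3": "m3"}

# Any pair listed here would be lost entirely if that box went down.
print(colocated_pairs(host_of, master_of))
```

An empty result means every master has its slave on a different box, which is the property needed to survive a single-box failure.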

Again, please read the cluster tutorial, which covers how to set up the cluster and how it works.

Andy

Sent from my iPad
