Running more masters and slaves than servers creates an affinity problem.

Shawn Magill

Nov 20, 2015, 6:47:45 PM
to Redis DB
I found a minor issue with the cluster tool (redis-trib.rb).  For economic reasons we want to run more instances of Redis than we have machines running Redis.  Each server runs, say, three instances: one master and two slaves.  The masters divvy up the 16384 hash slots evenly.  Each master needs at least one replica, and it's very important that this replica live on a server other than the one where the master runs, in case the whole server fails.  But the cluster tool isn't aware of the mapping between masters, slaves, and servers.  It appears that the utility consistently assigns the first sockets in the list to the masters, but I'm not able to determine how it assigns the slaves after that.  So it's possible to avoid having two or more masters on one server, but not two or more slaves of the same master on a server, or a slave on the same server as its master.
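
To make the layout concrete, here is a minimal sketch of the kind of per-server setup I mean: three cluster-enabled instances on ports 6379-6381 on every machine.  The file paths and node timeout are only illustrative, not our exact values.

# Sketch only: start three cluster-enabled Redis instances on this host.
# Paths and cluster-node-timeout are placeholders.
for PORT in 6379 6380 6381; do
  mkdir -p /var/lib/redis/${PORT}
  cat > /etc/redis/${PORT}.conf <<EOF
port ${PORT}
cluster-enabled yes
cluster-config-file nodes-${PORT}.conf
cluster-node-timeout 5000
appendonly yes
dir /var/lib/redis/${PORT}
EOF
  redis-server /etc/redis/${PORT}.conf --daemonize yes
done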

Of course none of this would matter if there were a 1-to-1 relationship between instances and servers.  What's needed is some kind of affinity feature, or at least a way to convey how masters and slaves should be allocated across servers.  For smaller-scale Redis deployments like mine it would be enough to know how slaves are assigned to masters based on the order of the sockets provided.
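
One workaround I can see, sketched below, is to have redis-trib create the cluster with masters only, then attach each replica to a chosen master explicitly, always on a different host from that master.  The master IDs are placeholders here; they would be read from CLUSTER NODES.

# Possible workaround (sketch): create the cluster with masters only...
/usr/local/bin/redis-trib.rb create \
  172.30.6.231:6379 172.30.6.232:6379 172.30.6.233:6379

# ...then attach each replica to a specific master so placement is explicit.
# <master-id-of-231:6379> is a placeholder; read the real ID from
# `redis-cli -h 172.30.6.231 -p 6379 cluster nodes`.
/usr/local/bin/redis-trib.rb add-node --slave \
  --master-id <master-id-of-231:6379> 172.30.6.232:6380 172.30.6.231:6379
/usr/local/bin/redis-trib.rb add-node --slave \
  --master-id <master-id-of-231:6379> 172.30.6.233:6380 172.30.6.231:6379

# Or, for a node that has already joined the cluster:
redis-cli -h 172.30.6.232 -p 6380 cluster replicate <master-id-of-231:6379>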

Below I've provided the output of redis-trib.rb to show the problem.  With this cluster, a total loss of machine one (172.30.6.231) would mean losing the first master plus both replicas of the middle one!  Not a catastrophe, but not good either.  With nine instances of Redis on three servers it should be possible to lose two servers and still have access to all shards.  (A quick way to check the resulting placement is shown after the output.)

If anyone can shed light on the matter I would appreciate it.

# /usr/local/bin/redis-trib.rb create \
--replicas 2 \
>>> Creating cluster
Connecting to node 172.30.6.231:6379: OK
Connecting to node 172.30.6.232:6379: OK
Connecting to node 172.30.6.233:6379: OK
Connecting to node 172.30.6.232:6380: OK
Connecting to node 172.30.6.233:6380: OK
Connecting to node 172.30.6.231:6380: OK
Connecting to node 172.30.6.233:6381: OK
Connecting to node 172.30.6.231:6381: OK
Connecting to node 172.30.6.232:6381: OK
>>> Performing hash slots allocation on 9 nodes...
Using 3 masters:
M: 247e3ffb596ee3dcbd6da0ee517fbb675242fe76 172.30.6.231:6379
   slots:0-5460 (5461 slots) master
M: d057060ec152d8e2eae42a8bb1fe829ca8db26c7 172.30.6.232:6379
   slots:5461-10922 (5462 slots) master
M: 3b693e12a4e453d56636e7bef9be6a21a4445409 172.30.6.233:6379
   slots:10923-16383 (5461 slots) master
S: 7cc51c5aaf1abd532609d9f0c9e5b4177cac9c58 172.30.6.232:6380
   replicates 247e3ffb596ee3dcbd6da0ee517fbb675242fe76
S: 543a5cd54f4889b3e7271c55bfecef5acafa8de9 172.30.6.233:6380
   replicates 247e3ffb596ee3dcbd6da0ee517fbb675242fe76
S: c0af71f890073603ceec220f4c40d6d9dfc85118 172.30.6.231:6380
   replicates d057060ec152d8e2eae42a8bb1fe829ca8db26c7
S: e7679137d46ef59dce5f267062a2ba1ef47cfb7a 172.30.6.233:6381
   replicates 3b693e12a4e453d56636e7bef9be6a21a4445409
S: 41e83b6394f2d912f6b81d764d3371635168cbcd 172.30.6.231:6381
   replicates d057060ec152d8e2eae42a8bb1fe829ca8db26c7
S: 34f8109413af05e47da6753b26393604df8c3de7 172.30.6.232:6381
   replicates 3b693e12a4e453d56636e7bef9be6a21a4445409
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join...
>>> Performing Cluster Check (using node 172.30.6.231:6379)
M: 247e3ffb596ee3dcbd6da0ee517fbb675242fe76 172.30.6.231:6379
   slots:0-5460 (5461 slots) master
M: d057060ec152d8e2eae42a8bb1fe829ca8db26c7 172.30.6.232:6379
   slots:5461-10922 (5462 slots) master
M: 3b693e12a4e453d56636e7bef9be6a21a4445409 172.30.6.233:6379
   slots:10923-16383 (5461 slots) master
M: 7cc51c5aaf1abd532609d9f0c9e5b4177cac9c58 172.30.6.232:6380
   slots: (0 slots) master
   replicates 247e3ffb596ee3dcbd6da0ee517fbb675242fe76
M: 543a5cd54f4889b3e7271c55bfecef5acafa8de9 172.30.6.233:6380
   slots: (0 slots) master
   replicates 247e3ffb596ee3dcbd6da0ee517fbb675242fe76
M: c0af71f890073603ceec220f4c40d6d9dfc85118 172.30.6.231:6380
   slots: (0 slots) master
   replicates d057060ec152d8e2eae42a8bb1fe829ca8db26c7
M: e7679137d46ef59dce5f267062a2ba1ef47cfb7a 172.30.6.233:6381
   slots: (0 slots) master
   replicates 3b693e12a4e453d56636e7bef9be6a21a4445409
M: 41e83b6394f2d912f6b81d764d3371635168cbcd 172.30.6.231:6381
   slots: (0 slots) master
   replicates d057060ec152d8e2eae42a8bb1fe829ca8db26c7
M: 34f8109413af05e47da6753b26393604df8c3de7 172.30.6.232:6381
   slots: (0 slots) master
   replicates 3b693e12a4e453d56636e7bef9be6a21a4445409
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
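
For reference, here is a rough one-liner I would use to check which host holds which role and which master each replica follows, based on the standard CLUSTER NODES fields.

# Sketch: list host, flags, and replication target for every node.
# Fields assumed from CLUSTER NODES: $1=id, $2=ip:port, $3=flags, $4=master-id or "-".
redis-cli -h 172.30.6.231 -p 6379 cluster nodes \
  | awk '{ split($2, a, ":"); print a[1], $3, ($4 == "-" ? "master" : "slave of " $4) }' \
  | sort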

