Cluster Fault Tolerance

Jayakrishna B H

Mar 30, 2016, 12:48:56 PM

to Redis DB

1. How to set up a fault-tolerant Redis Cluster (Redis 3.0.501, 64-bit Windows)?

(In my tests, the Redis cluster is fault tolerant only when the data set is small.)

The steps performed for each scenario are given below.

Less Data Scenario:

(Refer to images from LessDataScenario.zip for Less Data Scenario)

1. A cluster is formed with Redis instances on ports 7000, 7001 and 7002 as masters of 7003, 7004 and 7005 respectively. (ref to 1-InitialClusterSetup.png)

2. Redis 7000 is stopped. (ref to 2-7000Stopped.png)

3. After stopping Redis 7000, the Redis instance on 7003 became master. Cluster state is ok. (ref to 3-7003BecameMaster.png)

4. Cluster check on 7002 showed all slots covered. (ref to 4-ClusterCheck.png)

5. Started Redis 7000 again. It became a slave of Redis 7003. (ref to 5-7000Started.png)

6. Cluster check on both 7002 and 7003 showed [OK] all slots covered. It showed all six nodes (3 masters and 3 slaves) during this check. (ref to 6-ClusterCheck.png)

7. Stopped Redis 7003, which is now the master. (ref to 7-7003Stopped.png)

8. After stopping 7003, Redis 7000 became master. Cluster state is ok. (ref to 8-7001BecameMaster.png)

9. Cluster check on both 7001 and 7002 showed [OK] all slots covered. (ref to 9-ClusterCheck.png)

10. Started 7003 again. It became a slave of 7000. (ref to 10-7003Started.png)

11. Cluster check on both 7001 and 7002 showed [OK] all slots covered. It showed all six nodes (3 masters and 3 slaves) during this check. (ref to 11-ClusterCheck.png)

Huge Data Scenario:

(Refer to images from OneLakhDataScenario.zip for Huge Data Scenario)

1. A cluster is formed with Redis instances on ports 7000, 7001 and 7002 as masters of 7003, 7004 and 7005 respectively. (ref to 1-InitialClusterSetup.png)

2. Added one lakh (100,000) keys to the cluster. (ref to 2-OneLakhRecordsAdded.png)

3. Added one more key. (ref to 3-Key17000Added.png)

4. Redis 7000 is stopped. (ref to 4-7000Stopped.png)

5. After stopping Redis 7000, the Redis instance on 7003 became master. Cluster state is ok. (ref to 5-7003BecameMaster.png)

6. Cluster check on both 7002 and 7003 showed [OK] all slots covered. (ref to 6-ClusterCheck.png)

7. On 7003, updated the key added in step 3. (ref to 7-Key17000UpdatedIn7003.png)

8. Started Redis 7000. It became a slave of Redis 7003. (ref to 8-7000Started.png)

9. When a cluster check was run on 7002 and 7003, neither showed node 7000 as a slave of 7003. (ref to 9-ClusterCheck.png)

10. Stopped Redis 7003, which is now the master. (ref to 10-7003Stopped.png)

11. Redis 7000 did not become master even though 7003 was stopped. (ref to 11-7000DidnotBecomeMaster.png)

12. Cluster check on 7002 and 7001 did not show node 7000. Cluster status showed [ERR], i.e., not all 16384 slots are covered. (ref to 12-ClusterCheckFail.png)

13. Started 7003 again just to check the cluster status. (ref to 13-7003StartedAgain.png)

14. Cluster check on 7001 and 7002 showed [ERR] (not all slots covered), whereas cluster check on 7003 showed [OK] (all slots covered). (ref to 14-ClusterCheckAfter7003StartedAgain.png)
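
The cluster checks in the steps above come down to asking whether each master still has a replica registered. As a minimal sketch (not part of the original test), the Python below parses text in the `CLUSTER NODES` line format and lists the replicas known for each master. It assumes the Redis 3.0 field order (id, address, flags, master id, ping-sent, pong-recv, config-epoch, link-state, slots). The node ids and timestamps in `SAMPLE` are hypothetical, made up to mirror step 9, where 7000 is missing from the output.

```python
# Minimal sketch: parse `CLUSTER NODES`-style text and group replicas by master.
# SAMPLE is hypothetical data, not output captured from the cluster in this thread.

def parse_cluster_nodes(text):
    """Return one dict per node line of CLUSTER NODES output."""
    nodes = []
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip lines that do not have the expected fields
        nodes.append({
            "id": fields[0],
            "addr": fields[1].split("@")[0],  # newer versions append @cport
            "flags": set(fields[2].split(",")),
            "master_id": None if fields[3] == "-" else fields[3],
            "link_state": fields[7],
            "slots": fields[8:],
        })
    return nodes


def replicas_by_master(nodes):
    """Map each master address to the addresses of the replicas pointing at it."""
    addr_of = {n["id"]: n["addr"] for n in nodes}
    result = {n["addr"]: [] for n in nodes if "master" in n["flags"]}
    for n in nodes:
        if "slave" in n["flags"] and n["master_id"] in addr_of:
            result.setdefault(addr_of[n["master_id"]], []).append(n["addr"])
    return result


# Hypothetical view from one node: 7000 is absent, so 7003 has no replica.
SAMPLE = """\
aaa1 127.0.0.1:7003 myself,master - 0 0 7 connected 0-5460
bbb2 127.0.0.1:7001 master - 0 1459340000000 2 connected 5461-10922
ccc3 127.0.0.1:7002 master - 0 1459340000000 3 connected 10923-16383
ddd4 127.0.0.1:7004 slave bbb2 0 1459340000000 5 connected
eee5 127.0.0.1:7005 slave ccc3 0 1459340000000 6 connected
"""

if __name__ == "__main__":
    grouped = replicas_by_master(parse_cluster_nodes(SAMPLE))
    for master, replicas in sorted(grouped.items()):
        print(master, "->", replicas or "NO REPLICA")
```

A master listed with no replica is the situation in which stopping that master leaves its slots uncovered, as in steps 10 to 12.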

The cluster configuration used for Redis 7000 is given below.

port 7000

cluster-enabled yes

cluster-config-file nodes-7000.conf

cluster-slave-validity-factor 0

cluster-node-timeout 10000

#repl-ping-slave-period 1

appendonly yes

appendfilename appendonly-7000.aof

dbfilename dump-7000.rdb

#logfile 7000.log

logfile ""

maxmemory 3GB

lua-time-limit 5000

LessDataScenario.zip

OneLakhDataScenario.zip

Jayakrishna B H

Apr 5, 2016, 2:13:30 PM

to Redis DB

I have also tried without the appendonly and RDB dump settings. The cluster still fails when the data set is large.
Can someone help me with this?

Tuco

Apr 6, 2016, 12:53:26 AM

to Redis DB

If you look at your file 9-ClusterCheck.png, it was taken after 7000 was started, but it does not show 7000 anywhere, and 7000 is not a slave of 7003, so obviously it does not become master when 7003 goes down. So your issue is with 7000.

Getting to 7000, if you look at the file 8-7000Started.png, it says full resync from master <badf5.....>, while <bad.....> is not your 7003 instance.

I am guessing you had an earlier node with this name while you played around with the cluster.

A good idea would be to delete the files related to 7000, such as nodes-7000.conf (which Redis creates by itself) and the RDB and AOF files of 7000, then add it to the cluster again and see.

Or, if you want to play safe, add another node at 7010 and proceed from there.

Most probably your 7000 is corrupted because of earlier experiments, and it is ending up all by itself (as can be seen from 9-ClusterCheck.png and 8-7000Started.png).

Jayakrishna B H

Apr 6, 2016, 7:19:40 AM

to Redis DB

Hi,

First of all, thanks for the reply.

I have set up a new cluster on different ports, 9000 to 9005, and tested the same scenario with less data. I noticed that whenever a slave comes up, the "full resync from master" log line shows a different id (blahblah....:1) which does not belong to any of the instances in the cluster. This behaviour is the same even in the successful Less Data Scenario posted previously.
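
One possible explanation worth checking: the id in that log line may be the master's replication run ID (the `run_id` field reported by `INFO server`) followed by a replication offset, rather than a cluster node ID, which would explain why it matches no entry in the cluster node list. The sketch below, assuming the log line has the form `Full resync from master: <id>:<offset>`, extracts the id from a slave log and compares it with the `run_id` from the master's `INFO` text. The sample log line, the `RUN_ID` value, and the helper names are hypothetical.

```python
import re

# Assumed log format: "Full resync from master: <id>:<offset>"
RESYNC_RE = re.compile(r"Full resync from master: ([0-9a-f]+):(-?\d+)")


def resync_ids(log_text):
    """Return (id, offset) pairs for every full-resync line found in a log."""
    return [(m.group(1), int(m.group(2))) for m in RESYNC_RE.finditer(log_text)]


def run_id_from_info(info_text):
    """Extract the run_id field from text in the `INFO server` format."""
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key == "run_id":
            return value
    return None


def resync_matches_master(log_text, master_info_text):
    """True if the most recent full resync came from the master whose INFO is given."""
    ids = resync_ids(log_text)
    return bool(ids) and ids[-1][0] == run_id_from_info(master_info_text)


# Hypothetical sample data (a made-up 40-character hex run id).
RUN_ID = "ab12" * 10
SAMPLE_LOG = "[1234] 06 Apr 12:00:00.000 * Full resync from master: " + RUN_ID + ":1\n"
SAMPLE_INFO = "# Server\r\nredis_version:3.0.501\r\nrun_id:" + RUN_ID + "\r\ntcp_port:9003\r\n"

if __name__ == "__main__":
    print(resync_ids(SAMPLE_LOG))
    print(resync_matches_master(SAMPLE_LOG, SAMPLE_INFO))
```

If the id from the slave's log equals the `run_id` of the expected master, the resync came from the right instance even though the id does not appear among the cluster node IDs.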

Attaching logs of 9000 and 9003 and initial cluster formation details.

9000.log

9003.log

ClusterDetails.png

Tuco

Apr 6, 2016, 8:13:47 AM

to Redis DB

Hi,

Please see this link.

Also, this

As per the first link, you will need to check "repl-timeout" for your case; by default it is 60 seconds.

You can find it by looking in your conf file, or by executing "config get repl-timeout" in the CLI.

Increase it to a larger value and see if it works.
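
As a rough illustration of the conf-file check, the sketch below reads redis.conf-style text and reports the effective `repl-timeout`, falling back to the 60-second default mentioned above when the directive is absent. The helper names are hypothetical, and the sample text is a subset of the configuration posted earlier in the thread, which does not set `repl-timeout`.

```python
DEFAULT_REPL_TIMEOUT = 60  # seconds, the default quoted in the reply above


def parse_conf(text):
    """Parse 'directive value' lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        settings[key.lower()] = value.strip().strip('"')
    return settings


def effective_repl_timeout(settings):
    """Return repl-timeout in seconds, using the default if it is not set."""
    return int(settings.get("repl-timeout", DEFAULT_REPL_TIMEOUT))


# Subset of the Redis 7000 configuration from the first post.
SAMPLE_CONF = """\
port 7000
cluster-enabled yes
cluster-node-timeout 10000
#repl-ping-slave-period 1
appendonly yes
logfile ""
"""

if __name__ == "__main__":
    settings = parse_conf(SAMPLE_CONF)
    print("repl-timeout:", effective_repl_timeout(settings), "seconds")
    # Adding a line such as "repl-timeout 600" would raise the value.
    print(effective_repl_timeout(parse_conf(SAMPLE_CONF + "repl-timeout 600\n")))
```

A value changed at runtime with `config set` would not show up in the file, so the `config get repl-timeout` check on the running instance remains the more reliable of the two.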

P.S. I am no expert in any way.

Rahul Babbar

Apr 6, 2016, 8:17:06 AM

to redi...@googlegroups.com

Also, see this.

Jayakrishna B H

Apr 18, 2016, 2:35:24 AM

to Redis DB

Hi,

I have tried changing the replication timeout and replication buffer as suggested. The cluster still behaves the same way (failing).

Is there any other way to solve this issue?
