Can't connect to a single node in a Redis cluster


hh90

Dec 19, 2017, 1:21:19 AM
to Redis DB
I have a Redis cluster of 4 masters and 3 slaves/replicas. One node in the cluster, a slave, keeps closing the connection every time I try to connect to it. I do not see errors in the log for the slave.

Redis version: 4.0.2

The error I see for every command directed at this node:

C02VJ0T6HV2Q:~ me$ redis-cli -p 7000 cluster nodes
Error: Server closed the connection

Output of cluster health check:

C02VJ0T6HV2Q:~ me$ redis-trib check 127.0.0.1:7003
[ERR] Sorry, can't connect to node 127.0.0.1:7000
>>> Performing Cluster Check (using node 127.0.0.1:7003)
M: fa6ea1b092b1e083329d306d9fcd6b7d3bb95b92 127.0.0.1:7003
   slots:0-5460 (5461 slots) master
   0 additional replica(s)
M: 8f01a3c5f121f289e686c163ede92b95e4ba2938 127.0.0.1:7006
   slots: (0 slots) master
   0 additional replica(s)
S: f0611cd74c54552301100fbbee9c63197e1bcd9b 127.0.0.1:7001
   slots: (0 slots) slave
   replicates 58e1f6ba0609ec71c39afcdaa9cde3ba60536989
M: 58e1f6ba0609ec71c39afcdaa9cde3ba60536989 127.0.0.1:7004
   slots:5461-10922 (5462 slots) master
   1 additional replica(s)
S: 1d95b0f98298f555ee918ee88daf60361c3115a1 127.0.0.1:7005
   slots: (0 slots) slave
   replicates ece88f7e8bdc33e6b1b05e44e8c2b138167ed4d6
M: ece88f7e8bdc33e6b1b05e44e8c2b138167ed4d6 127.0.0.1:7002
   slots:10923-16383 (5461 slots) master
   1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

The log for the node I can't connect to:

C02VJ0T6HV2Q:7000 me$ ../redis-server redis.conf
16590:C 18 Dec 10:18:25.244 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
16590:C 18 Dec 10:18:25.245 # Redis version=4.0.2, bits=64, commit=00000000, modified=0, pid=16590, just started
16590:C 18 Dec 10:18:25.245 # Configuration loaded
16590:M 18 Dec 10:18:25.246 * Increased maximum number of open files to 10032 (it was originally set to 4864).
16590:M 18 Dec 10:18:25.247 * Node configuration loaded, I'm 35cf0c299ce365167f007c45271b5352810395f5
16590:M 18 Dec 10:18:25.247 # Server initialized
16590:M 18 Dec 10:18:25.247 * DB loaded from append only file: 0.000 seconds
16590:M 18 Dec 10:18:25.247 * Ready to accept connections
16590:S 18 Dec 10:18:25.248 * Before turning into a slave, using my master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
16590:S 18 Dec 10:18:25.248 # Cluster state changed: ok
16590:S 18 Dec 10:18:26.286 * Connecting to MASTER 127.0.0.1:7003
16590:S 18 Dec 10:18:26.286 * MASTER <-> SLAVE sync started
16590:S 18 Dec 10:18:26.286 * Non blocking connect for SYNC fired the event.
16590:S 18 Dec 10:18:26.286 * Master replied to PING, replication can continue...
16590:S 18 Dec 10:18:26.287 * Trying a partial resynchronization (request fa18bca2285d749ab71f2c49d0ddb3514b9b8cbf:1).
16590:S 18 Dec 10:18:26.288 * Full resync from master: 1e3c78d6d6b5a395a30e5a9c22bf34423ad4c717:2671882
16590:S 18 Dec 10:18:26.288 * Discarding previously cached master state.
16590:S 18 Dec 10:18:26.295 * MASTER <-> SLAVE sync: receiving 204 bytes from master
16590:S 18 Dec 10:18:26.295 * MASTER <-> SLAVE sync: Flushing old data
16590:S 18 Dec 10:18:26.296 * MASTER <-> SLAVE sync: Loading DB in memory
16590:S 18 Dec 10:18:26.296 * MASTER <-> SLAVE sync: Finished with success
16590:S 18 Dec 10:18:26.296 * Background append only file rewriting started by pid 16592
16590:S 18 Dec 10:18:26.321 * AOF rewrite child asks to stop sending diffs.
16592:C 18 Dec 10:18:26.321 * Parent agreed to stop sending diffs. Finalizing AOF...
16592:C 18 Dec 10:18:26.321 * Concatenating 0.00 MB of AOF diff received from parent.
16592:C 18 Dec 10:18:26.321 * SYNC append only file rewrite performed
16590:S 18 Dec 10:18:26.389 * Background AOF rewrite terminated with success
16590:S 18 Dec 10:18:26.389 * Residual parent diff successfully flushed to the rewritten AOF (0.00 MB)
16590:S 18 Dec 10:18:26.389 * Background AOF rewrite finished successfully

I have tried stopping and restarting the failing node. It still does not accept connections. Any thoughts about why I can't connect to this one node in the cluster?

hh90

Dec 20, 2017, 1:23:27 PM
to Redis DB
The problem was that Cassandra was running on the same machine and listening on port 7000, the same port I had assigned to this cluster node. Cassandra uses port 7000 by default for its internal inter-node communication. I moved the Redis node to a different port and could connect to it normally.

If you can't connect to a cluster node that the rest of the cluster still considers healthy, check whether another process is listening on the same port as Redis.
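
For anyone who wants a quick way to run that check, here is a minimal Python sketch. It is not what I used at the time, and the function name `probe_port` and its return labels are made up for illustration. It opens a TCP connection, sends an inline `PING`, and looks at what comes back. A Redis server normally answers with a line starting with `+` (for example `+PONG`) or `-` (an error such as a missing-auth message). This is only a heuristic: it assumes that whatever else might be on the port does not happen to reply with those characters.

```python
import socket


def probe_port(host: str, port: int, timeout: float = 2.0) -> str:
    """Send an inline PING to host:port and classify the answer.

    Returns one of these illustrative labels:
      "redis"   - reply looks like a Redis simple string or error
      "closed"  - peer accepted the connection, then closed it without replying
      "refused" - nothing is listening on that port
      "other"   - something replied, but not in the Redis reply format
      "error"   - timeout or another socket-level failure
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # Redis accepts "inline" commands terminated by CRLF.
            sock.sendall(b"PING\r\n")
            reply = sock.recv(64)
    except ConnectionRefusedError:
        return "refused"
    except (ConnectionResetError, BrokenPipeError):
        return "closed"
    except OSError:
        # Covers timeouts and any other socket error.
        return "error"

    if not reply:
        # An empty read means the peer closed the connection.
        return "closed"
    if reply[:1] in (b"+", b"-"):
        return "redis"
    return "other"
```

Calling `probe_port("127.0.0.1", 7000)` against each node's port and getting anything other than `"redis"` is a hint that the process answering on that port may not be the Redis node you expect. To see which process actually owns the port, use your operating system's own tools.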