Hi All,
We are seeing some strange behaviour in our Redis cluster: a slave does a full sync with its master, and after some time it disconnects and tries to reconnect again.
We suspect the cause is that the slave takes around two minutes to load the received RDB file.
Meanwhile the other masters think it is unreachable, and the slave's master disconnects it.
master logs
22:M 01 Sep 19:56:51.456 * Unable to partial resync with slave 10.66.110.61:6379 for lack of backlog (Slave request was: 35541788818).
22:M 01 Sep 19:56:51.456 * Delay next BGSAVE for diskless SYNC
22:M 01 Sep 19:56:57.962 * Starting BGSAVE for SYNC with target: slaves sockets
22:M 01 Sep 19:56:58.116 * Background RDB transfer started by pid 249
249:C 01 Sep 19:58:41.721 * RDB: 971 MB of memory used by copy-on-write
22:M 01 Sep 19:58:42.104 * Background RDB transfer terminated with success
22:M 01 Sep 19:58:42.104 # Slave 10.66.110.61:6379 correctly received the streamed RDB file.
22:M 01 Sep 19:58:42.104 * Streamed RDB transfer with slave 10.66.110.61:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
22:M 01 Sep 19:59:00.314 * FAIL message received from cebafb35e58d950b27761e13ed8b8c20734841ef about 45266afe23fdb8fec18c0b2278be1a795e20243d
22:M 01 Sep 19:59:16.023 # Client id=1990468 addr=10.66.110.61:48599 fd=385 name= age=145 idle=1 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=16384 oll=7277 omem=116824142 events=r cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.
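The last master log line says the slave's client connection was closed for exceeding the output buffer limits. As a rough check, the sketch below compares the omem value from that line against the stock redis.conf default of "client-output-buffer-limit slave 256mb 64mb 60". That default is an assumption here, since our configured value is not shown above, and the helper name exceeded_limit is only illustrative.

```python
# Values copied from the master's "Client id=1990468 ..." log line.
omem_bytes = 116824142
age_seconds = 145

# ASSUMPTION: stock redis.conf default for slave clients,
# "client-output-buffer-limit slave 256mb 64mb 60".
# The limits actually configured on this cluster may differ.
HARD_LIMIT = 256 * 1024 * 1024   # closes the client immediately
SOFT_LIMIT = 64 * 1024 * 1024    # closes the client if exceeded continuously...
SOFT_SECONDS = 60                # ...for this many seconds


def exceeded_limit(omem, hard, soft):
    """Return which limit (if any) the buffered output size is over."""
    if omem >= hard:
        return "hard"
    if omem >= soft:
        return "soft"
    return None


which = exceeded_limit(omem_bytes, HARD_LIMIT, SOFT_LIMIT)
print(f"omem = {omem_bytes / (1024 * 1024):.1f} MiB, over limit: {which}")
```

With these assumed defaults the buffer is above the soft limit but below the hard limit. The log line alone does not show how long it stayed above the soft limit, so this only indicates which limit is a plausible candidate.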
slave logs
22:S 01 Sep 19:56:51.453 * MASTER <-> SLAVE sync started
22:S 01 Sep 19:56:51.454 * Non blocking connect for SYNC fired the event.
22:S 01 Sep 19:56:51.454 * Master replied to PING, replication can continue...
22:S 01 Sep 19:56:51.456 * Trying a partial resynchronization (request 63e4efe185731152f9598d56c1d8dc8a0073ecb3:35541788818).
22:S 01 Sep 19:56:57.962 * Full resync from master: 63e4efe185731152f9598d56c1d8dc8a0073ecb3:35783099392
22:S 01 Sep 19:56:57.962 * Discarding previously cached master state.
22:S 01 Sep 19:56:58.118 * MASTER <-> SLAVE sync: receiving streamed RDB from master
22:S 01 Sep 19:58:41.639 * MASTER <-> SLAVE sync: Flushing old data
22:S 01 Sep 19:59:49.950 * MASTER <-> SLAVE sync: Loading DB in memory
22:S 01 Sep 20:01:39.351 * MASTER <-> SLAVE sync: Finished with success
22:S 01 Sep 20:01:39.352 # Connection with master lost.
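To make the timings concrete, here is a small sketch that computes how long each phase took on the slave, using only the timestamps from the slave log above. The phase labels and the function name phase_durations are just illustrative.

```python
from datetime import datetime

# Timestamps copied from the slave log (all on 01 Sep).
events = [
    ("receiving streamed RDB", "19:56:58.118"),
    ("flushing old data", "19:58:41.639"),
    ("loading DB in memory", "19:59:49.950"),
    ("finished", "20:01:39.351"),
]


def phase_durations(events):
    """Return (phase name, seconds) for each consecutive pair of events."""
    parsed = [(name, datetime.strptime(ts, "%H:%M:%S.%f")) for name, ts in events]
    return [
        (name, (nxt - start).total_seconds())
        for (name, start), (_, nxt) in zip(parsed, parsed[1:])
    ]


durations = phase_durations(events)
for name, seconds in durations:
    print(f"{name}: {seconds:.1f} s")
```

This gives roughly 103.5 s receiving the RDB, 68.3 s flushing old data, and 109.4 s loading the DB. The slave is therefore busy for about three minutes after the transfer ends, during which the master's FAIL message (19:59:00) and the output buffer disconnect (19:59:16) both occur.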
Any help in understanding this would be appreciated.
Thanks and regards,
Satyendr