Slaves Perform Full Resync After Restarting


hh90

May 23, 2018, 2:17:15 PM
to Redis DB
I'm using Redis 3.0.6 running in cluster mode, with 3 masters and 3 slaves.

I noticed that the slaves perform a full resync with their master after restarting. I expected them to perform a partial resync and fetch only the writes they missed while they were offline. The restarts are brief and are only used to apply a configuration change.

Judging from the logs on my slave, it looks like it didn't save some state information about the master and is forced to perform a full sync. I understand this is an expensive operation for the master because it has to fork and save its dataset.

Logs from the master:

19863:M 22 May 17:31:14.879 # Connection with slave client id #22 lost.
19863:M 22 May 17:31:19.512 * Slave 172.20.10.186:6379 asks for synchronization
19863:M 22 May 17:31:19.512 * Full resync requested by slave 172.20.10.186:6379
19863:M 22 May 17:31:19.512 * Starting BGSAVE for SYNC with target: disk
19863:M 22 May 17:31:19.527 * Background saving started by pid 23459
23459:C 22 May 17:31:26.117 * DB saved on disk
23459:C 22 May 17:31:26.130 * RDB: 2 MB of memory used by copy-on-write
19863:M 22 May 17:31:26.193 * Background saving terminated with success
19863:M 22 May 17:31:29.744 * Synchronization with slave 172.20.10.186:6379 succeeded

Logs from the slave:

496:S 22 May 17:31:19.516 * Connecting to MASTER 172.20.0.160:6379
496:S 22 May 17:31:19.516 * MASTER <-> SLAVE sync started
496:S 22 May 17:31:19.516 * Non blocking connect for SYNC fired the event.
496:S 22 May 17:31:19.517 * Master replied to PING, replication can continue...
496:S 22 May 17:31:19.518 * Partial resynchronization not possible (no cached master)
496:S 22 May 17:31:19.534 * Full resync from master: 5b95126e990455d3170578a7e11fb6bcb299e161:503298
496:S 22 May 17:31:26.200 * MASTER <-> SLAVE sync: receiving 252084284 bytes from master
496:S 22 May 17:31:29.943 * MASTER <-> SLAVE sync: Flushing old data
496:S 22 May 17:31:30.721 * MASTER <-> SLAVE sync: Loading DB in memory
496:S 22 May 17:31:34.193 * MASTER <-> SLAVE sync: Finished with success
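
For anyone who wants to check the same thing across several slaves, here is a minimal Python sketch that scans slave log lines and reports whether a partial resync was refused and why. The line format is assumed from the Redis 3.0.6 excerpts above, and the function name `summarize_slave_sync` is made up for illustration; other Redis versions may word these messages differently.

```python
import re

# Line layout assumed from the log excerpts quoted in this thread:
# "<pid>:<role> <day> <month> <HH:MM:SS.mmm> <level> <message>"
LOG_LINE = re.compile(r"^(\d+):([A-Z]) (\d+ \w+ [\d:.]+) (\S) (.*)$")


def summarize_slave_sync(lines):
    """Return a small summary of the sync attempt found in slave log lines."""
    summary = {
        "partial_refused_reason": None,  # text inside the parentheses, if any
        "full_resync": False,            # True once "Full resync from master" is seen
        "bytes_received": None,          # size of the RDB payload, if logged
    }
    for line in lines:
        parsed = LOG_LINE.match(line.strip())
        if not parsed:
            continue  # skip anything that does not look like a Redis log line
        message = parsed.group(5)

        refused = re.match(
            r"Partial resynchronization not possible \((.*)\)", message
        )
        if refused:
            summary["partial_refused_reason"] = refused.group(1)
        elif message.startswith("Full resync from master"):
            summary["full_resync"] = True
        else:
            received = re.search(r"receiving (\d+) bytes from master", message)
            if received:
                summary["bytes_received"] = int(received.group(1))
    return summary


# Three of the slave log lines from this post, used as sample input.
SAMPLE_LOG = """\
496:S 22 May 17:31:19.518 * Partial resynchronization not possible (no cached master)
496:S 22 May 17:31:19.534 * Full resync from master: 5b95126e990455d3170578a7e11fb6bcb299e161:503298
496:S 22 May 17:31:26.200 * MASTER <-> SLAVE sync: receiving 252084284 bytes from master
"""

if __name__ == "__main__":
    print(summarize_slave_sync(SAMPLE_LOG.splitlines()))
```

On the lines above, the refusal reason is "no cached master", which suggests the restarted slave had no replication state to resume from, rather than a timeout.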

I read an article on RedisLabs that discussed changing the value of repl-timeout to fix this issue, but it's not clear to me that I'm experiencing replication timeouts by looking at the logs.

Is there another configuration parameter I can look at so the slaves only need to perform partial resyncs after a restart?

hva...@gmail.com

May 23, 2018, 11:54:51 PM
to Redis DB
Consider upgrading to the current version (v4.0.9).
The release notes for v4.0.7 contain this description of a fix:

* Fix many potentially successful partial synchronizations that end
  doing a full SYNC, because of a bug destroying the replication
  backlog on the slave. So after a failover the slave was often not able
  to PSYNC with masters, and a full SYNC was triggered. The bug only
  happened after 1 hour of uptime so escaped the unit tests. (Oran Agra)

The notes for v4.0.2 include something similar:

* A number of bugs were fixed in the area of PSYNC2 replication in the
specific area of restarting an instance with an RDB file having the
replication meta-data to continue without a full resynchronization. The
old code allowed several inconsistencies under certain conditions, like
starting a master with an RDB file generated by a slave, and later using
such master to connect previous slaves having the same replication history.
Because of other bugs, sometimes the replication resulted in a full
synchronization even if actually a partial resynchronization was possible
and so forth. Several commits by different authors fix different bugs here.
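
To illustrate why a restarted slave ends up with a full resync, here is a simplified Python model of the decision a master makes when a slave asks to continue replication. It is only a sketch under stated assumptions, not the actual Redis source: the function name and parameters are hypothetical, and the real check has more details. The idea is that the slave must present the master's run ID and an offset that still falls inside the master's replication backlog. A slave that has just restarted on 3.0.x has neither value cached (the "no cached master" message in the log), so the check fails before the backlog is even considered.

```python
def can_partial_resync(slave_known_runid, slave_offset,
                       master_runid, backlog_first_offset, backlog_histlen):
    """Simplified model of the partial resync decision (illustrative only).

    slave_known_runid / slave_offset: what the slave remembers about its
        master; None models a freshly restarted slave with no cached master.
    backlog_first_offset / backlog_histlen: the window of the replication
        stream the master still holds in its backlog.
    """
    # A slave with no cached master has nothing to resume from.
    if slave_known_runid is None or slave_offset is None:
        return False
    # The slave must have been replicating from this same master instance.
    if slave_known_runid != master_runid:
        return False
    # An empty backlog cannot serve any missed data.
    if backlog_histlen == 0:
        return False
    # The requested offset must still be covered by the backlog window.
    backlog_end = backlog_first_offset + backlog_histlen
    return backlog_first_offset <= slave_offset <= backlog_end


if __name__ == "__main__":
    # Run ID and offset taken from the slave log in the original post;
    # the backlog numbers are hypothetical.
    runid = "5b95126e990455d3170578a7e11fb6bcb299e161"
    print(can_partial_resync(None, None, runid, 1, 1048576))
    print(can_partial_resync(runid, 503298, runid, 1, 1048576))
```

In this model, changing timeout or backlog settings does not help the first case, because the slave has no state to present. That is consistent with the release notes above, which describe 4.0 restarting from an RDB file that carries the replication meta-data.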