psync scheduled to be closed ASAP for overcoming of output buffer limits

343 views
Skip to first unread message

Sargu Xcode

unread,
Jan 26, 2018, 4:15:48 AM1/26/18
to Redis DB
Hi All ,

I have a pair of redis replica . Lately due to a planned downtime from the cloud vendor , i had to switch the master-slave setup.
Earlier R1 as master and R2 as slave which had no problems in replications or whatsoever. After the switch where R2 became master
and R1 become slave am keeping facing the issue as above 'subject'. No other config change was done except telling R2 to become master
and R1 slave of R2. Below are the logs form R2 (current master). Not sure how to handle this as i am first time seeing this. Tried to google this , but getting multiple answers.
Appreciate your kind advice on this matter. Thank you

4627:C 26 Jan 17:03:41.752 * DB saved on disk

4627:C 26 Jan 17:03:41.971 * RDB: 245 MB of memory used by copy-on-write

3843:M 26 Jan 17:03:43.321 * Background saving terminated with success

3843:M 26 Jan 17:04:05.149 * Slave 172.31.12.207:6379 asks for synchronization

3843:M 26 Jan 17:04:05.149 * Unable to partial resync with slave 172.31.12.207:6379 for lack of backlog (Slave request was: 1002631585).

3843:M 26 Jan 17:04:05.149 * Starting BGSAVE for SYNC with target: disk

3843:M 26 Jan 17:04:05.451 * Background saving started by pid 4641

4641:C 26 Jan 17:06:06.150 * DB saved on disk

4641:C 26 Jan 17:06:06.395 * RDB: 244 MB of memory used by copy-on-write

3843:M 26 Jan 17:06:06.795 * Background saving terminated with success

3843:M 26 Jan 17:06:49.755 * Synchronization with slave 172.31.12.207:6379 succeeded

3843:M 26 Jan 17:07:07.012 * 10000 changes in 60 seconds. Saving...

3843:M 26 Jan 17:07:07.315 * Background saving started by pid 4662

3843:M 26 Jan 17:08:27.046 # Client id=830 addr=172.31.12.207:12856 fd=100 name= age=262 idle=1 flags=S db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=5384 omem=91413488 events=rw cmd=psync scheduled to be closed ASAP for overcoming of output buffer limits.

3843:M 26 Jan 17:08:27.145 # Connection with slave 172.31.12.207:6379 lost.

4662:C 26 Jan 17:09:08.018 * DB saved on disk

4662:C 26 Jan 17:09:08.235 * RDB: 237 MB of memory used by copy-on-write

3843:M 26 Jan 17:09:08.585 * Background saving terminated with success

3843:M 26 Jan 17:09:09.331 * Slave 172.31.12.207:6379 asks for synchronization

3843:M 26 Jan 17:09:09.331 * Unable to partial resync with slave 172.31.12.207:6379 for lack of backlog (Slave request was: 1093967422).

hva...@gmail.com

unread,
Jan 26, 2018, 10:51:32 AM1/26/18
to Redis DB
In the partial log you posted, I see a pattern:
  • Master saves snapshot to disk
  • Slave requests resync (it has fallen behind in replication)
  • Cannot do partial resync because the replication buffer (backlog) for that slave has overflowed
  • Master switches to full resync, saves to file, transfers file to slave, completes the sync with the slave
Shortly afterward, the Master saves a snapshot to disk again, and the above sequence repeats.

This suggests one of two things:
  1. The network connection between your master and your slave is poor, causing your slave to lose its connection often
  2. Your master server becomes unresponsive when it saves snapshots to disk, causing the slave to give up, disconnect and reconnect
It's not possible to know from the log you posted if the slave sync trouble always happens after a snapshot is saved.  You'll have to look through more of your master's logfile to see if it really is a consistent pattern.

If the slave does consistently disconnect when the master saves a snapshot, then you want to find out what aspect of the snapshot save is causing trouble.  Does the volume of data being written to disk fill up the network interface because the data is being saved to an NFS or SAN volume?  Does the volume of data being saved to disk saturate the server's disk i/o making the disk 100% busy and making the whole server unresponsive?  Does the extra 230/240MB memory consumed while saving push the server into an out-of-memory condition, making it unresponsive?

The Redis logfile on the slave can probably shed some light on why it's disconnecting from the master.

Your monitoring graphs that record the master and slave server's cpu consumption, memory consumption, disk busy %, disk i/o read and write volume, network i/o in and out volume (and other statistics) will be an enormous help in narrowing down the issue that's causing the resyncs and disconnects.
Reply all
Reply to author
Forward
0 new messages