It seems that setting "vm.overcommit_memory = 1" on slave1 fixed the issue.
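For anyone wanting to check which overcommit mode a box is currently using, here is a minimal sketch in Python. It assumes a Linux host where the sysctl is exposed at /proc/sys/vm/overcommit_memory; the function names and the wording of the descriptions are my own, not anything from Redis.

```python
from pathlib import Path

# Short summaries of the Linux vm.overcommit_memory modes.
# The wording is mine; see the kernel docs for the exact semantics.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default); a large fork() may be refused",
    1: "always overcommit; the kernel does not refuse allocations up front",
    2: "strict accounting; allocations beyond the commit limit are refused",
}


def describe_overcommit(raw: str) -> str:
    """Turn the raw sysctl contents (e.g. "1\n") into a description."""
    mode = int(raw.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def read_overcommit(path: str = "/proc/sys/vm/overcommit_memory") -> str:
    """Read the current mode from procfs (assumed Linux-only path)."""
    return describe_overcommit(Path(path).read_text())
```

Calling read_overcommit() on each host (slave1, slave0, master) shows whether the setting differs between them. The value itself is changed with sysctl, as in the line above.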
I am having this exact same issue trying to sync a slave to another slave of a master.
I will call them slave1 -> slave0 -> master to keep the explanation simple.
slave1 connects to slave0 and performs a SYNC. slave0 then starts a BGSAVE (the dataset is about 7 GB, with ~15 GB of free memory). When the BGSAVE finishes, slave0 starts transferring the dataset to slave1. When the transfer finishes, I get this error:
[2729] 06 Jun 21:49:19 # Write error or short write writing to the DB dump file needed for MASTER <-> SLAVE synchrnonization: Operation now in progress
And then the whole sync starts over again, in an endless loop.
I wonder what is going on. Both slave1 and slave0 have plenty of free memory and disk space.
The only problem I can think of is that the master currently has some sort of issue that prevents it from performing a BGSAVE. It always fails:
[root@ip-10-160-99-104 ~]# redis-cli bgsave
(error) ERR
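When BGSAVE only returns a bare error like this, the INFO command is one place to look for more detail. Below is a rough sketch in Python that parses the text printed by "redis-cli info" and keeps only the fields related to saving. The helper names are hypothetical, and which fields show up (for example bgsave_in_progress) depends on the Redis version, so treat the filter as an assumption.

```python
def parse_info(text: str) -> dict:
    """Parse the "key:value" lines of INFO output into a dict.

    Blank lines and "# Section" headers (printed by newer versions)
    are skipped.
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def save_related(fields: dict) -> dict:
    """Keep only fields whose name mentions "save" (a crude filter)."""
    return {k: v for k, v in fields.items() if "save" in k}
```

To use it, capture the output of "redis-cli info" on the master into a string and pass it through save_related(parse_info(output)).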
I don't know what is going on because the master is logging to stdout while running with 'daemonize yes', so I can't really analyze the logs.
Does anyone have a clue? I don't want to restart the master because that would cause me a lot of problems, so I wanted to add a second slave (slave1), promote it to master, and then remove the old master.
Thanks, any help would be appreciated.