I have two nodes running redis-server.
Node2 became master at:
[17181] 06 Apr 16:35:19.183 * MASTER MODE enabled (user request)
Node1 was started as a slave; it connected to the master, started syncing, and loaded the data:
[58496] 06 Apr 16:41:41.937 # Server started, Redis version 2.8.6
[58496] 06 Apr 16:41:41.937 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[58496] 06 Apr 16:42:17.229 * DB loaded from disk: 35.292 seconds
[58496] 06 Apr 16:42:17.229 * The server is now ready to accept connections on port 6379
[58496] 06 Apr 16:42:17.231 * SLAVE OF 169.254.1.3:6379 enabled (user request)
[58496] 06 Apr 16:42:18.251 * Connecting to MASTER 169.254.1.3:6379
[58496] 06 Apr 16:42:18.251 * MASTER <-> SLAVE sync started
[58496] 06 Apr 16:42:18.251 * Non blocking connect for SYNC fired the event.
[58496] 06 Apr 16:42:18.251 * Master replied to PING, replication can continue...
[58496] 06 Apr 16:42:18.251 * Partial resynchronization not possible (no cached master)
[58496] 06 Apr 16:42:18.251 * Full resync from master: 95792b8f9b5764018614e125681751ec956b28b3:198
[58496] 06 Apr 16:42:47.969 * MASTER <-> SLAVE sync: receiving 1027944791 bytes from master
[58496] 06 Apr 16:43:01.014 * MASTER <-> SLAVE sync: Flushing old data
[58496] 06 Apr 16:43:30.076 * MASTER <-> SLAVE sync: Loading DB in memory
[58496] 06 Apr 16:44:05.413 * MASTER <-> SLAVE sync: Finished with success
While Node1 was still loading the data into memory, the Node2 master was brought down:
[17181] 06 Apr 16:44:02.205 # User requested shutdown...
[17181] 06 Apr 16:44:02.205 * Removing the pid file.
[17181] 06 Apr 16:44:02.205 # Redis is now ready to exit, bye bye...
On Node1, redis-server never recovered. The errors below repeated continuously until redis-server was stopped:
[58496] 06 Apr 16:44:05.415 # Connection with master lost.
[58496] 06 Apr 16:44:05.415 * Caching the disconnected master state.
[58496] 06 Apr 16:44:05.925 * Connecting to MASTER 169.254.1.3:6379
[58496] 06 Apr 16:44:05.925 * MASTER <-> SLAVE sync started
[58496] 06 Apr 16:44:05.925 * Non blocking connect for SYNC fired the event.
[58496] 06 Apr 16:44:05.925 * Master replied to PING, replication can continue...
[58496] 06 Apr 16:44:10.927 * (Non critical) Master does not understand REPLCONF listening-port: -Reading from master: Connection timed out
[58496] 06 Apr 16:44:10.927 * Trying a partial resynchronization (request 95792b8f9b5764018614e125681751ec956b28b3:339).
[58496] 06 Apr 16:44:15.932 # Unexpected reply to PSYNC from master: -Reading from master: Connection timed out
[58496] 06 Apr 16:44:15.932 * Discarding previously cached master state.
[58496] 06 Apr 16:44:15.932 * Retrying with SYNC...
[58496] 06 Apr 16:44:15.934 # Bad protocol from MASTER, the first byte is not '$' (we received '+OK'), are you sure the host and port are right?
[58496] 06 Apr 16:44:16.851 * Connecting to MASTER 169.254.1.3:6379
[58496] 06 Apr 16:44:16.851 * MASTER <-> SLAVE sync started
[58496] 06 Apr 16:44:16.851 * Non blocking connect for SYNC fired the event.
[58496] 06 Apr 16:44:16.851 * Master replied to PING, replication can continue...
[58496] 06 Apr 16:44:21.856 * (Non critical) Master does not understand REPLCONF listening-port: -Reading from master: Connection timed out
[58496] 06 Apr 16:44:21.856 * Partial resynchronization not possible (no cached master)
[58496] 06 Apr 16:44:26.858 # Unexpected reply to PSYNC from master: -Reading from master: Connection timed out
[58496] 06 Apr 16:44:26.858 * Retrying with SYNC...
[58496] 06 Apr 16:44:26.860 # Bad protocol from MASTER, the first byte is not '$' (we received '+OK'), are you sure the host and port are right?
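For reference, the following Python sketch breaks one retry cycle into its steps, using only timestamps copied from the log above. The step labels and the helper name `seconds_between` are my own, chosen for illustration; the sketch does not talk to Redis at all.

```python
from datetime import datetime

def seconds_between(start, end):
    """Return the gap in seconds between two HH:MM:SS.mmm log timestamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return round(delta.total_seconds(), 3)

# (label, timestamp) pairs from the first retry cycle in the Node1 log.
events = [
    ("connect to master",       "16:44:05.925"),
    ("REPLCONF read timed out", "16:44:10.927"),
    ("PSYNC read timed out",    "16:44:15.932"),
    ("next connect attempt",    "16:44:16.851"),
]

# Duration of each step: time from one event to the next.
step_durations = [
    (events[i + 1][0], seconds_between(events[i][1], events[i + 1][1]))
    for i in range(len(events) - 1)
]

for label, secs in step_durations:
    print(f"{secs:6.3f} s until: {label}")
```

The REPLCONF and PSYNC steps each wait about 5 seconds before reporting "Connection timed out", and the next connect attempt follows just under a second later, so every cycle takes roughly 11 seconds and then repeats.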
When we tried to promote this redis-server to master with the command "SLAVEOF NO ONE", we got the error below:
MASTERDOWN Link with MASTER is down and slave-serve-stale-data is set to 'no'. for DB [0]
----------------------------------------------------------------
Note: The RDB file is 925 MB, which is why loading the DB into memory takes so long.
Please suggest how we can recover automatically from the behavior described above.
Thanks,
Amar