redis-server went into an error loop without accepting the master role assignment

Amar Nv

Apr 7, 2015, 11:11:06 AM
to redi...@googlegroups.com
I have 2 nodes running redis-server.

Node2 became master:

[17181] 06 Apr 16:35:19.183 * MASTER MODE enabled (user request)

Node1 was then started as a slave. It connected to the master and began syncing and loading data:

[58496] 06 Apr 16:41:41.937 # Server started, Redis version 2.8.6
[58496] 06 Apr 16:41:41.937 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
[58496] 06 Apr 16:42:17.229 * DB loaded from disk: 35.292 seconds
[58496] 06 Apr 16:42:17.229 * The server is now ready to accept connections on port 6379
[58496] 06 Apr 16:42:17.231 * SLAVE OF 169.254.1.3:6379 enabled (user request)
[58496] 06 Apr 16:42:18.251 * Connecting to MASTER 169.254.1.3:6379
[58496] 06 Apr 16:42:18.251 * MASTER <-> SLAVE sync started
[58496] 06 Apr 16:42:18.251 * Non blocking connect for SYNC fired the event.
[58496] 06 Apr 16:42:18.251 * Master replied to PING, replication can continue...
[58496] 06 Apr 16:42:18.251 * Partial resynchronization not possible (no cached master)
[58496] 06 Apr 16:42:18.251 * Full resync from master: 95792b8f9b5764018614e125681751ec956b28b3:198
[58496] 06 Apr 16:42:47.969 * MASTER <-> SLAVE sync: receiving 1027944791 bytes from master
[58496] 06 Apr 16:43:01.014 * MASTER <-> SLAVE sync: Flushing old data
[58496] 06 Apr 16:43:30.076 * MASTER <-> SLAVE sync: Loading DB in memory
[58496] 06 Apr 16:44:05.413 * MASTER <-> SLAVE sync: Finished with success

While Node1 was still loading the data into memory, the master on Node2 was brought down:

[17181] 06 Apr 16:44:02.205 # User requested shutdown...
[17181] 06 Apr 16:44:02.205 * Removing the pid file.
[17181] 06 Apr 16:44:02.205 # Redis is now ready to exit, bye bye...


On Node1, redis-server never recovered. I saw the errors below continuously until redis-server was stopped:

[58496] 06 Apr 16:44:05.415 # Connection with master lost.
[58496] 06 Apr 16:44:05.415 * Caching the disconnected master state.
[58496] 06 Apr 16:44:05.925 * Connecting to MASTER 169.254.1.3:6379

[58496] 06 Apr 16:44:05.925 * MASTER <-> SLAVE sync started
[58496] 06 Apr 16:44:05.925 * Non blocking connect for SYNC fired the event.
[58496] 06 Apr 16:44:05.925 * Master replied to PING, replication can continue...
[58496] 06 Apr 16:44:10.927 * (Non critical) Master does not understand REPLCONF listening-port: -Reading from master: Connection timed out
[58496] 06 Apr 16:44:10.927 * Trying a partial resynchronization (request 95792b8f9b5764018614e125681751ec956b28b3:339).
[58496] 06 Apr 16:44:15.932 # Unexpected reply to PSYNC from master: -Reading from master: Connection timed out
[58496] 06 Apr 16:44:15.932 * Discarding previously cached master state.
[58496] 06 Apr 16:44:15.932 * Retrying with SYNC...
[58496] 06 Apr 16:44:15.934 # Bad protocol from MASTER, the first byte is not '$' (we received '+OK'), are you sure the host and port are right?
[58496] 06 Apr 16:44:16.851 * Connecting to MASTER 169.254.1.3:6379
[58496] 06 Apr 16:44:16.851 * MASTER <-> SLAVE sync started
[58496] 06 Apr 16:44:16.851 * Non blocking connect for SYNC fired the event.
[58496] 06 Apr 16:44:16.851 * Master replied to PING, replication can continue...
[58496] 06 Apr 16:44:21.856 * (Non critical) Master does not understand REPLCONF listening-port: -Reading from master: Connection timed out
[58496] 06 Apr 16:44:21.856 * Partial resynchronization not possible (no cached master)
[58496] 06 Apr 16:44:26.858 # Unexpected reply to PSYNC from master: -Reading from master: Connection timed out
[58496] 06 Apr 16:44:26.858 * Retrying with SYNC...
[58496] 06 Apr 16:44:26.860 # Bad protocol from MASTER, the first byte is not '$' (we received '+OK'), are you sure the host and port are right?

When we tried to promote this redis-server to master with the command "SLAVEOF NO ONE", we got the error below:

MASTERDOWN Link with MASTER is down and slave-serve-stale-data is set to 'no'. for DB [0]

----------------------------------------------------------------

Note: The RDB file is 925MB, which is the reason for the delay in loading the DB into memory.

Please suggest how we can recover automatically from the behavior described above.

Thanks,
Amar




Josiah Carlson

Apr 7, 2015, 1:19:28 PM
to redi...@googlegroups.com
I see that you are currently running 2.8.6; you should upgrade to at least 2.8.19 just as a matter of course.

If you bring down the master without any caught-up slaves, the slave data is going to be incomplete. That puts the Redis slave in a state where it doesn't want to serve any of the data that it has, because it's incomplete. This is stopping you from disabling slaving, because Redis would serve bad data. If you *really* want to serve bad data, you can run the following 3 commands:

CONFIG SET slave-serve-stale-data yes
SLAVEOF NO ONE
CONFIG SET slave-serve-stale-data no

That will tell Redis to serve bad data, to become a master, and then to not serve bad data if it becomes a slave again. If possible in the future, try not to kill the master until the slave has caught up with the replication stream. Unless I am mistaken, that should prevent the error you were seeing before.
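If you want to script this, here is a rough sketch in Python. It assumes a client object shaped like the one from redis-py, with `config_set(name, value)`, and `slaveof()` called with no arguments meaning SLAVEOF NO ONE. The function names `slave_is_caught_up` and `force_promote` are made up for illustration. The check reads the `role`, `master_link_status` and `master_sync_in_progress` fields of INFO replication on the slave. Treat it as a starting point, not a tested failover tool.

```python
def slave_is_caught_up(info):
    """Return True if an INFO replication dict (as parsed by the client)
    describes a slave whose master link is up and that is not mid-sync.

    Missing fields are treated as "not caught up" to stay on the safe side.
    """
    return (
        info.get("role") == "slave"
        and info.get("master_link_status") == "up"
        and int(info.get("master_sync_in_progress", 1)) == 0
    )


def force_promote(client):
    """Promote a slave to master even though its data may be incomplete.

    Mirrors the three commands above. `client` is assumed to expose
    config_set() and slaveof() the way redis-py does.
    """
    # Allow the slave to serve whatever (possibly stale) data it holds.
    client.config_set("slave-serve-stale-data", "yes")
    try:
        # No arguments: equivalent to SLAVEOF NO ONE.
        client.slaveof()
    finally:
        # Restore the stricter setting for the next time this node is a slave.
        client.config_set("slave-serve-stale-data", "no")
```

Before stopping the master, you would call `slave_is_caught_up(client.info("replication"))` against the slave and only proceed when it returns True. The try/finally is my own addition so the stale-data setting is put back even if the promotion call raises.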

 - Josiah

