rs.stepDown() did not trigger immediate re-election after becoming secondary


Dmitry Mikhaylov

Dec 13, 2017, 4:15:04 PM
to mongodb-user
Hi!

I was doing maintenance on a 3-node MongoDB 3.4.7 replica set. As per the instructions, I did the maintenance one node at a time, starting with the secondaries. After bringing each node back up, I made sure it had caught up before proceeding to the next one. When the time came to update the current primary, I verified that both secondaries were caught up to within a second and happily issued the `rs.stepDown()` command on the primary, assuming it would trigger an instant re-election and things would go smoothly. Well, they did not. As later log analysis showed, after the current primary stepped down and became a secondary, all 3 nodes waited in the SECONDARY state for 10 seconds before starting a re-election. Of course this had dismal effects on the live system that was using this replica set.
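For reference, the whole procedure boiled down to something like this in the 3.4 shell (the lag check via rs.printSlaveReplicationInfo() is just one way to confirm the secondaries are caught up, shown here for illustration):

// after restarting each secondary, confirm from the shell that it has caught up:
rs.printSlaveReplicationInfo()   // both secondaries should report ~0-1 s behind the primary

// then, once both secondaries are current, on the primary:
rs.stepDown()                    // I expected an immediate re-election at this point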

Here are snippets from my mongodb logs:
2017-12-08 11:31:06.319 prod-mongo-empires2 I COMMAND conn4860 Attempting to step down in response to replSetStepDown command
2017-12-08 11:31:08.326 prod-mongo-empires2 I REPL conn4860 transition to SECONDARY
2017-12-08 11:31:08.336 prod-mongo-empires2 I - conn4775 AssertionException handling request, closing client connection: 172 Operation attempted on a closed transport Session.
... the whole shebang of those assertions and "end connection" messages

2017-12-08 11:31:09.574 prod-mongo-empires1 I REPL ReplicationExecutor Member prod-mongo-empires2:27017 is now in state SECONDARY
2017-12-08 11:31:10.010 prod-mongo-empires3 I REPL ReplicationExecutor Member prod-mongo-empires2:27017 is now in state SECONDARY
^^^^^^^^^^^^^^^^^ here all 3 nodes already become SECONDARY, yet...

2017-12-08 11:31:11.329 prod-mongo-empires3 I REPL rsBackgroundSync could not find member to sync from
2017-12-08 11:31:11.329 prod-mongo-empires3 I REPL replication-2 Choosing new sync source because our current sync source, prod-mongo-empires2:27017, has an OpTime ({ ts: Timestamp 1512725466000|194, t: 14 }) which is not ahead of ours ({ ts: Timestamp 1512725466000|194, t: 14 }), it does not have a sync source, and it's not the primary (sync source does not know the primary)
2017-12-08 11:31:11.329 prod-mongo-empires3 I REPL replication-2 Canceling oplog query due to OplogQueryMetadata. We have to choose a new sync source. Current source: prod-mongo-empires2:27017, OpTime { ts: Timestamp 1512725466000|194, t: 14 }, its sync source index:-1
2017-12-08 11:31:11.329 prod-mongo-empires3 W REPL rsBackgroundSync Fetcher stopped querying remote oplog with error: InvalidSyncSource: sync source prod-mongo-empires2:27017 (config version: 17; last applied optime: { ts: Timestamp 1512725466000|194, t: 14 }; sync source index: -1; primary index: -1) is no longer valid
2017-12-08 11:31:11.330 prod-mongo-empires1 I REPL replication-7 Choosing new sync source because our current sync source, prod-mongo-empires2:27017, has an OpTime ({ ts: Timestamp 1512725466000|194, t: 14 }) which is not ahead of ours ({ ts: Timestamp 1512725466000|194, t: 14 }), it does not have a sync source, and it's not the primary (sync source does not know the primary)
2017-12-08 11:31:11.331 prod-mongo-empires1 I REPL replication-7 Canceling oplog query due to OplogQueryMetadata. We have to choose a new sync source. Current source: prod-mongo-empires2:27017, OpTime { ts: Timestamp 1512725466000|194, t: 14 }, its sync source index:-1
2017-12-08 11:31:11.331 prod-mongo-empires1 W REPL rsBackgroundSync Fetcher stopped querying remote oplog with error: InvalidSyncSource: sync source prod-mongo-empires2:27017 (config version: 17; last applied optime: { ts: Timestamp 1512725466000|194, t: 14 }; sync source index: -1; primary index: -1) is no longer valid
2017-12-08 11:31:11.331 prod-mongo-empires1 I REPL rsBackgroundSync could not find member to sync from
2017-12-08 11:31:16.329 prod-mongo-empires3 I REPL SyncSourceFeedback SyncSourceFeedback error sending update to prod-mongo-empires2:27017: InvalidSyncSource: Sync source was cleared. Was prod-mongo-empires2:27017
2017-12-08 11:31:16.330 prod-mongo-empires1 I REPL SyncSourceFeedback SyncSourceFeedback error sending update to prod-mongo-empires2:27017: InvalidSyncSource: Sync source was cleared. Was prod-mongo-empires2:27017
2017-12-08 11:31:17.936 prod-mongo-empires1 I REPL ReplicationExecutor Starting an election, since we've seen no PRIMARY in the past 10000ms
^^^^^^^^^^^^^^^^^ ONLY HERE WE START RE-ELECTION!!!!!


The rest of the process seemed to go OK, but I post it here just in case:

2017-12-08 11:31:17.936 prod-mongo-empires1 I REPL ReplicationExecutor conducting a dry run election to see if we could be elected
2017-12-08 11:31:17.938 prod-mongo-empires1 I REPL ReplicationExecutor VoteRequester(term 14 dry run) received a yes vote from prod-mongo-empires3:27017; response message: { term: 14, voteGranted: true, reason: "", ok: 1.0 }
2017-12-08 11:31:17.938 prod-mongo-empires1 I REPL ReplicationExecutor dry election run succeeded, running for election
2017-12-08 11:31:17.941 prod-mongo-empires1 I REPL ReplicationExecutor VoteRequester(term 15) received a yes vote from prod-mongo-empires3:27017; response message: { term: 15, voteGranted: true, reason: "", ok: 1.0 }
2017-12-08 11:31:17.941 prod-mongo-empires1 I REPL ReplicationExecutor election succeeded, assuming primary role in term 15
2017-12-08 11:31:17.941 prod-mongo-empires1 I REPL ReplicationExecutor transition to PRIMARY
2017-12-08 11:31:17.941 prod-mongo-empires1 I REPL ReplicationExecutor Entering primary catch-up mode.
2017-12-08 11:31:17.941 prod-mongo-empires1 I ASIO NetworkInterfaceASIO-Replication-0 Connecting to prod-mongo-empires2:27017
2017-12-08 11:31:17.944 prod-mongo-empires1 I ASIO NetworkInterfaceASIO-Replication-0 Successfully connected to prod-mongo-empires2:27017, took 3ms (1 connections now open to prod-mongo-empires2:27017)
2017-12-08 11:31:17.945 prod-mongo-empires1 I REPL ReplicationExecutor Caught up to the latest optime known via heartbeats after becoming primary.
2017-12-08 11:31:17.945 prod-mongo-empires1 I REPL ReplicationExecutor Exited primary catch-up mode.
2017-12-08 11:31:18.329 prod-mongo-empires2 I REPL ReplicationExecutor Member prod-mongo-empires1:27017 is now in state PRIMARY
2017-12-08 11:31:19.328 prod-mongo-empires1 I REPL rsSync transition to primary complete; database writes are now permitted
2017-12-08 11:31:21.332 prod-mongo-empires3 I REPL ReplicationExecutor Member prod-mongo-empires1:27017 is now in state PRIMARY


The whole idea of having 3 nodes sitting there in the SECONDARY state and looking at each other with blank faces for 10 seconds is so ridiculous that I assume something went wrong. Or is it me who is wrong?

BR,
Dmitry

Kevin Adistambha

Dec 14, 2017, 8:20:17 PM
to mongodb-user

Hi Dmitry

This is the correct behaviour, since I don’t believe stepDown is special with regard to the replication election protocol. The command just steps down the primary. Following the election protocol, the rest of the set will then wait until electionTimeoutMillis (10 seconds by default) expires before electing a new primary. Depending on your use case, you could change this setting to a lower value during your maintenance window (see Replica Set Configuration) to achieve your desired behaviour. Please note that lowering the electionTimeoutMillis value (even during a maintenance window) will make the replica set more sensitive to transient network issues.
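For example, something along these lines in the mongo shell (the 2000 ms figure is purely illustrative; pick a value that makes sense for your network):

// run on the current primary before the maintenance window
cfg = rs.conf()
cfg.settings.electionTimeoutMillis = 2000    // illustrative value; the default is 10000
rs.reconfig(cfg)

// ... perform the rolling maintenance and rs.stepDown() ...

// afterwards, on whichever node is now primary, restore the default
cfg = rs.conf()
cfg.settings.electionTimeoutMillis = 10000
rs.reconfig(cfg)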

There is a feature request to make stepDown special and immediately call an election (SERVER-11086). Please watch/upvote the ticket for updates. You can also comment on the ticket describing your use case to better illustrate your needs.

Best regards
Kevin

Dmitry Mikhaylov

Jan 8, 2018, 3:41:24 AM
to mongodb-user
Hi, Kevin!

Thank you for the reply, and sorry for the delay. I've upvoted the issue you mentioned, and also added a suggestion there that this could be fixed simply by using a smaller timeout for the situation when all known members of the replica set are online and in the SECONDARY state. I'd be glad to provide more context if that can help raise the priority of the issue.

BR,
Dmitry

Weishan Ang

Jan 9, 2018, 4:43:30 AM
to mongodb-user

This blog article might be useful for you. I have used the steps in our production environment with less than a second of interruption.

Dmitry Mikhaylov

Jan 9, 2018, 5:03:37 AM
to mongodb-user
On Tuesday, January 9, 2018 at 11:43:30 AM UTC+2, Weishan Ang wrote:

This blog article might be useful for you. I have used the steps in our production environment with less than a second of interruption.


Thanks a lot!

BR,
Dmitry