ReplicaSet Stopped Working

cachedrive

Jul 25, 2016, 1:14:16 PM

to mongodb-user

I have a MongoDB cluster / replicaSet running in Amazon EC2. The VMs appear to have been shut down over the weekend due to a really bad situation, but regardless, I'm here now trying to fire them back up. They're all powered on again, but when I connect to a member and look at rs.status(), I get:

sam:OTHER> rs.status()
{
    "state" : 10,
    "stateStr" : "REMOVED",
    "uptime" : 772,
    "optime" : {
        "ts" : Timestamp(1466707271, 1),
        "t" : NumberLong(6)
    },
    "optimeDate" : ISODate("2016-06-23T18:41:11Z"),
    "ok" : 0,
    "errmsg" : "Our replica set config is invalid or we are not a member of it",
    "code" : 93
}

I've looked at several MongoDB docs and Google but can NOT find any logical fix or remedy for this problem. There doesn't appear to be an answer / fix for this sadly. Can you guys tell me what I need to do besides building a new cluster and restoring a known backup on that new replicaSet? That sounds like a lot of crap work.

cachedrive

Jul 25, 2016, 4:50:09 PM

to mongodb-user

I ended up backing up the database and destroying the cluster. I don't see any solutions and MongoDB help is always awful.

Senthil

Jul 26, 2016, 12:28:10 PM

to mongodb-user

Have you tried adding the removed replica set member back using rs.add()?

Wan Bachtiar

Aug 5, 2016, 2:14:15 AM

to mongodb-user

> I’ve looked at several MongoDB docs and Google but can NOT find any logical fix or remedy for this problem. There doesn’t appear to be an answer / fix for this sadly.

Hi Carlos,

Based on the output of your rs.status() and your description of what happened, it is likely that the node transitioned to REMOVED because its IP address and/or port number changed after the restart.

You may find traces of this in the mongod log, similar to the messages below:

 W NETWORK  [ReplicationExecutor] Failed to connect to "PREVIOUS_IP:PREVIOUS_PORT" after 5000 milliseconds, giving up.
 W REPL     [ReplicationExecutor] Locally stored replica set configuration does not have a valid entry for the current node; waiting for reconfig or remote heartbeat; Got "NodeNotFound: No host described in new configuration 2 for replica set "REPLSET_NAME" maps to this node" while validating { _id: "REPLSET_NAME", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "OTHER_NODE:OTHER_PORT", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "PREVIOUS_IP:PREVIOUS_PORT", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('57a412fa7dfedc759516ef72') } }
 I REPL     [ReplicationExecutor] New replica set config in use: { _id: "REPLSET_NAME", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "OTHER_NODE:OTHER_PORT", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "PREVIOUS_IP:PREVIOUS_PORT", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('57a412fa7dfedc759516ef72') } }
 I REPL     [ReplicationExecutor] This node is not a member of the config
 I REPL     [ReplicationExecutor] transition to REMOVED

These messages indicate that no host in the locally stored replica set config matched the node's own hostname. You could compare the node's previous IP address and port with its current settings.

The log format above is from MongoDB v3.2.8, the current stable version.
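As a rough illustration of that comparison, here is a minimal sketch that looks for a member entry matching a given host string. The helper name membersMatchingHost and the sample config are hypothetical, and the values are the placeholders from the log above. It uses a plain string comparison, which is a simplification: mongod itself resolves addresses when deciding whether a config entry refers to the current node.

```javascript
// Hypothetical helper: return the members whose "host" string equals selfHost.
// Assumption: exact string match only; no DNS resolution is attempted.
function membersMatchingHost(cfg, selfHost) {
  return cfg.members.filter(function (m) {
    return m.host === selfHost;
  });
}

// Sample config shaped like the one in the log messages (placeholder values).
var sampleCfg = {
  _id: "REPLSET_NAME",
  version: 2,
  members: [
    { _id: 0, host: "OTHER_NODE:OTHER_PORT" },
    { _id: 1, host: "PREVIOUS_IP:PREVIOUS_PORT" }
  ]
};

// The node's current address is not listed, so nothing matches.
var matches = membersMatchingHost(sampleCfg, "NEW_IP:NEW_PORT");

// The old address is still listed under member _id 1.
var listed = membersMatchingHost(sampleCfg, "PREVIOUS_IP:PREVIOUS_PORT");
```

In the mongo shell, the same helper could be given the output of rs.conf() instead of the sample object. An empty result for the node's current host:port would be consistent with the "not a member of the config" message.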

> Can you guys tell me what I need to do besides building a new cluster and restoring a known backup on that new replicaSet?

You can restore the replica set by correcting the node's host entry with rs.reconfig(). You may have to use the force option if there is no PRIMARY node in the replica set. For example, to fix the host information of a node:

cfg = rs.conf();
cfg.members[1].host = "NEW_IP:NEW_PORT";
rs.reconfig(cfg);
// If the set has no PRIMARY, force it instead: rs.reconfig(cfg, {force: true});

See rs.reconfig() examples for more usage examples.
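The host fix can also be sketched as a small helper that selects the member by its _id rather than by array position, since the index into members does not have to equal the member's _id. The name setMemberHost and the sample config are hypothetical; rs.conf() and rs.reconfig(cfg, {force: true}) are the shell helpers referred to above and appear here only in comments, so the block runs outside the shell too.

```javascript
// Hypothetical helper: set the host of the member with the given _id.
// Modifies cfg in place, as is usual with the object returned by rs.conf().
function setMemberHost(cfg, memberId, newHost) {
  var found = false;
  cfg.members.forEach(function (m) {
    if (m._id === memberId) {
      m.host = newHost;
      found = true;
    }
  });
  if (!found) {
    throw new Error("no member with _id " + memberId);
  }
  return cfg;
}

// Sample config with placeholder values standing in for rs.conf() output.
var exampleCfg = {
  _id: "REPLSET_NAME",
  version: 2,
  members: [
    { _id: 0, host: "OTHER_NODE:OTHER_PORT" },
    { _id: 1, host: "PREVIOUS_IP:PREVIOUS_PORT" }
  ]
};

setMemberHost(exampleCfg, 1, "NEW_IP:NEW_PORT");

// In the mongo shell the equivalent steps would be roughly:
//   var cfg = rs.conf();
//   setMemberHost(cfg, 1, "NEW_IP:NEW_PORT");
//   rs.reconfig(cfg);                   // when a PRIMARY is available
//   rs.reconfig(cfg, { force: true });  // only when there is no PRIMARY
```

Throwing on an unknown _id is a deliberate choice in this sketch, so that a typo does not lead to reconfiguring with an unchanged config.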

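It is worth noting that deployments generally use DNS names (for example, CNAME records) rather than raw IP addresses in the replica set configuration, so that a change of IP address does not invalidate the config. I would also recommend the following resources: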

You may also be interested in enrolling in a free online course at MongoDB University: M102: MongoDB for DBAs or M202: MongoDB Advanced Deployment and Operations. A new session started a couple of days ago, and it is not too late to join.