I’ve looked at several MongoDB docs and Google but can NOT find any logical fix or remedy for this problem. There doesn’t appear to be an answer / fix for this sadly.
Hi Carlos,
Based on the output of your rs.status() and the description of what happened, it is likely that the node transitioned to REMOVED
due to a change of IP address and/or port number after a restart.
You may find some traces in the database log messages, similar to the logs below:
W NETWORK [ReplicationExecutor] Failed to connect to "PREVIOUS_IP:PREVIOUS_PORT" after 5000 milliseconds, giving up.
W REPL [ReplicationExecutor] Locally stored replica set configuration does not have a valid entry for the current node; waiting for reconfig or remote heartbeat; Got "NodeNotFound: No host described in new configuration 2 for replica set "REPLSET_NAME" maps to this node" while validating { _id: "REPLSET_NAME", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "OTHER_NODE:OTHER_PORT", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "PREVIOUS_IP:PREVIOUS_PORT", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('57a412fa7dfedc759516ef72') } }
I REPL [ReplicationExecutor] New replica set config in use: { _id: "REPLSET_NAME", version: 2, protocolVersion: 1, members: [ { _id: 0, host: "OTHER_NODE:OTHER_PORT", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "PREVIOUS_IP:PREVIOUS_PORT", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('57a412fa7dfedc759516ef72') } }
I REPL [ReplicationExecutor] This node is not a member of the config
I REPL [ReplicationExecutor] transition to REMOVED
This indicates that no host in the locally stored replica set configuration matched the node's own hostname. You can compare the node's previous IP address and port to its current settings.
The log snippet format above is based on MongoDB v3.2.8, the current stable version.
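To confirm whether this is the case, you can compare the host:port the node reports for itself against the member hosts in its locally stored configuration. A quick check in the mongo shell (run against the affected node):

```javascript
// Hosts stored in the local replica set configuration:
rs.conf().members.forEach(function (m) {
    print(m._id + ": " + m.host);
});

// The host:port this node currently resolves for itself:
db.isMaster().me
```

If the value of `db.isMaster().me` does not appear in the members list, the node cannot find itself in the configuration and will transition to REMOVED.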
Can you guys tell me what I need to do besides building a new cluster and restoring a known backup on that new replicaSet?
You can restore the replica set by reconfiguring the host of the node using rs.reconfig(). You may have to use the force
option if there is no PRIMARY
node in the replica set. For example, to fix the host information of a node:
cfg = rs.conf();
cfg.members[1].host = "NEW_IP:NEW_PORT";
rs.reconfig(cfg);
See the rs.reconfig() documentation for more usage examples.
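If there is no PRIMARY available to accept the reconfiguration, the same change can be forced from a surviving SECONDARY. A sketch, assuming member index 1 is the affected node (adjust the index to match your configuration):

```javascript
// Run in the mongo shell on a surviving SECONDARY.
cfg = rs.conf();
cfg.members[1].host = "NEW_IP:NEW_PORT";

// force: true allows reconfiguration without a PRIMARY.
// Use it with care: a forced reconfig can cause a rollback
// of committed writes in some scenarios.
rs.reconfig(cfg, { force: true });
```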
It is worth noting that deployments generally use CNAME records (hostnames) rather than raw IP addresses in the replica set configuration, to avoid this class of problem when IP addresses change. I would also recommend the following resources:
You may also be interested in enrolling in a free online course at MongoDB University: M102: MongoDB for DBAs or M202: MongoDB Advanced Deployment and Operations. A new session started just a couple of days ago, and it is not too late to join.
Kind regards,
Wan.