Restore process from MongoDB snapshots to a new cluster not working


Sain

Apr 6, 2018, 7:22:27 AM
to mongodb-dev

Hi,

I am restoring a sharded MongoDB cluster from snapshots of a different MongoDB cluster.


I am using MongoDB 3.4.7 on CentOS 7.


Steps followed:

a.) Stop the mongos instances
b.) Stop the CSRS servers
c.) Restore data from snapshots
d.) Update the shards collection in the config db with the new shard hostnames
e.) Remove the mongod.lock file, remove the WiredTiger.backup file, and start the CSRS servers
f.) Start the mongos servers
g.) Run rs.reconfig(), since the new cluster has new hostnames and the snapshots contain the old cluster's metadata
h.) Stop the replica set members of both shards
i.) Restore data from snapshots on all shard replica set members
j.) Start all shard replica set members, without shardsvr=true at first
k.) Clear the per-shard recovery information in the system.version collection of the admin db
l.) Update the config server connection string in the system.version collection of the admin db
m.) Restart all shard replica set members with shardsvr=true
n.) Run rs.reconfig() again, since the new cluster has new hostnames and the snapshots contain the old cluster's metadata
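Steps d, k, and l above could be sketched as mongo shell commands roughly like the following. This is a hedged sketch, not a verified procedure: the hostnames, ports, and shard/replica-set names are placeholders, and the document _ids ("minOpTimeRecovery", "shardIdentity") are what MongoDB 3.4 uses in admin.system.version as far as I know.

```javascript
// Hypothetical sketch of steps d, k, and l. All hostnames are placeholders.

// d.) On the restored CSRS primary: point config.shards at the new shard hosts.
db.getSiblingDB("config").shards.updateOne(
    { _id: "shard01" },
    { $set: { host: "shard01/new-host-1:27001,new-host-2:27001" } }
);

// k.) On each shard member (started without --shardsvr): remove the per-shard
//     recovery document so the shard does not try to reach the old CSRS.
db.getSiblingDB("admin").system.version.deleteOne({ _id: "minOpTimeRecovery" });

// l.) Point the shardIdentity document at the new CSRS connection string.
db.getSiblingDB("admin").system.version.updateOne(
    { _id: "shardIdentity" },
    { $set: { configsvrConnectionString:
        "config_replication/new-cfg-1:27005,new-cfg-2:27005,new-cfg-3:27005" } }
);
```

These commands must run against the live restored nodes, so they cannot be exercised outside the cluster.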



RESULTS
===========
Only one shard recovered fine.


The other shard shows errors, and the config replica set it logs still has the snapshot's old IP addresses.

The new CSRS was started with configsvr=true and is running fine, so why is this shard not able to connect to the CSRS?


2018-04-06T04:38:14.581+0000 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory '/data/diagnostic.data'

2018-04-06T04:38:14.583+0000 I NETWORK  [thread1] waiting for connections on port 27002

2018-04-06T04:38:14.583+0000 E REPL     [replExecDBWorker-0] Locally stored replica set configuration is invalid; See http://www.mongodb.org/dochub/core/recover-replica-set-from-invalid-config for information on how to recover from this. Got "BadValue: Nodes being used for config servers must be started with the --configsvr flag" while validating { _id: "config_replication", version: 8, configsvr: true, protocolVersion: 1, members: [ { _id: 0, host: "172.31.1.217:27005", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 2.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 1, host: "172.31.7.78:27005", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 }, { _id: 2, host: "172.31.9.244:27005", arbiterOnly: false, buildIndexes: true, hidden: false, priority: 1.0, tags: {}, slaveDelay: 0, votes: 1 } ], settings: { chainingAllowed: true, heartbeatIntervalMillis: 2000, heartbeatTimeoutSecs: 10, electionTimeoutMillis: 10000, catchUpTimeoutMillis: 60000, getLastErrorModes: {}, getLastErrorDefaults: { w: 1, wtimeout: 0 }, replicaSetId: ObjectId('5ab4f4bbd1839cffc5aae980') } }

2018-04-06T04:38:14.583+0000 I -        [replExecDBWorker-0] Fatal Assertion 28544 at src/mongo/db/repl/replication_coordinator_impl.cpp 515

2018-04-06T04:38:14.583+0000 I -        [replExecDBWorker-0] 


***aborting after fassert() failure
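The "Nodes being used for config servers must be started with the --configsvr flag" error above means this node's locally stored replica set configuration (kept in local.system.replset) has configsvr: true, i.e. its data files look like a CSRS snapshot rather than a shard snapshot. A hedged way to confirm this (hostnames and options are assumptions) is to restart the failing member standalone and inspect that document:

```javascript
// Hypothetical check: run after restarting the failing member WITHOUT
// --replSet and WITHOUT --shardsvr, so mongod does not validate the config.
var cfg = db.getSiblingDB("local").system.replset.findOne();
printjson(cfg);
// If cfg.configsvr is true on a node meant to be a shard member, the data
// files were restored from the wrong (CSRS) snapshot.
```

This inspection only works against the restored node itself, so it is not runnable standalone.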




Mongos is also not able to start; its logs are below. From where does it get the HostAndPort details of the shards? The shards collection of the config db was updated fine with the new shard hostnames.


2018-04-06T05:53:46.820+0000 I SHARDING [shard registry reload] Periodic reload of shard registry failed  :: caused by :: 9 Empty host component parsing HostAndPort from ""; will retry after 30s

2018-04-06T05:53:46.822+0000 I NETWORK  [ReplicaSetMonitor-TaskExecutor-0] Successfully connected to 172.31.45.190:27005 (1 connections now open to 172.31.45.190:27005 with a 5 second timeout)

2018-04-06T05:53:46.823+0000 W SHARDING [replSetDistLockPinger] pinging failed for distributed lock pinger :: caused by :: LockStateChangeFailed: findAndModify query predicate didn't match any lock document
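To the question of where mongos gets the shards' HostAndPort details: mongos builds its shard registry from the config.shards collection on the CSRS, so the "Empty host component parsing HostAndPort from \"\"" error above suggests a shard document whose host field is empty or malformed. A hedged sanity check (the expected "rsName/host:port,host:port" format is an assumption about how the host field was edited) could look like:

```javascript
// Hypothetical check on the CSRS: flag shard documents whose host field is
// empty or not of the form "<rsName>/host1:port,host2:port,...".
db.getSiblingDB("config").shards.find().forEach(function (s) {
    if (!s.host || s.host.split("/").length !== 2) {
        print("suspect shard document: " + tojson(s));
    }
});
```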




Thanks in advance

Sain

Sain

Apr 6, 2018, 7:26:37 AM
to mongodb-dev
Update
All shards are working fine now.
I was using the wrong snapshot for the other shard.
But the mongos query router is still stuck with the issues above.
Early help will be appreciated.