2016-04-25T21:46:07.346+0500 ***** SERVER RESTARTED *****
2016-04-25T21:46:07.359+0500 [DataFileSync] BackgroundJob starting: DataFileSync
2016-04-25T21:46:07.359+0500 [DataFileSync] flushing diag log
2016-04-25T21:46:07.359+0500 shardKeyTest passed
2016-04-25T21:46:07.359+0500 isInRangeTest passed
2016-04-25T21:46:07.359+0500 shardObjTest passed
2016-04-25T21:46:07.359+0500 [initandlisten] MongoDB starting : pid=9972 port=27017 dbpath=/var/lib/mongodb 64-bit host=russia
2016-04-25T21:46:07.359+0500 [initandlisten] db version v2.6.12
2016-04-25T21:46:07.359+0500 [initandlisten] git version: d73c92b1c85703828b55c2916a5dd4ad46535f6a
2016-04-25T21:46:07.359+0500 [initandlisten] build info: Linux build5.ny.cbi.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2016-04-25T21:46:07.359+0500 [initandlisten] allocator: tcmalloc
2016-04-25T21:46:07.359+0500 [initandlisten] options: { config: "/etc/mongod.conf", diaglog: 7, net: { port: 27017 }, replication: { replSet: "unistore-1" }, storage: { dbPath: "/var/lib/mongodb" }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log", verbosity: 1 } }
2016-04-25T21:46:07.360+0500 [initandlisten] flushing directory /var/lib/mongodb
2016-04-25T21:46:07.388+0500 [initandlisten] journal dir=/var/lib/mongodb/journal
2016-04-25T21:46:07.388+0500 [initandlisten] recover : no journal files present, no recovery needed
2016-04-25T21:46:07.389+0500 [initandlisten] flushing directory /var/lib/mongodb/journal
2016-04-25T21:46:07.390+0500 [initandlisten] flushing directory /var/lib/mongodb/journal
2016-04-25T21:46:07.390+0500 [initandlisten] enter repairDatabases (to check pdfile version #)
2016-04-25T21:46:07.397+0500 [initandlisten] grid_fs
2016-04-25T21:46:07.397+0500 [initandlisten] opening db: grid_fs
2016-04-25T21:46:07.731+0500 [initandlisten] local
2016-04-25T21:46:07.731+0500 [initandlisten] opening db: local
2016-04-25T21:46:07.739+0500 [initandlisten] done repairDatabases
2016-04-25T21:46:07.739+0500 [initandlisten] opening db: admin
2016-04-25T21:46:07.746+0500 [initandlisten] query admin.system.roles planSummary: EOF ntoreturn:0 ntoskip:0 nscanned:0 nscannedObjects:0 keyUpdates:0 numYields:0 locks(micros) W:6541 r:61 nreturned:0 reslen:20 0ms
2016-04-25T21:46:07.746+0500 [ClientCursorMonitor] BackgroundJob starting: ClientCursorMonitor
2016-04-25T21:46:07.746+0500 [PeriodicTaskRunner] BackgroundJob starting: PeriodicTaskRunner
2016-04-25T21:46:07.746+0500 [TTLMonitor] BackgroundJob starting: TTLMonitor
2016-04-25T21:46:07.746+0500 [IndexRebuilder] BackgroundJob starting: IndexRebuilder
2016-04-25T21:46:07.747+0500 [initandlisten] opening db: local
2016-04-25T21:46:07.755+0500 [initandlisten] create collection local.startup_log { size: 10485760, capped: true }
2016-04-25T21:46:07.755+0500 [initandlisten] command local.$cmd command: create { create: "startup_log", size: 10485760, capped: true } ntoreturn:1 keyUpdates:0 numYields:0 reslen:75 7ms
2016-04-25T21:46:07.755+0500 [initandlisten] insert local.startup_log ninserted:1 keyUpdates:0 numYields:0 0ms
2016-04-25T21:46:07.755+0500 [IndexRebuilder] opening db: grid_fs
2016-04-25T21:46:07.756+0500 [rsStart] replSet beginning startup...
2016-04-25T21:46:07.756+0500 [rsStart] loadConfig() local.system.replset
2016-04-25T21:46:07.907+0500 [rsStart] query local.system.replset planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:276 nreturned:1 reslen:260 0ms
2016-04-25T21:46:07.907+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:46:07.911+0500 [rsStart] getMyAddrs(): [127.0.0.1] [IP.IP.IP.107] [IP.IP.IP.1] [10.10.10.100] [192.168.0.15] [10.1.0.1] [::1] [fd19:6c87:964:0:230:48ff:fed5:eccc] [fe80::230:48ff:fed5:eccc%br0] [fe80::ecb9:12ff:fe84:632b%lxcbr0]
2016-04-25T21:46:07.911+0500 [rsStart] getallIPs("IP.IP.IP.106"): [IP.IP.IP.106]
2016-04-25T21:46:07.911+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:46:07.912+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:46:07.913+0500 [rsStart] replSet REMOVED
2016-04-25T21:46:07.913+0500 [rsStart] replSet info self not present in the repl set configuration:
2016-04-25T21:46:07.913+0500 [rsStart] { _id: "unistore-1", version: 140, members: [ { _id: 18, host: "IP.IP.IP.73:30000", arbiterOnly: true }, { _id: 2, host: "IP.IP.IP.106:27017", priority: 2.0 }, { _id: 27, host: "79.110.251.109:37017", arbiterOnly: true } ] }
2016-04-25T21:46:07.913+0500 [rsStart] loadConfig() local.system.replset
2016-04-25T21:46:07.915+0500 [rsStart] query local.system.replset planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:107 nreturned:1 reslen:260 0ms
2016-04-25T21:46:07.915+0500 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.
2016-04-25T21:46:07.916+0500 [IndexRebuilder] checking complete
2016-04-25T21:46:27.916+0500 [rsStart] query local.system.replset planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:107 nreturned:1 reslen:260 0ms
2016-04-25T21:46:27.916+0500 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.
2016-04-25T21:46:47.916+0500 [rsStart] query local.system.replset planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:85 nreturned:1 reslen:260 0ms
2016-04-25T21:46:47.916+0500 [rsStart] trying to contact IP.IP.IP.106:27017
2016-04-25T21:46:47.916+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:46:48.070+0500 [rsStart] trying to contact IP.IP.IP.109:37017
2016-04-25T21:46:48.070+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:46:48.071+0500 [rsStart] trying to contact IP.IP.IP.73:30000
2016-04-25T21:46:48.072+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:46:48.280+0500 [rsStart] getMyAddrs(): [127.0.0.1] [IP.IP.IP.107] [IP.IP.IP.1] [10.10.10.100] [192.168.0.15] [10.1.0.1] [::1] [fd19:6c87:964:0:230:48ff:fed5:eccc] [fe80::230:48ff:fed5:eccc%br0] [fe80::ecb9:12ff:fe84:632b%lxcbr0]
2016-04-25T21:46:48.281+0500 [rsStart] getallIPs("IP.IP.IP.107"): [IP.IP.IP.107]
2016-04-25T21:46:48.281+0500 [rsStart] replSet I am IP.IP.IP.107:27017
2016-04-25T21:46:48.281+0500 [rsHealthPoll] BackgroundJob starting: rsHealthPoll
2016-04-25T21:46:48.281+0500 [rsHealthPoll] BackgroundJob starting: rsHealthPoll
2016-04-25T21:46:48.281+0500 [rsStart] replSet got config version 141 from a remote, saving locally
2016-04-25T21:46:48.281+0500 [rsHealthPoll] BackgroundJob starting: rsHealthPoll
2016-04-25T21:46:48.281+0500 [rsStart] replSet info saving a newer config version to local.system.replset: { _id: "unistore-1", version: 141, members: [ { _id: 18, host: "IP.IP.IP.73:30000", arbiterOnly: true }, { _id: 2, host: "IP.IP.IP.106:27017", priority: 2.0 }, { _id: 27, host: "IP.IP.IP.109:37017", arbiterOnly: true }, { _id: 21, host: "IP.IP.IP.107:27017", priority: 10.0 } ] }
2016-04-25T21:46:48.281+0500 [rsStart] replSet saveConfigLocally done
2016-04-25T21:46:48.281+0500 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.
2016-04-25T21:47:07.370+0500 [DataFileSync] flushing mmaps took 11ms for 4933 files
2016-04-25T21:47:07.370+0500 [DataFileSync] flushing diag log
2016-04-25T21:47:07.747+0500 [TTLMonitor] query admin.system.indexes query: { expireAfterSeconds: { $exists: true } } planSummary: EOF ntoreturn:0 ntoskip:0 nscanned:0 nscannedObjects:0 keyUpdates:0 numYields:0 locks(micros) r:62 nreturned:0 reslen:20 0ms
2016-04-25T21:47:07.747+0500 [TTLMonitor] query grid_fs.system.indexes query: { expireAfterSeconds: { $exists: true } } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:10 nscannedObjects:10 keyUpdates:0 numYields:0 locks(micros) r:97 nreturned:0 reslen:20 0ms
2016-04-25T21:47:07.747+0500 [TTLMonitor] query local.system.indexes query: { expireAfterSeconds: { $exists: true } } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:3 nscannedObjects:3 keyUpdates:0 numYields:0 locks(micros) r:38 nreturned:0 reslen:20 0ms
2016-04-25T21:47:07.748+0500 [clientcursormon] mem (MB) res:74 virt:20163493
2016-04-25T21:47:07.748+0500 [clientcursormon] mapped (incl journal view):20163016
2016-04-25T21:47:08.281+0500 [rsStart] query local.system.replset planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:102 nreturned:1 reslen:326 0ms
2016-04-25T21:47:08.281+0500 [rsStart] trying to contact IP.IP.IP.106:27017
2016-04-25T21:47:08.282+0500 [rsStart] trying to contact IP.IP.IP.109:37017
2016-04-25T21:47:08.283+0500 [rsStart] trying to contact IP.IP.IP.73:30000
2016-04-25T21:47:08.496+0500 [rsStart] replSet I am IP.IP.IP.107:27017
2016-04-25T21:47:08.498+0500 [rsHealthPoll] BackgroundJob starting: rsHealthPoll
2016-04-25T21:47:08.498+0500 [rsHealthPoll] BackgroundJob starting: rsHealthPoll
2016-04-25T21:47:08.498+0500 [rsHealthPoll] BackgroundJob starting: rsHealthPoll
2016-04-25T21:47:08.498+0500 [rsStart] replSet STARTUP2
2016-04-25T21:47:08.498+0500 [rsMgr] BackgroundJob starting: rsMgr
2016-04-25T21:47:08.498+0500 [rsHealthPoll] replSet member IP.IP.IP.106:27017 is up
2016-04-25T21:47:08.498+0500 [rsMgr] replSet total number of votes is even - add arbiter or give one member an extra vote
2016-04-25T21:47:08.498+0500 [rsGhostSync] BackgroundJob starting: rsGhostSync
2016-04-25T21:47:08.498+0500 [rsSync] replSet initial sync pending
2016-04-25T21:47:08.498+0500 [SyncSourceFeedbackThread] BackgroundJob starting: SyncSourceFeedbackThread
2016-04-25T21:47:08.499+0500 [rsSync] replSet initial sync need a member to be primary or secondary to do our initial sync
2016-04-25T21:47:08.499+0500 [rsHealthPoll] replSet member IP.IP.IP.106:27017 is now in state PRIMARY
2016-04-25T21:47:08.500+0500 [rsHealthPoll] replSet member IP.IP.IP.73:30000 is up
2016-04-25T21:47:08.500+0500 [rsHealthPoll] replSet member IP.IP.IP.73:30000 is now in state ARBITER
2016-04-25T21:47:10.498+0500 [rsHealthPoll] replSet member IP.IP.IP.109:37017 is up
2016-04-25T21:47:10.498+0500 [rsHealthPoll] replSet member IP.IP.IP.109:37017 is now in state ARBITER
2016-04-25T21:47:10.499+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:47:24.499+0500 [rsSync] replSet initial sync pending
2016-04-25T21:47:24.499+0500 [rsSync] replSet syncing to: IP.IP.IP.106:27017
2016-04-25T21:47:24.499+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:47:24.500+0500 [rsSync] Database::_addNamespaceToCatalog ns: local.replset.minvalid
2016-04-25T21:47:24.500+0500 [rsSync] ExtentManager::increaseStorageSize ns:local.replset.minvalid desiredSize:8192 fromFreeList: 0 eloc: 0:239d000
2016-04-25T21:47:24.501+0500 [rsSync] Database::_addNamespaceToCatalog ns: local.replset.minvalid.$_id_
2016-04-25T21:47:24.501+0500 [rsSync] build index on: local.replset.minvalid properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "local.replset.minvalid" }
2016-04-25T21:47:24.501+0500 [rsSync] local.replset.minvalid: clearing plan cache - collection info cache reset
2016-04-25T21:47:24.501+0500 [rsSync] allocating new extent
2016-04-25T21:47:24.501+0500 [rsSync] ExtentManager::increaseStorageSize ns:local.replset.minvalid.$_id_ desiredSize:131072 fromFreeList: 0 eloc: 0:239f000
2016-04-25T21:47:24.501+0500 [rsSync] added index to empty collection
2016-04-25T21:47:24.501+0500 [rsSync] local.replset.minvalid: clearing plan cache - collection info cache reset
2016-04-25T21:47:24.501+0500 [rsSync] replSet initial sync drop all databases
2016-04-25T21:47:24.508+0500 [rsSync] dropAllDatabasesExceptLocal 2
2016-04-25T21:47:24.508+0500 [rsSync] dropDatabase grid_fs
2016-04-25T21:47:24.524+0500 [rsSync] lsn set 40381
2016-04-25T21:47:24.541+0500 [rsSync] removeJournalFiles
2016-04-25T21:47:24.542+0500 [rsSync] flushing directory /var/lib/mongodb/journal
2016-04-25T21:47:24.543+0500 [rsSync] removeJournalFiles end
2016-04-25T21:47:26.501+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:47:27.121+0500 [ConnectBG] BackgroundJob starting: ConnectBG
What version and topology?
Hi Metikov
What is the current state of your deployment? Are you still having corruption issues?
Once upon a time we noticed data corruption
Could you provide some context on the nature of the corruption, e.g. log messages, errors, was it due to hardware failure, etc?
after restoring a new member with the mongorestore tool
Did I understand correctly that you created a new server, used mongorestore to restore a previous dump (which was created using mongodump --repair on an existing Secondary), and added this new server to the replica set using rs.add()?
If your deployment is a GridFS storage, you can verify the md5 hash of every file in GridFS by using the filemd5 command and compare it with a known md5 hash of that file, which is stored in the fs.files collection:
> db.fs.files.find()
{
"_id": ObjectId("57282bb698e914b1a826601d"),
"chunkSize": 261120,
"uploadDate": ISODate("2016-05-03T04:40:22.837Z"),
"length": 25,
"md5": "6395d98c719565ee540354e5f971383a",
"filename": "test.txt"
}
If there is corruption on the data in GridFS (i.e. in the fs.chunks collection), the md5 value will be different:
> db.runCommand({filemd5: ObjectId("57282bb698e914b1a826601d"), root: "fs"})
{
"numChunks": 1,
"md5": "db3d76e3cd3bf3a5819efd1316e68657",
"ok": 1
}
In the example above, the md5 values between the fs.files collection and the result of the filemd5 command differ for the file test.txt, identified in GridFS by ObjectId("57282bb698e914b1a826601d"), which indicates that there is corruption in the GridFS storage of that file.
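As an illustration, here is a minimal mongo shell sketch of that whole check (assuming the default "fs" bucket, as in the examples above): it loops over every document in fs.files, recomputes the md5 from the stored chunks with the filemd5 command, and prints any mismatch:
db.fs.files.find().forEach(function (file) {
    // recompute the md5 of this file from its chunks in fs.chunks
    var res = db.runCommand({ filemd5: file._id, root: "fs" });
    if (!res.ok || res.md5 !== file.md5) {
        // stored and recomputed hashes disagree: possible corruption
        print("md5 mismatch: " + file.filename + " (" + file._id + ")");
    }
});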
it wants to make an initial sync, dropping all datafiles.
How to avoid this behavior?
The default behaviour for a Secondary’s initial sync step is to drop all databases first. This is because a Secondary is supposed to act as a high-availability failover server that could be called on at any time to replace the function of the Primary. Therefore, it should be as similar to the Primary as possible.
If I understand correctly, your replica set crashed due to data corruption, and you would like to restore the replica set using a good dump. In this case, you wanted to follow the instructions in the Restore a Replica Set from MongoDB Backups page (i.e. create a new replica set using the dumped data). If you restore the data into a new server and then attach that restored server into an existing replica set, the first thing it will do is to perform initial sync with the set’s Primary (which you are seeing).
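As a rough sketch of that procedure in the mongo shell (the hostnames below are placeholders taken from your posted config, not a prescription): after restoring the dump into a clean dbpath and starting that mongod with --replSet unistore-1, you would initiate a brand-new set on it and then add the other members back:
// initiate a new single-member replica set on the restored server
rs.initiate({ _id: "unistore-1", members: [ { _id: 0, host: "IP.IP.IP.107:27017" } ] })
// once this member becomes PRIMARY, add the remaining members
rs.add("IP.IP.IP.106:27017")
rs.addArb("IP.IP.IP.73:30000")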
Mongod version is 2.6.11 and now it is a part of a sharded cluster
I’m not sure I fully understand your statement here. Could you please elaborate on how exactly you converted this replica set into part of a sharded cluster, and for what purpose?
Shard number 1 (unistore-1) consists of a Primary, two Arbiters, and one Secondary that has been restored from a dump.
If your replica set consists of a Primary, a Secondary, and two Arbiters, I would recommend removing one of the Arbiters, since there is no benefit to having two Arbiters instead of one in this configuration. Please see the Consider Fault Tolerance page, where this situation is described in detail.
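For example (only a sketch; the host string below is taken from your posted config), from a mongo shell connected to the Primary:
> rs.remove("IP.IP.IP.109:37017")
This leaves a Primary, a Secondary, and a single Arbiter, which gives the same fault tolerance (one member may fail) with one fewer voting member.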
Best regards,
Kevin
Hi Metikov,
Can I add a new member with low priority / zero votes / something else to avoid the initial sync (with dropping datafiles)?
Unfortunately no. The first step in initial sync is always dropping all databases. A Secondary is supposed to mirror the data content of the Primary, and there is no setting in MongoDB that can avoid this drop databases step when a server is attached to an existing replica set.
At some point in time we needed to use sharding due to big data. All data was more than one server's storage (11 TB). We added a new shard. After that we got more space for new data. Some time after, we deleted many documents and all the data fit on one shard. We decided to use a "desharding" procedure (just remove the second shard). Some time later, chunk migration was stopped due to data corruption errors.
Could you elaborate on the steps you performed in your desharding procedure?
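(For reference, the documented way to "deshard" is to drain a shard with the removeShard command, run against the admin database of a mongos; the shard name below is only an example and yours will differ:)
> use admin
> db.runCommand({ removeShard: "shard0001" })
The same command is then repeated until its output reports "state": "completed". All chunks on the drained shard must migrate to the remaining shard first, which is why corruption that stops chunk migration would also stop the drain.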
Also, I’m a little unclear about the current state of your deployment. My understanding is that currently it is a sharded cluster, where each shard is a replica set with Primary-Secondary-Arbiter configuration. Could you elaborate on:
- how many shards are there currently
- how many config servers
- how many mongos processes
Additional information that may be helpful:
- the output of db.serverCmdLineOpts() in the mongo shell from each mongod in your deployment
- the output of rs.status() from your replica set (you can connect directly to the replica set’s Primary using the mongo shell to execute this command)
- the output of sh.status() from any mongos process
Regarding data corruption issues: Unfortunately there is little anyone can do if you are having data corruption issues due to hardware failures, power outages, etc. There are a couple of things you could try, but both of them involve some downtime:
- Restore from a known good backup
- Dump the content of your database with mongodump --repair to dump all known good data, and restore to a new deployment
I would also like to clarify a statement you made in your earlier post:
Initial sync on new member breaks on 12th datafile and new member promoted to state SECONDARY without data (having only 12/3500 files)
It is entirely possible (and perfectly normal) for a Secondary to have fewer files than the Primary. Once a node reaches the SECONDARY state, it has successfully performed an initial sync and contains the same data as the Primary. For a list of reasons, please see Why are the files in my data directory larger than the data in my database. If you deleted a large amount of data, the section labeled Empty records is especially relevant.
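One way to verify this is to compare the logical data size, rather than the on-disk file count, between the members; a small sketch in the mongo shell, using the grid_fs database from your posted log:
var s = db.getSiblingDB("grid_fs").stats()
// dataSize (the logical data) should match across members after a successful
// initial sync, while fileSize (preallocated datafiles on disk) may differ
print("dataSize: " + s.dataSize + ", fileSize: " + s.fileSize)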
Best regards,
Kevin