How to avoid initial sync dropping all databases


Metikov Vadim

Apr 25, 2016, 1:34:34 PM
to mongodb-user
Hello there!

We have a very big replica set with 7 TB of data. At some point we noticed data corruption, and since then no initial sync can be completed.
There are 3500 data files named grid_fs.XXXX. Initial sync on a new member breaks on the 12th data file, and the new member is promoted to state SECONDARY without data (having only 12 of 3500 files).
At that moment we had no healthy node. After the last secondary crashed, it was recovered by copying the data files from the primary, so it has the same broken data files.
We dumped all the data from that secondary with "mongodump --repair". But after restoring a new member with the mongorestore tool, it wants to perform an initial sync, dropping all data files.
How can we avoid this behavior?

The oplog size is 221 GB and goes back to the 10th of March, but the data was dumped in April.
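
For reference, the oplog window above can be checked from the mongo shell on the primary, for example:

// Prints the configured oplog size and the first/last event times,
// i.e. how far back in time the oplog reaches.
db.printReplicationInfo()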

Here is the mongod log after adding it to the replica set:
2016-04-25T21:46:07.346+0500 ***** SERVER RESTARTED *****
2016-04-25T21:46:07.359+0500 [DataFileSync] BackgroundJob starting: DataFileSync
2016-04-25T21:46:07.359+0500 [DataFileSync] flushing diag log
2016-04-25T21:46:07.359+0500 shardKeyTest passed
2016-04-25T21:46:07.359+0500 isInRangeTest passed
2016-04-25T21:46:07.359+0500 shardObjTest passed
2016-04-25T21:46:07.359+0500 [initandlisten] MongoDB starting : pid=9972 port=27017 dbpath=/var/lib/mongodb 64-bit host=russia
2016-04-25T21:46:07.359+0500 [initandlisten] db version v2.6.12
2016-04-25T21:46:07.359+0500 [initandlisten] git version: d73c92b1c85703828b55c2916a5dd4ad46535f6a
2016-04-25T21:46:07.359+0500 [initandlisten] build info: Linux build5.ny.cbi.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2016-04-25T21:46:07.359+0500 [initandlisten] allocator: tcmalloc
2016-04-25T21:46:07.359+0500 [initandlisten] options: { config: "/etc/mongod.conf", diaglog: 7, net: { port: 27017 }, replication: { replSet: "unistore-1" }, storage: { dbPath: "/var/lib/mongodb" }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log", verbosity: 1 } }
2016-04-25T21:46:07.360+0500 [initandlisten] flushing directory /var/lib/mongodb
2016-04-25T21:46:07.388+0500 [initandlisten] journal dir=/var/lib/mongodb/journal
2016-04-25T21:46:07.388+0500 [initandlisten] recover : no journal files present, no recovery needed
2016-04-25T21:46:07.389+0500 [initandlisten] flushing directory /var/lib/mongodb/journal
2016-04-25T21:46:07.390+0500 [initandlisten] flushing directory /var/lib/mongodb/journal
2016-04-25T21:46:07.390+0500 [initandlisten] enter repairDatabases (to check pdfile version #)
2016-04-25T21:46:07.397+0500 [initandlisten]    grid_fs
2016-04-25T21:46:07.397+0500 [initandlisten] opening db:  grid_fs
2016-04-25T21:46:07.731+0500 [initandlisten]    local
2016-04-25T21:46:07.731+0500 [initandlisten] opening db:  local
2016-04-25T21:46:07.739+0500 [initandlisten] done repairDatabases
2016-04-25T21:46:07.739+0500 [initandlisten] opening db:  admin
2016-04-25T21:46:07.746+0500 [initandlisten] query admin.system.roles planSummary: EOF ntoreturn:0 ntoskip:0 nscanned:0 nscannedObjects:0 keyUpdates:0 numYields:0 locks(micros) W:6541 r:61 nreturned:0 reslen:20 0ms
2016-04-25T21:46:07.746+0500 [ClientCursorMonitor] BackgroundJob starting: ClientCursorMonitor
2016-04-25T21:46:07.746+0500 [PeriodicTaskRunner] BackgroundJob starting: PeriodicTaskRunner
2016-04-25T21:46:07.746+0500 [TTLMonitor] BackgroundJob starting: TTLMonitor
2016-04-25T21:46:07.746+0500 [IndexRebuilder] BackgroundJob starting: IndexRebuilder
2016-04-25T21:46:07.747+0500 [initandlisten] opening db:  local
2016-04-25T21:46:07.755+0500 [initandlisten] create collection local.startup_log { size: 10485760, capped: true }
2016-04-25T21:46:07.755+0500 [initandlisten] command local.$cmd command: create { create: "startup_log", size: 10485760, capped: true } ntoreturn:1 keyUpdates:0 numYields:0  reslen:75 7ms
2016-04-25T21:46:07.755+0500 [initandlisten] insert local.startup_log ninserted:1 keyUpdates:0 numYields:0  0ms
2016-04-25T21:46:07.755+0500 [IndexRebuilder] opening db:  grid_fs
2016-04-25T21:46:07.756+0500 [rsStart] replSet beginning startup...
2016-04-25T21:46:07.756+0500 [rsStart] loadConfig() local.system.replset
2016-04-25T21:46:07.907+0500 [rsStart] query local.system.replset planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:276 nreturned:1 reslen:260 0ms
2016-04-25T21:46:07.907+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:46:07.911+0500 [rsStart] getMyAddrs(): [127.0.0.1] [IP.IP.IP.107] [IP.IP.IP.1] [10.10.10.100] [192.168.0.15] [10.1.0.1] [::1] [fd19:6c87:964:0:230:48ff:fed5:eccc] [fe80::230:48ff:fed5:eccc%br0] [fe80::ecb9:12ff:fe84:632b%lxcbr0]
2016-04-25T21:46:07.911+0500 [rsStart] getallIPs("IP.IP.IP.106"): [IP.IP.IP.106]
2016-04-25T21:46:07.911+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:46:07.912+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:46:07.913+0500 [rsStart] replSet REMOVED
2016-04-25T21:46:07.913+0500 [rsStart] replSet info self not present in the repl set configuration:
2016-04-25T21:46:07.913+0500 [rsStart] { _id: "unistore-1", version: 140, members: [ { _id: 18, host: "IP.IP.IP.73:30000", arbiterOnly: true }, { _id: 2, host: "IP.IP.IP.106:27017", priority: 2.0 }, { _id: 27, host: "79.110.251.109:37017", arbiterOnly: true } ] }
2016-04-25T21:46:07.913+0500 [rsStart] loadConfig() local.system.replset
2016-04-25T21:46:07.915+0500 [rsStart] query local.system.replset planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:107 nreturned:1 reslen:260 0ms
2016-04-25T21:46:07.915+0500 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.
2016-04-25T21:46:07.916+0500 [IndexRebuilder] checking complete
2016-04-25T21:46:27.916+0500 [rsStart] query local.system.replset planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:107 nreturned:1 reslen:260 0ms
2016-04-25T21:46:27.916+0500 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.
2016-04-25T21:46:47.916+0500 [rsStart] query local.system.replset planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:85 nreturned:1 reslen:260 0ms
2016-04-25T21:46:47.916+0500 [rsStart] trying to contact IP.IP.IP.106:27017
2016-04-25T21:46:47.916+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:46:48.070+0500 [rsStart] trying to contact IP.IP.IP.109:37017
2016-04-25T21:46:48.070+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:46:48.071+0500 [rsStart] trying to contact IP.IP.IP.73:30000
2016-04-25T21:46:48.072+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:46:48.280+0500 [rsStart] getMyAddrs(): [127.0.0.1] [IP.IP.IP.107] [IP.IP.IP.1] [10.10.10.100] [192.168.0.15] [10.1.0.1] [::1] [fd19:6c87:964:0:230:48ff:fed5:eccc] [fe80::230:48ff:fed5:eccc%br0] [fe80::ecb9:12ff:fe84:632b%lxcbr0]
2016-04-25T21:46:48.281+0500 [rsStart] getallIPs("IP.IP.IP.107"): [IP.IP.IP.107]
2016-04-25T21:46:48.281+0500 [rsStart] replSet I am IP.IP.IP.107:27017
2016-04-25T21:46:48.281+0500 [rsHealthPoll] BackgroundJob starting: rsHealthPoll
2016-04-25T21:46:48.281+0500 [rsHealthPoll] BackgroundJob starting: rsHealthPoll
2016-04-25T21:46:48.281+0500 [rsStart] replSet got config version 141 from a remote, saving locally
2016-04-25T21:46:48.281+0500 [rsHealthPoll] BackgroundJob starting: rsHealthPoll
2016-04-25T21:46:48.281+0500 [rsStart] replSet info saving a newer config version to local.system.replset: { _id: "unistore-1", version: 141, members: [ { _id: 18, host: "IP.IP.IP.73:30000", arbiterOnly: true }, { _id: 2, host: "IP.IP.IP.106:27017", priority: 2.0 }, { _id: 27, host: "IP.IP.IP.109:37017", arbiterOnly: true }, { _id: 21, host: "IP.IP.IP.107:27017", priority: 10.0 } ] }
2016-04-25T21:46:48.281+0500 [rsStart] replSet saveConfigLocally done
2016-04-25T21:46:48.281+0500 [rsStart] replSet info Couldn't load config yet. Sleeping 20sec and will try again.
2016-04-25T21:47:07.370+0500 [DataFileSync] flushing mmaps took 11ms  for 4933 files
2016-04-25T21:47:07.370+0500 [DataFileSync] flushing diag log
2016-04-25T21:47:07.747+0500 [TTLMonitor] query admin.system.indexes query: { expireAfterSeconds: { $exists: true } } planSummary: EOF ntoreturn:0 ntoskip:0 nscanned:0 nscannedObjects:0 keyUpdates:0 numYields:0 locks(micros) r:62 nreturned:0 reslen:20 0ms
2016-04-25T21:47:07.747+0500 [TTLMonitor] query grid_fs.system.indexes query: { expireAfterSeconds: { $exists: true } } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:10 nscannedObjects:10 keyUpdates:0 numYields:0 locks(micros) r:97 nreturned:0 reslen:20 0ms
2016-04-25T21:47:07.747+0500 [TTLMonitor] query local.system.indexes query: { expireAfterSeconds: { $exists: true } } planSummary: COLLSCAN ntoreturn:0 ntoskip:0 nscanned:3 nscannedObjects:3 keyUpdates:0 numYields:0 locks(micros) r:38 nreturned:0 reslen:20 0ms
2016-04-25T21:47:07.748+0500 [clientcursormon] mem (MB) res:74 virt:20163493
2016-04-25T21:47:07.748+0500 [clientcursormon]  mapped (incl journal view):20163016
2016-04-25T21:47:08.281+0500 [rsStart] query local.system.replset planSummary: COLLSCAN ntoreturn:1 ntoskip:0 nscanned:1 nscannedObjects:1 keyUpdates:0 numYields:0 locks(micros) r:102 nreturned:1 reslen:326 0ms
2016-04-25T21:47:08.281+0500 [rsStart] trying to contact IP.IP.IP.106:27017
2016-04-25T21:47:08.282+0500 [rsStart] trying to contact IP.IP.IP.109:37017
2016-04-25T21:47:08.283+0500 [rsStart] trying to contact IP.IP.IP.73:30000
2016-04-25T21:47:08.496+0500 [rsStart] replSet I am IP.IP.IP.107:27017
2016-04-25T21:47:08.498+0500 [rsHealthPoll] BackgroundJob starting: rsHealthPoll
2016-04-25T21:47:08.498+0500 [rsHealthPoll] BackgroundJob starting: rsHealthPoll
2016-04-25T21:47:08.498+0500 [rsHealthPoll] BackgroundJob starting: rsHealthPoll
2016-04-25T21:47:08.498+0500 [rsStart] replSet STARTUP2
2016-04-25T21:47:08.498+0500 [rsMgr] BackgroundJob starting: rsMgr
2016-04-25T21:47:08.498+0500 [rsHealthPoll] replSet member IP.IP.IP.106:27017 is up
2016-04-25T21:47:08.498+0500 [rsMgr] replSet total number of votes is even - add arbiter or give one member an extra vote
2016-04-25T21:47:08.498+0500 [rsGhostSync] BackgroundJob starting: rsGhostSync
2016-04-25T21:47:08.498+0500 [rsSync] replSet initial sync pending
2016-04-25T21:47:08.498+0500 [SyncSourceFeedbackThread] BackgroundJob starting: SyncSourceFeedbackThread
2016-04-25T21:47:08.499+0500 [rsSync] replSet initial sync need a member to be primary or secondary to do our initial sync
2016-04-25T21:47:08.499+0500 [rsHealthPoll] replSet member IP.IP.IP.106:27017 is now in state PRIMARY
2016-04-25T21:47:08.500+0500 [rsHealthPoll] replSet member IP.IP.IP.73:30000 is up
2016-04-25T21:47:08.500+0500 [rsHealthPoll] replSet member IP.IP.IP.73:30000 is now in state ARBITER
2016-04-25T21:47:10.498+0500 [rsHealthPoll] replSet member IP.IP.IP.109:37017 is up
2016-04-25T21:47:10.498+0500 [rsHealthPoll] replSet member IP.IP.IP.109:37017 is now in state ARBITER
2016-04-25T21:47:10.499+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:47:24.499+0500 [rsSync] replSet initial sync pending
2016-04-25T21:47:24.499+0500 [rsSync] replSet syncing to: IP.IP.IP.106:27017
2016-04-25T21:47:24.499+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:47:24.500+0500 [rsSync] Database::_addNamespaceToCatalog ns: local.replset.minvalid
2016-04-25T21:47:24.500+0500 [rsSync] ExtentManager::increaseStorageSize ns:local.replset.minvalid desiredSize:8192 fromFreeList: 0 eloc: 0:239d000
2016-04-25T21:47:24.501+0500 [rsSync] Database::_addNamespaceToCatalog ns: local.replset.minvalid.$_id_
2016-04-25T21:47:24.501+0500 [rsSync] build index on: local.replset.minvalid properties: { v: 1, key: {  _id: 1 }, name: "_id_", ns: "local.replset.minvalid" }
2016-04-25T21:47:24.501+0500 [rsSync] local.replset.minvalid: clearing plan cache - collection info cache reset
2016-04-25T21:47:24.501+0500 [rsSync] allocating new extent
2016-04-25T21:47:24.501+0500 [rsSync] ExtentManager::increaseStorageSize ns:local.replset.minvalid.$_id_ desiredSize:131072 fromFreeList: 0 eloc: 0:239f000
2016-04-25T21:47:24.501+0500 [rsSync]    added index to empty collection
2016-04-25T21:47:24.501+0500 [rsSync] local.replset.minvalid: clearing plan cache - collection info cache reset
2016-04-25T21:47:24.501+0500 [rsSync] replSet initial sync drop all databases
2016-04-25T21:47:24.508+0500 [rsSync] dropAllDatabasesExceptLocal 2
2016-04-25T21:47:24.508+0500 [rsSync] dropDatabase grid_fs
2016-04-25T21:47:24.524+0500 [rsSync] lsn set 40381
2016-04-25T21:47:24.541+0500 [rsSync] removeJournalFiles
2016-04-25T21:47:24.542+0500 [rsSync] flushing directory /var/lib/mongodb/journal
2016-04-25T21:47:24.543+0500 [rsSync] removeJournalFiles end
2016-04-25T21:47:26.501+0500 [ConnectBG] BackgroundJob starting: ConnectBG
2016-04-25T21:47:27.121+0500 [ConnectBG] BackgroundJob starting: ConnectBG

Tim Hawkins

Apr 25, 2016, 7:05:48 PM
to mongodb-user

What version and topology?


Metikov Vadim

Apr 25, 2016, 9:30:32 PM
to mongodb-user
The mongod version is 2.6.11, and it is now part of a sharded cluster. Shard number 1 (unistore-1) consists of a primary, two arbiters, and one secondary that has been restored from the dump.

On Tuesday, April 26, 2016 at 4:05:48 AM UTC+5, Tim Hawkins wrote:

Kevin Adistambha

May 3, 2016, 2:19:50 AM
to mongodb-user

Hi Metikov

What is the current state of your deployment? Are you still having corruption issues?

At some point we noticed data corruption

Could you provide some context on the nature of the corruption, e.g. log messages, errors, was it due to hardware failure, etc?

after restoring a new member with the mongorestore tool

Did I understand correctly that you created a new server, used mongorestore to restore a previous dump (which was created using mongodump --repair on an existing Secondary), and added this new server to the replica set using rs.add()?

If your deployment is a GridFS storage, you can verify the md5 hash of every file in GridFS by using the filemd5 command and compare it with a known md5 hash of that file, which is stored in the fs.files collection:

> db.fs.files.find()
{
  "_id": ObjectId("57282bb698e914b1a826601d"),
  "chunkSize": 261120,
  "uploadDate": ISODate("2016-05-03T04:40:22.837Z"),
  "length": 25,
  "md5": "6395d98c719565ee540354e5f971383a",
  "filename": "test.txt"
}

If there is corruption on the data in GridFS (i.e. in the fs.chunks collection), the md5 value will be different:

> db.runCommand({filemd5: ObjectId("57282bb698e914b1a826601d"), root: "fs"})
{
  "numChunks": 1,
  "md5": "db3d76e3cd3bf3a5819efd1316e68657",
  "ok": 1
}

In the example above, the md5 values between the fs.files collection and the result of the filemd5 command differs for file test.txt identified in GridFS by ObjectId("57282bb698e914b1a826601d"), which indicates that there is a corruption in the GridFS storage of that file.
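
To check every file in one pass rather than one at a time, a small mongo shell loop along the following lines should work (a sketch only, assuming the default fs bucket; adjust the database and root name to your deployment):

// Compare the md5 stored in fs.files with the md5 recomputed from fs.chunks
// by the filemd5 command; report any mismatch as possible corruption.
db.fs.files.find({}, { _id: 1, filename: 1, md5: 1 }).forEach(function (f) {
    var res = db.runCommand({ filemd5: f._id, root: "fs" });
    if (!res.ok || res.md5 !== f.md5) {
        print("possible corruption: " + f.filename + " (" + f._id + ")");
    }
});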

it wants to perform an initial sync, dropping all data files.
How can we avoid this behavior?

The default behaviour for a Secondary’s initial sync step is to drop all databases first. This is because a Secondary is supposed to act as a high-availability failover server that could be called on at any time to replace the function of the Primary. Therefore, it should be as similar to the Primary as possible.

If I understand correctly, your replica set crashed due to data corruption, and you would like to restore the replica set using a good dump. In this case, you wanted to follow the instructions in the Restore a Replica Set from MongoDB Backups page (i.e. create a new replica set using the dumped data). If you restore the data into a new server and then attach that restored server into an existing replica set, the first thing it will do is to perform initial sync with the set’s Primary (which you are seeing).
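
A rough sketch of that "new replica set" approach (the set name and host below are placeholders, not taken from your deployment): start a single mongod on the restored data files with a --replSet option and initiate the set from that node, so it becomes the Primary of a fresh replica set instead of initial-syncing against the old one:

// Run in the mongo shell connected to the freshly restored mongod,
// started with --replSet unistore-1-restored (placeholder name).
rs.initiate({
    _id: "unistore-1-restored",                            // placeholder set name
    members: [ { _id: 0, host: "restored-host:27017" } ]   // placeholder host
})
// Members added later with rs.add() will initial-sync *from* this node,
// so the restored data is preserved rather than dropped.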

The mongod version is 2.6.11, and it is now part of a sharded cluster

I’m not sure I fully understand your statement here. Could you please elaborate on how exactly did you convert this replica set into part of a sharded cluster, and for what purpose?

Shard number 1 (unistore-1) consists of a primary, two arbiters, and one secondary that has been restored from the dump.

If your replica set consists of a Primary, a Secondary, and two Arbiters, I would recommend removing one of the Arbiters, since there is no benefit to having two Arbiters vs. only one Arbiter in this configuration. Please see the Consider Fault Tolerance page, where this situation is described in detail.
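
For reference, removing an Arbiter is a single command run from a mongo shell connected to the Primary (the host:port below is only an example of whichever Arbiter you decide to drop):

// Removes the named member from the replica set configuration.
rs.remove("IP.IP.IP.73:30000")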

Best regards,
Kevin

Metikov Vadim

May 5, 2016, 2:59:18 PM
to mongodb-user
Hello! Thank you very much for your answer!

On Monday, April 25, 2016 at 10:34:34 PM UTC+5, Metikov Vadim wrote:

Metikov Vadim

May 5, 2016, 3:17:38 PM
to mongodb-user
Hello! 
I'm sorry about my previous reply. Nothing was saved :(

On Tuesday, May 3, 2016 at 11:19:50 AM UTC+5, Kevin Adistambha wrote:

Hi Metikov

What is the current state of your deployment? Are you still having corruption issues?

No, after chunk migration was stopped I see no errors.

At some point we noticed data corruption

Could you provide some context on the nature of the corruption, e.g. log messages, errors, was it due to hardware failure, etc?

Yes. That was due to a server power loss. The log messages are listed here: https://jira.mongodb.org/browse/SERVER-22615

after restoring a new member with the mongorestore tool

Did I understand correctly that you created a new server, used mongorestore to restore a previous dump (which was created using mongodump --repair on an existing Secondary), and added this new server to the replica set using rs.add()?

Yes. Absolutely.
 

If your deployment is a GridFS storage, you can verify the md5 hash of every file in GridFS by using the filemd5 command and compare it with a known md5 hash of that file, which is stored in the fs.files collection:

> db.fs.files.find()
{
  "_id": ObjectId("57282bb698e914b1a826601d"),
  "chunkSize": 261120,
  "uploadDate": ISODate("2016-05-03T04:40:22.837Z"),
  "length": 25,
  "md5": "6395d98c719565ee540354e5f971383a",
  "filename": "test.txt"
}

If there is corruption on the data in GridFS (i.e. in the fs.chunks collection), the md5 value will be different:

> db.runCommand({filemd5: ObjectId("57282bb698e914b1a826601d"), root: "fs"})
{
  "numChunks": 1,
  "md5": "db3d76e3cd3bf3a5819efd1316e68657",
  "ok": 1
}

In the example above, the md5 values between the fs.files collection and the result of the filemd5 command differs for file test.txt identified in GridFS by ObjectId("57282bb698e914b1a826601d"), which indicates that there is a corruption in the GridFS storage of that file.

Thanks for the solution. But when I try to delete the corrupted documents, I get the same errors as above.

it wants to perform an initial sync, dropping all data files.
How can we avoid this behavior?

The default behaviour for a Secondary’s initial sync step is to drop all databases first. This is because a Secondary is supposed to act as a high-availability failover server that could be called on at any time to replace the function of the Primary. Therefore, it should be as similar to the Primary as possible.

If I understand correctly, your replica set crashed due to data corruption, and you would like to restore the replica set using a good dump. In this case, you wanted to follow the instructions in the Restore a Replica Set from MongoDB Backups page (i.e. create a new replica set using the dumped data). If you restore the data into a new server and then attach that restored server into an existing replica set, the first thing it will do is to perform initial sync with the set’s Primary (which you are seeing).


Can I add a new member with low priority / zero votes / something else to avoid the initial sync (with dropping of data files)?

The mongod version is 2.6.11, and it is now part of a sharded cluster

I’m not sure I fully understand your statement here. Could you please elaborate on how exactly did you convert this replica set into part of a sharded cluster, and for what purpose?

At some point we needed to use sharding because of the amount of data: all the data (11 TB) was more than one server could store. We added a new shard, and after that we had more space for new data. Some time later we deleted many documents and all the data fit on one shard, so we decided to use a "desharding" procedure (just removing the second shard). Some time later, chunk migration stopped due to data corruption errors.

Shard number 1 (unistore-1) consists of a primary, two arbiters, and one secondary that has been restored from the dump.

If your replica set consists of a Primary, a Secondary, and two Arbiters, I would recommend removing one of the Arbiters, since there is no benefit to having two Arbiters vs. only one Arbiter in this configuration. Please see the Consider Fault Tolerance page, where this situation is described in detail.

That was a temporary situation. We now have three members in each shard (primary, secondary, and arbiter).


 

Best regards,
Kevin

Kevin Adistambha

May 8, 2016, 9:58:44 PM
to mongodb-user

Hi Metikov,

Can I add a new member with low priority / zero votes / something else to avoid the initial sync (with dropping of data files)?

Unfortunately no. The first step in initial sync is always dropping all databases. A Secondary is supposed to mirror the data content of the Primary, and there is no setting in MongoDB that can avoid this drop databases step when a server is attached to an existing replica set.

At some point we needed to use sharding because of the amount of data: all the data (11 TB) was more than one server could store. We added a new shard, and after that we had more space for new data. Some time later we deleted many documents and all the data fit on one shard, so we decided to use a "desharding" procedure (just removing the second shard). Some time later, chunk migration stopped due to data corruption errors.

Could you elaborate on the steps you performed in your desharding procedure?

Also, I’m a little unclear about the current state of your deployment. My understanding is that currently it is a sharded cluster, where each shard is a replica set with Primary-Secondary-Arbiter configuration. Could you elaborate on:

  • how many shards are there currently
  • how many config servers
  • how many mongos

Additional information that may be helpful:

  • the output of db.serverCmdLineOpts() in the mongo shell from each mongod in your deployment
  • the output of rs.status() from your replica set (you can connect directly to the replica set’s Primary using the mongo shell to execute this command)
  • the output of sh.status() from any mongos process

Regarding data corruption issues: Unfortunately there is little anyone can do if you are having data corruption issues due to hardware failures, power outages, etc. There are a couple of things you could try, but both of them involve some downtime:

  • Restore from a known good backup
  • Dump the content of your database with mongodump --repair to dump all known good data, and restore to a new deployment

I would also like to clarify a statement you made in your earlier post:

Initial sync on a new member breaks on the 12th data file, and the new member is promoted to state SECONDARY without data (having only 12 of 3500 files)

It is entirely possible (and perfectly normal) for a Secondary to have fewer files than the Primary. Once a node reaches a Secondary state, it has successfully performed an initial sync, and contains the same data as the Primary. For a list of reasons, please see Why are the files in my data directory larger than the data in my database. If you deleted a large amount of data, the section labeled Empty records is especially relevant.

Best regards,
Kevin

Metikov Vadim

May 11, 2016, 2:21:04 AM
to mongodb-user
Hello!

2016-05-09 6:58 GMT+05:00 Kevin Adistambha <kevi...@mongodb.com>:

Hi Metikov,

Can I add a new member with low priority / zero votes / something else to avoid the initial sync (with dropping of data files)?

Unfortunately no. The first step in initial sync is always dropping all databases. A Secondary is supposed to mirror the data content of the Primary, and there is no setting in MongoDB that can avoid this drop databases step when a server is attached to an existing replica set.


At some point we needed to use sharding because of the amount of data: all the data (11 TB) was more than one server could store. We added a new shard, and after that we had more space for new data. Some time later we deleted many documents and all the data fit on one shard, so we decided to use a "desharding" procedure (just removing the second shard). Some time later, chunk migration stopped due to data corruption errors.

Could you elaborate on the steps you performed in your desharding procedure?

I used the following command:
db.runCommand( { removeShard: "unistore-2" } )
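
(For what it is worth, issuing the same command again reports the draining progress; the shape of the reply is sketched in the comment below rather than copied from this cluster.)

// Run against the admin database on a mongos. While draining, the reply
// looks roughly like:
// { msg: "draining ongoing", state: "ongoing", remaining: { chunks: <n>, dbs: <n> }, ok: 1 }
db.runCommand( { removeShard: "unistore-2" } )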

Also, I’m a little unclear about the current state of your deployment. My understanding is that currently it is a sharded cluster, where each shard is a replica set with Primary-Secondary-Arbiter configuration. Could you elaborate on:

  • how many shards are there currently
There are currently two shards, but the second one is in the draining stage.
  • how many config servers
There are three config servers.
  • how many mongos
We have six mongos (one on each client).

Additional information that may be helpful:

  • the output of db.serverCmdLineOpts() in the mongo shell from each mongod in your deployment
Here is output:
unistore-1:SECONDARY> db.serverCmdLineOpts()
{
        "argv" : [
                "/usr/bin/mongod",
                "--config",
                "/etc/mongod.conf"
        ],
        "parsed" : {
                "config" : "/etc/mongod.conf",

                "replication" : {
                        "replSet" : "unistore-1"
                },
                "storage" : {
                        "dbPath" : "/var/lib/mongodb"
                },
                "systemLog" : {
                        "destination" : "file",
                        "logAppend" : true,
                        "path" : "/var/log/mongodb/mongod.log"
                }
        },
        "ok" : 1
}
unistore-1:SECONDARY>
 
  • the output of rs.status() from your replica set (you can connect directly to the replica set’s Primary using the mongo shell to execute this command)
Here is the output on the SECONDARY:
 
unistore-1:SECONDARY> rs.status()
{
        "set" : "unistore-1",
        "date" : ISODate("2016-05-11T06:02:00Z"),
        "myState" : 2,
        "syncingTo" : "79.110.251.109:27017",
        "members" : [
                {
                        "_id" : 3,
                        "name" : "79.172.49.11:27017",
                        "health" : 1,
                        "state" : 2,
                        "stateStr" : "SECONDARY",
                        "uptime" : 477144,
                        "optime" : Timestamp(1462946517, 8),
                        "optimeDate" : ISODate("2016-05-11T06:01:57Z"),
                        "self" : true
                },
                {
                        "_id" : 18,
                        "name" : "79.172.49.73:30000",
                        "health" : 1,
                        "state" : 7,
                        "stateStr" : "ARBITER",
                        "uptime" : 429531,
                        "lastHeartbeat" : ISODate("2016-05-11T06:01:59Z"),
                        "lastHeartbeatRecv" : ISODate("2016-05-11T06:01:59Z"),
                        "pingMs" : 0
                },
                {
                        "_id" : 22,
                        "name" : "79.110.251.109:27017",
                        "health" : 1,
                        "state" : 1,
                        "stateStr" : "PRIMARY",
                        "uptime" : 429156,
                        "optime" : Timestamp(1462946517, 8),
                        "optimeDate" : ISODate("2016-05-11T06:01:57Z"),
                        "lastHeartbeat" : ISODate("2016-05-11T06:01:59Z"),
                        "lastHeartbeatRecv" : ISODate("2016-05-11T06:01:58Z"),
                        "pingMs" : 109,
                        "electionTime" : Timestamp(1462517538, 1),
                        "electionDate" : ISODate("2016-05-06T06:52:18Z")
                }
        ],
        "ok" : 1
}
unistore-1:SECONDARY>
 
  • the output of sh.status() from any mongos process
Here it is:
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "minCompatibleVersion" : 5,
        "currentVersion" : 6,
        "clusterId" : ObjectId("54dada532e13265e4a8c9568")
}
  shards:
        {  "_id" : "unistore-1",  "host" : "unistore-1/79.110.251.109:27017,79.172.49.11:27017" }
        {  "_id" : "unistore-2",  "host" : "unistore-2/79.172.49.11:37017,79.172.49.114:27017",  "draining" : true }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "grid_fs_test6",  "partitioned" : false,  "primary" : "unistore-1" }
        {  "_id" : "grid_fs_test3",  "partitioned" : false,  "primary" : "unistore-1" }
        {  "_id" : "grid_fs",  "partitioned" : true,  "primary" : "unistore-1" }
                grid_fs.fs.chunks
                        shard key: { "files_id" : 1, "n" : 1 }
                        chunks:
                                unistore-1      222966
                                unistore-2      150871
                        too many chunks to print, use verbose if you want to force print
        {  "_id" : "kombu_default",  "partitioned" : false,  "primary" : "unistore-1" }
        {  "_id" : "grid_fs_test",  "partitioned" : false,  "primary" : "unistore-1" }
        {  "_id" : "test",  "partitioned" : false,  "primary" : "unistore-1" }
        {  "_id" : "gridfs",  "partitioned" : false,  "primary" : "unistore-1" }

mongos>
 

Regarding data corruption issues: Unfortunately there is little anyone can do if you are having data corruption issues due to hardware failures, power outages, etc. There are a couple of things you could try, but both of them involve some downtime:

  • Restore from a known good backup
It is too slow. Making one backup took more than 36 hours; maybe it is because of disk I/O. I will try a dump/restore procedure without file creation in the future.
  • Dump the content of your database with mongodump --repair to dump all known good data, and restore to a new deployment
Yes, I used mongodump with the --repair option. Without --repair, mongodump stopped with an error (data corruption: Assertion: 10334: BSONObj size: 0 (0x00000000) is invalid. Size must be between 0 and 16793600 (16MB) First element: EOO).

I would also like to clarify a statement you made in your earlier post:

Initial sync on a new member breaks on the 12th data file, and the new member is promoted to state SECONDARY without data (having only 12 of 3500 files)

It is entirely possible (and perfectly normal) for a Secondary to have fewer files than the Primary. Once a node reaches a Secondary state, it has successfully performed an initial sync, and contains the same data as the Primary. For a list of reasons, please see Why are the files in my data directory larger than the data in my database. If you deleted a large amount of data, the section labeled Empty records is especially relevant.

Yes, I deleted many records, but not more than half. I have 7 TB of data in at least 8 TB of data files.
 

Best regards,
Kevin


Thank you for your cooperation, Kevin.

--
Regards, Vadim