MongoDB duplicate non-sharded collection was created in two shard servers

Angelo H

Jun 20, 2016, 7:43:00 PM
to mongodb-user

I am running multiple mongoimport processes to import a collection into a mongo cluster. This collection is NOT sharded and its primary server is the shard-01 server.

Here is our sharding infrastructure:

shard-01-replica-set-01 with query-router-01
shard-01-replica-set-02
shard-02-replica-set-01 with query-router-02
shard-01-replica-set-02

There are 3 mongoimport processes connecting to query-router-01 and 3 mongoimport processes connecting to query-router-02.

When I tried to verify the import, I found that the collection size is different on query-router-01 and query-router-02. Then I found that 2 databases had been created: one on shard-01 and another on shard-02. What is the issue?

Kevin Adistambha

Jun 28, 2016, 1:45:40 AM
to mongodb-user

Hi Angelo,

I am running multiple mongoimport processes to import a collection into a mongo cluster. This collection is NOT sharded and its primary server is the shard-01 server.

There are 3 mongoimport processes connecting to query-router-01 and 3 mongoimport processes connecting to query-router-02.

When I tried to verify the import, I found that the collection size is different on query-router-01 and query-router-02. Then I found that 2 databases had been created: one on shard-01 and another on shard-02. What is the issue?

I’m not sure I fully understand the situation. My understanding so far is:

  • You have two mongos processes in your deployment
  • You have two shards, each of which is a replica set
  • You are restoring the same data, using both mongos at the same time
  • On each of the mongos, there are 3 import processes (totalling 6 import processes overall)
  • The database/collection you are importing is not sharded, and originally it was on shard-01
  • You found that the same database is duplicated on both shards after the import process

Is my understanding correct?

Could you please post:

  • Your MongoDB version, your O/S version, and any MongoDB driver version, if applicable
  • What are you trying to import exactly, is it a database or a collection?
  • What did you use to perform the import (e.g. mongo shell script, tools such as mongorestore, any script using some other language)
  • The exact command sequence you executed for the backup process (e.g. all mongodump command lines)
  • The exact command sequence you executed for the restore process (e.g. all mongorestore command lines)
  • Did you do any post-processing on your backup data?
  • Are there any error messages in the logs during the import process?

Best regards,
Kevin

Angelo H

Jul 6, 2016, 2:48:50 AM
to mongodb-user

  • You are restoring the same data, using both mongos at the same time
      No, I am importing different collections on different mongos. The reason I was doing this was to speed up the import process.
  • Your MongoDB version, your O/S version, and any MongoDB driver version, if applicable
      MongoDB: 3.2.7, Amazon Linux 2015.09, MongoDB shell version 3.2.7.
  • What are you trying to import exactly, is it a database or a collection?
      There are 30 collections in my database. I am trying to split the imports across all mongos to distribute the load and speed up the import.
  • What did you use to perform the import (e.g. mongo shell script, tools such as mongorestore, any script using some other language)
      I used mongoimport.
  • The exact command sequence you executed for the backup process (e.g. all mongodump command lines)
      I use a Python script to merge some new raw JSON we acquired with existing data from the production database. The final JSON keeps the _id ("_id": {"$oid": "564c356c8be642c575dfc833"}) from the production database.
  • The exact command sequence you executed for the restore process (e.g. all mongorestore command lines)
      mongoimport --upsert --host "${HOST}" --port "${PORT}" --db "${DB}" --collection "${COLLECTION}" --file $file
  • Did you do any post-processing on your backup data?
      Yes, I did. As described above, I use a Python script to merge some new raw JSON we acquired with existing data from the production database.
  • Are there any error messages in the logs during the import process?
      No, I don't think so.

Kevin Adistambha

Jul 12, 2016, 9:09:17 PM
to mongodb-user

Hi,

I attempted to recreate your issue but so far had no success. I tried importing 30 collections using two different mongos but so far only a single database was created, which contains all 30 collections imported from both mongos.
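
For anyone who wants to check the same thing on their own cluster, here is a minimal sketch of one way to do it. It asks each mongos which shard it records as the primary for the database (from config.databases), then lists the databases present on each shard directly. The host names, port 27017, and the database name "mydb" are placeholders I made up, not values from this thread. The script only prints the commands by default; set RUN to an empty string to actually execute them.

```shell
#!/usr/bin/env bash
# Dry run by default: commands are echoed. Use RUN="" to execute them.
RUN="${RUN-echo}"
DB="mydb"   # hypothetical database name

# Ask one mongos which shard it records as primary for $DB.
primary_shard_of() {   # $1 = mongos host:port
  $RUN mongo --quiet --host "$1" --eval \
    "printjson(db.getSiblingDB('config').databases.findOne({_id: '$DB'}))"
}

# List the databases that physically exist on one shard.
# Point this at the shard's replica set primary.
databases_on() {       # $1 = shard member host:port
  $RUN mongo --quiet --host "$1" --eval \
    "printjson(db.adminCommand({listDatabases: 1}))"
}

# Hypothetical addresses; substitute your own.
for router in query-router-01:27017 query-router-02:27017; do
  primary_shard_of "$router"
done
for shard in shard-01-replica-set-01:27017 shard-02-replica-set-01:27017; do
  databases_on "$shard"
done
```

If both mongos report the same primary shard but the database shows up in listDatabases on both shards, that would match the duplication described in the original post.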

No, I am importing different collections on different mongos. The reason I was doing this was to speed up the import process.

If you are trying to speed up the import process, you may want to take a look at the --numInsertionWorkers parameter in mongoimport. This setting defaults to 1, and increasing the number of insertion workers may increase the speed of your import.
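
As a rough sketch, this is the command line from the earlier post with --numInsertionWorkers added. The worker count of 4, the host, port, database, collection, and file names are all placeholder assumptions, and a suitable worker count will depend on your hardware. The script only prints the command by default; set RUN to an empty string to actually run the import.

```shell
#!/usr/bin/env bash
# Dry run by default: the command is echoed. Use RUN="" to execute it.
RUN="${RUN-echo}"

# Hypothetical connection settings; substitute your own.
HOST="query-router-01"
PORT=27017
DB="mydb"
WORKERS=4   # assumed value; mongoimport defaults to 1

# Import one JSON file into one collection through a single mongos.
import_collection() {   # $1 = collection name, $2 = JSON file
  $RUN mongoimport --upsert \
    --host "$HOST" --port "$PORT" \
    --db "$DB" --collection "$1" \
    --numInsertionWorkers "$WORKERS" \
    --file "$2"
}

import_collection "users" "users.json"
```

With this approach all imports can go through a single mongos, and the parallelism comes from the insertion workers rather than from separate mongoimport processes.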

Best regards,
Kevin
