mongoimport pathetically slow on large updates


parco...@gmail.com

Sep 25, 2015, 1:42:44 PM
to mongodb-user
I have a collection with 700 million records (1 TB).
I am trying to update 200 million of these records as part of a data-cleansing operation.

I have tried several approaches and finally settled on mongoimport. I produce several JSON files
containing the 200 million records and then import them with --upsert. Because mongoimport does not merge fields into
existing documents, I have to dump out the whole document for each of these 200 M records.
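A minimal sketch of how such import files could be produced, assuming the cleansed documents are already available as Python dicts and that newline-delimited JSON (one full document per line) is the target format. The function names, the `prefix` parameter, and the file naming scheme are hypothetical. It also assumes `_id` values are plain JSON types; ObjectId values would have to be written in MongoDB extended JSON form (`{"$oid": "..."}`).

```python
import json
import os
from typing import Iterable, Iterator, List


def chunked(docs: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` documents."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def write_import_files(docs: Iterable[dict], out_dir: str,
                       chunk_size: int = 50_000,
                       prefix: str = "cleansed") -> List[str]:
    """Write full documents as newline-delimited JSON, one file per chunk.

    Every line carries the complete document (including _id), because an
    upsert through mongoimport replaces the matched document instead of
    merging new fields into it.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i, batch in enumerate(chunked(docs, chunk_size)):
        path = os.path.join(out_dir, f"{prefix}_{i:05d}.json")
        with open(path, "w", encoding="utf-8") as fh:
            for doc in batch:
                fh.write(json.dumps(doc, separators=(",", ":")) + "\n")
        paths.append(path)
    return paths
```

Each resulting file could then be passed to mongoimport through its --file option together with --upsert.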

The import crawls: each file takes around 3-5 minutes, even though each file has only 50,000 records.

1. Running mongoimport in verbose mode makes it clear that BSON creation is not the issue.

2. To debug the problem, I created a brand-new collection and just inserted these documents. Things move really
   fast, around 20-30 seconds per file, which is reasonable.

3. Index updates are not the issue. I have no indexes on this collection other than the default index on _id.

4. When I do not use --upsert, mongoimport detects that the records are duplicates. It goes through an entire file
   of 50,000 documents in 10 seconds and inserts nothing, so it seems able to locate the duplicates
   very quickly.
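To put the per-file timings above on the scale of the whole job, a small calculation, assuming 200 million records split into files of 50,000 (4,000 files) and a constant time per file. At 3-5 minutes per file the upsert run would take roughly 200-333 hours, while plain inserts at 20-30 seconds per file would finish in roughly 22-33 hours.

```python
RECORDS_TO_UPDATE = 200_000_000
RECORDS_PER_FILE = 50_000


def projected_hours(seconds_per_file: float,
                    total_records: int = RECORDS_TO_UPDATE,
                    per_file: int = RECORDS_PER_FILE) -> float:
    """Hours needed for all files, assuming a constant time per file."""
    files = total_records / per_file  # 4,000 files for the figures above
    return files * seconds_per_file / 3600


def docs_per_second(seconds_per_file: float,
                    per_file: int = RECORDS_PER_FILE) -> float:
    """Throughput implied by a per-file processing time."""
    return per_file / seconds_per_file


for label, secs in [("upsert, 3 min", 180), ("upsert, 5 min", 300),
                    ("insert, 20 s", 20), ("insert, 30 s", 30)]:
    print(f"{label:>14}: {docs_per_second(secs):7.1f} docs/s, "
          f"{projected_hours(secs):6.1f} h for 200M records")
```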

Given findings 2 and 4, I am not sure what the issue could be.

Rohit Jain

Sep 25, 2015, 5:10:30 PM
to mongodb-user
Which MongoDB version are you using?

Regards,
Rohit

Anirban Rahut

Sep 25, 2015, 6:44:06 PM
to mongodb-user
Hello Rohit,

The mongod details are:
db version v3.0.5
git version: 8bc4ae20708dbb493cb09338d9e7be6698e4a3a3

mongoimport version: 3.0.4
git version: efe71bf185cdcfe9632f1fc2e42ca4e895f93269

The storage engine is MMAPv1.

Thanks.

Rohit Jain

Sep 26, 2015, 12:00:18 AM
to mongodb-user
Hello,

Depending on your MongoDB configuration, --upsert may impact your mongod's performance.

Changed in version 3.0.0: --upsertFields now implies --upsert. As such, you may prefer to use --upsertFields instead of --upsert.

--upsertFields <field1[,field2]>

Specifies a list of fields for the query portion of the upsert. Use this option if the _id fields in the existing documents don’t match the field in the document, but another field or field combination can uniquely identify documents as a basis for performing upsert operations.

Changed in version 3.0.0: Modifies the import process to update existing objects in the database if they match based on the specified fields, while inserting all other objects. You do not need to use --upsert with --upsertFields.

If you do not specify a field, --upsertFields will upsert on the basis of the _id field.

To ensure adequate performance, indexes should exist for this field or fields.
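The matching rule quoted above can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not mongoimport's implementation: a Python list stands in for the collection, and a dict keyed on the upsert fields stands in for a database index, so each lookup is a hash access instead of a scan of every document. The function name is hypothetical, and unlike MongoDB (which keeps an existing `_id` on replacement) this model simply drops any field missing from the incoming document.

```python
from typing import Dict, Iterable, List, Tuple


def upsert_by_fields(collection: List[dict], incoming: Iterable[dict],
                     fields: Tuple[str, ...] = ("_id",)) -> Tuple[int, int]:
    """Replace documents whose `fields` match, insert the rest.

    Returns (replaced, inserted). `index` plays the role of an index on
    `fields`; without it every incoming document would require a full
    pass over `collection` to find its match.
    """
    index: Dict[tuple, int] = {
        tuple(doc.get(f) for f in fields): pos
        for pos, doc in enumerate(collection)
    }
    replaced = inserted = 0
    for doc in incoming:
        key = tuple(doc.get(f) for f in fields)
        pos = index.get(key)
        if pos is None:
            # No match on the upsert fields: insert as a new document.
            index[key] = len(collection)
            collection.append(doc)
            inserted += 1
        else:
            # Match found: replace the whole stored document (no merge).
            collection[pos] = doc
            replaced += 1
    return replaced, inserted
```

With the default `fields=("_id",)` it mirrors the case where no field is specified and matching happens on `_id`.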

Please refer to http://docs.mongodb.org/manual/reference/program/mongoimport/ for more details.


Regards,

Rohit
