mongodb script timing out for large collections

Chris Pearlman

Jun 7, 2016, 7:26:16 PM
to mongodb-user
I have a script that updates a few fields in one collection with values pulled from another collection (all under the same database). Each collection has over 50,000 documents, which causes either a timeout, or it chokes applications using MongoDB if I let it run. How can I avoid this?

Amar

Jun 20, 2016, 2:12:08 AM
to mongodb-user

Hi Chris,

I have a script that updates a few fields in one collection with values pulled from another collection

Is this a script in the mongo shell (https://docs.mongodb.com/manual/mongo/) or using a certain driver?

Could you provide more information about:

  • Version of MongoDB and Driver used
  • Your deployment topology (i.e. standalone, replica set, sharded cluster)
  • Example documents and the operations performed on them
  • Are there resource-intensive operations on the same machine that runs the mongod process?

causes either a timeout

Are there errors or performance-related messages in the mongod logs, e.g. network errors, slow queries, etc.?

or it chokes applications using mongodb

Is the impact only on applications using MongoDB, or on any application running on that machine? I.e. is the machine reaching its resource limits while your script is running?

Have you tried using the profiler or mplotqueries to analyze your operations?
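
For example, here is a minimal sketch of turning on the profiler from the mongo shell (the 100 ms slow-operation threshold is just an illustrative value):

// Profile operations slower than 100 ms; they are recorded in db.system.profile
db.setProfilingLevel(1, 100)

// Later, inspect the most recent slow operations
db.system.profile.find().sort({ ts: -1 }).limit(5).pretty()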

Regards,

Amar

Chris Pearlman

Jun 20, 2016, 7:52:38 AM
to mongodb-user
Hi Amar,

The script is in the mongo shell.
MongoDB Version: 3.0.8
Topology: 3-node replica set
There are no other resource-intensive ops on any node in the replica set other than mongod.
The documents themselves are really small (~2 KiB), but each collection (UserSettings and security_users from the script below) has over 50,000 documents.

Here is the actual script. It's a very simple script to clean up unused documents. Any help would be great. Thanks for your time!

db.getCollection('UserSettings').find().forEach(function (currentUserSetting) {
    if (currentUserSetting.Payload.UserId) {
        var userSettingId = currentUserSetting.Payload.UserId
        var matchingUser = db.security_users.findOne({ '_id': userSettingId })

        if (!matchingUser) {
            db.UserSettings.remove(currentUserSetting)
            //print(currentUserSetting)
        }
    }
})

Amar

Jun 21, 2016, 9:07:07 PM
to mongodb-user

Hi Chris,

Looking at the code provided, remove() is given the whole document rather than just the _id, which is less efficient. The call could instead look like this:

db.UserSettings.remove( { _id : currentUserSetting._id } )
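
Matching on _id alone keeps the remove query small and means the server only has to match on the indexed _id field, instead of comparing every field of the document.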

Note that since the script uses find() with no constraints, it may need to fetch the whole collection into memory. This may be disruptive to your working set and may contribute to the performance issue you are seeing. For more information, please see Must my working set size fit RAM. Additional things that you can look at are listed below (a sketch combining these suggestions follows the list):

  • Are the 3 replica set members running on different machines? If more than one mongod process is running on a single machine, there may be a resource contention issue.
  • How much memory is on the machine(s) running mongod?
  • Which storage engine are you using?
  • On average, how many documents are expected to be deleted in each run?
  • Is there anything in the mongod logs that may point to resource issues (e.g. slow queries)?
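
As a rough sketch only, here is one way the suggestions above might be combined, assuming the same collection names and document shape as in your script: fetch just the fields you need, collect the orphaned _ids first, then remove them in batches by _id so no single operation runs for too long (the batch size of 1000 is an arbitrary illustrative value):

// Fetch only the Payload.UserId field (plus _id, returned by default)
// instead of whole documents
var orphanIds = []
db.getCollection('UserSettings').find(
    { 'Payload.UserId': { $exists: true } },
    { 'Payload.UserId': 1 }
).forEach(function (setting) {
    var userId = setting.Payload.UserId
    // Record settings whose user no longer exists
    if (userId && !db.security_users.findOne({ _id: userId }, { _id: 1 })) {
        orphanIds.push(setting._id)
    }
})

// Remove the orphans in batches by _id
var batchSize = 1000
for (var i = 0; i < orphanIds.length; i += batchSize) {
    db.UserSettings.remove({ _id: { $in: orphanIds.slice(i, i + batchSize) } })
}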

Regards,

Amar
