Cursor picks up updated documents again in forEach when using $set; need a better fix.


Brad Karels

Jul 10, 2012, 9:27:57 AM
to mongod...@googlegroups.com
Case:
Our document structure needs to change, and we need to update the existing documents in a production system. Since we are in beta, we have fewer than 1,000 documents in the affected collection.

Solution:
Using JavaScript, I created a small script to iterate over each document in the collection and update it accordingly.  Source code below.

Problem:
The forEach loop runs 101 extra iterations on every execution. Those 101 extra iterations try to update documents that have already been updated, and the data ends up in an invalid state. I have found that 101 is the default size of a cursor's first batch, so I understand where the number comes from, but I do not understand why the cursor would pick up documents again after they have been updated with $set.

Workaround:
In my script I have raised the cursor limit to 10,000 to get around this for now. However, once we have much more data, this solution will break down. So I am hoping one of you has more insight into this and can help me put a solution in place that will handle large data sets should we need to do this again in the future.

NOTE: Data is NOT sharded - replSet only.

Original script:

/* Usage: $ mongo [hostname][:port]/[dbname] convert.js --shell */

db.events.find().forEach(function(event) {
    var newMeasurement = {};
    var measurement = event.measurement;

    if (measurement.miles) {
        newMeasurement.distance = {"miles": measurement.miles};
    } else if (measurement.kilometers) {
        newMeasurement.distance = {"kilometers": measurement.kilometers};
    } else if (measurement.meters) {
        newMeasurement.distance = {"meters": measurement.meters};
    }

    if (measurement.seconds) {
        newMeasurement.time = {"seconds": measurement.seconds};
    } else if (measurement.minutes) {
        newMeasurement.time = {"minutes": measurement.minutes};
    } else if (measurement.hours) {
        newMeasurement.time = {"hours": measurement.hours};
    }

    db.events.update({"_id": event._id}, {$set: {measurement: newMeasurement}});
});
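Why the extra visits corrupt the data can be read off the script itself: on a document that was already converted, `event.measurement` only holds `distance` and `time`, so none of the six checks match, `newMeasurement` stays `{}`, and the `$set` overwrites the converted measurement with an empty object. A minimal sketch of making the per-document step safe to run twice follows. The helper name `convertMeasurement` is hypothetical, and it assumes a converted document can be recognized by the presence of a `distance` or `time` key. It guards against the symptom only; the duplicate visits themselves still happen.

```javascript
// Hypothetical helper (not from the original post): returns the new-format
// measurement for an old-format document, or null when the document already
// looks converted, so a repeated visit from the cursor becomes a no-op.
function convertMeasurement(measurement) {
    // Assumption: converted documents carry a "distance" or "time" key.
    if (!measurement || measurement.distance || measurement.time) {
        return null;
    }
    var newMeasurement = {};
    if (measurement.miles) {
        newMeasurement.distance = { miles: measurement.miles };
    } else if (measurement.kilometers) {
        newMeasurement.distance = { kilometers: measurement.kilometers };
    } else if (measurement.meters) {
        newMeasurement.distance = { meters: measurement.meters };
    }
    if (measurement.seconds) {
        newMeasurement.time = { seconds: measurement.seconds };
    } else if (measurement.minutes) {
        newMeasurement.time = { minutes: measurement.minutes };
    } else if (measurement.hours) {
        newMeasurement.time = { hours: measurement.hours };
    }
    return newMeasurement;
}

// In the mongo shell the loop would then skip already-converted documents:
// db.events.find().forEach(function (event) {
//     var m = convertMeasurement(event.measurement);
//     if (m !== null) {
//         db.events.update({ _id: event._id }, { $set: { measurement: m } });
//     }
// });
```

The check lives in the script rather than in the query, so it protects the data even if the cursor returns a document a second time.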

Workaround (inflated cursor):

var cursor = db.events.find().limit(10000);

cursor.forEach(function(event) {
    var newMeasurement = {};
    var measurement = event.measurement;
    ...
    db.events.update({"_id": event._id}, {$set: {measurement: newMeasurement}});
});


So, all in all this is a very basic operation, but the cursor issue is bothersome.  Your input is much appreciated! Thank you.

Scott Hernandez

Jul 10, 2012, 9:32:32 AM
You want to use db.coll.find().snapshot().forEach(...)

A snapshotted cursor/query returns each document only once, even if the document moves around during the iteration.
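To see how "moving around" can produce exactly 101 extra iterations, here is a toy model in plain JavaScript (Node). It is not MongoDB's storage engine. It assumes, hypothetically, that the collection is scanned in storage order, that every `$set` makes a document outgrow its slot so it is rewritten at the end of the collection (roughly what the MMAPv1 engine of that era did for growing documents), and that after a first batch of 101 documents the next getMore returns everything left and closes the cursor. The function name `simulateScanWithUpdates` is made up for this illustration.

```javascript
// Toy model only: NOT MongoDB internals. Assumptions (hypothetical):
//  * documents are scanned in storage order;
//  * every update moves the (grown) document to the end of storage;
//  * the first batch holds `firstBatchSize` documents, the next getMore
//    returns everything left, and the cursor is then exhausted.
function simulateScanWithUpdates(docCount, firstBatchSize) {
  const storage = [];
  for (let i = 0; i < docCount; i++) {
    storage.push({ _id: i });
  }

  let scanPos = 0; // server-side position of the collection scan
  let exhausted = false;

  function fetchBatch(limit) {
    const batch = storage.slice(scanPos, scanPos + limit).map((d) => ({ ...d }));
    scanPos += batch.length;
    exhausted = scanPos >= storage.length; // server closes a finished cursor
    return batch;
  }

  function updateAndMove(id) {
    const idx = storage.findIndex((d) => d._id === id);
    const [doc] = storage.splice(idx, 1);
    if (idx < scanPos) {
      scanPos -= 1; // records after the removed slot shift left by one
    }
    storage.push(doc); // the grown document is relocated to the end
  }

  let visits = 0;
  const seen = new Set();
  let buffer = fetchBatch(firstBatchSize); // client-side buffered batch
  for (;;) {
    if (buffer.length === 0) {
      if (exhausted) break;
      buffer = fetchBatch(Infinity); // simplified getMore: "the rest"
    }
    const event = buffer.shift();
    visits += 1;
    seen.add(event._id);
    updateAndMove(event._id);
  }
  return { visits, distinct: seen.size };
}

console.log(simulateScanWithUpdates(1000, 101));   // { visits: 1101, distinct: 1000 }
console.log(simulateScanWithUpdates(1000, 10000)); // { visits: 1000, distinct: 1000 }
```

In this model the 101 documents buffered in the first batch are moved ahead of the scan position, so the getMore returns them again; with a first batch large enough to hold the whole collection (the inflated-limit workaround) nothing is revisited. A snapshot avoids the problem by not following storage order. As a later-version aside, `cursor.snapshot()` was deprecated in MongoDB 3.6 and removed in 4.0, where the manual points to `hint({ _id: 1 })` instead.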


Brad Karels

Jul 10, 2012, 10:05:36 AM
Ah!  Perfect - thank you very much for pointing out this feature!