MongoDB shell: nested iteration through cursors not executing

Gabor Orszaczky

Jun 11, 2017, 8:54:41 PM
to mongodb-user
I want to remove duplicates from a huge collection, by moving documents to another collection, and checking if a duplicate already exists. I'm running the following code in the Mongo shell:

var fromColl = db.from.find(),
fromDoc,
toColl,
toDoc;

//iterate through original collection
fromColl.forEach(function(fromObj){
    //find by name in target collection
    toColl = db.to.find({name: fromObj.name});
    
    if (toColl.length() == 0) {
        //no duplicates found in the target coll, insert
        db.to.insert(fromObj);
    } else {
        //possible duplicates found in the target coll
        toColl.forEach(function(toObj){ // <- not executed
            if (equal(fromObj.data, toObj.data)) {
                //duplicate, skip
            } else {
                //...
            }
        });
    }
});
Even if toColl has elements, the second forEach loop isn't executed. What am I doing wrong?

Rhys Campbell

Jun 12, 2017, 7:01:38 AM
to mongodb-user
Change

toColl = db.to.find({name: fromObj.name});

to...

var toColl = db.to.find({name: fromObj.name});

The first line doesn't function correctly so toColl.length() will always return 0.

Gabor Orszaczky

Jun 12, 2017, 7:44:37 AM
to mongodb-user
Actually toColl is already declared at the top, and it doesn't seem to make a difference if I declare it inside the first forEach function. 
Both ways toColl.length() will return the correct value, so the conditional part works as expected.

I found a workaround, and created an array of the second cursor: toColl = db.to.find({name: fromObj.name}).toArray(); and I iterated the array with a plain JS for loop.

So my problem is solved, but it would be good to know why the nested cursor.forEach() is not executed...

Stephen Steneker

Jun 12, 2017, 8:31:09 AM
to mongodb-user
On Monday, 12 June 2017 21:44:37 UTC+10, Gabor Orszaczky wrote:
Actually toColl is already declared at the top, and it doesn't seem to make a difference if I declare it inside the first forEach function. 
Both ways toColl.length() will return the correct value, so the conditional part works as expected.

I found a workaround, and created an array of the second cursor: toColl = db.to.find({name: fromObj.name}).toArray(); and I iterated the array with a plain JS for loop.

So my problem is solved, but it would be good to know why the nested cursor.forEach() is not executed...

Hi Gabor,

The mongo shell has some shortcuts for working with data in the shell. This is explained in more detail in the MongoDB documentation: Iterate a Cursor in the mongo Shell.

In particular:

 if the returned cursor is not assigned to a variable using the var keyword, then the cursor is automatically iterated up to 20 times to print up to the first 20 documents in the results.

In your code example the var declaration for toColl was prior to executing the find(). 

Iterating all the results with toArray() is a possible approach, but requires loading all documents returned by the cursor into RAM. Manually iterating the cursor is a more scalable approach.
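
For illustration, manual iteration might look like the sketch below. The eachDocument helper and the array-backed arrayCursor stand-in are hypothetical names that are not part of the shell; the only cursor methods relied on are hasNext() and next().

```javascript
// Calls handle() once per document, holding only the current document.
// Works with any object that exposes hasNext() and next().
function eachDocument(cursor, handle) {
    var seen = 0;
    while (cursor.hasNext()) {
        handle(cursor.next());
        seen += 1;
    }
    return seen;
}

// Hypothetical array-backed stand-in so the sketch also runs outside the shell.
function arrayCursor(docs) {
    var pos = 0;
    return {
        hasNext: function () { return pos < docs.length; },
        next: function () { return docs[pos++]; }
    };
}

var names = [];
var total = eachDocument(
    arrayCursor([{ name: "aaa" }, { name: "bbb" }]),
    function (doc) { names.push(doc.name); }
);

// In the mongo shell the call might look like:
//   eachDocument(db.to.find({name: "aaa"}), printjson);
```

Unlike toArray(), this never builds a full copy of the result set in memory.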

Regards,
Stennie

Gabor Orszaczky

Jun 12, 2017, 12:45:32 PM
to mongodb-user
Hi Stennie,

Thanks for your quick and detailed answer, I really appreciate it.
So I updated my code as follows:

var fromColl = db.from.find();

fromColl.forEach(function(fromObj){
  var toColl = db.to.find({name: fromObj.name});

  if (toColl.length() == 0) {
    db.to.insert(fromObj);
  } else {
    print(toColl.length());

    toColl.forEach(function(toObj){ // <- not executed
      print('inside forEach');
    });
  }
});

And while toColl.length() is printed correctly, the iteration still won't run.

Gabor Orszaczky

Jun 12, 2017, 1:04:30 PM
to mongodb-user
I'm attaching two files, one that inserts sample data, and a working full example with the failing iteration...
I had to rename the attachments from .js to ._js_ because otherwise it wouldn't post.
copytest._js_
inserttest._js_

Rhys Campbell

Jun 13, 2017, 10:58:42 AM
to mongodb-user
Usage of the length() function appears to be an issue.

Change...

if (!toColl.length())

to...

if (!toColl)

and 

print('duplicate? - found with same name: ' + toColl.length());


to...

print('duplicate? - found with same name: ' + toColl);

This reveals a new error...

total: 1 - inserted: 0 - skipped: 0
duplicate? - found with same name: DBQuery: test.toColl -> { "name" : "aaa" }
inside forEach
- toObj.name: aaa
2017-06-13T16:53:50.594+0200 E QUERY    [thread1] TypeError: arr1 is undefined :
equal@(shell):2:17
@(shell):16:5
DBQuery.prototype.forEach@src/mongo/shell/query.js:488:1
@(shell):14:1
DBQuery.prototype.forEach@src/mongo/shell/query.js:488:1
@(shell):1:1


Which is sorted by changing...

if (equal(fromObj.nutr, toObj.nutr))

to...

if (equal(fromObj, toObj))

You want to compare the whole object, right?

Your code then runs, but we end up with only 3 records, which I think you don't want...

mongos> db.toColl.find({});
{ "_id" : ObjectId("593fe737e0382abce8cb751e"), "name" : "aaa", "data" : [ { "id" : "a", "val" : 1 }, { "id" : "b", "val" : 2 } ] }
{ "_id" : ObjectId("593fe737e0382abce8cb7521"), "name" : "bbb", "data" : [ { "id" : "a", "val" : 1 }, { "id" : "b", "val" : 2 } ] }
{ "_id" : ObjectId("593fe737e0382abce8cb7524"), "name" : "ccc", "data" : [ { "id" : "a", "val" : 1 }, { "id" : "b", "val" : 2 } ] }


Your equal function needs looking at. I think the compare isn't working for the list.




Rhys Campbell

Jun 13, 2017, 11:03:09 AM
to mongodb-user
Yep, length() exhausts the cursor, so nothing is left to iterate afterwards...

mongos> var c = db.fromColl.find()
mongos> c
{ "_id" : ObjectId("593fe737e0382abce8cb751e"), "name" : "aaa", "data" : [ { "id" : "a", "val" : 1 }, { "id" : "b", "val" : 2 } ] }
{ "_id" : ObjectId("593fe737e0382abce8cb751f"), "name" : "aaa", "data" : [ { "id" : "a", "val" : 1 }, { "id" : "b", "val" : 2 } ] }
{ "_id" : ObjectId("593fe737e0382abce8cb7520"), "name" : "aaa", "data" : [ { "id" : "a", "val" : 2 }, { "id" : "b", "val" : 4 } ] }
{ "_id" : ObjectId("593fe737e0382abce8cb7521"), "name" : "bbb", "data" : [ { "id" : "a", "val" : 1 }, { "id" : "b", "val" : 2 } ] }
{ "_id" : ObjectId("593fe737e0382abce8cb7522"), "name" : "bbb", "data" : [ { "id" : "a", "val" : 2 }, { "id" : "b", "val" : 4 } ] }
{ "_id" : ObjectId("593fe737e0382abce8cb7523"), "name" : "bbb", "data" : [ { "id" : "a", "val" : 3 }, { "id" : "b", "val" : 6 } ] }
{ "_id" : ObjectId("593fe737e0382abce8cb7524"), "name" : "ccc", "data" : [ { "id" : "a", "val" : 1 }, { "id" : "b", "val" : 2 } ] }
{ "_id" : ObjectId("593fe737e0382abce8cb7525"), "name" : "ccc", "data" : [ { "id" : "a", "val" : 1 }, { "id" : "b", "val" : 2 } ] }
{ "_id" : ObjectId("593fe737e0382abce8cb7526"), "name" : "ccc", "data" : [ { "id" : "a", "val" : 2 }, { "id" : "b", "val" : 4 } ] }
mongos> c
mongos> var c = db.fromColl.find()
mongos> c.length()
9
mongos> c
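
One way to picture what the session above shows is a forward-only cursor whose length() works by reading every remaining document. The FakeCursor below is a simplified, hypothetical model written for this thread, not the shell's actual DBQuery source. It assumes, as I understand the legacy shell to behave, that length() is built on toArray().

```javascript
// Simplified stand-in for a shell cursor; NOT the real DBQuery implementation.
function FakeCursor(docs) {
    this._docs = docs;   // pretend result set
    this._pos = 0;       // read position, only ever moves forward
    this._arr = null;    // cache filled by toArray()
}

FakeCursor.prototype.hasNext = function () {
    return this._pos < this._docs.length;
};

FakeCursor.prototype.next = function () {
    return this._docs[this._pos++];
};

// Drains every remaining document into an array and caches it.
FakeCursor.prototype.toArray = function () {
    if (this._arr) { return this._arr; }
    var out = [];
    while (this.hasNext()) { out.push(this.next()); }
    this._arr = out;
    return out;
};

// Counting by draining: after this call hasNext() is false.
FakeCursor.prototype.length = function () {
    return this.toArray().length;
};

FakeCursor.prototype.forEach = function (fn) {
    while (this.hasNext()) { fn(this.next()); }
};

var cursor = new FakeCursor([{ name: "aaa" }, { name: "aaa" }]);
var counted = cursor.length();   // reports 2, but the cursor is now at its end
var visited = 0;
cursor.forEach(function () { visited += 1; });   // callback never runs
```

In this model counted is 2 while visited stays 0, which matches the symptom in the original question: the length prints correctly and the nested forEach body never executes.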



Gabor Orszaczky

Jun 13, 2017, 11:43:50 AM
to mongodb-user
Thanks Rhys, I'm really grateful for your help!

The problem was using
cursor.length()

instead of
cursor.count()


You are also right about the comparison, it was meant to compare the data arrays, and it should have been:
if (equal(fromObj.data, toObj.data))
according to the sample code.
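
For anyone finding this later, here is a rough sketch of the loop with count() in place of length(). Several parts are assumptions: arrayCursor and arrayCollection are hypothetical in-memory stand-ins so it runs outside the shell, the JSON-based equal() is only a placeholder for my own helper (which is not shown in this thread), and inserting when no equal document is found is just one possible way to fill in the "//..." branch.

```javascript
// Hypothetical array-backed stand-ins so the sketch runs outside the shell.
function arrayCursor(docs) {
    var pos = 0;
    return {
        count: function () { return docs.length; },   // leaves the position alone
        forEach: function (fn) {
            while (pos < docs.length) { fn(docs[pos++]); }
        }
    };
}

function arrayCollection() {
    var docs = [];
    return {
        docs: docs,
        find: function (query) {
            return arrayCursor(docs.filter(function (d) {
                return d.name === query.name;
            }));
        },
        insert: function (doc) { docs.push(doc); }
    };
}

// Placeholder comparison (an assumption); it is sensitive to key order.
function equal(a, b) {
    return JSON.stringify(a) === JSON.stringify(b);
}

function copyUnique(fromCursor, toColl) {
    var stats = { inserted: 0, skipped: 0 };
    fromCursor.forEach(function (fromObj) {
        var matches = toColl.find({ name: fromObj.name });
        var duplicate = false;
        if (matches.count() > 0) {               // count(), not length()
            matches.forEach(function (toObj) {   // cursor still has documents
                if (equal(fromObj.data, toObj.data)) { duplicate = true; }
            });
        }
        if (duplicate) {
            stats.skipped += 1;
        } else {
            toColl.insert(fromObj);              // assumed handling of "//..."
            stats.inserted += 1;
        }
    });
    return stats;
}

var source = arrayCursor([
    { name: "aaa", data: [{ id: "a", val: 1 }] },
    { name: "aaa", data: [{ id: "a", val: 1 }] },
    { name: "aaa", data: [{ id: "a", val: 2 }] }
]);
var target = arrayCollection();
var stats = copyUnique(source, target);
```

In the shell the call would presumably be copyUnique(db.from.find(), db.to). Keep in mind that count() there is a separate request to the server for every source document, so on a huge collection it may be cheaper to drop the count and just iterate the matches.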

Thanks again!