batchSize and missing results


Mike Bostock

Sep 2, 2011, 12:39:50 AM
to mongod...@googlegroups.com
If I use batchSize in conjunction with a query that is not entirely
satisfied by an index, then the cursor does not return all of the
expected results; instead, it only returns whatever fits in the first
batch. Is this expected behavior? If so, how do I work around this
limitation?

I have a collection with time and user_id fields. I want to filter by
time in a given range, and sort by user_id. As the query has a range
and sort on different dimensions, the query cannot be completely
satisfied by an index (such as {time: 1} or {user_id: 1}). If I issue
a query without a batch size, I retrieve the expected number of
results. In mongo version 1.8.3:

cursor = db.things.find({
  time: {
    $gte: new ISODate("2011-09-01"),
    $lt: new ISODate("2011-09-02")
  }
}).sort({
  user_id: 1
});

Then:

for (var i = 0; cursor.hasNext(); ++i) cursor.next();

After a while, this finishes, and `i` has a large value (many times my
desired batch size), equal to the expected count(). On the other hand,
if I set a batchSize:

cursor = db.things.find({
  time: {
    $gte: new ISODate("2011-09-01"),
    $lt: new ISODate("2011-09-02")
  }
}).sort({
  user_id: 1
}).batchSize(1000);

With batchSize, the same for loop ends at i = 980. The cursor's
hasNext() is false, so I can't retrieve any more results.

I can fix this by disabling batchSize, but that seems scary because
there could be many matching records. Is there a better way to issue a
query that includes a range and sort on different dimensions?

Thanks,
Mike
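
The comparison described in this message can be scripted as a starting point for a reproducible test case. This is only a minimal sketch: `countByIteration` and `compareBatchSizes` are hypothetical helper names, not part of the shell, and the code assumes only the cursor methods already used above (`find`, `sort`, `batchSize`, `count`, `hasNext`, `next`).

```javascript
// Count documents by walking the cursor, as the for loop in the post does.
function countByIteration(cursor) {
  var n = 0;
  while (cursor.hasNext()) {
    cursor.next();
    n++;
  }
  return n;
}

// Hypothetical helper: run the same query with and without batchSize and
// report both iterated counts next to count() for the same filter.
function compareBatchSizes(collection, query, sortSpec, batchSize) {
  return {
    expected: collection.find(query).count(),
    withoutBatchSize: countByIteration(
      collection.find(query).sort(sortSpec)),
    withBatchSize: countByIteration(
      collection.find(query).sort(sortSpec).batchSize(batchSize))
  };
}
```

In the mongo shell this might be called as `printjson(compareBatchSizes(db.things, query, {user_id: 1}, 1000))`, where `query` is the time-range filter from the post. If `withBatchSize` comes out smaller than the other two numbers, the batchSize case is dropping results.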

Kyle Banker

Sep 2, 2011, 10:49:26 AM
to mongod...@googlegroups.com
You may be encountering a bug...but I'm still unclear about why you need to add batchSize. If you want to limit the records returned, just use limit().

Mike Bostock

Sep 2, 2011, 12:12:57 PM
to mongod...@googlegroups.com
> You may be encountering a bug...but I'm still unclear about why you need to
> add batchSize. If you want to limit the records returned, just use limit().

I don't want to limit the number of records returned; I just don't
want to load them all into memory at once because there could be lots
of them. (On the client, I'm doing some stream-based processing.)

Mike

Kyle Banker

Sep 2, 2011, 1:33:54 PM
to mongod...@googlegroups.com
Cursors always fetch result sets iteratively, so if the result set is large, it won't all be loaded into memory anyway.

If you run mongod with the -vvvvv flag, you'll see all the operations. When you see a 'getmore' operation, that's a cursor fetching the next set of results.
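
The iterative fetching described here can be pictured with a small model. This is a conceptual sketch only, not the shell's or any driver's actual implementation: `BatchedCursor` and `fetchBatch` are hypothetical names, and `fetchBatch(offset, size)` merely stands in for the initial query and the later 'getmore' round trips.

```javascript
// Conceptual model only; NOT how the shell or a driver is implemented.
// fetchBatch(offset, size) is a hypothetical stand-in for one round trip
// to the server and returns an array of at most `size` documents.
function BatchedCursor(fetchBatch, batchSize) {
  this.fetchBatch = fetchBatch;
  this.batchSize = batchSize;
  this.buffer = [];        // documents from the current batch only
  this.offset = 0;         // how many documents have been fetched so far
  this.exhausted = false;  // true once the model decides nothing is left
  this.roundTrips = 0;     // number of simulated query/getmore calls
}

BatchedCursor.prototype.hasNext = function () {
  // Fetch the next batch only when the local buffer has been drained.
  if (this.buffer.length === 0 && !this.exhausted) {
    var batch = this.fetchBatch(this.offset, this.batchSize);
    this.roundTrips++;
    this.offset += batch.length;
    this.buffer = batch;
    // Simplifying assumption: a short batch means the result set is done.
    // The real server signals exhaustion in its reply instead.
    if (batch.length < this.batchSize) this.exhausted = true;
  }
  return this.buffer.length > 0;
};

BatchedCursor.prototype.next = function () {
  if (!this.hasNext()) throw new Error("no more documents");
  return this.buffer.shift();
};
```

In this model the client never holds more than one batch in memory, which is the point being made above: a large result set is consumed batch by batch rather than loaded all at once.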

Mike Bostock

Sep 2, 2011, 1:44:23 PM
to mongod...@googlegroups.com
> Cursors always fetch result sets iteratively, so if the result set is large,
> it won't all be loaded into memory anyway.

Right; in effect I'd be using the default batchSize of 4MB. Should I
file a bug in JIRA for the missing results with smaller batch sizes? I
can try to create a reproducible test case.

Mike

Kyle Banker

Sep 22, 2011, 11:42:36 AM
to mongod...@googlegroups.com
Yes, please file a bug if you're still seeing issues.

- Kyle