I have a collection with time and user_id fields. I want to filter by
time in a given range, and sort by user_id. As the query has a range
and sort on different dimensions, the query cannot be completely
satisfied by an index (such as {time: 1} or {user_id: 1}). If I issue
a query without a batch size, I retrieve the expected number of
results. In mongo version 1.8.3:
    cursor = db.things.find({
        time: {
            $gte: new ISODate("2011-09-01"),
            $lt: new ISODate("2011-09-02")
        }
    }).sort({
        user_id: 1
    });
Then:
    for (var i = 0; cursor.hasNext(); ++i) cursor.next();
After a while, this finishes, and `i` has a large value (many times my
desired batch size), equal to the expected count(). On the other hand,
if I set a batchSize:
    cursor = db.things.find({
        time: {
            $gte: new ISODate("2011-09-01"),
            $lt: new ISODate("2011-09-02")
        }
    }).sort({
        user_id: 1
    }).batchSize(1000);
With batchSize, the same for loop ends at i = 980. The cursor's
hasNext() is false, so I can't retrieve any more results.
I can fix this by disabling batchSize, but that seems scary because
there could be many matching records. Is there a better way to issue a
query that includes a range and sort on different dimensions?
Thanks,
Mike
I don't want to limit the number of records returned; I just don't
want to load them all into memory at once because there could be lots
of them. (On the client, I'm doing some stream-based processing.)
Mike
Right; in effect I'd be using the default batch limit of 4MB. Should I
file a bug in JIRA for the missing results with smaller batch sizes? I
can try to create a reproducible test case.
Mike
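
A minimal sketch of what such a test case might look like, written for the
mongo shell. The collection name batch_repro, the document count, and the
helper names drain and compareCounts are assumptions made for illustration.
I have not confirmed that this data set triggers the short result; it only
sets up the same range-plus-sort query and compares the two counts.

```javascript
// Count how many documents a cursor yields. Works with any object that
// exposes hasNext() and next(), such as a mongo shell cursor.
function drain(cursor) {
    var n = 0;
    while (cursor.hasNext()) {
        cursor.next();
        ++n;
    }
    return n;
}

// Iterate the same query twice, once without and once with a batch size,
// and report both counts. makeCursor must return a fresh cursor per call.
function compareCounts(makeCursor, batchSize) {
    var unbatched = drain(makeCursor());
    var batched = drain(makeCursor().batchSize(batchSize));
    return {
        unbatched: unbatched,
        batched: batched,
        match: unbatched === batched
    };
}

// This part only runs inside the mongo shell, where `db` is defined.
if (typeof db !== "undefined") {
    var coll = db.batch_repro;  // hypothetical scratch collection
    coll.drop();

    // Assumed data: 50000 documents, one per second, all inside the
    // queried day, with user_id values in a different order than time.
    var start = new ISODate("2011-09-01").getTime();
    for (var i = 0; i < 50000; ++i) {
        coll.insert({ time: new Date(start + i * 1000), user_id: i % 5000 });
    }
    coll.ensureIndex({ time: 1 });

    printjson(compareCounts(function () {
        return coll.find({
            time: {
                $gte: new ISODate("2011-09-01"),
                $lt: new ISODate("2011-09-02")
            }
        }).sort({ user_id: 1 });
    }, 1000));
}
```

If the two counts differ, the printed object has match: false, and the
script plus its output could be attached to the JIRA ticket.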