I have a large collection (~165 million docs) with an index that my
app continuously queries against. My queries are performing fairly
well, but there's one variable that I think could use some tuning:
cursor batch size.
According to one of Kristina's comments on
http://groups.google.com/group/mongodb-user/browse_thread/thread/f4fd8169d70c01e4/6803d3b8b51364fc,
the default batch size is min(100 docs, 4MB). My docs are small (~156
bytes each), but some queries can return over 70,000 of them.
Assuming that fewer network round-trips are better, I upped my batch
size to 70,000, a little over 10MB per batch. Subsequent testing
revealed a decrease in performance.
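For reference, here's roughly what that change looks like (a minimal
sketch assuming PyMongo; the db/collection names and query are
placeholders, and the batch_size() call is the relevant part):

    from pymongo import MongoClient

    client = MongoClient()                    # defaults to localhost:27017
    coll = client["mydb"]["mycollection"]     # placeholder db/collection

    # Ask the server to return up to 70,000 docs per batch; at
    # ~156 bytes/doc that works out to a little over 10MB per batch.
    cursor = coll.find({"user_id": 12345}).batch_size(70000)
    for doc in cursor:
        pass  # process each small (~156-byte) document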
Then, I read
http://www.mongodb.org/display/DOCS/Database+Profiler.
Regarding the reslen metric, it says, "A large number of bytes
returned (hundreds of kilobytes or more) causes slow performance."
That prompted me to revisit the min(100 docs, 4MB) default, which
suggests that a batch size way under 4MB is better.
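(For what it's worth, this is roughly how I've been looking at reslen
via the profiler; again a sketch assuming PyMongo, with the db name as
a placeholder. The field is reslen on older servers and
responseLength on newer ones.)

    from pymongo import MongoClient

    client = MongoClient()
    db = client["mydb"]                       # placeholder db name

    db.command("profile", 2)                  # profile all operations

    # ... run the queries under test ...

    # Inspect response size and timing of the most recent operations.
    for op in db["system.profile"].find().sort("ts", -1).limit(10):
        size = op.get("reslen", op.get("responseLength"))
        print(op.get("op"), size, op.get("millis"))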
Thus, before doing more batch size tweaking and testing, I'd like to
know:
1) After ensuring that indexes are used properly, can batch size be a
large factor in query performance?
2) Is it better to limit network round-trips or reslen?
3) Scott Hernandez commented on
http://groups.google.com/group/mongodb-user/browse_thread/thread/f4fd8169d70c01e4/6803d3b8b51364fc,
saying, "It is also possible that you could optimize the batch sizes
used for your queries (cursor), depending on how large your documents
are, and how many you expect (less network round-trips are better)."
Could there be a magical ratio of total number of docs returned to
batch size? One of the nice things about my queries is that I can
very accurately predict how many docs will be returned, so adjusting
the batch size accordingly is feasible (see the sketch after these
questions).
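To make that last question concrete, here's the kind of adjustment I
have in mind (a sketch assuming PyMongo; expected_count comes from my
app, and the 0.25 ratio is just a stand-in for whatever the "magical"
value might turn out to be):

    from pymongo import MongoClient

    client = MongoClient()
    coll = client["mydb"]["mycollection"]     # placeholder db/collection

    def run_query(query, expected_count, ratio=0.25):
        # Size each batch as a fraction of the predicted result count,
        # e.g. ratio=0.25 means roughly four round-trips per query.
        batch = max(1, int(expected_count * ratio))
        return coll.find(query).batch_size(batch)

    # A query I know returns ~70,000 docs would use batches of ~17,500.
    docs = list(run_query({"user_id": 12345}, expected_count=70000))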