Searching for the optimal batchSize


Allen

Dec 16, 2010, 2:20:55 PM
to mongodb-user
I have a large collection (~165 million docs) with an index that my
app continuously queries against. My queries are performing fairly
well, but there's one variable that I think could use some tuning:
cursor batch size.

According to one of Kristina's comments on
http://groups.google.com/group/mongodb-user/browse_thread/thread/f4fd8169d70c01e4/6803d3b8b51364fc,
the default batch size is min(100 docs, 4MB). My docs are small (~156
bytes each), but some queries can return over 70,000 of them.
Assuming that fewer network round-trips are better, I upped my batch
size to 70,000, a little over 10MB per batch. Subsequent testing
revealed a decrease in performance.
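To make the numbers above concrete, here is a small C++ sketch of the arithmetic. It computes the size of one batch and how many replies (round trips) a 70,000-doc result needs under the min(100 docs, 4MB) rule as quoted above. It assumes every doc is exactly 156 bytes and treats "4MB" as 4 MiB; the function names are hypothetical, and it models only the arithmetic, not how any driver or server actually sizes its batches.

```cpp
#include <cassert>
#include <cstdint>

// Bytes in one full batch, assuming every document is exactly docBytes long
// (the post says the docs are ~156 bytes each).
std::uint64_t batchBytes(std::uint64_t batchDocs, std::uint64_t docBytes) {
    return batchDocs * docBytes;
}

// Replies needed to return totalDocs when each reply carries at most
// batchDocs documents AND at most maxReplyBytes bytes, i.e. the
// "min(docs, bytes)" rule as quoted in the post. docBytes must be > 0.
std::uint64_t repliesNeeded(std::uint64_t totalDocs, std::uint64_t docBytes,
                            std::uint64_t batchDocs, std::uint64_t maxReplyBytes) {
    std::uint64_t docsUnderByteCap = maxReplyBytes / docBytes;
    std::uint64_t perReply = batchDocs < docsUnderByteCap ? batchDocs : docsUnderByteCap;
    if (perReply == 0) {
        perReply = 1;  // guard: avoid dividing by zero if one doc exceeds the cap
    }
    return (totalDocs + perReply - 1) / perReply;  // ceiling division
}
```

With 100-doc batches, repliesNeeded(70000, 156, 100, 4 * 1024 * 1024) comes to 700 round trips of about 15.6KB each, while a 70,000-doc batch is about 10.9MB; if a 4 MiB per-reply cap applied, that request would still be split into 3 replies.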

Then, I read http://www.mongodb.org/display/DOCS/Database+Profiler.
Regarding the reslen metric, it says, "A large number of bytes
returned (hundreds of kilobytes or more) causes slow performance."
That prompted me to revisit the min(100 docs, 4MB) default: with docs
this small, 100 docs is far below 4MB, which suggests that a batch size
well under 4MB is better.

Thus, before doing more batch size tweaking and testing, I'd like to
know:
1) After ensuring that indexes are used properly, can batch size be a
large factor in query performance?
2) Is it better to limit network round-trips or reslen?
3) Scott Hernandez commented on
http://groups.google.com/group/mongodb-user/browse_thread/thread/f4fd8169d70c01e4/6803d3b8b51364fc,
saying, "It is also possible that you could optimize the batch sizes
used for your queries (cursor), depending on how large your documents
are, and how many you expect (less network round-trips are better)."
Could there be a magical ratio of total number of docs returned to
batch size? One of the nice things about my queries is that I can
very accurately predict how many docs will be returned, so adjusting
the batch size accordingly is feasible.
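Since the result size can be predicted, one way to act on question 3 is sketched below. This is a hypothetical heuristic, not an answer given in this thread: cap each reply at a byte budget (for example 256KB, loosely inspired by the profiler's "hundreds of kilobytes" note; the budget is an assumption to be tuned by testing), then spread the predicted result evenly over the fewest replies that respect it. The name chooseBatchSize is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical heuristic: keep every reply under replyBudgetBytes and split
// the predicted result evenly across the fewest replies that allow that.
// docBytes must be > 0.
std::uint64_t chooseBatchSize(std::uint64_t predictedDocs, std::uint64_t docBytes,
                              std::uint64_t replyBudgetBytes) {
    if (predictedDocs == 0) {
        return 1;  // nothing to fetch; any positive batch size works
    }
    std::uint64_t maxDocsPerReply = replyBudgetBytes / docBytes;
    if (maxDocsPerReply == 0) {
        maxDocsPerReply = 1;  // a single doc larger than the budget
    }
    // Fewest replies that keep each one within the budget.
    std::uint64_t replies = (predictedDocs + maxDocsPerReply - 1) / maxDocsPerReply;
    // Even split; ceil(predicted / replies) never exceeds maxDocsPerReply.
    return (predictedDocs + replies - 1) / replies;
}
```

For 70,000 predicted docs of 156 bytes and a 256KB budget, this gives 42 replies of 1,667 docs each instead of 41 full batches plus one small leftover.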

Allen

Dec 16, 2010, 2:22:53 PM
to mongodb-user
I probably should have mentioned that I'm using mongo-2.3.jar (the Java
driver) with v1.7.3 of the database.

dwight_10gen

Dec 16, 2010, 8:36:30 PM
to mongodb-user
In the C++ driver, you could use Exhaust mode for a query if you know
you want everything.

From client/dbclient.h:

    /** Stream the data down full blast in multiple "more" packages, on the
        assumption that the client will fully read all data queried. Faster
        when you are pulling a lot of data and know you want to pull it all
        down. Note: it is not allowed to not read all the data unless you
        close the connection.

        Use the query( boost::function<void(const BSONObj&)> f, ... ) version
        of the connection's query() method, and it will take care of all the
        details for you.
    */
    QueryOption_Exhaust = 1 << 6,

The same can theoretically be done in any driver, although most don't
support it yet.
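Why exhaust mode helps can be shown with a deliberately simplified cost model. This is a sketch under stated assumptions, not a measurement: every request/response exchange costs one round-trip time, the payload moves at a fixed bandwidth however it is batched, and server-side work and client-side BSON decoding are ignored. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model (assumption): payload time depends only on total bytes
// and a fixed bandwidth, not on how the bytes are batched.
double transferMs(std::uint64_t totalBytes, double bytesPerMs) {
    return static_cast<double>(totalBytes) / bytesPerMs;
}

// Ordinary cursor: the initial query plus one getMore per extra batch,
// each waiting a full round trip before the next batch is requested.
double getMoreMs(std::uint64_t replies, std::uint64_t totalBytes,
                 double rttMs, double bytesPerMs) {
    return static_cast<double>(replies) * rttMs + transferMs(totalBytes, bytesPerMs);
}

// Exhaust mode: one request, after which the server streams all the "more"
// packages back-to-back, so only one round trip is paid in this model.
double exhaustMs(std::uint64_t totalBytes, double rttMs, double bytesPerMs) {
    return rttMs + transferMs(totalBytes, bytesPerMs);
}
```

In this model the saving is (replies - 1) round trips, so it grows with both the number of batches and the client/server latency; with a single batch the two approaches cost the same.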




dwight_10gen

Dec 16, 2010, 8:36:47 PM
to mongodb-user
What is your latency between client and server in your actual
deployment?



dwight_10gen

Dec 16, 2010, 8:37:35 PM
to mongodb-user
What sort of query were you doing? A bulk db.collection.find() of all
records, or one with a condition? How many records and bytes per second
were you achieving?



Scott Hernandez

Dec 16, 2010, 9:27:27 PM
to mongod...@googlegroups.com
Most drivers also have a toArray/List method on the cursor, which will
do the same.

Javascript shell:
db.coll.find().limit(10).toArray()


Allen

Jan 4, 2011, 11:23:52 AM
to mongodb-user
Should be pretty low; communication is between two AWS EC2 instances
running in the same availability zone.

Allen

Jan 4, 2011, 11:32:54 AM
to mongodb-user
I'm querying with a condition, and I'm averaging about 5900 records
per second (~920,400 bytes/sec).

By the way, sorry for the super-slow responses; I got caught up in
some other things. Thanks in advance for your help!
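These figures can be checked with a little arithmetic: 5900 records/sec × 156 bytes = 920,400 bytes/sec, matching the post. The sketch below also estimates what share of a 70,000-doc query's time could be spent just waiting on round trips. The 700-round-trip count (70,000 docs in 100-doc batches) and the 0.5 ms same-zone round-trip time are assumptions for illustration, not measurements from this thread, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Observed in this thread: ~5900 records/s at ~156 bytes per record.
constexpr std::uint64_t kDocsPerSec = 5900;
constexpr std::uint64_t kDocBytes = 156;

constexpr std::uint64_t bytesPerSec(std::uint64_t docsPerSec, std::uint64_t docBytes) {
    return docsPerSec * docBytes;
}

// Seconds to drain a result of totalDocs at a given observed rate.
double secondsForResult(std::uint64_t totalDocs, std::uint64_t docsPerSec) {
    return static_cast<double>(totalDocs) / static_cast<double>(docsPerSec);
}

// Fraction of totalSec spent purely on round trips, for an ASSUMED
// round-trip time rttSec supplied by the caller (not measured here).
double roundTripShare(std::uint64_t replies, double rttSec, double totalSec) {
    return static_cast<double>(replies) * rttSec / totalSec;
}
```

At the observed rate, 70,000 docs take roughly 11.9 s; under the assumed 0.5 ms round trip, 700 round trips would add about 0.35 s, around 3% of that. If the assumption holds, round trips would not be the dominant cost at this throughput.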

Eliot Horowitz

Jan 4, 2011, 11:51:24 AM
to mongod...@googlegroups.com
I think the defaults are probably going to work well for you.
You should also look at the 2.4 version of the Java driver, as there
are some BSON deserialization speed improvements.
