Real-world document size vs. performance


Dave South

Jul 30, 2010, 8:09:25 PM
to Mongoid
I've read Kyle Banker's suggestions for Mongoid on the MongoDB
website:

http://www.mongodb.org/display/DOCS/Using+Mongoid

He said that documents larger than 100KB become a problem with Mongo.
However, Durran has said in some of his writings that they wanted to
work well with documents over 500KB.

Banker's comments carry a huge amount of weight. But Durran has used
MongoDB in a large deployment. Is 100KB a serious problem? Does
pushing the doc size to 500KB really slow things down?

What about pushing a bit beyond 100KB to something like 150KB or
200KB?

Is it simply a RAM issue or does it bog down MongoDB?

What about that 4MB doc limit? Is that just an illusion, like the 64TB
of RAM my 64-bit OS can theoretically address?

Durran Jordan

Jul 30, 2010, 9:14:37 PM
to mon...@googlegroups.com
Well, first of all, Kyle knows what he is talking about with respect
to anything MongoDB more than I do, so if he says 500k is a limit to
try not to go over, then you probably shouldn't go over it.

That being said, we've worked with docs over 500k in size and not seen
any performance problems. But our metrics were always against MySQL or
Postgres - not against a better-tuned MongoDB.

I'm more than happy to run metrics, but I'm kind of bogged down with
getting Mongoid tip-top for the RC and then the official release, so for
right now I would stick with the 500k limit, especially since Kyle will
beat me up if I tell you to go over it. :)

Kyle Banker

Jul 30, 2010, 9:38:09 PM
to mon...@googlegroups.com
The only thing I would say is to prototype for your use case.

There are a few things to keep in mind regarding large documents (e.g., > 500k):

1. If you're doing a full-document, replace-style update, that entire 500k needs to be serialized and sent across the wire. This could get expensive on an update-heavy deployment.

2. Same goes for queries. If you're pulling back 500k at a time, that has to go across the network and be deserialized on the driver side.

3. While most atomic updates happen in place, a growing document still has to be rewritten on the server, as dictated by the BSON format. If you're doing lots of $push operations on a very large document, that document will have to be rewritten server-side, which, again, on a heavy deployment, could get expensive.

4. If an inner document is frequently manipulated on its own, it can be less computationally expensive, both client-side and server-side, simply to store that "many" relationship in its own collection. It's also frequently easier to manipulate the "many" side of a relationship when it's in its own collection (a sketch of both approaches follows below).
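
To make point 4 concrete, a rough sketch of the two modeling choices might look like this. Treat it as an illustration only: the Post/Comment models, field names, and connection settings are made up, and the syntax is recent Mongoid rather than the releases being discussed in this thread.

require "mongoid"

# Hypothetical connection settings for a local prototype database.
Mongoid.configure do |config|
  config.clients.default = { hosts: ["localhost:27017"], database: "prototype_db" }
end

# Variant A: everything embedded. Comments live inside the post document,
# so the parent grows with every comment, and reads or replace-style
# writes always move the whole document.
module Embedded
  class Post
    include Mongoid::Document
    field :title, type: String
    embeds_many :comments, class_name: "Embedded::Comment"
  end

  class Comment
    include Mongoid::Document
    field :body, type: String
    embedded_in :post, class_name: "Embedded::Post"
  end
end

# Variant B: the "many" side in its own collection. Each comment can be
# created, queried, and updated on its own, and the parent stays small.
module Referenced
  class Post
    include Mongoid::Document
    field :title, type: String
    has_many :comments, class_name: "Referenced::Comment"
  end

  class Comment
    include Mongoid::Document
    field :body, type: String
    belongs_to :post, class_name: "Referenced::Post"
  end
end

With variant B, adding a comment is an insert into its own collection rather than a rewrite of a half-megabyte parent document, which is the trade-off points 3 and 4 are describing.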

If going embedded all the way works for your use case, then there's probably no problem with it. But with these extra-large documents and a heavy load, you may start to see consequences in terms of performance and/or manipulability.

What I'd recommend to Dave, and I think Durran would agree, is to do some serious prototyping and benchmarking, and let the results of that investigation determine his course.
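
For concreteness, a quick-and-dirty benchmark along those lines might look something like this sketch. It uses the plain Ruby mongo driver (recent syntax, not Mongoid), and the collection name, document shape, sizes, and iteration counts are all invented for illustration.

require "mongo"
require "benchmark"
require "securerandom"

client = Mongo::Client.new(["127.0.0.1:27017"], database: "doc_size_test")
posts  = client[:posts]
posts.drop

# Seed one document of roughly 500k: 500 comments of ~1KB each.
doc = { _id: 1, title: "large post",
        comments: Array.new(500) { { body: SecureRandom.hex(512) } } }
posts.insert_one(doc)

Benchmark.bm(22) do |bm|
  # Point 1: a replace-style update serializes and ships the whole document.
  bm.report("full-document replace") do
    100.times do
      doc[:comments] << { body: SecureRandom.hex(512) }
      posts.replace_one({ _id: 1 }, doc)
    end
  end

  # Point 3: an atomic $push only sends the new element over the wire,
  # though the server may still have to rewrite the growing document.
  bm.report("atomic $push") do
    100.times do
      posts.update_one({ _id: 1 },
                       { "$push" => { comments: { body: SecureRandom.hex(512) } } })
    end
  end

  # Point 2: every read of the parent pulls the full document back across
  # the network and deserializes it in the driver.
  bm.report("full-document read") do
    100.times { posts.find(_id: 1).first }
  end
end

The interesting part isn't the absolute numbers but how the three timings diverge as the document grows past a few hundred kilobytes.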

Dave South

Jul 30, 2010, 11:20:59 PM
to Mongoid
Thank you for the excellent suggestions and clarifications. Our use
case is to replace a large, hairy join of SQL data. I suspect this is
where Durran sees the real benefit of large documents compared to
pulling together dozens of tables.

We will work on a few prototype ideas and see how they perform.

It would be extremely interesting to see the metrics of Durran's
real-world performance. BUT I would never want to slow development on
Mongoid. It's amazing how far it's come and how well it works. You
really are making a difference in the community. Honestly, I think
that MongoDB and Mongoid are the next big thing in the Rails
community.

Thank you Durran, Jacques, Kyle and everyone working on this amazing
project.