Hello Charlie,
To get the total size, you could add up the document sizes using either $where or map-reduce, together with the shell method Object.bsonsize(doc).
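As a rough illustration of the map-reduce route, the sketch below sums Object.bsonsize() per user. The field name "user" and the collection name "docs" are assumptions made for the example; they do not come from your schema.

```javascript
// Sketch for the mongo shell. The "user" field and the "docs" collection
// are hypothetical names chosen for illustration.

// Map: emit each document's BSON size, keyed by its owner.
// Object.bsonsize() is provided by the mongo shell / server-side JavaScript.
var mapSize = function () {
  emit(this.user, Object.bsonsize(this));
};

// Reduce: add up all sizes emitted for one user.
var reduceSize = function (user, sizes) {
  var total = 0;
  for (var i = 0; i < sizes.length; i++) {
    total += sizes[i];
  }
  return total;
};

// In the shell this could be run as, for example:
//   db.docs.mapReduce(mapSize, reduceSize, { out: { inline: 1 } });
```

Passing a query option to mapReduce would restrict the total to the documents matching that query.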
These solutions, however, rely on JavaScript and may not be very efficient. If you can restructure your documents, a better solution may be to store each document's size with the document. The Java driver's encode() method takes a BSONObject and returns a byte array (http://api.mongodb.org/java/current/), whose length you could calculate before insertion and then store as the document's size. The advantage of this strategy is that you can get the total size using the aggregation framework (included in v2.2), which is written in C++ and may perform much better.
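A minimal sketch of such an aggregation, assuming each document was saved with a numeric "size" field (its encoded length in bytes) and a "user" field. Both field names and the "docs" collection are hypothetical.

```javascript
// Sketch only. "size", "user" and "docs" are assumed names, not part of
// the original question.

// Build a pipeline that totals the stored sizes of all documents
// matching `query`, grouped by user.
function sizePipeline(query) {
  return [
    { $match: query },
    {
      $group: {
        _id: "$user",
        totalBytes: { $sum: "$size" }, // sum of the stored per-document sizes
        docs: { $sum: 1 }              // number of documents counted
      }
    }
  ];
}

// In the mongo shell (v2.2 or later), for example:
//   db.docs.aggregate(sizePipeline({ user: "charlie" }));
```

Putting $match first lets the server filter documents before grouping, so only the matching ones contribute to the total.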
Here is another link that you may find useful:
As for your second question about index size, you're correct: the size of an index entry is approximately the key size plus some overhead.
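To make that estimate concrete, here is a small sketch of the arithmetic. The per-entry overhead is a parameter because I am not quoting an exact figure; the 20 bytes used in the example is purely a placeholder, not a measured value.

```javascript
// Rough index size estimate: entries * (key bytes + per-entry overhead).
// overheadBytes is an assumption to be calibrated against real data,
// for example by comparing with db.docs.stats().indexSizes in the shell
// ("docs" being a hypothetical collection name).
function estimateIndexBytes(entryCount, keyBytes, overheadBytes) {
  return entryCount * (keyBytes + overheadBytes);
}

// Example with placeholder numbers: one million 8-byte keys and a
// hypothetical 20 bytes of overhead per entry.
var estimate = estimateIndexBytes(1000000, 8, 20); // 28000000 bytes
```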
On Friday, October 5, 2012 4:30:41 PM UTC-4, Charlie Mason wrote:
Hi All,
I would like to calculate the size of particular documents in a mongo collection. Since the collection will contain many documents created by different users, I would like to know how large each user's documents are. I would ideally like to avoid having a collection per user, as ultimately the collection may be sharded, since a few users may exceed the capacity of one mongod node.
Ideally I would like to calculate the total size of all docs that match a particular query. I would like this to be as efficient as possible, but it could be done as a batch job if it might take a few minutes to perform. I appreciate that the size on disk will be larger because of padding and compaction; it's the size of the data itself I am after.
If it can't be done at the DB level, can I do it when I write to the collection via the Java driver and Morphia? Is there some Java code that I can use with Morphia to get the size of a document's data?
I would also like to calculate the size of any indexes on fields in a document. Is there any way to estimate how much space they will consume? Is it just a case of storing the field's value a second time, plus some constant overhead, presumably?
It doesn't matter if any of these calculations are off by a byte or two; I just want to be able to calculate rough levels of usage.
Thanks,
Charlie M