How to calculate the storageSize?


KCOtzen

Dec 23, 2011, 3:58:04 PM
to mongodb-user
I'm using MongoDB and we are really happy with it. But recently
our client asked us to project the future size of the database.

We know how to calculate this for a typical relational database, but we
don't have much production experience with this NoSQL database.

What we know:

db.namecollections.stats() gives us important information such as
size (documents), avgObjSize (documents), storageSize, and totalIndexSize
(more here)

With "size" and "totalIndexSize" we can calculate the total size
of the collection only, but the big question is:

Why is there a difference between the collection size and
storageSize?

How can we calculate storageSize when estimating the future
database size?

Tyler Brock

Dec 23, 2011, 5:01:48 PM
to mongodb-user

Madhava

Dec 24, 2011, 12:36:41 PM
to mongodb-user
It also depends on whether you plan to use sharding, replication, or both.
Do you plan to add indexes in the future?
How fast is the application writing data into the system (MongoDB)?
It depends on quite a number of things.

The estimates that Tyler recommended are good enough to
start with.
Make sure that you also have automated alerts for when the thresholds
are reached.
That way you can proactively increase the size or
think about sharding, etc.
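
The alerting idea can be sketched as a simple ratio check. This is a minimal illustration in plain JavaScript, not a MongoDB feature: the function name `checkUsage` and the 70% / 90% thresholds are assumptions chosen for the example, and the byte counts would come from your own monitoring (for instance storageSize plus totalIndexSize against the disk capacity).

```javascript
// Hypothetical helper: classify disk usage against two thresholds.
// warnRatio and criticalRatio are illustrative defaults, not recommendations.
function checkUsage(usedBytes, capacityBytes, warnRatio = 0.7, criticalRatio = 0.9) {
  if (capacityBytes <= 0) {
    throw new Error("capacityBytes must be positive");
  }
  const ratio = usedBytes / capacityBytes;
  if (ratio >= criticalRatio) return "critical"; // act now: add disk or shard
  if (ratio >= warnRatio) return "warn";         // start planning
  return "ok";
}

// Example: 75 GB used on a 100 GB volume.
console.log(checkUsage(75e9, 100e9));
```

Two levels are used so that the first alert leaves time to plan before the second one fires.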





KCOtzen

Dec 26, 2011, 12:31:28 PM
to mongodb-user
On Dec 24, 14:36, Madhava <madhava.gullapa...@gmail.com> wrote:
> It also depends on if you plan to do sharding or replication or both.
> Do you plan to add indexes in the future ?
> How fast is the application filling data in to the system (mongodb) ?
> It depends on quite a number of things.
>

I'll use replication and sharding in the future; currently there is
just one node.

> The estimates that Tyler has recommended are quite good enough to
> start with.

You mean "an insane upper bound and then quadruple it"? Yes, I agree
too.

> Make sure that you also have some automated alerts when the thresholds
> are reached.
> This way you could proactively take action on increasing the size or
> think about sharding or etc..

Thanks for your tips. I'll keep them in mind.

But my question is more basic (well, I think so, maybe
not): what calculation does MongoDB use to allocate
storageSize (the value from .stats())? Look at these values.

I created just one document, and storageSize is 4096:
> db.bitacora3.stats()
{
    "ns" : "mc.bitacora3",
    "count" : 1,
    "size" : 36,
    "avgObjSize" : 36,
    "storageSize" : 4096,
    "numExtents" : 1,
    "nindexes" : 1,
    "lastExtentSize" : 4096,
    "paddingFactor" : 1,
    "flags" : 1,
    "totalIndexSize" : 8176,
    "indexSizes" : {
        "_id_" : 8176
    },
    "ok" : 1
}

Then I dropped the database, created it again, and created
a new collection with just one document BUT with an additional index.

> db.dropDatabase()
{ "dropped" : "mc", "ok" : 1 }
> use mc
switched to db mc

This is the stats result:

> db.bitacora3.stats()
{
    "ns" : "mc.bitacora3",
    "count" : 1,
    "size" : 36,
    "avgObjSize" : 36,
    "storageSize" : 8192,
    "numExtents" : 1,
    "nindexes" : 2,
    "lastExtentSize" : 8192,
    "paddingFactor" : 1,
    "flags" : 1,
    "totalIndexSize" : 16352,
    "indexSizes" : {
        "_id_" : 8176,
        "name_1" : 8176
    },
    "ok" : 1
}

As you can see, storageSize is now 8192. The document data is the
same, but I added an index.

* Can you explain this behavior?
* Is storageSize calculated from the collection size plus totalIndexSize?
  From the page size?

Sorry for asking about storageSize so many times, but it's really
confusing for me. I have seen a lot of people asking about storage
size, but I couldn't find an answer. I need to know how this value is
calculated in order to create an estimate.

Thanks Madhava and thanks Tyler
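
The two stats() outputs above can be checked against the "size + totalIndexSize" guess with plain arithmetic. The sketch below is ordinary JavaScript run outside the mongo shell, with the values copied by hand from those outputs; `compare` is a hypothetical helper name, not a MongoDB function. It only tests the guess against the numbers and does not explain how the first extent size is chosen.

```javascript
// Values copied from the two db.bitacora3.stats() outputs in this post.
const withoutExtraIndex = {
  size: 36, storageSize: 4096, numExtents: 1,
  lastExtentSize: 4096, totalIndexSize: 8176,
};
const withExtraIndex = {
  size: 36, storageSize: 8192, numExtents: 1,
  lastExtentSize: 8192, totalIndexSize: 16352,
};

// Hypothetical helper: compare storageSize with the candidate formulas.
function compare(stats) {
  const sizePlusIndexes = stats.size + stats.totalIndexSize;
  return {
    sizePlusIndexes,
    storageSize: stats.storageSize,
    // Does storageSize equal size + totalIndexSize?
    matchesSizePlusIndexes: sizePlusIndexes === stats.storageSize,
    // With a single extent, does storageSize equal that extent's size?
    matchesSingleExtent:
      stats.numExtents === 1 && stats.lastExtentSize === stats.storageSize,
    // Allocated bytes not occupied by document data.
    unusedBytes: stats.storageSize - stats.size,
  };
}

console.log(compare(withoutExtraIndex));
console.log(compare(withExtraIndex));
```

In both cases size + totalIndexSize (8212 and 16388) differs from storageSize, while storageSize equals lastExtentSize with numExtents at 1. So in these two outputs storageSize tracks the extent allocated for the data, and the index sizes are reported separately.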


Madhava

Dec 26, 2011, 9:04:59 PM
to mongodb-user
Please take a look at these links.
Storage size is the space that mongo has allocated, including the unused space.

http://www.mongodb.org/display/DOCS/Excessive+Disk+Space#ExcessiveDiskSpace-CheckingSizeofaCollection

http://groups.google.com/group/mongodb-user/browse_thread/thread/761c55c8097dd85d
See Kristina's Response.

You should probably start with totalSize(), as mentioned in the
first link, and
then take into account
1) the padding factor and
2) the fact that deleted documents do not give up their space.
Add about 2GB, or multiples of 2GB, to cover the above 2
scenarios.
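
The approach above can be turned into a rough back-of-the-envelope calculation. This is only a sketch under stated assumptions, not an official formula: `estimateBytes`, the projected document count, and `indexBytesPerDoc` are all hypothetical and must be replaced with numbers measured on a realistically populated collection. In particular, the one-document outputs earlier in the thread report 8176 bytes per index, which cannot be scaled per document.

```javascript
const GIB = 1024 * 1024 * 1024;

// Hypothetical estimator: data scaled by the padding factor, plus indexes,
// plus fixed headroom for space that deleted documents do not give back.
function estimateBytes({ docCount, avgObjSize, paddingFactor, indexBytesPerDoc, headroomBytes }) {
  const dataBytes = docCount * avgObjSize * paddingFactor;
  const indexBytes = docCount * indexBytesPerDoc;
  return Math.ceil(dataBytes + indexBytes + headroomBytes);
}

const estimate = estimateBytes({
  docCount: 10000000,      // assumption: projected number of documents
  avgObjSize: 36,          // from stats().avgObjSize
  paddingFactor: 1,        // from stats().paddingFactor
  indexBytesPerDoc: 40,    // assumption: measure this on real data
  headroomBytes: 2 * GIB,  // the "about 2GB" margin suggested above
});

console.log(estimate + " bytes, about " + (estimate / GIB).toFixed(2) + " GiB");
```

Re-running the calculation with fresh stats() values as the collection grows shows how far the estimate drifts from what is actually allocated.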

This is how I would do it.

Maybe someone from 10gen can post a response to say whether this is what
they would recommend or whether they have any better suggestions.
Please do post back with the final approach you took to
estimate the sizes.

Thanks,