MongoDB memory usage and working set


Tim

Jan 7, 2011, 4:59:20 PM
to mongodb-user
Hey all,

Our system is not very active at the moment; there's only about an hour
each day when we have a lot of write activity, followed by a map/reduce
job that runs afterward. We see memory usage on the system hover
around 180MB throughout the day and jump to just under 500MB during
that hour of activity. I'm curious when/how Mongo loads the mapped
files into memory. We have a hair over 3GB of indexes but we never see
memory usage climb anywhere near that (I was assuming it would, since
we're estimating the working set size as the full index size plus
whatever objects we're querying regularly).

If someone could elaborate on how the mapped files are managed, that
would be extremely helpful for us. Most of our querying only touches a
given month of data at a time, and the size of the database is capped
to a rolling 4-month window. With this in mind, I'm starting to wonder
whether our working set size will be proportional to the month of data
we're regularly accessing (4 months =~ 3.5GB of indexes, so one month
would be just under 1GB, plus the objects for that month; thus memory
usage should hover between 1 and 2.5GB, since the volume of objects is
fairly consistent at the moment).

Any input would be greatly appreciated, Tim

Eliot Horowitz

Jan 7, 2011, 6:48:09 PM
to mongod...@googlegroups.com
Working set size is really the amount of data you look at plus the indexes you need.

For the data, it's generally just the percentage you're actually
looking at, more or less, unless you have small, very fragmented
documents (in which case each page you touch carries mostly unrelated
data, so more ends up in memory than you actually use).

For indexes it's the same idea; it's just that what you end up "looking at" is a bit different.


Tim

Jan 8, 2011, 9:15:45 PM
to mongodb-user
Thanks, Eliot. I was hoping for a more detailed explanation of how
Mongo manages the mapped files. We've been operating under the
assumption that we need enough memory on each machine to hold all of
the indexes in RAM; it's starting to sound like that's more of a best
practice than a hard requirement, though? I'm still looking for more
detail on the memory-mapped files, since we're at 3GB of indexes with
4GB of total RAM available and never see more than 500MB of RAM used
at any given time.

Thanks, Tim

Eliot Horowitz

Jan 8, 2011, 9:18:11 PM
to mongod...@googlegroups.com
Not sure what you mean by how we manage them.
Basically, the data lives in the files and Mongo accesses what it
needs. The OS keeps track of what is being used, and anything that
isn't active won't stay in RAM.

So there are many cases where only a fraction of an index is in RAM;
it is definitely not a requirement that all of it be there. For some
use cases it is, though. So seeing 500MB active isn't surprising.

We'd need to know more about your data and access patterns to get
more specific.
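
To make the OS side of that concrete, here's a small self-contained
Python sketch; nothing MongoDB-specific, just the mmap behaviour that
mongod relies on. The file path and sizes are arbitrary. Mapping a
large file costs almost no resident memory until pages are actually
read.

# Demonstration (Linux/Unix): mapping a file is cheap; pages become
# resident only when touched.
import mmap
import os
import resource

PATH = "/tmp/mmap_demo.bin"        # throwaway scratch file
SIZE = 256 * 1024 * 1024           # 256MB

# Write 256MB of zeros so the file really exists on disk.
with open(PATH, "wb") as f:
    chunk = b"\0" * (1024 * 1024)
    for _ in range(SIZE // len(chunk)):
        f.write(chunk)

def resident_mb():
    # ru_maxrss is reported in kilobytes on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

with open(PATH, "rb") as f:
    m = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    print("after mmap:         %.1f MB resident" % resident_mb())

    # Touch one byte per 4KB page across the first half of the file;
    # only these pages become resident.
    for off in range(0, SIZE // 2, 4096):
        m[off]
    print("after reading half: %.1f MB resident" % resident_mb())
    m.close()

os.remove(PATH)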

Tim

Jan 9, 2011, 12:37:32 AM
to mongodb-user
Manage was a poor word choice. I wanted to know more about how Mongo
accessed the data and when it was released from RAM. I think I
understand now. I'll still provide details of our system though.

Our access patterns are restricted to one collection with a few
indexes. We run a set of map/reduce functions each night that
generate aggregations for this collection. We're typically accessing
between one day and one month of data reduced to the scope of a client/
store in those map/reduce functions. So we're usually reducing by
month, client, and store. Our average document size in that
collection is ~600 bytes. We store a little over 100k documents in
that collection per day. A client may have anywhere from 100 to a few
thousand documents for a given day. A store could vary similarly but
will usually be scoped to a given client.
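
Putting rough numbers on that (these are just the figures above with a
30-day month assumed, not measurements):

# Back-of-envelope from the figures above; nothing here is measured.
docs_per_day = 100_000
avg_doc_bytes = 600
days_per_month = 30

data_per_month_gb = docs_per_day * avg_doc_bytes * days_per_month / 1024.0**3
print("raw data per month: ~%.1f GB" % data_per_month_gb)   # ~1.7 GB

# Adding the ~1GB/month of index estimated in the first message puts the
# working set around the upper end of the 1-2.5GB range guessed there.
# Actual on-disk size will be somewhat larger due to BSON overhead and
# record padding.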

That said, this is only temporary. Once we're "fully live" we'll be
writing between 100 and 200 documents per minute to this collection.
Our current plan is to direct all writes to the primary node and run
our map/reduce functions on one of the secondary nodes in the replica
set. From your explanation and our server monitoring, it looks like
this will work out fine for the secondary node(s). We're still unsure
what the memory usage will look like on the primary node, though. We
did a lot of rigorous benchmarking around the writes initially, but we
didn't have the volume of data in the system then that we have now.
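
For concreteness, the nightly job looks roughly like the pymongo sketch
below; the collection and field names (events, ts, client_id, store_id,
amount) are stand-ins rather than our real schema.

# Rough sketch of a nightly map/reduce over one month of documents,
# keyed by client and store. All names are hypothetical.
from datetime import datetime
from bson.code import Code
from pymongo import MongoClient

db = MongoClient("localhost", 27017)["mydb"]      # placeholder names

mapper = Code("""
function () {
    // one output key per client/store pair
    emit({ client: this.client_id, store: this.store_id },
         { count: 1, total: this.amount });
}
""")

reducer = Code("""
function (key, values) {
    var out = { count: 0, total: 0 };
    values.forEach(function (v) { out.count += v.count; out.total += v.total; });
    return out;
}
""")

result = db.command(
    "mapReduce", "events",                        # placeholder collection
    map=mapper,
    reduce=reducer,
    query={"ts": {"$gte": datetime(2011, 1, 1), "$lt": datetime(2011, 2, 1)}},
    out={"inline": 1},   # inline reply is capped at 16MB; use an output
                         # collection for larger result sets
)
for doc in result["results"]:
    print(doc["_id"], doc["value"])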

Thanks again, Tim