memory footprint w/ pymongo


vkuznet

Nov 10, 2009, 4:31:13 PM
to mongodb-user
Hi,
I'm trying to analyze the memory footprint of my application with the
pymongo driver and monitor its memory usage. I've noticed that as soon
as I start injecting docs into the database, memory goes up quite
significantly. I have around 300K JSON docs, which are injected as 5K
bulk inserts, and memory jumps from 100MB to 1GB. I haven't yet applied
any memory profilers (unfortunately there are only a few for Python),
but I did benchmark this over many iterations and am quite certain that
the jump in memory happens during injection. Moreover, I changed the
bulk size of the input list from 1K to 5K and still see an identical
memory footprint.
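One quick way to confirm where the jump happens is to log resident
memory around each bulk insert. A minimal Linux-only sketch, where the
collection col and the document generator docs are hypothetical
placeholders and insert() is the old driver call used in this thread
(modern pymongo uses insert_many):

    import itertools

    def rss_mb():
        """Resident set size of this process in MB, read from /proc (Linux only)."""
        with open("/proc/self/status") as f:
            for line in f:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1]) / 1024.0

    def insert_in_batches(col, docs, batch_size=5000):
        while True:
            batch = list(itertools.islice(docs, batch_size))
            if not batch:
                break
            before = rss_mb()
            col.insert(batch)  # bulk insert one batch
            print("batch of %d docs: RSS %.1f MB -> %.1f MB"
                  % (len(batch), before, rss_mb()))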

I would appreciate any advice and comments, since I'm obviously not
sure which part of the application consumes that much memory. At the
same time I want to post this to the list and ask whether anyone has
noticed similar behavior.

For reference, I'm using 64-bit Linux, pymongo 1.0, mongodb 1.0.0.

Thank you,
Valentin.

Michael Dirolf

Nov 10, 2009, 4:32:31 PM
to mongod...@googlegroups.com
How are you passing the docs to insert()? As a list or an iterator of some kind?

vkuznet

Nov 10, 2009, 4:57:30 PM
to mongodb-user
I tried both, with the same results. But for completeness, here is a
snapshot of the iterator version:

gen = self.update_records(query, results, header)
while True:
    if not self.col.insert(itertools.islice(gen, self.cache_size)):
        break

where update_records just yields dicts, and the input parameter results
is also a generator object.

The list usage is straightforward: I re-use a local cache which is
filled with N (cache_size, e.g. 5K) objects and passed to insert,
something like:
local_cache = []
while True:
    # ... code to get row from the input results generator object ...
    local_cache.append(row)
    if len(local_cache) == self.cache_size:
        self.col.insert(local_cache)
        local_cache = []

Michael Dirolf

Nov 10, 2009, 5:59:37 PM
to mongod...@googlegroups.com
Hmm... if you do some profiling and think that you are seeing
unnecessary memory usage by pymongo, let us know. insert() pretty much
just builds the BSON string representing the doc(s) and sends it, so
there shouldn't be too much overhead there.
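To get a feel for what insert() actually puts on the wire, you can
encode a document to BSON and check its size. A small sketch using the
bson module that ships with pymongo (BSON.encode is from later driver
versions than the 1.0 discussed here, so treat it as illustrative):

    from bson import BSON

    doc = {"run": 1, "events": [{"id": i} for i in range(100)]}
    data = BSON.encode(doc)  # the bytes the driver sends for this document
    print("%d bytes of BSON for one document" % len(data))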

vkuznet

Nov 16, 2009, 11:58:43 AM
to mongodb-user
Hi,
after extensive profiling I found the root of the problem and want to
share it here, since it has some consequences for what I observe with
mongo.

So, I read large JSON documents; the test was done with 2 documents,
each ~180MB in size. Each JSON object has nested structures inside,
e.g. lists of dicts, etc. The large memory footprint was observed not
in pymongo, but rather in the JSON parsing part, where a lot of memory
allocation was done to create such big objects in memory. Once this was
identified I switched to XML format for my documents and read them
using the iterparse method (from ElementTree), which accepts the
file-like object (anything with a .read() method) returned by
urllib2.urlopen, i.e. a socket._fileobject. This reduced memory to
roughly the size of the object being read, e.g. from 1.5GB to 300MB per
object in my python application.
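For reference, a minimal sketch of that streaming approach (the URL and
the "record" tag are hypothetical; clearing the root after each element
is what keeps memory proportional to one record rather than to the
whole document):

    import urllib2
    from xml.etree.cElementTree import iterparse

    def stream_records(url, tag="record"):
        """Yield one dict per matching element without building the whole tree."""
        resp = urllib2.urlopen(url)  # file-like object with .read()
        context = iter(iterparse(resp, events=("start", "end")))
        _event, root = next(context)  # first event gives us the root element
        for event, elem in context:
            if event == "end" and elem.tag == tag:
                yield dict(elem.attrib)
                root.clear()  # drop parsed children; memory stays ~one record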

Now, when I monitor memory usage for my python application it looks
reasonable, but I observed that the mongod daemon accumulates memory
and does not release it. To be concrete, once I am done inserting the
data from those two documents into the db, I see that mongod keeps
using 780MB of RAM even after my python application has quit. I
understand that it's going to re-use that memory for subsequent calls,
but it really worries me, since if I interact with mongod often, its
RAM usage will grow over time. Can someone clarify the situation with
that? For the record, I used a 64-bit Linux node to run those tests and
mongo 1.0.0/1.1.3.

Thank you,
Valentin

Mathias Stearn

Nov 16, 2009, 12:15:57 PM
to mongod...@googlegroups.com
Mongod uses memory-mapped (mmap'ed) files, so what looks like a memory
leak is just the mapping of your data files into mongod's address
space. Nothing to worry about.
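To see that distinction in numbers, you can look at mongod's own memory
report via serverStatus; a small sketch with a modern pymongo client
(the 1.0-era driver API differed, and the exact mem fields vary by
server version):

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)
    mem = client.admin.command("serverStatus")["mem"]
    # 'resident' is RAM actually in use by mongod; 'virtual' (and, on
    # mmap-based storage engines, 'mapped') includes the data files mapped
    # into its address space, which is what makes the footprint look inflated.
    print("resident: %s MB, virtual: %s MB"
          % (mem.get("resident"), mem.get("virtual")))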