Hi,
after extensive profiling I found the root of the problem and want to
share it here, since it has some consequences for what I observe with
mongo.
So, I read large JSON documents; the test was done with 2 documents,
each ~180MB in size. Each JSON object has nested structures inside,
e.g. lists of dicts, etc. The large memory footprint was observed not
in pymongo, but rather in the JSON parsing part, where a lot of memory
allocation is done to build such big objects in memory. Once this was
identified I switched to XML format for my documents and read them
using the iterparse method (from ElementTree), which accepts a
file-like object with a .read() method, such as the one returned by
urllib2.urlopen (a socket._fileobject). This reduced memory usage to
roughly the size of the object being read, e.g. from 1.5GB down to
300MB per object in my Python application.
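
For illustration, a minimal sketch of that streaming approach (the
URL, the element tag and the database/collection names are
hypothetical placeholders, not my actual setup):

    import urllib2
    import xml.etree.ElementTree as ET
    from pymongo import Connection

    conn = Connection('localhost', 27017)
    collection = conn['testdb']['docs']

    # urlopen returns a file-like object with .read(), so iterparse
    # can consume the document incrementally instead of loading it whole
    response = urllib2.urlopen('http://example.com/large_document.xml')

    for event, elem in ET.iterparse(response, events=('end',)):
        if elem.tag == 'record':  # hypothetical element of interest
            doc = dict((child.tag, child.text) for child in elem)
            collection.insert(doc)
            elem.clear()  # drop the parsed element to keep memory flat

Clearing each element right after the insert is what keeps the
footprint bounded to roughly one record at a time.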
Now, when I monitor memory usage for my Python application it looks
reasonable, but what I observed is that the mongod daemon accumulates
memory and does not release it. To be concrete, once I was done
inserting data into the db from those two documents, I saw that mongod
kept using 780MB of RAM even after my Python application quit. I
understand that it's going to re-use it for subsequent calls, but it
really worries me, since if I interact with mongod quite often its RAM
usage will grow over time. Can someone clarify the situation with
that? For the record, I used a 64-bit Linux node to run those tests
and mongo 1.0.0/1.1.3.
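
In case it helps to pin down numbers, here is a sketch of how mongod's
own memory counters can be read from Python via the serverStatus
command (connection details are placeholders, and I'm assuming a
pymongo version that exposes Database.command):

    from pymongo import Connection

    conn = Connection('localhost', 27017)
    status = conn['admin'].command('serverStatus')
    mem = status['mem']
    # resident/virtual/mapped are reported by mongod in MB
    print 'resident: %s MB, virtual: %s MB, mapped: %s MB' % (
        mem['resident'], mem['virtual'], mem.get('mapped'))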
Thank you,
Valentin