Re: [mongodb-user] Reading large (2.5GB) bson files with pymongo eats up 30GB+ ram


Bernie Hackett

Oct 17, 2012, 9:47:38 PM
to mongod...@googlegroups.com
I'm a little surprised to hear that PyMongo would use 30+ GB of RAM to
decode, but mongorestore isn't a very good comparison. mongorestore
reads each document and inserts it into the database. By comparison,
your Python code is reading the entire file into a string, passing
that entire string to decode_all, which then has to create dictionary
objects for all of the documents in the file, returning the entire
file as a list of dictionaries. We haven't even gotten to inserting
the documents into MongoDB yet. That's never going to use memory
efficiently.

On Wed, Oct 17, 2012 at 6:28 PM, Matthias Lee <matthia...@gmail.com> wrote:
> Hello there,
>
> I've been using pymongo for a while and have read a few smaller bson files,
> but today I was trying to convert a large bson file to json (it contains no
> binary data).
> Every way I tried reading and decoding resulted in me maxing out my RAM at
> 32GB.
>
> Is there a more efficient way of reading/decoding bson than this:
> import bson
> f = open("bigBson.bson", 'rb')
> result = bson.decode_all(f.read())
>
> perhaps it can be decoded incrementally?
>
> In comparison, using mongorestore to load the same file barely increased my
> memory usage.
>
> Thanks,
>
> Matthias

Bernie Hackett

Oct 17, 2012, 9:55:47 PM
to mongod...@googlegroups.com
There is some code in the mongo-hadoop connector that can help you with this:

https://github.com/mongodb/mongo-hadoop/blob/master/streaming/language_support/python/pymongo_hadoop/input.py#L7-50
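
For readers who cannot open the link, here is a minimal sketch of the general idea behind reading a BSON dump incrementally. It is not a copy of the mongo-hadoop code. It relies only on the fact that every BSON document starts with a little-endian int32 giving its total size, and the function name iter_raw_bson and the hand-built sample bytes are made up for illustration:

```python
import io
import struct


def iter_raw_bson(stream):
    """Yield the raw bytes of one BSON document at a time from a binary stream."""
    while True:
        # Every BSON document begins with a little-endian int32 holding its
        # total size in bytes, including these four length bytes.
        header = stream.read(4)
        if not header:
            return  # clean end of input
        if len(header) < 4:
            raise ValueError("truncated BSON length prefix")
        (size,) = struct.unpack("<i", header)
        if size < 5:
            # The smallest valid document is 4 length bytes plus a NUL terminator.
            raise ValueError("invalid BSON document size: %d" % size)
        body = stream.read(size - 4)
        if len(body) != size - 4:
            raise ValueError("truncated BSON document")
        yield header + body


# Two hand-built documents, {} and {"a": 1}, stand in for a real dump file.
sample = b"\x05\x00\x00\x00\x00" + b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"
chunks = list(iter_raw_bson(io.BytesIO(sample)))
```

With a real file you would pass the object returned by open(..., 'rb') and hand each chunk to bson.decode_all, which returns a one-element list when given a single document. Writing out each decoded document before reading the next keeps only one document's dictionary in memory at a time.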

Also, make sure you are using the C extensions for PyMongo. You can
check like this:

python -c 'import pymongo; print(pymongo.has_c())'

Matthias Lee

Oct 19, 2012, 10:55:24 AM
to mongod...@googlegroups.com
Thanks, I will have a look at the hadoop connector.

I did check, and I do have the C extension.