BSON to JSON decoding in Python


Mohit Singh

Aug 28, 2013, 4:24:36 PM
to mongod...@googlegroups.com
Hi,

I have BSON-formatted data saved in a file.

Now I want to convert that data to JSON.

So I tried this:
from bson import BSON
s = '\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00'
bson_obj = BSON(s)
bson_obj.decode()

This works, but when I try it on my data:
s = """'\x93\x01\x00\x00\x02_id\x00\x1a\x00\x00\x00auromotiveengineering.com\x00\x04name_servers\x00_\x00\x00\x00\x020\x00\x17\x00\x00\x00ns-2.activatedhost.com\x00\x021\x00\x17\x00\x00\x00ns-1.activatedhost.com\x00\x022\x00\x17\x00\x00\x00ns-3.activatedhost.com\x00\x00\nreputation\x00\x04categories\x00\x05\x00\x00\x00\x00\x03host_act\x00\xd7\x00\x00\x00\x03bnMtMi5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMy5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x03bnMtMS5hY3RpdmF0ZWRob3N0LmNvbQ==\x00$\x00\x00\x00\x10seen_first\x00\x00,\xe7F\x10seen_last\x00\x80 \xebF\x00\x00\x00'"""
 
Then I get this error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/bson/__init__.py", line 593, in decode
    (document, _) = _bson_to_dict(self, as_class, tz_aware, uuid_subtype)
bson.errors.InvalidBSON: objsize too large

How do I resolve this?
Thanks

Bernie Hackett

Aug 28, 2013, 4:42:49 PM
to mongod...@googlegroups.com
What did you use to generate that BSON encoded string?


Mohit Singh

Aug 28, 2013, 4:47:02 PM
to mongod...@googlegroups.com
These are BSON dumps that I got from elsewhere. I want to upload all of the dump files to HDFS and then process them, but for now I have carved out a chunk from those files and am trying to read it in Python, with no luck.

Bernie Hackett

Aug 28, 2013, 4:51:45 PM
to mongod...@googlegroups.com
> I have carved out a chunk from those files

How did you do that? It would appear that the string that you are trying to decode is not valid BSON.
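
One quick way to see whether a byte string can be a complete BSON document is to compare its length prefix with its actual length. Below is a minimal sketch using only the standard library. It assumes the standard BSON layout (a little-endian int32 total length first, a NUL byte last). The helper names `declared_size` and `looks_complete` are made up for illustration and are not part of PyMongo.

```python
import struct


def declared_size(data):
    """Return the document length stored in the first four bytes."""
    if len(data) < 4:
        raise ValueError("need at least 4 bytes to read the length prefix")
    # BSON stores the total document size as a little-endian int32.
    return struct.unpack("<i", data[:4])[0]


def looks_complete(data):
    """True if the buffer is exactly one document of the declared size."""
    size = declared_size(data)
    # 5 bytes is the smallest possible document: the prefix plus the final NUL.
    return size >= 5 and size == len(data) and data[-1:] == b"\x00"


# The small example from the first post, written as bytes.
sample = b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"

print(declared_size(sample), len(sample))  # 22 22
print(looks_complete(sample))              # True
print(looks_complete(sample[:-3]))         # False: bytes are missing
```

A declared size larger than the available bytes is the kind of mismatch that a decoder would be expected to reject. This check only looks at the framing and does not validate the document's contents.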

Bernie Hackett

Aug 28, 2013, 4:53:03 PM
to mongod...@googlegroups.com
Oh, I think I understand. Can you show the code you are using to read the fine? Maybe put it in a gist I can look at?

Bernie Hackett

Aug 28, 2013, 4:53:22 PM
to mongod...@googlegroups.com
That should have read "...to read the file."

Mohit Singh

Aug 28, 2013, 5:05:53 PM
to mongod...@googlegroups.com
It's a very small piece of code, actually:


And I got that sample file with just:
head -100 full_dump > sample

Bernie Hackett

Aug 28, 2013, 5:44:47 PM
to mongod...@googlegroups.com
To decode the entire dump file you can just use bson.decode_all():

>>> import bson
>>> f = open('/path/to/bar.bson', 'rb')
>>> bs = f.read()
>>> docs = bson.decode_all(bs)
>>> len(docs)
100
>>> docs[0]
{u'i': 0.0, u'_id': ObjectId('521e6ad0221b5cd6b707ab9d')}
>>> f.close()

If you want to incrementally decode the file you have to do a bit more work. There is some code here that can help you out:
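
Independently of the linked code, here is a minimal sketch of the incremental approach. It assumes the dump is a plain concatenation of BSON documents, each starting with a little-endian int32 holding its total size. The generator name `iter_raw_documents` is hypothetical, and it only splits the stream into raw documents without decoding them.

```python
import io
import struct


def iter_raw_documents(stream):
    """Yield each BSON document in a binary stream as raw bytes."""
    while True:
        prefix = stream.read(4)
        if not prefix:
            return  # clean end of stream
        if len(prefix) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack("<i", prefix)
        if size < 5:
            raise ValueError("invalid document size: %d" % size)
        # The declared size includes the 4 prefix bytes already read.
        body = stream.read(size - 4)
        if len(body) != size - 4:
            raise ValueError("truncated document")
        yield prefix + body


# Demo on an in-memory stream holding three copies of a small document.
doc = b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"
stream = io.BytesIO(doc * 3)
raw_docs = list(iter_raw_documents(stream))
print(len(raw_docs))  # 3
```

With a real dump, the stream would be a file opened in binary mode, and each yielded chunk could be handed to `BSON(chunk).decode()` as in the first post. Reading one document at a time keeps memory use bounded by the largest document rather than the whole file.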


Mohit Singh

Aug 28, 2013, 5:52:00 PM
to mongod...@googlegroups.com
Hi,
I am getting the same error again.
I have pasted a sample data stream over there.
Can you take a look at it?
Thanks

Bernie Hackett

Aug 28, 2013, 5:59:26 PM
to mongod...@googlegroups.com
If those code examples don't work, your data is corrupt. Can mongorestore read it?

Mohit Singh

Aug 28, 2013, 6:01:06 PM
to mongod...@googlegroups.com
Yes, mongorestore reads them just fine.

Bernie Hackett

Aug 28, 2013, 6:07:38 PM
to mongod...@googlegroups.com
Can you email me your bson file directly (assuming it's not too big)? 

Mohit Singh

Aug 28, 2013, 6:18:44 PM
to mongod...@googlegroups.com
Did you get the email?

Bernie Hackett

Aug 28, 2013, 6:51:16 PM
to mongod...@googlegroups.com
For the sake of anyone reading this thread, the problem is that the .bson file provided had been truncated using sed. A .bson file is a stream of length-prefixed binary documents, not one document per line, so cutting it with a line-based tool can split a document in the middle. To read only the first 5 documents out of the file you can use pymongo_hadoop like so:

>>> from pymongo_hadoop.input import BSONInput
>>> f = open('/path/to/dumpfile.bson', 'rb')
>>> bs = BSONInput(f)
>>> docs = [bs.read() for _ in xrange(5)]
>>> len(docs)
5
>>> bs.close()
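
To produce a smaller sample file that is still valid BSON, whole documents have to be copied rather than lines. Below is a minimal standard-library sketch, assuming the dump is a plain concatenation of length-prefixed documents. The function name `copy_first_documents` is hypothetical, and the demo uses in-memory streams in place of real files.

```python
import io
import struct


def copy_first_documents(src, dst, count):
    """Copy up to `count` whole BSON documents from src to dst.

    Returns the number of documents actually copied.
    """
    copied = 0
    while copied < count:
        prefix = src.read(4)
        if not prefix:
            break  # input ended before `count` documents were found
        if len(prefix) < 4:
            raise ValueError("truncated length prefix")
        (size,) = struct.unpack("<i", prefix)
        if size < 5:
            raise ValueError("invalid document size: %d" % size)
        body = src.read(size - 4)
        if len(body) != size - 4:
            raise ValueError("truncated document")
        dst.write(prefix + body)
        copied += 1
    return copied


# {"a": null} as BSON. The type tag for null is the byte 0x0A, which is
# also the newline character, so line-based tools see a "line break" here.
doc = b"\x08\x00\x00\x00\x0aa\x00\x00"
dump = io.BytesIO(doc * 10)
sample = io.BytesIO()

n = copy_first_documents(dump, sample, 5)
print(n, len(sample.getvalue()))  # 5 40
```

With real files, `src` and `dst` would be opened with modes `'rb'` and `'wb'`. The resulting sample can then be decoded in full, because it ends on a document boundary.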

