pymongo/id/JSON serialization

vkuznet

Nov 2, 2009, 11:49:53 AM
to mongodb-user
Hi,
I'm trying to build a web application where I want to pass Mongo queries
as JSON dicts. Working with Python, I ended up with a problem: I
can't pass around a query on _id, since pymongo accepts only a
pymongo.objectid.ObjectId object. The problem is that I cannot
serialize it with JSON and send it over the wire. So I need to keep
_id as a string, then in my application change it to ObjectId and pass
it to the pymongo APIs. It seems to me that this could be done internally in
pymongo, which could make such a conversion, since ObjectId is
internal to pymongo and doesn't belong to the query syntax. What is
your opinion on that? Is it a bug or a feature?
Thanks,
Valentin.
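A minimal sketch of the workaround Valentin describes: keep `_id` as a string on the wire and convert it just before calling the driver. `spec_from_wire` is a hypothetical helper, and `id_factory` is an assumption standing in for whatever builds the driver's id type (in current pymongo that would be `bson.ObjectId`; the `pymongo.objectid` module used with pymongo 1.0 has since moved).

```python
import json


def spec_from_wire(text, id_factory):
    """Parse a JSON query and convert a string ``_id`` with ``id_factory``.

    Only the top-level ``_id`` is touched; every other value is left exactly
    as the JSON decoder produced it.
    """
    spec = json.loads(text)
    if isinstance(spec.get("_id"), str):
        spec["_id"] = id_factory(spec["_id"])
    return spec


# With pymongo installed you would pass bson.ObjectId; a tuple-building
# lambda is used here only so the example runs without the driver.
print(spec_from_wire('{"_id": "4aeef071e2194e3794000007"}', lambda s: ("oid", s)))
```

Because it converts every string `_id` it sees, a helper like this is only safe when no document uses a genuine string `_id`.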

Nicolas Clairon

Nov 2, 2009, 12:17:51 PM
to mongod...@googlegroups.com
+1

ObjectID is painful to work around. I created a `to_json` method in MongoKit
to handle this issue, but this could be done entirely by the pymongo driver.

Michael Dirolf

Nov 2, 2009, 1:46:42 PM
to mongod...@googlegroups.com
I'm a bit confused about what you are looking for here. The driver (and the
database) supports using strings (or any other type) as _id. So if you
want to use a string, you should just save documents with string values
for _id. ObjectId is used only if no _id is provided.

We can't handle casting to ObjectId automatically in a reasonable way,
because there is no way to tell whether you want to use an actual
string or a string representing an ObjectId. Maybe I'm not
understanding the feature request correctly, though?
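To make the ambiguity concrete: a driver trying to guess could only test the string's shape, 24 hexadecimal characters (the printable form of a 12-byte ObjectId). The hypothetical check below accepts a generated id, but it equally accepts a string `_id` that an application chose on purpose, so casting on that basis would silently change the type of some legitimate keys. Current `bson` ships `ObjectId.is_valid`, which has the same limitation.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def looks_like_objectid(value):
    """True if value has the shape of an ObjectId's hex form (24 hex chars)."""
    return isinstance(value, str) and len(value) == 24 and set(value) <= HEX_DIGITS


# The hex form of a driver-generated id...
print(looks_like_objectid("4aeef071e2194e3794000007"))  # True
# ...and a string _id an application picked deliberately: indistinguishable.
print(looks_like_objectid("deadbeefcafebabe00000001"))  # True
print(looks_like_objectid("user-1234"))                 # False
```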

vkuznet

Nov 2, 2009, 2:13:05 PM
to mongodb-user
Hi Mike,
here is a simple example:

#!/usr/bin/env python

from pymongo.connection import Connection
from pymongo.objectid import ObjectId

import json

conn = Connection()
db = conn['das']
col = db['cache']

spec = {'_id':'4aeef071e2194e3794000007'}
print col.find_one(spec)

spec = {'_id':ObjectId('4aeef071e2194e3794000007')}
print col.find_one(spec)

obj = json.dumps(spec)

So the first print returns nothing: if I pass just a string, I don't get my
result. The second print returns my record; so far so good. But the last
line fails with the following error:
TypeError: ObjectId('4aeef071e2194e3794000007') is not JSON serializable

So the problems are:
1. I can't look up an object using its id as a string, unless I'm
missing something.
2. I can't serialize my spec when using ObjectId, so I can't pass it
across application layers.

For the record, I'm using pymongo 1.0.
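A quick way to make the failing `json.dumps` call succeed is `default=str`, which hands every object the encoder does not know to `str()`. The sketch below shows why that only moves the problem: the id comes back from JSON as a plain string, the same form the first `find_one` call above failed to match. It imports `ObjectId` from `bson` (where current pymongo ships it) and falls back to a hypothetical minimal stand-in so it runs without pymongo.

```python
import json

try:
    from bson import ObjectId  # location in current pymongo releases
except ImportError:
    class ObjectId:  # hypothetical stand-in with the same str() behaviour
        def __init__(self, oid):
            self._oid = oid

        def __str__(self):
            return self._oid

spec = {"_id": ObjectId("4aeef071e2194e3794000007")}

try:
    json.dumps(spec)  # reproduces the TypeError from the post
except TypeError as exc:
    print("plain dumps fails:", exc)

wire = json.dumps(spec, default=str)  # '{"_id": "4aeef071e2194e3794000007"}'
back = json.loads(wire)
print(type(back["_id"]).__name__)     # str: the ObjectId type is lost
```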

Michael Dirolf

Nov 2, 2009, 2:16:23 PM
to mongod...@googlegroups.com
Right. That is as expected. If you want to be able to query with
strings you should save your documents with strings for _id. One
option would be to save your documents with:
_id: str(ObjectId())

An alternative is to check whether the JSON library allows custom
serialization/deserialization. Of course, that will have the same
problem I already mentioned, namely that there is no way to tell
for certain whether you want to use a string that just "looks like" an
ObjectId or a string that should be encoded as an
ObjectId.
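Python's `json` module supports the custom serialization mentioned here through a `json.JSONEncoder` subclass whose `default` method handles types the encoder does not know. The sketch below tags each ObjectId as `{"$oid": "<hex>"}` (the convention from the Mongo Extended JSON page mentioned later in the thread) instead of emitting a bare string, so a decoder can tell it apart from an ordinary string. `MongoIdEncoder` is a hypothetical name; `ObjectId` comes from `bson` with a hypothetical stand-in fallback so the sketch runs without pymongo.

```python
import json

try:
    from bson import ObjectId  # location in current pymongo releases
except ImportError:
    class ObjectId:  # hypothetical stand-in with the same str() behaviour
        def __init__(self, oid):
            self._oid = oid

        def __str__(self):
            return self._oid


class MongoIdEncoder(json.JSONEncoder):
    """Encode ObjectId values as {"$oid": "<24 hex chars>"}."""

    def default(self, obj):
        if isinstance(obj, ObjectId):
            return {"$oid": str(obj)}
        # Any other unknown type still raises TypeError, as in the base class.
        return super().default(obj)


spec = {"_id": ObjectId("4aeef071e2194e3794000007"), "name": "example"}
print(json.dumps(spec, cls=MongoIdEncoder))
# {"_id": {"$oid": "4aeef071e2194e3794000007"}, "name": "example"}
```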

vkuznet

Nov 2, 2009, 2:24:48 PM
to mongodb-user
So, if the only option is to save _id as str(ObjectId), why do I need to
do it in my application? The big advantage, at least to me, is that
Mongo assigns a unique id to my docs. It is a great feature! If the ids need
to be stored as strings, I'll be glad to use that, since I can query
them as strings. The question is why I should have to care. It would be better
done by my driver. If it's not the default, I would prefer to have an option
in the driver that I can enable and then forget about.

From the JSON point of view, ObjectId is an unknown type, so
having it in JSON is unnatural. If I want to work with my doc
elsewhere using generic JSON, I shouldn't need any special handling.

I'm trying to understand why users need to know about ObjectId and why
documents (JSON) need special care.

Michael Dirolf

Nov 2, 2009, 2:30:00 PM
to mongod...@googlegroups.com
The problem stems from the fact that MongoDB doesn't store JSON. It
stores BSON, which is a representation format that is more or less a
superset of standard JSON. One of the extension types is ObjectId.
Other types will give you the same issue, such as datetime
instances or regexes. I really don't see a decent way for
the driver to handle these cases for you automatically. If
you need to be able to convert to strict JSON, your options are to:

1) avoid using non-strict JSON types (including ObjectId)
or
2) define your own strategy for deciding what is a regular string and
what is a string encoding of one of these special types. I really
don't think it makes sense for the driver to do this automatically,
because it will (IMO) lead to more confusing / obscure issues down the
road.

Thoughts?
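Option 1 can be enforced mechanically. The hypothetical `find_non_json` below walks a document before it is saved and reports the path of the first value that plain JSON cannot represent (an ObjectId, a `datetime`, a compiled regex, and so on), returning `None` when the document is clean. It treats only dicts with string keys, lists, strings, ints, finite floats, booleans and `None` as strict JSON; that is an assumption you may want to adjust (tuples, for example, are rejected here even though `json.dumps` would write them as arrays).

```python
import datetime
import math


def find_non_json(value, path="$"):
    """Describe the first value that is not plain JSON, or return None."""
    if value is None or isinstance(value, (bool, int, str)):
        return None
    if isinstance(value, float):
        return None if math.isfinite(value) else f"{path}: {value!r} is not valid JSON"
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                return f"{path}: key {key!r} is not a string"
            problem = find_non_json(item, f"{path}.{key}")
            if problem:
                return problem
        return None
    if isinstance(value, list):
        for index, item in enumerate(value):
            problem = find_non_json(item, f"{path}[{index}]")
            if problem:
                return problem
        return None
    return f"{path}: {type(value).__name__} is not a JSON type"


print(find_non_json({"name": "cache", "tags": ["a", "b"], "size": 1.5}))  # None
print(find_non_json({"created": datetime.datetime(2009, 11, 2)}))
# $.created: datetime is not a JSON type
```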

Mathias Stearn

Nov 2, 2009, 2:40:19 PM
to mongod...@googlegroups.com
We could provide a standard serializer to and from this format: http://www.mongodb.org/display/DOCS/Mongo+Extended+JSON. However, I think that may still cause issues if people want _id to be an atom rather than a nested object.
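A serializer of this kind can be sketched with only the standard library for the other two types mentioned above. The hooks below map `datetime` to `{"$date": <milliseconds since the epoch>}` and compiled regexes to `{"$regex": ..., "$options": ...}`, following the Extended JSON conventions as we understand them; treat the exact field layout, the choice to read naive datetimes as UTC, and the three supported flag letters as assumptions. `to_extended` and `from_extended` are hypothetical names.

```python
import datetime
import json
import re

# Flag letters used in "$options"; only these three are handled in this sketch.
_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL}
_EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_extended(obj):
    """json.dumps(default=...) hook: encode datetimes and regexes as tagged dicts."""
    if isinstance(obj, datetime.datetime):
        if obj.tzinfo is None:  # assumption: naive datetimes are UTC
            obj = obj.replace(tzinfo=datetime.timezone.utc)
        return {"$date": (obj - _EPOCH) // datetime.timedelta(milliseconds=1)}
    if isinstance(obj, re.Pattern):
        options = "".join(k for k, flag in _FLAGS.items() if obj.flags & flag)
        return {"$regex": obj.pattern, "$options": options}
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def from_extended(dct):
    """json.loads(object_hook=...) hook: turn tagged dicts back into objects."""
    if set(dct) == {"$date"}:
        return _EPOCH + datetime.timedelta(milliseconds=dct["$date"])
    if set(dct) == {"$regex", "$options"}:
        flags = 0
        for letter in dct["$options"]:
            flags |= _FLAGS[letter]
        return re.compile(dct["$regex"], flags)
    return dct


doc = {"when": datetime.datetime(2009, 11, 2), "name": re.compile("^das", re.I)}
wire = json.dumps(doc, default=to_extended)
print(wire)
# {"when": {"$date": 1257120000000}, "name": {"$regex": "^das", "$options": "i"}}
restored = json.loads(wire, object_hook=from_extended)
```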

vkuznet

Nov 2, 2009, 2:52:20 PM
to mongodb-user
Great, that's what I was looking for. I needed to know the reason for
that. It would be nice to have a full description of those types on the
Mongo website. This would allow developers to understand them up front and
plan for them accordingly. You mentioned two so far; are there any
others?

Back to the problem. The problem is the current default behavior. The
_id is added to my doc automatically; it is not something I put into the
doc. Moreover, it's added as a special type, which I neither asked for,
provided, nor defined. As I said, having auto-id generation is a
great feature, but it does create a document interoperability problem.
So in this regard, choosing a string (universal) type is more
appealing. I'd like to hear opinions on that. In my mind, having an option
to be JSON strict may be a good way to go, instead of special handling
of special types. For instance, in the insert/update API I could provide a
flag to be JSON strict, and if I store formats JSON doesn't support, e.g.
datetime, it should raise an exception, because I asked to be JSON
strict. For everything else, including internally generated values, the default
type should be string in this case. Is that reasonable?
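Until a driver offers such a flag, a thin application-side wrapper can approximate it. The hypothetical `prepare_for_insert` below gives a document a string `_id` when it has none and, with `json_strict` set, refuses anything plain JSON cannot encode by letting `json.dumps` raise. It generates ids with `uuid.uuid4().hex`, a swapped-in stand-in for `str(ObjectId())` that needs no pymongo but loses the creation timestamp an ObjectId embeds. Note that `json.dumps` is a loose check: it accepts tuples and coerces int keys to strings.

```python
import json
import uuid


def prepare_for_insert(doc, json_strict=True):
    """Return a copy of doc ready to insert.

    With json_strict=True the copy gets a string _id if it had none, and a
    value plain JSON cannot encode raises TypeError (or ValueError for NaN
    and infinity) instead of being stored. With json_strict=False the doc
    is copied unchanged, leaving _id generation to the driver.
    """
    prepared = dict(doc)
    if json_strict:
        prepared.setdefault("_id", uuid.uuid4().hex)
        json.dumps(prepared, allow_nan=False)  # raises on non-JSON values
    return prepared


print(prepare_for_insert({"name": "cache"}))  # {'name': 'cache', '_id': '<32 hex chars>'}
```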

Michael Dirolf

Nov 2, 2009, 3:44:05 PM
to mongod...@googlegroups.com
Perhaps it makes sense for the driver to have an option to add _id as
a string instead of an ObjectId instance. Could you open a JIRA issue for
this? I'd like to think about it just a bit more before going forward
with that, though.

Andrew M

Nov 2, 2009, 4:40:31 PM
to mongod...@googlegroups.com
Could we perhaps use
db.collection.find({"_id":{"$oid":"4aaf2393abccaf55470528ad"}})
as a way of handling this?

I'm basing this on http://www.mongodb.org/display/DOCS/Mongo+Extended+JSON

Maybe an option to return the object id like this too:
db.collection.find().tojson()
{"_id" : {"$oid":"4aaf2393abccaf55470528ad"} ,.....

In fact, if you do
tojson({"_id" : ObjectId( "4aaf2393abccaf55470528ad") .......});

it still returns
{"_id" : ObjectId( "4aaf2393abccaf55470528ad") .... }
which is not valid JSON anyway.

Regards
Andrew Mee
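On the Python side, the `{"$oid": ...}` form proposed above can be turned back into a real ObjectId as the JSON arrives, using the `object_hook` argument of `json.loads`. Only dicts whose sole key is `$oid` are converted, so ordinary strings are never reinterpreted. `oid_hook` is a hypothetical name, and `ObjectId` is imported from `bson` with a hypothetical stand-in fallback so the sketch runs without pymongo.

```python
import json

try:
    from bson import ObjectId  # location in current pymongo releases
except ImportError:
    class ObjectId:  # hypothetical stand-in that compares by its hex string
        def __init__(self, oid):
            self._oid = oid

        def __eq__(self, other):
            return isinstance(other, ObjectId) and self._oid == other._oid

        def __hash__(self):
            return hash(self._oid)

        def __str__(self):
            return self._oid


def oid_hook(dct):
    """Replace {"$oid": "<hex>"} with an ObjectId; leave other dicts alone."""
    if set(dct) == {"$oid"}:
        return ObjectId(dct["$oid"])
    return dct


query = '{"_id": {"$oid": "4aaf2393abccaf55470528ad"}}'
spec = json.loads(query, object_hook=oid_hook)
# spec["_id"] is now an ObjectId, so the spec matches auto-generated ids.
```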

Nicolas Clairon

Nov 2, 2009, 4:46:15 PM
to mongod...@googlegroups.com
Coming late to the discussion :-)

Personally, I was at first so frustrated with ObjectID that I forced
MongoKit to create a string-like id. As I do a lot of copy-paste in
web development, ObjectID was slowing down my work.

I still wonder what the advantages of ObjectId are over a simple
generated uid string
(well, aside from the size of the field).
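One concrete difference from a random uid string: the first four bytes of an ObjectId are its creation time in seconds since the Unix epoch (big-endian), so the id alone records when it was generated, and ids sort roughly by creation time. The stdlib-only sketch below reads that timestamp from the hex form used earlier in the thread; `objectid_time` is a hypothetical name, and current pymongo exposes the same value as `ObjectId.generation_time`.

```python
from datetime import datetime, timezone


def objectid_time(hex_id):
    """Return the creation time encoded in an ObjectId's 24-char hex form."""
    if len(hex_id) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    seconds = int(hex_id[:8], 16)  # first 4 bytes: big-endian Unix timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_time("4aeef071e2194e3794000007"))  # 2009-11-02 14:45:05+00:00
```

On the size point: an ObjectId is 12 bytes, against 32 characters for a hex-encoded UUID4.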

Michael Dirolf

Nov 10, 2009, 3:30:02 PM
to mongod...@googlegroups.com
I think this is probably the best solution. Updated the JIRA issue accordingly:
http://jira.mongodb.org/browse/PYTHON-69

Michael Dirolf

Nov 10, 2009, 6:06:29 PM
to mongod...@googlegroups.com
Added some tools to make this work with Python's json module, in
pymongo.json_util.

See test/test_json_util.py for some examples of how to use it.

- Mike