Using complex objects for _id => cheap "composite" index

693 views
Skip to first unread message

Thilo Planz

unread,
Aug 28, 2010, 6:13:37 AM8/28/10
to mongodb-user
Hi,

I am toying with a simple versioning scheme for keeping older
revisions of MongoDB documents.
I am moving old versions of documents to a shadow collection, and in
order to get an _id property
that is both unique and sorts nicely, I came up with the following

{ _id : { _id : "original document _id", _version : 123 } }

Apparently, I can use complex BSON objects as _id values, and
apparently they are sorted according to
binary representation, which in my case is perfect.

This essentially gives me a composite index for two properties just by
reusing the default _id index.

I could not find any documentation about this, though.
Is this supported / stable behaviour?


Thanks,

Thilo

=============

Example

After inserting into collection 'foo' a document { a=>"x"} , updating
it to { a: "y"}, then to { a: "z" }, and finally deleting it, the
shadow collection will contain the following:

> db.foo.vermongo.find()
{ "_id" : { "_id" : ObjectId("4c78da1ebd6f95e7fd95b6f2"), "_version" :
1 }, "a" : "x", "_version" : 1 }
{ "_id" : { "_id" : ObjectId("4c78da1ebd6f95e7fd95b6f2"), "_version" :
2 }, "a" : "y", "_version" : 2 }
{ "_id" : { "_id" : ObjectId("4c78da1ebd6f95e7fd95b6f2"), "_version" :
3 }, "a" : "z", "_version" : 3 }
{ "_id" : { "_id" : ObjectId("4c78da1ebd6f95e7fd95b6f2"), "_version" :
4 }, "_version" : "deleted:4" }


The most interesting aspect here is probably the structure of the _id
property of the shadow copies: It is a complex object itself, with two
properties, the original document _id and _version properties. Because
of the way BSON is encoded, you can do range queries against these
ids, similar to what you could do with a composite multi-key index (it
is essential that _id comes before _version, and that _version is
fixed length int32):

# get just revisions 2 to 3 for the document
> db.foo.vermongo.find( { _id : {
"$gt" : { "_id" : ObjectId("4c78da1ebd6f95e7fd95b6f2"),
"_version" : 1 } ,
"$lt" : { "_id" : ObjectId("4c78da1ebd6f95e7fd95b6f2"),
"_version" : 4 } } } )

{ "_id" : { "_id" : ObjectId("4c78da1ebd6f95e7fd95b6f2"), "_version" :
2 }, "a" : "y", "_version" : 2 }
{ "_id" : { "_id" : ObjectId("4c78da1ebd6f95e7fd95b6f2"), "_version" :
3 }, "a" : "z", "_version" : 3 }

roger

unread,
Aug 28, 2010, 10:14:37 AM8/28/10
to mongodb-user
I don't see any problems with this scheme.. it's similar to the id key
itself which is a concatenation of a (timestamp, machine, pid) and a
counter.

-Roger

Mike Dierken

unread,
Nov 26, 2010, 12:01:06 PM11/26/10
to mongod...@googlegroups.com
Thilo,
How has this versioning approach been working for you?
I am designing a system that also would need document revisions and
was looking at using either separate mongo documents, or a single
document with an array of sub-documents. I would prefer separate mongo
documents for revisions so I like your approach.

Mike

> --
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>
>

Scott Hernandez

unread,
Nov 26, 2010, 12:45:12 PM11/26/10
to mongod...@googlegroups.com
It is both supported and stable. Many of the sharding/replication
internals use _id values that are embedded docs.

The only thing to watch out for here is the ordering of the keys in
embedded element. They are sorted by their binary representation so
{x:1, y:1} is different than {y:1, x:1}, and sorted differently. Not
only are they sorted differently, they are different values. Some
languages always sort the keys in a dictionary/hash/map by default.

On Sat, Aug 28, 2010 at 3:13 AM, Thilo Planz <thilo...@googlemail.com> wrote:

Reply all
Reply to author
Forward
0 new messages