Custom primary key - looking for best practices?

Gaurav Kumar

Mar 3, 2011, 5:09:57 PM
to mongod...@googlegroups.com
I understand that the "_id" field can be set by client applications. Are there any best practices to follow when overriding it? I ask because, by default, these ids follow a predefined format. As described here (http://www.mongodb.org/display/DOCS/Object+IDs): "In MongoDB, the preferred approach is to use Object IDs instead. Object IDs are more synergistic with sharding and distribution." If I use my own custom ID, say a GUID, could that affect the internals of MongoDB? Although I am not currently using sharding or replicas, I intend to use them in the future.

Thanks,
GK

Bernie Hackett

Mar 3, 2011, 5:30:49 PM
to mongodb-user
You can use your own "_id" (GUIDs are fine). The only rule is that it
has to be unique within the collection. If your dataset is going to be
large and you are usually querying the most recently inserted
documents, then you also want your "_id"s to be increasing. That keeps
recent entries together at one end of the "_id" index, which can
improve performance when only part of your index fits in memory.
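
To illustrate the point about increasing "_id" values, here is a minimal sketch in Python. It models the index as a plain sorted list, which is a simplification and not how MongoDB actually stores its index, and it records where each new key lands. The function name `insert_positions` and the key choices are made up for this example.

```python
import bisect
import uuid


def insert_positions(keys):
    """Insert keys one by one into a sorted list (a stand-in for an index)
    and return, for each insert, its relative position in [0.0, 1.0]."""
    index = []
    positions = []
    for key in keys:
        pos = bisect.bisect_left(index, key)
        # 1.0 means "appended at the right-hand end of the index".
        positions.append(pos / len(index) if index else 1.0)
        index.insert(pos, key)
    return positions


# Increasing ids (a plain sequence here): every insert lands at the
# right-hand end, so only that end needs to stay in memory.
increasing = insert_positions(range(1000))

# Random GUIDs (uuid4): inserts are scattered across the whole index.
random_ids = insert_positions(uuid.uuid4().hex for _ in range(1000))

print(min(increasing))  # 1.0
print(min(random_ids), max(random_ids))
```

With random GUIDs the touched positions range over the whole list, which is why an index that only partly fits in memory tends to suffer more with them.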

Keith Branton

Mar 3, 2011, 5:45:29 PM
to mongod...@googlegroups.com
> Object IDs are more synergistic with sharding and distribution.

Can somebody please explain this statement? In what way are Object IDs more "synergistic" with sharding than a sequence?

Bernie Hackett

Mar 3, 2011, 6:05:03 PM
to mongodb-user
There are cases where "_id" makes a good shard key. See these docs on
choosing a shard key:

http://www.mongodb.org/display/DOCS/Choosing+a+Shard+Key

Keith Branton

Mar 3, 2011, 6:39:38 PM
to mongod...@googlegroups.com
@Bernie, I read the document you referenced, but see nothing in there that compares the "synergy" (or any other characteristic) of Object Ids to sequences in a sharding context.

Perhaps an example of when an Object Id is more synergistic (or more performant or more compact) than a sequence would help?

The only "synergy" I am aware of w.r.t. object ids is that they double up as an insert timestamp - but that doesn't appear to have any particular advantages over a sequence for sharding or distribution.

As far as I can tell both are ascending, so both will have similar characteristics if used as a shard key - i.e. all inserts will be limited to a single chunk at a time.
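
A rough way to picture the "single chunk at a time" behaviour is the Python sketch below. It is a simplified model of range-based chunks, not MongoDB's real splitting or balancing logic; the split points and the names `chunk_for` and `chunk_load` are hypothetical.

```python
import bisect
import random
from collections import Counter

# Hypothetical split points dividing the key space into 4 range-based chunks:
# (-inf, 1000), [1000, 2000), [2000, 3000), [3000, +inf)
SPLIT_POINTS = [1000, 2000, 3000]


def chunk_for(key):
    """Return the index of the chunk whose range contains key."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def chunk_load(keys):
    """Count how many inserts each chunk receives."""
    return Counter(chunk_for(k) for k in keys)


# New ascending keys continue from where the existing data ends, so they
# are all at or above the last split point and hit the same chunk.
ascending = chunk_load(range(3000, 3500))

# Keys drawn uniformly from the key space spread across the chunks.
rng = random.Random(0)
scattered = chunk_load(rng.randrange(0, 4000) for _ in range(500))

print(ascending)  # Counter({3: 500})
print(sorted(scattered.items()))
```

In this model the same thing happens whether the ascending key is a sequence number or a timestamp-prefixed id.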

Additionally (and assuming I don't need/want an insert timestamp) if I use an int64 as an _id then each entry takes 4 bytes less than with Object Ids. Smaller data and smaller index generally mean smaller working set size and better performance.

Unless I'm missing something... hence the question.
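
For reference, the size difference and the embedded timestamp can be sketched in Python as follows. This builds a 12-byte value shaped like an ObjectID by hand (4-byte timestamp, 5 bytes for the machine and process part, 3-byte counter) purely for illustration; in practice the driver generates ObjectIDs, and the helper names here are made up.

```python
import os
import struct
import time


def make_object_id_like(timestamp, counter):
    """Build a 12-byte value shaped like an ObjectID: a 4-byte big-endian
    timestamp, 5 bytes standing in for the machine/process part, and a
    3-byte counter. Illustrative only; real drivers generate these for you."""
    machine_and_process = os.urandom(5)
    return (
        struct.pack(">I", timestamp)
        + machine_and_process
        + counter.to_bytes(3, "big")
    )


def embedded_timestamp(oid):
    """Recover the creation time (seconds since the epoch) from the
    first 4 bytes."""
    return struct.unpack(">I", oid[:4])[0]


now = int(time.time())
oid = make_object_id_like(now, counter=1)
seq = struct.pack("<q", 42)  # an int64 "_id" value as 8 raw bytes

print(len(oid), len(seq), len(oid) - len(seq))  # 12 8 4
print(embedded_timestamp(oid) == now)           # True
```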

Bernie Hackett

Mar 3, 2011, 6:56:27 PM
to mongodb-user
One thing to note is that the ObjectIDs are actually created by the
client. If you have multiple clients (potentially on multiple
machines) you will be less likely to have "_id" collisions using
ObjectIDs because of the structure of an ObjectID. In the context of
sharding specifically (ignoring all the other issues previously
mentioned in this thread) an ObjectID won't necessarily work better
than a sequence and obviously isn't more compact than an int64.

Keith Branton

Mar 3, 2011, 8:17:56 PM
to mongod...@googlegroups.com
@Bernie,

> One thing to note is that the ObjectIDs are actually created by the
> client. If you have multiple clients (potentially on multiple
> machines) you will be less likely to have "_id" collisions using
> ObjectIDs because of the structure of an ObjectID.

But with a correctly implemented sequence, collisions are not "less likely"; they are impossible.

> In the context of
> sharding specifically (ignoring all the other issues previously
> mentioned in this thread) an ObjectID won't necessarily work better
> than a sequence and obviously isn't more compact than an int64.


It sounds like you are saying that the http://www.mongodb.org/display/DOCS/Object+IDs section...

> Sequence Numbers
>
> Traditional databases often use monotonically increasing sequence numbers for primary keys. In MongoDB, the preferred approach is to use Object IDs instead. Object IDs are more synergistic with sharding and distribution.

...is inaccurate/misleading because Object IDs are not more synergistic than sequences with sharding and distribution.

I appreciate the clarification.

Thanks, Keith.

Jared Rosoff

Mar 3, 2011, 11:22:07 PM
to mongodb-user
The problem with sequence numbers is that you need a central authority
to hand them out whereas ObjectIDs can be generated independently
without central coordination.

Using sequence numbers in a sharded / distributed environment is not
scalable because you require a single central node to hand out those
sequence numbers. At some point, you'll reach the limit of that node's
ability to hand out IDs. Since ObjectIDs are generated by clients
without coordination, there is no bottleneck as you scale your system up.

The default implementation of ObjectID is a very efficient mechanism
to generate globally unique IDs. You can replace the default
implementation with anything you like. For example, you could easily
use GUIDs (http://en.wikipedia.org/wiki/Globally_unique_identifier)
or any other identifier that is unique within your collection.

-j
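
The contrast between a central authority and independent generation can be sketched in Python roughly like this. It is a toy model under stated assumptions: threads stand in for clients, a lock stands in for the central node, and the class names `CentralSequence` and `IndependentGenerator` are invented for the example. The independent ids here are just (client id, local counter) pairs and assume client ids are unique; ObjectIDs get a similar property from their machine and process fields.

```python
import itertools
import threading


class CentralSequence:
    """A sequence handed out by one authority. Every client must go through
    this single lock, which is the coordination point described above."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 0

    def next_id(self):
        with self._lock:
            self._next += 1
            return self._next


class IndependentGenerator:
    """Each client builds ids from its own client id plus a local counter,
    so no state is shared between clients."""

    def __init__(self, client_id):
        self._client_id = client_id
        self._counter = itertools.count(1)

    def next_id(self):
        return (self._client_id, next(self._counter))


def run_clients(make_next_id, clients=4, per_client=1000):
    """Run several client threads and collect every id they produce."""
    results = [[] for _ in range(clients)]

    def work(i):
        next_id = make_next_id(i)
        for _ in range(per_client):
            results[i].append(next_id())

    threads = [threading.Thread(target=work, args=(i,)) for i in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [x for chunk in results for x in chunk]


central = CentralSequence()
from_sequence = run_clients(lambda i: central.next_id)
from_clients = run_clients(lambda i: IndependentGenerator(i).next_id)

# Both approaches yield unique ids; only the first needs a shared lock.
print(len(set(from_sequence)) == len(from_sequence))  # True
print(len(set(from_clients)) == len(from_clients))    # True
```

Both variants avoid collisions; the difference is that every call in the first one contends for the same lock, while the second scales with the number of clients.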

Keith Branton

Mar 4, 2011, 12:53:51 AM
to mongod...@googlegroups.com
Thanks Jared. That helps to put things in perspective.

May I suggest...

"Object IDs are more synergistic with sharding and distribution."

be replaced with some of your response - which is much more informative and useful. Maybe...

"Object IDs are efficiently generated by each client without coordination and are therefore much more scalable, so they work better with sharding and distribution. Sequence allocation requires coordination, and as load increases this will eventually become an impediment to scaling."

...your call.

Thanks again,

Keith.

Gaurav Kumar

Mar 4, 2011, 1:17:31 AM
to mongod...@googlegroups.com

+1. Jared's reply explains architectural aspects precisely.
