How can I use my own "_id"?

jignesh patel

Dec 19, 2009, 2:31:22 AM
to mongodb-user
Hi, I am a newbie to MongoDB. I have converted my relational db to
MongoDB. Now my question is: how can I use my own unique auto-increment
ids instead of the collection's "_id" field? Is this possible with the
PHP driver?

Thanks, Jignesh

Eliot Horowitz

Dec 19, 2009, 8:05:44 AM
to mongod...@googlegroups.com
You can add whatever _id field you want to the object you save.
As for auto-increment, the reason we don't do it is that
it's hard to do with sharding.
If you want to do it, you'll need to figure out your own way to keep
that consistent.

> --
>
> You received this message because you are subscribed to the Google Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com.
> To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
>
>
>

sdotsen

Dec 19, 2009, 10:04:30 PM
to mongodb-user
I'm looking for the exact same thing.

Has anyone figured out a way, using a custom solution, to keep the _id
consistent and unique?

Eliot Horowitz

Dec 19, 2009, 10:14:26 PM
to mongod...@googlegroups.com
Why don't you want to use the auto-generated _ids?
They're a bit ugly, but that forces you to make URLs nice anyway :)
And this way it will work with sharding, etc.

sdotsen

Dec 19, 2009, 10:23:37 PM
to mongodb-user
Aren't the auto-generated _ids 20+ characters long? Am I missing
something?

At the moment, I can't think of why I would need a custom _id, but
let's take Twitter for example.
Here's the URL of a tweet I sent to you: http://twitter.com/sdotsen/status/6847722531

If I wanted to do the same with my own app, how would I go about making
each tweet unique? Besides the username field, how would I
differentiate between all my tweets?
I understand one can use fancy URLs, but that could get out of hand.
Am I missing something about the _ids generated by MongoDB?


Eliot Horowitz

Dec 19, 2009, 10:47:01 PM
to mongod...@googlegroups.com
They are 24 characters in hex, compared to the 10 digits in the Twitter URL.
If you base64-encode it, it's 17 characters, and it's sharding-safe.

sdotsen

Dec 19, 2009, 11:13:42 PM
to mongodb-user
Wait, where did you get 17 characters from? What should I be encoding?


Eliot Horowitz

Dec 19, 2009, 11:15:51 PM
to mongod...@googlegroups.com
Object ids are 12 bytes, so there are
2^(8*12) = 2^96 possible values.

If you encode in base64,
64^17 > 2^96

so that's one option for putting it in URLs.
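
The arithmetic above can be checked with a short sketch. It uses only the Python standard library; the function names are illustrative and not part of any MongoDB driver, and the sample id is the one quoted later in this thread. Note that 64^16 = 2^96 exactly, so a 12-byte id fits in 16 URL-safe base64 characters with no padding; 17 is a safe upper bound.

```python
import base64


def oid_to_url_token(hex_oid: str) -> str:
    """Encode a 24-hex-character ObjectId as URL-safe base64 text."""
    raw = bytes.fromhex(hex_oid)
    if len(raw) != 12:
        raise ValueError("expected a 12-byte ObjectId (24 hex characters)")
    # 12 bytes is a multiple of 3, so base64 adds no '=' padding.
    return base64.urlsafe_b64encode(raw).decode("ascii")


def url_token_to_oid(token: str) -> str:
    """Decode the URL token back to the 24-character hex form."""
    return base64.urlsafe_b64decode(token).hex()


token = oid_to_url_token("4b29ec56641b4f75d67e029e")
```

The URL-safe alphabet swaps "+" and "/" for "-" and "_", so the token can go into a path segment without escaping.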

Keith Branton

Dec 20, 2009, 12:00:24 AM
to mongodb-user
Not sure I'd base64-encode values in URLs - it could get pretty
embarrassing when cuss words end up in them - I suppose it depends on
your application ;)

I use sequential ids for my application - but I don't care if they are
rigidly chronological or if there are gaps in the sequences from time
to time.

I use an unsharded collection of sequences (one per collection), and
have a stored javascript function that increments and returns the
value for a given sequence. I call the function in a db.eval so it is
executed atomically.

It's actually slightly more complex than that. For performance reasons
each web server requests and caches a block of 100 ids at a time
whenever they run out. This pretty closely mimics the way Oracle Grid
sequences work (or so I've been told). Gaps in the sequence do arise
if servers restart - but that doesn't happen very often.
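
A minimal sketch of the block-reservation idea described above, assuming an in-memory counter in place of the unsharded sequences collection and the stored JavaScript function. SequenceStore and BlockIdAllocator are hypothetical names; in a real deployment reserve() would be the single atomic call to the database.

```python
import threading


class SequenceStore:
    """Stand-in for the central sequences collection (one counter per name)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = {}

    def reserve(self, name, block_size):
        """Atomically reserve block_size ids and return the first one."""
        with self._lock:
            start = self._next.get(name, 1)
            self._next[name] = start + block_size
            return start


class BlockIdAllocator:
    """Per-web-server cache that hands out ids from a reserved block."""

    def __init__(self, store, name, block_size=100):
        self._store = store
        self._name = name
        self._block_size = block_size
        self._lock = threading.Lock()
        self._current = 0
        self._limit = 0  # exclusive upper bound of the cached block

    def next_id(self):
        with self._lock:
            if self._current >= self._limit:
                # Block exhausted (or never fetched): reserve a new one.
                self._current = self._store.reserve(self._name, self._block_size)
                self._limit = self._current + self._block_size
            value = self._current
            self._current += 1
            return value


store = SequenceStore()
web1 = BlockIdAllocator(store, "posts")
web2 = BlockIdAllocator(store, "posts")
ids = [web1.next_id(), web2.next_id(), web1.next_id()]
```

Ids handed out this way are unique but not strictly chronological across servers, and any ids left in a cached block are lost when a server restarts, which is where the gaps come from.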

Mathias Stearn

Dec 20, 2009, 12:58:40 AM
to mongod...@googlegroups.com
Why not just assign unique prefixes and let each server generate IDs on its own? You could also use postfixes and synchronize the counters periodically if you want to maintain rough chronological order.
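
A rough sketch of the prefix idea, assuming each server has somehow been given a prefix that no other server uses (how that is guaranteed is left open here). The function name and the "prefix-counter" format are illustrative only.

```python
import itertools


def prefixed_ids(prefix):
    """Yield ids that are unique as long as the prefix is unique per server."""
    for n in itertools.count(1):
        yield f"{prefix}-{n}"


server_a = prefixed_ids("a")
server_b = prefixed_ids("b")
first = [next(server_a), next(server_a), next(server_b)]
```

No coordination is needed between servers at insert time; the only shared state is the one-time assignment of prefixes.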

Sunny Hirai

Dec 20, 2009, 2:53:01 AM
to mongodb-user
Hi Keith,

You can get a huge number of ids in 8 bytes without running into
embarrassing cuss words by using 0-9, a-z, A-Z but without the vowels
(technically I guess you can get "fcksht" but probably acceptable).
You then get 52^8 combinations which is about 53 trillion ids. With 7
letters, you can still get 1 trillion. 52 is because there are 21
letters without vowels, x2 for the capitalized versions and 10 more
numeric values.

These make pretty nice URLs

/forum/thread/b8xf23Nm

I'd recommend creating the IDs sequentially but writing the actual id
with its characters in reverse order to improve sort times. This will
give a nice distribution on the first character used for sorting.

Reserving the ids 100 at a time is a good idea. I think I did mine
1000 at a time in an implementation for IDs in postgresql. There is a
good article on distributed ID generation here.

http://horicky.blogspot.com/2007/11/distributed-uuid-generation.html

Funny this got brought up because I was thinking of doing this for
MongoDB as well precisely because of the long URLs. 17 characters is
still pretty long. 7 is quite nice. 24 felt very long.

Note also that if they are sequential, there is no need to pad them, so
they can start out small and grow to fill any number of
characters as required. So your first 7 million ids will actually be
at most 4 characters long.
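
The scheme described in this post can be sketched as follows, assuming the ids come from some sequential counter. The encode and decode helpers are hypothetical names. Digits are written least significant first, which is one way to get the reversed order suggested above.

```python
import string

# 10 digits + 21 lowercase consonants + 21 uppercase consonants = 52 symbols.
ALPHABET = "".join(
    c
    for c in string.digits + string.ascii_lowercase + string.ascii_uppercase
    if c.lower() not in "aeiou"
)
BASE = len(ALPHABET)


def encode(n: int) -> str:
    """Encode a non-negative integer, least significant digit first, unpadded."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(chars)


def decode(text: str) -> int:
    """Inverse of encode()."""
    n = 0
    for ch in reversed(text):
        n = n * BASE + ALPHABET.index(ch)
    return n
```

With this alphabet, every id below 52^4 = 7,311,616 fits in at most 4 characters, and 8 characters cover 52^8 = 53,459,728,531,456 ids.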

Mathias, on the idea of assigning prefixes, the only problem I have
with that is coming up with a scheme to guarantee the prefixes are
unique. This probably means manually assigning them. I found grabbing
a pool of IDs was nice because it didn't matter whether I had 1 or
1000 servers, or whether the number of servers changed often or
never; the algorithm is exactly the same.

Sunny Hirai

Valentin Golev

Dec 20, 2009, 3:16:40 AM
to mongod...@googlegroups.com
For my application (something like a calendar of events), I generate URLs using this algorithm:

1. Generate a URL of the form /year/month/day/hour-minute-title, like /2010/01/01/00-00-New-Year-Celebration
2. Query by this URL; if anything is found, append -1 (like /2010/01/01/00-00-New-Year-Celebration-1)
3. Keep incrementing this number until the URL becomes truly unique

Now I'm thinking about creating another collection of { url; dbref; } to be able to give a unique URL to any entity in my application.
These URLs are much nicer, I think. You can try something like this.
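
The URL format in the steps above can be sketched like this. The slug rule (runs of non-alphanumeric characters become a single hyphen) is an assumption, and base_url and candidate_urls are hypothetical names; the check against the database for each candidate is left out.

```python
import re
from datetime import datetime


def base_url(when: datetime, title: str) -> str:
    """Build /year/month/day/hour-minute-title from an event time and title."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", title).strip("-")
    return when.strftime("/%Y/%m/%d/%H-%M-") + slug


def candidate_urls(when: datetime, title: str):
    """Yield the base URL, then the same URL with -1, -2, ... appended."""
    base = base_url(when, title)
    yield base
    n = 1
    while True:
        yield f"{base}-{n}"
        n += 1


url = base_url(datetime(2010, 1, 1, 0, 0), "New Year Celebration")
```

A caller would walk candidate_urls() and stop at the first candidate that is accepted as unique.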

- Valentin Golev

Mathias Stearn

Dec 20, 2009, 7:18:52 AM
to mongod...@googlegroups.com
@Valentin: That algorithm has a race condition. It is better to try to insert, then call getlasterror (or use safe mode), and try a new id if the insert failed. Also, if possible, it is better to add a random offset rather than -1, since collisions will be less likely. See http://github.com/RedBeard0531/Mongurl/blob/master/mongurl.py#L37 for an example.

@Sunny: You could base the prefix on a number guaranteed to be unique such as the IP address (in most cases for servers). If you are reversing the ID anyway, why not just go fully random rather than sequential and minimize the need for synchronization between your servers?
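
The insert-then-retry approach suggested to Valentin above can be sketched without a database. FakeCollection and DuplicateKey are stand-ins invented for this example; with a real driver the failure would be the duplicate-key error reported for the unique _id index. The suffix range and retry limit are arbitrary assumptions.

```python
import random


class DuplicateKey(Exception):
    """Raised by the stand-in collection when an _id already exists."""


class FakeCollection:
    """In-memory stand-in that enforces unique _id values like an index would."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc):
        if doc["_id"] in self.docs:
            raise DuplicateKey(doc["_id"])
        self.docs[doc["_id"]] = doc


def insert_with_unique_url(coll, base, doc, max_tries=20):
    """Insert first, and only pick a new id if the insert itself fails."""
    url = base
    for _ in range(max_tries):
        try:
            coll.insert({**doc, "_id": url})
            return url
        except DuplicateKey:
            # A random suffix makes a second collision less likely than -1, -2, ...
            url = f"{base}-{random.randrange(1, 10_000)}"
    raise RuntimeError("could not find a free URL")


events = FakeCollection()
first = insert_with_unique_url(events, "/2010/01/01/00-00-Party", {"title": "Party"})
second = insert_with_unique_url(events, "/2010/01/01/00-00-Party", {"title": "Party"})
```

Because uniqueness is decided by the insert and not by a prior query, two writers racing for the same URL cannot both succeed.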

Valentin Golev

Dec 20, 2009, 11:47:10 AM
to mongod...@googlegroups.com
If we do it in eval, there is no race, because eval blocks the whole db.
- Valentin Golev

chx

Dec 20, 2009, 4:55:16 PM
to mongodb-user
While the thread has mostly concentrated on proper _id creation, it must be
noted that with the PHP driver, once you have a $_id you need to call
$_id = new MongoId($_id); and $_id must be 24 hexadecimal characters,
as stated on http://php.net/manual/en/mongoid.construct.php

Regards

ChX

Sunny Hirai

Dec 20, 2009, 5:45:39 PM
to mongodb-user
Hi Mathias,

The prefix could be based on a guaranteed unique number like the IP
address but then I'd have to give up the compactness of the ids. I'd
lose bytes storing the unique prefix. Also, multiple applications
running simultaneously on the same server (same IP address, different
applications) will need to be handled.

Going fully random increases the overhead, as one would need to check
whether the id already exists in the database before inserting.
Another benefit of using a sequential id is that you are guaranteed
uniqueness within the entire database instead of just within a
collection (an advantage over fully random ids, though Mongo's
current id generator gives you this as well). This can eliminate
certain types of bugs and adds flexibility in building features.

As an example, let's say I had both blog post and wiki page
collections. I write a bit of generic code that allows me to add
comments to either the posts or the pages collection using their ids. I
could, if I wanted, use the same comments collection for both posts and
pages, because the id of the post or page would be unique. Now if I
needed to do some sort of migration on the data, I would only have to
deal with one collection.

I think maybe one thing I forgot to mention about my use case though
is that I built all this into the data access library itself. I
probably wouldn't recommend somebody roll their own without putting it
into its own library. After it is built, using a globally unique id
generator is dead simple (something like db.get_id). It does have to
be built first though. :)

Sunny Hirai

Eliot Horowitz

Dec 20, 2009, 7:41:08 PM
to mongod...@googlegroups.com
Not quite - _id can be any type, so you don't need to create a MongoId

jignesh patel

Dec 21, 2009, 1:52:14 AM
to mongodb-user
Thanks for the quick replies; this seems to be an active community/group.

My intention was not to change the virtually unique "_id" field
generated by mongo.

I have an application in which I use (human-readable) ids
of my entity tables, like Twitter does, in URLs in many places.

How can I use another unique auto-increment field for reference besides
"_id"?

e.g.

category collection:

{ "_id" : ObjectId("4b29ec56641b4f75d67e029e"), "idCategory" : 1, "value" : "PHP" }
{ "_id" : ObjectId("4b29ec56641b4f75d67e029e"), "idCategory" : 2, "value" : "JAVA" }
{ "_id" : ObjectId("4b29ec56641b4f75d67e029e"), "idCategory" : 3, "value" : "RUBY" }
{ "_id" : ObjectId("4b29ec56641b4f75d67e029e"), "idCategory" : 4, "value" : "ASP" }
....

For example, the "idCategory" field here?

Ask Bjørn Hansen

Dec 21, 2009, 3:05:32 AM
to mongod...@googlegroups.com

On Dec 20, 2009, at 22:52, jignesh patel wrote:

> I am having an application in which i am using (human readable) id's
> of my entity tables like twitter with urls many places.


If you want a numeric sequence to make it more user friendly you are doing it wrong. :-) Human readable categories (to take your example) would be "Book", "Blu-Ray", "Dining Tables"; not 1, 2, 3, ...


- ask

Anirudh Zala

Dec 21, 2009, 4:57:49 AM
to mongodb-user
Let's ask this question in a different way:

A numeric auto_increment column in an RDBMS serves various purposes,
like uniqueness of the row/record, small/clean URLs, indexes and sorting.
In MongoDB, the purpose of the auto-generated field is quite
different: to keep each record unique across all
databases, collections and servers.

Now consider the following database structure:

One collection called "student" holds 1 million documents/records,
each with 20 fields (or keys) in flat mode (i.e. all information regarding
a student is stored in a denormalized way, as MongoDB suggests). Of those
20 fields, 10 hold numeric values that refer to
other entity collections like city, state, country, course1,
course2 etc., because those entities need to be managed
separately and shown in list boxes in various forms.

In an RDBMS, since those ref. fields are numeric and can never be
longer than 3 digits (in our case), the values stay easily
readable (I can ask my friend, "hey, check the record with number 568"),
and the RDBMS can sort and search on the numerical values (as there will
be indices on them). In MongoDB, since those refs are 24
characters long, my question is: would they affect search, sorting and
disk space (3 digits vs. 24 characters x 10 fields x 1 million
records)?

I know that in MongoDB the "_id" field is mainly designed to keep each
document unique, but I just want to know whether it would be an
*effective* solution to use those 24-character refs, or whether to
manage numerical ids on my own (like in an RDBMS)?

Mathias Stearn

Dec 21, 2009, 9:58:50 AM
to mongod...@googlegroups.com
Just to clarify OIDs are 12 bytes, but displayed as 24 hex chars. They also avoid the 5 byte overhead of strings (4 for size, 1 for NULL) and are designed to be (roughly) increasing. Because of how they are generated, you get a created_at timestamp for free.
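
The "timestamp for free" comes from the first 4 bytes of an ObjectId, which hold the creation time as big-endian seconds since the Unix epoch. A small sketch using only the standard library (oid_created_at is a hypothetical helper, not a driver call; the sample id is the one quoted earlier in the thread):

```python
from datetime import datetime, timezone


def oid_created_at(hex_oid: str) -> datetime:
    """Return the creation time embedded in a 24-hex-character ObjectId."""
    raw = bytes.fromhex(hex_oid)
    if len(raw) != 12:
        raise ValueError("expected a 12-byte ObjectId (24 hex characters)")
    seconds = int.from_bytes(raw[:4], "big")  # leading 4 bytes: Unix timestamp
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


created = oid_created_at("4b29ec56641b4f75d67e029e")
```

Since the timestamp occupies the leading bytes, ids generated later sort after earlier ones to within one-second resolution, which is the "roughly increasing" property.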

OIDs should be good enough for most use cases, even if they aren't perfect. If you have a better way to generate a key, there is nothing stopping you from storing any type of data you want in the _id field. 
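
For the disk-space question raised earlier in the thread (10 reference fields across 1 million records), a back-of-the-envelope estimate can be sketched from the BSON value sizes: 12 bytes for an ObjectId, 4 bytes for a 32-bit integer, and 4 + length + 1 bytes for a string. The figures cover only the stored values; field names, type bytes and indexes are ignored, so treat this as a rough comparison.

```python
OBJECT_ID_BYTES = 12
INT32_BYTES = 4


def bson_string_bytes(text: str) -> int:
    """Size of a BSON string value: 4-byte length + UTF-8 bytes + NUL byte."""
    return 4 + len(text.encode("utf-8")) + 1


FIELDS = 10          # reference fields per document (from the scenario above)
RECORDS = 1_000_000  # documents in the collection


def total_mb(bytes_per_value: int) -> float:
    """Total megabytes used by the reference values alone."""
    return bytes_per_value * FIELDS * RECORDS / 1_000_000


sizes_mb = {
    "ObjectId": total_mb(OBJECT_ID_BYTES),
    "int32": total_mb(INT32_BYTES),
    "24-char hex string": total_mb(bson_string_bytes("4b29ec56641b4f75d67e029e")),
}
```

Under these assumptions the references take about 120 MB as ObjectIds, 40 MB as 32-bit integers, and 290 MB if the ids were stored as 24-character strings.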

If you want an increasing number in that field, it shouldn't take more than 20 lines of code to set up a simple server to generate them. A quick google search didn't turn up anything interesting, but if there is enough demand, I could code one up pretty quickly.

Zala

Dec 21, 2009, 10:48:56 AM
to mongodb-user
Thanks Mathias,

But I would like to ask which of the following 3 strategies would be
best in terms of performance, storage, retrieval and maintenance?

#1 Storing the object IDs that MongoDB generates for entities as references
#2 Storing manually generated numerical IDs of entities as references
(as you have mentioned here)
#3 Storing the entity string itself instead of a reference

Thanks

Anirudh Zala

Mathias Stearn

Dec 21, 2009, 11:28:15 AM
to mongod...@googlegroups.com
I'd suggest either 1 or 3. 3 has the advantage that if you have a list of IDs you can actually have meaningful data and avoid dereferencing. For example, if you put the username in _id, and you have a list of users in a group, then you can display that list with links to the individual users' pages with only the data from the group object. OIDs are nice for when you don't have a natural short and unique field, or you want to allow the naturally unique field to change without updating all referring objects.
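
A small illustration of option 3, using plain dictionaries in place of stored documents. The group name, usernames and the /users/ URL pattern are all made up for the example.

```python
# A group document whose members list stores usernames, which are also the
# _id values of the user documents (all names here are hypothetical).
group = {"_id": "mongodb-fans", "members": ["jignesh", "sdotsen", "keith"]}


def member_links(group_doc):
    """Build user-page links from the group document alone, with no lookups."""
    return [f"/users/{name}" for name in group_doc["members"]]


links = member_links(group)
```

With ObjectIds or numeric ids in the members list, the same page would need an extra query per user (or one batched query) to turn each id into something displayable.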
