MongoDB string id vs ObjectID: how and why to choose?


Serkan Durusoy

Jan 25, 2014, 11:35:47 AM
to meteo...@googlegroups.com
Hi There,

I'm trying to wrap my head around the generation and usage of IDs in Meteor/Mongo/minimongo.

We have two options: either a String, which we can also generate manually using Random.id(), or an ObjectID, which we can again generate manually using new Meteor.Collection.ObjectID().

It seems that MongoDB's default is ObjectID (this is also the case if we use the command-line meteor mongo console), whereas Meteor has opted for String but allows the use of either one.

This confuses me and raises the following questions:
  1. Why does Meteor see the need to use String instead of ObjectID? What does choosing String over ObjectID give me?
  2. What do I gain/lose if I decide to use ObjectIDs?
  3. If I make modifications (especially inserts) to my database directly (or from a non-Meteor app that shares the db, which is very possible for any app), does this not introduce a problem with clashing (or mixed) id types?
  4. How does this relate (if at all) to flickering problems under latency compensation, where one suggested approach is to generate the id on the client and pass it along within the insert call?
Thanks


Abigail Watson

Jan 25, 2014, 11:48:30 PM
to meteo...@googlegroups.com
ObjectId Pros  
- it has an embedded timestamp in it. 
- it's the default Mongo _id type; ubiquitous
- interoperability with other apps and drivers

ObjectId Cons
- it's an object, and a little more difficult to manipulate in practice. 
- there will be times when you forget to wrap your string in new ObjectId()
- it requires server side object creation to maintain _id uniqueness
- which makes generating them client-side by minimongo problematic

String Pros
- developers can create domain specific _id topologies

String Cons
- developer has to ensure uniqueness of _ids
- findAndModify() and getNextSequence() queries may be invalidated


Meteor's choice to go with a string, as I understand it, basically boils down to latency compensation and being able to generate the _id on the client side in minimongo. The default ObjectId implementation didn't lend itself to being generated on the client as part of the latency compensation framework, so they decided to roll their own _id scheme.

Personally, I find the embedded timestamps in ObjectIds to be invaluable later in an application's lifecycle. ObjectIds are more difficult to manipulate, and they add debugging time to an application's development cycle. But the extra 10 or 20 hours you put into debugging ObjectIds can return 10x or 100x savings down the road. Example: at work, we just salvaged a year's worth of production data because of the embedded timestamps, which has probably saved us hundreds of thousands of dollars of R&D time and effort.
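
To illustrate the embedded timestamp mentioned above: in a driver-generated ObjectId, the first 4 bytes (8 hex characters) hold the creation time in seconds since the Unix epoch. Here is a minimal sketch in plain JavaScript, assuming the id is available as its 24-character hex string. The name objectIdToDate is a hypothetical helper, not a Meteor or MongoDB API.

```javascript
// Hypothetical helper: recover the creation time embedded in an ObjectId.
// Assumes `hex` is the 24-character hex form of a driver-generated ObjectId,
// whose first 8 hex characters hold seconds since the Unix epoch.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error('expected a 24-character hex string');
  }
  const seconds = parseInt(hex.slice(0, 8), 16);
  return new Date(seconds * 1000);
}

console.log(objectIdToDate('52e3f0000000000000000000').toISOString());
// prints "2014-01-25T17:10:24.000Z"
```

This only gives meaningful dates for ids whose leading bytes really are a timestamp; an id built from random hex digits decodes to an arbitrary date.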

ObjectId's are great if you can ensure that there's one central authority for generating them.  They're also the preferred index type for any type of timeseries data. And while it may seem tempting to try to make a one-or-the-other decision for your entire app, I find choosing a string vs ObjectId (vs some other index scheme) really boils down to the topology of the data in the collection.  

Some useful questions to maybe ask when choosing the _id for a collection:

- Does the data in the collection need latency compensation?
- Is it time-series data?  
- Will other applications or worker utilities be accessing the collection?  
- What is the topology of the data in the collection?

Serkan Durusoy

Jan 26, 2014, 9:14:51 AM
to meteo...@googlegroups.com
Wow! Thank you very much for the very detailed and complete answer.

I need just a few further clarifications, though.

The relevant section of the docs at http://docs.meteor.com/#collection_object_id states:

"ObjectID values created by Meteor will not have meaningful answers to their getTimestamp method, since Meteor currently constructs them fully randomly."

Is this the case only if we manually create an ID using myNewID = new Meteor.Collection.ObjectID(), while a normal insert does contain the timestamp information?

Since ObjectIDs can be generated on the client side, does this make the relevant ObjectId cons, as well as the latency compensation argument, moot? Also, since it is a 24-digit hexadecimal value, there is almost no chance two IDs will be the same, so the uniqueness issue is practically non-existent on either the server or the client side.

If my reasoning is correct, then the only downside of ObjectIDs is that they need proper handling and developer attention throughout the code. On the other hand, the timestamp advantage seems to be lacking, which brings me back to my initial question: how and why do we choose one over the other?

It looks as if it boils down to other applications accessing/modifying the data where objectid's look like they are the more universal method.

Could you also elaborate more on your findAndModify() and getNextSequence() comments?

Abigail Watson

Jan 26, 2014, 10:32:06 AM
to meteo...@googlegroups.com
Hi Serkan,  :)
A normal server-side insert should generate timestamps correctly.  This is what I've been trending towards when I want a collection that has timestamps.

// model.js file (shared by client and server)
Posts = new Meteor.Collection("posts", {idGeneration: 'MONGO'});

// client: ask the server to perform the insert
Meteor.call('insertPost', newPostObjectInfo, function (error, result) {
  console.log(result);
});

// server: define the method that the client calls
Meteor.methods({
  insertPost: function (newPostObjectInfo) {
    Posts.insert({name: newPostObjectInfo.name, owner: newPostObjectInfo.owner});
    return 'success';
  }
});

As for client-side id creation, be careful of "almost" and "practically non-existent". The _ids have to be unique, and inserts will be rejected if they're not. If you can handle some posts or inserts being rejected, and are willing to take that chance... then, sure. From my experience, that's the kind of design decision that can result in a Heisenbug later down the road, where one out of a million transactions fails. But if you're handling 100,000 users, and they have 100 transactions a month, then you're going to start collecting dozens of errors. And the problem grows from there.
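
One way to put a number on the collision risk discussed above is the birthday bound. The sketch below is a rough estimate only. It assumes every id is drawn uniformly and independently at random, and it assumes 96 random bits per id (24 random hex digits); neither assumption is established in this thread. The function name collisionProbability is hypothetical.

```javascript
// Birthday-bound estimate: probability that at least two of `n` ids collide
// when each id is drawn uniformly at random from 2^bits possible values.
// Approximation: p ≈ 1 - exp(-n(n-1) / (2 * 2^bits)).
function collisionProbability(n, bits) {
  const space = Math.pow(2, bits);
  const exponent = (n * (n - 1)) / (2 * space);
  // -expm1(-x) equals 1 - exp(-x) but stays accurate for tiny x.
  return -Math.expm1(-exponent);
}

// 100,000 users x 100 transactions a month, as in the example above.
const insertsPerMonth = 100000 * 100;
// Assumption: ids are 24 random hex digits, i.e. 96 random bits.
console.log(collisionProbability(insertsPerMonth, 96));
```

The estimate says nothing about poorly seeded generators, or about ids produced by several different schemes writing to the same collection. Those are the cases where real-world collision rates can be far higher than the formula suggests, so handling a rejected insert is still worth planning for.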

As to how to decide which to use, again... it boils down to data topology. And queries. Look at how your data is queried, and how the data is structured. That forms its topology. Uniqueness, auto-incrementation, spatial organization, normalized hashes, timeseries... the index types are your guide to topology. Ask yourself how you want to index your data, and what kind of indexing you want to do. The question of String or ObjectId is a single instance of the broader and bigger question of how to index the data.


Any data that you want to display in a timeline or chronologically is very likely to need a timestamp. (It's one of the most ubiquitous ways to display data. Think Twitter, Facebook, Quickbooks... anything that you can sort by date.) On the other hand, geospatial location data usually has no need for timeseries indexing, since it's spatial in nature. And geolocation data often doesn't even need to be unique. Alternatively, user account data is an example of something that needs uniqueness and doesn't need to be timeseries data, but it doesn't hurt if it is.
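
For timeline-style data keyed by timestamped ObjectIds, a date range can be expressed as a range over _id. This is a sketch under stated assumptions: it assumes the ids in the collection carry a real creation timestamp in their first 4 bytes, and objectIdBoundary is a hypothetical helper name, not part of Meteor or the MongoDB driver.

```javascript
// Hypothetical helper: build the smallest ObjectId hex string for a given
// instant, usable as a lower or upper bound in an _id range query.
function objectIdBoundary(date) {
  const seconds = Math.floor(date.getTime() / 1000);
  if (seconds < 0 || seconds > 0xffffffff) {
    throw new RangeError('date does not fit in a 32-bit seconds field');
  }
  const prefix = seconds.toString(16).padStart(8, '0');
  return prefix + '0000000000000000'; // remaining 8 bytes zeroed
}

const from = objectIdBoundary(new Date('2014-01-01T00:00:00Z'));
const to = objectIdBoundary(new Date('2014-02-01T00:00:00Z'));
// A query would then wrap these in the driver's ObjectId type, e.g.
// {_id: {$gte: <ObjectId built from `from`>, $lt: <ObjectId built from `to`>}}
console.log(from, to);
```

Because _id is always indexed, such a range query needs no extra index. The trade-off is that it only works when every writer to the collection generates timestamped ids.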

As for the findAndModify, it's one of the lesser used MongoDB functions, and requires the uniqueness constraint.  Read through this doc for a discussion of what happens when a non-unique _id is used:




Nolan Darilek

Jan 26, 2014, 11:10:53 AM
to meteo...@googlegroups.com
One (possibly pedantic) note on the timestamps in Mongo's ObjectId:

IIRC they're somewhat limited to time periods after 2000 and before something like 2038. I could be wrong about the specifics but I'm pretty sure they're bounded.

The lower bound makes them impractical to use for timestamps alone should you ever want to import historic data. While the upper may be a ways in the future, we thought the same about 2-digit years for a time as well. :)

While there may be disadvantages, I prefer to use my own date-based timestamps rather than relying on the ObjectId. Just thought I'd toss that out for anyone considering using them in that way.
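
The bounds can be checked with a little arithmetic, given that the ObjectId timestamp field is 4 bytes of seconds since the Unix epoch. The sketch below suggests the lower bound is 1970 rather than 2000. Whether the upper bound is 2038 or 2106 depends on whether a given driver reads the field as signed or unsigned, which this thread does not settle.

```javascript
// The ObjectId timestamp field is 4 bytes of seconds since the Unix epoch.
const asDate = (seconds) => new Date(seconds * 1000).toISOString();

const lowerBound = asDate(0);             // earliest representable instant
const signedUpper = asDate(0x7fffffff);   // if the field is read as signed
const unsignedUpper = asDate(0xffffffff); // if the field is read as unsigned

console.log(lowerBound);    // prints "1970-01-01T00:00:00.000Z"
console.log(signedUpper);   // prints "2038-01-19T03:14:07.000Z"
console.log(unsignedUpper); // prints "2106-02-07T06:28:15.000Z"
```

Either way, dates before 1970 cannot be represented, so the point about importing historic data still stands.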

Serkan Durusoy

Jan 26, 2014, 12:40:48 PM
to meteo...@googlegroups.com
@Nolan that's very interesting and nice to know. Frankly, I prefer keeping a verbose timestamp within all our documents anyway. For more sensitive data, we also employ an audit trail and data history.

@Abigail, thanks again for a great follow up.

The way I see it now, objectid's are the universally preferred choice and should I need to manipulate the data externally, or even (not likely but still) need to retire meteor in favor of another solution, keeping objectid's would eliminate headaches.

And yes, I see that this brings in an overhead during development where I need to take care of this non-string approach where appropriate (routes, dom node id's, etc), but if that's all, I guess whatever my data topologies are, objectid's seem to be the safer, more conservative choice.
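
As a rough illustration of that overhead, here is one way to normalise either id flavour into a plain string before using it in a route or a DOM node id. It assumes ObjectID-like values expose a toHexString() method. The helper name idToString and the stand-in object are hypothetical, so that the sketch runs without Meteor.

```javascript
// Hypothetical helper: turn either id flavour into a plain string for use in
// routes or DOM ids. Assumes ObjectID-like values expose toHexString().
function idToString(id) {
  if (typeof id === 'string') return id;
  if (id && typeof id.toHexString === 'function') return id.toHexString();
  throw new TypeError('unsupported _id type');
}

// Stand-in for an ObjectID instance, so the sketch runs without Meteor.
const fakeObjectId = { toHexString: () => '52e3f0000000000000000000' };

console.log(idToString('Zk3TqXy9Lw2mNp4Rs'));
console.log(idToString(fakeObjectId));
```

The reverse direction (string back to ObjectID before querying) is where the "forgot to wrap it" bugs tend to appear, so it helps to do that conversion in one place.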

Thanks also for the findAndModify pointer.

Steven J. Dale

Aug 10, 2014, 3:34:11 PM
to meteo...@googlegroups.com
Hi all

@Abigail - thanks for the detailed look into ID's. I'm finally getting rolling with Meteor, and was confused at the different references between articles and the DM book.

This has been really helpful, as have the later conversations and the consensus to stick with Meteor string IDs here.

Has anything on this front changed since?

=: s

kwame

Sep 23, 2014, 10:35:30 PM
to meteo...@googlegroups.com
I have been following the Strings vs ObjectId discussion as well. I personally think this is a critical decision point for any real-world Meteor app that has to live with other systems. I just made a comment at https://github.com/meteor/meteor/issues/1834. Funnily enough, I ran into this issue when I loaded cities and countries data from http://www.geonames.org/ into my mongo db. The import was done using mongoimport, which is a much faster way to bring in non-backup data. mongoimport auto-generates an ObjectId as the _id. Thus, references to the cities and countries collections will have to use the ObjectId format. Even if I wanted to use the String format for those collections, I don't know if that would be possible (for now), since there is no way to create a String id corresponding to Random.id() in the mongo shell.

In effect, it looks like we might have to live with a mashup of Strings and ObjectIds to get the best of both worlds. As long as all collections with String ids have a timestamp property (a createdAt date), we should be fine. Or lose the advantage of String ids (reduced client-side latency) and go with ObjectIds everywhere. Well, everywhere except for the users collection.

Jon James

Sep 24, 2014, 11:49:39 AM
to meteo...@googlegroups.com
I usually use Collection Hooks on the client and server to add createdAt and updatedAt fields. You can see an example of how it's done in the Collection Behaviours package here.

MongoDB converts JS Date objects (new Date()) to ISODates, which are more useful than the date stored in an ObjectID.

See this StackOverflow answer for more information.
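
The idea behind such hooks can be sketched with two plain functions. These are hypothetical stand-ins that show what an insert hook and an update hook would do to a document or modifier; they are not the Collection Hooks or Collection Behaviours API.

```javascript
// Hypothetical stand-ins for insert/update hooks; not the collection-hooks API.

// Returns a copy of the document with createdAt and updatedAt set.
function stampOnInsert(doc, now = new Date()) {
  return Object.assign({}, doc, { createdAt: now, updatedAt: now });
}

// Returns a copy of a Mongo-style modifier whose $set also bumps updatedAt.
function stampOnUpdate(modifier, now = new Date()) {
  const $set = Object.assign({}, modifier.$set, { updatedAt: now });
  return Object.assign({}, modifier, { $set });
}

console.log(stampOnInsert({ name: 'first post' }));
console.log(stampOnUpdate({ $inc: { votes: 1 } }));
```

Explicit date fields like these work the same way regardless of whether the collection uses String or ObjectID ids.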

kwame

Sep 24, 2014, 12:23:46 PM
to meteo...@googlegroups.com
Jon, I do use hooks as well for dates, which I agree is much better. Personally, I really don't care whether we use Strings or ObjectIds. The only things that are important are that 1) they have to be unique and indexable, and 2) they have to be supported by other clients of the database and not just Meteor. If we go with Strings, #2 falls apart pretty fast: if I have a mobile app (that doesn't use Meteor) which also needs to create data in mongo, then the mobile app will need to generate String ids using the same mechanism, Random.id(), for _id consistency. Same with using the mongo shell. We would need a similar lib to call Random.id() in the shell to generate _ids, which isn't available currently (I'm sure that can be addressed). In summary, I think newly developed systems will gladly use String ids, but only if the mechanism for generating them is somewhat ubiquitous in the context of the database. In mongo's case, ubiquitous would mean access to Random.id() in the mongo shell and client drivers.
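
For a non-Meteor client, a similar-looking id can be generated with a few lines of code. This is a sketch only: the 55-character alphabet and the length of 17 are assumptions meant to resemble Meteor-style string ids and should be verified against the Meteor version in use. The name randomStringId is hypothetical, and the example uses Node's crypto module rather than anything from Meteor.

```javascript
const crypto = require('crypto');

// Assumption: 55 "unmistakable" characters and a length of 17, chosen to
// resemble Meteor-style string ids; verify against your Meteor version.
const ALPHABET = '23456789ABCDEFGHJKLMNPQRSTWXYZabcdefghijkmnopqrstuvwxyz';

function randomStringId(length = 17) {
  // Largest multiple of the alphabet size that fits in a byte (220 for 55).
  const limit = 256 - (256 % ALPHABET.length);
  let id = '';
  while (id.length < length) {
    const byte = crypto.randomBytes(1)[0];
    // Rejection sampling: skip bytes that would bias the modulo below.
    if (byte < limit) id += ALPHABET[byte % ALPHABET.length];
  }
  return id;
}

console.log(randomStringId());
```

Any writer that produces strings in the same shape will coexist with Meteor-generated ids, since to MongoDB they are all just unique string keys.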

Abigail Watson

Sep 24, 2014, 1:07:32 PM
to meteo...@googlegroups.com
@Steven, @Jon, @kwame,
Please be sure to comment on issue #1834 and encourage issue #2285 to be reopened.  There's a pull-request waiting to fix the issue, but needs more community voices to weigh in on the issue.

Steven J. Dale

Sep 26, 2014, 8:38:14 AM
to meteo...@googlegroups.com
Hi Abigail

I read through the issues, thanks for referencing. I did some ETL in the past, just enough to know the importance of ID's. However, I don't have enough of an understanding of Meteor and Mongo yet to weigh in. I get some of what you're saying, particularly importance of interop and while I don't understand Meteor's position in the tech details, the simple point about TCO for the community makes sense, particularly if it affects package authors.

My guy tells me interop is a good move, as distributed systems that integrate different types of DBs will become more and more prevalent. (Interesting read btw, just found this yesterday: http://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying)

If Meteor can support other DBs, it will bring in a broader spectrum of use cases from people who might never have considered the platform otherwise. I, for example, am interested in Neo4J and graph DBs, and while I don't yet know how they connect with Meteor, I imagine IDs will be an important consideration for integration and app <-> db design.

The question then is at what cost? Will latency comp and the other benefits still 'just work'..  more support == more code == more bugs == more difficulty in maintenance..  I just don't know enough yet, though community dialog sounds like a good idea

Steven J. Dale

Sep 26, 2014, 8:39:30 AM
to meteo...@googlegroups.com
*my gut,.. haha