Hi Dorian,
Thanks a lot for your answer.
> Int64 is 8 bytes and a BSON ObjectId is 12 bytes. So 8 bytes means lower memory use, which is better.
Yes, that's great. But I read that there is an advantage to using BSON rather than another data type? I do hope Int64 is advantageous, as it would serve our purpose :)
> A problem would be if the _id were a string, for example: much bigger in memory and slower to compare, e.g. in range queries.
No, it's all Int64 in our case.
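A minimal Python sketch of the size difference. It hand-encodes a single `_id` element using the layout the BSON spec gives for int64 (type byte 0x12), ObjectId (0x07) and string (0x02). The UID value is made up, and the count covers only the element as stored in the document, not any per-entry overhead inside the index.

```python
import os
import struct

def cstring(name: str) -> bytes:
    # BSON e_name: UTF-8 bytes followed by a NUL terminator
    return name.encode("utf-8") + b"\x00"

def int64_element(name: str, value: int) -> bytes:
    # type byte 0x12, field name, 8-byte little-endian signed integer
    return b"\x12" + cstring(name) + struct.pack("<q", value)

def objectid_element(name: str, oid: bytes) -> bytes:
    # type byte 0x07, field name, exactly 12 raw bytes
    if len(oid) != 12:
        raise ValueError("ObjectId must be exactly 12 bytes")
    return b"\x07" + cstring(name) + oid

def string_element(name: str, value: str) -> bytes:
    # type byte 0x02, field name, int32 length (counting the trailing NUL), UTF-8 bytes, NUL
    data = value.encode("utf-8") + b"\x00"
    return b"\x02" + cstring(name) + struct.pack("<i", len(data)) + data

uid = 100000123456789  # hypothetical Facebook UID, for illustration only
print(len(int64_element("_id", uid)))                # 13 = 1 + 4 + 8
print(len(objectid_element("_id", os.urandom(12))))  # 17 = 1 + 4 + 12
print(len(string_element("_id", str(uid))))          # 25 = 1 + 4 + 4 + 15 + 1
```

The same UID stored as a decimal string is almost twice the size of the Int64 element, and comparing strings byte by byte is also more work than comparing two 64-bit integers.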
> The id must be unique, and that is guaranteed by Facebook (I suppose).
Yes.
> The write speed will be slower than with an ascending int (like epoch time), since a bigger part of the index will have to be read to insert each document.
I understand this conceptually. However, since our complete index fits in RAM, I would expect that it doesn't matter (at least not at this scale)?
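The locality argument can be illustrated with a deliberately simplified model, not MongoDB's real B-tree: keep the index as a sorted Python list, call every `PAGE_SIZE` consecutive keys a "page" (the page size and key counts are hypothetical), and count how many distinct pages a batch of inserts lands on.

```python
import bisect
import random

PAGE_SIZE = 100  # hypothetical number of keys per index "page"

def pages_touched(existing_keys, new_keys, page_size=PAGE_SIZE):
    """Insert new_keys one at a time into a sorted copy of existing_keys and
    return the set of page numbers (position // page_size) the inserts land on."""
    keys = sorted(existing_keys)
    touched = set()
    for k in new_keys:
        pos = bisect.bisect_left(keys, k)
        touched.add(pos // page_size)
        keys.insert(pos, k)
    return touched

rng = random.Random(42)
existing = list(range(0, 1_000_000, 10))  # 100,000 keys already indexed (about 1,000 pages)

ascending = range(1_000_000, 1_001_000)                       # 1,000 keys, all above the current max
scattered = [rng.randrange(1_000_000) for _ in range(1_000)]  # 1,000 keys spread over the key space

print(len(pages_touched(existing, ascending)))  # 10: only the rightmost pages
print(len(pages_touched(existing, scattered)))  # hundreds of pages across the whole index
```

In this model every touched page has to be in memory for the insert, so scattered keys need most of the index cached while ascending keys only need its tail. If the whole index already fits in RAM, the difference is mostly cache locality rather than extra disk reads.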
To summarize:
We've already switched to this solution (Facebook UID as _id). Since deploying it, we have seen a higher lock ratio. Because we mainly do inserts, I was afraid that the new _id might be the culprit. However, we've changed a few other things as well, so I would be really happy if we could rule out the new _id, as there is almost no way of going back. The fact that you happily use your own _id reassures me a bit. However, our _ids are not ascending, so I hope this is not a big issue as long as the full index can be kept in RAM.
Dominik
Thanks very much for all your answers.
We've let it run in production for a week now, and I just wanted to give you some feedback. Although traffic was not peaking this week, everything ran just fine with the new ID. We found the culprit that was temporarily slowing down insert performance, and it was not the new ID. I feel a custom ID is OK: as long as you can keep the whole index in RAM, I assume performance losses due to index caching are not affecting us. Again, thanks for your help.
Hello,
We intend to use Facebook UID as our _id in a large MongoDB collection with tons of inserts. In the MongoDB documentation we read that:
1. It is advised to store the _id in the BSON type.
2. It is advised to have _id values roughly in ascending order.
Our solution satisfies neither property. It is critical for us to use the Facebook UID (as an Int) as the primary key. The main reason is that we want to use it as our shard key, so that inserts are spread evenly among shards.
Is there any way to optimize our solution to get maximum insert performance (we suffer from high lock ratios)?
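To illustrate why a non-monotonic shard key spreads inserts, here is a toy model of range-based chunks. `BOUNDS` and the key ranges are hypothetical, and the "spread" UIDs are drawn uniformly at random, which real Facebook UIDs may not be.

```python
import bisect
import random
from collections import Counter

# Hypothetical chunk boundaries splitting the key space into 4 ranges, one per shard.
# Chunk i covers [BOUNDS[i-1], BOUNDS[i]); in MongoDB, chunk ranges are managed by the cluster.
BOUNDS = [250_000, 500_000, 750_000]

def shard_for(key):
    # bisect_right counts the boundaries <= key, which is the chunk index 0..3
    return bisect.bisect_right(BOUNDS, key)

rng = random.Random(7)
monotonic = range(900_000, 901_000)                        # time-ordered ids, all past the last boundary
spread = [rng.randrange(1_000_000) for _ in range(1_000)]  # ids with no time order

print(Counter(shard_for(k) for k in monotonic))  # Counter({3: 1000}): every insert hits one chunk
print(Counter(shard_for(k) for k in spread))     # roughly 250 inserts per chunk
```

With an ascending key such as an ObjectId, new inserts always fall into the highest chunk, so one shard takes all the write load; a key without time order is what spreads it.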
Thanks a lot.
Dominik
--
From the documentation:
UUIDs
The _id field can be of any type; however, it must be unique. Thus you can use UUIDs in the _id field instead of BSON ObjectIds (BSON ObjectIds are slightly smaller; they need not be worldwide unique, just unique for a single db cluster). When using UUIDs, your application must generate the UUID itself. Ideally the UUID is then stored in the BSON type for efficiency – however you can also insert it as a hex string if you know space and speed will not be an issue for the use case.
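The efficiency point about UUIDs is easy to check with Python's standard `uuid` module: the raw value is 16 bytes, while its hex-string forms take 32 or 36 characters.

```python
import uuid

u = uuid.uuid4()     # a random (version 4) UUID
raw = u.bytes        # 16 raw bytes, suitable for a binary field
hex_plain = u.hex    # 32 hex characters, no hyphens
hex_dashed = str(u)  # 36 characters, canonical hyphenated form

print(len(raw), len(hex_plain), len(hex_dashed))  # 16 32 36
print(uuid.UUID(bytes=raw) == u)                  # True: the binary form loses nothing
```

So storing the hex string roughly doubles the key size compared with the raw bytes, which also makes the index larger.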
When possible, use _id values that are roughly in ascending order
If the _ids are in a somewhat well-defined order, the entire b-tree for the _id index need not be loaded on inserts. BSON ObjectIds have this property.
Note that unlike the BSON Object ID type (see above), most UUIDs do not have a rough ascending order, which creates additional caching needs for their index.
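The rough ascending order of ObjectIds comes from their leading 4-byte, big-endian timestamp (seconds since the Unix epoch). Below is a minimal sketch of an ObjectId-shaped value, assuming the layout used by current drivers (timestamp, 5-byte per-process random value, 3-byte counter). The helper `objectid_like` is hypothetical and only for illustration, not a substitute for a driver's ObjectId class.

```python
import os
import struct
import time

_PROCESS_UNIQUE = os.urandom(5)                  # fixed for the lifetime of this process
_counter = int.from_bytes(os.urandom(3), "big")  # starts at a random value

def objectid_like(ts=None):
    """Return 12 bytes: big-endian seconds timestamp, per-process random bytes, 3-byte counter."""
    global _counter
    if ts is None:
        ts = int(time.time())
    _counter = (_counter + 1) % 0x1000000  # wrap within 3 bytes
    return struct.pack(">I", ts) + _PROCESS_UNIQUE + _counter.to_bytes(3, "big")

# Five ids created one second apart (hypothetical timestamps)
ids = [objectid_like(1_300_000_000 + s) for s in range(5)]
print(ids == sorted(ids))  # True: byte order follows creation time
```

Because the timestamp comes first, plain byte-wise comparison orders these ids by creation second. Within a single second, ids from different processes can interleave, which is why the order is only rough. A random UUID has no such prefix.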