Re: Custom ID / Insert Performance


ddorian

Aug 6, 2012, 4:34:37 PM
to mongod...@googlegroups.com
Int64 is 8 bytes and a BSON ObjectId is 12 bytes, so Int64 uses less memory, which is better.
A string _id would be a problem: much bigger in memory and slower to compare, for example in range queries.
The id must be unique, which Facebook guarantees (I suppose).
Writes will be slower than with an ascending int (like an epoch timestamp), since a bigger part of the index has to be read to insert each document.

In my project I have many projects, and each project has many documents. I build the "_id" of each document as project_id + epoch (concatenated). If in your case the number would grow beyond the Int64 maximum, then store it as BSON. Maybe you can do something similar?
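
A minimal sketch of the project_id + epoch idea, assuming the two parts are packed into one signed 64-bit integer by bit-shifting (the post only says "concatenate", so the packing scheme, the 32-bit timestamp width, and the names `make_id` and `split_id` are all assumptions for illustration):

```python
EPOCH_BITS = 32          # assumption: a seconds-resolution epoch fits in 32 bits
INT64_MAX = 2**63 - 1    # largest value a signed 64-bit integer can hold


def make_id(project_id: int, epoch_seconds: int) -> int:
    """Pack project_id into the high bits and the timestamp into the low bits."""
    if project_id < 0:
        raise ValueError("project_id must be non-negative")
    if not 0 <= epoch_seconds < 2**EPOCH_BITS:
        raise ValueError("epoch_seconds does not fit in 32 bits")
    value = (project_id << EPOCH_BITS) | epoch_seconds
    if value > INT64_MAX:
        raise OverflowError("composite id exceeds the signed Int64 range")
    return value


def split_id(value: int) -> tuple:
    """Recover (project_id, epoch_seconds) from a packed id."""
    return value >> EPOCH_BITS, value & (2**EPOCH_BITS - 1)
```

Within one project these ids ascend with time, and ids of different projects never interleave. Two documents created in the same project during the same second would collide, so a real scheme would need a finer timestamp or an extra counter.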

dominikgrolimund

Aug 7, 2012, 10:33:24 AM
to mongod...@googlegroups.com

Hi Dorian,

Thanks a lot for your answer.

> Int64 is 8 bytes and BSON is 12 bytes. So 8 bytes is lower memory = better. 

Yes, that's great. But I read that it is an advantage to use BSON rather than another data type? I do hope Int64 is advantageous as it would serve our purpose :)

> A problem would be if the _id was string for example, much bigger in memory and comparison, for example range queries.

No it's all Int64 in our case.

> The id must be  unique and is guaranteed by facebook (i suppose).

Yes.

> The write speed will be slower than ascending int(like epoch) since a bigger part of the index will have to be read to insert the document.

I understand this conceptually. However, since our complete index fits in RAM, I would expect that it doesn't matter (at least not at this scale)?


To summarize:

We've already switched to this solution (Facebook UID as _id). Since we deployed, we experienced a higher lock ratio. Since we mainly do inserts, I was afraid that the new _id might be the culprit. However, we've changed a few other things as well, so I would be really happy if we could exclude the new _id as the culprit as there is almost no way of going back. The fact that you happily use your own _id relieves me a bit. However, our _ids are not ascending so I hope this is not a big issue in case the full index can be kept in RAM.

Dominik

Scott Hernandez

Aug 7, 2012, 11:19:43 AM
to mongod...@googlegroups.com
On Tue, Aug 7, 2012 at 10:33 AM, dominikgrolimund <dom...@silp.com> wrote:
> Hi Dorian,
>
> Thanks a lot for your answer.
>
>> Int64 is 8 bytes and BSON is 12 bytes. So 8 bytes is lower memory =
>> better.
>
> Yes, that's great. But I read that it is an advantage to use BSON rather
> than another data type? I do hope Int64 is advantageous as it would serve
> our purpose :)

A long (64bit int) is defined in bson and is just fine.
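
To make the size comparison concrete, here is a hand-rolled sketch of how a document holding only an _id is laid out in BSON, once with a 64-bit integer (element type 0x12) and once with a 12-byte ObjectId (element type 0x07). The function names are hypothetical, and real drivers do this encoding for you:

```python
import struct


def encode_int64_id(value: int) -> bytes:
    """Encode the BSON document {"_id": <int64>} by hand."""
    # element = type byte + field name as C string + 8-byte little-endian value
    element = b"\x12" + b"_id\x00" + struct.pack("<q", value)
    total = 4 + len(element) + 1  # length prefix + element + terminator
    return struct.pack("<i", total) + element + b"\x00"


def encode_objectid_id(oid: bytes) -> bytes:
    """Encode the BSON document {"_id": <ObjectId>} from 12 raw bytes."""
    if len(oid) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    element = b"\x07" + b"_id\x00" + oid
    total = 4 + len(element) + 1
    return struct.pack("<i", total) + element + b"\x00"


int64_doc = encode_int64_id(100000123456789)   # made-up UID
objectid_doc = encode_objectid_id(bytes(12))   # all-zero placeholder ObjectId
```

The Int64 document comes out 4 bytes shorter than the ObjectId one, which matches the 8 versus 12 byte difference discussed above.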

>> A problem would be if the _id was string for example, much bigger in
>> memory and comparison, for example range queries.
>
> No it's all Int64 in our case.
>
>> The id must be unique and is guaranteed by facebook (i suppose).
>
> Yes.
>
>> The write speed will be slower than ascending int(like epoch) since a
>> bigger part of the index will have to be read to insert the document.
>
> I understand this conceptually. However, since our complete index fits in
> RAM, I would expect that it doesn't matter (at least not at this scale)?

See my questions below.


> To summarize:
>
> We've already switched to this solution (Facebook UID as _id). Since we
> deployed, we experienced a higher lock ratio. Since we mainly do inserts, I
> was afraid that the new _id might be the culprit. However, we've changed a
> few other things as well, so I would be really happy if we could exclude the
> new _id as the culprit as there is almost no way of going back. The fact
> that you happily use your own _id relieves me a bit. However, our _ids are
> not ascending so I hope this is not a big issue in case the full index can
> be kept in RAM.

Can you post mongostat numbers when locking is high? How many indexes
do you have on those collections? The more indexes, the more IO which
can cause higher locking. It is very unlikely it is the change of the
_id data type. Please also include the output of db.coll.stats().

Please post to gist/pastebin/etc so the output of those commands is
more readable.
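
As a rough illustration of what to look for in the stats output, the sketch below summarizes a collection-stats document and checks whether the combined index size fits in a given amount of RAM. The field names (`nindexes`, `totalIndexSize`, `indexSizes`) are the ones the stats output reports; the numbers, index names, and the `index_report` helper are made up for the example:

```python
def index_report(stats: dict, ram_bytes: int) -> dict:
    """Summarize index count and size from a collection-stats document."""
    sizes = stats["indexSizes"]
    return {
        "nindexes": stats["nindexes"],
        "total_index_bytes": stats["totalIndexSize"],
        "fits_in_ram": stats["totalIndexSize"] <= ram_bytes,
        "largest_index": max(sizes, key=sizes.get),
    }


GIB = 1024**3

# Hypothetical output, shaped like db.coll.stats(); values are invented.
sample_stats = {
    "count": 50_000_000,
    "nindexes": 3,
    "totalIndexSize": 6 * GIB,
    "indexSizes": {"_id_": 2 * GIB, "email_1": 3 * GIB, "created_1": 1 * GIB},
}

report = index_report(sample_stats, ram_bytes=16 * GIB)
```

Comparing against total RAM is only a first approximation, since the data itself and other collections compete for the same memory.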

dorian i

Aug 7, 2012, 1:58:31 PM
to mongod...@googlegroups.com
My idea was that an incrementing Int64 (like an epoch timestamp) performs better than a BSON ObjectId. Right?
Its drawback is that you can't insert many documents at the same time.

Russell Bateman

Aug 7, 2012, 3:23:10 PM
to mongod...@googlegroups.com
It's also predictable and consequently less secure depending on whether you care if consumers can make something pernicious out of it.

dominikgrolimund

Aug 13, 2012, 10:03:19 AM
to mongod...@googlegroups.com

Thanks much for all your answers.

We've let it run in production for a week now and I just wanted to give you some feedback on this. Although traffic was not peaking this week, it seemed just fine with the new ID. We've found the culprit that was slowing down insert performance temporarily and it was not due to the new ID. I feel like custom ID is ok. As long as you can keep the whole index in RAM, I assume performance losses due to index caching are not affecting us. Again, thanks for your help.


On Monday, August 6, 2012 9:25:07 PM UTC+2, dominikgrolimund wrote:

Hello,

We intend to use Facebook UID as our _id in a large MongoDB collection with tons of inserts. In the MongoDB documentation we read that:

1. It is advised to store the id in a BSON type.

2. It is advised to have _id values roughly in ascending order.

Neither property is satisfied by our solution. It is critical for us to use Facebook UID (as an Int) as the primary key. The main reason for using Facebook UID is that we want to use it as our shard key so that inserts are spread evenly among shards.
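
The shard-key reasoning can be sketched with a toy model of range-based chunks: each shard owns a contiguous key range, so keys that always increase land on the last shard while scattered keys spread out. The split points, key ranges, and `shard_for` helper are hypothetical, and the model ignores chunk splitting and balancing:

```python
import bisect
import random
from collections import Counter

# Hypothetical chunk boundaries: 4 shards covering keys 0..99,999.
SPLIT_POINTS = [25_000, 50_000, 75_000]


def shard_for(key, split_points=SPLIT_POINTS):
    """Return the index of the shard whose key range contains `key`."""
    return bisect.bisect_right(split_points, key)


rng = random.Random(1)
uid_like = [rng.randrange(100_000) for _ in range(1_000)]  # scattered keys
ascending = range(90_000, 91_000)                          # ever-growing keys

uid_load = Counter(shard_for(k) for k in uid_like)
ascending_load = Counter(shard_for(k) for k in ascending)
```

In this model all 1,000 ascending inserts hit a single shard, while the scattered keys are divided among all four.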

Is there any way to optimize our solution to get maximum insert performance (we suffer from high lock ratios)?

Thanks a lot.

Dominik

--

From the documentation:

UUIDs

The _id field can be of any type; however, it must be unique. Thus you can use UUIDs in the _id field instead of BSON ObjectIds (BSON ObjectIds are slightly smaller; they need not be worldwide unique, just unique for a single db cluster). When using UUIDs, your application must generate the UUID itself. Ideally the UUID is then stored in the BSON type for efficiency – however you can also insert it as a hex string if you know space and speed will not be an issue for the use case.

When possible, use _id values that are roughly in ascending order

If the _id's are in a somewhat well defined order, on inserts the entire b-tree for the _id index need not be loaded. BSON ObjectIds have this property. 

Note that unlike the BSON Object ID type (see above), most UUIDs do not have a rough ascending order, which creates additional caching needs for their index.
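
The caching point can be illustrated with a deliberately simplified model: treat the index as a sorted list cut into fixed-size pages and count how many distinct pages a batch of inserts lands on. This is not MongoDB's actual b-tree, and the page size and key ranges are arbitrary assumptions:

```python
import bisect
import random

PAGE_SIZE = 100  # hypothetical number of keys per index page


def pages_touched(existing_keys, new_keys, page_size=PAGE_SIZE):
    """Count distinct pages of a sorted key list that the inserts land on."""
    keys = sorted(existing_keys)
    touched = set()
    for key in new_keys:
        pos = bisect.bisect_left(keys, key)  # where the key belongs
        touched.add(pos // page_size)        # page holding that position
        keys.insert(pos, key)
    return len(touched)


existing = range(100_000)
ascending = range(100_000, 101_000)  # every new key is larger than the last
rng = random.Random(42)
scattered = [rng.uniform(0, 100_000) for _ in range(1_000)]  # UUID-like spread

ascending_pages = pages_touched(existing, ascending)
scattered_pages = pages_touched(existing, scattered)
```

Ascending keys only ever touch the few pages at the right-hand edge, whereas scattered keys touch pages all over the index, which is why the whole index needs to stay cached for them.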