Reno Reckling
Oct 3, 2012, 7:21:19 PM
to mongod...@googlegroups.com
> We're having an argument as to which one of these shard keys will produce the best insert speed. With the new key we're worried about thrashing the cache on insert vs. the performance of the old key.
Hi,
this depends on the size of your data set and your overall input per day.
Might I ask why you even include the date in the shard key? That does not seem to have any benefits.
I don't think that including the date is a good choice at all if you would like to optimize for
insert performance, as chunks only get split in the middle.
Putting the date in front will direct all of your inserts to the same chunk (and therefore to the
same shard), which only gets split in half once it gets too big. This will definitely thrash your
cache in that case.
The perfect key for maximum insert performance would be a long, strongly random number
without prefixes, so the writes will be distributed evenly across all shards.
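To illustrate the difference, here is a small simulation sketch in Python. It is not MongoDB code: the split points, the key formats and the helper names (chunk_for, distribution) are made up for the example, and it only models range-based chunks with fixed boundaries, ignoring splits and the balancer.

```python
import bisect
import random
from collections import Counter

def chunk_for(key, split_points):
    """Return the index of the range chunk that would hold `key`.

    Chunk i covers keys from split_points[i-1] (inclusive) up to
    split_points[i] (exclusive); the first and last chunks are open-ended.
    """
    return bisect.bisect_right(split_points, key)

def distribution(keys, split_points):
    """Count how many keys land in each chunk."""
    counts = Counter(chunk_for(k, split_points) for k in keys)
    return [counts.get(i, 0) for i in range(len(split_points) + 1)]

rng = random.Random(42)  # fixed seed so the run is repeatable

# Date-prefixed keys: every insert on 2012-10-03 shares the same prefix.
# The split points are hypothetical boundaries left over from older data.
date_splits = ["20120701", "20120801", "20120901"]
date_keys = ["20121003%06d" % rng.randrange(10**6) for _ in range(10000)]

# Random 64-bit keys written as 16 hex digits, with evenly spaced
# (again hypothetical) split points.
rand_splits = ["4" + "0" * 15, "8" + "0" * 15, "c" + "0" * 15]
rand_keys = ["%016x" % rng.getrandbits(64) for _ in range(10000)]

date_counts = distribution(date_keys, date_splits)
rand_counts = distribution(rand_keys, rand_splits)

print("date-prefixed:", date_counts)  # everything ends up in the last chunk
print("random:       ", rand_counts)  # roughly even across the four chunks
```

With the date in front, all 10000 inserts fall into the last chunk, while the random keys spread roughly evenly over the four chunks.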
If you have a massive amount of writes every day (10s of millions), then your current format of
<day>+<6digitRandomNumber> would inevitably lead to key collisions, as the day is fixed for a
whole day and the 6 digits would be guaranteed to collide after 1,000,000 inserts on the same day
if you are using digits from 0 to 9, or after 16,777,216 inserts if you are using hex digits. With
randomly chosen digits, the first collisions are likely long before those limits are reached.
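The arithmetic is easy to check. The sketch below computes the size of a 6-digit suffix space (10^6 = 1,000,000 for decimal digits, 16^6 = 16,777,216 for hex digits) and uses the standard birthday approximation to estimate how likely at least one collision is after a given number of inserts on the same day. The function names are my own, and the estimate assumes the suffixes are drawn uniformly at random.

```python
import math

def keyspace(alphabet_size, digits=6):
    """Number of distinct suffixes with `digits` symbols from the alphabet."""
    return alphabet_size ** digits

def collision_probability(n_inserts, space):
    """Birthday approximation for P(at least one collision).

    Uses 1 - exp(-n(n-1) / (2 * space)); once n exceeds the key space a
    collision is certain (pigeonhole principle).
    """
    if n_inserts > space:
        return 1.0
    return 1.0 - math.exp(-n_inserts * (n_inserts - 1) / (2.0 * space))

decimal_space = keyspace(10)  # 6 digits from 0-9
hex_space = keyspace(16)      # 6 hex digits

print("decimal key space:", decimal_space)
print("hex key space:    ", hex_space)

for n in (1000, 10000, 100000):
    p_dec = collision_probability(n, decimal_space)
    p_hex = collision_probability(n, hex_space)
    print("%7d inserts/day: decimal %.4f, hex %.4f" % (n, p_dec, p_hex))
```

Under these assumptions, a collision with decimal suffixes is already more likely than not after roughly 1,200 inserts in one day.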
So if you have no additional requirements such as locality of data, I would just advise using a
suitable hashing function to generate strong random sequences that can be used as shard keys. That
way, the writes will distribute evenly across the shards for maximum performance.
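As a rough sketch of what I mean, the key could be generated in the application before the insert. The example below uses Python's hashlib and uuid modules; the function names, the 16-hex-digit length and the sample identifier are assumptions for illustration, not anything prescribed by MongoDB.

```python
import hashlib
import uuid

def hashed_shard_key(source_id, hex_digits=16):
    """Derive a key from an existing identifier by hashing it.

    The same input always yields the same key, but neighbouring inputs
    (e.g. consecutive order numbers) end up far apart in the key space.
    """
    digest = hashlib.sha256(source_id.encode("utf-8")).hexdigest()
    return digest[:hex_digits]

def random_shard_key():
    """Generate a key with no relation to the document at all (128 random bits)."""
    return uuid.uuid4().hex

# Hypothetical document: the date stays available as an ordinary field
# instead of being a prefix of the shard key.
doc = {
    "_id": hashed_shard_key("2012-10-03/order/12345"),
    "day": "2012-10-03",
}
print(doc)
print(random_shard_key())
```

The hashed variant is useful if you need to recompute the key from data you already have; the random variant is simpler if you never need to do that.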
Regards,
Reno