Hi Sachin
I am designing the sharding strategy for one of my collections.
Each object in that collection has a timestamp field which I find best to use as my shard key.
Could you post an example document? Do you plan to use only the timestamp as the shard key, or are you using the timestamp as part of a compound shard key?
Shard key selection is an important step in your schema design, since once created, the shard key is immutable. That is, if later on you discover that the shard key is not the best, you would have to dump the collection, recreate the collection with a different shard key, and re-import all the data back in.
Using a monotonically increasing shard key by itself (i.e. not using a compound key where the timestamp is one element of the key) will artificially limit your insert rate. This is because every insert will go to the chunk whose upper bound is MaxKey. Since only one chunk can have MaxKey as its upper bound, and a single chunk can only be located on one shard, your insert rate is practically limited to the capacity of a single machine, no matter how many shards you have. Also, you cannot change this fact in the future unless you dump/recreate/restore the collection.
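To make the MaxKey point concrete, here is a minimal sketch in plain Python (no MongoDB involved). The split points, shard names and timestamps are made up for illustration, and the model ignores chunk splitting and balancing, so treat it as a toy model rather than how mongos is implemented.

```python
import bisect

# Hypothetical split points for a collection range-sharded on a timestamp.
# Chunk 0 covers [MinKey, 1000), chunk 1 covers [1000, 2000), chunk 2 covers
# [2000, 3000), and chunk 3 covers [3000, MaxKey).
SPLIT_POINTS = [1000, 2000, 3000]
# Hypothetical owner of each chunk; a chunk lives on exactly one shard.
CHUNK_TO_SHARD = ["shardA", "shardB", "shardC", "shardD"]


def chunk_for(ts):
    """Return the index of the chunk whose range contains ts."""
    return bisect.bisect_right(SPLIT_POINTS, ts)


def route_inserts(timestamps):
    """Count how many inserts each shard receives."""
    counts = {}
    for ts in timestamps:
        shard = CHUNK_TO_SHARD[chunk_for(ts)]
        counts[shard] = counts.get(shard, 0) + 1
    return counts


# New documents always carry timestamps later than anything already stored,
# so every one of them falls into the last chunk.
print(route_inserts(range(3000, 3100)))  # {'shardD': 100}
```

All 100 inserts land on the shard that owns the [3000, MaxKey) chunk, regardless of how many other shards exist.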
For more information regarding shard key selection, please see:
Also, almost all of my queries will involve a range on the timestamp field.
So here I am slightly confused: would hashed sharding or ranged sharding be best for my case?
If the shard key only contains a single monotonically increasing value (e.g. ObjectId, timestamp), then using a hashed shard key is the recommended approach, since inserts will be spread out across all the shards. The tradeoff is that a range query on a hashed field cannot be targeted at specific shards, so it has to be sent to every shard. If you require a range query based on timestamp, then it is recommended to use a compound key, in which the timestamp forms part of the key.
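The tradeoff can be sketched with another small pure-Python toy model. The shard names and split points are hypothetical, and the MD5-modulo routing is only a stand-in for a hash function: MongoDB assigns ranges of hash values to chunks rather than taking a modulo, but the effect on neighbouring timestamps is similar.

```python
import hashlib

SHARDS = ["shardA", "shardB", "shardC", "shardD"]  # hypothetical shard names
SPLIT_POINTS = [1000, 2000, 3000]                  # hypothetical range split points


def ranged_shard(ts):
    """Ranged sharding: neighbouring timestamps fall into the same chunk."""
    return SHARDS[sum(1 for point in SPLIT_POINTS if ts >= point)]


def hashed_shard(ts):
    """Hashed sharding stand-in: neighbouring timestamps are scattered.

    MD5 plus modulo is an illustrative assumption, not MongoDB's exact scheme.
    """
    digest = hashlib.md5(str(ts).encode()).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def shards_touched(route, start, end):
    """Return the set of shards holding documents with start <= ts < end."""
    return {route(ts) for ts in range(start, end)}


# A range query over 100 consecutive timestamps.
print(sorted(shards_touched(ranged_shard, 1200, 1300)))  # ['shardB']
print(sorted(shards_touched(hashed_shard, 1200, 1300)))
```

With the ranged key the whole range lives on one shard, so the query can be targeted there. With the hashed key the same documents are spread over several shards, which is good for inserts but means the range query has to visit all of them.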
I would suggest you test the shard key in a test environment using the expected workload, and check whether using a monotonically increasing shard key is acceptable to your use case.
Best regards,
Kevin
--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
For other MongoDB technical support options, see: https://docs.mongodb.com/manual/support/
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/0a7a44b4-ef00-42cf-817c-6b510866ff26%40googlegroups.com.
Hi Sachin
What I understand is that simply using {ts: 1} as the shard key would help my queries but won’t help inserts.
This is correct.
So what I understand from your suggestion is that it is better to use a compound key as my shard key.
This is also correct, although the wording I prefer is: if required to have a monotonically increasing field as a shard key, use a compound key as a shard key, so that the shard key does not contain only a single field that is monotonically increasing.
So would a key like {ts: 1, pn: 1, un: 1, ss: 1} be the way to go about it?
That is one possible shard key that could satisfy the queries you posted. However, I would recommend that you create a test deployment and extensively check the explain() output of more example queries that you will use in production before committing to any shard key selection, particularly if you need to sort the results.
What you want to avoid seeing in the explain() output is stages with "COLLSCAN" (which means a collection scan, i.e. MongoDB is forced to examine every document in the collection), and "SORT_KEY_GENERATOR" (which means an in-memory sorting stage, which is limited to 32 MB). See Use Indexes to Sort Query Results for more information.
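If you end up checking many queries, a small helper can scan the explain() output for those two stage names. The sketch below is plain Python working on a dict. The sample document is hand-written and only loosely shaped like a real winning plan, so it is not actual server output, and the helper names are my own.

```python
# Stage names the reply above says to watch for.
FLAGGED_STAGES = {"COLLSCAN", "SORT_KEY_GENERATOR"}


def find_stages(node):
    """Recursively collect every "stage" value in a nested explain document."""
    found = []
    if isinstance(node, dict):
        stage = node.get("stage")
        if isinstance(stage, str):
            found.append(stage)
        for value in node.values():
            found.extend(find_stages(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_stages(item))
    return found


def flagged(explain_output):
    """Return the flagged stage names present in the plan, sorted by name."""
    return sorted(set(find_stages(explain_output)) & FLAGGED_STAGES)


# Hand-written sample for illustration only; not real server output.
sample = {
    "queryPlanner": {
        "winningPlan": {
            "stage": "SORT",
            "inputStage": {
                "stage": "SORT_KEY_GENERATOR",
                "inputStage": {"stage": "COLLSCAN"},
            },
        }
    }
}
print(flagged(sample))  # ['COLLSCAN', 'SORT_KEY_GENERATOR']
```

Because the walker simply looks for any "stage" key at any depth, it should also cope with per-shard plans nested inside lists, but please verify that against the explain() output of your own test deployment.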
What if they are missing (as they can be) in some documents? If they are missing and still used as part of the compound shard key, would they cause any issues?
If you include a field as part of the shard key, that field cannot be missing from any document. This is because the entire shard key is used to decide on which shard and chunk that document belongs. If there are missing fields, MongoDB cannot discover this information.
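Before committing to a compound key, it may be worth checking a sample of your documents for missing key fields. Here is a minimal sketch in plain Python. The field names come from the key discussed above, while the sample documents and the helper name are invented for illustration, and it only handles top-level fields.

```python
# Field names from the compound key discussed in this thread.
SHARD_KEY_FIELDS = ["ts", "pn", "un", "ss"]


def missing_shard_key_fields(doc, key_fields=SHARD_KEY_FIELDS):
    """Return the shard key fields that are absent from a document."""
    return [field for field in key_fields if field not in doc]


# Invented sample documents; the second one lacks "pn" and "ss".
docs = [
    {"_id": 1, "ts": 1500000000, "pn": "a", "un": "u1", "ss": "x"},
    {"_id": 2, "ts": 1500000001, "un": "u2"},
]

for doc in docs:
    missing = missing_shard_key_fields(doc)
    if missing:
        print(doc["_id"], "is missing", missing)  # 2 is missing ['pn', 'ss']
```

Any document reported here would need a value for the missing fields (or a different shard key) before the collection can be sharded on that key.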
Best regards,
Kevin