MongoDB compound shard key strategy

415 views
Skip to first unread message

Angelo H

unread,
Jun 17, 2016, 5:20:03 PM6/17/16
to mongodb-user

I have documents like:  {_id: "someid1", "bar": "somevaluebar1"} {_id: "someid2", "foo": "somevaluefoo2", "bar": "somevaluebar2"} {_id: "someid3", "foo": "somevaluefoo3", "zoo": "somevaluezoo3"} {_id: "someid4", "zoo": "somevaluezoo4"}


1. If we query documents by "foo" the most and "bar" the second, does it make sense to create a compound shard key like { "foo" : 1, "bar" : 1, "_id" : 1 }?


2. "foo" or "bar" can be missing from the document too so I added "_id" to the compound shard key. Is this a good decision?


3. What will happen if I query by "bar"? Does it hit all the shards to gather the result?

Amar

unread,
Jun 29, 2016, 9:40:42 PM6/29/16
to mongodb-user

Hi Angelo,

  1. If we query documents by “foo” the most and “bar” the second, does it make sense to create a compound shard key like { “foo” : 1, “bar” : 1, “_id” : 1 }?

That really depends on the values of foo, bar and id as well as your requirements for sharding. In general, you need to choose a shard key based on the following criteria:

  1. Cardinality: The shard key should be easily divisible (preferably indefinitely) with respect to your data, to ensure that chunk splits could always happen. Please see Cardinality for examples

  2. Write distribution: The shard key should allow writes to be distributed across the cluster so that writes are not concentrated on a single shard. Please see Write Distribution for more details.

  3. Query isolation: The shard key should allow frequently used queries to target a small number of shards, so that scatter-gather queries are minimized/eliminated. Please see Query Isolation for more details.

More details can be found in Considerations for Selecting Shard Keys and On Selecting a Shard Key for MongoDB
MongoDB performance best practices

  1. “foo” or “bar” can be missing from the document too so I added “_id” to the compound shard key. Is this a good decision?

fields in the shard key cannot be missing. All fields in the shard key should have values for all documents in the sharded collection.

  1. What will happen if I query by “bar”? Does it hit all the shards to gather the result?

Yes this will query all results as the chunks are ordered based on the order of fields in the shard key (foo, bar, _id) and bar is the second element in the shard key. On the other hand, a query on foo and bar might not hit all shards.

One way to check this is to test it and see if the explain plan and see which shards get queried:

mongos> db.shardTest.find( { foo : 4 } ).explain()
{
    "queryPlanner" : {
        "mongosPlannerVersion" : 1,
        "winningPlan" : {
            "stage" : "SINGLE_SHARD",
            "shards" : [
                {
                    "shardName" : "shard01",
(...)

Or a shard merge when more than one shard was queried:

mongos> db.shardTest.find( { bar : 4 } ).explain()
{
    "queryPlanner" : {
        "mongosPlannerVersion" : 1,
        "winningPlan" : {
            "stage" : "SHARD_MERGE",
            "shards" : [
                {
                    "shardName" : "shard01",
(...)

The details will also show which shards where queried.

Regards,

Amar

Reply all
Reply to author
Forward
0 new messages