tag aware sharding


Himanshu Jain

Sep 30, 2015, 11:35:52 AM
to mongod...@googlegroups.com
Hi, 

I have a MongoDB cluster with three shards, using a Date field as the shard key. I get about 30k records every day. Since Date is a monotonically increasing shard key, all the latest data ends up on a single shard (and hence all recent reads hit that single shard).

What I am looking for is to have the data split by month (for each year, and going forward too), e.g.:

Shard 1 - Jan, Apr etc..
Shard 2 - Feb, May etc..
Shard 3 - Mar, June, etc..

Is this possible using Tag Aware Sharding? If so, please suggest how I can achieve it.

Also, if I change the shard key to some other key so that a single day's data is split across multiple shards, would this improve read performance? (In that case a query for a single date goes to all shards and the results from all shards are then aggregated.)

Thanks !

Himanshu Jain

Oct 1, 2015, 10:12:29 AM
to mongod...@googlegroups.com
Any suggestions on this?

Stephen Steneker

Oct 1, 2015, 11:10:40 PM
to mongodb-user

On Thursday, 1 October 2015 01:35:52 UTC+10, himanshu jain wrote:

I have a MongoDB cluster with three shards, using a Date field as the shard key. I get about 30k records every day. Since Date is a monotonically increasing shard key, all the latest data ends up on a single shard (and hence all recent reads hit that single shard).

What I am looking for is to have the data split by month (for each year, and going forward too), e.g.:

Shard 1 - Jan, Apr etc..
Shard 2 - Feb, May etc..
Shard 3 - Mar, June, etc..

Is this possible using Tag Aware Sharding? If so, please suggest how I can achieve it.

Hi Himanshu,

I wouldn’t recommend tag aware sharding as a workaround for a poor shard key.

A primary goal of sharding is to enable horizontal scaling, so you can add additional shards to grow with your workload. A monotonically increasing shard key creates a “hot shard” effect where inserts will target whichever shard currently has the chunk with the max shard key value, and chunks will then have to be redistributed to other shards via the balancer.
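
To illustrate the "hot shard" effect, here is a minimal Python sketch that models range-based chunk routing on a date shard key. It is only a simulation, not MongoDB code: the chunk boundaries, shard names, and the helper functions `shard_for` and `simulate_inserts` are hypothetical, and it deliberately ignores chunk splitting and the balancer.

```python
from bisect import bisect_right
from collections import Counter
from datetime import date, timedelta

# Hypothetical chunk boundaries for a collection range-sharded on a date key.
# Two boundaries split the key space into three chunks, one per shard:
#   (-inf, Apr 1) -> shard1, [Apr 1, Jul 1) -> shard2, [Jul 1, +inf) -> shard3
CHUNK_BOUNDS = [date(2015, 4, 1), date(2015, 7, 1)]
CHUNK_OWNERS = ["shard1", "shard2", "shard3"]


def shard_for(key: date) -> str:
    """Return the shard owning the chunk whose range contains `key`."""
    return CHUNK_OWNERS[bisect_right(CHUNK_BOUNDS, key)]


def simulate_inserts(start: date, days: int, per_day: int) -> Counter:
    """Count how many inserts each shard receives for consecutive days."""
    counts = Counter()
    for offset in range(days):
        counts[shard_for(start + timedelta(days=offset))] += per_day
    return counts


# 30 days of new data at 30k records per day, all with increasing dates.
recent_writes = simulate_inserts(date(2015, 9, 1), days=30, per_day=30_000)
print(recent_writes)
```

Because every new date is greater than the last boundary, each insert in this model lands in the final chunk, so a single shard takes the whole write load until chunks are split and migrated.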

With a poor shard key you will effectively have the same write limitation as a single shard with the added I/O overhead of frequent data migrations between shards. You will also have an unnecessarily complex environment to administer if you’re not actually getting any benefit out of sharding.

Also, if I change the shard key to some other key so that a single day’s data is split across multiple shards, would this improve read performance? (In that case a query for a single date goes to all shards and the results from all shards are then aggregated.)

Generally you want to choose a compound shard key which provides some data locality for your common queries (e.g. fetching data for a single month) while also allowing writes to be evenly distributed across your shards.
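
As a rough illustration of that trade-off, the following Python sketch models range routing on a compound key. Everything in it is an assumption for the sake of the example: the key shape `(month, source_id)`, the field name `source_id`, the chunk boundaries, and the shard names do not come from your schema, and the right second field depends on your data and queries.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical compound shard key: (month, source_id).
# Tuples compare lexicographically, which mirrors how ranges over a
# compound key are ordered. Each month is split into three source_id ranges.
BOUNDS = [
    ("2015-09", 100), ("2015-09", 200),
    ("2015-10", 0), ("2015-10", 100), ("2015-10", 200),
]
# Five boundaries define six chunks; the owners below are made up.
OWNERS = ["shard1", "shard2", "shard3", "shard1", "shard2", "shard3"]


def shard_for(month: str, source_id: int) -> str:
    """Return the shard owning the chunk that contains (month, source_id)."""
    return OWNERS[bisect_right(BOUNDS, (month, source_id))]


# Writes for the current month, 100 documents from each of 300 sources.
writes = Counter()
for source_id in range(300):
    writes[shard_for("2015-10", source_id)] += 100
print(writes)

# A query that supplies both key fields is routed to a single shard.
print(shard_for("2015-10", 42))
```

In this model the current month's inserts are spread over all three shards instead of one, while each month's documents still occupy a contiguous range of the key space rather than being scattered arbitrarily.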

You should also be aware that changing a shard key is not possible since the data has already been distributed amongst your shards. If you want a new shard key you need to migrate the data into a new sharded collection.

For more information see: Considerations for Selecting Shard Keys.

Regards,
Stephen
