Picking the perfect shard key... lots has been written on this subject...
> Now i want to create a perfect shard key for this collection which will provide write scalability as well as read isolation.
Those are two excellent criteria, and among the most important. Another
is to make sure that the key has high enough granularity to allow
splitting and balancing.
> i am choosing "company-website-url" field as my shard key so now the value of shard key will be unique every time so using this way all the writes will get evenly distributed among all shard instance.
I don't see how that follows. Just because each combination is unique
does NOT mean that you will have even distribution of writes!
Imagine that you get these 5 combinations in a row:
acme-www-jobs
acme-docs-index
acme-www-index
foobar-www-index
acme-www-contact
You just had a rather uneven distribution of writes - nearly everything
went to the shard that had acme and not much went to the shard that had
foobar...
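To make that concrete, here is a rough Python sketch (not MongoDB code) of
range-based routing for those five writes. It assumes a two-shard cluster
with a single made-up split point at "b", so keys that sort before "b" live
on one shard and everything else on the other. The shard names, the split
point, and the route() helper are all invented for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical layout: one split point gives two chunks.
# chunk 0 holds keys < "b", chunk 1 holds keys >= "b".
SPLIT_POINTS = ["b"]
CHUNK_TO_SHARD = ["shardA", "shardB"]  # made-up shard names


def route(shard_key: str) -> str:
    """Return the shard that owns the chunk whose range contains shard_key."""
    return CHUNK_TO_SHARD[bisect_right(SPLIT_POINTS, shard_key)]


# The five unique shard key values from the example above.
writes = [
    "acme-www-jobs",
    "acme-docs-index",
    "acme-www-index",
    "foobar-www-index",
    "acme-www-contact",
]

writes_per_shard = Counter(route(key) for key in writes)
print(dict(writes_per_shard))
```

Every key is unique, yet four of the five writes land on the same shard,
because routing depends on which range a value falls into, not on whether
the value is unique.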
Plus, I don't see how the combination of company-site-url is unique
unless you plan on aggregating things into a single document per
company-site-url, which sounds like a bad idea as the documents will
get way too big.
MongoDB defines ranges for you regardless of what your shard key is.
And when people say that monotonically increasing shard key values cause
write hotspots, they mean that _inserts_ will cause hotspots, because new
documents will always go into the highest range of shard key values.
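The hotspot effect can be sketched the same way. This is only a toy model in
Python: the numeric boundaries, the chunk numbering, and the chunk_for()
helper are assumptions made up for the example, and it ignores the splitting
and migration a real cluster would do over time.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical chunk boundaries for a numeric shard key:
#   chunk 0: key < 1000
#   chunk 1: 1000 <= key < 2000
#   chunk 2: 2000 <= key < 3000
#   chunk 3: key >= 3000  (the open-ended highest range)
BOUNDARIES = [1000, 2000, 3000]


def chunk_for(key: int) -> int:
    """Return the index of the chunk whose range contains key."""
    return bisect_right(BOUNDARIES, key)


# New inserts with an ever-increasing key, such as a counter or timestamp.
increasing_keys = range(3000, 3010)

hits = Counter(chunk_for(key) for key in increasing_keys)
print(dict(hits))
```

All ten inserts fall into chunk 3, the highest range, so whichever shard
holds that chunk takes all of the insert load.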
MongoDB does not create a chunk for each *value* of the shard key, but
rather for a range of values. This is described in a lot of detail in
the docs and there are also many blog posts about this topic:
http://docs.mongodb.org/manual/core/sharding-introduction/
http://docs.mongodb.org/manual/core/sharded-cluster-mechanics
http://docs.mongodb.org/manual/tutorial/choose-a-shard-key
http://www.kchodorow.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/
Asya