Selecting a perfect shard key


Amit Patel

Aug 9, 2014, 7:21:26 PM
to mongod...@googlegroups.com
Hey,

I am a bit confused about MongoDB shard keys.

First, let me explain what I want to do and what my understanding is.
Please correct me if I am wrong.

I have a collection called "company" and I want to shard it across my 4 shard instances.


Now I want to choose a perfect shard key for this collection, one that provides write scalability as well as read isolation.

I am choosing the "company-website-url" field as my shard key. Its value will be unique every time, so all the writes should get evenly distributed among all the shard instances.

I will also always include "company-website-url" in my queries against this collection, so mongos will send each query only to the shard instance that stores the result for it.

Please correct me if my understanding above is wrong.

Now I have a few questions, and I would really appreciate it if you could clear up my doubts.


I know some of these questions are a bit silly, but I am having real difficulty understanding this, which is why I am asking in very simple language.

Here are my questions.

Many documents on the internet say that if the shard key field is of Date, Timestamp or ObjectId type, all writes are routed to one shard.

I do not understand why this always sends writes to only one shard instance. Suppose I use the _id field as the shard key: the value of _id is also unique every time, because it increases on every insert, so why does mongo send all the writes to one instance?

Does mongo automatically define a range whenever we set this type of shard key? For example, if I use the _id field, will mongo automatically decide that all documents with _id values 1-1000 go to one shard?

Please clear up this doubt.

My second question: if I select a shard key field whose values have a high level of randomness, how will mongo route writes to the shard instances?

Suppose I select user-name as the shard key, so its values have a high level of randomness.

1) Will mongo create a chunk for each user-name?

2) How will it route writes for this shard key? Will mongo distribute writes in round-robin fashion because the shard key value is unique every time, or is there some other criterion for how it distributes writes?



Please tell me if you want any details from my side.


Amit Patel

Aug 10, 2014, 1:40:57 PM
to mongod...@googlegroups.com
Any suggestions on this?

Asya Kamsky

Aug 10, 2014, 2:47:20 PM
to mongodb-user
Yes, I have a few, but I'm working through answered questions from the past week in order.



--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/4da01170-403b-44ae-98a6-1bbf05dbc8f3%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Amit Patel

Aug 14, 2014, 4:00:42 AM
to mongod...@googlegroups.com
Hey Asya,

Please reply whenever you get time.

Looking forward to hearing from you soon.

Thanks.

Asya Kamsky

Aug 14, 2014, 10:13:30 PM
to mongodb-user
Picking the perfect shard key... lots has been written on this subject...

> Now I want to choose a perfect shard key for this collection, one that provides write scalability as well as read isolation.

Those are two excellent criteria, and the most important ones. Another
is to make sure that the key has high enough granularity to allow
splitting and balancing.

> I am choosing the "company-website-url" field as my shard key. Its value will be unique every time, so all the writes should get evenly distributed among all the shard instances.

I don't see how that follows. Just because each combination is unique
does NOT mean that you will have even distribution of writes!
Imagine that you get these 5 combinations in a row:

acme-www-jobs
acme-docs-index
acme-www-index
foobar-www-index
acme-www-contact

You just had a rather uneven distribution of writes - everything went
to the shard that had acme and not much went to the shard that had
foobar...
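
To make that concrete, here is a minimal sketch of range-based routing. It is a toy model, not MongoDB code: the single split point "d" and the chunk numbering are assumptions made up for illustration, standing in for one chunk that holds the acme keys and one that holds the foobar keys.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical split point (an assumption for this sketch):
# chunk 0 owns keys < "d", chunk 1 owns keys >= "d".
SPLIT_POINTS = ["d"]

def chunk_for(key, split_points=SPLIT_POINTS):
    """Return the index of the range-based chunk that owns `key`."""
    return bisect_right(split_points, key)

# The five writes from the example above, in arrival order.
writes = [
    "acme-www-jobs",
    "acme-docs-index",
    "acme-www-index",
    "foobar-www-index",
    "acme-www-contact",
]

# Count how many writes land in each chunk.
writes_per_chunk = Counter(chunk_for(k) for k in writes)
print(dict(writes_per_chunk))  # {0: 4, 1: 1}
```

Every key is unique, yet four of the five writes land in the same chunk, because routing depends on which range a value falls into and not on whether the value is unique.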

Plus I don't see how combination of company-site-url is unique unless
you plan on aggregating things in a single document per
company-site-url which sounds like a bad idea as the documents will
get way too big.

MongoDB defines ranges for you regardless of what your shard key is.
And when people say that increasing shard key values cause write
hotspots, they mean _inserts_ will cause hotspots because they will
always go into the highest range of shard key values.
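
A rough sketch of why ever-increasing keys create an insert hotspot, under assumed conditions: an integer shard key and three made-up split points giving four chunks. None of these numbers come from MongoDB itself; they only illustrate the range lookup.

```python
from bisect import bisect_right

# Assumed chunk boundaries for an integer shard key (illustration only):
# chunk 0: (-inf, 1000)   chunk 1: [1000, 2000)
# chunk 2: [2000, 3000)   chunk 3: [3000, +inf)
SPLIT_POINTS = [1000, 2000, 3000]

def chunk_for(key):
    """Return the index of the chunk whose range contains `key`."""
    return bisect_right(SPLIT_POINTS, key)

def chunks_hit(keys):
    """Return the sorted list of distinct chunks that a batch of inserts touches."""
    return sorted({chunk_for(k) for k in keys})

# Ever-increasing keys (like a timestamp or ObjectId): every new value is
# larger than all existing ones, so it falls into the highest range.
increasing_keys = list(range(3500, 3600))

# Keys that jump around the key space instead of only growing.
scattered_keys = [(i * 1237) % 4000 for i in range(100)]

print(chunks_hit(increasing_keys))  # [3]
print(chunks_hit(scattered_keys))   # [0, 1, 2, 3]
```

In this model the increasing keys only ever touch the last chunk, while the scattered keys reach all four.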

MongoDB does not create a chunk for each *value* of the shard key, but
rather for a range of values. This is described in a lot of detail in
the docs and there are also many blog posts about this topic:

http://docs.mongodb.org/manual/core/sharding-introduction/
http://docs.mongodb.org/manual/core/sharded-cluster-mechanics
http://docs.mongodb.org/manual/tutorial/choose-a-shard-key
http://www.kchodorow.com/blog/2011/01/04/how-to-choose-a-shard-key-the-card-game/

Asya

Amit Patel

Aug 19, 2014, 2:00:02 PM
to mongod...@googlegroups.com
Hey Asya,

Thanks for your reply.

I appreciate your detailed response.

I still have a small query. I read all the documents you provided, but I didn't find a straight and accurate answer to my question.

My question is small: how does mongo decide the range of a chunk? What logic does it use to define the range?

Suppose I have an almost empty collection called company, company-url-name is my shard key, and my insert has "abc.com" in company-url-name.

Then I believe mongo will send my insert to the primary shard instance, create the first chunk, and insert the value in that chunk. But how does mongo decide what the range of that first chunk should be?
It has to decide the range of the first chunk before it can create the second chunk.

And will mongo keep creating chunks and storing data on the same shard instance until it exceeds the migration threshold? If the answer is yes, then mongo will send all the writes to the same shard instance until that instance reaches the migration threshold. Don't you think that will hamper performance?

Please correct me if I am saying anything wrong.

Thanks.
Amit

Asya Kamsky

Aug 19, 2014, 6:29:12 PM
to mongodb-user
The first chunk is always created as soon as you shard the collection
and its range is from MinKey to MaxKey (i.e. from negative infinity to
infinity).

That will remain the only chunk until there is 64MB of data inserted
at which point the chunk will be split based on the value that's
roughly in the "middle" of the range of shard keys.
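
The behaviour described above can be sketched as a toy model. The assumptions are loud ones: a key-count threshold of 4 stands in for the 64MB size limit, the strings "MinKey" and "MaxKey" stand in for the real sentinels, shard key values are assumed unique, and the split picks the median key. The real split logic works on data size and is more involved.

```python
from bisect import bisect_right, insort

# Toy stand-in for the 64MB threshold: split once a chunk holds more than 4 keys.
MAX_KEYS_PER_CHUNK = 4

def new_collection():
    """A freshly sharded collection starts with one chunk spanning MinKey..MaxKey."""
    return [{"min": "MinKey", "max": "MaxKey", "keys": []}]

def insert(chunks, key):
    """Route `key` to the chunk whose range contains it; split that chunk if too big."""
    # Lower bounds of every chunk except the first (whose lower bound is MinKey).
    lower_bounds = [c["min"] for c in chunks[1:]]
    i = bisect_right(lower_bounds, key)
    chunk = chunks[i]
    insort(chunk["keys"], key)  # keep the chunk's keys sorted
    if len(chunk["keys"]) > MAX_KEYS_PER_CHUNK:
        keys = chunk["keys"]
        middle = keys[len(keys) // 2]  # value roughly in the "middle" of the chunk
        left = {"min": chunk["min"], "max": middle,
                "keys": [k for k in keys if k < middle]}
        right = {"min": middle, "max": chunk["max"],
                 "keys": [k for k in keys if k >= middle]}
        chunks[i:i + 1] = [left, right]  # replace the chunk with its two halves

chunks = new_collection()
for url in ["abc.com", "xyz.com", "foo.com", "bar.com", "qux.com"]:
    insert(chunks, url)

ranges = [(c["min"], c["max"]) for c in chunks]
print(ranges)  # [('MinKey', 'foo.com'), ('foo.com', 'MaxKey')]
```

The first four inserts all go into the single MinKey..MaxKey chunk; the fifth pushes it over the toy threshold and it splits at "foo.com".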

Asya

Asya Kamsky

Aug 19, 2014, 6:30:12 PM
to mongodb-user
And as far as whether it would hamper performance - that's why
sometimes it's a good idea to pre-split/pre-balance:

http://docs.mongodb.org/manual/tutorial/split-chunks-in-sharded-cluster/
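
As a hedged sketch of what pre-splitting could look like for this collection: the helper below only builds the `split` and `moveChunk` admin command documents and does not talk to a cluster. The database name "mydb", the split points "g", "n", "t" and the shard names are all hypothetical, and the helper `presplit_commands` is made up for this example. It assumes the collection is already sharded and still empty, and that the first shard in the list is the one holding the initial chunk.

```python
def presplit_commands(namespace, shard_key, split_points, shard_names):
    """Build admin commands that pre-split a collection and spread its chunks.

    shard_names[0] is assumed to be the shard that already holds the initial
    MinKey..MaxKey chunk, so the lowest chunk is left where it is.
    """
    commands = []
    # One `split` per split point: each cuts a chunk at exactly that key value.
    for point in split_points:
        commands.append({"split": namespace, "middle": {shard_key: point}})
    # After splitting, the chunk starting at split_points[i] is moved to the
    # next shard in round-robin order.
    for i, point in enumerate(split_points):
        commands.append({
            "moveChunk": namespace,
            "find": {shard_key: point},
            "to": shard_names[(i + 1) % len(shard_names)],
        })
    return commands

commands = presplit_commands(
    "mydb.company",                # hypothetical database name
    "company-website-url",         # shard key field from this thread
    ["g", "n", "t"],               # hypothetical split points
    ["shard0000", "shard0001", "shard0002", "shard0003"],  # hypothetical names
)
for cmd in commands:
    # With a driver such as pymongo, each document could be sent through a
    # mongos with client.admin.command(cmd); that step is left out here.
    print(cmd)
```

With three split points and four shards, this leaves one chunk on each shard before any data arrives, so the first writes do not all pile up on the primary shard.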