How to decide on number of shards

58 views
Skip to first unread message

sjmi...@gmail.com

unread,
May 23, 2016, 5:51:02 AM5/23/16
to mongodb-user
Hi All,
Based on reply to my previous post
https://groups.google.com/d/msg/mongodb-user/_kIEerRpAig/U9jVAgVcAQAJ
I am creating a new topic.

The problems we face is some queries and aggregation pipeline seems to be running slow.
Now it can be due to sub optimal indexes and shard key. However lets say if we fix that, is there any guidelines has how many shards a collection should have.

I have a collection with 150 million documents and each document is about 4 KB.
Right now we have three shards for that collection. The distribution is uniform with each shard sharing 33% of the data,

I am pasting the information asked for:

  • your MongoDB version
    It is 3.0
  • an example document along with any indexes in the relevant collection
    The document contains certain user data example:
    {
       user_name: <string>
       ip_address: <string>
       page_name: <string>
       url: <string>
       created: <iso date>
       access_timestamp: <long>

       //.... some other fields may differ from document to document
    }
    There are hash indexes on each of the above fields (used in find/match queries).
    Created fields however has a sort index to expire the document after a given time

  • your shard key and any reason for its selection
    The shard index is the url and there is no particular reason for it.
    Please note that url may be over 100 characters long.
    We plan to change it to page_name as that will be around 20 characters long. Please let us know if that is advisable.

  • the slow queries, why do you think sub-optimal indexes is the issue, and how did you obtain the measurements for the slow queries
    We are not sure but say a query on page_name, user_name and access_timestamp range takes a long time to run.
    here the timestamp range spans multiple days and the 150 million records holds data for 30 days.
    So it is roughly 5 millions records per day.

  • the output of sh.status()
    I don't have this data now. But has mentioned above the shards are pretty evenly distributed.

So if we say fix our shard key to some smaller field like page name and add composite indexes based on our queries, will increasing number of shards improve the performance, under assumption that more queries would be running in parallel to get the results faster.


Thanks

Sachin


Asya Kamsky

unread,
May 23, 2016, 11:45:47 AM5/23/16
to mongod...@googlegroups.com
You say "There are hash indexes on each of the above fields (used in find/match queries)." 

That sounds highly suboptimal.   You should never be using hash indexes except for your shard key index iff your optimal shard key is monotonically increasing.  

Asya
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: https://docs.mongodb.org/manual/support/
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at https://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/c8ddaa74-dfa1-450a-b7e7-4a51f7c98d5d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


--
Asya Kamsky
Lead Product Manager
MongoDB
Download MongoDB - mongodb.org/downloads
Free MongoDB Monitoring - cloud.mongodb.com
Free Online Education - university.mongodb.com
Get Involved - mongodb.org/community
We're Hiring! - https://www.mongodb.com/careers

sjmi...@gmail.com

unread,
May 23, 2016, 10:30:38 PM5/23/16
to mongodb-user
Hi,
Yes we are fixing these indexes.

So our situation is like this:
We have say around 30 page names. We get around 5 million documents per day distributed among these 30 pages.

So should we set our shard key:
1. Based on page name
or
2. Based on the timestamp or created field
or
3. A compound key based on page name and timestamp/created field

Most of our queries are combination of page name and timestamp fields.

Also once we set the right shard index, and our collection stores roughly 30 days of data which is say 150 million records, would increase of shard number better the query performance.
Right now these 150 million recs are distributed among 3 shards.

Thanks
Sachin

sjmi...@gmail.com

unread,
May 26, 2016, 6:43:47 AM5/26/16
to mongodb-user
Hi All,
So any suggestion on what shard key should we use based on our scenario below.
If selecting a compound key then is there any optimal number on number of fields we can choose for that compound index.

Also please guide us on number of shards we should have for say 150 million documents with each document around 4 KB.
As mentioned below documents can be searched based on 5-6 attributes.

Thanks
Sachin
Reply all
Reply to author
Forward
0 new messages