MongoDB Sharding-Query via mongos takes more time than query on mongod (PRIMARY)

73 views
Skip to first unread message

Babyduck

unread,
Jan 23, 2018, 4:05:36 PM1/23/18
to mongodb-user
Setup Details:

mongos: 
RAM: 8 GB, CPUs: 2

Config Servers (Replica set of 3 config servers):
RAM: 4 GB, CPUs: 2

Shard Cluster-1 (Replica of 3 mongod):
RAM: 30 GB, CPUs: 4

Shard Cluster-2 (Replica of 3 mongod):
RAM: 30 GB, CPUs: 4

Sharding:
Collection: rptDlp, Key: {incidentOn: "hashed"}

Description:

I have more than 15 million records in a collection.
I am retrieving last page documents having sorted by a field(indexed one) of type date.

Actual Query:

db.getCollection("rptDlp").find({ incidentOn: { $gte: new Date(1513641600000), $lt: new Date(1516233600000) } })
.sort({ incidentOn: -1 }).skip(15610600).limit(10)


If I execute this query directly against mongo shard server (PRIMARY), it shows result in 14 seconds. But through mongos, it takes more than 2 minutes and due to query timeout my application results in showing an error prompt.

If we assume it as network congestion, then every query should take 2 minutes. But when I retrieve documents for first page it shows result in few seconds.

Any suggestion would be helpful, thanks in advance. 

Kevin Adistambha

unread,
Feb 13, 2018, 8:07:59 PM2/13/18
to mongodb-user

Hi,

If I execute this query directly against mongo shard server (PRIMARY), it shows result in 14 seconds.

It’s generally not recommended to connect directly to the shards and perform queries there. The result might not be reliable, since it can contain outdated documents that was moved to another shard but not yet deleted. These “orphaned” documents could be updated in the meantime, rendering the query to return incorrect results.

But through mongos, it takes more than 2 minutes

I believe there are multiple causes to this performance issue:

  • Your shard key is a hashed shard key, and your query is a range query: While hashed shard key is a good solution to ensure that there are no “hot shard” in a monotonically increasing field, it cannot be used for range queries.
  • The query also specified sorting: Due to its properties, the hashed key cannot be used for this.
  • The query skipped over a large number of documents, then returns only a small amount of results: Even when not returned, MongoDB still needs to go through all 15610600 documents to return the correct 10.

The combination of the three situations above resulted in a slow query, since MongoDB basically cannot use any index to help the query.

Without knowing the details of your use case, if this query is vital and would need to be performed many times a day, then using the incidentOn field as the shard key would allow the shard key to be used to help the query. However, this comes with the drawback of having a monotonically increasing shard key. If this is not acceptable, then a more selective query may be required, so that the shard key could be less monotonic.

Please see Shard Keys and On Selecting a Shard Key for MongoDB for more shard key selection considerations.

Best regards
Kevin

Reply all
Reply to author
Forward
0 new messages