Which node does the sort for a FIND operation on a sharded cluster?


SeungUck Lee

Jun 6, 2017, 8:22:21 AM
to mongodb-user
According to the MongoDB manual (https://docs.mongodb.com/manual/core/sharded-cluster-query-router/),
mongos sends a query with a sort to the primary shard of the database, and the primary shard sorts and merges the results.

Sorting
If the result of the query is not sorted, the mongos instance opens a result cursor that “round robins” results from all cursors on the shards.
If the query specifies sorted results using the sort() cursor method, the mongos instance passes the $orderby option to the shards. The primary shard for the database receives and performs a merge sort for all results before returning the data to the client via the mongos.


But according to the JIRA ticket (https://jira.mongodb.org/browse/DOCS-7237),
all involved shards (chosen based on the shard key) do the sort, and the merge is done by mongos. (The primary shard is not mentioned here.)

David Storch added a comment - Jul 07 2016 09:05:11 PM GMT+0000
In 3.2.x versions:
  • For find operations with a .sort(), mongos will forward the sort to all participating shards. The sorted merge will then occur on mongos.
  • For aggregation operations, merging is always done on one of the shards. The shard which performs the merge is currently chosen at random amongst all shards in the cluster. Mongos never performs any merging for aggregation operations, but rather will simply forward the results it receives from the merging shard to the client application.


Which is correct?

And if the manual is right, why does MongoDB implement the sort and merge on the primary shard?
The sort operation has nothing to do with the primary shard, and this adds more load to it. (The primary shard may already be busy because all $lookup operations for the database are done there, so the sort and merge could easily overload it.)

Thanks.

SeungUck Lee

Jun 14, 2017, 2:19:03 AM
to mongodb-user
Can anyone clarify this?
Thanks in advance.

Kevin Adistambha

Jun 15, 2017, 9:26:02 PM
to mongodb-user

Hi

I believe one point of confusion is that the merging process was defined differently between a find().sort() operation on a sharded cluster, and the aggregation framework. In aggregation, the merging process is not limited to sorting, but also includes $group, $out, etc.

The correct behaviour for find().sort() was described in the ticket you found (https://jira.mongodb.org/browse/DOCS-7237). Specifically this quote from a comment in the ticket: “For find operations, mongos is responsible for doing the sorted merge in all versions of MongoDB”.
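
As a rough illustration (not MongoDB's actual code), the sorted merge that mongos does for find().sort() can be pictured as a k-way merge of streams that each shard has already sorted. The shard names, the documents, and the sorted_merge helper below are all made up for this sketch:

```python
import heapq
from operator import itemgetter

# Hypothetical per-shard results: each shard has already applied the sort
# (ascending by "score") to its own documents before replying.
shard_results = {
    "shardA": [{"_id": 1, "score": 10}, {"_id": 4, "score": 40}],
    "shardB": [{"_id": 2, "score": 20}, {"_id": 5, "score": 50}],
    "shardC": [{"_id": 3, "score": 30}],
}


def sorted_merge(per_shard, key):
    """Lazily merge already-sorted per-shard streams into one sorted stream."""
    return heapq.merge(*per_shard.values(), key=itemgetter(key))


merged = list(sorted_merge(shard_results, "score"))
print([doc["_id"] for doc in merged])
```

The point of the sketch is that the merging node never re-sorts everything: it only compares the current head of each shard's stream, which is why this step is cheap enough to run on the router.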

> The sort operation has nothing to do with the primary shard, and this adds more load to it. (The primary shard may already be busy because all $lookup operations for the database are done there, so the sort and merge could easily overload it.)

Since mongos is not allowed to write to disk, a $sort stage with allowDiskUse: true would not be possible if it were performed by mongos. Note that in contrast, find().sort() was never allowed to spill to disk, i.e. the query will simply fail if the sort exceeds 32MB. Also, $out would be writing the new collection into the primary shard of the database. For more detailed reasoning, please see this comment in SERVER-14985. However, there are plans to improve this behaviour in SERVER-17737.

Best regards,
Kevin

SeungUck Lee

Jun 19, 2017, 8:21:04 AM
to mongodb-user
Thanks Kevin.

> The correct behaviour for find().sort() was described in the ticket you found (https://jira.mongodb.org/browse/DOCS-7237). Specifically this quote from a comment in the ticket: “For find operations, mongos is responsible for doing the sorted merge in all versions of MongoDB”.

From the manual:

> If the query specifies sorted results using the sort() cursor method, the mongos instance passes the $orderby option to the shards. The primary shard for the database receives and performs a merge sort for all results before returning the data to the client via the mongos.

It looks like mongos passes the sort option to the primary shard of the sharded cluster; in other words, the primary shard, rather than mongos, will sort the result.
Does this explanation apply to aggregation only?

Regards,
Matt.

Kevin Adistambha

Jun 20, 2017, 12:57:53 AM
to mongodb-user

Hi

> It looks like mongos passes the sort option to the primary shard of the sharded cluster; in other words, the primary shard, rather than mongos, will sort the result.
> Does this explanation apply to aggregation only?

Yes, with the caveat that it’s not necessarily the primary shard that will perform the merging, but one of the shards chosen at random. The relevant up-to-date explanation is shown in the comment in DOCS-7237. In particular:

For aggregation operations, merging is always done on one of the shards. The shard which performs the merge is currently chosen at random amongst all shards in the cluster

This explanation also aligns with the fact that the mongos query router is not allowed to use the disk, while the allowDiskUse parameter in aggregation requires MongoDB to be able to write to disk.

Please watch/comment/upvote on https://jira.mongodb.org/browse/DOCS-7237 to receive further updates regarding this.

Best regards,
Kevin
