Mongodb 3.2, installed on centos 6, with plenty of RAM and disk. I've a collection with 10K documents of the following structure:
{
"id":5752034,
"score":7.6,
"name":"ASUS X551 15.6-inch Laptop",
"categoryId":"803",
"positiveAspects":[{
"id":30030525,
"name":"price",
"score":9.8,
"frequency":139,
"rank":100098
},
{
"id":30028399,
"name":"use",
"score":9.9,
"frequency":99,
"rank":100099
}
.
.
]
}For each document, the nested array positiveAspects has few hundreds of elements. The collection has the following indexes:
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "positiveAspects.id" : 1.0, "positiveAspects.score" : 1.0 }, "name" : "positiveAspects.id_1_positiveAspects.score_1", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "categoryId" : 1.0, "score" : 1.0 }, "name" : "categoryId_1_score_1", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "rank" : -1.0 }, "name" : "rank_-1", "ns" : "proddb.product_trees" }
{ "v" : 1, "key" : { "positiveAspects.rank" : -1.0 }, "name" : "positiveAspects.rank_-1", "ns" : "proddb.product_trees" }I would like to run the following aggregation, it takes about 40 seconds:
{
aggregate:"product_trees",
pipeline:[
{
$match:{
categoryId:"803",
score:{
$gte:8.0
}
}
},
{
$unwind:"$positiveAspects"
},
{
$match:{
positiveAspects.id:30030525,
positiveAspects.score:{
$gte:9.0
}
}
},
{
$sort:{
positiveAspects.rank:-1
}
},
{
$project:{
_id:0,
score:1,
id:1,
name:1,
positiveAspects:1
}
},
{
$limit:10
}
]
}With the following explain:
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Beginning planning...
=============================
Options = NO_BLOCKING_SORT INDEX_INTERSECTION
Canonical query:
ns=proddb.product_treesTree: $and
categoryId == "803"
score $gte 8.0
Sort: {}
Proj: {}
=============================
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Index 0 is kp: { _id: 1 } unique name: '_id_' io: { v: 1, key: { _id: 1 }, name: "_id_", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Index 1 is kp: { positiveAspects.id: 1.0, positiveAspects.score: 1.0 } multikey name: 'positiveAspects.id_1_positiveAspects.score_1' io: { v: 1, key: { positiveAspects.id: 1.0, positiveAspects.score: 1.0 }, name: "positiveAspects.id_1_positiveAspects.score_1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Index 2 is kp: { categoryId: 1.0, score: 1.0 } name: 'categoryId_1_score_1' io: { v: 1, key: { categoryId: 1.0, score: 1.0 }, name: "categoryId_1_score_1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Index 3 is kp: { rank: -1.0 } name: 'rank_-1' io: { v: 1, key: { rank: -1.0 }, name: "rank_-1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Index 4 is kp: { positiveAspects.rank: -1.0 } multikey name: 'positiveAspects.rank_-1' io: { v: 1, key: { positiveAspects.rank: -1.0 }, name: "positiveAspects.rank_-1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Predicate over field 'score'
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Predicate over field 'categoryId'
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Relevant index 0 is kp: { categoryId: 1.0, score: 1.0 } name: 'categoryId_1_score_1' io: { v: 1, key: { categoryId: 1.0, score: 1.0 }, name: "categoryId_1_score_1", ns: "proddb.product_trees" }
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Rated tree:
$and
categoryId == "803" || First: 0 notFirst: full path: categoryId
score $gte 8.0 || First: notFirst: 0 full path: score
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Tagging memoID 1
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Enumerator: memo just before moving:
2016-06-01T16:10:49.140-0500 D QUERY [conn47] About to build solntree from tagged tree:
$and
categoryId == "803" || Selected Index #0 pos 0
score $gte 8.0 || Selected Index #0 pos 1
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Planner: adding solution:
FETCH
---fetched = 1
---sortedByDiskLoc = 0
---getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { score: 1 }, ]
---Child:
------IXSCAN
---------keyPattern = { categoryId: 1.0, score: 1.0 }
---------direction = 1
---------bounds = field #0['categoryId']: ["803", "803"], field #1['score']: [8.0, inf.0]
---------fetched = 0
---------sortedByDiskLoc = 0
---------getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { score: 1 }, ]
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Planner: outputted 1 indexed solutions.
2016-06-01T16:10:49.140-0500 D QUERY [conn47] Only one plan is available; it will be run but will not be cached. query: { categoryId: "803", score: { $gte: 8.0 } } sort: {} projection: {}, planSummary: IXSCAN { categoryId: 1.0, score: 1.0 }
2016-06-01T16:11:27.170-0500 I COMMAND [conn47] command proddb.product_trees command: aggregate { aggregate: "product_trees", pipeline: [ { $match: { categoryId: "803", score: { $gte: 8.0 } } }, { $unwind: "$positiveAspects" }, { $match: { positiveAspects.id: 30030525, positiveAspects.score: { $gte: 9.0 } } }, { $sort: { positiveAspects.rank: -1 } }, { $project: { _id: 0, score: 1, id: 1, name: 1, positiveAspects: 1 } }, { $limit: 10 } ], cursor: {} } keyUpdates:0 writeConflicts:0 numYields:226 reslen:7459 locks:{ Global: { acquireCount: { r: 906 } }, Database: { acquireCount: { r: 453 } }, Collection: { acquireCount: { r: 453 } } } protocol:op_query 38030msTaking out the $sort, the query runs in 2 seconds.
Can you explain why the $sort cause such performance hit, considering there is index it can use? Is there an index I missed? What can be done in order to fix?
Thanks!
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Beginning planning...
=============================
Options = NO_BLOCKING_SORT INDEX_INTERSECTION
Canonical query:
ns=proddb.product_treesTree: $and
categoryId == "803"
score $gte 8.0
Sort: {}
Proj: {}
=============================
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Index 0 is kp: { _id: 1 } unique name: '_id_' io: { v: 1, key: { _id: 1 }, name: "_id_", ns: "proddb.product_trees" }
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Index 1 is kp: { positiveAspects.id: 1.0, positiveAspects.score: 1.0 } multikey name: 'positiveAspects.id_1_positiveAspects.score_1' io: { v: 1, key: { positiveAspects.id: 1.0, positiveAspects.score: 1.0 }, name: "positiveAspects.id_1_positiveAspects.score_1", ns: "proddb.product_trees" }
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Index 2 is kp: { categoryId: 1.0, score: 1.0 } name: 'categoryId_1_score_1' io: { v: 1, key: { categoryId: 1.0, score: 1.0 }, name: "categoryId_1_score_1", ns: "proddb.product_trees" }
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Index 3 is kp: { rank: -1.0 } name: 'rank_-1' io: { v: 1, key: { rank: -1.0 }, name: "rank_-1", ns: "proddb.product_trees" }
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Index 4 is kp: { positiveAspects.rank: -1.0 } multikey name: 'positiveAspects.rank_-1' io: { v: 1, key: { positiveAspects.rank: -1.0 }, name: "positiveAspects.rank_-1", ns: "proddb.product_trees" }
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Index 5 is kp: { categoryId: 1.0, score: 1.0, positiveAspects.rank: -1.0 } multikey name: 'categoryId_1_score_1_positiveAspects.rank_-1' io: { v: 1, key: { categoryId: 1.0, score: 1.0, positiveAspects.rank: -1.0 }, name: "categoryId_1_score_1_positiveAspects.rank_-1", ns: "proddb.product_trees" }
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Predicate over field 'score'
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Predicate over field 'categoryId'
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Relevant index 0 is kp: { categoryId: 1.0, score: 1.0 } name: 'categoryId_1_score_1' io: { v: 1, key: { categoryId: 1.0, score: 1.0 }, name: "categoryId_1_score_1", ns: "proddb.product_trees" }
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Relevant index 1 is kp: { categoryId: 1.0, score: 1.0, positiveAspects.rank: -1.0 } multikey name: 'categoryId_1_score_1_positiveAspects.rank_-1' io: { v: 1, key: { categoryId: 1.0, score: 1.0, positiveAspects.rank: -1.0 }, name: "categoryId_1_score_1_positiveAspects.rank_-1", ns: "proddb.product_trees" }
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Rated tree:
$and
categoryId == "803" || First: 0 1 notFirst: full path: categoryId
score $gte 8.0 || First: notFirst: 0 1 full path: score
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Tagging memoID 1
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Enumerator: memo just before moving:
2016-06-02T06:15:48.354-0500 D QUERY [conn61] About to build solntree from tagged tree:
$and
categoryId == "803" || Selected Index #1 pos 0
score $gte 8.0 || Selected Index #1 pos 1
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Planner: adding solution:
FETCH
---fetched = 1
---sortedByDiskLoc = 0
---getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { categoryId: 1, score: 1, positiveAspects.rank: -1 }, { score: 1 }, { score: 1, positiveAspects.rank: -1 }, ]
---Child:
------IXSCAN
---------keyPattern = { categoryId: 1.0, score: 1.0, positiveAspects.rank: -1.0 }
---------direction = 1
---------bounds = field #0['categoryId']: ["803", "803"], field #1['score']: [8.0, inf.0], field #2['positiveAspects.rank']: [MaxKey, MinKey]
---------fetched = 0
---------sortedByDiskLoc = 0
---------getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { categoryId: 1, score: 1, positiveAspects.rank: -1 }, { score: 1 }, { score: 1, positiveAspects.rank: -1 }, ]
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Tagging memoID 1
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Enumerator: memo just before moving:
2016-06-02T06:15:48.354-0500 D QUERY [conn61] About to build solntree from tagged tree:
$and
categoryId == "803" || Selected Index #0 pos 0
score $gte 8.0 || Selected Index #0 pos 1
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Planner: adding solution:
FETCH
---fetched = 1
---sortedByDiskLoc = 0
---getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { score: 1 }, ]
---Child:
------IXSCAN
---------keyPattern = { categoryId: 1.0, score: 1.0 }
---------direction = 1
---------bounds = field #0['categoryId']: ["803", "803"], field #1['score']: [8.0, inf.0]
---------fetched = 0
---------sortedByDiskLoc = 0
---------getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { score: 1 }, ]
2016-06-02T06:15:48.354-0500 D QUERY [conn61] Planner: outputted 2 indexed solutions.
2016-06-02T06:15:48.363-0500 D QUERY [conn61] Scoring plan 0:
FETCH
---fetched = 1
---sortedByDiskLoc = 0
---getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { categoryId: 1, score: 1, positiveAspects.rank: -1 }, { score: 1 }, { score: 1, positiveAspects.rank: -1 }, ]
---Child:
------IXSCAN
---------keyPattern = { categoryId: 1.0, score: 1.0, positiveAspects.rank: -1.0 }
---------direction = 1
---------bounds = field #0['categoryId']: ["803", "803"], field #1['score']: [8.0, inf.0], field #2['positiveAspects.rank']: [MaxKey, MinKey]
---------fetched = 0
---------sortedByDiskLoc = 0
---------getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { categoryId: 1, score: 1, positiveAspects.rank: -1 }, { score: 1 }, { score: 1, positiveAspects.rank: -1 }, ]
Stats:
{ "stage" : "FETCH",
"nReturned" : 49,
"executionTimeMillisEstimate" : 0,
"works" : 101,
"advanced" : 49,
"needTime" : 52,
"needYield" : 0,
"saveState" : 1,
"restoreState" : 1,
"isEOF" : 0,
"invalidates" : 0,
"docsExamined" : 49,
"alreadyHasObj" : 0,
"inputStage" : { "stage" : "IXSCAN",
"nReturned" : 49,
"executionTimeMillisEstimate" : 0,
"works" : 101,
"advanced" : 49,
"needTime" : 52,
"needYield" : 0,
"saveState" : 1,
"restoreState" : 1,
"isEOF" : 0,
"invalidates" : 0,
"keyPattern" : { "categoryId" : 1,
"score" : 1,
"positiveAspects.rank" : -1 },
"indexName" : "categoryId_1_score_1_positiveAspects.rank_-1",
"isMultiKey" : true,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : { "categoryId" : [
"[\"803\", \"803\"]" ],
"score" : [
"[8.0, inf.0]" ],
"positiveAspects.rank" : [
"[MaxKey, MinKey]" ] },
"keysExamined" : 101,
"dupsTested" : 101,
"dupsDropped" : 52,
"seenInvalidated" : 0 } }
2016-06-02T06:15:48.363-0500 D QUERY [conn61] Scoring query plan: IXSCAN { categoryId: 1.0, score: 1.0, positiveAspects.rank: -1.0 } planHitEOF=0
2016-06-02T06:15:48.363-0500 D QUERY [conn61] score(1.48545) = baseScore(1) + productivity((49 advanced)/(101 works) = 0.485149) + tieBreakers(0.0001 noFetchBonus + 0.0001 noSortBonus + 0.0001 noIxisectBonus = 0.0003)
2016-06-02T06:15:48.363-0500 D QUERY [conn61] score = 1.48545
2016-06-02T06:15:48.363-0500 D QUERY [conn61] Scoring plan 1:
FETCH
---fetched = 1
---sortedByDiskLoc = 0
---getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { score: 1 }, ]
---Child:
------IXSCAN
---------keyPattern = { categoryId: 1.0, score: 1.0 }
---------direction = 1
---------bounds = field #0['categoryId']: ["803", "803"], field #1['score']: [8.0, inf.0]
---------fetched = 0
---------sortedByDiskLoc = 0
---------getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { score: 1 }, ]
Stats:
{ "stage" : "FETCH",
"nReturned" : 101,
"executionTimeMillisEstimate" : 0,
"works" : 101,
"advanced" : 101,
"needTime" : 0,
"needYield" : 0,
"saveState" : 1,
"restoreState" : 1,
"isEOF" : 0,
"invalidates" : 0,
"docsExamined" : 101,
"alreadyHasObj" : 0,
"inputStage" : { "stage" : "IXSCAN",
"nReturned" : 101,
"executionTimeMillisEstimate" : 0,
"works" : 101,
"advanced" : 101,
"needTime" : 0,
"needYield" : 0,
"saveState" : 1,
"restoreState" : 1,
"isEOF" : 0,
"invalidates" : 0,
"keyPattern" : { "categoryId" : 1,
"score" : 1 },
"indexName" : "categoryId_1_score_1",
"isMultiKey" : false,
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 1,
"direction" : "forward",
"indexBounds" : { "categoryId" : [
"[\"803\", \"803\"]" ],
"score" : [
"[8.0, inf.0]" ] },
"keysExamined" : 101,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0 } }
2016-06-02T06:15:48.363-0500 D QUERY [conn61] Scoring query plan: IXSCAN { categoryId: 1.0, score: 1.0 } planHitEOF=0
2016-06-02T06:15:48.363-0500 D QUERY [conn61] score(2.0003) = baseScore(1) + productivity((101 advanced)/(101 works) = 1) + tieBreakers(0.0001 noFetchBonus + 0.0001 noSortBonus + 0.0001 noIxisectBonus = 0.0003)
2016-06-02T06:15:48.363-0500 D QUERY [conn61] score = 2.0003
2016-06-02T06:15:48.364-0500 D QUERY [conn61] Winning solution:
FETCH
---fetched = 1
---sortedByDiskLoc = 0
---getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { score: 1 }, ]
---Child:
------IXSCAN
---------keyPattern = { categoryId: 1.0, score: 1.0 }
---------direction = 1
---------bounds = field #0['categoryId']: ["803", "803"], field #1['score']: [8.0, inf.0]
---------fetched = 0
---------sortedByDiskLoc = 0
---------getSort = [{ categoryId: 1 }, { categoryId: 1, score: 1 }, { score: 1 }, ]
2016-06-02T06:15:48.364-0500 D QUERY [conn61] Winning plan: IXSCAN { categoryId: 1.0, score: 1.0 }
2016-06-02T06:16:26.351-0500 I COMMAND [conn61] command proddb.product_trees command: aggregate { aggregate: "product_trees", pipeline: [ { $match: { categoryId: "803", score: { $gte: 8.0 } } }, { $unwind: "$positiveAspects" }, { $match: { positiveAspects.id: 30030525, positiveAspects.score: { $gte: 9.0 } } }, { $sort: { positiveAspects.rank: -1 } }, { $limit: 10 }, { $project: { _id: 0, score: 1, id: 1, name: 1, positiveAspects: 1 } } ], cursor: {}, allowDiskUse: true } fromMultiPlanner:1 keyUpdates:0 writeConflicts:0 numYields:227 reslen:7459 locks:{ Global: { acquireCount: { r: 908 } }, Database: { acquireCount: { r: 454 } }, Collection: { acquireCount: { r: 454 } } } protocol:op_query 37997ms
$match:{
categoryId:"803",
score:{
$gte:8.0
}Hi Barak,
Taking out the $sort, the query runs in 2 seconds.
Can you explain why the $sort cause such performance hit, considering there is index it can use? Is there an index I missed? What can be done in order to fix?
The issue is that once an aggregation pipeline reached the $unwind (or $project) stage, no index can be used anymore. Indexes are tied to the physical location of a document on disk, and $unwind or $project stages mutates the documents in the pipeline such that it doesn’t have any connection to its physical on-disk representation anymore.
With your document structure, the best method to achieve what you need is to put as many constraint in the $match stage as possible, and use a $filter method in the $project stage to eliminate unwanted array elements. For example:
db.test.aggregate([
// Match as many constraints as possible using an index
{$match: {
categoryId:'803',
score:{$gt:7.0},
'positiveAspects.id':30030525,
'positiveAspects.score':{$gte:9.9}}}
// Project with $filter on required array elements
, {$project:{
_id:0,
score:'$score',
id: '$id',
name:'$name',
positiveAspects: {
$filter:{
input:'$positiveAspects',
as:'x',
cond:{$gte:['$$x.score',9.8]}
}
}
}}
, {$unwind:'$positiveAspects'}
, {$sort:{'positiveAspects.rank':-1}}
, {$limit:10}
])
You would also need to create a suitable index for the $match stage:
db.test.createIndex({categoryId:1, score:1, 'positiveAspects.id':1, 'positiveAspects.score':1})
Please note that you can add explain as an optional parameter to the aggregation query in the mongo shell, e.g.:
db.test.aggregate([...pipeline...], {explain:true})
Having said that, if this is a common query that is very frequently performed in your database, you may want to reconsider modifying your schema design if applicable to your use case. For example, if you store the documents with the positiveAspects array pre-unwinded:
{
"_id": ObjectId("57679a2480f6b35bfa4f51a2"),
"id": 5752034,
"score": 7.6,
"name": "ASUS X551 15.6-inch Laptop",
"categoryId": "803",
"positiveAspects": {
"id": 30030525,
"name": "price",
"score": 9.8,
"frequency": 139,
"rank": 100098
}
},
{
"_id": ObjectId("57679a2480f6b35bfa4f51a2"),
"id": 5752034,
"score": 7.6,
"name": "ASUS X551 15.6-inch Laptop",
"categoryId": "803",
"positiveAspects": {
"id": 30028399,
"name": "use",
"score": 9.9,
"frequency": 99,
"rank": 100099
}
}
In this form, you may not need an aggregation query to get the result you wanted. A relatively simple find() query would serve your purpose, with the additional advantage of being able to use an index for both the matching and the sort stage.
Best regards,
Kevin
Hi Olen
Please note that you’re replying to a thread that is 3 years old. What’s discussed in this thread may not be relevant to you, and there may be improvements in the server over the years that could make some things behave better.
If you have an issue that you would like help with, please open a new thread with more information:
Best regards,
Kevin