I have a collection called 'review' with around a million documents, and index defined on the field 'business_id'.
I am issuing the following aggregate query, but it fails to use the index and performs a collection scan.
db.review.aggregate([{$project: {business_id: "$business_id"}}, {$group: {_id: "$business_id", number: {$sum: 1}}}, {$sort: {"number": -1}}]);
2016-07-04T11:01:53.874-0400 D QUERY [conn1] Beginning planning...
=============================
Options = NO_BLOCKING_SORT INDEX_INTERSECTION
Canonical query:
ns=yelp.reviewTree: $and
Sort: {}
Proj: { business_id: 1, _id: 1 }
=============================
2016-07-04T11:01:53.874-0400 D QUERY [conn1] Index 0 is kp: { _id: 1 } unique name: '_id_' io: { v: 1, key: { _id: 1 }, name: "_id_", ns: "yelp.review" }
2016-07-04T11:01:53.874-0400 D QUERY [conn1] Index 1 is kp: { business_id: 1.0 } name: 'business_id_1' io: { v: 1, key: { business_id: 1.0 }, name: "business_id_1", ns: "yelp.review" }
2016-07-04T11:01:53.874-0400 D QUERY [conn1] Rated tree:
$and
2016-07-04T11:01:53.874-0400 D QUERY [conn1] Planner: outputted 0 indexed solutions.
2016-07-04T11:01:53.874-0400 D QUERY [conn1] PROJECTION: fetched status: 1
2016-07-04T11:01:53.874-0400 D QUERY [conn1] PROJECTION: Current plan is:
COLLSCAN
---ns = yelp.review
---filter = $and
---fetched = 1
---sortedByDiskLoc = 0
---getSort = []
2016-07-04T11:01:53.874-0400 D QUERY [conn1] PROJECTION: requires fields
2016-07-04T11:01:53.874-0400 D QUERY [conn1] PROJECTION: is covered?: = 1
2016-07-04T11:01:53.874-0400 D QUERY [conn1] PROJECTION: covered via FETCH, using SIMPLE_DOC fast path
2016-07-04T11:01:53.874-0400 D QUERY [conn1] Beginning planning...
=============================
Options = NO_BLOCKING_SORT INDEX_INTERSECTION
Canonical query:
ns=yelp.reviewTree: $and
Sort: {}
Proj: {}
=============================
2016-07-04T11:01:53.874-0400 D QUERY [conn1] Index 0 is kp: { _id: 1 } unique name: '_id_' io: { v: 1, key: { _id: 1 }, name: "_id_", ns: "yelp.review" }
2016-07-04T11:01:53.874-0400 D QUERY [conn1] Index 1 is kp: { business_id: 1.0 } name: 'business_id_1' io: { v: 1, key: { business_id: 1.0 }, name: "business_id_1", ns: "yelp.review" }
2016-07-04T11:01:53.874-0400 D QUERY [conn1] Rated tree:
$and
2016-07-04T11:01:53.874-0400 D QUERY [conn1] Planner: outputted 0 indexed solutions.
2016-07-04T11:01:53.874-0400 D QUERY [conn1] Planner: outputting a collscan:
COLLSCAN
---ns = yelp.review
---filter = $and
---fetched = 1
---sortedByDiskLoc = 0
---getSort = []
2016-07-04T11:01:53.899-0400 D QUERY [conn1] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
2016-07-04T11:01:57.458-0400 I COMMAND [conn1] command yelp.review command: aggregate { aggregate: "review", pipeline: [ { $project: { business_id: "$business_id" } }, { $group: { _id: "$business_id", number: { $sum: 1.0 } } }, { $sort: { number: -1.0 } } ], cursor: {} } cursorid:65044266751 keyUpdates:0 writeConflicts:0 numYields:17413 reslen:5850 locks:{ Global: { acquireCount: { r: 35084 } }, Database: { acquireCount: { r: 17542 } }, Collection: { acquireCount: { r: 17542 } } } protocol:op_command 10038ms
I observe that the planner does not use index and perfors a collection scan.
It does perform an index scan.
How are projections covered by the indexes? According to my understanding, all of the above queries should be covered by the index on 'business_id'.
Thanks.