Aggregate not using indexes.

984 views
Skip to first unread message

Hemant Saxena

unread,
Jul 4, 2016, 2:10:57 PM7/4/16
to mongodb-user
MongoDB version 3.2. 
I have a collection called 'review' with around a million documents, and index defined on the field 'business_id'.
I am issuing the following aggregate query, but it fails to use the index and performs a collection scan.
Query:
  db.review.aggregate([{$project: {business_id: "$business_id"}}, {$group: {_id: "$business_id", number: {$sum: 1}}}, {$sort: {"number": -1}}]);
Isn't this query covered by the index?

The log details for log level 5 are:

2016-07-04T11:01:53.874-0400 D QUERY    [conn1] Beginning planning...
=============================
Options = NO_BLOCKING_SORT INDEX_INTERSECTION 
Canonical query:
ns=yelp.reviewTree: $and
Sort: {}
Proj: { business_id: 1, _id: 1 }
=============================
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] Index 0 is kp: { _id: 1 } unique name: '_id_' io: { v: 1, key: { _id: 1 }, name: "_id_", ns: "yelp.review" }
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] Index 1 is kp: { business_id: 1.0 } name: 'business_id_1' io: { v: 1, key: { business_id: 1.0 }, name: "business_id_1", ns: "yelp.review" }
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] Rated tree:
$and
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] Planner: outputted 0 indexed solutions.
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] PROJECTION: fetched status: 1
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] PROJECTION: Current plan is:
COLLSCAN
---ns = yelp.review
---filter = $and
---fetched = 1
---sortedByDiskLoc = 0
---getSort = []

2016-07-04T11:01:53.874-0400 D QUERY    [conn1] PROJECTION: requires fields
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] PROJECTION: is covered?: = 1
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] PROJECTION: covered via FETCH, using SIMPLE_DOC fast path
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] Beginning planning...
=============================
Options = NO_BLOCKING_SORT INDEX_INTERSECTION 
Canonical query:
ns=yelp.reviewTree: $and
Sort: {}
Proj: {}
=============================
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] Index 0 is kp: { _id: 1 } unique name: '_id_' io: { v: 1, key: { _id: 1 }, name: "_id_", ns: "yelp.review" }
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] Index 1 is kp: { business_id: 1.0 } name: 'business_id_1' io: { v: 1, key: { business_id: 1.0 }, name: "business_id_1", ns: "yelp.review" }
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] Rated tree:
$and
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] Planner: outputted 0 indexed solutions.
2016-07-04T11:01:53.874-0400 D QUERY    [conn1] Planner: outputting a collscan:
COLLSCAN
---ns = yelp.review
---filter = $and
---fetched = 1
---sortedByDiskLoc = 0
---getSort = []
2016-07-04T11:01:53.899-0400 D QUERY    [conn1] Only one plan is available; it will be run but will not be cached. query: {} sort: {} projection: {}, planSummary: COLLSCAN
2016-07-04T11:01:57.458-0400 I COMMAND  [conn1] command yelp.review command: aggregate { aggregate: "review", pipeline: [ { $project: { business_id: "$business_id" } }, { $group: { _id: "$business_id", number: { $sum: 1.0 } } }, { $sort: { number: -1.0 } } ], cursor: {} } cursorid:65044266751 keyUpdates:0 writeConflicts:0 numYields:17413 reslen:5850 locks:{ Global: { acquireCount: { r: 35084 } }, Database: { acquireCount: { r: 17542 } }, Collection: { acquireCount: { r: 17542 } } } protocol:op_command 10038ms

Even for a simple projection query like this:
  db.review.find({},{ "business_id": true, "_id": 0 });
I observe that the planner does not use index and perfors a collection scan.
However, if I provide the hint:
  db.review.find({},{"business_id": true, "_id": 0 }).hint({"business_id": 1});
It does perform an index scan.

How are projections covered by the indexes? According to my understanding, all of the above queries should be covered by the index on 'business_id'.

Thanks.

Amar

unread,
Jul 6, 2016, 11:03:24 PM7/6/16
to mongodb-user

Hi Hemant,

db.review.aggregate([{$project: {business_id: "$business_id"}}, {$group: {_id: "$business_id", number: {$sum: 1}}}, {$sort: {"number": -1}}]);

Isn’t this query covered by the index?

This aggregation won’t use an index because it has a $project stage. Only $match and $sort can make use of an index if they are in the beginning of the aggregation pipeline. This is documented in Aggregation Pipeline Behavior.

Even for a simple projection query like this:

  db.review.find({},{ "business_id": true, "_id": 0 });

I observe that the planner does not use index and perfors a collection scan.
However, if I provide the hint:

  db.review.find({},{"business_id": true, "_id": 0 }).hint({"business_id": 1});

It does perform an index scan.

Currently, if the query predicate is empty, MongoDB will perform a collections scan.There is a feature request for the query planner to consider index scans on empty query predicates SERVER-20066. Feel free to vote on the ticket if you consider this feature important for you.

If the query was using a filter on business_id:

db.review.find({business_id : x});

Or a sort on business_id:

db.review.find().sort({business_id : 1});

The index will be used.

Regards,

Amar

Reply all
Reply to author
Forward
0 new messages