When to use the aggregation framework, and when not


Jens Segers

Mar 29, 2015, 2:49:57 PM
to mongod...@googlegroups.com
Some users of my Laravel MongoDB package (https://github.com/jenssegers/laravel-mongodb) were complaining about performance issues when paginating large data sets. Pagination used to be implemented by calling skip and limit on the result cursor. To address this, I switched to the aggregation framework and now pass the pagination parameters as stages in the pipeline instead.
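For reference, the aggregation-based pagination ends up as a pipeline along these lines (a sketch, not the package's actual code; page and perPage are hypothetical parameters):

db.collection.aggregate([
    { "$skip" : (page - 1) * perPage },
    { "$limit" : perPage }
])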

But now I'm wondering: is it bad to use the aggregation framework for everything? It seems you can express any kind of query with the framework, even when you're not doing any aggregation. Is it bad practice? Does it have a performance impact?

Will Berkeley

Mar 30, 2015, 10:40:47 AM
to mongod...@googlegroups.com

There's some extra overhead to running a find-like query through the aggregation pipeline, but performance should generally be similar for a query like

db.test.find(Q).sort(S).limit(L)

and pipelines like

db.test.aggregate([
    { "$match" : Q },
    { "$sort" : S },
    { "$limit" : L }
])

One important difference is that an aggregation pipeline cannot use an index to cover a query.

> db.test.drop()
> db.test.insert({ "a" : 1 })
> db.test.insert({ "a" : 2 })
> db.test.ensureIndex({ "a" : 1 })
> db.test.find({ "a" : 1 }, { "_id" : 0, "a" : 1 })    // covered, only needs to read index and not actual documents
> db.test.aggregate([
    { "$match" : { "a" : 1 } },
    { "$project" : { "_id" : 0, "a" : 1 } }
])     // still needs to read the matching documents even though it looks equivalent to the above find
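One way to see the difference for yourself is explain(): for the covered find, the plan reads only the index, while the equivalent pipeline still fetches documents. A sketch (the exact output fields vary by server version; in 3.0-style output look for totalDocsExamined being 0, in older versions indexOnly: true):

> db.test.find({ "a" : 1 }, { "_id" : 0, "a" : 1 }).explain("executionStats")
    // check executionStats.totalDocsExamined
> db.test.aggregate([
    { "$match" : { "a" : 1 } },
    { "$project" : { "_id" : 0, "a" : 1 } }
], { "explain" : true })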

I would prefer using find for jobs that find can do, and reach for aggregation only when find is not capable of the same thing.

Also, pagination is usually better done using $gt/$lt and limit in combination with an (indexed) sort, rather than skip + limit, since skip still has to walk past every skipped document. For example, to paginate the results of the following query:

db.test.find().sort({ "a" : 1 })

add the appropriate page size to the query

db.test.find().sort({ "a" : 1 }).limit(50)

then, when it's time to retrieve the next page, store the "a" value of the last document on the page (for an ascending sort, this is the highest "a" value seen) and use $gt to start from the next documents via the index, rather than having to skip over them:

db.test.find({ "a" : { "$gt" : highest_a_value_of_previous_page } }).sort({ "a" : 1 }).limit(50)

-Will

David Alexander

Mar 30, 2015, 12:32:57 PM
to mongod...@googlegroups.com

db.test.find({ "a" : { "$gt" : highest_a_value_of_previous_page } }).sort({ "a" : 1 }).limit(50)

 
I think that only works consistently where the values of "a" are unique.

An obvious counterexample: you have 75 items where "a" is 3.99 and 75 items where "a" is 6.99.
Page 1: 50 items at 3.99
Page 2: 50 items at 6.99
No page 3.

David

Will Berkeley

Mar 30, 2015, 12:57:13 PM
to mongod...@googlegroups.com
Yes, that's true - you can miss repeated values that straddle a page boundary, because the next page's $gt query skips past them. It's common to use ObjectId for paging, where this problem won't happen because ObjectIds are unique.
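A sketch of _id-based paging (assuming documents use the default ObjectId _id; last_id is the _id of the final document on the previous page):

db.test.find().sort({ "_id" : 1 }).limit(50)
db.test.find({ "_id" : { "$gt" : last_id } }).sort({ "_id" : 1 }).limit(50)

And if you do need to sort on a non-unique field like "a", one common fix is to add _id to the sort as a tie-breaker (last_a and last_id taken from the previous page's final document):

db.test.find({
    "$or" : [
        { "a" : { "$gt" : last_a } },
        { "a" : last_a, "_id" : { "$gt" : last_id } }
    ]
}).sort({ "a" : 1, "_id" : 1 }).limit(50)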

Thanks for pointing it out.

-Will
