Re: Get count of records and documents in single query

2,757 views
Skip to first unread message

Asya Kamsky

unread,
Feb 28, 2013, 6:06:57 AM2/28/13
to mongod...@googlegroups.com
You can do it in a longer aggregation pipeline but wouldn't it be simpler to do a single count query before getting paginated information?
Doing a single "count()" query with the condition lets you use the results in every single aggregation query afterwards.  If you get count in the aggregation itself you will have to pay the penalty for extra steps every time, or you will lose the ability to generalize that aggregation (because first time you will have to do something different than subsequent times).

Do you need to know the total so you can show how many pages there are?   Some web pages "solve" this by not showing number of pages but just [1, 2, ... NEXT   LAST]

( That's probably  because it's easy to do the same query with sort in opposite direction and show just first page from the back)

Asya


On Thursday, February 28, 2013 4:01:41 AM UTC-6, Roman Pleshkov wrote:
Hi all. I'm new in mongo and use mongodb aggregation framework for my queries. I need to retrieve some records which satisfy certain conditions(include pagination+sorting) and also get count of records. 

Now, I perform next steps:
1. Create $match operator: { "$match" : { "year" : "2012" , "author.authorName" : { "$regex" : "au" , "$options" : "i"}}} 
2. Added sorting and  { "$sort" : { "some_field" : -1}} , { "$limit" : 10} , { "$skip" : 0}

After querying i received expected result: 10 documents with all fields.

For pagination i need to know a total counts of records which satisfy this conditions. in my case 25.

I use next query to get count : { "$match" : { "year" : "2012" , "author.authorName" : { "$regex" : "au" , "$options" : "i"}}} , { "$group" : { "_id" : "$all" , "reviewsCount" : { "$sum" : 1}}} ,  { "$sort" : { "some_field" : -1}} , { "$limit" : 10} , { "$skip" : 0} 


But i don't want to perform two separate queries: one for retrieving documents and second for total counts of records which satisfy certain conditions.

I want do it in one single query and get result in next format:

{
        "result" : [
                {
   "my_documets": [
                        {
"_id" : ObjectId("512f1f47a411dc06281d98c0"),
                        "author" : {
                                "authorName" : "author name1",
                                "email" : "ema...@email.com"
}
},
{
"_id" : ObjectId("512f1f47a411dc06281d98c0"),
                        "author" : {
                                "authorName" : "author name2",
                                "email" : "ema...@email.com"
}
}, .......

],
                    "total" : 25
                }
        ],
        "ok" : 1
}

I tried modify the group operator :   { "$group" : { "_id" : "$all" , "author" : "$author"  "reviewsCount" : { "$sum" : 1}}} 
But in this case I got : "exception: the group aggregate field 'author' must be defined as an expression inside an object". If add all fields in _id  then reviewsCount always = 1 because all records are different.

Nobody know how it can be implement in single query ? Maybe mongodb has some features or operators for this case? Implementation with using two separate query reduces performance for querying thousand or millions records. In my application it's very critical performance  issue.

I've been working on this all day and haven't been able to find a solution, so thought i'd turn to the mongodb  community.

Thanks.

Asya Kamsky

unread,
Feb 28, 2013, 3:32:37 PM2/28/13
to mongod...@googlegroups.com
I understand that you feel performance is crucial and I can see how you might think that two queries are slower than one query.
But what we are discussing is not two vs one queries - aggregation is NOT a query, it is an operation and it has multiple steps.
One aggregation operation and one query might be a lot faster than one longer more complex aggregation operation.

What would be even faster is if you tune your schema and indexes so that both the count query and the aggregation operation would 
perform more efficiently.   I can see that you are querying based on a regular expression, case insensitive - this means no index will 
be used and a full collection scan will have to be done.  That's going to make agg and query both painfully slow so of course
it will seem that one longer aggregation operation is better.

But if you fix your schema so that you can index the search term then query will be fast and so will the aggregation operation.

Having said all that, here is your aggregation that will give you both count and author documents:

db.collection.aggregate([
{
"$match" : {
"year" : "2012"
}
},
{
"$group" : {
"_id" : null,
"authors" : {
"$push" : "$author"
},
"reviewsCount" : {
"$sum" : 1
}
}
},
{
"$unwind" : "$authors"
},
{
"$project" : {
"_id" : 0,
"authors" : 1,
"reviewsCount" : 1
}
}
] )


Since I didn't know all your fields I simplified things - you should be able to adapt this to your exact document structure.


On Thursday, February 28, 2013 6:45:05 AM UTC-6, Roman Pleshkov wrote:
Thanks for your quick answer!

You can do it in a longer aggregation pipeline
If I correctly understood, there is an ability to implement a single query for this purpose. Right ? If so, I would like to know how i can do it.

Doing a single "count()" query with the condition lets you use the results in every single aggregation query afterwards.  If you get count in the aggregation itself you will have to pay the penalty for extra steps every time, or you will lose the ability to generalize that aggregation (because first time you will have to do something different than subsequent times).
You propose use two different queries instead of one single. But I want create two implementations (with one single and two different queries) to determine which of them will run faster. 
How I said a performance issues very critical for my application and if single query will works more faster then two different I will use the first case. 
If I wrong in my reasoning, please give me more information for investigation.

Asya Kamsky

unread,
Feb 28, 2013, 3:41:12 PM2/28/13
to mongod...@googlegroups.com
My apologies - I just went back and re-read your partial example and realized I have a completely unnecessary step in my aggregation.

Here is what you actually one (minus the fact that you need to fix the $match to actually match your condition and $push and $project fields how you actually want them:

db.docs.aggregate( [
{
"$match" : {
"year" : "2012"
}
},
{
"$group" : {
"_id" : null,
"my_documents" : {
"$push" : {
"_id" : "$_id",
"year" : "$year",
"author" : "$author"
}
},
"reviewsCount" : {
"$sum" : 1
}
}
},
{
"$project" : {
"_id" : 0,
"my_documents" : 1,
"total" : "$reviewsCount"
}
}
] )

Asya Kamsky

unread,
Feb 28, 2013, 3:50:56 PM2/28/13
to mongod...@googlegroups.com
Roman,

Are you actually planning on using some pipeline operations only available in aggregation framework? 

Because if you're just using {$match} .. {$sort}, {$skip} and {$limit} you can do all this using a regular "find" query and a cursor.
It'll be a single query that will give you the count and then let you iterate over just the specific subset of documents you want.

I've been assuming all along you needed the aggregation framework for some $group or $project or $unwind type operations which you can't do in regular querying.

Asya

Roman Pleshkov

unread,
Mar 1, 2013, 6:57:35 AM3/1/13
to mongod...@googlegroups.com
Thanks you very much for your help.  I'll try a different variants of implementation and inform which is better for me (maybe it will be useful for someone). 
Reply all
Reply to author
Forward
0 new messages