$group and non-existing fields

16,191 views
Skip to first unread message

Fred

unread,
Oct 29, 2012, 11:21:44 AM10/29/12
to mongod...@googlegroups.com
Hello,

I'm trying to use the new aggregation framework and have a problem with non-existing fileds in some documents.

db.test.insert({'user' : 'F42', 'group' : 'admin', 'age':42})
db.test.insert({'user' : 'Fred', 'group' : 'admin', 'age':40})
db.test.insert({'user' : 'Bob', 'group' : 'admin'})

db.test.insert({'user' : 'Foo',  'group' : 'bar'})
db.test.insert({'user' : 'Bar',  'group' : 'bar'})
db.test.insert({'user' : 'B50', 'group' : 'bar', 'age' : 20})
db.test.insert({'user' : 'B20', 'group' : 'bar', 'age' : 50})


as you can see not all users have the age field. I'm trying to get two oldest users from each group:

db.test.aggregate({$project:{user:'$user', group:'$group', age:{$ifNull:['$age', null]}},$group:{_id:'$group', age:'$age'}})

which gives me

{
        "errmsg" : "exception: the group aggregate field 'age' must be defined as an expression inside an object",
        "code" : 15951,
        "ok" : 0
}


doesn't $project produces an output which should be consumed by $group? If so, why it is complaining about non existing field? $ifNull makes sure it is there.

Regards
Fred

Jeremy Mikola

unread,
Oct 29, 2012, 12:30:49 PM10/29/12
to mongod...@googlegroups.com
The problem is not the absence of an "age" field, as the $project will certainly create it if necessary.

You're missing this line in the documentation, though:

Each field defined for the $group (excluding the "_id" field) must use one of the group aggregation function listed below to generate its composite value:

There are currently eight different aggregate expressions that you can use, which you'll find here: http://docs.mongodb.org/manual/reference/aggregation/group/#_S_group

Additionally, it should not be necessary to project "user" as "$user". "1" should suffice, since you're merely saying you wish to include the "user" field. The field reference "$user" would be useful if you were renaming fields in the projection or if you needed the value of "user" in an expression argument.

And beyond that, the $project itself should not even be necessary before the $group. Consider the following shell example:

> db.foo.drop()
true
> db.foo.insert({x:1})
> db.foo.insert({x:3})
> db.foo.insert({})
> db.foo.aggregate({$group:{_id:1, avg:{$avg:"$x"}, sum:{$sum:1}}})
{
"result" : [
{
"_id" : 1,
"avg" : 2,
"sum" : 3
}
],
"ok" : 1
}

The aggregate expressions supported by $group handle missing values intuitively. If a field is absent, it is not considered for an average, but we can still count the document.

Fred

unread,
Oct 29, 2012, 1:14:42 PM10/29/12
to mongod...@googlegroups.com


Thx for help! So, now I have>

mongo> db.test.aggregate({$group:{_id:'$group', maxAge:{$max:'$age'}}})
{
        "result" : [
                {
                        "_id" : "bar",
                        "maxAge" : 50
                },
                {
                        "_id" : "admin",
                        "maxAge" : 42
                }
        ],
        "ok" : 1
}

The oldest user from each group. Question is, how do I add additional fields to the output like 'user' or how can I output the hole document founded by $max?  How do I find second first and oldest users from each group? Is this possible?
 

Jeremy Mikola

unread,
Oct 29, 2012, 1:40:31 PM10/29/12
to mongod...@googlegroups.com

On Monday, October 29, 2012 1:14:42 PM UTC-4, Fred wrote:

Thx for help! So, now I have>

mongo> db.test.aggregate({$group:{_id:'$group', maxAge:{$max:'$age'}}})
{
        "result" : [
                {
                        "_id" : "bar",
                        "maxAge" : 50
                },
                {
                        "_id" : "admin",
                        "maxAge" : 42
                }
        ],
        "ok" : 1
}

The oldest user from each group. Question is, how do I add additional fields to the output like 'user' or how can I output the hole document founded by $max?  How do I find second first and oldest users from each group? Is this possible?


If you were to $sort the documents by age descending prior to $group, you could then $group by your "group" field ("admin" and "bar" buckets) and use the $first expression in $group to collect the first document you encounter.

I don't know of a way to access something that isn't the $first or $last document. At the expense of making the command result much larger, you could collect each grouped document with $push or $addToSet. In that case, you would benefit from a conservative projection to limit the document to just essential fields before collecting them.

If you have frequent need to query for offsets (i.e. skips) among ages for groups, it may be preferable to use the aggregation framework or the distinct() command to collect unique groups and then issue normal queries with a skip and limit. The optimal index in that case would be a compound index of group and age (descending or ascending depending on your needs).

Fred

unread,
Oct 30, 2012, 4:56:30 AM10/30/12
to mongod...@googlegroups.com

Hello Jeremy,

so getting N-top docs is not possible with the framework.

db.test.aggregate({$sort:{age:1}, $group:{_id:'$group', age:{$last:'$age'}, user:{$last:'$user'}}})
You've said, I could get last document with $last operator. what is the syntax for getting the hole document without specifying each field manually? For a big document it is a lot of typing

Fred

Jeremy Mikola

unread,
Nov 10, 2012, 10:21:03 PM11/10/12
to mongod...@googlegroups.com


On Tuesday, October 30, 2012 4:56:31 AM UTC-4, Fred wrote:

so getting N-top docs is not possible with the framework.

With respect to the $group operator, this is not currently possible. I know top-k is supported ($sort followed by $limit), but that's not quite what you need.
 

db.test.aggregate({$sort:{age:1}, $group:{_id:'$group', age:{$last:'$age'}, user:{$last:'$user'}}})
You've said, I could get last document with $last operator. what is the syntax for getting the hole document without specifying each field manually? For a big document it is a lot of typing

I don't believe there's any way to refer to an entire document with $ syntax, but that sounds like a useful feature. Feel free can open a JIRA ticket on the SERVER project (component: "Aggregation Framework") and cross-reference this mailing list thread via the Google Group link.
Reply all
Reply to author
Forward
0 new messages