Aggregation Framework and _id subdocuments

244 views
Skip to first unread message

Christian Csar

unread,
Nov 26, 2012, 3:28:05 PM11/26/12
to mongod...@googlegroups.com
I'm currently trying out the aggregation framework on a collection that sometimes is updated via map reduce so documents only have the _id and value documents with various subdocuments. But I'm running into some things that seem quite odd and I'm at a loss to figure out what I'm supposed to be trying or doing differently. I apologize for all of the text in the email I'm copying. I've tried several different aggregation queries and often not getting results similar to what is described in http://stackoverflow.com/questions/12878722/mongodb-nested-field-in-groups-id but I haven't found any documentation that addresses this. Most recently I have tried the query at the bottom of the email, { $match : { "value.U" : 158, "_id.t" :"I"} , $group : {_id : {Z: "$_id.Z", U : "$value.U" } , cou
nt : {$sum: "$value.I"} }} . Assuming that I understand the $match pipeline correctly it serves a similar function this find query.

mongos> db.hourAggregate.find({"value.U" : 158, "_id.t" : "I"})
{ "_id" : { "K" : 837, "C" : 123837, "t" : "I", "Z" : 137, "T" : ISODate("2011-09-13T17:00:00Z") }, "value" : { "I" : NumberLong(1), "U" : 158, "u" : 84 } }
{ "_id" : { "K" : 836, "C" : 123836, "t" : "I", "Z" : 138, "T" : ISODate("2011-09-13T17:00:00Z") }, "value" : { "I" : NumberLong(1), "U" : 158, "u" : 84 } }
{ "_id" : { "K" : 860, "C" : 123860, "t" : "I", "Z" : 138, "T" : ISODate("2011-09-13T17:00:00Z") }, "value" : { "I" : NumberLong(1), "U" : 158, "u" : 84 } }
{ "_id" : { "K" : 855, "C" : 123855, "t" : "I", "Z" : 138, "T" : ISODate("2011-09-13T17:00:00Z") }, "value" : { "I" : NumberLong(1), "U" : 158, "u" : 84 } }
{ "_id" : { "K" : 839, "C" : 123839, "t" : "I", "Z" : 139, "T" : ISODate("2011-09-13T17:00:00Z") }, "value" : { "I" : NumberLong(1), "U" : 158, "u" : 84 } }
{ "_id" : { "K" : 836, "C" : 123836, "t" : "I", "Z" : 139, "T" : ISODate("2011-09-13T17:00:00Z") }, "value" : { "I" : NumberLong(1), "U" : 158, "u" : 84 } }
(there are 63664 documents that match this query in the data I'm using)

In essence I simply want to get the sum of value.I for all documents with value.U being 158 grouped by the value of Z. _id.t is a type of document produced by the map reduce that outputs to this collection so all the documents with _id.t: "I" are of the form above, though they may have additional fields in value.
As can be seen in the results below, it appears that not only is the match restriction on "value.U" ignored, but that the _id.Z field simply doesn't appear in the output. The expected output has about 3 or so documents similar to {_id: {U:158, Z:138}, count : NumberLong{345numbers)}} as there are around 3 different values for Z (and this is reflected by the map reduce query I'm trying to reimplement in the aggregation framework). My attempt to use the suggested workaround for that stack overflow question was
mongos> db.hourAggregate.aggregate( { $match : { "value.U" : 158, "_id.t" :"I"} , $project : { Z:"$_id.Z", U:"$value.U", I:"$value.I" }, $group : {_id : {Z: "$Z", U : "$U" } , count : {$sum: "$I"} }})
{ "result" : [ ], "ok" : 1 } provided almost instantaneously.

Any guidance would be greatly appreciated, I'm looking forward to being able to run multiple of these queries (varying the number for value.U and adding date restrictions to the match) in parallel on multiple cores with the aggregation framework as opposed to the one core limitation for map reduce.

mongos> db.hourAggregate.aggregate( { $match : { "value.U" : 158, "_id.t" :"I"} , $group : {_id : {Z: "$_id.Z", U : "$value.U" } , cou
nt : {$sum: "$value.I"} }})
{
        "result" : [
                {
                        "_id" : {
                                "U" : 122
                        },
                        "count" : 0
                },
                {
                        "_id" : {
                                "U" : 127
                        },
                        "count" : 0
                },
                {
                        "_id" : {
                                "U" : 192
                        },
                        "count" : 0
                },
                {
                        "_id" : {
                                "U" : 150
                        },
                        "count" : 0
                },
                {
                        "_id" : {
                                "U" : 158
                        },
                        "count" : NumberLong(3230956)
                },
                {
                        "_id" : {
                                "U" : 125
                        },
                        "count" : 0
                },
                {
                        "_id" : {
                                "U" : 91
                        },
                        "count" : 0
                },
                {
                        "_id" : {
                                "U" : 116
                        },
                        "count" : 0
                },
                {
                        "_id" : {
                                "U" : 203
                        },
                        "count" : 0
                },
 {
                        "_id" : {
                                "U" : 111
                        },
                        "count" : 0
                },
                {
                        "_id" : {
                                "U" : 210
                        },
                        "count" : 0
                },
                {
                        "_id" : {
                                "U" : 213
                        },
                        "count" : NumberLong(97058)
                },
                {
                        "_id" : {
                                "U" : 221
                        },
                        "count" : NumberLong(20680)
                },
                {
                        "_id" : {
                                "U" : 139
                        },
                        "count" : 0
                },
                {
                        "_id" : {
                                "U" : 156
                        },
                        "count" : 0
                },
                {
                        "_id" : {
                                "U" : 120
                        },
                        "count" : NumberLong(115950)
                },
                {
                        "_id" : {
                                "U" : 142
                        },
                        "count" : 0
                },

and so forth for quite a few screens including the entry for U:158 with a "count" : NumberLong(<numbers>)

cac...@gmail.com

unread,
Nov 26, 2012, 3:35:02 PM11/26/12
to mongod...@googlegroups.com
Oh and this is 2.2.1.
mongos> version()
version: 2.2.1

Also the value generated for U:158 is exactly twice that generated in total by my map reduce implementation, which is odd as if it were completely ignoring the restriction on _id.t:"I" I think it would be triple.



--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb

cac...@gmail.com

unread,
Nov 26, 2012, 4:59:06 PM11/26/12
to mongod...@googlegroups.com
I've confirmed that the aggregation I used is getting the correct result for U:158 if you ignore the additional  restriction on _id.t (it being double rather than triple revealed an unrelated data quality). I was unable to find an example in the documentation for a match with more than one constraint, but operating on the basis that "The syntax passed to the $match is identical to the query syntax" I can't seem to find anything I'm doing incorrectly for the match stage at least.

So in short ignoring the complexities of the documents the $match seems to simply not be working. I'll be quite pleased and chagrined if it turns out to be some sort of typo that's been stymieing me (though an error message instead of incorrect results would really have helped).

Christian

Christian Csar

unread,
Nov 26, 2012, 7:02:17 PM11/26/12
to mongod...@googlegroups.com
Well I've managed to replicate half of the problem with the standard http://media.mongodb.org/zips.json data with db.zipCollection.aggregate( {$match: { "city" : "SPRINGFIELD" , "pop" : {$gt : 30000}}, $group : {_id : "$city", totalPop : {"$sum" : "$pop"}}} ) which I believe should give me the a document showing the total population of all cities (or zip code areas) named SPRINGFIELD with a population over 30K. Instead this appears to give me the sum of the populations of all cities by name so that I have for instance                
{
                        "_id" : "MAINE",
                        "totalPop" : 254
                },

near the end of a rather long list with my match statement having no apparent effect whatsoever. When I aggregate with the match statement alone I get the correct selection of documents. While I haven't successfully replicated the second part of my problem (the _id.Z field not appearing at all) with sample data I suspect it requires the use of a subfield of the initial _id. Given that the match part is correct on it's own and it really seems to match the syntax for using a match query in a pipeline I'm not too clear on where to go from here other than try it out on a fresh mongod instance perhaps on a fresh EC2 instance. If this is a known issue I haven't seen any mention of it on this group, and if it is an issue vs a mistake on my part, it kind of makes the aggregation framework dangerously useless.

Christian

Christian Csar

unread,
Nov 26, 2012, 7:55:09 PM11/26/12
to mongod...@googlegroups.com
Right I seem to have solved at least half of my problem. I got hit by https://jira.mongodb.org/browse/SERVER-6861 and not paying close enough attention to braces. Each pipeline stage is it's own object and they are specified as either an array or as separate arguments. Fortunately 2.3.1 seems to add a nice error message.

I'm still working on the bit that's the nested documents in _id, but now I should at least be running all of the stages I want to.

Christian

cac...@gmail.com

unread,
Nov 26, 2012, 8:51:16 PM11/26/12
to mongod...@googlegroups.com
Right so the work around I linked to in that stack overflow article does appear to work, but I couldn't find a mention of it in the documentation. See below:

mongos> db.hourAggregate.aggregate( { $match : { "value.U" : 158, "_id.t" :"I"}} , {$group : {_id : {Z: "$_id.Z", U : "$value.U" } , count : {$sum: "$value.I"} }})
{
        "result" : [
                {
                        "_id" : {
                                "U" : 158
                        },
                        "count" : NumberLong(1615478)
                }
        ],
        "ok" : 1

}
mongos> db.hourAggregate.aggregate( { $match : { "value.U" : 158, "_id.t" :"I"}} , {$project : { Z:"$_id.Z", U:"$value.U", I : "$value.I"}} ,{$group : {_id : {Z: "$Z", U : "$U" } , count : {$sum: "$I"} }})
{
        "result" : [
                {
                        "_id" : {
                                "Z" : 137,
                                "U" : 158
                        },
                        "count" : NumberLong(541555)
                },
                {
                        "_id" : {
                                "Z" : 138,
                                "U" : 158
                        },
                        "count" : NumberLong(470692)
                },
                {
                        "_id" : {
                                "Z" : 139,
                                "U" : 158
                        },
                        "count" : NumberLong(603231)
                }
        ],
        "ok" : 1
}



Christian Csar

unread,
Nov 29, 2012, 6:57:59 PM11/29/12
to mongod...@googlegroups.com

Andre de Frere

unread,
Dec 2, 2012, 5:20:40 PM12/2/12
to mongod...@googlegroups.com
Hi Christian,

According to the investigation in the ticket you filed, this appears to be resolved in 2.2.2 and 2.3.1.

Have you tried this in the later releases?

Regards,
André
Reply all
Reply to author
Forward
0 new messages