{
"_id" : "uBERl1oijUOsE7fGm4o6RA",
"DataCollection" : [
{
"ModuleType" : "TaskDuration",
"Value" : 200.0
},
{
"ModuleType" : "TaskPriority",
"Value" : "High priority"
},
{]
"ModuleType" : "TaskName",
"Value" : "Test task 1"
}
}
{
"result": [
{ _id: "High Priority", Duration: 500 },
{ _id: "Low Priority", Duration: 100 }
]
}
db.runCommand({
aggregate: "mycollection",
pipeline: [
{ $unwind : "$DataCollection" },
{ $match : { $or : [{ "DataCollection.ModuleType" : "TaskDuration" }, { "DataCollection.ModuleType" : "TaskPriority" }] } },
{ $project : { mui : "$DataCollection.ModuleType", v : "$DataCollection.Value" } },
{ $group : { _id: "$_id", dc: { $addToSet : "$$CURRENT" } } },
{ $group :
{
_id: { $cond: {
if: { $eq: ["$DataCollection.ModuleType", "TaskPriority"] },
then: { $ifNull: [ "$DataCollection.Value", "Uncategorised"] },
else: "__NOMATCH__"
} },
val: { $sum : { $cond: [{ $eq: ["$DataCollection.ModuleType", "TaskDuration"] }, "$DataCollection.Value" , 0]
} }
}
}
],
allowDiskUse: true
});
> db.t99.find()
{ "_id" : ObjectId("537e3081658ed684758a83b9"), "DataCollection" : [ { "ModuleType" : "TaskDuration", "Value" : 200 }, { "ModuleType" : "TaskPriority", "Value" : "High priority" }, { "ModuleType" : "TaskName", "Value" : "Test task 1" } ] }
{ "_id" : ObjectId("537e30d0658ed684758a83ba"), "DataCollection" : [ { "ModuleType" : "TaskDuration", "Value" : 100 }, { "ModuleType" : "TaskPriority", "Value" : "High priority" }, { "ModuleType" : "TaskName", "Value" : "Test task 2" } ] }
{ "_id" : ObjectId("537e310f658ed684758a83bb"), "DataCollection" : [ { "ModuleType" : "TaskDuration", "Value" : 100 }, { "ModuleType" : "TaskPriority", "Value" : "Low Priority" }, { "ModuleType" : "TaskName", "Value" : "Test task 3" } ] }
{ "_id" : ObjectId("537e312b658ed684758a83bc"), "DataCollection" : [ { "ModuleType" : "TaskDuration", "Value" : 300 }, { "ModuleType" : "TaskPriority", "Value" : "Low Priority" }, { "ModuleType" : "TaskName", "Value" : "Test task 4" } ] }
>
> unwind
{ "$unwind" : "$DataCollection" }
> project
{ "$project" : { "_id" : "$_id",
"duration" : { "$cond" : [ {"$eq" : [ "$DataCollection.ModuleType", "TaskDuration" ] },"$DataCollection.Value", null ] }, "priority" : { "$cond" : [ {"$eq" : [ "$DataCollection.ModuleType", "TaskPriority" ] }, "$DataCollection.Value", null ] } } }> group1
{ "$group" : { "_id" : "$_id", "duration" : { "$max" : "$duration" }, "priority" : { "$max" : "$priority" } } }
> group2
{ "$group" : { "_id" : "$priority", "duration" : { "$sum" : "$duration" } } }
>
>
> db.t99.aggregate(unwind, project, group1, group2)
{ "_id" : "High priority", "duration" : 300 }
{ "_id" : "Low Priority", "duration" : 400 }I find it easiest to understand these kinds of complex aggregations if you go backwards from the result. So if you look at this output, it's easy to understand how 'group2' works:
> db.t99.aggregate(unwind, project, group1)
{ "_id" : ObjectId("537e310f658ed684758a83bb"), "duration" : 100, "priority" : "Low Priority" }
{ "_id" : ObjectId("537e312b658ed684758a83bc"), "duration" : 300, "priority" : "Low Priority" }
{ "_id" : ObjectId("537e30d0658ed684758a83ba"), "duration" : 100, "priority" : "High priority" }
{ "_id" : ObjectId("537e3081658ed684758a83b9"), "duration" : 200, "priority" : "High priority" }
>
> group2
{ "$group" : { "_id" : "$priority", "duration" : { "$sum" : "$duration" } } }
> db.t99.aggregate(unwind, project)
{ "_id" : ObjectId("537e3081658ed684758a83b9"), "duration" : 200, "priority" : null }
{ "_id" : ObjectId("537e3081658ed684758a83b9"), "duration" : null, "priority" : "High priority" }
{ "_id" : ObjectId("537e3081658ed684758a83b9"), "duration" : null, "priority" : null }
{ "_id" : ObjectId("537e30d0658ed684758a83ba"), "duration" : 100, "priority" : null }
{ "_id" : ObjectId("537e30d0658ed684758a83ba"), "duration" : null, "priority" : "High priority" }
{ "_id" : ObjectId("537e30d0658ed684758a83ba"), "duration" : null, "priority" : null }
{ "_id" : ObjectId("537e310f658ed684758a83bb"), "duration" : 100, "priority" : null }
{ "_id" : ObjectId("537e310f658ed684758a83bb"), "duration" : null, "priority" : "Low Priority" }
{ "_id" : ObjectId("537e310f658ed684758a83bb"), "duration" : null, "priority" : null }
{ "_id" : ObjectId("537e312b658ed684758a83bc"), "duration" : 300, "priority" : null }
{ "_id" : ObjectId("537e312b658ed684758a83bc"), "duration" : null, "priority" : "Low Priority" }
{ "_id" : ObjectId("537e312b658ed684758a83bc"), "duration" : null, "priority" : null }
>
> group1
{ "$group" : { "_id" : "$_id", "duration" : { "$max" : "$duration" }, "priority" : { "$max" : "$priority" } } }
(In case it's not clear, we're re-assembling the original documents by using the original '_id' to group. The "$max" trick works because any value will be greater than 'null')
So the tricky part is how we build the "duration" and "priority" fields in the output from "project". Lets take a look at the input and the actual operation:
> db.t99.aggregate(unwind)
{ "_id" : ObjectId("537e3081658ed684758a83b9"), "DataCollection" : { "ModuleType" : "TaskDuration", "Value" : 200 } }
{ "_id" : ObjectId("537e3081658ed684758a83b9"), "DataCollection" : { "ModuleType" : "TaskPriority", "Value" : "High priority" } }
{ "_id" : ObjectId("537e3081658ed684758a83b9"), "DataCollection" : { "ModuleType" : "TaskName", "Value" : "Test task 1" } }
{ "_id" : ObjectId("537e30d0658ed684758a83ba"), "DataCollection" : { "ModuleType" : "TaskDuration", "Value" : 100 } }
{ "_id" : ObjectId("537e30d0658ed684758a83ba"), "DataCollection" : { "ModuleType" : "TaskPriority", "Value" : "High priority" } }
{ "_id" : ObjectId("537e30d0658ed684758a83ba"), "DataCollection" : { "ModuleType" : "TaskName", "Value" : "Test task 2" } }
{ "_id" : ObjectId("537e310f658ed684758a83bb"), "DataCollection" : { "ModuleType" : "TaskDuration", "Value" : 100 } }
{ "_id" : ObjectId("537e310f658ed684758a83bb"), "DataCollection" : { "ModuleType" : "TaskPriority", "Value" : "Low Priority" } }
{ "_id" : ObjectId("537e310f658ed684758a83bb"), "DataCollection" : { "ModuleType" : "TaskName", "Value" : "Test task 3" } }
{ "_id" : ObjectId("537e312b658ed684758a83bc"), "DataCollection" : { "ModuleType" : "TaskDuration", "Value" : 300 } }
{ "_id" : ObjectId("537e312b658ed684758a83bc"), "DataCollection" : { "ModuleType" : "TaskPriority", "Value" : "Low Priority" } }
{ "_id" : ObjectId("537e312b658ed684758a83bc"), "DataCollection" : { "ModuleType" : "TaskName", "Value" : "Test task 4" } }
> project
What this does is set the "duration" field to be value of "DatacCollection.Value" only if the DataCollection.ModuleType is "TaskDuration", and null otherwise, and something similar for the 'priority' field. I do this to create the output that can be re-grouped using $max in the first $group stage.
You can use this general design pattern to query any collection that structures data using a key/value schema. (For more about the key/value schema, see this link: http://askasya.com/post/dynamicattributes .)
The design pattern is:
return new BsonDocument {
{ _AggregateCond,
new BsonArray() { condition, BsonValue.Create(ifYes), BsonValue.Create(ifNo) }
}
};
{ $cond : [<<condition document>>, "$<<element>>", null]}
...{ "$group" : { "_id" : "$priority", "duration" : { "$sum" <span style="col
"$cond" : [ {"$eq" : [ "$DataCollection.ModuleType", "TaskPriority" ] }, "$DataCollection.Value", null ]
--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/c8f86701-f8d8-4138-918e-64fbf1515837%40googlegroups.com.
One quick question: is it ever the case that there is more than one document in the DataCollection[] array that contains the key/value pair "ModuleType" : "TaskDuration"? If so, that simplifies the problem a lot.