Grouped Distinct Count on multiple fields


Ben Foster

May 16, 2014, 6:51:12 AM
to mongod...@googlegroups.com
I need to perform a distinct count on *multiple* fields. The issue I'm facing is that I really need to split the aggregation pipeline, then merge the final results.

Suppose we have a group ordering system where multiple users can add lines to an order. An order line has a single product and each line needs to be approved.

A document for a single line looks like so:

{
   "_id" : ObjectId("5374d63e4ba1a72c8877808c"),
   "groupOrderId" : ObjectId("5374d63e4ba1a72c8877808a"),
   "userId" : 2,
   "productId" : 1,
   "approved" : false
}

We want to display a report for each group order showing the number of unique products on the order, the number of unique users, the number of order lines and the number of approved order lines. Something like:

{
   "groupOrderId" : ObjectId("5374d63e4ba1a72c8877808a"),
   "uniqueProducts" : 5,
   "uniqueUsers" : 2,
   "numberOfLines" : 10,
   "numberOfApprovedLines" : 2
}

For the distinct count, in SQL we might do something like:

SELECT Id, COUNT(DISTINCT [ProductId]) uniqueProducts, COUNT(DISTINCT [UserId]) uniqueUsers
FROM Orders
GROUP BY Id

To achieve this in Mongo I am using the aggregation pipeline. First I group by order id and product id to get me the unique products on the order and then I group by order id to get the count of products:

db.scorecardSubmissions.aggregate([
  {
    $group: {
      _id: {
        groupOrderId: '$groupOrderId',
        productId: '$productId'
      },
      numberOfLines: { $sum: 1 },
      numberOfApprovedLines: {
        $sum: { $cond: [ { $eq: ['$approved', true] }, 1, 0 ] }
      }
    }
  },
  {
    $group: {
      _id: { groupOrderId: '$_id.groupOrderId' },
      uniqueProducts: { $sum: 1 },
      numberOfLines: { $sum: '$numberOfLines' },
      numberOfApprovedLines: { $sum: '$numberOfApprovedLines' }
    }
  }
])

This returns a result set like:

{
   "groupOrderId" : ObjectId("5374d63e4ba1a72c8877808a"),
   "uniqueProducts" : 5,
   "numberOfLines" : 10,
   "numberOfApprovedLines" : 2
}


Note that uniqueUsers is missing; that is the part I need help with.

I can't figure out how to return the distinct count of both users and products for each order within a single aggregation.

Asya Kamsky

May 16, 2014, 2:56:07 PM
to mongodb-user
Are you using 2.6 yet? If so, it's quite easy:

{
    "_id" : ObjectId("5374d63e4ba1a72c8877808c"),
    "groupOrderId" : ObjectId("5374d63e4ba1a72c8877808a"),
    "userId" : 2,
    "productId" : 1,
    "approved" : false
}

db.scorecardSubmissions.aggregate([
   { $group: {
      _id: '$groupOrderId',
      up: { $addToSet: '$productId' },
      numLines: { $sum: 1 },
      numAppLines: { $sum: { $cond: { if: { $eq: ['$approved', true] }, then: 1, else: 0 } } }
   } },
   { $project: {
       "groupOrderId": "$_id",
       "uniqueProducts": { $size: "$up" },
       "numberOfLines": "$numLines",
       "numberOfApprovedLines": "$numAppLines"
   } }
])

       
The $size operator is new in 2.6. I'm also using the 2.6 $cond syntax (the old array form still works).
 
Asya
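
The pipeline above counts distinct products only. The same $addToSet and $size pattern would presumably also cover the distinct user count the original question asks for. The following is a minimal sketch, not part of the original reply: the field name uu, the report helper and the sample data are all made up for illustration. The helper redoes the grouping in plain JavaScript (Node.js) on an in-memory array, so the expected numbers can be checked without a server.

```javascript
// Pipeline for MongoDB 2.6+. "uu" is a hypothetical field name chosen to
// mirror "up"; it does not appear in the original reply.
const pipeline = [
  { $group: {
      _id: "$groupOrderId",
      up: { $addToSet: "$productId" },  // set of distinct product ids
      uu: { $addToSet: "$userId" },     // set of distinct user ids
      numLines: { $sum: 1 },
      numAppLines: { $sum: { $cond: { if: { $eq: ["$approved", true] }, then: 1, else: 0 } } }
  } },
  { $project: {
      groupOrderId: "$_id",
      uniqueProducts: { $size: "$up" },
      uniqueUsers: { $size: "$uu" },
      numberOfLines: "$numLines",
      numberOfApprovedLines: "$numAppLines"
  } }
];
// In the mongo shell: db.scorecardSubmissions.aggregate(pipeline)

// Hypothetical helper: the same grouping in plain JavaScript.
function report(lines) {
  const groups = new Map();
  for (const line of lines) {
    const key = String(line.groupOrderId);
    if (!groups.has(key)) {
      groups.set(key, { id: line.groupOrderId, up: new Set(), uu: new Set(), numLines: 0, numAppLines: 0 });
    }
    const g = groups.get(key);
    g.up.add(line.productId);                          // like $addToSet
    g.uu.add(line.userId);                             // like $addToSet
    g.numLines += 1;                                   // like $sum: 1
    g.numAppLines += line.approved === true ? 1 : 0;   // like $sum over $cond
  }
  return Array.from(groups.values()).map((g) => ({
    groupOrderId: g.id,
    uniqueProducts: g.up.size,                         // like $size
    uniqueUsers: g.uu.size,                            // like $size
    numberOfLines: g.numLines,
    numberOfApprovedLines: g.numAppLines
  }));
}

// Made-up sample data; string ids stand in for ObjectId values.
const sample = [
  { groupOrderId: "order-1", userId: 1, productId: 1, approved: true },
  { groupOrderId: "order-1", userId: 2, productId: 1, approved: false },
  { groupOrderId: "order-1", userId: 2, productId: 3, approved: false },
  { groupOrderId: "order-2", userId: 5, productId: 1, approved: true }
];
console.log(report(sample));
```

Because both sets are built in the same $group stage, the documents are grouped only once, and no splitting and merging of pipelines is needed.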




Asya Kamsky

May 16, 2014, 3:24:00 PM
to mongodb-user
Sorry, forgot to simplify one more thing (you can compare booleans directly, you don't need to check for equality to true):

db.scorecardSubmissions.aggregate([
   { $group: {
      _id: '$groupOrderId',
      up: { $addToSet: '$productId' },
      numLines: { $sum: 1 },
      numAppLines: { $sum: { $cond: { if: '$approved', then: 1, else: 0 } } }
   } }
   // followed by the same $project stage as in the previous message
])