[Aggregation] Grouping multiple fields

4,304 views
Skip to first unread message

Nathan Wells

unread,
Jun 28, 2012, 7:17:26 PM6/28/12
to mongod...@googlegroups.com
I'm using the aggregation framework, and I'm wondering if there's a way to group multiple fields separately. What I'm looking for is the ability to get a list of valid values to query for, and their counts, so that I can do a guided filter, as you might see on newegg.com. Here's what I've got for my pipeline (pardon the Groovy syntax):

 pipeline : [
      ['$group': [
        "_id": [ manufacturer: '$manufacturer', features: "$features"],
        'count': ['$sum': 1]]]
    ],

This gives me the count of every combination of manufacturer and features (and does so pretty quickly), but I'm looking for the counts separately of each value of manufacturer and each value of features (which is an array, btw).

Not sure if I'm explaining this clearly enough... Can I do what I'm doing, or should I make multiple aggregation queries (which would seem more expensive)?

Scott Hernandez

unread,
Jun 28, 2012, 7:23:59 PM6/28/12
to mongod...@googlegroups.com
Can you give an example of the output you want based on example input?

--
You received this message because you are subscribed to the Google
Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com
To unsubscribe from this group, send email to
mongodb-user...@googlegroups.com
See also the IRC channel -- freenode.net#mongodb

Nathan Wells

unread,
Jun 29, 2012, 12:27:22 AM6/29/12
to mongod...@googlegroups.com
certainly. here's a list of documents:

{_id:1, manufacturer: "Sony", features: ["fast", "sleek", "sexy"]}
{_id:2, manufacturer: "Sony", features: ["lightweight", "sleek"]}
{_id:3, manufacturer: "Sony", features: ["cheap", "fast"]}
{_id:4, manufacturer: "Apple", features: ["fast", "sleek", "usable"]}
{_id:5, manufacturer: "Apple", features: ["lightweight", "sleek", "cheap"]}
{_id:6, manufacturer: "Apple", features: ["fast", "usable", "codecs"]}

I'd like to output to look like:

{
  "manufacturer": {
    "Sony":3,
    "Apple":3
  },
  "features": {
    "fast": 4,
    "cheap": 2,
    "codecs": 1,
    "usable": 2,
    "lightweight": 2,
    "sleek": 3,
    "sexy": 1,
  }
}
 
Nathan Wells

Nathan Wells

unread,
Jun 29, 2012, 4:48:14 PM6/29/12
to mongod...@googlegroups.com
I'm getting closer... the following pipeline

    pipeline : [
      ['$unwind': '$features'],
      ['$group': [
        '_id': 'fields',
        'states': ['$addToSet': '$state.state'],
        'features': ['$addToSet': '$features']
        ]
      ]
    ],

returns this JSON:     

{
  "serverUsed": "localhost/127.0.0.1:27017",
  "result": [{
    "_id": "fields",
    "states": [
        "Running",
        "Down",
        "Idle"
    ],
    "features": [
        "cluster2",
        "cifs",
         "vlan3",
         "vlan1",
         "nfs",
         "cluster1",
         "vlan2",
         "iscsi",
         "cluster3",
         "cluster4",
         "vlan4"
      ]
  }],
  "ok": 1.0

Nathan Wells

unread,
Jun 29, 2012, 6:06:24 PM6/29/12
to mongod...@googlegroups.com
I'm achieving what I need by using the below pipeline with $push instead of $addToSet, then using groovy to "countBy" each string. However, this means I end up transferring about 2mb of data across the wire (according to mongostat) whenever this query is run. This is most unfortunate. Also, (without aggregation using indexes), the pipeline execution itself takes 1.2 seconds or so for 20,000 documents. This is on my personal workstation with 8gb RAM and an i5 processor @ 2.67 GHz. I'm expecting this will speed up significantly when indexes are leveraged.

It seems to me that what I need is a "$count" group aggregation function. Even if there are other ways of accomplishing my goal with existing functions, this would seem to be a huge usability boon, not to mention that it would map pretty clearly to the SQL Count function. Am I on the right track here? should I submit a JIRA detailing what I'm thinking?

Nathan Wells

unread,
Jul 5, 2012, 2:19:54 AM7/5/12
to mongod...@googlegroups.com
Sorry, I'm new to this forum... Should I submit a JIRA, or get input from 10gen folks first? Does my desired functionality make sense? Thanks.

Nat

unread,
Jul 5, 2012, 2:22:59 AM7/5/12
to mongod...@googlegroups.com
Please submit it to JIRA so it can be tracked.
From: Nathan Wells <nww...@gmail.com>
Date: Wed, 4 Jul 2012 23:19:54 -0700 (PDT)
Subject: Re: [mongodb-user] [Aggregation] Grouping multiple fields

Nathan Wells

unread,
Jul 9, 2012, 6:59:35 PM7/9/12
to mongod...@googlegroups.com, nat....@gmail.com
I created the JIRA here:


Note that I named the issue "New group aggregation function: $count", when it would be more precise to say "New pipeline operator...". I'd appreciate if someone can change this.

Thanks,
Nathan W

Niraj Darji

unread,
Sep 1, 2015, 3:47:00 PM9/1/15
to mongodb-user, nww...@gmail.com

Hi Nathan,
 Did you find the correct approach for getting data as per your defined structure?

 I have checked JIRA but no answered there too. 
 If you find the approch then please help me. 

Thanks

Stephen Steneker

unread,
Sep 2, 2015, 11:21:15 PM9/2/15
to mongodb-user, nww...@gmail.com
On Wednesday, 2 September 2015 05:47:00 UTC+10, Niraj Darji wrote:
 Did you find the correct approach for getting data as per your defined structure?

 I have checked JIRA but no answered there too. 
 If you find the approch then please help me. 

Hi Niraj,

Instead of replying to old discussions, please start a new discussion with more information on your specific use case (eg. example document, MongoDB server and/or driver version, expected output, what you've tried, ..).

It looks like the original feature suggested as a result of this thread is still unresolved (https://jira.mongodb.org/browse/SERVER-6365), but there may be alternative solutions available depending on your version of MongoDB and what you're actually trying to achieve.

Thanks,
Stephen

Niraj Darji

unread,
Sep 19, 2015, 2:30:19 AM9/19/15
to mongodb-user, nww...@gmail.com
Hi Stephen,

Yes, you are right. I started my own and find solution. 

Thanks and Regards,
Niraj N. Darji
Reply all
Reply to author
Forward
This conversation is locked
You cannot reply and perform actions on locked conversations.
0 new messages