MongoDB aggregation pipeline $match order

79 views
Skip to first unread message

Nenad Milosavljevic

unread,
Feb 23, 2015, 5:01:55 AM2/23/15
to mongod...@googlegroups.com
I have MongoDB cluster with 4 shards. My shard key is: { client_id: 1,  date: 1 }
Used collection have about 50 M of documents.

I will paste 1 document as sample data:
<pre>

    {
        "_id" : ObjectId("54e069353e9104db470065e6"),
        "campaign_name" : "SC - Biker Planet",
        "adgroup_name" : "motociclista cerca",
        "client_id" : NumberLong(143),
        "adgroup_id" : NumberLong(28469),
        "campaign_id" : NumberLong(849),
        "device" : "desktop",
        "clicks" : NumberLong(0),
        "conv" : NumberLong(0),
        "cost" : NumberLong(0),
        "impressions" : NumberLong(1),
        "date" : ISODate("2014-02-22T05:00:00.000Z")
    }

</pre>
Now, i have installed MongoDB 3.0 RC9 (with great wiredTiger storage engine), and i am comparing following 2 queries:

A)
<pre>

    db.google_raw_id.aggregate([{
        $match: {
            client_id: 143,
            campaign_name: 'SC - Dating For Seniors',
            date: {
                $gte: ISODate("2014-01-10T00:00:00.0Z"),
                $lte: ISODate("2015-01-10T00:00:00.0Z")
            }
        }
    }, {
        $group: {
            _id: "$campaign_name",
            cost: {$sum: "$cost"},
            clicks: {$sum: "$clicks"},
            impressions: {$sum: "$impressions"}
        }
    }])

</pre>

and

B)
<pre>

    db.google_raw_id.aggregate([{
        $match: {
            client_id: 143,
            date: {
                $gte: ISODate("2014-01-10T00:00:00.0Z"),
                $lte: ISODate("2015-01-10T00:00:00.0Z")
            }
        }
    }, {
        $group: {
            _id: "$campaign_name",
            cost: {$sum: "$cost"},
            clicks: {$sum: "$clicks"},
            impressions: {$sum: "$impressions"}
        }
    }, {
        $match: {
            _id: 'SC - Dating For Seniors',
        }
    }])

</pre>

Query A take about 0.35 sec to be executed, and query B take about 1.1 sec, and they both return same result.
As you can see only diff between them is that i moved "campaign_name" filter from $match before $group to $match after $group.

Is this ok approach to use, since shard key { client_id, date } only work fast if there is no other filters in first $match, or i am doing something wrong in my configuration?

Here i have explain for query A and B with MongoDB aggregation used 

Here i have query A and B simplified to use .find()

I also have index on campaign_name, but it seems that aggregation query is slower if campaign_name is used in first match, because that's not part of shard key, and that's why Mongo have to check in more than 1 shard.

To me that's logical explanation why query B is faster.
In query B, Mongo reduce result to smaller data set, and than apply campaign_name on that, so it's faster and does not go over other shards.
Just i was expecting that MongoDB can figure out that automatically :)

Asya Kamsky

unread,
Feb 25, 2015, 4:30:31 AM2/25/15
to mongodb-user

According to what you say query B is 3 times *slower* than A so I'm not sure why you would want to use it.

Asya


--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/bad4b154-924d-4313-aeba-8b363f2338cb%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
{ "name" : "Asya Kamsky",
  "place" : [ "New York", "Palo Alto", "Everywhere else" ],
  "email" : "as...@mongodb.com",
  "blog" : "http://www.askasya.com/",
  "twitter": "@asya999" }

guido hornig

unread,
Feb 25, 2015, 7:41:49 AM2/25/15
to mongod...@googlegroups.com
--
Reply all
Reply to author
Forward
0 new messages