MongoDB aggregation framework works slowly.

Isabek Tashiev
Sep 19, 2014, 7:20:28 AM
to mongod...@googlegroups.com
Hi guys,

I am new to MongoDB and am using the aggregation framework.

Let's say I have a collection that contains 16M documents, and each document looks like this:
{
 "_id" : `someId`,
 "date" : `someDate`,
 "domain" : `someDomain`,
 "adNetwork" : `someNetwork`,
 "os" : `someOs`,
 "country" : "US",
 "timestamp" : 1405181011069
}

I want to group by day. I have two queries:

1. took ~74 seconds
{
    $group: {
        _id: {
            day: {$subtract: ['$timestamp', {$mod: ['$timestamp', 1000*3600*24]}]},
            domain: '$domain',
            adNetwork: '$adNetwork',
            country: '$country'
        },
        firstSeen: {
            $min: '$timestamp'
        },
        lastSeen: {
            $max: '$timestamp'
        }
    }
},
{
    $group: {
        _id: {
            domain: '$_id.domain',
            adNetwork: '$_id.adNetwork',
            country: '$_id.country'
        },
        firstSeen: {
            $min: '$firstSeen'
        },
        lastSeen: {
            $max: '$lastSeen'
        },
        activeDays: {
            $sum: 1
        }
    }
}

2. took ~90 seconds
{
    $group: {
        _id: {
            y: {$year: '$_date'},
            m: {$month: '$_date'},
            d: {$dayOfMonth: '$_date'},
            domain: '$domain',
            adNetwork: '$adNetwork',
            country: '$country'
        },
        firstSeen: {
            $min: '$timestamp'
        },
        lastSeen: {
            $max: '$timestamp'
        }
    }
},
{
    $group: {
        _id: {
            domain: '$_id.domain',
            adNetwork: '$_id.adNetwork',
            country: '$_id.country'
        },
        firstSeen: {
            $min: '$firstSeen'
        },
        lastSeen: {
            $max: '$lastSeen'
        },
        activeDays: {
            $sum: 1
        }
    }
}
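
For reference, the day bucket in query 1 is plain integer arithmetic on the millisecond timestamp. The sketch below reproduces it outside MongoDB using the sample `timestamp` from the document above; the helper name `dayBucket` is only illustrative. The result is midnight UTC of the same day, and the `$year`/`$month`/`$dayOfMonth` operators in query 2 also work in UTC by default, so the two queries should produce the same day boundaries as long as the date field and `timestamp` describe the same instant.

```javascript
const DAY_MS = 1000 * 3600 * 24; // 86,400,000 ms in one day

// Same arithmetic as {$subtract: ['$timestamp', {$mod: ['$timestamp', DAY_MS]}]}:
// drop the time-of-day remainder to get the start of the UTC day.
function dayBucket(timestampMs) {
  return timestampMs - (timestampMs % DAY_MS);
}

const bucket = dayBucket(1405181011069);      // sample timestamp from the document above
console.log(bucket);                          // 1405123200000
console.log(new Date(bucket).toISOString());  // 2014-07-12T00:00:00.000Z
```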

I think my queries are quite slow. Any ideas?
 
Isabek

Will Berkeley
Sep 19, 2014, 1:30:56 PM
to mongod...@googlegroups.com
Your queries have to walk the full collection to build the groups, and that is a lot of work with 16 million documents. You shouldn't need to regroup every document by day frequently, since a new day only arrives, well, once a day. Ad hoc aggregations like this should start with a $match stage that narrows the documents using an indexed query, so the pipeline doesn't have to process every single document. Why are you grouping by day? What do you do with the results, and how often? I imagine there's a better way to handle your use case.
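
A minimal sketch of what a pipeline that starts with $match could look like, restricted to a single day. The collection name `events`, the index on `timestamp`, and the helper `buildDailyPipeline` are assumptions for illustration; none of them come from the original post.

```javascript
// Sketch only: `events` and the `timestamp` index are assumed, not taken from the thread.
const DAY_MS = 1000 * 3600 * 24;

// Build a pipeline that aggregates one UTC day instead of the whole collection.
function buildDailyPipeline(dayStartMs) {
  return [
    // $match comes first so an index on `timestamp` can limit the documents scanned.
    { $match: { timestamp: { $gte: dayStartMs, $lt: dayStartMs + DAY_MS } } },
    {
      $group: {
        _id: { domain: '$domain', adNetwork: '$adNetwork', country: '$country' },
        firstSeen: { $min: '$timestamp' },
        lastSeen: { $max: '$timestamp' }
      }
    }
  ];
}

const pipeline = buildDailyPipeline(1405123200000); // 2014-07-12T00:00:00Z

// In the mongo shell, with the assumed names:
//   db.events.createIndex({ timestamp: 1 })
//   db.events.aggregate(pipeline)
```

If something like this ran once per day and its output were stored, the activeDays count could presumably be computed from the stored per-day results rather than from all 16M documents each time. Whether that fits depends on the answers to the questions above.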

-Will