Hello Ryan,
I'm wondering if the query in $match is to blame for poor performance.
Could you explain the logic behind the parent_ids that you're
aggregating at any given time? Could you also run this query
with .explain()? For example, in the shell:
db.yourCollection.find({"parent_id": {$in: [3269,...]}}).explain()
On Aug 30, 4:44 pm, Ryan <
r...@rcpsystems.com> wrote:
> Hello,
>
> I am still fairly new to Mongo 2.2 but am testing it out with about 60+
> million records of interval data in 5 minute increments.
> These get imported and may, at times, need to be updated in the future.
>
> A document in my intervals collection looks something like this:
>
> {
> "_id": ObjectId("50338d0s627cdf4a420ea3db"),
> "parent_id": NumberInt(3776),
> "amount": 98.35,
> "int_min": NumberInt(5),
> "dt": ISODate("2012-07-01T00: 10: 00.0Z")
>
> }
>
> One of the things I would like to be able to do is to sum "amount" of all
> records for a given date range where the parent_id could be 100 (as an
> example, could be more or less) unique parent_id's.
>
> I have tried this using the aggregation framework with the PHP driver:
>
> $result = $db->command(
> array(
> "aggregate" => my_intervals",
> "pipeline" => array(
> array(
> '$match' => array(
> 'parent_id' => array(
> '$in' => array(
> 3269,2125,2135,2145,2901,2903,2905,2907,
> 2909,2911,2913,2915,2917 // could be many more parent_id's here
> )
> ),
> 'dt' => array(
> '$gte' => $start_date,
> '$lte' => $end_date
> )
> )
> ),
> array(
> '$group' => array(
> '_id' => '$parent_id',
> 'sum' => array(
> '$sum' => '$amount'
> )
> )
> ),
> array(
> '$project' => array(
> '_id' => 0,
> 'parent_id' => '$_id',
> 'sum' => '$sum',
> )
> )
> )
> )
> );
>
> And this gets me close in that it sums up the amounts by parent_id and I
> can then get the total sum using PHP, but this all takes around 30 seconds
> or longer.
> I need to be able to perform these kinds of calculations against millions
> of interval data in milliseconds preferably.
>
> I have been googling how people have dealt with similar situations by
> rolling up their data into daily and monthly increments and came across
> some interesting informationhttp://
www.slideshare.net/dacort/mongodb-realtime-data-collection-and...,
> andhttp://
blog.tommoor.com/post/24059620728/realtime-analytics-at-buffer...
> but am still trying to wrap my head around how to achieve this so any help
> would be great.
>
> Thanks