Pre-aggregation Question

59 views
Skip to first unread message

Fumiko

unread,
Oct 5, 2012, 12:54:28 PM10/5/12
to mongod...@googlegroups.com
Hi,

I'm new to mongodb and trying to use it for reporting taking some data from MySQL (e.g. fetch-and-modify as a purchase is made).  There are several filters to the report such as product category id, gender, country, affiliate, etc.  I'm thinking of two ways.  One is 
{ _id : today,
  sale_amount : 100.00,
  gender : ,
  country : ,
...... }
for each purchase (without pre-aggregation).  Another one is
{ _id : { date : today, gender :    , country :    , (more filters) },
  sale_amount : 100.00,
  sale_count : 1 }

for each combination of filters (with pre-aggregation). An example of what we need to see in the reporting is "Sales of product A per day made by male in some country in the last 90 days."
If somebody can give me advice on how pre-aggregation should be done in this example, I'd highly appreciate it.
Thank you in advance.

Fumiko

Jenna deBoisblanc

unread,
Oct 10, 2012, 1:36:41 PM10/10/12
to mongod...@googlegroups.com
Hello Fumiko,

Could you give us a little more information? When you say, "Sales of product A per day made by male in some country in the last 90 days" - do you need the total sales of product A in dollars, a count of the sales of product A, or both?

If you choose the first approach, you can use the aggregation framework (available in v. 2.2+) or Map Reduce to aggregate total sales based on query filters:

The better solution may be to use the $inc operator to update an existing document as new products are purchased {_id: <filter>, same_amount: x, sale_count: y}

Fumiko

unread,
Oct 11, 2012, 7:44:57 PM10/11/12
to mongod...@googlegroups.com
Hello Jenna,

I truly appreciate your help.  Yes, both dollar amount and count are needed.  Your suggestion is pre-aggregation.  I'd read those links prior to this posting and watched some recordings, and I may be having difficulty understanding it right.  But, using those several filters as compound ids, and if, say to report "Sales in dollar amount per day made by male in the past 90 days" (i.e. one filter, gender, instead of multiple filters), then we are pre-aggregating wtih $inc followed by aggregating ($match: { _id.gender: "male" }) as well as by day?  There may be 0 or more filters needed depending on report (sometimes all filters are needed and other times none - just sales in dollar amount, for example).  I'd like to know if filters (includes date) as compound id is how it should be done or not.  If not, could you please suggest what else can be possible?  I'm not at all familiar with map-reduce.  I'm hoping pre-aggregation or just plain aggregation itself is the solution.
Thank you.

Fumiko

Jenna deBoisblanc

unread,
Oct 19, 2012, 12:20:56 PM10/19/12
to mongod...@googlegroups.com
Hello Fumiko,

I think the following examples may help, but please let me know if this is not what you are trying to accomplish.

A female in the US purchases an umbrella for $100. We insert a document:
> doc = 
{ _id : 1,
  product: "umbrella",
  sale_amount : 100.00,
  sale_count: 1,
  gender : female,
  country : USA,
}
> db.collection.insert(doc)

Another female in the US purchases an umbrella for $86, and we would like to update the document to adjust the sale amount and count:

> db.collection.update({product: "umbrella", gender: female, country: USA}, {$inc: {sale_amount: 86, sale_count: 1}})
> db.collection.find()
{ _id : 1,
  product: "umbrella",
  sale_amount : 186.00,
  sale_count: 2,
  gender : female,
  country : USA,
}

Alternatively, you can insert a new document every time a purchase is made, and perform aggregation at a later point.  The advantage of this procedure is that you maintain a record of every purchase (useful for tracking), but the disadvantage is that your data size will grow, and in addition, the aggregation command may be more expensive than a simple update with $inc.  Here is an example of aggregation using the aggregation framework:
{ _id : 1,
  product: "umbrella",
  sale_amount : 100.00,
  gender : "female",
  country : "USA",
}
{ _id : 2,
  product: "umbrella",
  sale_amount : 86.00,
  gender : "female",
  country : "USA",
}

> db.collection.aggregate(
     {$match: {product: "umbrella", gender: "female", country: "USA"}}, 
     {$group: 
        { _id: {product: "$product", gender: "$gender", country: "$country"}, 
          count: {$sum: 1}, 
          totalAmount: {$sum: "$sale_amount"}
        }
     })

  {
   "result" : [
{
"_id" : {
"product" : "umbrella",
"gender" : "female",
"country" : "USA"
},
"count" : 2,
"totalAmount" : 186
}
],
"ok" : 1
  }


Map Reduce involves first emitting, and subsequently grouping, documents. The process is similar to the aggregation command above.  The advantage of MR is that you can output the results to a new collection; however, since MR uses Javascript (and MongoDB currently uses a single-threaded Javascript engine), the performance may not be as good as the aggregation framework. Nevertheless, if you would like sample MR functions, please let me know.
Reply all
Reply to author
Forward
0 new messages