Creating 'summary documents' on active data

Wouter

Oct 14, 2009, 10:34:40 AM
to mongodb-user
I want to be able to create 'summary documents' (the Mongo equivalent
of summary tables :-)).

Once the summary document is created, it will be updated using $inc /
$set, so the summary will always be up to date (that's what I
need :-)).

The process to create a summary looks like this:

1) Retrieve relevant data

2) Calculate summary

3) Store results in summary document

4) Update summary document when data is updated/added.

However, if data is added/updated between step 1 and step 3, the
summary will be incorrect. Any suggestions on how to deal with this?

dwight_10gen

Oct 14, 2009, 10:51:23 AM
to mongodb-user
How often do you want to (re)calculate the summaries? Near real time,
or infrequently (e.g. daily)?

Wouter

Oct 14, 2009, 11:34:36 AM
to mongodb-user
The summaries are never recalculated. Once a summary has been created,
it is updated in real time using $inc. I need real time summaries.

Let's say I'm offering real-time analytics using Mongo
(http://blog.mongodb.org/post/171353301/using-mongodb-for-real-time-analytics),
but I also save info about each visit, and I allow customized reports.

So with every visit:

1) Upsert data for this visitor in the visits collection.
2) Update all reports to which this visitor belongs (using $inc)

Initially, there is one report (summary document): the one that
includes all visitors to the site.
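
The per-visit flow above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the Map and array stand in for the visits and reports collections, and the names recordVisit, matches, visits and time_spent are hypothetical, not taken from the thread.

```javascript
// In-memory stand-ins for the visits and reports collections (hypothetical).
const visits = new Map(); // keyed by visitor id
const reports = [
  // The initial report includes every visitor to the site.
  { name: "all", matches: () => true, visits: 0, time_spent: 0 },
];

// Mimics "upsert the visit, then $inc every report the visitor belongs to".
function recordVisit(visitorId, country, seconds) {
  const isNew = !visits.has(visitorId);
  const visit = visits.get(visitorId) || { visitorId, country, total_time: 0 };
  visit.total_time += seconds;
  visits.set(visitorId, visit); // step 1: upsert data for this visitor

  for (const report of reports) { // step 2: $inc each matching report
    if (report.matches(visit)) {
      if (isNew) report.visits += 1; // count each visitor once
      report.time_spent += seconds;
    }
  }
}

recordVisit("a", "US", 120);
recordVisit("b", "NL", 30);
recordVisit("a", "US", 200); // same visitor, so only time_spent grows

console.log(reports[0].visits, reports[0].time_spent);
```

The summary is never recomputed here; it only ever receives increments, which is why the moment at which a report starts receiving them matters.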

Now, users can create their own customized reports. E.g., generate a
report for all US visitors who spend at least 5 minutes on my site.

1) Retrieve all visit data: find({ site_id : 123, country : 'US',
total_time : { $gte : 300 } })

2) Calculate summary (number of visits, time_spent, other statistics)

3) Store results in summary document (db.reports.insert({ ... }))

4) Update summary document when data is updated/added
(db.reports.update({ ... }, { $inc : { total : ... } }))

Now, if a US visitor is active on the site between step 1 and step 3,
the report could end up being incorrect...

How can I prevent that? :-)
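
To make the problem concrete, here is a small simulation of steps 1 to 4 with a visit arriving at the wrong moment. It is a sketch only: the arrays and Map stand in for the collections, and the names us_5min, onNewVisit, matches and the ts field are made up for the illustration.

```javascript
// Existing visit data (in-memory stand-in for the visits collection).
const visits = [
  { country: "US", total_time: 400, ts: 1 },
  { country: "US", total_time: 600, ts: 2 },
];
const reports = new Map(); // stand-in for the reports collection

// Report criteria: US visitors who spent at least 5 minutes (300 s).
const matches = (v) => v.country === "US" && v.total_time >= 300;

// Step 4: an $inc-style update only touches a report that already exists.
function onNewVisit(visit) {
  visits.push(visit);
  const report = reports.get("us_5min");
  if (report && matches(visit)) {
    report.visits += 1;
    report.time_spent += visit.total_time;
  }
}

// Step 1: retrieve the relevant data.
const snapshot = visits.filter(matches);

// A US visit arrives after step 1 but before step 3.
onNewVisit({ country: "US", total_time: 500, ts: 3 });

// Steps 2 and 3: calculate the summary and store it.
reports.set("us_5min", {
  visits: snapshot.length,
  time_spent: snapshot.reduce((sum, v) => sum + v.total_time, 0),
});

const stored = reports.get("us_5min").visits;
const actual = visits.filter(matches).length;
console.log({ stored, actual }); // the stored count misses the late visit
```

The late visit is in neither the snapshot nor the increments, so the report stays one visit short from then on.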

Mathias Stearn

Oct 14, 2009, 12:29:27 PM
to mongod...@googlegroups.com
One way would be to start the $inc process for new documents before you start calculating the report. Then, rather than inserting the report wholesale, you could upsert it using $inc on each field. You would have to limit the timestamps in your find() when calculating the summary to ensure that new visits aren't double-counted.

--Mathias
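
One possible reading of this suggestion, sketched in plain JavaScript with in-memory stand-ins for the collections. The names upsertInc, activeReports, us_5min, cutoff and the ts field are assumptions made for the illustration. The sketch also covers only newly added visits, not updates to existing ones.

```javascript
const visits = [
  { country: "US", total_time: 400, ts: 1 },
  { country: "US", total_time: 600, ts: 2 },
];
const reports = new Map();       // stand-in for the reports collection
const activeReports = new Map(); // report name -> cutoff timestamp

const matches = (v) => v.country === "US" && v.total_time >= 300;

// Mimics an update with $inc and upsert: creates the document if missing,
// otherwise adds each delta to the existing field.
function upsertInc(name, deltas) {
  const doc = reports.get(name) || { visits: 0, time_spent: 0 };
  for (const [field, amount] of Object.entries(deltas)) doc[field] += amount;
  reports.set(name, doc);
}

// Live path: visits at or after a report's cutoff are counted via $inc.
function onNewVisit(visit) {
  visits.push(visit);
  for (const [name, cutoff] of activeReports) {
    if (visit.ts >= cutoff && matches(visit)) {
      upsertInc(name, { visits: 1, time_spent: visit.total_time });
    }
  }
}

// 1) Start the $inc process first, remembering when it started.
const cutoff = 3;
activeReports.set("us_5min", cutoff);

// 2) A visit arrives while the summary is still being calculated.
onNewVisit({ country: "US", total_time: 500, ts: 3 });

// 3) Calculate the summary only from visits older than the cutoff.
const older = visits.filter((v) => v.ts < cutoff && matches(v));

// 4) Merge the result with $inc instead of inserting it wholesale.
upsertInc("us_5min", {
  visits: older.length,
  time_spent: older.reduce((sum, v) => sum + v.total_time, 0),
});

const report = reports.get("us_5min");
console.log(report.visits, report.time_spent);
```

Because both paths only add to the same fields, the order in which the live increments and the calculated totals land does not matter. The timestamp cutoff is what keeps any visit from being counted by both.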