Querying 'views' with a time frame

strada

Jul 7, 2011, 12:54:39 PM
to mongodb-user
I'm using MongoDB to store image metadata and S3 URLs. I would like to
show the most viewed items over a time period, like 'today' or 'this
week', just like YouTube does with videos. How should I store 'views' in
MongoDB to query that kind of data?

I'm thinking of creating a view collection and storing each 'view' as
a document with date information, and also storing the view count in
the individual image documents for fast retrieval of view counts. Both
would be updated on each pageview. I'm assuming MapReduce would get
the job done from there, but I can't quite picture it.

Dwight Merriman

Jul 7, 2011, 11:04:26 PM
to mongod...@googlegroups.com
One way is to have a view_stats collection, as you say. On each view of an image, $inc a counter in a particular doc in view_stats. You might, for example, have one doc per image per hour, or you could keep the hours as an array or subobject. To keep it simple, imagine something like:

{ image: <imageid>, hour : <hoursinceepoch>, views : <number> }
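A minimal Python sketch of the per-view update, assuming the `{ image, hour, views }` shape above with the hour bucket computed as whole hours since the Unix epoch. The helper names `hours_since_epoch` and `view_increment` are hypothetical. The sketch only builds the filter and the `$inc` update; with pymongo they would be passed to `update_one(filter, update, upsert=True)`, so the first view in a given hour creates that hour's bucket document.

```python
import time

SECONDS_PER_HOUR = 3600


def hours_since_epoch(ts=None):
    """Return the hour bucket (whole hours since the Unix epoch) for a timestamp."""
    if ts is None:
        ts = time.time()  # default to "now"
    return int(ts // SECONDS_PER_HOUR)


def view_increment(image_id, ts=None):
    """Build the (filter, update) pair that counts one view of image_id.

    Hypothetical helper: with pymongo the pair would be applied as
    view_stats.update_one(filter, update, upsert=True). On upsert, MongoDB
    copies image and hour from the filter and $inc starts views at 1.
    """
    bucket = {"image": image_id, "hour": hours_since_epoch(ts)}
    update = {"$inc": {"views": 1}}
    return bucket, update
```

Bucketing by hour keeps writes to one small counter document per image per hour, instead of one document per view as in the original question.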

You can then periodically run a map/reduce job that outputs to some collection such as most_viewed_today, and query that collection whenever needed. The stats (typically) don't need to be instantly up to date, so you can run the map/reduce periodically, maybe once every 5 minutes.
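To make the periodic job concrete, here is an in-memory Python sketch of what it computes. The map step emits each bucket's view count keyed by image, the reduce step sums those counts, and the totals are sorted to get the top images for an hour window. The function names and the default limit of 10 are hypothetical. In MongoDB the same logic would run server-side and write to most_viewed_today. The result documents use the `_id`/`value` shape that mapReduce writes.

```python
from collections import defaultdict


def map_view_stats(doc):
    """Map step: emit (image, views) for one hourly bucket document."""
    yield doc["image"], doc["views"]


def reduce_views(image_id, counts):
    """Reduce step: total the per-hour counts for one image.

    Summing is safe even if MongoDB re-reduces partial results.
    """
    return sum(counts)


def most_viewed(stats_docs, start_hour, end_hour, limit=10):
    """Top images by views over the inclusive hour range [start_hour, end_hour]."""
    grouped = defaultdict(list)
    for doc in stats_docs:
        # Same window the job's query filter would select.
        if start_hour <= doc["hour"] <= end_hour:
            for key, value in map_view_stats(doc):
                grouped[key].append(value)
    totals = [
        {"_id": image, "value": reduce_views(image, counts)}
        for image, counts in grouped.items()
    ]
    totals.sort(key=lambda d: d["value"], reverse=True)
    return totals[:limit]
```

For "today" or "this week", the caller would pass the hour bucket of the period's start and the current hour as the window bounds.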




Dwight Merriman

Jul 7, 2011, 11:05:46 PM
to mongod...@googlegroups.com
You might also want to make view_stats a capped collection so that old stats are evicted automatically.
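A sketch of this setup with pymongo, assuming `db` is a pymongo `Database`. The 256 MB cap is an arbitrary placeholder, not a recommendation, and the helper name `create_view_stats` is hypothetical. The sketch also creates the index on `hour` discussed in the next message. A capped collection evicts its oldest documents in insertion order once the size limit is reached, so the cap roughly controls how many hours of buckets survive.

```python
# Hypothetical size; tune it to how much history you want to keep.
VIEW_STATS_CAP_BYTES = 256 * 1024 * 1024


def create_view_stats(db, size_bytes=VIEW_STATS_CAP_BYTES):
    """Create view_stats as a capped collection and index it on hour.

    `db` is assumed to be a pymongo Database; extra keyword arguments to
    create_collection are passed through as collection options.
    """
    coll = db.create_collection("view_stats", capped=True, size=size_bytes)
    coll.create_index([("hour", 1)])  # 1 == pymongo.ASCENDING
    return coll
```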

An index on hour might make sense. In your map/reduce job you could use something like

  { hour : { $gte : <start>, $lte : <end> } }

as the query filter, and the index would then let the job read just those stats documents.
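MongoDB 5.0 deprecated map-reduce in favor of the aggregation pipeline, so on current versions the same job would more likely be a pipeline. This swaps in a different technique from the one the thread proposes. The sketch assumes the hourly bucket schema above: `$match` applies the hour-range filter (so it can use the `hour` index), `$group` sums views per image, and `$out` writes the ranking to most_viewed_today. The function names and default limit are hypothetical.

```python
def hour_window_filter(start_hour, end_hour):
    """Query filter selecting bucket docs with start_hour <= hour <= end_hour."""
    return {"hour": {"$gte": start_hour, "$lte": end_hour}}


def most_viewed_pipeline(start_hour, end_hour, limit=10, out="most_viewed_today"):
    """Aggregation-pipeline equivalent of the periodic map/reduce job."""
    return [
        {"$match": hour_window_filter(start_hour, end_hour)},  # can use the hour index
        {"$group": {"_id": "$image", "views": {"$sum": "$views"}}},  # total per image
        {"$sort": {"views": -1}},  # most viewed first
        {"$limit": limit},
        {"$out": out},  # replaces the target collection's contents
    ]
```

With pymongo this would run as `db.view_stats.aggregate(most_viewed_pipeline(start, end))`. Because `$out` replaces the target collection on each run, it fits a ranking that is recomputed every few minutes.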