Mongo Analytics Schema

279 views
Skip to first unread message

Moazam

unread,
Feb 4, 2012, 7:16:39 PM2/4/12
to mongodb-user
I am currently evaluating MongoDB for analytics. I have read beginners
book and over dozen of slides for getting started with MongoDB. So far
I know how to write insert/upsert/query code from PHP.

I have been looking at analytics examples but there are only real time
analytics examples available. I would appreciate MongoDB contributors/
developers if you guys can help me in transforming following MySQL
schema into MongoDB schema.

Here is the MySQL schema

Monthly stats table holds 6 months stats.

Create table monthly_stats (
stats_id Bigint NOT NULL AUTO_INCREMENT,
article_id Int UNSIGNED NOT NULL,
author_id Int UNSIGNED NOT NULL,
article_views Int UNSIGNED NOT NULL DEFAULT 0,
stats_date Date NOT NULL,
Primary Key (stats_id))

After 6 months we archive stats into final_stats

Create table final_stats (
article_id Int UNSIGNED NOT NULL,
author_id Int UNSIGNED NOT NULL,
article_views Int UNSIGNED NOT NULL DEFAULT 0,
total_searches Int UNSIGNED NOT NULL DEFAULT 0,
Primary Key (article_id));

Search stats table contains article searches, holds 6 months search
data, after six months we update total_searches in final_stats table

Create table search_stats (
article_id Int UNSIGNED NOT NULL,
website Enum(''main','blog','forum') NOT NULL DEFAULT 'main',
search_engine Varchar(30) NOT NULL,
keywords Varchar(255) NOT NULL,
total_searches Int UNSIGNED NOT NULL DEFAULT 0,
Primary Key (search_stats_id));

I have designed following MongoDB schema:

I have broken stats into day wise collection. We have around 20
million page views daily and roughly 90K referrals from different
websites and search engines

collection: jan_31_2012
{
_id: (This is article_id)
author_id: int
views: int
total_searches: int
search_stats { //Optional: Tracks only search engine users
website: str
search_engine: str
keywords: str
total_searches: str
//NOTES:search_engine and keywords are composite keys
}
}

Questions

1. I read documents over 16 MB are slow, if this is the case can you
suggest me a better schema?
2. How do I check and append/update search_stats? i.e. if following
values exists: search_engine: Google and keywords: seo and then
update total_searches on both places otherwise push a new value in
search_status and update total_searches
3. How do I query 2 weeks/months old stats? i.e. SELECT * FROM
article_stats WHERE stats_date BETWEEN XX AND XX

I believe this question will help many people who are switching to
MongoDB for analytics.

Thanks

Chris Harris

unread,
Feb 7, 2012, 6:21:44 AM2/7/12
to mongodb-user
Hi Moazam,

1. I read documents over 16 MB are slow, if this is the case can you
suggest me a better schema?

At the moment a single document has a max size of 16MB. Why not change
you schema to something like:

collection: articles
{
_id: (This is article_id)
author_id: int
views: int
total_searches: int
data : date
}

collection : search_stats
{
_id : id
website: str
search_engine: str
keywords: [ str ]
total_searches: str
data : date
article_id : (This is article_id)
}

See more examples at http://www.mongodb.org/display/DOCS/Schema+Design

2. How do I check and append/update search_stats? i.e. if following
values exists: search_engine: Google and keywords: seo and then
update total_searches on both places otherwise push a new value in
search_status and update total_searches.

You can use an "upsert" flag on a update to create the new value if it
is not there and the $inc increment the counter.

See http://php.net/manual/en/mongocollection.update.php

3. How do I query 2 weeks/months old stats? i.e. SELECT * FROM
article_stats WHERE stats_date BETWEEN XX AND XX

This should be easy with the schema above as the collection has a
timestamp value and not a separate collection per day

Chris

Martin Wawrusch

unread,
Feb 7, 2012, 6:26:58 AM2/7/12
to mongod...@googlegroups.com
The following presentation might be helpful, it contains a section about mongodb and analytics:  http://www.slideshare.net/jrosoff/scaling-rails-yottaa 



--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To post to this group, send email to mongod...@googlegroups.com.
To unsubscribe from this group, send email to mongodb-user...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.







Moazam

unread,
Feb 7, 2012, 6:54:00 PM2/7/12
to mongodb-user
I am confused with the term document size 16 MB. According to
documentation A collection contains several documents. I am
considering a document as a row for understanding purpose. Does that
mean a row can't contain more than 16 MB or entire collection?


On Feb 7, 11:26 am, Martin Wawrusch <mar...@wawrusch.com> wrote:
> The following presentation might be helpful, it contains a section about
> mongodb and analytics:http://www.slideshare.net/jrosoff/scaling-rails-yottaa
>
>
>
>
>
>
>
> On Tue, Feb 7, 2012 at 3:21 AM, Chris Harris <char...@10gen.com> wrote:
> > Hi Moazam,
>
> > 1. I read documents over 16 MB are slow, if this is the case can you
> > suggest me a better schema?
>
> > At the moment a single document has a max size of 16MB. Why not change
> > you schema to something like:
>
> > collection: articles
> > {
> >  _id: (This is article_id)
> >  author_id: int
> >  views: int
> >  total_searches: int
> >   data : date
> > }
>
> > collection : search_stats
> > {
> >    _id : id
> >     website: str
> >    search_engine: str
> >    keywords: [ str ]
> >    total_searches: str
> >     data : date
> >    article_id : (This is article_id)
> > }
>
> > See more examples athttp://www.mongodb.org/display/DOCS/Schema+Design
>
> > 2. How do I check and append/update search_stats? i.e. if following
> > values exists: search_engine: Google and keywords: seo and  then
> > update total_searches on both places otherwise push a new value in
> > search_status and update total_searches.
>
> > You can use an "upsert" flag on a update to create the new value if it
> > is not there and the $inc increment the counter.
>
> > Seehttp://php.net/manual/en/mongocollection.update.php

Max Schireson

unread,
Feb 7, 2012, 7:12:28 PM2/7/12
to mongod...@googlegroups.com


On Feb 7, 2012 3:54 PM, "Moazam" <moazam...@gmail.com> wrote:
>
> I am confused with the term document size 16 MB. According to
> documentation A collection contains several documents. I am
> considering a document as a row for understanding purpose. Does that
> mean a row can't contain more than 16 MB or entire collection?

A document can't be more than 16mb. Collections are routinely much larger.

Sam Millman

unread,
Feb 8, 2012, 3:34:26 AM2/8/12
to mongod...@googlegroups.com
And that's the whole document including sub arrays/documents it is not 16MB per root document and then per sub document.

So one row in your collection in its entirety (everything, including subdocuments) has a 16MB limit on size.
Reply all
Reply to author
Forward
0 new messages