How far to push the document nesting?

Lloyd Cledwyn

Feb 17, 2012, 2:11:06 AM

to mongod...@googlegroups.com

I am relatively new to MongoDB, and so far I am really impressed. I am struggling, though, with the best way to set up my document stores. I am trying to do some summary analytics on Twitter data, and I am not sure whether to put the tweets into the user document or to keep them in a separate collection. It seems like putting the tweets inside the user model would quickly hit the document size limit. If that is the case, what is a good way to run MapReduce across a group of users' tweets?

I hope I am not being too vague, but I don't want to get too specific and go too far down the wrong path in setting up my domain model.

As I am sure you are all bored of hearing, I am used to a relational database (RDB) layout:

|USER___|
---------
|ID
|Name
|Etc.

|TWEET__|
---------
|ID
|UserID
|Etc.

It seems like

User
|-Tweet (0..3000)
|  |-Entities
|  |  |-Hashtags (0..10+)
|  |  |-urls (0..5)
|  |  |-user_mentions (0..12)
|  |-GeoData (0..20)
|-somegroupID

would quickly bloat the User document beyond capacity. But I would like to run analysis on tweets belonging to users with similar somegroupID. It conceptually makes sense to do the model layout as above, but at what point does it become too unwieldy? And what are viable alternatives?

Nat

Feb 17, 2012, 6:37:16 AM

to mongod...@googlegroups.com

I would not store tweet data inside the user document. It's better to keep them separate. If you need to run analytics based on user-profile fields such as age, sex, etc., you might store those fields together with the tweet data; that will make it easier to run map/reduce or aggregation on it.
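
A rough sketch of what that could look like, using plain Python dicts as stand-ins for documents in two separate collections. The field names (`user_id`, `age`, `sex`, `text`) and the helper `denormalize_tweet` are hypothetical and only there for illustration; nothing here is a MongoDB or Twitter API.

```python
from copy import deepcopy

# Hypothetical user document, kept in its own "users" collection.
user = {"_id": 42, "name": "alice", "age": 31, "sex": "F"}

# Hypothetical tweet document, kept in a separate "tweets" collection.
tweet = {
    "_id": 1001,
    "user_id": 42,
    "text": "hello #mongodb",
    "entities": {"hashtags": ["mongodb"]},
}

# Profile fields the analysis is assumed to need.
PROFILE_FIELDS = ("age", "sex")


def denormalize_tweet(tweet, user, fields=PROFILE_FIELDS):
    """Return a copy of the tweet with the selected user fields embedded."""
    doc = deepcopy(tweet)
    doc["user"] = {f: user[f] for f in fields if f in user}
    return doc


doc = denormalize_tweet(tweet, user)
print(doc["user"])
```

The tweet stays a small, self-contained document, and an analysis that needs age or sex can read them from the tweet itself without touching the users collection.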


Lloyd Cledwyn

Feb 18, 2012, 12:48:59 AM

to mongod...@googlegroups.com

Interesting. Of course. So counterintuitive from a "normalized" mindset. That may add a [duplicated] data element across thousands of elements, but if that helps performance where I'm hoping for it, it could just work.

Nat

Feb 18, 2012, 12:52:33 AM

to mongod...@googlegroups.com

Like many other NoSQL databases, MongoDB doesn't offer join operations. Denormalizing can give better performance than repeatedly fetching from another collection to simulate joins, especially when you only do it as a one-off for analytical purposes.
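
To make the contrast concrete, here is a small sketch with in-memory stand-ins for the two collections. All data and names (`users`, `tweets`, `user_sex`) are made up for illustration, and the dictionary lookup stands in for a second query against the server.

```python
from collections import Counter

# In-memory stand-ins for two collections (all data is made up).
users = {1: {"_id": 1, "sex": "F"}, 2: {"_id": 2, "sex": "M"}}
tweets = [
    {"_id": 10, "user_id": 1, "text": "a"},
    {"_id": 11, "user_id": 1, "text": "b"},
    {"_id": 12, "user_id": 2, "text": "c"},
]


def count_by_sex_with_lookup(tweets, users):
    """Simulated join: one extra user fetch for every tweet."""
    counts, fetches = Counter(), 0
    for t in tweets:
        u = users[t["user_id"]]  # stands in for a second query
        fetches += 1
        counts[u["sex"]] += 1
    return dict(counts), fetches


def count_by_sex_denormalized(tweets):
    """Denormalized: the field is already present on each tweet."""
    return dict(Counter(t["user_sex"] for t in tweets))


# Copy the field onto each tweet once, then analyse without lookups.
denorm = [dict(t, user_sex=users[t["user_id"]]["sex"]) for t in tweets]
print(count_by_sex_with_lookup(tweets, users))
print(count_by_sex_denormalized(denorm))
```

Both functions produce the same counts; the difference is that the first needs one lookup per tweet every time the analysis runs, while the second pays the copying cost once.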


Lloyd Cledwyn

Feb 18, 2012, 12:56:14 AM

to mongod...@googlegroups.com

Do I need to worry about update/insert performance if, say, I add a "somegroupID" element to all the tweets associated with a user, thus updating thousands of documents (and, of course, doing so over and over for each user in the "somegroupID")? I can appreciate that once that element is present in all those documents, doing a mapReduce / analysis across them is straightforward.
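
To spell out what I mean, here is a toy version in plain Python, with a list of dicts standing in for the tweets collection. The helper names `tag_user_tweets` and `tweets_per_group` are hypothetical. Against a real server the tagging would presumably be one multi-document update per user (for example `update_many` in current PyMongo) rather than a Python loop, and this sketch says nothing about how fast that is.

```python
from collections import Counter


def tag_user_tweets(tweets, user_id, group_id):
    """Set somegroupID on every tweet of one user; return how many changed."""
    changed = 0
    for t in tweets:
        if t["user_id"] == user_id and t.get("somegroupID") != group_id:
            t["somegroupID"] = group_id
            changed += 1
    return changed


def tweets_per_group(tweets):
    """Map/reduce-style count of tweets per somegroupID."""
    # Map step: emit (group, 1) for every tagged tweet.
    emitted = ((t["somegroupID"], 1) for t in tweets if "somegroupID" in t)
    # Reduce step: sum the emitted values per key.
    totals = Counter()
    for key, value in emitted:
        totals[key] += value
    return dict(totals)


tweets = [
    {"_id": 1, "user_id": 7, "text": "a"},
    {"_id": 2, "user_id": 7, "text": "b"},
    {"_id": 3, "user_id": 8, "text": "c"},
]
tag_user_tweets(tweets, 7, "g1")
tag_user_tweets(tweets, 8, "g1")
print(tweets_per_group(tweets))
```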

Chris Winslett

Feb 18, 2012, 7:10:02 AM

to mongodb-user

Lloyd,

You will find this video interesting:

http://www.10gen.com/presentations/mongosv-2011/schema-design-at-scale

Essentially, store one day's worth of tweets for one person in a single
document. The reasoning:

- Queries typically filter by user and by day

Therefore, you can have the following index:

{user_id: 1, date: 1}  # date needs to be last because you will range and sort on it
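
As a rough sketch of the bucketing idea, here it is in plain Python with a dict standing in for the collection. The helper names (`add_tweet`, `find_buckets`) and the `tweets` array field are my own assumptions for illustration, not something from the video. `INDEX_SPEC` is the same index written as the list of (key, direction) pairs that PyMongo's `create_index` accepts.

```python
from datetime import datetime

# The compound index above: equality field first, range/sort field last.
INDEX_SPEC = [("user_id", 1), ("date", 1)]


def bucket_key(user_id, created_at):
    """One bucket per user per calendar day."""
    day = created_at.replace(hour=0, minute=0, second=0, microsecond=0)
    return (user_id, day)


def add_tweet(buckets, user_id, created_at, tweet):
    """Append a tweet to its (user, day) bucket, creating the bucket if needed."""
    key = bucket_key(user_id, created_at)
    bucket = buckets.setdefault(
        key, {"user_id": user_id, "date": key[1], "tweets": []}
    )
    bucket["tweets"].append(tweet)
    return bucket


def find_buckets(buckets, user_id, start, end):
    """Buckets for one user with start <= date < end, sorted by date."""
    hits = [
        b for b in buckets.values()
        if b["user_id"] == user_id and start <= b["date"] < end
    ]
    return sorted(hits, key=lambda b: b["date"])


buckets = {}
add_tweet(buckets, 7, datetime(2012, 2, 17, 9, 30), {"text": "a"})
add_tweet(buckets, 7, datetime(2012, 2, 17, 18, 5), {"text": "b"})
add_tweet(buckets, 7, datetime(2012, 2, 18, 8, 0), {"text": "c"})
print(len(buckets))
```

The query in `find_buckets` has the shape the index is meant for: an exact match on `user_id`, then a range and sort on `date`.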

Have fun!

Chris
MongoHQ

