Hi everyone. This probably isn't a surprise given the topic, but I'm brand new to MongoDB. Over the last few days I've been reading through the manual including http://www.mongodb.org/display/DOCS/Schema+Design and I watched the attached video from MongoSV as well as a number of videos from MongoNYC. All of this is very informative and helpful, however...
The question of when to embed versus when to do something more relational is still the point of greatest confusion for me.
Let's say I was modeling professional sports, so I have collections for the NFL, NHL and NBA. Obviously, each of these has teams and they need to be related. Coming from a relational database background, like most people, my instinct is to make a collection called teams, or even sub-collections like nfl_teams and nba_teams, and then relate them by IDs. Since teams really do kind of stand on their own anyway, this approach seems to make sense even in Mongo. Yet, realistically, this is not a huge amount of data. Is there any reason teams can't be embedded within their respective leagues?
To expand a little bit, if I was constructing a website that dealt with sports news, while a team page is probably accessible, it's still going to need the context of its respective league. If this was relational, for instance, a team would have a league_id and, in order to properly build out the web page, I'd basically ALWAYS select the league in relation to the team. In this sense it seems completely natural that a team would be inside a league (and, in this case, I have NO reason to compare teams across league boundaries, so the fact that it's harder to do that in separate collections isn't even a concern).
What I wouldn't necessarily do, however, is select ALL the teams when I needed just one. Does this raise any efficiency issues? Is Mongo going to send my web app some blob of data every time I request information from a league? Should I even bother thinking in those terms and just go with what makes sense naturally and with queries? (I can always slice out the current team too. Would that be the best approach?)
I really like the hierarchical and schema-less nature of a document DB. However, when to embed versus when to go relational seems to be the single hardest thing to decide on.
The blog/comments example is used over and over but it never seems to really answer the question. In general, I'm trying to keep the mindset of embedding whenever I can and going with a sort of hybrid approach of embedding essential data needed in the collection and then relating to "details" style documents in other collections.
So yeah, the biggest crux in my mind is if I have some 4MB (or I guess 16MB now) document that is getting information pulled from it on every request, is that even a problem? Am I over-thinking this?
-- You received this message because you are subscribed to the Google Groups "mongodb-user" group. To post to this group, send email to mongodb-user@googlegroups.com. To unsubscribe from this group, send email to mongodb-user+unsubscribe@googlegroups.com. For more options, visit this group at http://groups.google.com/group/mongodb-user?hl=en.
I'm still a newbie, so I've been asking myself the same questions. However, what has seemed to be very helpful is to ask myself what data I want to pull back from the DB rather than what I want to put in.
This is because, while it's easy to embed a document inside another one, at that point it's no longer really a separate document as far as mongo is concerned; it's part of that bigger document. That means that you can't really return *only* the embedded document; you return the parent document that contains that embedded fragment.
So in your sports team example, you probably want to retrieve "teams" and lists of teams that match certain criteria, rather than retrieving "leagues" that contain those teams. This being said, it might make total sense to keep a back reference to the league and whatever league parameter you might want to search on when retrieving teams from your team collection.
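To make the reference-based layout a bit more concrete, here is a minimal sketch in plain Python, with dicts standing in for documents and a list standing in for a teams collection. The field names (`league_id`, `division`) and the `find` helper are hypothetical, made up for illustration; a real driver would run the equivalent filter on the server.

```python
# Plain-Python stand-in for a "teams" collection; each team keeps a
# back reference to its league (field names are illustrative assumptions).
teams = [
    {"_id": "sea", "name": "Seahawks", "league_id": "nfl", "division": "NFC West"},
    {"_id": "sf", "name": "49ers", "league_id": "nfl", "division": "NFC West"},
    {"_id": "lal", "name": "Lakers", "league_id": "nba", "division": "Pacific"},
]


def find(collection, criteria):
    """Return documents whose fields equal every key/value in criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


# Teams are retrieved directly; the league is just another search field.
nfc_west = find(teams, {"league_id": "nfl", "division": "NFC West"})
print([team["name"] for team in nfc_west])
```

The point of the sketch is only that the team is the thing being queried, and the league shows up as a field to filter on.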
oO
That's how I have it mapped out right now. Some of the talks I watched at the NYC event started making me think I was being too relational and not embedding enough.
Does anyone here make regular use of deeply embedded documents? I'm just curious.
@cult hero: lots of questions here, let's see if I can work through enough of them to point you in the right direction.
> ... if I have some 4MB (or I guess 16MB now) document that is getting information pulled from it on every request, is that even a problem?
16MB is a pretty hefty network load for each web request.
> Am I over-thinking this?
Probably. 16MB is an enormous amount of text / data. This is probably a sign that you have too much data in a single object.
> Is there any reason teams can't be embedded within their respective leagues?
No. However, from a design perspective this means that you're treating Leagues as a "first-class top-level object" and Teams as "only important in context of their league".
> Does this raise any efficiency issues?
Well, constantly calling to League when a user wants information for a Team may become a little redundant.
> Is Mongo going to send my web app some blob of data every time I request information from league?
You can control the fields that are returned.
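As a rough illustration of what controlling the returned fields would mean if teams were embedded, the sketch below mimics a projection in plain Python. The `project_team` helper and the document layout are assumptions for illustration, not driver API; in MongoDB itself the same effect comes from passing a projection document as the second argument to `find`.

```python
# A league document with embedded teams (layout is an illustrative assumption).
league = {
    "_id": "nfl",
    "name": "National Football League",
    "teams": [
        {"abbr": "SEA", "name": "Seahawks", "city": "Seattle"},
        {"abbr": "GB", "name": "Packers", "city": "Green Bay"},
    ],
}


def project_team(doc, abbr, league_fields=("name",)):
    """Return only the requested league fields plus the one matching team."""
    result = {"_id": doc["_id"]}
    for field in league_fields:
        if field in doc:
            result[field] = doc[field]
    # Keep just the embedded team that matches, instead of the whole array.
    result["teams"] = [t for t in doc["teams"] if t["abbr"] == abbr][:1]
    return result


team_page = project_team(league, "GB")
print(team_page)
```

Only the league name and the single matching team come back, which is roughly what a team page would need.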
> Should I even bother thinking in those terms and just go with what makes sense naturally and with queries?
Yes!
In fact, I think this is the best way to approach the design of "relational things" in MongoDB.
When deciding on embedding, you have to choose to embed based on the contextual value of that embedded object, not just based on the relationship. Just because League is a parent of Team doesn't mean that one needs to be embedded in the other.
*"What value does the embeddable object have outside of the parent object?"*
Is a Team object valuable outside of the context of its League? Will you query on it directly? Probably. Is a Player object valuable outside of the context of its Team? Probably. Is a Player's stats for 2007 valuable outside of the context of its Player? Probably not.
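One way to picture the outcome of that test is the sketch below, written as plain Python dicts. All field names and values are hypothetical: teams and players are top-level documents that reference their parent by ID, while a player's per-season stats, which have little value on their own, are embedded in the player.

```python
# Top-level documents reference their parent by ID (names are assumptions).
team = {"_id": "sea", "name": "Seahawks", "league_id": "nfl"}

player = {
    "_id": "p1",
    "name": "Example Player",
    "team_id": "sea",
    # Season stats only make sense in the context of this player: embed them.
    "stats": {
        "2007": {"games": 16, "touchdowns": 4},
        "2008": {"games": 14, "touchdowns": 6},
    },
}


def season_stats(player_doc, year):
    """Look up one season inside the already-loaded player document."""
    return player_doc["stats"].get(str(year))


print(season_stats(player, 2007))
```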
On Fri, Apr 1, 2011 at 11:38 AM, cult hero <binarypala...@gmail.com> wrote:
> Hi everyone. This probably isn't a surprise given the topic, but I'm brand new to MongoDB. Over the last few days I've been reading through the manual including http://www.mongodb.org/display/DOCS/Schema+Design and I watched the attached video from MongoSV as well as a number of videos from MongoNYC. All of this is very informative and helpful, however...
> When deciding on embedding, you have to choose to embed based on the contextual value of that embedded object, not just based on the relationship. Just because League is a parent of Team doesn't mean that one needs to be embedded in the other.
I think some of those videos instilled in me an unnecessary fear of using relational associations instead of embedding. In this case I should be looking at things in a relational manner though. The league should actually be a pretty small document, because other objects that are independent and should be top level (e.g. players and teams) are probably going to need to reference that object all the time, so you don't want a dump truck of data in there.
However, individual games do seem like they can be neatly wrapped up in a season object and embedded, and then have those embedded documents reference something like a GameDetails collection with the box score, player stats and stuff like that. In my head I kind of view this as a "hybrid" approach: keep some of the data in one collection and most of it in a details-oriented collection. That way I can build a scoreboard easily from a season document and never touch the games themselves. The only issue at that point becomes finding something similar to a trigger to keep the data synced.
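A minimal sketch of that hybrid layout, assuming hypothetical `season` and `game_details` structures held in plain Python: the season embeds a small summary per game, the full box score lives elsewhere, and one application-level function writes to both so they stay in sync. This does the syncing in application code rather than with anything trigger-like, and the names and scores are made up.

```python
# Full per-game data, keyed by game ID (stand-in for a GameDetails collection).
game_details = {
    "g1": {"home": "SEA", "away": "SF", "home_score": 0, "away_score": 0, "box_score": []},
}

# The season embeds only what a scoreboard needs, plus a reference to the details.
season = {
    "_id": "nfl-2010",
    "games": [
        {"details_id": "g1", "home": "SEA", "away": "SF", "home_score": 0, "away_score": 0},
    ],
}


def record_score(details_id, home_score, away_score):
    """Write the score to the details and to the embedded summary together."""
    details = game_details[details_id]
    details["home_score"], details["away_score"] = home_score, away_score
    for summary in season["games"]:
        if summary["details_id"] == details_id:
            summary["home_score"], summary["away_score"] = home_score, away_score


def scoreboard():
    """Build scoreboard lines from the season document alone."""
    return [
        f'{g["home"]} {g["home_score"]} - {g["away"]} {g["away_score"]}'
        for g in season["games"]
    ]


record_score("g1", 31, 6)
print(scoreboard())
```

Routing every score change through one function like `record_score` is one simple way to keep the duplicated fields from drifting apart.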
Thanks for your input. I can tell this is going to be a work in progress for some time.
On Apr 1, 3:11 pm, Gaetan Voyer-Perrault <ga...@10gen.com> wrote:
> @cult hero: lots of questions here, let's see if I can work through enough of them to point you in the right direction.
> - Gates