Embed vs Reference


Dilip

Mar 21, 2011, 4:36:42 PM
to mongodb-user

I am not sure if I am understanding the distinction made between
embedding an object and holding a reference to it on this page:
http://www.mongodb.org/display/DOCS/Schema+Design#SchemaDesign-Embedvs.Reference

Is it saying that given:

public class Foo
{
public Bar b;
}

If Foo and Bar belong to different collections, then Foo is holding a
"reference" to Bar while if they belong to the same collection then
Foo "embeds" Bar?

What is the point of this distinction in practical terms?

If Foo and Bar belong to the same collection, the JSON for a couple of
stored Foo documents looks like this:

{
"_id": {
"$oid": "4d87b3417d190a147c000001"
},
"b": {
"MyId": 33
}
}

{
"_id": {
"$oid": "4d87b3417d190a147c000002"
},
"b": {
"MyId": 44
}
}

If they belong to a different collection (so Bar gets an ID of its own
by virtue of being a top-level object of a different collection), the
stored JSON for a Foo document now looks like this:

{
"_id": {
"$oid": "4d87b4427d190a11fc000003"
},
"b": {
"_id": {
"$oid": "4d87b4427d190a11fc000001"
},
"MyId": 33
}
}

{
"_id": {
"$oid": "4d87b4427d190a11fc000004"
},
"b": {
"_id": {
"$oid": "4d87b4427d190a11fc000002"
},
"MyId": 44
}
}

What is the pertinent difference here?

Jared Rosoff

Mar 21, 2011, 5:58:17 PM
to mongodb-user
You correctly describe the difference between "Embed" and "Reference",
however your document examples are a little off. In your post, both
documents have the actual value (MyId) in the first document, which
means they are both basically a dbref.

Here are some different examples that hopefully illustrate the
difference a little better.

Embed:

> db.first_collection.findOne()
{ '_id' : "4d87b4427d190a11fc000003",
  'b': {
    'MyId' : 33
  }
}

Reference:

> var first = db.first_collection.findOne()
{ '_id' : "4d87b4427d190a11fc000003",
  'b': "4d87b4427d190a11fc000002"
}

> db.second_collection.find( { "_id" : first.b } )
{ '_id' : "4d87b4427d190a11fc000002",
  'MyId' : 44
}
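
To make the two shapes concrete without a running server, here is a minimal sketch in plain JavaScript. It is only an illustration: the two Map objects are hypothetical in-memory stand-ins for first_collection and second_collection, and resolveBar is a made-up helper, not part of any MongoDB driver.

```javascript
// Hypothetical in-memory stand-ins for the two collections, keyed by _id.
const firstCollection = new Map();
const secondCollection = new Map();

// Embed: the Bar data lives inside the Foo document itself.
const embeddedFoo = { _id: "4d87b4427d190a11fc000003", b: { MyId: 33 } };

// Reference: Foo stores only Bar's _id; Bar lives in its own collection.
const bar = { _id: "4d87b4427d190a11fc000002", MyId: 44 };
const referencingFoo = { _id: "4d87b4427d190a11fc000004", b: bar._id };

firstCollection.set(embeddedFoo._id, embeddedFoo);
firstCollection.set(referencingFoo._id, referencingFoo);
secondCollection.set(bar._id, bar);

// Returns Foo's Bar: follows the reference (a second lookup) when b is an
// id string, otherwise returns the embedded sub-document directly.
function resolveBar(foo) {
  return typeof foo.b === "string" ? secondCollection.get(foo.b) : foo.b;
}
```

With embedding, one lookup returns everything. With a reference, the application does the second lookup itself, which is what the find on second_collection above is doing.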





Keith Branton

Mar 21, 2011, 6:25:34 PM
to mongod...@googlegroups.com
I always consider embedding to be simply a performance optimization, rather than a schema design decision.

You can model your data without any embedding if you like, and use references. If you embed, the embedded documents can certainly have their own ids if you want them to (mine often do so I can query to find the document containing the embedded document with a given id), but they don't always need to.

I would still urge normalizing data to 3NF as a general rule, so hearing that "embedding is a good thing" should not encourage you to mess up the integrity of your data.

The beauty of embedding is that you can embed arbitrarily-shaped related data within a single document. Then you can fetch or update the whole lot in a single operation. This is the kind of thing you would need several SQL statements to do in traditional SQL (Oracle "select cursor" expressions can pretty much achieve the same effect, though).

The main (and annoying) restriction is the overall document size. If you think it may be possible for a collection of embedded items to grow in an unbounded way - that it might ever exceed the arbitrary 16MB limit, then you simply can't risk embedding - even if 99.9% of your data will always be way below the limit.
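
As a rough illustration of watching for that limit, here is a small sketch in plain JavaScript. It rests on assumptions: approximateSize measures the UTF-8 length of the JSON text, which is only a stand-in for the real stored size, and shouldSwitchToReferences with its 50% threshold is a hypothetical helper, not anything MongoDB provides.

```javascript
// The 16MB document size limit mentioned above, in bytes.
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Rough estimate only: byte length of the JSON text, not the stored size.
function approximateSize(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// Hypothetical policy: once a document passes some fraction of the limit,
// treat its embedded list as a candidate for a separate, referenced
// collection. The default threshold of 0.5 is an arbitrary choice.
function shouldSwitchToReferences(doc, threshold = 0.5) {
  return approximateSize(doc) > MAX_DOCUMENT_BYTES * threshold;
}
```

A check like this only tells you after the fact that a document has grown large. If growth is unbounded by design, referencing from the start is the safer choice.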





Dilip

Mar 22, 2011, 8:23:20 AM
to mongodb-user

Jared,
I get it now! In hindsight it should have been obvious to me. When using
a reference, you store the Id of the contained object rather than the
object itself.