Beyond: Embedded vs Reference


Charly

Jan 22, 2010, 12:54:38 PM
to Mongoid
Hi,

I am pasting a blog post I started writing after discovering Mongoid
on Hashrocket's live show and being very excited, yet disappointed, by
the idea that Mongoid is for hierarchy only... I spent a week thinking
about it and wanted to share my thoughts. But actually I am much more
interested in having your thoughts than in having it published, which
is why I'm posting it here instead.

(And excuse my French, I am French!)
.....

Let's take the canonical Book app.
My Book app is very book-centric, and for many good/bad reasons I do not
want my books to be embedded in the Author class.

2 solutions:

1° the SQL-like solution, where Book has the reference author_id and
Author is a separate collection. The problem is you can't do joins, so
find conditions on an association (e.g. books where author.age < 40)
and eager loading (e.g. Book.all.each { |book| book.author.name })
become difficult.

2° the document-oriented solution, where the Author is embedded in
Book. After all, why not: you can live with that level of duplication
if it's just a name. But wait, you want to add the Author's bio, date of
birth, and plenty more attributes... Yup, Author really is a collection
by itself; it can't just be embedded or it becomes a consistency
nightmare.
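
To make the cost of solution 1° concrete, here is a minimal plain-Ruby sketch of what "no joins" means in practice. The arrays of hashes are hypothetical stand-ins for the two collections (no database or ODM is involved, and the sample data is invented): the association condition takes two passes, and eager loading has to be done by hand.

```ruby
# Hypothetical stand-ins for the "authors" and "books" collections.
authors = [
  { "_id" => 1, "name" => "Ann", "age" => 35 },
  { "_id" => 2, "name" => "Bob", "age" => 52 }
]
books = [
  { "title" => "First",  "author_id" => 1 },
  { "title" => "Second", "author_id" => 2 },
  { "title" => "Third",  "author_id" => 1 }
]

# Pass 1: find the ids of the authors matching the condition.
young_ids = authors.select { |a| a["age"] < 40 }.map { |a| a["_id"] }

# Pass 2: find the books whose author_id is in that list.
young_books = books.select { |b| young_ids.include?(b["author_id"]) }

# Eager loading by hand: index the authors once instead of
# doing one lookup per book inside the loop.
authors_by_id = {}
authors.each { |a| authors_by_id[a["_id"]] = a }
names = books.map { |b| authors_by_id[b["author_id"]]["name"] }
```

Against a real database each pass would be a separate round trip, which is the difficulty described in 1°.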

So from there one possible move is to keep the SQL mindset and do
some caching: on top of author_id you add the field author_name, and
Book pulls in Author#name in a before_save callback so it's
available as a condition on Book.find and for direct display in your
index views.
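
As a rough illustration of that caching move, here is a plain-Ruby sketch. The `save` and `before_save` methods below are hand-rolled stand-ins for what an ODM callback would do, not Mongoid's actual API, and the classes are hypothetical.

```ruby
# Hypothetical master record.
Author = Struct.new(:id, :name)

class Book
  def initialize(title, author)
    @title  = title
    @author = author
  end

  # Stand-in for an ODM's save: runs the callback, then returns
  # the hash that would be written to the "books" collection.
  def save
    before_save
    { "title" => @title, "author_id" => @author_id, "author_name" => @author_name }
  end

  private

  # Copy the reference and the cached name from the author.
  def before_save
    @author_id   = @author.id
    @author_name = @author.name
  end
end

author = Author.new(1, "Ann")
book   = Book.new("First", author)
doc    = book.save

# The cached name goes stale until the book is saved again.
author.name = "Anne"
stale_name  = doc["author_name"]
fresh_doc   = book.save
```

The last three lines show the weakness of the approach: the cached author_name only catches up when the book itself is re-saved.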

But that is stupid. We're still thinking like SQL junkies here,
carefully denormalizing our data a little for performance... without
the goodness of SQL! The DocumentObject sits halfway between the SQL
and the View; it's like a show.erb file without the HTML tags, plus all
the power of business logic. It is not some poor database intermediary
begging for a little content through complex ORMs that ask you to
fulfill more queries before they deign to provide it.

No! The whole point of the DocumentObject is to mirror the content of
the ObjectModel as closely as possible. Which means if a Model is
composed of another Model, it should be reflected in its data structure,
and that is precisely what an embedded object is in MongoDB/Mongoid.

In practice that means having Author stored both in an independent
collection "authors" !!!!and!!!! embedded in Book. The embedded author
would have a reference (e.g. :master_id) to its counterpart in the
authors collection to keep in sync. You could then decide if the
embedded one is a light copy of its master or a full-blown nested
mirror... The main idea is you are not trying to shape your program to
the way your data is stored, you are bending the data to the way your
domain model is shaped, for the same reasons you chose Ruby over Java.
Happiness :-)
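
A minimal sketch of that layout, using plain Ruby hashes in place of MongoDB documents. The helper names `embed_author` and `update_author` are hypothetical, and the propagation is done eagerly on every master write, which is only one of several possible sync strategies.

```ruby
# Independent "authors" collection, keyed by _id.
authors = {
  1 => { "_id" => 1, "name" => "Ann", "date_of_birth" => "1975-03-02",
         "bio" => "A long biography..." }
}

# Build the embedded copy: keep only the chosen attributes and
# record which master document they came from.
def embed_author(master, keys)
  copy = { "master_id" => master["_id"] }
  keys.each { |k| copy[k] = master[k] }
  copy
end

# Here the embedded author is a light copy (name only).
books = [
  { "title" => "First", "author" => embed_author(authors[1], ["name"]) }
]

# Update the master, then push the change to every embedded copy
# that points at it. Attributes the copy does not carry are skipped.
def update_author(authors, books, id, changes)
  authors[id].merge!(changes)
  books.each do |book|
    embedded = book["author"]
    next unless embedded["master_id"] == id
    changes.each { |k, v| embedded[k] = v if embedded.key?(k) }
  end
end

update_author(authors, books, 1, { "name" => "Anne", "bio" => "Updated bio" })
```

After the update the embedded copy carries the new name but still no bio, while the master holds both.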

Call it glorified caching if you wish, I think this could be a small
yet true paradigm shift. Why?
- first of all, your domain model rules the data, not the contrary!
- you are not constrained anymore to hierarchy vs reference, they
complement each other
- you keep the consistency of the dot notation (no ugly author_name
here)
- caching is not an afterthought, it is part of your model right from
the beginning.
- duplication may be a problem for some, but consider it also as a
new possibility: Author can be tied to its master for some attributes
(name, date_of_birth), but then the bio could change or not depending
on the parent's scope (book).

Of course one should not ignore the drawbacks: it would be poorly
suited to a write-heavy app. But I think it suddenly widens the
horizon: you don't have to torture yourself figuring out which of
MongoMapper or Mongoid is best suited for your app, or whether you
should stick to SQL, etc.

.....

I hope this sounds convincing. I know I haven't covered all
associations (many, many-to-many, etc.) but it is a start.


charly

brainopia

Jan 23, 2010, 4:10:38 AM
to Mongoid
Great idea, I like it. It would be great to have a facility to declare
a model both as an embedded and as a normal document.
For example, belongs_to :person, :mirror => true.

By the way, during the last Rails Rumble I used MongoDB (via the pure
Ruby driver) exactly this way. I had a big movie collection (a copy of
all upcoming films from IMDb) with embedded collections like genres,
actors, directors. I used embedded documents to search for movies
(it was certainly faster than referencing and using joins), but I
still needed genres, actors, directors to behave as normal
collections. So I set up a script to mirror data from the embedded
collections to the corresponding normal collections. In the end, my
only drawback was a bigger db, but in return I got blazing speed,
the ability to work with my data without constraints, and all MongoDB
features :)

Durran Jordan

Jan 23, 2010, 11:00:02 AM
to mon...@googlegroups.com
If we were to go down this kind of route, we'd need to make sure we've got a solid strategy for keeping stale data to a minimum without hurting performance, but it sounds tempting.

Charly

Jan 24, 2010, 10:50:49 AM
to Mongoid
Thank you for your answers! And don't resist the temptation!

Here are some more thoughts:

You cannot avoid the performance hit on the writing part, so you
should be able to decide which attributes are going to live on the
embedded version to minimize it. Those would typically be the ones with
small footprints (booleans, integers, small strings, etc.), unlikely
to be updated, and useful for aggregation or find conditions. To keep
data in sync, nothing extraordinary: a timestamp on each embedded
version to compare with the "master" version's timestamp.
The DSL could look something like this:

# Proposed DSL, not existing Mongoid API
class Book
  has_one :author, :exclude => ["bio", "ratings"]
end
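
The exclusion list and the timestamp comparison described above could be sketched in plain Ruby as follows. This is only an illustration under assumptions: `embedded_copy`, `stale?` and the "updated_at" key are hypothetical names, and hashes stand in for the stored documents.

```ruby
# Copy every master attribute except the excluded ones. The master's
# "updated_at" is copied too, so the copy remembers which version of
# the master it was taken from.
def embedded_copy(master, exclude)
  copy = master.reject { |key, _| exclude.include?(key) || key == "_id" }
  copy["master_id"] = master["_id"]
  copy
end

# The embedded copy is stale when the master has a newer timestamp.
def stale?(embedded, master)
  embedded["updated_at"] < master["updated_at"]
end

master = { "_id" => 1, "name" => "Ann", "bio" => "Long text",
           "ratings" => [4, 5], "updated_at" => Time.at(100) }

embedded     = embedded_copy(master, ["bio", "ratings"])
fresh_before = stale?(embedded, master)

# The master changes later on.
master["name"]       = "Anne"
master["updated_at"] = Time.at(200)
stale_after          = stale?(embedded, master)

# Refresh only when the comparison says the copy is out of date.
embedded = embedded_copy(master, ["bio", "ratings"]) if stale_after
```

Checking lazily like this moves the cost from write time to read time; pushing changes eagerly on every master write is the other obvious option.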

To go a little further, I was imagining what a user's library app would
look like.
In a SQL environment you'd have a library table with user_id and
book_id, and:

class User
  has_many :books, :through => :library
end

whereas in Mongoid:

class Library
  has_one :user
  has_many :books, :only => ["title", "author.name"] do
    field :rating, :type => Integer
    field :read, :type => Boolean
    include Book::LibraryMethods # could be a convention
  end
end

Each document of the library collection would contain all of the
user's books. But those books wouldn't be a simple proxy of Book, but
a stripped-down/enhanced version, specially suited for the Library.
The developer should be encouraged to use the power of modules so that
each set of methods is consistent with its scope.

class Author
  has_many :books, :only => ["title", "publication_date"] do
    field :writing_context
    include Book::AuthorMethods
  end
end
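
To show what an :only list with dotted paths such as "author.name" might produce, here is a plain-Ruby sketch. The `dig_path` and `project` helpers are hypothetical and nothing here is Mongoid API; it only illustrates the stripped-down, then enhanced, copy of a Book that a Library entry would hold.

```ruby
# Follow a dotted path ("author.name") through nested hashes.
def dig_path(doc, path)
  path.split(".").reduce(doc) { |node, key| node.is_a?(Hash) ? node[key] : nil }
end

# Build a stripped-down copy holding only the listed paths,
# nested the same way as in the source document.
def project(doc, only)
  only.each_with_object({}) do |path, result|
    keys   = path.split(".")
    leaf   = keys.pop
    target = keys.reduce(result) { |node, key| node[key] ||= {} }
    target[leaf] = dig_path(doc, path)
  end
end

book = { "title" => "First", "pages" => 300,
         "author" => { "name" => "Ann", "bio" => "Long text" } }

# Stripped down: only the title and the author's name survive.
entry = project(book, ["title", "author.name"])

# Enhanced: fields that only make sense in the Library scope.
entry["rating"] = 5
entry["read"]   = true
```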

Another approach would be to simply dump the proxy design and go with
modules. It strikes me that an embedded object in Mongo is very
similar to a Ruby module: it only lives in the context of the class/
docObject it is included in. That takes us away from the current DSL
of Mongoid and I haven't put much thought into it yet. However I have
a few more ideas. I'm sure we haven't yet scratched the surface of the
doc object possibilities...

charly

Durran Jordan

Jan 24, 2010, 1:28:19 PM
to mon...@googlegroups.com
I like the thoughts here... I am going through a refactoring around associations at the moment to first handle extensions on them, then I'll have a look at what can be done with them at that point.


Charly

Jan 25, 2010, 9:23:15 AM
to Mongoid
I am starting a project with Mongoid now and will take time to dig
more into its code. Hopefully I'll end up sending a pull request.

Meanwhile here's some experimental code for accessing an embedded
association without a proxy:

http://gist.github.com/285877

The idea is that rather than proxying an Author outside a Book, it
feeds directly on a "module Author" nested in Book.
In practice it creates the Book#author method, which returns an
anonymous object with module Author included.

There are probably a lot of caveats, but it is dead simple and reflects
nicely the idea of embedded documents.
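
For readers without access to the gist, here is a rough plain-Ruby sketch of the general idea. It is not the gist's code: the stored document is assumed to be a plain hash, `display_name` is a made-up method, and the module is mixed in with Object#extend.

```ruby
class Book
  # Behaviour for the embedded author lives in a module nested in
  # Book, so it only exists in the context of a Book.
  module Author
    def display_name
      self["name"].to_s + " (author)"
    end
  end

  def initialize(doc)
    @doc = doc
  end

  def title
    @doc["title"]
  end

  # No proxy class: hand back the embedded hash itself,
  # extended with the nested module's methods.
  def author
    @doc["author"].extend(Author)
  end
end

book  = Book.new({ "title" => "First", "author" => { "name" => "Ann" } })
label = book.author.display_name
```

One caveat with this particular sketch: the extended object is still a bare Hash, so it gets no validation or typed fields unless the module provides them.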


Durran Jordan

Jan 26, 2010, 6:32:33 PM
to mon...@googlegroups.com
I like the look of the gist - the intent is there, and it feels very much like an accurate representation of the model in the db.

It may be too far a departure from where Mongoid is now to go in that direction though... I'll keep it in mind when doing changes and new features, but I don't think it will get exactly there... Maybe MongoDoc's a closer fit?

Charly

Jan 27, 2010, 5:31:45 AM
to Mongoid
I was aware it might be a little too far from Mongoid and actually
started a toy ODM (mongoose) to try and implement the idea.
MongoDoc could be the right pick though, thanks for the suggestion.
I like the latest refactoring BTW, sounds very promising.
charly



Kyle Banker

Jan 28, 2010, 12:17:55 PM
to mon...@googlegroups.com
Charly. Cool ideas in that gist.  Let us all know if mongoose develops.