2 ActiveRecord features I can offer: eager loading, optimized updates

37 views
Skip to first unread message

Nate Wiger

unread,
Mar 20, 2008, 12:54:49 AM3/20/08
to Ruby on Rails: Core
Hey all,

So there are a couple features that my team would find useful in
ActiveRecord, that I can volunteer our time to work on. I mainly want
to see if there's going to be support in the Rails core team to these
changes into core before doing a full dev cycle on them.

Please let me know if you support these ideas. If so, my team can work
on implementing them.


1) Eager association loading

Currently, if you have a model like this:

class Forum < AR
has_many :posts
end
class Post < AR
belongs_to :forum
end

Then walk thru these records:

<% for forum in @forums %>
Title: <%= forum.post.subject %>
<% end %>

Currently, this will do an extra SQL select * from post for each loop.

DataMapper (datamapper.org) has an interesting approach: The first
time that a sub-association is loaded, then all associations are
preloaded, since most times you're in a loop (as above). While
the :include option helps with this somewhat, when passing collection
objects around, you often don't know how the associations are going to
be used.


2) Optimized updates

Currently, if one attribute is changed in an AR model, then the entire
model is parsed and sent as an UPDATE statement. This is not really
desirable from a network or database standpoint.

There are already some counters in AR for monitoring changes. Is
there a per-attribute @changed data structure? It seems like this
could just be set to true when updating a column. Then, on save, only
those with the @changed flag are actually converted to SQL.


These are changes we really need for our local copy anyways.
Thoughts?

Thanks,
Nate

Nate Wiger

unread,
Mar 20, 2008, 1:02:37 AM3/20/08
to Ruby on Rails: Core
My forum example was terrible. Here's what I meant:

<% for post in @posts %>
Title: <%= post.forum.title %>
<% end %>


Not a perfect example, but you get the idea. Ideally, you'd do one
bulk select, similar to how :include works, on the first missing
record

select * from forums where id in (12, 157, 2 , 34)

You would get the id list from a distinct list of all values in
post.forum_id

-Nate

Frederick Cheung

unread,
Mar 20, 2008, 4:38:02 AM3/20/08
to rubyonra...@googlegroups.com

On 20 Mar 2008, at 04:54, Nate Wiger wrote:

>
> Currently, this will do an extra SQL select * from post for each loop.
>
> DataMapper (datamapper.org) has an interesting approach: The first
> time that a sub-association is loaded, then all associations are
> preloaded, since most times you're in a loop (as above). While
> the :include option helps with this somewhat, when passing collection
> objects around, you often don't know how the associations are going to
> be used.
>

There's certainly value to be found here. I've definitely been in the
case where one consumer of a method or an association would quite like
a whole bunch of things included, but the other wouldn't.

> 2) Optimized updates
>
> Currently, if one attribute is changed in an AR model, then the entire
> model is parsed and sent as an UPDATE statement. This is not really
> desirable from a network or database standpoint.
>
> There are already some counters in AR for monitoring changes. Is
> there a per-attribute @changed data structure? It seems like this
> could just be set to true when updating a column. Then, on save, only
> those with the @changed flag are actually converted to SQL.
>

I can certainly see where you're coming from and something similar has
often crossed my mind. The big issue I have with this one is
consistency:
Currently I can look at an object, call save! and I know that what's
in the db is what I just saw, whereas if you only changed the
attributes you touched then you can't be sure of this. There's cases
where the right thing to do is probably just to use locking of one
form or another, but I can't help but feel that's a bit overkill for
the simple case.

Fred


Mislav Marohnić

unread,
Mar 20, 2008, 6:04:57 AM3/20/08
to rubyonra...@googlegroups.com

On Thu, Mar 20, 2008 at 5:54 AM, Nate Wiger <nwi...@gmail.com> wrote:

2) Optimized updates

+1 for this, be sure to read Pratik's writeup http://m.onkey.org/2007/12/18/partial-updates

It couldn't replace current behavior right away (Frederick explained why), but it could have its own option (or even a method).

These should, in my opinion, do partial updates by default: update_attribute, update_attributes.

Frederick Cheung

unread,
Mar 20, 2008, 7:16:23 AM3/20/08
to rubyonra...@googlegroups.com

Agreed, especially update_attribute. The name just screams "I'm going
to change one attribute" but then it resaves the whole object.

Fred

Rick DeNatale

unread,
Mar 20, 2008, 8:00:04 AM3/20/08
to rubyonra...@googlegroups.com

I don't think we'd want to change this default unconditionally.
Perhaps update_attribute(s) should do a partial update iff the table
had a lock attribute, or the model otherwise declared the desire for
partial update.


--
Rick DeNatale

My blog on Ruby
http://talklikeaduck.denhaven2.com/

Nate Wiger

unread,
Mar 20, 2008, 11:52:02 AM3/20/08
to Ruby on Rails: Core
Interesting post by Pratik. I don't know how much real-world impact
it would really have, though. I've never run into the situation where
I'm save()'ing a record, followed by doing lots more stuff with it.
Normally you redirect and go to another resource.

Nevertheless, I understand why we couldn't change the default
behavior.

For both of these features, I'd recommend AR::B class settings so the
user can control them:

ActiveRecord::Base.eager_loading = true
ActiveRecord::Base.partial_updates = true

That way, it could be set in environment.rb, and overridden on a per-
model basis:

environment.rb
ActiveRecord::Base.eager_loading = true
ActiveRecord::Base.partial_updates = true

class CouldHaveInconsistency < AR::Base
partial_updates = false
end


Finally, if a user wanted to do optimistic locking to better ensure
their partial_updates are ok, then they would do that simply by adding
lock_version as is currently. However, I think the two pieces of
functionality should be kept separate, so the user can choose.

Sound good?


Michael Koziarski

unread,
Mar 23, 2008, 7:49:00 PM3/23/08
to rubyonra...@googlegroups.com
> Finally, if a user wanted to do optimistic locking to better ensure
> their partial_updates are ok, then they would do that simply by adding
> lock_version as is currently. However, I think the two pieces of
> functionality should be kept separate, so the user can choose.
>
> Sound good?

This has come up a bunch of times here and doesn't tend to get
resolved. In hibernate this feature is called 'dynamic-update', and
you can have a look at their forums etc to see the potential issues
that people hit. It's worth noting that this feature requires
optimistic locking.

I'm certainly interested in seeing patches which implement partial
updates like this, but it's simply not safe to use unless the object
is locked in some way. Otherwise you have *no* way to know that the
values in a row correspond to a valid object. This isn't something
that rails should allow, the risks of corrupted / invalid data FAR
outweigh the difficulty of adding a locking column.


--
Cheers

Koz

Rick Olson

unread,
Mar 24, 2008, 4:22:16 AM3/24/08
to rubyonra...@googlegroups.com
> 1) Eager association loading
>
> Currently, if you have a model like this:
>
> class Forum < AR
> has_many :posts
> end
> class Post < AR
> belongs_to :forum
> end
>
> Then walk thru these records:
>
> <% for forum in @forums %>
> Title: <%= forum.post.subject %>
> <% end %>
>
> Currently, this will do an extra SQL select * from post for each loop.

That's not completely true true. ActiveRecord has basic query
caching, so (in your example), each forum would only be loaded once.
There's been interest in porting my active_record_context plugin into
ActiveRecord though. I have a writeup on it at
http://activereload.net/2007/5/23/spend-less-time-in-the-database-and-more-time-outdoors.
It's basically a simpler, more manual implementation of DM's
association eager loading. There's been some work done to the eager
including functionality that makes my plugin less useful, but
something more automatic would be welcome.

>
> 2) Optimized updates
>
> Currently, if one attribute is changed in an AR model, then the entire
> model is parsed and sent as an UPDATE statement. This is not really
> desirable from a network or database standpoint.

Jeremy (bitsweat) has a plugin for tracking updates to ActiveRecord
attributes: http://code.bitsweat.net/svn/dirty/README It sounds like
this will be going into core. With that, it'd make it pretty easy to
perform dynamic updates. Koz's concerns are definitely valid
though...

--
Rick Olson
http://lighthouseapp.com
http://weblog.techno-weenie.net
http://mephistoblog.com

DHH

unread,
Mar 24, 2008, 5:17:04 PM3/24/08
to Ruby on Rails: Core
> DataMapper (datamapper.org) has an interesting approach: The first
> time that a sub-association is loaded, then all associations are
> preloaded, since most times you're in a loop (as above).  While
> the :include option helps with this somewhat, when passing collection
> objects around, you often don't know how the associations are going to
> be used.

Delayed execution is nice. Definitely worth exploring.

> 2) Optimized updates

Jeremy Kemper has a dirty tracking plugin that's going into Active
Record shortly. That'll solve the first part of the problem (knowing
what changed). The second part, just saving what changed, is a little
bit more tricky. Records can easily have interdependent fields where a
record is only valid if the two fields together fulfill some
requirement. If you save only changed fields, you risk ending up with
an invalid object. Locking can help mitigate this, but it's something
that needs to be considered as part of the implementation.
--
David Heinemeier Hansson

DHH

unread,
Mar 24, 2008, 8:11:19 PM3/24/08
to Ruby on Rails: Core
> > 2) Optimized updates
>
> Jeremy Kemper has a dirty tracking plugin that's going into Active
> Record shortly. That'll solve the first part of the problem (knowing
> what changed). The second part, just saving what changed, is a little
> bit more tricky. Records can easily have interdependent fields where a
> record is only valid if the two fields together fulfill some
> requirement. If you save only changed fields, you risk ending up with
> an invalid object. Locking can help mitigate this, but it's something
> that needs to be considered as part of the implementation.

Actually, I'm coming around on this. The common case is that you don't
have validation concerns that implicate multiple attributes. The
common case is that it's nice if someone updating the title of a post
won't overwrite the separate changes made to the body by someone else.
We should certainly make it possible and easy to enforce full-
integrity writes, but I think the work should lie with people doing
that.

Naturally, we can't just change this as existing implementations may
break. But I think we should make partial updates the default for apps
created with 2.1. And then finally the default for all by 3.0. We can
do this by making an option like config.active_record.partial_updates
= true that's only set for apps created post-2.1 and when that option
is not set, we complain in the log about the deprecated status.

This should ensure that old apps continue to work just fine and can
upgrade to the partial-update style at their leisure while everything
new enjoys the revised style.

Worth noting is that the dirty tracking that facilitates this relies
on no changes in place. So post.title << "more" won't trigger a dirty
mark and thus won't include post.title in an update. Which is another
reason to make sure that existing apps don't get the partial updates
before they're ready for it.
--
David Heinemeier Hansson

Michael Koziarski

unread,
Mar 24, 2008, 8:40:51 PM3/24/08
to rubyonra...@googlegroups.com
> Naturally, we can't just change this as existing implementations may
> break. But I think we should make partial updates the default for apps
> created with 2.1. And then finally the default for all by 3.0. We can
> do this by making an option like config.active_record.partial_updates
> = true that's only set for apps created post-2.1 and when that option
> is not set, we complain in the log about the deprecated status.

I think an ever better and safer option is to add optimistic locking
based on updated_at. Most models have this now, and it's a useful
field (compared to an opaque incrementing integer). That way we can
enable it by default, and not worry about busted validations.

If we really want to we can then add a "I know what I'm doing, do
partial updates without a lock" option, but the fact that we'd be the
first orm to do this makes me question the sanity of even considering
that option :).

> This should ensure that old apps continue to work just fine and can
> upgrade to the partial-update style at their leisure while everything
> new enjoys the revised style.
>
> Worth noting is that the dirty tracking that facilitates this relies
> on no changes in place. So post.title << "more" won't trigger a dirty
> mark and thus won't include post.title in an update. Which is another
> reason to make sure that existing apps don't get the partial updates
> before they're ready for it.
>
>
> --
> David Heinemeier Hansson
> >
>

--
Cheers

Koz

Alexey Verkhovsky

unread,
Mar 24, 2008, 9:07:43 PM3/24/08
to rubyonra...@googlegroups.com
On Mon, Mar 24, 2008 at 6:40 PM, Michael Koziarski
<mic...@koziarski.com> wrote:
> I think an ever better and safer option is to add optimistic locking
> based on updated_at.

Uniqueness of timestamps is not a safe assumption. It will find a way
to bite you, eventually.

--
Alexey Verkhovsky
CruiseControl.rb [http://cruisecontrolrb.thoughtworks.com]
RubyWorks [http://rubyworks.thoughtworks.com]

Assaf Arkin

unread,
Mar 24, 2008, 11:12:42 PM3/24/08
to rubyonra...@googlegroups.com
On Mon, Mar 24, 2008 at 6:07 PM, Alexey Verkhovsky
<alexey.v...@gmail.com> wrote:
>
> On Mon, Mar 24, 2008 at 6:40 PM, Michael Koziarski
> <mic...@koziarski.com> wrote:
> > I think an ever better and safer option is to add optimistic locking
> > based on updated_at.
>
> Uniqueness of timestamps is not a safe assumption. It will find a way
> to bite you, eventually.

Particularly fun when you get the code working smoothly on the dev
box, only to fail randomly on more expensive and faster deployment
server.

Version numbers work better, handled by optimistic locking, and a
cheap way to calculate ETag on your models.

Assaf

Michael Koziarski

unread,
Mar 24, 2008, 11:23:30 PM3/24/08
to rubyonra...@googlegroups.com
On Tue, Mar 25, 2008 at 2:07 PM, Alexey Verkhovsky
<alexey.v...@gmail.com> wrote:
>
> On Mon, Mar 24, 2008 at 6:40 PM, Michael Koziarski
> <mic...@koziarski.com> wrote:
> > I think an ever better and safer option is to add optimistic locking
> > based on updated_at.
>
> Uniqueness of timestamps is not a safe assumption. It will find a way
> to bite you, eventually.

This is certainly a risk, but it's much less likely than it seems at
first. The particular values of the timestamp column are just opaque
keys used for checking no one else has updated it. I certainly agree
that a version number is the way to go, but if we wanted to push
partial updates out more widely, this would be a low risk way of doing
it.

--
Cheers

Koz

Alexey Verkhovsky

unread,
Mar 24, 2008, 11:36:29 PM3/24/08
to rubyonra...@googlegroups.com
On Mon, Mar 24, 2008 at 9:23 PM, Michael Koziarski
<mic...@koziarski.com> wrote:
> This is certainly a risk, but it's much less likely than it seems at
> first.

In distant, but painful memory, it turned out to be *more* likely than
it seemed at first. Likely enough, in that particular scenario, that
it actually happened once in a couple of weeks. And by virtue of being
unreproducible, this bug caused a lot of hurt.

Frederick Cheung

unread,
Mar 25, 2008, 4:53:58 AM3/25/08
to rubyonra...@googlegroups.com

On 25 Mar 2008, at 03:36, Alexey Verkhovsky wrote:

>
> On Mon, Mar 24, 2008 at 9:23 PM, Michael Koziarski
> <mic...@koziarski.com> wrote:
>> This is certainly a risk, but it's much less likely than it seems at
>> first.
>
> In distant, but painful memory, it turned out to be *more* likely than
> it seemed at first. Likely enough, in that particular scenario, that
> it actually happened once in a couple of weeks. And by virtue of being
> unreproducible, this bug caused a lot of hurt.

Given that the entire problem that optimistic locking is trying to
solve is multiple people fiddling with something at the same time
you'd think the odds of a 'collision' would be relatively likely when
there were indeed multiple people touching the row in the database at
close together times.

Fred

Brian Mcquay

unread,
Mar 26, 2008, 8:59:31 PM3/26/08
to rubyonra...@googlegroups.com
Nate Wiger wrote:
> 2) Optimized updates
>
> Currently, if one attribute is changed in an AR model, then the entire
> model is parsed and sent as an UPDATE statement. This is not really
> desirable from a network or database standpoint.
>
> There are already some counters in AR for monitoring changes. Is
> there a per-attribute @changed data structure? It seems like this
> could just be set to true when updating a column. Then, on save, only
> those with the @changed flag are actually converted to SQL.

It isn't quite the solution to the problem, just a hack around it, but
it solves the problem of saving just the pieces of a model I need. See:

http://lists.rubyonrails.org/pipermail/rails/2006-May/039499.html

But that didn't work for me exactly. I had to include some of the model
associations and some of the columns that require validations, like
"enforces_length_of :title" for instance. Adding that to the :select =>
[] and doing an update_attribute instead of a object.save successfully
saved just want I wanted to without saving the entire model object.

position = params[:position]
if section = @stuff.sections.find(params[:id], :select => 'sections.id,
sections.bogus_id, sections.title, sections.position')
section.update_attribute(:position, position) if section.position !=
position
end

This is what I ended up with. All of these objects are owned and edited
by individual users of the system so I think its safe to assume sure
there won't be any concurrency issues.
--
Posted via http://www.ruby-forum.com/.

Brian Mcquay

unread,
Mar 26, 2008, 9:06:44 PM3/26/08
to rubyonra...@googlegroups.com
Brian Mcquay wrote:
>
> It isn't quite the solution to the problem, just a hack around it, but
> it solves the problem of saving just the pieces of a model I need. See:
> ...

I posted my previous comment prematurely. Upon further testing it isn't
working properly. Please disregard my previous comments.

Sam Smoot

unread,
Mar 26, 2008, 11:56:56 PM3/26/08
to Ruby on Rails: Core
On Mar 25, 3:53 am, Frederick Cheung <frederick.che...@gmail.com>
wrote:
I've never heard it put so well. I'm going to steal this if you don't
mind. ;-)

@DHH:

> Worth noting is that the dirty tracking that facilitates this relies
> on no changes in place. So post.title << "more" won't trigger a dirty
> mark and thus won't include post.title in an update. Which is another
> reason to make sure that existing apps don't get the partial updates
> before they're ready for it.

That's pretty drastic (in a good way ;-) ). Ruby would be a nicer
place (sometimes) with the option of cheap, immutable Strings of
course. I'm in the same boat... there's several options, and none of
them nice. Original-value comparison (memory requirements go up, slows
down having to #dup everything), immutable conventions (not "safe"),
writing your own immutable string (what a rabbit-hole!)?

If you can manage to persuade the community on ignoring the mutable
methods by convention, I'll be happy to follow your lead... :-D

DHH

unread,
Mar 27, 2008, 11:25:19 AM3/27/08
to Ruby on Rails: Core
> If you can manage to persuade the community on ignoring the mutable
> methods by convention, I'll be happy to follow your lead... :-D

That's going to be the way for Active Record attributes. If you want
dirty tracking (and partial updates), you don't get to do inplace
modifications.

Murray Steele

unread,
Mar 31, 2008, 9:14:42 AM3/31/08
to rubyonra...@googlegroups.com

I think this might be a seriously awful suggestion because I've not thought it through beyond a 30sec irb trial, but could you .freeze what model.attribute supplies so that inplace mods won't work?  In my head it seems bad as I'm sure there are obvious holes in such an idea, but I can't quite work out what they are.

For my own peace of mind, can someone convince me why it's bad?

Muz



Jeremy Kemper

unread,
Mar 31, 2008, 1:35:01 PM3/31/08
to rubyonra...@googlegroups.com
On Mon, Mar 31, 2008 at 6:14 AM, Murray Steele <murray...@gmail.com> wrote:
> I think this might be a seriously awful suggestion because I've not thought
> it through beyond a 30sec irb trial, but could you .freeze what
> model.attribute supplies so that inplace mods won't work? In my head it
> seems bad as I'm sure there are obvious holes in such an idea, but I can't
> quite work out what they are.
>
> For my own peace of mind, can someone convince me why it's bad?

This is what my dirty plugin did. It works great in most cases, but
falls down in others (Date#inspect modifies the instance, for
example).

Just freezing String attributes works well, to prevent << and .gsub!

Anyone have compelling reasons not to treat attribute values as immutable?

jeremy

Nilez Parker

unread,
Apr 12, 2008, 7:43:42 PM4/12/08
to rubyonra...@googlegroups.com
Huh. Funny thing, I submitted this functionality to the rails core in
May 2007 (http://dev.rubyonrails.org/ticket/8520), but Jeremy turned it
down as "This is cool but uncommonly needed" and mentioned "2x memory
consumption"... I posted my implementation at
http://blog.behindlogic.com/2007/08/rails-plugin-changedattributes.html
in August, and it turns out what went into trunk two weeks ago
(http://dev.rubyonrails.org/browser/trunk/activerecord/lib/active_record/dirty.rb?rev=9127)
uses the exact same logic and memory footprint.

Oh well. (no hard feelings!) I guess I'm glad it's going in after all.

Right now I'm working on "Lazy-loading facilitating smart
eager-loading." (If you wait to load an object until you actually need
it, you might find out you also need an associated object, or a whole
group of them, and you can reduce it to one query.) Fun stuff. :)

Michael Koziarski

unread,
Apr 12, 2008, 8:06:24 PM4/12/08
to Ruby on Rails: Core
> Oh well. (no hard feelings!) I guess I'm glad it's going in after all.

I personally prefer people who can change their mind on an issue when
faced with convincing new evidence. One wonders how the world would
look if this was a more common trait ;)

Thanks for your effort back then, and if you have any additional
features in mind, or old rejected-patches that you think we should
reconsider, let us know :)

--
Cheers

Koz


Chad Woolley

unread,
Apr 12, 2008, 10:54:30 PM4/12/08
to rubyonra...@googlegroups.com
On Sat, Apr 12, 2008 at 5:06 PM, Michael Koziarski <kozi...@gmail.com> wrote:
> I personally prefer people who can change their mind on an issue when
> faced with convincing new evidence. One wonders how the world would
> look if this was a more common trait ;)

Umm... Better, I'm guessing?

-- Chad

Reply all
Reply to author
Forward
0 new messages