The tests aren't particularly contrived in this case, I've written
similar stuff in the past. However the large performance boost you're
seeing is probably worth investigating whether we can apply those
joins a little more selectively, or let users opt out of them.
My understanding is that the conditions / joins *have* to be applied
in order for limit / offset to work with an :order or :conditions when
using eager loading. If you apply the limit / offset to the final
query, it's not limiting correctly, and if you leave the order or
conditions off that query your paginated result set will behave
erratically, with varying numbers per page, or repeated entities.
For an example:
@tickets_from_priority_customers = Ticket.find(:all,
:conditions=>["tickets.open = ? and customers.priority = ?", true,
true], :include=>[:customers])
Now if you start paginating that list, your stuff will cause bugs?
--
Cheers
Koz
The tests aren't particularly contrived in this case, I've written
similar stuff in the past. However the large performance boost you're
seeing is probably worth investigating whether we can apply those
joins a little more selectively, or let users opt out of them.
My understanding is that the conditions / joins *have* to be applied
in order for limit / offset to work with an :order or :conditions when
using eager loading. If you apply the limit / offset to the final
query, it's not limiting correctly, and if you leave the order or
conditions off that query your paginated result set will behave
erratically, with varying numbers per page, or repeated entities.
For an example:
@tickets_from_priority_customers = Ticket.find(:all,
:conditions=>[" tickets.open = ? and customers.priority = ?", true,
true], :include=>[:customers])
Now if you start paginating that list, your stuff will cause bugs?
(Of course I'm not after particulary blaming somebody personally,
but ...) As far as I can see, this is an example where even a member
of the Rails core team mistakenly assumed that in:
Article.find(:all, :include => :tags, :conditions => [' tags.name IN
(?)', tag_names])
... this would return articles that have tags with certain names
associated (i.e. condition applied), but all of them pre-populated
with *all* tags (i.e. condition not applied).
Which makes sense to me. Because the find was called on Article model.
--
Cheers!
- Pratik
http://m.onkey.org
I think I see where you're coming from for the :conditions on
:include. In fact I've worked around this exact issue with something
criminal like:
has_many :permissions
has_many :hax_permissions, :class_name=>"Permission"
Whatever.find(:all, :include=>:hax_permissions,
:conditions=>"permissions.group_id IN (?" ...
Otherwise :permissions isn't a full collection.
If you have time / ability to jump into #rails-contrib, I think this
will go a bunch quicker :)
--
Cheers
Koz
> > ... this would return articles that have tags with
> certain names
> > associated (i.e. condition applied), but all of them
> pre-populated
> > with *all* tags (i.e. condition not applied).
Which makes sense to me. Because the find was called on Article model.
Just went through your patches and relevant code in AR. And I'd
personally go with your 1st patch. I think the current behavior could
be very misleading to people who are not aware of AR internals and the
results might not be what they expected.
Also, I guess allowing conditions on eager loaded association can
easily make people write bad code. It's very likely that those
conditions deserves a separate association, at least in many cases.
For other cases, explicit join should be justified.
E.g. in Koz' example :
@tickets_from_priority_customers = Ticket.find(:all,
:conditions=>["tickets.open = ? and customers.priority = ?", true,
true], :include=>[:customers])
I guess it'd make more sense if Ticket model looks has :
has_many :customers
has_many :priority_customers, :class_name => "Customer", :conditions
=> "customers.priority = 1"
And then the query should like like :
@tickets_from_priority_customers = Ticket.find(:all,
:conditions=>["tickets.open = ?", true],
:include=>[:priority_customers])
That'd be cleaner and more efficient.
Having said that, in any case, the patch 2 is a must if most of the
people don't agree with patch 1.
So my votes are +2/+1 for your 2 patches, in order you uploaded them.
http://darwinweb.net/article/Optimizing_And_Simplifying_Limited_Eager_Loading_In_Activerecord
Check the prequery for :
Album.find(:all, :limit => 10, :include => [:songs, :artists],
:conditions => 'songs.active = 1')
So the issue is that once it decides that *any* of the :include
tables are referenced in :conditions it joins *all* of them - even
though it's possible that some of the includes are not referenced.
Personally I've not been bitten by performance issues with the
prequery but if your non-destructive-patch improves that situation
I'd say it was a good idea.
Is your patch intelligent enough to cope with this?:
Album.find(:all, :limit => 10, :include => [{:songs
=> :uploads}, :artists],
:conditions => 'uploads.errored = 0')
where the prequery joins songs and uploads but not artists?
I bring it up because that strikes me as a reason why it just does
all of the joins - bit of a pain to tease apart exactly which joins
are absolutely required (not impossible - just a pain).
Regards,
Trevor
Thanks for the reply, I believe I understand now.
So the issue is that once it decides that *any* of the :include
tables are referenced in :conditions it joins *all* of them - even
though it's possible that some of the includes are not referenced.
Personally I've not been bitten by performance issues with the
prequery but if your non-destructive-patch improves that situation
I'd say it was a good idea.
Is your patch intelligent enough to cope with this?:
Album.find(:all, :limit => 10, :include => [{:songs
=> :uploads}, :artists],
:conditions => 'uploads.errored = 0')
where the prequery joins songs and uploads but not artists?
I bring it up because that strikes me as a reason why it just does
all of the joins - bit of a pain to tease apart exactly which joins
are absolutely required (not impossible - just a pain).
Gabe, why don't you submit a separate ticket with your second patch (
as the current ticket has a lot of negative noise and could be
misleading ) and try get it in first. And then continue your main
battle for the first patch (the original one).
Just a suggestion.
Gabe, why don't you submit a separate ticket with your second patch (
as the current ticket has a lot of negative noise and could be
misleading ) and try get it in first. And then continue your main
battle for the first patch (the original one).
I guess the issue I have with your destructive patch is that it hobbles existing behavior which (for me) returns a predictable dataset.
I've always known that :conditions which relate to :included associations can cause only part of the association's data to be loaded (I think that's been referred to as the mephisto bug). When I've used it in the past it's been because that's exactly what I want.
Setting aside the performance issues (which you address in your non-destructive patch), the problem as I see it is that there is no way to tell find() that you want certain conditions applied in the prequery and a *different* set of conditions applied in the data-loading query (which I'll call 'realquery' from now on).
For the times when I have wanted *all* of the association's data yet wanted to limit the top-level selection based on association conditions I've done my own prequery (with a limit) to grab the id values I want to present. I wrote a plugin that makes these prequeries a little easier to generate - http://svn.protocool.com/public/plugins/values_of
That allows me to have 2 sets of :conditions - one for the prequery and one for the realquery - and guarantees that my constraints are always true for my returned data.
So... what I'm thinking is this: what about adding a new option to find() that allows you to specify 'prequery-only' conditions? For example, something like a :limit_conditions option that utilized your non-destructive patch?
* The :limit_conditions option would only be valid in conjunction with :limit.* Both :limit_conditions *and* :conditions options would be applicable to the prequery.* Your non-destructive patch would determine the minimum joins required in the prequery based on :limit_conditions and :conditions* The :conditions would only be applied to the realquery
Ultimately I have to admit that when performance is a concern or the relations are hairy I will probably continue to reach for values_of() and another (soon to be released) plugin that hydrates associations with subsequent queries rather than with big-honking-joins.