Broken has_many

89 views
Skip to first unread message

Philip Edelbrock

unread,
Aug 17, 2014, 9:27:02 PM8/17/14
to rubyonra...@googlegroups.com

This has been killing me this weekend. Doing a has_many fetch is accurate the *first* time only.

Here's a simple example:

tp = TestParent.new(:name => "testing")
tp.save!

tc = TestChild.new(:name => "test1", :test_parent_id => tp.id)
tc.save!
tc = TestChild.new(:name => "test2", :test_parent_id => tp.id)
tc.save!
tc = TestChild.new(:name => "test3", :test_parent_id => tp.id)
tc.save!
tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, and test3" *CORRECT*

tc = TestChild.new(:name => "test4", :test_parent_id => tp.id)
tc.save!
tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, and test3" *WRONG!*

tp = TestParent.find(tp.id)
tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, test3, and test4" *CORRECT*


In my case, we were preparing an order of line items (the children of the order) and then fetching the results to get a subtotal to add one last line item (think tax or such) but then the last line item is lost on the following receipt page, mailers, etc.

This is Rails 4.0.8, Pg 0.17.1, ruby 2.1.1p30 (2014-02-10 revision 44904) [x86_64-linux]. Models and migration are very, very basic in the test above:

---- migration
class HasManyTest < ActiveRecord::Migration

def self.up
execute %{

CREATE TABLE test_parents (
id serial primary key,
created_at timestamp ,
updated_at timestamp,
name varchar(255)
);

CREATE TABLE test_children (
id serial primary key,
created_at timestamp ,
updated_at timestamp,
name varchar(255),
test_parent_id int4 references test_parents(id)
);

}
end

def self.down
raise "No you can't!"
end

end
----

----
class TestParent < ActiveRecord::Base

has_many :test_children

end
----

----
class TestChild < ActiveRecord::Base

belongs_to :test_parent

end
----

Is this a known problem? It's hard to believe others haven't experienced this? :') Or is this an obscure result of some sort of caching or such I should be turning off?

Thanks!


Phil

Phil

unread,
Aug 17, 2014, 9:57:05 PM8/17/14
to rubyonra...@googlegroups.com

I hate to reply to myself, but I narrowed it down to Rails caching by DEFAULT of model queries.  It can be worked around by passing 'true', like this:


tc.name = "something different"

tc.save!

tp.test_children.map {|x| x.name }.to_sentence # => "test1, test2, test3, and test4" *WRONG*

tp.test_children(true).map {|x| x.name }.to_sentence # => "test1, test2, test3, and something different" *CORRECT*


Another workaround is to just stop using has_many and such in favor of manual functions, a la:

 def test_children

  return TestChild.where(["test_parent_id = ?", self.id])

 end


Is there a way to turn this sort of caching off globally?  (Other caching is fine, I don't want to turn all caching off.)


BTW- It is a bit mind blowing that this is turned on by default.  Possible data corruption shouldn't ever be preferred by default over (possible) speed gains.  I'd still categorize this as a serious bug, at least as a configuration default.


Thanks!


Phil

tamouse pontiki

unread,
Aug 18, 2014, 12:17:10 AM8/18/14
to rubyonra...@googlegroups.com
Try tp.reload after you make a change in the collection the way you have been. If you had been building your children entries as:

tp.children.create(name: "test1")

tp would be fully cognizant of all it's children.

Phil

unread,
Aug 18, 2014, 12:29:06 AM8/18/14
to rubyonra...@googlegroups.com

Thanks for the reply.  It still seems to be a systemic integrity problem.  But the additional info on another work around is welcome.


Phil

Frederick Cheung

unread,
Aug 18, 2014, 4:57:08 AM8/18/14
to rubyonra...@googlegroups.com

On Monday, August 18, 2014 2:57:05 AM UTC+1, Phil wrote:

Is there a way to turn this sort of caching off globally?  (Other caching is fine, I don't want to turn all caching off.)


BTW- It is a bit mind blowing that this is turned on by default.  Possible data corruption shouldn't ever be preferred by default over (possible) speed gains.  I'd still categorize this as a serious bug, at least as a configuration default.


Not that I know of. As of rails 4 (or is it 3.2?) the generated association accessors are defined in a module (rather than directly on the class so it's easy enough to override them so you could do

class TestParent
   has_many :test_children

   def test_childen(force_reload=true)
     super(force_reload)
   end
end 

To change it for all associations without having to do this on a per association basis would probably require some monkey patching inside the activerecord code - sounds brittle.

As tamouse says if you use build/create on the association then you won't see this behaviour, although there can still be differences between the array thus constructed in memory and the array you'd get if you had selected from the database (for example if you have an order clause on the association, custom select  etc.)

Rails has been like this at least since the 1.0 days - I can't say I've ever run into particular issues from it.

Fred

Jim

unread,
Aug 18, 2014, 8:25:58 AM8/18/14
to rubyonra...@googlegroups.com
It's not a systemic integrity problem, it is the way Rails has always worked.  Using tp.test_children.create() is not "another work-around", it is the recommended way of adding children to a parent model that you have already instantiated and has been available for as long as I can remember (at least since rails 2.0.x).  

Scenarios where you would actually need to re-query the database *every time you access a relation* are rare.  If you really need to, you have that option, but in no way should that be a default.

Jim

Matt Jones

unread,
Aug 18, 2014, 11:07:37 AM8/18/14
to rubyonra...@googlegroups.com
The cleanest way to deal with this is to *use* the association. Instead of:

tc = TestChild.new(:name => "test4", :test_parent_id => tp.id)

do:

tp.test_children.new(:name => "test4")

This will correctly reflect the new record when subsequently calling `tp.test_children`.

--Matt Jones

Phil

unread,
Aug 18, 2014, 8:15:37 PM8/18/14
to rubyonra...@googlegroups.com

Ah, yes, I see it now in activerecord-4.0.8/lib/active_record/associations/collection_association.rb

OK, makes sense.  I've love to have that be a config option I could throw in the environments config files to turn that off, if needed.


Rails has been like this at least since the 1.0 days - I can't say I've ever run into particular issues from it.

In my case it was rather nasty.  It went something like this:

@order = Order.new ...
... record line items..
@order.line_items... calculate subtotal to calculate tax and record that as the final line item
run charge on credit card, render receipt page, send mailers, etc.

The result was the customer got receipt page/emails and charged without tax!  Fulfillment and Accounting pulling up the order would confusingly get the right total and tax.

Thanks, I have a much better understanding of what's going on now.


Phil

Phil

unread,
Aug 18, 2014, 8:46:26 PM8/18/14
to rubyonra...@googlegroups.com
(I apologize in advance for top posting, but trying to keep this thread consistent.)

I would say it's a definite problem for us.  This project is an old legacy project that several people have been working on for 6+ years.  When I'm rewriting, say, the receipt page and mailer, how could I possibly know in advance that @order.line_items would give stale results without digging through the details of the controller?  If I have to do @order.line_items(true) (and with every other has_many, etc.) everywhere just to be sure, I'd rather just simply like to have the feature turned off globally.

While it can be 'rare' that somebody might run into this, I'd still argue that significant performance gains of caching associations like this is also rare/minimal (would be interesting to test).  I'd rather lean towards data accuracy than speed, at least in my case.

I'd simply like a simple config option to turn on/off this feature.  I'd also suggest that it could even be dangerous to have on by default if things have to be done in particular ways to get around the potential pit falls, but I'd just be happy to have a config option to turn it off.

Thanks for the reply!


Phil

Steve Robinson

unread,
Aug 19, 2014, 4:49:14 AM8/19/14
to rubyonra...@googlegroups.com


On Tuesday, August 19, 2014 6:16:26 AM UTC+5:30, Phil wrote:

While it can be 'rare' that somebody might run into this, I'd still argue that significant performance gains of caching associations like this is also rare/minimal (would be interesting to test).  I'd rather lean towards data accuracy than speed, at least in my case.

I'd simply like a simple config option to turn on/off this feature.  I'd also suggest that it could even be dangerous to have on by default if things have to be done in particular ways to get around the potential pit falls, but I'd just be happy to have a config option to turn it off.

Thanks for the reply!


Phil

On Monday, August 18, 2014 5:25:58 AM UTC-7, Jim wrote:
It's not a systemic integrity problem, it is the way Rails has always worked.  Using tp.test_children.create() is not "another work-around", it is the recommended way of adding children to a parent model that you have already instantiated and has been available for as long as I can remember (at least since rails 2.0.x).  

Scenarios where you would actually need to re-query the database *every time you access a relation* are rare.  If you really need to, you have that option, but in no way should that be a default.

Jim



Hi,

I disagree with you on your statement - "significant performance gains of caching associations like this is also rare/minimal".
Almost all the apps that I've built had pages where I've used the same association so many times for rendering the view and disabling caching in those cases would be an absolute performance disaster! Think of all the other problems it can cause... eager loading of associations? 

Well like Jim said its very rare that someone has to do what you're doing and even if they have to do it, Rails has a way of doing it properly :)

Thanks!

Steve
--
@stev4nity

Jason Fleetwood-Boldt

unread,
Aug 19, 2014, 10:41:03 AM8/19/14
to rubyonra...@googlegroups.com

I would recommend you start by getting all that domain logic out of the Controllers and into context or composite objects. Check out http://en.wikipedia.org/wiki/Data,_context_and_interaction or James Coplien's excellent book "Lean Architecture" in which he discusses DCI at length.

This doesn't actually solve the specific Rails-doesn't-reload-associations problem you're having, and in fact I've worked on an e-commerce codebase with the exact same problem you describe. We had several @order.reload calls because of the way Rails is implemented. 

Moving to a DCI pattern doesn't eliminate the Rails problem here, but it encapsulates the business logic into a single context object whose responsibility is to both do the operations and then tell the object to reload correctly, producing the expected results. You can test the context object independently of the controller, which is ideal.

I actually think there is significant performance gain from not reloading associations in general, and in e-commerce specific apps you simply need to know this about Rails and suck it up and do @order.reload when you make certain operations. Once you realize this is just how Rails works, and you move that domain logic into a place more appropriate than the controller, it starts to bother you less that you have to do it in the first place.

Just my two ¢

-Jason


--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rubyonrails-ta...@googlegroups.com.
To post to this group, send email to rubyonra...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/rubyonrails-talk/319cce88-0041-4a95-9062-1e0cdd80373d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Phil

unread,
Aug 19, 2014, 9:01:36 PM8/19/14
to rubyonra...@googlegroups.com


Thanks for the reply Steve!

I guess it's hard to say what the advantages are without testing... Assuming you are already indexing all of your id and *_id database columns and SQL query caching is on, I'm not sure what advantages association caching is actually providing.  A test might be to edit collection_association.rb from the gem manually to set force_refresh = true and do some benchmarking (which I'd like to do when I get some time).  Ironically, 37 Signals has some very good articles on how to do caching properly without worrying about getting stale data, this is a basic example:

https://signalvnoise.com/posts/3113-how-key-based-cache-expiration-works

I'm not sure what you mean by the 'rails way' in this case.  Usually the rails way means not having to worry about cache management by default.  In my case I have a mailer that simply looks like this:

def customer_receipt(order)

In the view it does @order.line_items and they are sometimes wrong.  This just feels like an ugly hack to try to make rails faster during the days when ruby was really, really, slow (thankfully the latest since 2.0.0 is MUCH faster).


Phil

Jim

unread,
Aug 20, 2014, 2:21:34 AM8/20/14
to rubyonra...@googlegroups.com

On Tuesday, August 19, 2014 9:01:36 PM UTC-4, Phil wrote:

I guess it's hard to say what the advantages are without testing... Assuming you are already indexing all of your id and *_id database columns and SQL query caching is on, I'm not sure what advantages association caching is actually providing.  A test might be to edit collection_association.rb from the gem manually to set force_refresh = true and do some benchmarking (which I'd like to do when I get some time). 

If your dataset is small and properly indexed, and you have very few related items, chances are you will see very little change in performance from reloading every time you access the order line items.  If you have several hundred line items, things may be a little different.  If your database is on a different machine than the web server, things may also be a little different.  
 
Ironically, 37 Signals has some very good articles on how to do caching properly without worrying about getting stale data, this is a basic example:

Ironically, this has nothing to do with what you are having problems with.  You will still have the "stale" object.  (Unless you are suggesting that Rails should check the updated_at for every ActiveRecord object every single time you access it. In that case, good luck. I doubt you will ever find any ORM so poorly written.)

I'm not sure what you mean by the 'rails way' in this case.  Usually the rails way means not having to worry about cache management by default.

I don't worry about cache management by default.  In addition, I don't have to worry about how to start using the database caching, and when I might need it.
 
  In my case I have a mailer that simply looks like this:

def customer_receipt(order)

In the view it does @order.line_items and they are sometimes wrong.  This just feels like an ugly hack to try to make rails faster during the days when ruby was really, really, slow (thankfully the latest since 2.0.0 is MUCH faster).

If your order line items are wrong, that is because somewhere in the code, somebody made some incorrect assumptions and wrote some incorrect code, examples of which have already been pointed out.  Thankfully, the goal of the framework is *not* to make sure that can never happen by default, with the side effect that you may need to make drastic changes to the code if you ever want to start caching.  The framework does provide multiple ways to create related many records in a way that updates the relation in the parent object.

In your case, you can either find that problem and fix it, or simply force reload the items in the mailer you are working on.  To complain about bad code as a "Rails Problem" is not useful.  Bad code can be written in any language, using any framework.

To be honest, I have written code that requires a reload where certain functionality is invoked.  I have a callback on deleting a related many that modifies a value in the parent.  However, that code really should be refactored such that the children are not destroyed directly, but instead a method on the parent should destroy the child. 

Jim

Phil

unread,
Aug 20, 2014, 11:37:03 PM8/20/14
to rubyonra...@googlegroups.com


On Tuesday, August 19, 2014 11:21:34 PM UTC-7, Jim wrote:

On Tuesday, August 19, 2014 9:01:36 PM UTC-4, Phil wrote:

I guess it's hard to say what the advantages are without testing... Assuming you are already indexing all of your id and *_id database columns and SQL query caching is on, I'm not sure what advantages association caching is actually providing.  A test might be to edit collection_association.rb from the gem manually to set force_refresh = true and do some benchmarking (which I'd like to do when I get some time). 

If your dataset is small and properly indexed, and you have very few related items, chances are you will see very little change in performance from reloading every time you access the order line items.  If you have several hundred line items, things may be a little different.  If your database is on a different machine than the web server, things may also be a little different.  

Yes, you are probably right(?).
 
 
Ironically, 37 Signals has some very good articles on how to do caching properly without worrying about getting stale data, this is a basic example:

Ironically, this has nothing to do with what you are having problems with.  You will still have the "stale" object.  (Unless you are suggesting that Rails should check the updated_at for every ActiveRecord object every single time you access it. In that case, good luck. I doubt you will ever find any ORM so poorly written.)

No no, I mean there's already a layer of SQL caching that takes care of repeated identical selects between updates and inserts.  Is there not?  (I think there is?)  Association caching seems unneeded(?) or at least a lot less intelligent than the existing SQL query caching.
 

I'm not sure what you mean by the 'rails way' in this case.  Usually the rails way means not having to worry about cache management by default.

I don't worry about cache management by default.  In addition, I don't have to worry about how to start using the database caching, and when I might need it.
 
  In my case I have a mailer that simply looks like this:

def customer_receipt(order)

In the view it does @order.line_items and they are sometimes wrong.  This just feels like an ugly hack to try to make rails faster during the days when ruby was really, really, slow (thankfully the latest since 2.0.0 is MUCH faster).

If your order line items are wrong, that is because somewhere in the code, somebody made some incorrect assumptions and wrote some incorrect code, examples of which have already been pointed out.  Thankfully, the goal of the framework is *not* to make sure that can never happen by default, with the side effect that you may need to make drastic changes to the code if you ever want to start caching.  The framework does provide multiple ways to create related many records in a way that updates the relation in the parent object.

Before blaming 'incorrect code' or bad programmers, look back again at my example.  It's quite trivial and reproducible.  Saves order line items, in what I would consider a completely sane and logical way, call back line items to compute tax, add the last line item (the tax), and then throws the @order to payment processing, mailers, receipt page... where totals are wrong and tax is missing.  I understand that doing things differently avoid the problem and I have used them to fix this issue on my end, but I still am quite skeptical as to how a programmer is supposed to know about these pitfalls.


In your case, you can either find that problem and fix it, or simply force reload the items in the mailer you are working on.  To complain about bad code as a "Rails Problem" is not useful.  Bad code can be written in any language, using any framework.

Oh, absolutely.  I'm not just venting to vent, I have a few (what I think are constructive) questions:

- How can one turn off 'association caching' (for lack of a better name) without having to edit the .rb file in the gem (the answer appears to be you can't?)

- If not, can we simply add a simple config switch (e.g. config.active_record... that goes in config/environments/*) to Rails to turn this feature off/on easier in the future? (Not just for my sake, I mean as a general improvement.)

- If this association caching was added as a hack to improve performance back in the day, is it really still needed?  Perhaps it can be removed or turned off by default?  Perhaps it was added before SQL caching and it is not needed(???)
 

To be honest, I have written code that requires a reload where certain functionality is invoked.  I have a callback on deleting a related many that modifies a value in the parent.  However, that code really should be refactored such that the children are not destroyed directly, but instead a method on the parent should destroy the child. 


It would be interesting if has_many and such could throw a callback on the child model(s) to know to update automatically.  Again, if this is already handled by the SQL caching layer, perhaps it's all moot to even have caching at the association layer.  It makes sense that Rails can't know that changes are being make to the database outside of the scope of Rails (e.g. somebody on a SQL console), but it seems like it could be cleaner about knowing what it is doing within its self.

My apologies if I sounded a bit frustrated before.  Now that I understand what's happening and how to avoid the problem, I am back in my zen state. ;')  Thanks again for your previous thoughts and insight.


Phil

Jim

unread,
Aug 21, 2014, 12:50:06 AM8/21/14
to rubyonra...@googlegroups.com
On Wednesday, August 20, 2014 11:37:03 PM UTC-4, Phil wrote:
 
Before blaming 'incorrect code' or bad programmers, look back again at my example.  It's quite trivial and reproducible.  Saves order line items, in what I would consider a completely sane and logical way, call back line items to compute tax, add the last line item (the tax), and then throws the @order to payment processing, mailers, receipt page... where totals are wrong and tax is missing.  I understand that doing things differently avoid the problem and I have used them to fix this issue on my end, but I still am quite skeptical as to how a programmer is supposed to know about these pitfalls.

It sounds like you are thinking in terms of the database, while ActiveRecord is an ORM.  You need to think of it in terms of objects which happen to be persisted in a database.   When you create a line item record in the database, your @order object does not know about it, so you have to tell it to reload.  The only other ORM I've used is Apple's CoreData, and the same thing applies.  If you modify the database without using the object in question, the object will not be updated until you reload it.  

Jim

Reply all
Reply to author
Forward
0 new messages