Excessive redundant object allocation in AR

Excessive redundant object allocation in AR Masterleep 9/11/12 2:22 PM
I've been looking into memory behavior in Rails workers.  One thing I've noticed is that it's easy to instantiate multiple tens of thousands of objects on the Ruby heap even with find_each operating in batches of 1000.  Most of these objects appear to be highly redundant.

Consider loading 1000 instances of an AR object MyClass which has 20 database fields.  There will be at least 20 x 1000 strings allocated, as measured by GC.start; ObjectSpace.count_objects[:T_STRING].  Digging deeper, it looks like each instance has an internal attributes hash in an instance variable.  The first key is typically the string "id".  Each "id" string is an individual object, as determined by object_id, even though all of these strings are frozen for use as hash keys.
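
A rough sketch of the measurement described above, outside of Rails. The column names and helper methods are made up for illustration, and the number printed will vary by Ruby version (newer interpreters may deduplicate string hash keys on their own), so treat it as a way to look, not as a benchmark.

```ruby
# Hypothetical stand-in for the 20 column names of MyClass.
COLUMN_NAMES = (1..20).map { |i| "field_#{i}" }

# Number of String slots in use on the heap after a full GC.
def live_string_count
  GC.start
  ObjectSpace.count_objects[:T_STRING]
end

# Runs the block, keeps its result alive, and reports how many
# additional strings exist afterwards.
def strings_retained_by
  before = live_string_count
  retained = yield
  after = live_string_count
  [after - before, retained]
end

# Build 1000 attribute-style hashes keyed by unfrozen copies of the names,
# roughly what one find_each batch of 1000 records would hold.
extra, rows = strings_retained_by do
  Array.new(1000) do
    COLUMN_NAMES.each_with_object({}) { |name, attrs| attrs[name.dup] = nil }
  end
end

puts "rows: #{rows.size}, extra strings: #{extra}"
```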

Would it be possible to take advantage of the very large amount of duplication in the keys of this hash to save thousands of unnecessary objects from being allocated every time a bulk query is run?  Maybe something like a StringPool, or getting the column name directly from a lower layer, or using symbols would work.

There are also a bunch of empty hashes which could probably be shared in a copy on write style.  Right now it looks like six Hashes per AR instance with 4 of them empty initially in a typical find query.
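
To make the copy-on-write idea concrete, here is a minimal sketch. The class, constant and method names are hypothetical and are not ActiveRecord internals; it only shows one shared frozen empty hash being swapped for a private one on first write.

```ruby
# Hypothetical record class; names are illustrative, not ActiveRecord's.
class LazyRecord
  EMPTY = {}.freeze # one shared, immutable empty hash for every instance

  def initialize
    @changes = EMPTY # no per-instance Hash is allocated here
  end

  attr_reader :changes

  def record_change(name, value)
    # Copy on write: replace the shared hash with a private one
    # the first time this instance needs to store something.
    @changes = {} if @changes.equal?(EMPTY)
    @changes[name] = value
  end
end

a = LazyRecord.new
b = LazyRecord.new
a.record_change(:title, "new")
puts a.changes.inspect
puts b.changes.equal?(LazyRecord::EMPTY)
```

The cost is an identity check on every write, in exchange for not allocating hashes that most instances never fill.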

Thanks for any thoughts!
Re: [Rails-core] Excessive redundant object allocation in AR Steve Klabnik 9/11/12 6:01 PM
Basically every Rails request allocates a zillion objects. I'm sure
there's tons of work that could be done here.
Re: Excessive redundant object allocation in AR Gary Weaver 9/12/12 7:33 AM
On Tuesday, September 11, 2012 5:22:10 PM UTC-4, Masterleep wrote:
Maybe something like a StringPool

That's a big one, and it would be something that needs to be addressed in Ruby, not in Rails. But the problem is that you would have unintuitive behavior for those used to doing things like:

s = 'Error'
s.chomp!('or')

In today's Ruby and jruby-1.7.0.preview2:

$ irb
jruby-1.7.0.preview2 :001 > "Error".object_id
 => 2042
jruby-1.7.0.preview2 :002 > "Error".object_id
 => 2044
jruby-1.7.0.preview2 :003 > "Error".chomp!('or').object_id
 => 2046
jruby-1.7.0.preview2 :004 > s = "Error"
 => "Error"
jruby-1.7.0.preview2 :005 > s.object_id
 => 2048
jruby-1.7.0.preview2 :006 > s.chomp!('or')
 => "Err"
jruby-1.7.0.preview2 :007 > s.object_id
 => 2048

See, when you are just working with strings willy nilly, it creates new instances and you don't have to worry about things like the "bang" methods altering the same object.

In a StringPool'd ruby, the bang methods would need to return a string that was the same object_id so that past implementations that depend on object equivalence would still work, but it could not alter the "Error" string in the StringPool or things would go terribly wrong.
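
The hazard can be sketched in plain Ruby, assuming a pool simply hands the same String object to more than one caller (the variable names here are illustrative only):

```ruby
# Hypothetical illustration: one mutable string object shared by two holders.
pooled = String.new("Error")
first_user = pooled
second_user = pooled

first_user.chomp!("or") # mutates the shared object in place
puts second_user        # the other holder sees the change as well

# Freezing the pooled string protects it, but bang methods then raise.
frozen = String.new("Error").freeze
begin
  frozen.chomp!("or")
rescue RuntimeError => e # FrozenError is a RuntimeError subclass on newer Rubies
  puts "refused: #{e.class}"
end
```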

Feel free to take this up on the ruby list, and post back the link. I'm sure that those guys could figure out a way to make it work if they've not already discussed it, but my guess is it would be a breaking major change, even if it is necessary to reduce # of objects and make things faster.
Re: Excessive redundant object allocation in AR Gary Weaver 9/12/12 8:00 AM
Something Ruby-ish that would work instead of a StringPool is the use of symbols. Symbols are Ruby's answer to the StringPool. If things are stored as symbols, you can work with them much as you would expect and reduce the # of objects, e.g.

jruby-1.7.0.preview2 :008 > :error.object_id
 => 2050
jruby-1.7.0.preview2 :009 > :error.object_id
 => 2050
jruby-1.7.0.preview2 :010 > :error.to_s.chomp!('or').to_sym
 => :err
jruby-1.7.0.preview2 :011 > :error.to_s.chomp!('or').to_sym.object_id
 => 2052
jruby-1.7.0.preview2 :012 > :error.to_s.chomp!('or').to_sym.object_id
 => 2052

So basically, a good goal would be for everywhere in the Rails documentation that refers to strings to specify symbols instead, and for any method that doesn't support symbols to gain that support:
http://guides.rubyonrails.org

But still, whenever you output a string to a log, it becomes a string. So, you might be able to make some inroads by changes to Rails and related documentation, but if Ruby "fixed it" instead via something like StringPool (again- a major and breaking change), then you wouldn't have to worry about wasting all that time on the Rails side.

In addition, many text editors and IDEs have different colors for Strings, so that keys and values stand out better in examples like:

class Employee < ActiveRecord::Base
  has_many :subordinates, :class_name => "Employee",
    :foreign_key => "manager_id"
  belongs_to :manager, :class_name => "Employee"
end

So, if you switch to all symbols, it is a little more monotone, colorwise. However, if you switch to Ruby 1.9 key/value then you could color the key in a: :b differently by the fact that it ends in a colon vs. starting with one. Unfortunately, the existing default color schemes don't usually do that.
Re: [Rails-core] Re: Excessive redundant object allocation in AR richard schneeman 9/12/12 8:06 AM
Symbols are never garbage collected in Ruby.


-- 
Richard Schneeman

Re: Excessive redundant object allocation in AR Gary Weaver 9/12/12 8:09 AM
On Wednesday, September 12, 2012 11:00:08 AM UTC-4, Gary Weaver wrote:
Symbols are Ruby's answer to the StringPool.

btw- I shouldn't have said it like that. That makes it sound like symbols were invented as a reaction to Java's StringPool. I just said this because the way you can continuously refer to a symbol and get the same object_id is similar to referring to a string that has been stored/retrieved from StringPool in Java.
Re: [Rails-core] Re: Excessive redundant object allocation in AR Gary Weaver 9/12/12 8:24 AM
On Wednesday, September 12, 2012 11:06:43 AM UTC-4, richard schneeman wrote:
Symbols are never garbage collected in Ruby.

Good point. However, for Rails, I'd think you'd still use less memory if symbols were just used for class, controller, model, field names in views, etc.

Even if Rails had to do handfuls (or 100s) of symbol -> string -> some change to string -> to_sym'ing during startup, the memory consumption would very likely be less than not doing it.

You wouldn't want to store every value retrieved from a database as a symbol obviously, nor store all values in incoming request params as symbols, and if things in Rails are doing regexp's on something, it wouldn't make sense to constantly be to_s'ing (in one way or another) to operate on them.

There is a balance between needing to garbage collect and needing to keep too many objects from being instantiated, even if they are GC'd. But you are right- the Java StringPool would GC something that was no longer referenced, I believe, and if symbols are used for large varying strings, that's a memory leak, but that's not what I'm talking about.
Re: [Rails-core] Re: Excessive redundant object allocation in AR Gary Weaver 9/12/12 8:27 AM
On Wednesday, September 12, 2012 11:24:26 AM UTC-4, Gary Weaver wrote:
 if symbols are used for large varying strings, that's a memory leak, but that's not what I'm talking about.

Sorry, I meant if a large number of varying strings were symbols, that could be like a memory leak because of the lack of GC. (Just reword what I say to make sense. I need some caffeine.)
Re: Excessive redundant object allocation in AR Masterleep 9/12/12 9:29 AM


On Wednesday, September 12, 2012 7:33:50 AM UTC-7, Gary Weaver wrote:
On Tuesday, September 11, 2012 5:22:10 PM UTC-4, Masterleep wrote:
Maybe something like a StringPool

That's a big one, and it would be something that needs to be addressed in Ruby, not in Rails. But the problem is that you would have unintuitive behavior for those used to doing things like:


It could be implemented in Rails by using a container class to hold the database field names that are used as the keys inside the AR @attributes hash, and reusing the same string object across instances. Those strings are frozen anyway so the concern about modification doesn't apply.  Based on the ObjectSpace data, that one change would have a large impact on the number of allocated subobjects for each AR model instance.
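
A minimal sketch of such a container, under the assumption that each row arrives with fresh copies of the column names. `ColumnNamePool` and `fetch` are hypothetical names, not existing Rails classes or methods.

```ruby
# Hypothetical pool; not an ActiveRecord class.
class ColumnNamePool
  def initialize
    @names = {}
  end

  # Returns one canonical frozen String per distinct column name.
  def fetch(name)
    @names.fetch(name) do
      canonical = name.dup.freeze
      @names[canonical] = canonical # frozen keys are stored without copying
    end
  end
end

pool = ColumnNamePool.new

rows = Array.new(3) do |i|
  # Simulate a driver returning new, unfrozen key strings for every row.
  raw = { String.new("id") => i, String.new("name") => "row #{i}" }
  raw.each_with_object({}) { |(key, value), attrs| attrs[pool.fetch(key)] = value }
end

distinct_id_keys = rows.map { |attrs| attrs.keys.first.object_id }.uniq.size
puts "distinct \"id\" key objects across rows: #{distinct_id_keys}"
```

Because the pooled strings are frozen, Hash stores them as-is instead of making a copy per row.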
Re: [Rails-core] Re: Excessive redundant object allocation in AR Jon Leighton 9/12/12 9:36 AM
On 12/09/12 17:29, Masterleep wrote:
> It could be implemented in Rails by using a container class to hold the
> database field names that are used as the keys inside the AR @attributes
> hash, and reusing the same string object across instances. Those strings
> are frozen anyway so the concern about modification doesn't apply.
>  Based on the ObjectSpace data, that one change would have a large
> impact on the number of allocated subobjects for each AR model instance.

To be honest I think we should just change @attributes to be keyed by
symbols. I don't see that there is a DoS vector in doing this since the
keys aren't going to come from user input (however, I do need to think
about that a bit more before I say so confidently).

I changed @attributes_cache to be keyed by symbols recently which led
to a nice speed up in attribute access (before then we were creating a
new string every time you call an attribute method).

It should be noted that these things could theoretically be optimised at
the implementation level. I did some benchmarking a while back and there
was no difference between using symbols and strings in @attributes on
JRuby. However on a practical level, I think we should change it.

I'm interested to hear what Mr T. Love thinks.

--
http://jonathanleighton.com/
Re: [Rails-core] Re: Excessive redundant object allocation in AR Masterleep 9/12/12 9:45 AM


On Wednesday, September 12, 2012 9:36:29 AM UTC-7, Jon Leighton wrote:
On 12/09/12 17:29, Masterleep wrote:
> It could be implemented in Rails by using a container class to hold the
> database field names that are used as the keys inside the AR @attributes
> hash, and reusing the same string object across instances.

To be honest I think we should just change @attributes to be keyed by
symbols. I don't see that there is a DoS vector in doing this since the
keys aren't going to come from user input (however, I do need to think
about that a bit more before I say so confidently).


That would be even better, if it's not too hard to change to symbols. 
Re: Excessive redundant object allocation in AR Masterleep 9/13/12 12:25 PM
I added https://github.com/rails/rails/issues/7629 on this subject.
Re: [Rails-core] Excessive redundant object allocation in AR Steve Klabnik 9/13/12 12:39 PM
Adding issues does not help, and only creates noise on the tracker.

Specifics about WHAT is causing over-allocation or HOW to fix it may be valid. But an open issue for 'tons 'o objects' helps nobody and is not productive. It is far too general.
Re: [Rails-core] Excessive redundant object allocation in AR Masterleep 9/13/12 12:43 PM
The bug report specifies what is causing the over-allocation and how to fix it.  It's pretty specific.
Re: [Rails-core] Excessive redundant object allocation in AR Jeremy Evans 9/13/12 2:16 PM
On Thu, Sep 13, 2012 at 12:43 PM, Masterleep <bill...@lipa.name> wrote:
> The bug report specifies what is causing the over-allocation and how to fix
> it.  It's pretty specific.

It's pretty specific in terms of what the symptom is, but it's not at
all descriptive of what the actual cause is and how to fix it.

In this case, the cause is that ActiveRecord is using unfrozen strings
as keys.  When you use an unfrozen string as a hash key, ruby dups it,
freezes the dup, and uses the frozen dup as the hash key. The simple
fix to reduce the number of allocated strings from columns * (rows +
1) to just columns is to freeze the columns before using them as hash
keys.
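
The behaviour is easy to check in plain Ruby. This is only an illustration with a made-up column name, not the code from the pull request:

```ruby
# An unfrozen key, as a database driver might hand it over.
column = String.new("id")

attrs = {}
attrs[column] = 1
stored = attrs.keys.first
puts stored.frozen?        # the key held by the hash is frozen
puts stored.equal?(column) # and it is not the original object

# A key that is already frozen is stored as-is, so many hashes share it.
frozen_column = String.new("id").freeze
rows = Array.new(1000) do
  row = {}
  row[frozen_column] = 1
  row
end
shared = rows.all? { |row| row.keys.first.equal?(frozen_column) }
puts shared
```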

Pull request filed: https://github.com/rails/rails/pull/7631

Jeremy
Re: [Rails-core] Excessive redundant object allocation in AR Masterleep 9/13/12 2:38 PM


On Thursday, September 13, 2012 2:16:51 PM UTC-7, Jeremy Evans wrote:

Pull request filed: https://github.com/rails/rails/pull/7631




Excellent!  I verified that your fix did eliminate the redundancy on these field name strings in the case I was studying (from 15 extra strings per instance down to 2, where the 2 were attribute values, not field names).

Re: [Rails-core] Excessive redundant object allocation in AR Gary Weaver 9/14/12 6:03 AM
On Thursday, September 13, 2012 5:38:57 PM UTC-4, Masterleep wrote:
Excellent!  I verified that your fix did eliminate the redundancy on these field name strings in the case I was studying (from 15 extra strings per instance down to 2, where the 2 were attribute values, not field names).

Attribute values would be a good case for the StringPool I guess, even though I still think that is something that should be introduced in Ruby, not Rails. Because a string's bang methods alter the object itself, a lot of existing user code assumes object_id equivalence between a string and the object produced by one of that string's bang methods, so it would be a major change. I know you wanted to focus on AR, but if you only focused on AR attribute values and just had a StringPool for them, then AR attribute values would be object equivalent and have the same string bang method weirdness, while other strings wouldn't act that way, and this would be much more evil than doing it in Ruby.
Re: [Rails-core] Excessive redundant object allocation in AR Gary Weaver 9/25/12 2:53 PM

Take a look at these:

Bartosz Dziewoński wrote in post #1077524:
> http://patshaughnessy.net/2012/1/4/never-create-ruby-strings-longer-than-23-characters
> http://patshaughnessy.net/2012/1/18/seeing-double-how-ruby-shares-string-values

That probably doesn't help because the Ruby optimization happens for strings that are longer than 23 chars, and I guess that most attribute names are shorter (and many attribute values may be also).