Re: [Rails-core] Excessive redundant object allocation in AR


Steve Klabnik

Sep 11, 2012, 9:01:39 PM
to rubyonra...@googlegroups.com
Basically every Rails request allocates a zillion objects. I'm sure
there's tons of work that could be done here.

Gary Weaver

Sep 12, 2012, 10:33:50 AM
to rubyonra...@googlegroups.com
On Tuesday, September 11, 2012 5:22:10 PM UTC-4, Masterleep wrote:
> Maybe something like a StringPool

That's a big one, and it would be something that needs to be addressed in Ruby, not in Rails. But the problem is that you would have unintuitive behavior for those used to doing things like:

s = 'Error'
s.chomp!('or')

In today's Ruby and jruby-1.7.0.preview2:

$ irb
jruby-1.7.0.preview2 :001 > "Error".object_id
 => 2042
jruby-1.7.0.preview2 :002 > "Error".object_id
 => 2044
jruby-1.7.0.preview2 :003 > "Error".chomp!('or').object_id
 => 2046
jruby-1.7.0.preview2 :004 > s = "Error"
 => "Error"
jruby-1.7.0.preview2 :005 > s.object_id
 => 2048
jruby-1.7.0.preview2 :006 > s.chomp!('or')
 => "Err"
jruby-1.7.0.preview2 :007 > s.object_id
 => 2048

See, every time you write a string literal, Ruby creates a new instance, so you don't have to worry about the "bang" methods altering an object that some other code is also holding.

In a StringPool'd Ruby, the bang methods would need to return a string with the same object_id so that existing code that depends on object equivalence would still work, but they could not alter the "Error" string in the StringPool or things would go terribly wrong.
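To make that tension concrete, here is a minimal sketch. It assumes a pooled string behaves like a single shared, frozen object; the names fresh_error, mutate_one_of_two, POOLED_ERROR and mutate_pooled are made up for illustration and are not part of Ruby or Rails.

```ruby
# Builds a brand-new, unfrozen String on every call, the way a plain string
# literal did in the irb session above.
def fresh_error
  String.new("Error")
end

# Two fresh strings have equal content but are separate objects, so a bang
# method on one cannot affect the other.
def mutate_one_of_two
  a = fresh_error
  b = fresh_error
  a.chomp!("or")            # mutates a in place
  [a, b, a.equal?(b)]
end

# Hypothetical pooled string: one shared, frozen object.
POOLED_ERROR = "Error".freeze

# A bang method cannot alter the shared object; Ruby raises instead
# (FrozenError on newer Rubies, which is a RuntimeError subclass).
def mutate_pooled
  POOLED_ERROR.chomp!("or")
rescue RuntimeError => e
  e.class
end
```

So under this assumption the pooled string stays intact, but code written as s.chomp!('or') stops working, which is the breaking change in question.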

Feel free to take this up on the ruby list, and post back the link. I'm sure that those guys could figure out a way to make it work if they've not already discussed it, but my guess is it would be a breaking major change, even if it is necessary to reduce # of objects and make things faster.

Gary Weaver

Sep 12, 2012, 11:00:08 AM
to rubyonra...@googlegroups.com
Something Ruby-ish that would work instead of a StringPool is the use of symbols. Symbols are Ruby's answer to the StringPool. If things are stored as symbols, you can work with them much as you would expect and reduce the # of objects, e.g.

jruby-1.7.0.preview2 :008 > :error.object_id
 => 2050
jruby-1.7.0.preview2 :009 > :error.object_id
 => 2050
jruby-1.7.0.preview2 :010 > :error.to_s.chomp!('or').to_sym
 => :err
jruby-1.7.0.preview2 :011 > :error.to_s.chomp!('or').to_sym.object_id
 => 2052
jruby-1.7.0.preview2 :012 > :error.to_s.chomp!('or').to_sym.object_id
 => 2052

So basically, a good goal would be for the Rails documentation to specify symbols everywhere it currently refers to strings, and for any method that doesn't support symbols to be changed so that it does:
http://guides.rubyonrails.org

But still, whenever you output a symbol to a log, it becomes a string. So, you might be able to make some inroads by changes to Rails and related documentation, but if Ruby "fixed it" instead via something like StringPool (again, a major and breaking change), then you wouldn't have to worry about wasting all that time on the Rails side.

In addition, many text editors and IDEs have different colors for Strings, so that keys and values stand out better in examples like:

class Employee < ActiveRecord::Base
  has_many :subordinates, :class_name => "Employee",
    :foreign_key => "manager_id"
  belongs_to :manager, :class_name => "Employee"
end

So, if you switch to all symbols, it is a little more monotone, colorwise. However, if you switch to Ruby 1.9 key/value then you could color the key in a: :b differently by the fact that it ends in a colon vs. starting with one. Unfortunately, the existing default color schemes don't usually do that.

Richard Schneeman

Sep 12, 2012, 11:06:34 AM
to rubyonra...@googlegroups.com
Symbols are never garbage collected in Ruby.


-- 
Richard Schneeman
--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Core" group.
To view this discussion on the web visit https://groups.google.com/d/msg/rubyonrails-core/-/W-QsFXyc4cwJ.
To post to this group, send email to rubyonra...@googlegroups.com.
To unsubscribe from this group, send email to rubyonrails-co...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyonrails-core?hl=en.

Gary Weaver

Sep 12, 2012, 11:09:42 AM
to rubyonra...@googlegroups.com
On Wednesday, September 12, 2012 11:00:08 AM UTC-4, Gary Weaver wrote:
> Symbols are Ruby's answer to the StringPool.

btw- I shouldn't have said it like that. That makes it sound like symbols were invented as a reaction to Java's StringPool. I just said this because the way you can continuously refer to a symbol and get the same object_id is similar to referring to a string that has been stored/retrieved from StringPool in Java.

Gary Weaver

Sep 12, 2012, 11:24:26 AM
to rubyonra...@googlegroups.com
On Wednesday, September 12, 2012 11:06:43 AM UTC-4, richard schneeman wrote:
> Symbols are never garbage collected in Ruby.

Good point. However, for Rails, I'd think you'd still use less memory if symbols were just used for class, controller, model, field names in views, etc.

Even if Rails had to do handfuls (or 100s) of symbol -> string -> some change to string -> to_sym'ing during startup, the memory consumption would very likely be less than not doing it.

You wouldn't want to store every value retrieved from a database as a symbol obviously, nor store all values in incoming request params as symbols, and if things in Rails are doing regexp's on something, it wouldn't make sense to constantly be to_s'ing (in one way or another) to operate on them.

There is a balance between needing to garbage collect and needing to keep too many objects from being instantiated, even if they are GC'd. But you are right: the Java StringPool would GC something that was no longer referenced, I believe, and if symbols are used for large varying strings, that's a memory leak, but that's not what I'm talking about.

Gary Weaver

Sep 12, 2012, 11:27:33 AM
to rubyonra...@googlegroups.com
On Wednesday, September 12, 2012 11:24:26 AM UTC-4, Gary Weaver wrote:
> if symbols are used for large varying strings, that's a memory leak, but that's not what I'm talking about.

Sorry, I meant if a large number of varying strings were symbols, that could be like a memory leak because of the lack of GC. (Just reword what I say to make sense. I need some caffeine.)

Masterleep

Sep 12, 2012, 12:29:09 PM
to rubyonra...@googlegroups.com


On Wednesday, September 12, 2012 7:33:50 AM UTC-7, Gary Weaver wrote:
> On Tuesday, September 11, 2012 5:22:10 PM UTC-4, Masterleep wrote:
>> Maybe something like a StringPool
>
> That's a big one, and it would be something that needs to be addressed in Ruby, not in Rails. But the problem is that you would have unintuitive behavior for those used to doing things like:


It could be implemented in Rails by using a container class to hold the database field names that are used as the keys inside the AR @attributes hash, and reusing the same string object across instances. Those strings are frozen anyway so the concern about modification doesn't apply.  Based on the ObjectSpace data, that one change would have a large impact on the number of allocated subobjects for each AR model instance.
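A rough sketch of what such a container could look like follows. FieldNamePool and its fetch method are hypothetical names made up for illustration; this is not ActiveRecord code, only the idea of handing out one shared frozen string per field name.

```ruby
# Hypothetical container class; not part of ActiveRecord.
class FieldNamePool
  def initialize
    @names = {}
  end

  # Returns the one shared, frozen String for a given field name,
  # creating it the first time that name is seen.
  def fetch(name)
    @names.fetch(name) do
      shared = name.dup.freeze
      @names[shared] = shared   # frozen key is stored as-is, no extra copy
    end
  end
end

# Builds an attributes-style hash whose keys come from the pool.
# raw_pairs is an array of [field_name, value] pairs.
def pooled_attributes(pool, raw_pairs)
  raw_pairs.each_with_object({}) do |(name, value), attrs|
    attrs[pool.fetch(name)] = value
  end
end
```

With one pool shared across instances, two hashes built from separately allocated "manager_id" strings end up keyed by the very same String object.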

Jon Leighton

Sep 12, 2012, 12:36:16 PM
to rubyonra...@googlegroups.com
On 12/09/12 17:29, Masterleep wrote:
> It could be implemented in Rails by using a container class to hold the
> database field names that are used as the keys inside the AR @attributes
> hash, and reusing the same string object across instances. Those strings
> are frozen anyway so the concern about modification doesn't apply.
> Based on the ObjectSpace data, that one change would have a large
> impact on the number of allocated subobjects for each AR model instance.

To be honest I think we should just change @attributes to be keyed by
symbols. I don't see that there is a DoS vector in doing this since the
keys aren't going to come from user input (however, I do need to think
about that a bit more before I say so confidently).

I changed @attributes_cache to be keyed by symbols recently which led
to a nice speed up in attribute access (before then we were creating a
new string every time you call an attribute method).
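As a rough illustration of the symbol-keyed approach (a sketch only: CachedRecord and attribute_readers are hypothetical names, not the actual ActiveRecord implementation):

```ruby
# Hypothetical stand-in for a model; not ActiveRecord's real code.
class CachedRecord
  def initialize(values)
    # Normalize the keys to symbols once, at construction time.
    @attributes_cache = values.each_with_object({}) do |(key, value), cache|
      cache[key.to_sym] = value
    end
  end

  # Generates one reader per attribute. Each reader closes over a Symbol,
  # so calling it looks up the cache without building a new String.
  def self.attribute_readers(*names)
    names.each do |name|
      key = name.to_sym
      define_method(key) { @attributes_cache[key] }
    end
  end

  attribute_readers :name, :manager_id
end
```

Usage would look like CachedRecord.new("name" => "Ann", "manager_id" => 7).name, with the string-to-symbol conversion paid once per record rather than on every attribute call.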

It should be noted that these things could theoretically be optimised at
the implementation level. I did some benchmarking a while back and there
was no difference between using symbols and strings in @attributes on
JRuby. However on a practical level, I think we should change it.

I'm interested to hear what Mr T. Love thinks.

--
http://jonathanleighton.com/

Masterleep

Sep 12, 2012, 12:45:46 PM
to rubyonra...@googlegroups.com


On Wednesday, September 12, 2012 9:36:29 AM UTC-7, Jon Leighton wrote:
> On 12/09/12 17:29, Masterleep wrote:
>> It could be implemented in Rails by using a container class to hold the
>> database field names that are used as the keys inside the AR @attributes
>> hash, and reusing the same string object across instances.
>
> To be honest I think we should just change @attributes to be keyed by
> symbols. I don't see that there is a DoS vector in doing this since the
> keys aren't going to come from user input (however, I do need to think
> about that a bit more before I say so confidently).


That would be even better, if it's not too hard to change to symbols. 

Masterleep

Sep 13, 2012, 3:25:28 PM
to rubyonra...@googlegroups.com

Steve Klabnik

Sep 13, 2012, 3:39:13 PM
to rubyonra...@googlegroups.com
Adding issues does not help, and only creates noise on the tracker.

Specifics about WHAT is causing over-allocation or HOW to fix it may be valid. But an open issue for 'tons o' objects' helps nobody and is not productive. It is far too general.

Masterleep

Sep 13, 2012, 3:43:21 PM
to rubyonra...@googlegroups.com
The bug report specifies what is causing the over-allocation and how to fix it.  It's pretty specific.

Jeremy Evans

Sep 13, 2012, 5:16:45 PM
to rubyonra...@googlegroups.com
On Thu, Sep 13, 2012 at 12:43 PM, Masterleep <bill...@lipa.name> wrote:
> The bug report specifies what is causing the over-allocation and how to fix
> it. It's pretty specific.

It's pretty specific in terms of what the problem is, but it's not at
all descriptive of what the actual cause is and how to fix it.

In this case, the cause is that ActiveRecord is using unfrozen strings
as keys. When you use an unfrozen string as a hash key, ruby dups it,
freezes the dup, and uses the frozen dup as the hash key. The simple
fix to reduce the number of allocated strings from columns * (rows +
1) to just columns is to freeze the columns before using them as hash
keys.
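The dup-and-freeze behaviour is easy to check in plain Ruby. The sketch below is not the code from the pull request; raw_columns, build_rows and keys_reuse_columns? are hypothetical helpers that only mimic building one attributes hash per row.

```ruby
# Column names as a hypothetical database driver might hand them back:
# ordinary, unfrozen strings.
def raw_columns
  [String.new("id"), String.new("manager_id")]
end

# Builds one attributes-style hash per row, using the given column
# objects as the hash keys.
def build_rows(columns, row_count)
  Array.new(row_count) do
    columns.each_with_object({}) { |col, attrs| attrs[col] = nil }
  end
end

# True when every row's keys are the very same objects as the column strings.
def keys_reuse_columns?(rows, columns)
  rows.all? do |row|
    row.keys.zip(columns).all? { |key, col| key.equal?(col) }
  end
end
```

With unfrozen columns, keys_reuse_columns? comes back false because each hash stores a frozen copy of the key rather than the original string; after raw_columns.map(&:freeze) it comes back true, since a frozen string is stored as the key as-is.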

Pull request filed: https://github.com/rails/rails/pull/7631

Jeremy

Masterleep

Sep 13, 2012, 5:38:57 PM
to rubyonra...@googlegroups.com


On Thursday, September 13, 2012 2:16:51 PM UTC-7, Jeremy Evans wrote:

> Pull request filed: https://github.com/rails/rails/pull/7631




Excellent!  I verified that your fix did eliminate the redundancy on these field name strings in the case I was studying (from 15 extra strings per instance down to 2, where the 2 were attribute values, not field names).

Gary Weaver

Sep 14, 2012, 9:03:32 AM
to rubyonra...@googlegroups.com
On Thursday, September 13, 2012 5:38:57 PM UTC-4, Masterleep wrote:
> Excellent!  I verified that your fix did eliminate the redundancy on these field name strings in the case I was studying (from 15 extra strings per instance down to 2, where the 2 were attribute values, not field names).

Attribute values would be a good case for the StringPool, I guess, though I still think that is something that should be introduced in Ruby, not Rails. Because a string's bang methods alter the object itself, a lot of existing user code assumes object_id equivalence between a string and the object produced by one of that string's bang methods, so it would be a major change. I know you wanted to focus on AR, but if you had a StringPool only for AR attribute values, then those values would be object-equivalent and have the same string bang method weirdness while other strings wouldn't act that way, and that would be much more evil than doing it in Ruby.

Gary Weaver

Sep 25, 2012, 5:53:07 PM
to rubyonra...@googlegroups.com

Take a look at these:

Bartosz Dziewoński wrote in post #1077524:
> http://patshaughnessy.net/2012/1/4/never-create-ruby-strings-longer-than-23-characters
> http://patshaughnessy.net/2012/1/18/seeing-double-how-ruby-shares-string-values

That probably doesn't help because the Ruby optimization happens for strings that are 23 chars or more, and I guess that most attribute names are shorter (and many attribute values may be also).