Optimization thought...


Roger Pack

Dec 22, 2009, 7:30:49 PM
to ruby-opt...@googlegroups.com, rubini...@googlegroups.com
After experimenting a bit with the ruby2cext/crystalizer [1] I'm
wondering if what it does wouldn't be a good match for Rubinius...

Its biggest niche is when you can rely on the end user to call this at some point:

VM.done_setting_up_classes_forever!

This allows you to cache all the method call lookups at each call site, e.g.

def go(a)
  a.yo 3
end

=> translates roughly to this C:

static VALUE cached_previous_yo_class;   /* receiver class seen last time */
static VALUE cached_previous_yo_method;  /* method pointer cached for that class */

VALUE go(VALUE self, VALUE a) {
  if (CLASS_OF(a) == cached_previous_yo_class) {
    /* cache hit: doing this call in C is *so* much faster...
       though it loses backtrace info. */
    return ((VALUE (*)(VALUE, VALUE))cached_previous_yo_method)(a, INT2FIX(3));
  } else {
    /* cache miss: look up the method, cache it for next time. */
  }
}

This type of translation can speed up Ruby 1.8 quite a bit [like 5x the
speed of 1.9, on the tak benchmark], and should work well with
Rubinius, since so much of Rubinius is Ruby. It would probably not
save much if you used tons of duck types, but in practice cache misses
are rare, or could be alleviated.
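The caching idea above can also be sketched in plain Ruby. This is purely illustrative: the `CallSite` class and its lookup are invented for this sketch, and a real VM does this below the Ruby level rather than with `UnboundMethod` objects.

```ruby
# Illustrative per-call-site inline cache in plain Ruby.
# Names here are invented for the sketch; a VM does this natively.
class CallSite
  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
  end

  def call(receiver, *args)
    unless receiver.class.equal?(@cached_class)
      # Cache miss: do the full method lookup, remember it for next time.
      @cached_class = receiver.class
      @cached_method = @cached_class.instance_method(@name)
    end
    # Cache hit (or freshly filled): skip the lookup entirely.
    @cached_method.bind(receiver).call(*args)
  end
end

site = CallSite.new(:upcase)
site.call("hello")  # miss: looks up String#upcase, caches it
site.call("world")  # hit: same receiver class, reuses the cached method
```

Note the hit path only compares the receiver's class pointer; that comparison is what makes monomorphic call sites cheap.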

Anyway just thinking out loud.

I might port this to rubinius after first porting it to 1.9. In the
meantime if somebody's interested feel free to use the idea/code.
Cheers!
-r

[1] http://github.com/rdp/crystalizer
ML: http://groups.google.com/group/ruby-optimization

Evan Phoenix

Dec 22, 2009, 7:41:05 PM
to rubini...@googlegroups.com
Hi Roger,

We actually already do this in Rubinius, but we do it in the JIT directly. It's far less problematic than generating C code that you have to figure out how to link back in, plus we can inline methods, all that without losing any backtrace information.

We already don't need the user to specify that they're done setting things up, as the JIT can run when it wants and the system can gracefully degrade should any JIT assumptions be invalidated by a user adding a new method or class.
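The graceful degradation described here can be modeled with a global version counter that every method definition bumps; a cached lookup is trusted only while the version still matches. This is a toy model with invented names (`$vm_version`, `cached_send`), not Rubinius's actual mechanism.

```ruby
# Toy model of cache invalidation: any method definition anywhere bumps
# a global version, so stale cache entries are detected and refilled.
$vm_version = 0

class Module
  def method_added(name)
    $vm_version += 1
    super if defined?(super)
  end
end

CACHE = {}  # call-site id => [version, receiver class, cached method]

def cached_send(site_id, receiver, name, *args)
  version, klass, meth = CACHE[site_id]
  unless version == $vm_version && klass.equal?(receiver.class)
    # Slow path: version or class changed, redo the lookup.
    meth = receiver.class.instance_method(name)
    CACHE[site_id] = [$vm_version, receiver.class, meth]
  end
  meth.bind(receiver).call(*args)
end

cached_send(:site_a, "hello", :upcase)  # miss: lookup and fill
cached_send(:site_a, "world", :upcase)  # hit: version and class match
```

Defining a new method later invalidates every entry at once, which is coarse but safe; the system keeps working, just briefly slower.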

Lastly, your technique looks pretty much like a normal inline cache, which is also something Rubinius has. We don't currently push them down into the execution stream like you have here, because we use them in the interpreter as well.

- Evan

> --
> --- !ruby/object:MailingList
> name: rubinius-dev
> view: http://groups.google.com/group/rubinius-dev?hl=en
> post: rubini...@googlegroups.com
> unsubscribe: rubinius-dev...@googlegroups.com
>

rogerdpack

Dec 26, 2009, 6:28:49 PM
to rubinius-dev
> We actually already do this in Rubinius, but we do it in the JIT directly. It's far less problematic than generating C code that you have to figure out how to link back in, plus we can inline methods, all that without losing any backtrace information.

Interesting. So does that mean that only the llvm enabled build will
use this optimization?

> We already don't need the user to specify that they're done setting things up, as the JIT can run when it wants and the system can gracefully degrade should any JIT assumptions be invalidated by a user adding a new method or class.
>
> Lastly, your technique looks pretty much like a normal inline cache, which is also something Rubinius has. We don't currently push them down into the execution stream like you have here, because we use them in the interpreter as well.

Same question here.
Thanks.
-r

Eero Saynatkari

Dec 27, 2009, 8:27:26 AM
to rubinius-dev
Excerpts from rogerpack2005's message of Sun Dec 27 01:28:49 +0200 2009:

> Interesting. So does that mean that only the llvm enabled build will
> use this optimization?

The inline cache and method inlining are implemented in the
interpreter as well. Actually JITting will use both of these
features much more efficiently (along with exploiting other
type information it is given or deduces itself.)


--
Magic is insufficiently advanced technology.

Evan Phoenix

Dec 27, 2009, 12:11:39 PM
to rubini...@googlegroups.com

On Dec 27, 2009, at 6:27 AM, Eero Saynatkari wrote:

> Excerpts from rogerpack2005's message of Sun Dec 27 01:28:49 +0200 2009:
>> Interesting. So does that mean that only the llvm enabled build will
>> use this optimization?
>
> The inline cache and method inlining are implemented in the
> interpreter as well. Actually JITting will use both of these
> features much more efficiently (along with exploiting other
> type information it is given or deduces itself.)

Only the inline caches are used in the interpreter. No method inlining is done in the interpreter. So to get the full effects of what Roger is talking about, the JIT, which uses LLVM, is required.
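To illustrate what the JIT's inlining buys on top of the cache, here is a hand-written before/after (not JIT output; the names are invented for the example):

```ruby
# Hand-written illustration of method inlining. The JIT performs the
# "after" transformation automatically once the receiver class is stable.
def double(x)
  x + x
end

def with_dispatch(a)
  double(a)  # a dynamic call: lookup (or cache hit), frame push, dispatch
end

def inlined(a)
  a + a      # callee body pasted into the caller; no dispatch at all
end
```

An inline cache removes the lookup but still pays for the call itself; inlining removes the call, which is why it needs the JIT's code generation rather than the interpreter.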

>
>
> --
> Magic is insufficiently advanced technology.
>

Eero Saynatkari

Dec 27, 2009, 1:43:04 PM
to rubini...@googlegroups.com
Excerpts from evanphx's message of Sun Dec 27 19:11:39 +0200 2009:

> On Dec 27, 2009, at 6:27 AM, Eero Saynatkari wrote:
> > The inline cache and method inlining are implemented in the
> > interpreter as well. Actually JITting will use both of these
> > features much more efficiently (along with exploiting other
> > type information it is given or deduces itself.)
>
> Only the inline caches are used in the interpreter. No method inlining is done
> in the interpreter. So to get the full effects of what Roger is talking about,
> the JIT, which uses LLVM, is required.

I stand corrected - I thought the early inlining was at the
bytecode level. On the other hand, it would be possible to
implement inlining in the interpreter, though there seems
to be little reason to go through the effort.

rogerdpack

Feb 1, 2010, 4:41:14 PM
to rubinius-dev
> We actually already do this in Rubinius, but we do it in the JIT directly. It's far less problematic than generating C code that you have to figure out how to link back in, plus we can inline methods, all that without losing any backtrace information.

True that it is indeed less problematic.

However there is still something to be gained, should you ever be
interested...

ex:

i = 0
while i < 100_000_000 # benchmark loop 1
  i += 1
end

On my VM (at least)

1.9.1p376: 11.1s

rbx 1.0rc2 (with JIT): 10.8s

1.8.7 + crystalizer: 3.8s

Might be worth considering.
Many thanks.
-r

Evan Phoenix

Feb 1, 2010, 7:05:05 PM
to rubini...@googlegroups.com

On Feb 1, 2010, at 1:41 PM, rogerdpack wrote:

>> We actually already do this in Rubinius, but we do it in the JIT directly. It's far less problematic than generating C code that you have to figure out how to link back in, plus we can inline methods, all that without losing any backtrace information.
>
> True that it is indeed less problematic.
>
> However there is still something to be gained, should you ever be
> interested...
>
> ex:
>
> i = 0
> while i < 100_000_000 # benchmark loop 1
>   i += 1
> end

Did you put this in a method and call it multiple times? Rubinius has no on-stack replacement, so you'll need to run it more than once to see the JIT kick in.
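Following that suggestion, the loop can be wrapped in a method and run twice, so a JIT without on-stack replacement gets a chance to compile the method between runs. A sketch (the bound is made a parameter for convenience; the thread's original bound was 100_000_000):

```ruby
require 'benchmark'

def count_up(limit)
  i = 0
  i += 1 while i < limit
  i
end

# The thread used 100_000_000; a smaller bound keeps this demo quick.
n = 1_000_000
count_up(n)                              # warm-up run: lets the JIT compile count_up
puts Benchmark.realtime { count_up(n) }  # timed run can hit the compiled code
```

Without the warm-up call, a JIT that only swaps in compiled code at method entry never gets to use it, which would explain the earlier numbers.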

>
> On my VM (at least)
>
> 1.9.1p376: 11.1s
>
> rbx 1.0rc2 (with JIT): 10.8s
>
> 1.8.7 + crystalizer: 3.8s
>
> Might be worth considering.

While that's cute, this kind of micro benchmark doesn't translate into performance on any real world code.

As I said above, I'll bet that if you run this benchmark and allow the JIT to run, you'll see it's as fast or faster than crystalizer.


> Many thanks.
> -r

rogerdpack

Feb 2, 2010, 10:49:33 AM
to rubinius-dev

> While that's cute, this kind of micro benchmark doesn't translate into performance on any real world code.
>
> As I said above, I'll bet that if you run this benchmark and allow the JIT to run, you'll see it's as fast or faster than crystalizer.

Ahh, thanks. I was wondering why my numbers didn't seem to match some
I'd seen previously. Making it a method results in:

1.9.1: 12.4s

rbx 2.6s

crystalizer 5.3s

crystalizer optimized: 3.5s

Pretty impressive for Rubinius' side. That should give me enough to
chew on.
Does anybody out there have any sinatra/rails benchmarks for rubinius?
-r

rogerdpack

Mar 5, 2010, 4:17:16 PM
to rubinius-dev
> >> We actually already do this in Rubinius, but we do it in the JIT directly. It's far less problematic than generating C code that you have to figure out how to link back in, plus we can inline methods, all that without losing any backtrace information.

It's good that Rubinius does this already. My thought for further
improvements would be something like allowing the user to give the VM
hints, like
VM.I_promise_to_not_define_any_more_methods_or_change_class_hierarchies_after_this_point

Then the JIT could be "tighter" in its method caching.

Thoughts?
-rp

Anuj Dutta

Mar 5, 2010, 5:15:55 PM
to rubini...@googlegroups.com
Very innovative. But then you will have to keep a record of the ones that have been marked for "extra optimizations" (as you suggest).

Anuj




--
Anuj DUTTA

rogerdpack

Mar 5, 2010, 7:39:21 PM
to rubinius-dev
> Very innovative. But then you will have to keep a record of the ones that
> have been marked for "extra optimizations" (as you suggest).

Perhaps by discarding all previous optimizations when the call is
made?
Dunno.
-rp

rogerdpack

Mar 15, 2010, 5:31:11 PM
to rubinius-dev
> It's good that Rubinius does this already.  My thought for further
> improvements would be something like allowing the user to give the VM
> hints, like
> VM.I_promise_to_not_define_any_more_methods_or_change_class_hierarchies_after_this_point

I suppose one could even send several types of hints, like
VM.i_promise_to_not_override_constants_after_this_point
or
VM.i_promise_to_not_use_any_new_instance_variables_after_this_point
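Ruby already offers a coarse, per-class form of such a promise: freezing a class forbids later method definitions on it, which a VM could in principle exploit for tighter caching. A small sketch (the `Point` class is invented for illustration):

```ruby
# Freezing a class is an existing, per-class way to make a
# "no more methods after this point" promise.
class Point
  attr_reader :x, :y
  def initialize(x, y)
    @x, @y = x, y
  end
end
Point.freeze  # promise: no further methods on Point

begin
  # Any later definition fails at runtime (FrozenError on modern Rubies,
  # a RuntimeError subclass).
  Point.class_eval { def norm; Math.sqrt(x * x + y * y); end }
rescue RuntimeError => e
  puts "sealed: #{e.class}"
end
```

The VM-wide hints proposed above would generalize this from one frozen class to the whole class hierarchy in a single call.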

Anyway thanks for rbx!
-rp
