Small static method marked not entrant, inlining reversed?

Charles Oliver Nutter

Sep 7, 2010, 5:44:29 PM
to hotspot compiler, JVM Languages
I've been working on JRuby performance lately and ran into a peculiar situation.

I have a static utility method in JRuby that checks whether a given
object's class is the same as it was when the compiler optimized it. So
for a snippet of code like this:

def foo
  bar
end

def bar
  # whatever
end

After running for some time, the "foo" call will be compiled, the
compiler will see that the "bar" call has a cached method handle, and
it will emit both a dynamic call and a static-typed call plus guard.
The static-typed call looks like this:

ALOAD 8
LDC 446
INVOKESTATIC org/jruby/javasupport/util/RuntimeHelpers.isGenerationEqual (Lorg/jruby/runtime/builtin/IRubyObject;I)Z
IFNE L2
ALOAD 8
CHECKCAST org/jruby/RubyFixnum
ALOAD 1
LDC 1
INVOKEVIRTUAL org/jruby/RubyFixnum.op_plus (Lorg/jruby/runtime/ThreadContext;J)Lorg/jruby/runtime/builtin/IRubyObject;

And the isGenerationEqual method looks like this:

public static boolean isGenerationEqual(IRubyObject object, int generation) {
    return object.getMetaClass().getCacheToken() == generation;
}

While running benchmarks, I noticed a peculiar thing happening. For
"fib", the method JITs in JRuby very quickly and is soon after JITed
by Hotspot. But later compiles cause "fib" to get deoptimized and
marked not-entrant. Around the same time, isGenerationEqual gets
marked not entrant. Unfortunately, when fib re-optimizes, it does so
without inlining the isGenerationEqual call, and I can see that where
it was inlined before, it now actually does a CALL in assembly.

Manually inlining the same bytecode everywhere isGenerationEqual would
be called does not seem to be subject to the same effect.

Any thoughts? The only theory I have is that early in optimization
Hotspot sees that the target object type (IRubyObject object in the
method def) is the same, and so it optimizes based on that. Later, as
other compiled methods start to hit this code, the type of "object"
changes. But the logic behind the scenes should be identical in every
case... IRubyObject.getMetaClass() only has one final implementation
on org.jruby.RubyBasicObject, and getCacheToken() has only one final
implementation on org.jruby.RubyModule, which simply returns an int
field.

So I'm stumped why at least isGenerationEqual would not inline in all cases.

- Charlie
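
For readers following along, the class shapes described in this message can be sketched roughly as below. The type names mirror the JRuby names mentioned in the thread, but these are simplified stand-ins rather than the real JRuby sources: the field names, the constructors, and the choice to have both RubyObject and RubyFixnum extend RubyBasicObject directly are assumptions made to keep the sketch short.

```java
// Simplified stand-ins for the JRuby types named above; not the real JRuby sources.
interface IRubyObject {
    RubyClass getMetaClass();
}

class RubyModule {
    private final int cacheToken; // assumed field name
    RubyModule(int cacheToken) { this.cacheToken = cacheToken; }
    // The single, final implementation: it just returns an int field.
    public final int getCacheToken() { return cacheToken; }
}

class RubyClass extends RubyModule {
    RubyClass(int cacheToken) { super(cacheToken); }
}

class RubyBasicObject implements IRubyObject {
    private final RubyClass metaClass;
    RubyBasicObject(RubyClass metaClass) { this.metaClass = metaClass; }
    // The single, final implementation of IRubyObject.getMetaClass().
    public final RubyClass getMetaClass() { return metaClass; }
}

// Assumption: both extend RubyBasicObject directly to keep the sketch short.
class RubyObject extends RubyBasicObject {
    RubyObject(RubyClass metaClass) { super(metaClass); }
}

class RubyFixnum extends RubyBasicObject {
    RubyFixnum(RubyClass metaClass) { super(metaClass); }
}

final class RuntimeHelpers {
    // The receiver is typed as the interface, so javac emits invokeinterface here.
    static boolean isGenerationEqual(IRubyObject object, int generation) {
        return object.getMetaClass().getCacheToken() == generation;
    }

    public static void main(String[] args) {
        IRubyObject fixnum = new RubyFixnum(new RubyClass(446));
        IRubyObject plain = new RubyObject(new RubyClass(7));
        System.out.println(isGenerationEqual(fixnum, 446));
        System.out.println(isGenerationEqual(plain, 446));
    }
}
```

Whichever concrete type arrives, the call resolves to the same final method on RubyBasicObject, which is the situation the question above is about.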

Matt Fowles

Sep 7, 2010, 5:57:08 PM
to jvm-la...@googlegroups.com
Charlie~

Dropping hotspot compiler dev as I am suggesting a workaround...

Can you change isGenerationEqual() to

public static boolean isGenerationEqual(MetaClass klass, int generation) {
    return klass.getCacheToken() == generation;
}

then call getMetaClass() up one level of call stack?

Matt

Charles Oliver Nutter

Sep 8, 2010, 2:32:59 AM
to jvm-la...@googlegroups.com
I imagine it will have a similar effect to me hand-inlining that logic
at the call site (which I'm reluctant to do in general because of the
code bloat). I wanted this to be in a static method to avoid having
the getMetaClass call bulk up the original bytecode.

I'll try another analysis with the getMetaClass pulled out of the method.

Charles Oliver Nutter

Sep 8, 2010, 2:34:10 AM
to Tom Rodriguez, hotspot compiler, JVM Languages
I'll give that a shot, Tom, thanks. Should have thought of it myself.

On Wed, Sep 8, 2010 at 12:04 AM, Tom Rodriguez <tom.ro...@oracle.com> wrote:
> Did you run with -XX:+PrintInlining?  That will report why we didn't inline.
>
> tom

Charles Oliver Nutter

Sep 8, 2010, 3:03:10 AM
to Tom Rodriguez, hotspot compiler, JVM Languages
Ok, I have some additional evidence.

The __file__ method that contains the jitted code body is jitted by
hotspot initially, and the things I expected to inline are inlined correctly.

31 ruby.jit.fib_ruby_5EC86D8D24F89CAF26F0CBEA2FBDA4ED21326951::__file__ (442 bytes)

Later another recompilation forces __file__ to be recompiled...it's
around the 50th iteration of the benchmark, at which point all the
benchmark library's code is jitted by JRuby.

31 make_not_entrant

Immediately after this, it appears that calls to isGenerationEqual are
detected to be bimorphic, and so hotspot considers recompiling them as
well.

25 uncommon trap bimorphic maybe_recompile
  @1 org/jruby/javasupport/util/RuntimeHelpers isGenerationEqual (Lorg/jruby/runtime/builtin/IRubyObject;I)Z

A type profile inside the isGenerationEqual call site looks like this
in both the before and after case

@ 62 org.jruby.javasupport.util.RuntimeHelpers::isGenerationEqual (19 bytes)
  @ 1 org.jruby.runtime.builtin.IRubyObject::getMetaClass (0 bytes)
    type profile org/jruby/runtime/builtin/IRubyObject -> org/jruby/RubyFixnum (71%)

...but it does appear to inline according to this output :( All
occurrences of isGenerationEqual in the PrintInlining output are
inlined. So why do I see CALLs in the assembly?

More information: The recursive calls to the method appear to never
inline, probably because the method body is too large based on this
logging output. It's not ideal, since in Ruby terms this is a pretty
small method...but I guess I'm stuck here:

@ 115 ruby.jit.fib_ruby_5EC86D8D24F89CAF26F0CBEA2FBDA4ED21326951::__file__ hot method too big

I remember tweaking various flags and getting "fib" to inline multiple
levels deep, but I generally had to bump up several flags
(MaxInlineSize, InlineSmallCode, MaxInlineLevel, and even a "max node
count" flag I dug out of the Hotspot sources).

Here's part of the PrintAssembly showing the calls to isGenerationEqual:

(type checks to see which of the two bimorphic targets it's seeing)
0x028614bd: cmp ecx, 'org/jruby/RubyFixnum' ; {oop('org/jruby/RubyFixnum')}
0x028614c3: jz 0x028614da
0x028614c5: cmp ecx, 'org/jruby/RubyObject' ; {oop('org/jruby/RubyObject')}
0x028614cb: jnz 0x02861b55

(the RubyFixnum branch with all of isGenerationEqual and its component
calls inlined)
0x028614da: mov ebp, [ebx+0xC] ;*synchronization entry
    ; - org.jruby.javasupport.util.RuntimeHelpers::isGenerationEqual@-1 (line 1863)
    ; - ruby.jit.fib_ruby_5EC86D8D24F89CAF26F0CBEA2FBDA4ED21326951::__file__@11 (line 4)
0x028614dd: mov ebx, [ebp+0x18] ; implicit exception: dispatches to 0x028620ed
0x028614e0: cmp ebx, 0x000001f3
0x028614e6: jz 0x02861b11 ;*if_icmpne
    ; - org.jruby.javasupport.util.RuntimeHelpers::isGenerationEqual@10 (line 1863)
    ; - ruby.jit.fib_ruby_5EC86D8D24F89CAF26F0CBEA2FBDA4ED21326951::__file__@11 (line 4)

(subsequent casting to RubyFixnum and direct invocation of op_lt, the
Java implementation of Fixnum#<)
0x028614ec: cmp ecx, 'org/jruby/RubyFixnum' ; {oop('org/jruby/RubyFixnum')}
0x028614f2: jnz 0x02862081 ;*checkcast
    ; - ruby.jit.fib_ruby_5EC86D8D24F89CAF26F0CBEA2FBDA4ED21326951::__file__@19 (line 4)
0x028614f8: mov ebx, [edx+0x28] ;*getfield runtime
    ; - org.jruby.runtime.ThreadContext::getRuntime@1 (line 147)
    ; - org.jruby.RubyFixnum::op_lt@1 (line 888)
    ; - ruby.jit.fib_ruby_5EC86D8D24F89CAF26F0CBEA2FBDA4ED21326951::__file__@26 (line 4)
    ; implicit exception: dispatches to 0x028620fd

And the same sequence in the second compilation of __file__:

(Perhaps the problem here is actually getMetaClass()? It shows as
inlining based on the type profile information in LogCompilation, so
why wouldn't it be inlined here? There's a single, final
implementation on RubyBasicObject that both RubyObject and RubyFixnum
share.)
0x0286e4f6: mov ecx, [esp+0x44]
0x0286e4fa: mov eax, -1 ; {oop(NULL)}
0x0286e4ff: call 0x0282d2a0 ; OopMap{[64]=Oop [68]=Oop [16]=Oop [24]=Oop off=36}
    ;*invokeinterface getMetaClass
    ; - org.jruby.javasupport.util.RuntimeHelpers::isGenerationEqual@1 (line 1863)
    ; - ruby.jit.fib_ruby_5EC86D8D24F89CAF26F0CBEA2FBDA4ED21326951::__file__@11 (line 4)
    ; {virtual_call}
0x0286e504: mov ebx, [eax+0x18] ; implicit exception: dispatches to 0x0286efc9
0x0286e507: cmp ebx, 0x000001f3
0x0286e50d: jz 0x0286ea7f ;*if_icmpne
    ; - org.jruby.javasupport.util.RuntimeHelpers::isGenerationEqual@10 (line 1863)
    ; - ruby.jit.fib_ruby_5EC86D8D24F89CAF26F0CBEA2FBDA4ED21326951::__file__@11 (line 4)
0x0286e513: mov eax, [esp+0x44]
0x0286e517: mov ebx, [eax+0x4] ; implicit exception: dispatches to 0x0286efd9
0x0286e51a: mov [esp+0x44], ebx
0x0286e51e: cmp ebx, 'org/jruby/RubyFixnum' ; {oop('org/jruby/RubyFixnum')}
0x0286e524: jnz 0x0286ef5d ;*checkcast
    ; - ruby.jit.fib_ruby_5EC86D8D24F89CAF26F0CBEA2FBDA4ED21326951::__file__@19 (line 4)

So is it defeating the inlining of getMetaClass() that the IRubyObject
passed in is of two different types, even though getMetaClass comes
from their common superclass? In other words, if Hotspot encounters an
invokeinterface with two different types with a shared hierarchy and a
single implementation of that interface method...why does it fail to
inline that method? It seems like a case against using invokeinterface
if at all possible, even against a shared class hierarchy.

- Charlie

Charles Oliver Nutter

Sep 8, 2010, 3:08:50 AM
to Tom Rodriguez, hotspot compiler, JVM Languages
Well, this is a frustrating discovery. Hotspot seems to have a lot of
trouble with invokeinterface versus casting to a concrete type and
using invokevirtual.

Modifying isGenerationEqual to do this instead seems to avoid the
deoptimization:

public static boolean isGenerationEqual(IRubyObject object, int generation) {
    return ((RubyBasicObject)object).getMetaClass().getCacheToken() == generation;
}

There may be a future where an IRubyObject enters JRuby and does not
extend RubyBasicObject, but for now this cast should succeed every
time. But the same goes for the invokeinterface path, where both
RubyObject and RubyFixnum inherit the same getMetaClass implementation
from RubyBasicObject. Why is Hotspot able to cope with the
cast+invokevirtual when it can't cope with invokeinterface always
resolving to the same method?

- Charlie

John Rose

Sep 8, 2010, 3:10:33 AM
to jvm-la...@googlegroups.com, Tom Rodriguez, hotspot compiler
On Sep 7, 2010, at 11:34 PM, Charles Oliver Nutter wrote:

> I'll give that a shot, Tom, thanks. Should have thought of it myself.

Also, to make the compiler really spill its guts, try -XX:+LogCompilation (google for the wiki page that discusses it).

Your prejudice against "fat" bytecodes corresponds somewhat to the HotSpot inlining heuristics. HotSpot strongly prefers to inline small methods, and the algorithm has a non-linear side to it. Two cold methods of 30 bytecodes each are much more likely to get inlined than one method of 60 bytecodes. Likewise for a hot call to two methods of 300 bytecodes each.

(Smoothing out these heuristics would be a great post-graduate thesis, IMO.)

For an example of experimenting with the inlining heuristics see:
http://blogs.sun.com/jrose/entry/an_experiment_with_generic_arithmetic
http://blogs.sun.com/jrose/resource/jsr292/SumWithIndy.zip

Any such use of the tuning flags must be regarded as purely experimental, but tuning experiments can lead to real improvements.

-- John

P.S. One recent change (to type profiles, not inlining heuristics) was motivated by a performance tuning exercise similar to the present one:
http://hg.openjdk.java.net/jdk7/hotspot/hotspot/rev/4b29a725c43c

The improvement is to collect type profiles up at the 'if' instead of down at the cast in idioms like this:
  if (x instanceof C)
    ((C)x).somethingFast();
  else
    MyRuntime.somethingSlow(x);

With a successful type profile, this will be able to collapse like this:
  if (x.getClass() != C42.class)  trap();
  inline C42.somethingFast(x);

HotSpot was already collecting type profiles at the cast and the invokevirtual, but not at the instanceof.
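
The idiom above can be written out as a small, self-contained sketch. The names C, C42, MyRuntime, somethingFast and somethingSlow come from the pseudocode in this message; their bodies are invented purely for illustration, and treating C42 as a subclass of C is an assumption. Nothing here shows what the JIT actually emits; it only marks where the profile points sit.

```java
// Hypothetical fast-path type; the body is invented for illustration.
class C {
    int somethingFast() { return 1; }
}

// Assumption: C42 is the one exact receiver class the profile would observe.
class C42 extends C {
}

// Hypothetical generic fallback, standing in for a language runtime's slow path.
final class MyRuntime {
    static int somethingSlow(Object x) { return -1; }
}

final class GuardedCall {
    // Profile points: the instanceof test, the cast, and the virtual call.
    static int dispatch(Object x) {
        if (x instanceof C) {
            return ((C) x).somethingFast();
        } else {
            return MyRuntime.somethingSlow(x);
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch(new C42()));
        System.out.println(dispatch("not a C"));
    }
}
```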

Charles Oliver Nutter

Sep 8, 2010, 3:20:29 AM
to jvm-la...@googlegroups.com, Tom Rodriguez, hotspot compiler
On Wed, Sep 8, 2010 at 7:10 AM, John Rose <john....@oracle.com> wrote:
> Also, to make the compiler really spill its guts, try +LogCompilation (google for the wiki page that discusses it).

For whatever reason, PrintInlining wouldn't show anything but
intrinsics. Perhaps it's my build. I opted to use LogCompilation
instead.

> Your prejudice against "fat" bytecodes corresponds somewhat to the HotSpot inlining heuristics.  HotSpot strongly prefers to inline small methods, and the algorithm has a non-linear side to it.  Two cold methods of 30 bytecodes each are much more likely to get inlined than one method of 60 bytecodes.   Likewise for a hot call to two methods of 300 bytecodes each.

I do a periodic survey of LogCompilation output for key parts of JRuby
(like the parser) to ensure we haven't grown any methods beyond
various inlining budgets. You get the idea pretty quickly that Hotspot
hates big method bodies...

> For an example of experimenting with the inlining heuristics see:
>  http://blogs.sun.com/jrose/entry/an_experiment_with_generic_arithmetic
>  http://blogs.sun.com/jrose/resource/jsr292/SumWithIndy.zip
>
> Any such use of the tuning flags must be regarded as purely experimental, but tuning experiments can lead to real improvements.

I'll give that another read. I feel like the logic I have in place for
inserting direct static (typed) invocations to "hot" methods at a
JRuby call site is a good step forward, but the bytecode size increase
is a harsh mistress. Add to that this seeming problem with inlining a
bimorphic invokeinterface (that's actually monomorphic from the base
implementation) and you have a very frustrated JRuby compiler writer.

> P.S One recent change (to type profiles, not inlining heuristics) was motivated by a performance tuning exercise similar to the present one:
>  http://hg.openjdk.java.net/jdk7/hotspot/hotspot/rev/4b29a725c43c
>
> The improvement is to collect type profiles up at the 'if' instead of down at the cast in idioms like this:
>  if (x instanceof C)
>    ((C)x).somethingFast();
>  else
>    MyRuntime.somethingSlow(x);
>
> With a successful type profile, this will be able to collapse like this:
>  if (x.getClass() != C42.class)  trap();
>  inline C42.somethingFast(x);
>
> HotSpot was already collecting type profiles at the cast and the invokevirtual, but not at the instanceof.

How about at an invokeinterface? It appears to collect type profiles,
but for only resolving to the immediate types, and not the actual
method-to-be-invoked or the common superclass of both that actually
provides the implementation...

At this point I'm also not above exploring C2, if it's possible to
localize this case to something I can consume. I'll have a gander at
your patches in the morning.

- Charlie

John Rose

Sep 8, 2010, 3:22:53 AM
to jvm-la...@googlegroups.com, Tom Rodriguez, hotspot compiler
On Sep 8, 2010, at 12:08 AM, Charles Oliver Nutter wrote:

> Why is Hotspot able to cope with the
> cast+invokevirtual when it can't cope with invokeinterface always
> resolving to the same method?

Here's a possible answer, which could lead to JVM tweaks or bug fixes: Class hierarchy analysis allows devirtualization of the call, but only if the receiver is a real class. The JVM keeps some records about implementors of interfaces, but I don't think the compiler connects all the dots properly.

Note that if the call site is monomorphic then class hierarchy analysis doesn't apply at all, and calls are trivially devirtualized, since the receiver is an "exact" type (not quantified over a set of subtypes). This happens whether the receiver is a class or interface. But our current type profiles don't scale well beyond one or two exact types.

So a test case for this theory would have to make a call site which witnesses multiple receiver types but where all the types resolve to a common method (and that method has a single definition within the set of the receiver's subtypes). And the set of subtypes would have to be bounded by an interface, not a class.

-- John
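
A test case along the lines described above might look roughly like this. All type and method names are hypothetical stand-ins (nothing here is JRuby or HotSpot code). One call site sees two exact receiver types through an interface, both inheriting a single final implementation from a shared superclass. A second call site performs the same call through a cast to that superclass, for comparison.

```java
// Hypothetical interface that bounds the set of receiver types.
interface Receiver {
    int token();
}

// The only definition of token() lives here and is final.
abstract class SharedBase implements Receiver {
    private final int token;
    SharedBase(int token) { this.token = token; }
    public final int token() { return token; }
}

final class FirstKind extends SharedBase {
    FirstKind() { super(1); }
}

final class SecondKind extends SharedBase {
    SecondKind() { super(2); }
}

final class InterfaceCallTest {
    // Receiver is statically typed as the interface: javac emits invokeinterface.
    static int viaInterface(Receiver r) {
        return r.token();
    }

    // Receiver is cast to the class first: javac emits checkcast + invokevirtual.
    static int viaCast(Receiver r) {
        return ((SharedBase) r).token();
    }

    // Alternates between the receivers so each call site sees both exact types.
    static long run(Receiver[] receivers, int iterations, boolean cast) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            Receiver r = receivers[i % receivers.length];
            sum += cast ? viaCast(r) : viaInterface(r);
        }
        return sum;
    }

    public static void main(String[] args) {
        Receiver[] receivers = { new FirstKind(), new SecondKind() };
        System.out.println(run(receivers, 2_000_000, false));
        System.out.println(run(receivers, 2_000_000, true));
    }
}
```

Running this with the compilation logging flags mentioned earlier in the thread and comparing the two call sites would show whether a given HotSpot build treats them differently. No particular result is claimed here.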

John Rose

Sep 8, 2010, 3:29:45 AM
to jvm-la...@googlegroups.com, Tom Rodriguez, hotspot compiler
On Sep 8, 2010, at 12:20 AM, Charles Oliver Nutter wrote:

> How about at an invokeinterface? It appears to collect type profiles,
> but for only resolving to the immediate types, and not the actual
> method-to-be-invoked or the common superclass of both that actually
> provides the implementation...

That's exactly right. Sounds like CHA (class hierarchy analysis) under interfaces would bail you out, at least until you created too many (>1 or >2) disjoint IRubyObject implementations.

A related problem may be either (a) the type profile doesn't collect general-enough information or (b) we don't have enough type profile points to collect specific-enough information. We can fix (a) by collecting ever fancier profile information or (b) by splitting profile points during inlining in early tiers.

(Or (c) use an explicitly controlled templating mechanism like anonymous classes. That may prematurely multiply bytecodes, though.)

-- John

Charles Oliver Nutter

Sep 8, 2010, 8:43:44 AM
to jvm-la...@googlegroups.com, Tom Rodriguez, hotspot compiler
On Wed, Sep 8, 2010 at 2:29 AM, John Rose <john....@oracle.com> wrote:
> That's exactly right.  Sounds like CHA (class hierarchy analysis) under interfaces would bail you out, at least until you created too many (>1 or >2) disjoint IRubyObject implementations.

Yes, it sounds like exactly what I want. I'm guessing this is a "nice
to have but not currently being worked" sort of problem, but it seems
like a very generally-applicable improvement to Hotspot, since it's
probably common (this would be easy to measure) to have an
invokeinterface that receives multiple types but those types'
implementations of that interface mostly boil down to the same
superclass. As far as I understand it, it would help any case where
you've extended an existing type but pass it around via some
superclass's interface.

A simple analysis could be done quickly too: resolve the type profile
to the actual implementer of the interface, rather than to the exact
type. Since invokeinterface knows what interface it's looking for, it
could do a quick hierarchy scan to find the class that first claims to
implement it. Generally when extending a class that already implements
an interface, you don't re-add "implements", so this would work in a
high percentage of cases. Does the type profile at present support
resolving to a superclass, or does it only support resolving to an
object's exact class?
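
The "quick hierarchy scan" idea can be illustrated at the Java level with reflection. This is only a sketch of the idea, not how HotSpot's profiling or CHA is implemented, and the class and method names are invented for the example. It returns the highest class in the superclass chain that still implements the interface, which is not necessarily the class that declares the method body (an abstract class may leave the method unimplemented).

```java
final class ImplementerScan {
    // Walk up from the exact receiver class and return the highest superclass
    // that is still assignable to the given interface.
    static Class<?> firstImplementer(Class<?> exact, Class<?> iface) {
        if (!iface.isInterface() || !iface.isAssignableFrom(exact)) {
            throw new IllegalArgumentException(exact + " does not implement " + iface);
        }
        Class<?> current = exact;
        while (current.getSuperclass() != null
                && iface.isAssignableFrom(current.getSuperclass())) {
            current = current.getSuperclass();
        }
        return current;
    }

    public static void main(String[] args) {
        // ArrayList -> AbstractList -> AbstractCollection (implements Collection) -> Object
        System.out.println(firstImplementer(java.util.ArrayList.class, java.util.Collection.class));
    }
}
```

With a scan like this, two exact types such as ArrayList and LinkedList would both map to the same implementer for Collection, which is the collapsing effect the paragraph above is after.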

> A related problem may be either (a) the type profile doesn't collect general-enough information or (b) we don't have enough type profile points to collect specific-enough information.  We can fix (a) by collecting ever fancier profile information or (b) by splitting profile points during inlining in early tiers.

(a) could be the "find real implementer" above as a short-term
improvement, and later "find common superclass that implements the
target signature". I assume the latter already works because the
following also avoids the deopt I saw:

public static boolean isGenerationEqual(IRubyObject object, int generation) {
    return ((RubyObject)object).getMetaClass().getCacheToken() == generation;
}

Presumably the type profile here is able to handle arbitrarily many
subclasses of RubyObject. The above code really ought to optimize (at
least in my case) the same as:

public static boolean isGenerationEqual(IRubyObject object, int generation) {
    return object.getMetaClass().getCacheToken() == generation;
}

I would love to see that happen (and I'm willing to help, after the
necessary C2 learning period!), but for now I'll have to make some
hard decisions in the JRuby codebase and compiler :(

> (Or (c) use an explicitly controlled templating mechanism like anonymous classes.  That may prematurely multiply bytecodes, though.)

By this you mean using a class template to specialize a piece of code
to a common superclass *before* passing it to the optimizer?

For what it's worth, the following code is only slightly slower than
the fastest non-hacked implementation and not subject to the
deoptimization:

public static boolean isGenerationEqual(IRubyObject object, int generation) {
    RubyClass metaClass;
    if (object instanceof RubyBasicObject) {
        metaClass = ((RubyBasicObject)object).getMetaClass();
    } else {
        metaClass = object.getMetaClass();
    }
    return metaClass.getCacheToken() == generation;
}

But I may end up having the JRuby compiler do this since in order to
do a direct (non-dynamic) call it has to cast to a concrete type
anyway.

- Charlie

John Rose

Sep 8, 2010, 10:50:36 PM
to jvm-la...@googlegroups.com, Tom Rodriguez, hotspot compiler
On Sep 8, 2010, at 5:43 AM, Charles Oliver Nutter wrote:

> On Wed, Sep 8, 2010 at 2:29 AM, John Rose <john....@oracle.com> wrote:
>> That's exactly right. Sounds like CHA (class hierarchy analysis) under interfaces would bail you out, at least until you created too many (>1 or >2) disjoint IRubyObject implementations.
>
> Yes, it sounds like exactly what I want. I'm guessing this is a "nice
> to have but not currently being worked" sort of problem, but it seems
> like a very generally-applicable improvement to Hotspot, since it's
> probably common (this would be easy to measure) to have an
> invokeinterface that receives multiple types but those types'
> implementations of that interface mostly boil down to the same
> superclass. As far as I understand it, it would help any case where
> you've extended an existing type but pass it around via some
> superclass's interface.
>
> A simple analysis could be done quickly too: resolve the type profile
> to the actual implementer of the interface, rather than to the exact
> type. Since invokeinterface knows what interface it's looking for, it
> could do a quick hierarchy scan to find the class that first claims to
> implement it. Generally when extending a class that already implements
> an interface, you don't re-add "implements", so this would work in a
> high percentage of cases. Does the type profile at present support
> resolving to a superclass, or does it only support resolving to an
> object's exact class?

No, it only includes exact classes. To extend the profile would require detecting profile overflow and switching to an "inexact mode", and then teaching the optimizer to do useful stuff with those extra states. The code generation framework is already pretty good at building the required code shapes; the missing bit is optimization policy. This would be a good training project for someone that wanted to learn the system thoroughly. (He said, with a hopeful look.)

>> A related problem may be either (a) the type profile doesn't collect general-enough information or (b) we don't have enough type profile points to collect specific-enough information. We can fix (a) by collecting ever fancier profile information or (b) by splitting profile points during inlining in early tiers.
>
> (a) could be the "find real implementer" above as a short-term
> improvement, and later "find common superclass that implements the
> target signature". I assume the latter already works because the
> following also avoids the deopt I saw:
>
> public static boolean isGenerationEqual(IRubyObject object, int generation) {
>     return ((RubyObject)object).getMetaClass().getCacheToken() == generation;
> }

That works (I think) because getMetaClass is monomorphic in RubyObject, and CHA is designed to detect such conditions.

If IRubyObject has just one implementor (RubyObject), we could do CHA on that implementor, and issue a dependency against new implementors of IRubyObject getting loaded. That would be a good starter project.

> Presumably the type profile here is able to handle arbitrarily many
> subclasses of RubyObject. The above code really ought to optimize (at
> least in my case) the same as:

No, the type profile is limited to -XX:TypeProfileWidth=2 distinct classes (with associated frequency counts). But CHA is not limited, since it doesn't have to record any history.

> public static boolean isGenerationEqual(IRubyObject object, int generation) {
>     return object.getMetaClass().getCacheToken() == generation;
> }
>
> I would love to see that happen (and I'm willing to help, after the
> necessary C2 learning period!), but for now I'll have to make some
> hard decisions in the JRuby codebase and compiler :(
>
>> (Or (c) use an explicitly controlled templating mechanism like anonymous classes. That may prematurely multiply bytecodes, though.)
>
> By this you mean using a class template to specialize a piece of code
> to a common superclass *before* passing it to the optimizer?

Yes, and/or inlining at the bytecode level so as to "attract" more profiling information. Manual inlining is a desperation move, though.

> For what it's worth, the following code is only slightly slower than
> the fastest non-hacked implementation and not subject to the
> deoptimization:
>
> public static boolean isGenerationEqual(IRubyObject object, int generation) {
>     RubyClass metaClass;
>     if (object instanceof RubyBasicObject) {
>         metaClass = ((RubyBasicObject)object).getMetaClass();
>     } else {
>         metaClass = object.getMetaClass();
>     }
>     return metaClass.getCacheToken() == generation;
> }

Yes, that's what the JIT might produce if it had inexact profiles.

(As I described before, in the next build or so 6912064 will make that 'instanceof' collect a new type profile. But it won't help you here, since your 'object' value is strongly polymorphic, even though the type varies under a nice bound of RubyBasicObject.)

> But I may end up having the JRuby compiler do this since in order to
> do a direct (non-dynamic) call it has to cast to a concrete type
> anyway.

I will follow this message up with a suggestion that goes in a different direction for managing metaclasses.

-- John

John Rose

Sep 8, 2010, 10:53:59 PM
to Charles Nutter, jvm-la...@googlegroups.com
Here's something to think about as an alternative to the instanceof/cast/invoke dance for getting metaclasses.

At the Summit we were talking about managing metaobjects via a class-specific analogue of ThreadLocal. Here's a sketch of what the JSR 292 EG is thinking about in that vein:

http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm-0908/java/dyn/ClassValue.html

The implementation of such a thing should be tuned (as ThreadLocal is tuned) to execute in a small number of memory references, without locks or volatiles. Ideally the code would use a one-element cache, like this:

RubyClass metaClass = RUBY_META_OBJECT.get(object.getClass());
  =>
mov (object+4), classtemp
mov (classtemp+56), cachetemp
cmp (cachetemp,8), RUBY_META_OBJECT
jne slowpath
mov (cachetemp,4), metaClass

This provides an alternative path to interface injection for storing per-runtime data on *all* classes (RubyArray and java.lang.String alike). How useful would this be?

Your feedback might help us decide on where ClassValue sits in a Plan A vs. Plan B feature cut, if such a cut comes about. See Mark Reinhold's blog for the A/B conversation.

-- John
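
For illustration, here is a minimal sketch of per-class metadata lookup in this style. It assumes the API as it later shipped in Java 7 as java.lang.ClassValue, which may differ from the java.dyn draft linked above. MetaInfo and MetaRegistry are hypothetical names standing in for a language runtime's metaclass and registry.

```java
// Hypothetical per-class metadata; stands in for a language runtime's metaclass.
final class MetaInfo {
    final String name;
    MetaInfo(String name) { this.name = name; }
}

final class MetaRegistry {
    // ClassValue computes the value lazily per class and associates it
    // with the Class object itself, so no separate weak/soft map is needed.
    static final ClassValue<MetaInfo> META = new ClassValue<MetaInfo>() {
        @Override
        protected MetaInfo computeValue(Class<?> type) {
            return new MetaInfo("meta:" + type.getName());
        }
    };

    // Works for any object, whether or not its class knows about the runtime.
    static MetaInfo metaOf(Object object) {
        return META.get(object.getClass());
    }

    public static void main(String[] args) {
        System.out.println(metaOf("hello").name);
        System.out.println(metaOf("a") == metaOf("b"));
    }
}
```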

Charles Oliver Nutter

Sep 8, 2010, 11:59:43 PM
to John Rose, jvm-la...@googlegroups.com
On Thu, Sep 9, 2010 at 2:53 AM, John Rose <john....@oracle.com> wrote:
> Here's something to think about as an alternative to the instanceof/cast/invoke dance for getting metaclasses.
>
> At the Summit we were talking about managing metaobjects via a class-specific analogue of ThreadLocal.  Here's a sketch of what the JSR 292 EG is thinking about in that vein:
>
>  http://cr.openjdk.java.net/~jrose/pres/indy-javadoc-mlvm-0908/java/dyn/ClassValue.html
>
> The implementation of such a thing should be tuned (as ThreadLocal is tuned) to execute in a small number of memory references, without locks or volatiles.  Ideally the code would use a one-element cache, like this:

I think I told you and Jochen about my annotation trick, yes?

The trick is basically this: implement
java.lang.annotation.Annotation with your own "carrier" class:

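import java.lang.annotation.Annotation;

public class MetaCarrier implements Annotation {
    public final MetaClass metaClass;
    public MetaCarrier(MetaClass metaClass) {
        this.metaClass = metaClass;
    }
    // Required by the Annotation interface.
    public Class<? extends Annotation> annotationType() {
        return MetaCarrier.class;
    }
}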

Then, use reflection to crack open the java.lang.Class object and
inject carrier instances into it at runtime. The cost of retrieving
those instances is still ultimately a hash lookup or linear scan of an
array, as in someClass.getAnnotation(MetaCarrier.class).metaClass.
However, since the instance is *actually* attached to the
java.lang.Class object, you don't have to manually maintain a separate
weak/soft cache.

Obviously it's a hack, but I still think it's cute. Built-in support
would be much nicer. :)

>  RubyClass metaClass = RUBY_META_OBJECT.get(object.getClass());
>    =>
>  mov (object+4), classtemp
>  mov (classtemp+56), cachetemp
>  cmp (cachetemp,8), RUBY_META_OBJECT
>  jne slowpath
>  mov (cachetemp,4), metaClass
>
> This provides an alternative path to interface injection for storing per-runtime data on *all* classes (RubyArray and java.lang.String alike).  How useful would this be?
>
> Your feedback might help us decide on where ClassValue sits in a Plan A vs. Plan B feature cut, if such a cut comes about.  See Mark Reinhold's blog for the A/B conversation.

Currently, JRuby is bound to pass IRubyObject implementations
throughout the system. This was a decision made before my time, and it
has advantages and disadvantages:

* For Ruby objects (which make up most typical Ruby apps, since we
provide a full complement of our own String, collections, IO and other
APIs), it means we have only a single hop to get an object's
metaclass, via that monomorphic implementation and a shared field on
their common superclass RubyBasicObject. This means that for the vast
majority of objects in a typical Ruby app, we pay no additional
penalties to retrieve or cache metaclasses (as in Groovy, for
example).

* For Java objects (which we'd ideally like people to use more as a
value-add in JRuby), we are forced to wrap them with an
IRubyObject-implementing wrapper. These wrappers are small, but for
repeated calls across the Ruby/Java barrier (blood/brain barrier?) it
can be costly to either maintain a weak cache of wrappers or to
recreate them on each pass-through. However, once the object is in
"Ruby space", we can always get at the metaclass through the same
single hop.

For the tightest possible integration with Java, we would obviously
want to do away with the IRubyObject wrapper, but by working very hard
to hide it most users never know it's there.

The largest challenge keeping us from moving everything toward Object
and away from IRubyObject as the basic atom in the system is metaclass
management. I've heard plenty of Jochen's horror stories about
managing their weak/soft/volatile/whatever caching mechanisms for
metaclasses, and if I understood his problems from JVMLS, this cache
remains one of the biggest problems for Groovy performance and
optimization. That scares the bejeezus out of me, and it's largely why
I haven't tackled the IRubyObject to Object conversion in earnest.
Having a reliable way to manage arbitrary per-class data would go a
long way toward making the transition easier (as would, of course,
interface injection...but even in that case we need a way to look up
the class once we're in the injected interface).

So, long story short, it seems like the inability to attach clean,
safe, lifecycle-attached data to classes is a fundamental problem for
mutable-metaclass or runtime-class-decorating languages like Groovy
and JRuby.

I'll have a look at Mark's blog.

- Charlie

Jochen Theodorou

Sep 9, 2010, 4:37:09 AM
to jvm-la...@googlegroups.com
John Rose wrote:

Plan A and Plan B refer to the release plans for JDK 7, I guess.

I think this feature would absolutely be useful to us. When I think
about mutation, though, I see some small problems with the typical usage
and how this might be done. A custom MetaClass might be created outside
by the user and then set. How could that be done with this get logic? If
I first somehow have to ensure that this MetaClass object coming from
outside is fully realized in the sense of concurrency, then this will
probably spoil all the fun about not having locks or volatiles.
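
One possible way to layer a setter over such a get-only mechanism is sketched below. It pays for exactly the volatile access that the paragraph above hopes to avoid, so it illustrates the trade-off and is not a solution. It assumes java.lang.ClassValue as it later shipped in Java 7, and CustomMeta and MutableMetaRegistry are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical user-supplied metaclass; deliberately has a non-final field.
final class CustomMeta {
    String label;
    CustomMeta(String label) { this.label = label; }
}

final class MutableMetaRegistry {
    // The per-class value is a mutable holder, created once per class.
    private static final ClassValue<AtomicReference<CustomMeta>> HOLDERS =
        new ClassValue<AtomicReference<CustomMeta>>() {
            @Override
            protected AtomicReference<CustomMeta> computeValue(Class<?> type) {
                return new AtomicReference<>(new CustomMeta("default:" + type.getSimpleName()));
            }
        };

    // Volatile read through the holder.
    static CustomMeta get(Class<?> type) {
        return HOLDERS.get(type).get();
    }

    // Volatile write: writes made to 'meta' before this call are visible
    // to any thread that later reads it through get().
    static void set(Class<?> type, CustomMeta meta) {
        HOLDERS.get(type).set(meta);
    }

    public static void main(String[] args) {
        System.out.println(get(String.class).label);
        set(String.class, new CustomMeta("custom"));
        System.out.println(get(String.class).label);
    }
}
```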

Where it would absolutely help is in no longer having to Weak- or
SoftReference the MetaClasses, leaving it to the VM to clean them up
once the Class can be collected.

Since I had some trouble with the IBM JDK in the past it would also be
good to clarify that if that MetaClass references the class it is
assigned to, that this class can still be collected. The Sun JDK has no
problem with that if the metaclass is not hard referenced, but the IBM
JDK versions we had needed special action, luckily on the command line only.

bye Jochen

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Rémi Forax

Sep 9, 2010, 4:56:08 AM
to jvm-la...@googlegroups.com
On 09/09/2010 10:37, Jochen Theodorou wrote:

We can add a method set(Class<?>) but it will require the VM to go to a
safepoint.
Knowing that remove() will require the same mechanism,
I think it depends how badly you need it :)

>
> where it would absolutely help is with no longer having to Weak or
> SoftReference the MetaClasses and leave it to the VM to clean them up
> once the Class can be collected.
>
> Since I had some trouble with the IBM JDK in the past it would also be
> good to clarify that if that MetaClass references the class it is
> assigned to, that this class can still be collected. The Sun JDK has
> no problem with that if the metaclass is not hard referenced, but the
> IBM JDK versions we had needed special action, luckily on the command
> line only.
>
> bye Jochen

Rémi

Jochen Theodorou

Sep 9, 2010, 7:37:08 AM
to jvm-la...@googlegroups.com
Rémi Forax wrote:
[...]

> We can add a method set(Class<?>) but it will require the VM to go to a
> safepoint.
> Knowing that remove() will require the same mechanism,
> I think it depends how badly you need it :)

well, it would be incomplete for me without that. I don't absolutely
need a set, but then, how does the user get it? And how to ensure that
it is initialized. The result would be that I would have to do all the
synchronization I have to do now there too and then it gives no
advantage...

actually that is a general problem in Java. I would not need a set with
some bad semantics if there were an easy way for me to ensure that my
newly created object is fully initialized, without having it rely on
final fields and immutability. If that is solved, then it will help not
only me in this case here, it would most probably help Doug Lea too. Or
is there a new way for this that will go in jdk7? Fences was not ready,
was it?

ok, back to the actual matter. A safepoint is probably quite heavy for
this, but if it is the only way, then put a big warning around it and
let's have it done.
