avoiding boxing

Jochen Theodorou

Apr 29, 2008, 5:36:16 PM
to jvm-la...@googlegroups.com
Hi all,

I wanted to collect a bit of data on how you avoid boxing in your
language implementations. I am asking because Groovy currently makes
lots of calls via reflection, and that means creating an Object[] for
each call, containing boxed values for ints and all the other primitive
types. Not only that... for 2+3, we actually box the numbers, make a
dynamic method call to Integer.plus, which then unboxes the values,
performs the addition, and boxes the result. Usually we keep the boxed
value around, so it does not have to be boxed again, but still... a
simple plus takes quite a lot of work. And even if the call were direct
rather than reflective... I have made measurements showing that such
code is 20-30 times slower, even with a direct method call.

The best thing would of course be to call the method with primitive
types... but keeping that kind of type information around isn't easy.
Reflective method calls do not return primitives, and f(a)+g(b) may or
may not be a place where two ints are added. On the other hand,
keeping the values on the stack isn't easy either: there are just too
many primitive types to provide a separate path for each and every one.
Also, longs and doubles take up two slots instead of one, making
stack manipulation more difficult.
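The round trip described above can be sketched in plain Java (a hypothetical illustration only, not Groovy's actual runtime code, which goes through its MOP and reflection):

```java
// Hypothetical sketch, not Groovy's actual runtime: the round trip a
// boxed dynamic "2 + 3" makes, next to the single primitive add.
public class BoxedPlusSketch {

    // Dynamic-style plus: both operands arrive boxed, get unboxed,
    // added, and the result is boxed again.
    static Object dynamicPlus(Object left, Object right) {
        int a = (Integer) left;          // unbox
        int b = (Integer) right;         // unbox
        return Integer.valueOf(a + b);   // add, then box the result
    }

    public static void main(String[] args) {
        Object boxed = dynamicPlus(Integer.valueOf(2), Integer.valueOf(3));
        int primitive = 2 + 3;           // compiles to a single iadd
        System.out.println(boxed + " " + primitive);
    }
}
```

Every dynamicPlus call allocates an Integer (or fetches one from the small-value cache), while the primitive version never touches the heap.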

I see an advantage here for interpreted languages, since they don't have
to care about such things and can do whatever they need to do. And
static languages usually know the result types of method calls, so I
guess they won't have these problems either...

John Rose was talking about tuples... but I am not sure they can be used
to resolve the general problem. What do others think?

bye Jochen


--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/
http://www.g2one.com/

John Cowan

Apr 29, 2008, 6:44:14 PM
to jvm-la...@googlegroups.com
On Tue, Apr 29, 2008 at 5:36 PM, Jochen Theodorou <blac...@gmx.org> wrote:

> I wanted to collect a bit of data on how you avoid boxing in your
> language implementations.

My language provides bignums and flonums, which I simply represent as
BigIntegers and Doubles. I pay the boxing penalty, but the
combinatorial explosion isn't too bad (only 4 cases to deal with). I
worry about converting to and from Byte, Short, Int, Long, and Float
objects only when communicating with native Java methods.

My measurements show that the cost of using BigIntegers for numbers
representable by Integers is only about 2.5 times in the worst case (a
tight loop multiplying numbers), which I don't worry about -- the
mathematical tractability of BigIntegers is a huge win.
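That worst case can be sketched roughly like this (a hypothetical micro-benchmark, not John's actual test; the loop size and arithmetic are arbitrary and kept small so the values stay Integer-sized):

```java
import java.math.BigInteger;

// Hypothetical comparison: the same tight arithmetic loop on primitive
// longs and on BigIntegers. Timings are illustrative only.
public class BigIntegerCost {
    // Primitive path: stays entirely in locals/registers.
    static long primitiveLoop(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += (long) i * 7;
        }
        return acc;
    }

    // BigInteger path: every step allocates fresh objects on the heap.
    static BigInteger bigIntegerLoop(int n) {
        BigInteger acc = BigInteger.ZERO;
        BigInteger seven = BigInteger.valueOf(7);
        for (int i = 0; i < n; i++) {
            acc = acc.add(BigInteger.valueOf(i).multiply(seven));
        }
        return acc;
    }

    public static void main(String[] args) {
        int n = 10000000;
        long t0 = System.nanoTime();
        long p = primitiveLoop(n);
        long t1 = System.nanoTime();
        BigInteger b = bigIntegerLoop(n);
        long t2 = System.nanoTime();
        System.out.println("primitive: " + (t1 - t0) / 1000000 + " ms, "
                + "BigInteger: " + (t2 - t1) / 1000000 + " ms, equal: "
                + (b.longValue() == p));
    }
}
```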

--
GMail doesn't have rotating .sigs, but you can see mine at
http://www.ccil.org/~cowan/signatures

Charles Oliver Nutter

Apr 29, 2008, 6:55:41 PM
to jvm-la...@googlegroups.com
A few JRuby techniques to reduce argument boxing:

* We have specific-arity call paths for up to three arguments and with
or without a block. The compiler calls one of those when it can do so,
and calls the default [] version otherwise. This means that from the
call site down, there's 10 paths straight through to the eventual code.

* We generate method handles (about 1700 of them) for all Java-based
core class methods, wiring up the specific-arity path and default paths
to call the target Java method directly, and all others to raise arity
errors. This made managing those ten paths significantly easier. Handles
are generated based on the method signature and an annotation like this:

@JRubyMethod(name = "+") // name is optional if it matches method
public IRubyObject op_plus(IRubyObject other) { ...

More recently, the handle generation has been made "smarter", splitting
optional argument counts into separate methods. So if a method can take
one or two arguments, you can do this:

@JRubyMethod(name = "foo", required = 1, optional = 1)
public IRubyObject foo(IRubyObject[] args) { ...

but you generally would want to "arity split" the method like this:

@JRubyMethod(name = "foo")
public IRubyObject foo(IRubyObject arg) {...
@JRubyMethod(name = "foo")
public IRubyObject foo(IRubyObject arg1, IRubyObject arg2) {

A single handle is generated that dispatches one arg to the first method
and two args to the second. The handle generation is fairly intelligent,
and the generated handle does arity-checking, pre/post-method framing,
and passes through runtime structures like ThreadContext and Block if
the target method "wants" them.

* The interpreter also calls specific-arity paths when possible using a
switch on the number of actual arguments.
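The switch-on-arity idea can be sketched like this (hypothetical types and names, not JRuby's actual source): argument expressions are evaluated straight into a specific-arity call, and only the generic default path allocates an Object[].

```java
import java.util.List;

// Hypothetical sketch of specific-arity dispatch in an interpreter.
interface Node { Object eval(); }

interface Receiver {
    Object call(String name);
    Object call(String name, Object a);
    Object call(String name, Object a, Object b);
    Object call(String name, Object a, Object b, Object c);
    Object call(String name, Object[] args);   // generic fallback
}

class Dispatch {
    // Switch on the argument count so the common 0-3 argument cases
    // never build an argument array.
    static Object dispatch(Receiver self, String name, List<Node> args) {
        switch (args.size()) {
            case 0: return self.call(name);
            case 1: return self.call(name, args.get(0).eval());
            case 2: return self.call(name, args.get(0).eval(),
                                     args.get(1).eval());
            case 3: return self.call(name, args.get(0).eval(),
                                     args.get(1).eval(), args.get(2).eval());
            default:
                // Only here do we pay for boxing the arguments in [].
                Object[] boxed = new Object[args.size()];
                for (int i = 0; i < boxed.length; i++) {
                    boxed[i] = args.get(i).eval();
                }
                return self.call(name, boxed);
        }
    }
}
```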

The reduction of argument boxing improved performance substantially, and
is largely responsible for JRuby 1.1 beating Ruby 1.9 on many benchmarks
now. We do not yet support specific-arity calling for compiled Ruby
methods or closures, which will both likely represent additional perf
boosts in the future.

On object boxing:

We currently do not do anything to avoid object boxing for primitives,
largely because of the complexity it would introduce into the entire
pipeline. We are however going to be looking at a fairly large
modification to move JRuby away from requiring all objects to implement
IRubyObject. This will require us to build a second set of call paths
throughout the system, using the IRubyObject paths for now (and in the
future when an object is known to be IRubyObject) and the Object paths
otherwise. Until we're to that point we won't be exploring primitive
boxing improvements. In general, I doubt we (I) will ever work on
keeping primitives unboxed, since the amount of effort required really
wouldn't add up to that much better an experience for Ruby users.

- Charlie

Jochen Theodorou

Apr 29, 2008, 7:21:41 PM
to jvm-la...@googlegroups.com
John Cowan schrieb:

> On Tue, Apr 29, 2008 at 5:36 PM, Jochen Theodorou <blac...@gmx.org> wrote:
>
>> I wanted to collect a bit of data on how you avoid boxing in your
>> language implementations.
>
> My language provides bignums and flonums, which I simply represent as
> BigIntegers and Doubles. I pay the boxing penalty, but the
> combinatorial explosion isn't too bad (only 4 cases to deal with). I
> worry about converting to and from Byte, Short, Int, Long, and Float
> objects only when communicating with native Java methods.
>
> My measurements show that the cost of using BigIntegers for numbers
> representable by Integers is only about 2.5 times in the worst case (a
> tight loop multiplying numbers), which I don't worry about -- the
> mathematical tractability of BigIntegers is a huge win.

well, in the case of adding two ints I get numbers telling me the factor
is more like 20 than 2.5. That is better than using Integer, but still
far from good. Do you have a micro benchmark showing this effect? I mean
one without all the other effects that slow down the computation,
because it is quite possible that those other effects take so much time
that you don't see the actual cost... and never will. On the other hand,
that scenario would be more realistic I guess... and it would also mean
that you have to get the other things faster ;)

Jochen Theodorou

Apr 29, 2008, 7:33:08 PM
to jvm-la...@googlegroups.com
Jochen Theodorou schrieb:
[...]

> well, in the case of adding two ints I get numbers telling me the factor
> is more like 20 than 2.5.

I take that back, I overlooked a decimal... so it is more like 200 than
2.5 here. At least with JDK 1.6 and the standard BigInteger. That also
means it is much worse than Integer... I guess the algorithm I used
caused the creation of really big BigIntegers. Trying to use mod to
limit the size actually made it even worse.

bye blackdrag

Jochen Theodorou

Apr 29, 2008, 7:45:08 PM
to jvm-la...@googlegroups.com
Charles Oliver Nutter schrieb:

> A few JRuby techniques to reduce argument boxing:
>
> * We have specific-arity call paths for up to three arguments and with
> or without a block. The compiler calls one of those when it can do so,
> and calls the default [] version otherwise. This means that from the
> call site down, there's 10 paths straight through to the eventual code.

10? That sounds like quite a lot...

> * We generate method handles (about 1700 of them) for all Java-based
> core class methods, wiring up the specific-arity path and default paths
> to call the target Java method directly, and all others to raise arity
> errors. This made managing those ten paths significantly easier. Handles
> are generated based on the method signature and an annotation like this:
>
> @JRubyMethod(name = "+") // name is optional if it matches method
> public IRubyObject op_plus(IRubyObject other) { ...
>
> More recently, the handle generation has been made "smarter", splitting
> optional argument counts into separate methods. So if a method can take
> one or two arguments, you can do this:
>
> @JRubyMethod(name = "foo", required = 1, optional = 1)
> public IRubyObject foo(IRubyObject[] args) { ...
>
> but you generally would want to "arity split" the method like this:
>
> @JRubyMethod(name = "foo")
> public IRubyObject foo(IRubyObject arg) {...
> @JRubyMethod(name = "foo")
> public IRubyObject foo(IRubyObject arg1, IRubyObject arg2) {
>
> A single handle is generated that dispatches one arg to the first method
> and two args to the second. The handle generation is fairly intelligent,
> and the generated handle does arity-checking, pre/post-method framing,
> and passes through runtime structures like ThreadContext and Block if
> the target method "wants" them.

ok... but how does that help in avoiding boxing or making calculations faster?

> * The interpreter also calls specific-arity paths when possible using a
> switch on the number of actual arguments.
>
> The reduction of argument boxing improved performance substantially, and
> is largely responsible for JRuby 1.1 beating Ruby 1.9 on many benchmarks
> now. We do not yet support specific-arity calling for compiled Ruby
> methods or closures, which will both likely represent additional perf
> boosts in the future.

I see... maybe the JRuby problem is just very different from the Groovy
problem here.

> On object boxing:
>
> We currently do not do anything to avoid object boxing for primitives,
> largely because of the complexity it would introduce into the entire
> pipeline. We are however going to be looking at a fairly large
> modification to move JRuby away from requiring all objects to implement
> IRubyObject. This will require us to build a second set of call paths
> throughout the system, using the IRubyObject paths for now (and in the
> future when an object is known to be IRubyObject) and the Object paths
> otherwise. Until we're to that point we won't be exploring primitive
> boxing improvements. In general, I doubt we (I) will ever work on
> keeping primitives unboxed, since the amount of effort required really
> wouldn't add up to that much better an experience for Ruby users.

well... let's say you represent integers as Java ints; then I doubt
there is anything faster than iadd, or than a method call that executes
iadd at the end without doing any boxing. Of course that makes no sense
if your language has no ints like Java's, or if your ints do not have
the same overflow logic. Using Integer instead seems, at least for plus,
to be around 20 times slower, using BigInteger around 200 times, and
using a custom wrapper object around 47 times... only that the latter
has several advantages, as it can be used to hold multiple different
values and keep overflow flags and such... Such a holder would then of
course still need adaptation if you want to call a Java method taking
primitive ints with it.

bye blackdrag

Rich Hickey

Apr 29, 2008, 7:54:56 PM
to JVM Languages
I think the answer is tags, as John Rose discussed here:

http://blogs.sun.com/jrose/entry/fixnums_in_the_vm

That, standard fast multiprecision arithmetic, and tail call
optimization are the wish list for me.

Rich

Charles Oliver Nutter

Apr 29, 2008, 8:24:04 PM
to jvm-la...@googlegroups.com
Jochen Theodorou wrote:
> Charles Oliver Nutter schrieb:
>> A few JRuby techniques to reduce argument boxing:
>>
>> * We have specific-arity call paths for up to three arguments and with
>> or without a block. The compiler calls one of those when it can do so,
>> and calls the default [] version otherwise. This means that from the
>> call site down, there's 10 paths straight through to the eventual code.
>
> 10? that sounds like pretty much..

zero args, no block
zero args, with block
one arg, no block
one arg, with block
two args, no block
two args, with block
three args, no block
three args, with block
n args, no block
n args, with block

The block/no block split is there because blocks can result in non-local
flow control while normal methods can't. So by having two paths we can
eliminate the flow-control exception handling most of the time.

> ok.. but how is that helping in avoiding boxing or get calculations faster?

Previously, all calls boxed argument lists in [], resulting in a lot of
time wasted constructing, populating, accessing, and collecting those
arrays. The work above eliminated that cost for the majority of calls to
core methods.

So with current JRuby, 1 + 2 passes the '2' as a Fixnum object through a
CallSite, a method handle, and into the target method without ever
having to box it in an array.

> I see... maybe the JRuby problem is just very different from the Groovy
> problem here

Well, not really...you box all arguments in arrays too, and you're
paying a cost for that. Whether that cost is measurable in the face of
other overhead, I don't know. For us it has made a very measurable
difference.

And of course we box all numeric types, so we have the same problem (if
you consider it a problem).

> well... let's say you represent integers as Java ints; then I doubt
> there is anything faster than iadd, or than a method call that executes
> iadd at the end without doing any boxing. Of course that makes no sense
> if your language has no ints like Java's, or if your ints do not have
> the same overflow logic. Using Integer instead seems, at least for plus,
> to be around 20 times slower, using BigInteger around 200 times, and
> using a custom wrapper object around 47 times... only that the latter
> has several advantages, as it can be used to hold multiple different
> values and keep overflow flags and such... Such a holder would then of
> course still need adaptation if you want to call a Java method taking
> primitive ints with it.

It's worth mentioning that unless you want to change the semantics of
groovy quite a bit, I suspect unboxing is going to be really hard to add
after the fact. For example, in JRuby, in order to unbox, we'd need to
have extra logic for every operation we want to perform against
primitives that would check whether the given value is actually a
primitive or not. We'd need to have logic for parameters to pass them as
primitive values rather than boxing and passing. We'd have to check or
ignore overrides to that set of operations. Any call paths that need to
pass through the JRuby system would also need to accept unboxed
primitive values.

And it may not even be worth it in JRuby. Ruby 1.9 uses tagged integers
for Fixnums with an overflow check to roll to Bignum which is a full-on
object. It has fast-paths for numeric operators that go straight to the
code bypassing dynamic dispatch, and those operators do a normal C-level
integer math operation on the values. And we're as fast or faster anyway
with our fully-boxed custom class wrapping a long. It's hard to justify
the work for us when we're already the fastest production-worthy Ruby
implementation for most apps.

In general it seems like the time would almost always be better-spent
making dynamic dispatch faster and reducing per-call cost before trying
to get primitive math operations to run faster.

- Charlie

John Rose

Apr 29, 2008, 8:32:23 PM
to jvm-la...@googlegroups.com
On Apr 29, 2008, at 3:55 PM, Charles Oliver Nutter wrote:

We do not yet support specific-arity calling for compiled Ruby
methods or closures, which will both likely represent additional perf
boosts in the future.


JRuby has gone a fairly long way toward splitting calling sequences into a set of refinements, each of which is closer (than the generic calling sequence) to what the JVM prefers.  There's usually some semantic mismatch, such as (e.g.) a maximum number of positional arguments, or a policy of always boxing primitives.  I think each language (and its users) will favor some particular sweet spot where the calling sequences are refined to some degree, and the remaining mismatches to the JVM are not felt to be painful.  Or, at least, not as painful as the complexity of fixing them.

In a word, the JVM prefers to put argument values in registers rather than in objects.  Escape analysis, inlining, and compiler intrinsics (and fixnums, some day) are examples of optimizations which get more values into registers than the bytecodes would seem to allow.  But on the JVM it will always be at least a little faster to unbox your argument lists, and unbox your primitive arguments and return values.

The reason we are doing method handles the way we are is to cut down the complexity of dealing with a variety of call signatures in a runtime.  Probably, most runtimes, like JRuby, fix a handful of manually-defined call types and manually sort them out.  It is possible to make factory-based systems also which can spin out an infinity of call types (that's the way my old esh VM worked).  But the real simplification of such tasks will come when the JVM can manage hundreds of call signatures without creating hundreds of glue interfaces and classes, and when we build a low-level runtime that makes the management simple.

Then it will be possible to contemplate a design where your dynamic language call sites are customized to a good guess as to the outgoing argument types (via profiling or static analysis or both) and will be linked (as in JRuby) through adapters that usually hit straight through, to the target method, whether the target is a tightly typed Java library routine or something looser.  If you write foo[0] and foo is usually a list (and 0 is always zero), then the dynamic call site should, after warmup and linking, directly call List.get(int), without extra motion.  This can work; I've seen it work in the past.
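The guarded straight-through link can be sketched in plain Java (invented names, standing in for the glue an invokedynamic-linked call site would generate):

```java
import java.util.List;

// Hypothetical sketch of a linked call site for foo[0]: guess the
// receiver is a List and call List.get(int) directly, guarded by a
// cheap type test, with a generic fallback when the guess fails.
class IndexCallSite {
    Object invoke(Object receiver, int index) {
        if (receiver instanceof List) {              // cheap type guard
            return ((List<?>) receiver).get(index);  // direct, inlinable call
        }
        return slowPath(receiver, index);
    }

    // Generic dispatch (e.g. a reflective lookup of a language-level
    // [] method); left unimplemented in this sketch.
    Object slowPath(Object receiver, int index) {
        throw new UnsupportedOperationException("generic dispatch not sketched");
    }
}
```

After warmup the JIT can inline the guard and the List.get call at the use site, which is the "straight through, without extra motion" effect described above.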

A further, crucial simplification will come when call sites can be linked according to language-specific rules, instead of being statically tied to Java classes or interfaces.  This is what invokedynamic is for (beyond method handles).  At that point it will be practical to get rid of IMyLanguageInterface and MyLanguageInteger.

It's a good time to be a language and VM hacker...

-- John

P.S. The JSR 292 EDR is forthcoming...  See my blog in the meantime.

John Rose

Apr 29, 2008, 8:38:23 PM
to jvm-la...@googlegroups.com
On Apr 29, 2008, at 4:21 PM, Jochen Theodorou wrote:

well, in the case of adding two ints I get numbers telling me the factor
is more like 20 than 2.5.


Arithmetic is a key use case for invokedynamic and primitive-heavy method handle signatures.

You want to say the bytecode equivalent of x.'+'(Object y).  But also, if y is literally '1' (as it often is), you want x.'+'(int y).

Then invokedynamic can let you manage the types as they flow by dynamically; your runtime can pick more or less specific methods as the case may be.  And they do not have to be Java methods on x.getClass(); they can be anything you can express with a method handle that accept x and y.

With inlining and type profiling, you get the right machine code (addl eax, 1), guarded somewhere upstream by a cheap type test (cmpl [x+#klass], #Integer.klass).

-- John

Charles Oliver Nutter

Apr 29, 2008, 9:34:12 PM
to jvm-la...@googlegroups.com
John Rose wrote:
> The reason we are doing method handles the way we are is to cut down the
> complexity of dealing with a variety of call signatures in a runtime.
> Probably, most runtimes, like JRuby, fix a handful of manually-defined
> call types and manually sort them out. It is possible to make
> factory-based systems also which can spin out an infinity of call types
> (that's the way my old esh VM worked). But the real simplification of
> such tasks will come when the JVM can manage hundreds of call signatures
> without creating hundreds of glue interfaces and classes, and when we
> build a low-level runtime that makes the management simple.

It's worth mentioning that I did prototype a factory system that would
generate call site adapters with varying parameter lengths, to support
the 'n' possible cases of finite argument passing (and eventually,
propagate type information), and it worked pretty well (one more call
monomorphized didn't hurt, either). But I knew it would only ever be a
prototype and the early work was largely scrapped. The reason? Too
expensive. Even with the best tricks, it would end up generating an
ungodly amount of code and classes. So the knob stayed where it is
currently and we accept that the best compromise for now is manually
adding a few call paths and propagating arity, if not type. We could do
more, but this covers a good 95% of calls with a minimal cost.

JSR292 and related work are going to make all that possible...and even
better, make a lot of it totally unnecessary. And I promise I'll have
JRuby builds, patches, and switches to take advantage of each feature as
they're available.

- Charlie

John Wilson

Apr 30, 2008, 5:35:30 AM
to jvm-la...@googlegroups.com

Like Groovy, I use a MetaClass to implement the dynamic behaviour of a
class. MetaClasses are immutably associated with a Class but the
MetaClass itself is mutable. I support Monkey Patching by mutating the
MetaClass.

Like Groovy (kind of) I make calls to a dispatcher object (called
ThreadContext) which, in turn, makes calls to the MetaClass to actually
perform the operation/execute the method. This object is unlike the
Groovy approach in two respects: firstly, it is thread specific;
secondly, it knows absolutely nothing about the default semantics of
the language, it just orchestrates the call to the MetaClass. In Ng the
MetaClass has absolute control over the behaviour of the Class it
represents.

The Ng runtime system does not implement operators as method calls. My
view is that doing this throws away information which is useful in
improving the performance of the system. So a + b results in a call to
a method on ThreadContext which looks like tc.add().apply(a, b) (tc is
the instance of ThreadContext for the current thread). ThreadContext
will find the correct MetaClass for a and route the call to the
MetaClass passing with it some information (about thread specific
Monkey Patching) which will help the MetaClass decide what to do.

The ThreadContext API for operators is *very* rich (there are about
250 methods which implement addition, for example). The reason for the
richness of the API is the combinatorial explosion caused by the need to
support all the combinations of the primitive arithmetic types plus
BigDecimal, BigInteger and Object. This is compounded by the fact that
there are methods which return primitive results.

For example we have the following two methods for add:

Object apply(int lhs, int rhs)

and

int intApply(int lhs, int rhs) throws NotPerformed

The first flavour is a "slow" implementation which, in the standard
implementation, returns a boxed int. The second is a "fast"
implementation which returns an unboxed int.

Now the compiler can never just use the "fast" implementation. The
user can, at any time, change the semantics of addition (for example,
to return a long if the operation overflows). If this happens the
"fast" implementation will throw the NotPerformed exception if it
wants to return a result which is not an int.

So the compiler generates code which "speculatively" executes the
"fast" calls and falls back to the "slow" calls if one of the "fast"
calls fails.

e.g.

int a, b, c;
...
a = a + b * c

generates the equivalent of

try {
    a = tc.add().intApply(a, tc.multiply().intApply(b, c));
} catch (NotPerformed e) {
    a = tc.convert().asInt(tc.add().apply(a, tc.multiply().apply(b, c)));
}

Obviously the implementations of intApply(), etc. must not have side
effects and calls to user methods must not be made in the try/catch
block.

My initial tests show that this leads to very good performance (the
"fast" path running less than twice as slow as the equivalent Java). The
fact that the path from the call site to the code performing the
operation is very short and very simple means that the JIT seems able to
work wonders. The "slow" path is no slouch either, running less than
four times slower than Java.

This has convinced me that the approach of having a very wide but
shallow API (as opposed to Groovy's narrow but deep API) really has
something to offer in improving the performance of Dynamic languages
on the JVM. (I see that Alex Tkachman's work on improving Groovy
performance involves widening the API which tends to validate this
approach).

John Wilson

Jochen Theodorou

Apr 30, 2008, 5:49:31 AM
to jvm-la...@googlegroups.com
Rich Hickey schrieb:

>
> On Apr 29, 5:36 pm, Jochen Theodorou <blackd...@gmx.org> wrote:
[...]

> I think the answer is tags, as John Rose discussed here:
>
> http://blogs.sun.com/jrose/entry/fixnums_in_the_vm
>
> That, standard fast multiprecision arithmetic, and tail call
> optimization are the wish list for me.

I don't see how this will help me in Groovy. We use the Java types, so
there is no need to represent a 20 bit integer. Also, we mostly want to
call Java methods, and those might take an int, not a fixnum. If I have
to transform the fixnum into an int first, then I already lose
performance. And I don't think that the JVM will provide new bytecodes
to multiply, divide and add fixnums, especially since the JVM cannot
know how these operations have to be performed in detail... for example
in case of an overflow. The current approach of a class with a value
field is, when it contains an int and we do plus operations, around 45
times slower than direct usage of primitive ints. Operations on Integer
objects are no better here.

Jochen Theodorou

Apr 30, 2008, 6:05:19 AM
to jvm-la...@googlegroups.com
Charles Oliver Nutter schrieb:
[...]

>> I see... maybe the JRuby problem is just very different from the Groovy
>> problem here
>
> Well, not really...you box all arguments in arrays too, and you're
> paying a cost for that. Whether that cost is measurable in the face of
> other overhead, I don't know. For us it has made a very measurable
> difference.

of course creating those arrays is an overhead we have to pay too... in
my example it seems to add an additional 50% to the runtime, meaning
we are in the area of being 60-70 times slower than with primitive ints.
But we are targeting calling methods directly, without reflection, and
then of course without arrays.

> And of course we box all numeric types, so we have the same problem (if
> you consider it a problem).

I do, because as I said... working with primitive ints seems to be much
faster than using wrapper or value objects, and I am talking here about
a factor of 40 or more. With partial boxing I am able to get this down
to being 20x slower, but my goal would be to be less than 10 times
slower. Also, generating bytecode that can handle the primitive types
along with the rest is not an easy task. It is one thing for ints, since
they take only one slot, but double and long make it really difficult,
because they take two slots and any stack-manipulating code becomes
really bad.

>> well... let's say you represent integers as Java ints; then I doubt
>> there is anything faster than iadd, or than a method call that executes
>> iadd at the end without doing any boxing. Of course that makes no sense
>> if your language has no ints like Java's, or if your ints do not have
>> the same overflow logic. Using Integer instead seems, at least for plus,
>> to be around 20 times slower, using BigInteger around 200 times, and
>> using a custom wrapper object around 47 times... only that the latter
>> has several advantages, as it can be used to hold multiple different
>> values and keep overflow flags and such... Such a holder would then of
>> course still need adaptation if you want to call a Java method taking
>> primitive ints with it.
>
> It's worth mentioning that unless you want to change the semantics of
> groovy quite a bit, I suspect unboxing is going to be really hard to add
> after the fact. For example, in JRuby, in order to unbox, we'd need to
> have extra logic for every operation we want to perform against
> primitives that would check whether the given value is actually a
> primitive or not. We'd need to have logic for parameters to pass them as
> primitive values rather than boxing and passing. We'd have to check or
> ignore overrides to that set of operations. Any call paths that need to
> pass through JRuby system would need to also accept unboxed primitive
> values.

I am planning a major semantic change to Groovy... and this change aims
to avoid passing the values through the whole system; instead, the call
site handles this locally, with direct access to the values. Method
handles will be extremely useful here, but even without them we can do
much.

> And it may not even be worth it in JRuby. Ruby 1.9 uses tagged integers
> for Fixnums with an overflow check to roll to Bignum which is a full-on
> object. It has fast-paths for numeric operators that go straight to the
> code bypassing dynamic dispatch, and those operators do a normal C-level
> integer math operation on the values. And we're as fast or faster anyway
> with our fully-boxed custom class wrapping a long. It's hard to justify
> the work for us when we're the fastest production-worthy Ruby
> implementations for most apps already.

so you are saying you could theoretically do calculations in JRuby as
fast as in Java? Or how much slower than Java would you say you are?

> In general it seems like the time would almost always be better-spent
> making dynamic dispatch faster and reducing per-call cost before trying
> to get primitive math operations to run faster.

the fastest call is the one you never make ;)

bye Jochen

Attila Szegedi

Apr 30, 2008, 8:39:11 AM
to jvm-la...@googlegroups.com
On 2008.04.30., at 11:49, Jochen Theodorou wrote:

>
> Rich Hickey schrieb:
>>
>> On Apr 29, 5:36 pm, Jochen Theodorou <blackd...@gmx.org> wrote:
> [...]
>> I think the answer is tags, as John Rose discussed here:
>>
>> http://blogs.sun.com/jrose/entry/fixnums_in_the_vm
>>
>> That, standard fast multiprecision arithmetic, and tail call
>> optimization are the wish list for me.
>
> I don't see how this will help me in Groovy. We use the Java types, so
> there is no need to represent a 20 bit integer.

It doesn't help you now. It'd help you in a new VM that has this
trick :-)

John says that instead of having an arbitrary object pointer to
represent java.lang.Integer instances allocated on the heap, you could
have a specially tagged object pointer that'd be treated by the code
as a pointer to a java.lang.Integer.

Most CPUs align data on a 4 or 8 byte boundary, so it is easy to
recognize an object pointer with some of its lower 2 or 3 bits set
("tagged") as not being a valid heap object address; which you can
then use for other purposes. Like, simulate a lightweight Integer
object. So, you use the lower 2 bits as tags, and use some more bits
for type information, and you're left with, say, 20 bits of useful
payload. That way, you could have integers that fit in 20 bits
automatically behave as objects (with some quirks w/regard to
synchronization), and the VM could do "boxing" and "unboxing" (and
subsequently arithmetic) quickly by doing shifts and masks, without
touching the heap. But at a high level, such a pointer would still
satisfy instanceof java.lang.Integer.

So, it'd allow very low-cost representation of all java.lang.Integer
objects whose value fits in 20 bits or so. But it'd be a VM level
optimization, not something you could observe on a higher level. You'd
keep using Integer.valueOf() and Integer.intValue() etc.
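To make the shift-and-mask arithmetic concrete, here is a minimal sketch that simulates the tagging scheme in plain Java, with a long standing in for the machine word. The class name, tag values, and bit layout are all illustrative (the extra type-information bits mentioned above are omitted); a real VM would do this on raw pointers, under the covers.

```java
// Illustrative simulation of fixnum tagging; not an actual VM layout.
public class TaggedFixnum {
    static final long TAG_MASK = 0b11;  // low 2 bits; 00 would be a real, aligned heap pointer
    static final long INT_TAG  = 0b01;  // 01 marks an immediate integer "pointer"
    static final int  SHIFT    = 2;

    // "Box" without touching the heap: shift the payload up, OR in the tag.
    public static long box(int value) {
        return ((long) value << SHIFT) | INT_TAG;
    }

    public static boolean isFixnum(long word) {
        return (word & TAG_MASK) == INT_TAG;
    }

    // "Unbox" with a single arithmetic (sign-preserving) shift.
    public static int unbox(long word) {
        return (int) (word >> SHIFT);
    }

    // Arithmetic on two tagged words: untag, add, retag -- no allocation.
    public static long add(long a, long b) {
        return box(unbox(a) + unbox(b));
    }
}
```

The point of the trick is visible in `add`: both "boxing" and "unboxing" reduce to a couple of register operations, never touching the memory hierarchy.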

Attila.

Jochen Theodorou

unread,
Apr 30, 2008, 8:51:46 AM4/30/08
to jvm-la...@googlegroups.com
Attila Szegedi schrieb:

> On 2008.04.30., at 11:49, Jochen Theodorou wrote:
>
>> Rich Hickey schrieb:
>>> On Apr 29, 5:36 pm, Jochen Theodorou <blackd...@gmx.org> wrote:
>> [...]
>>> I think the answer is tags, as John Rose discussed here:
>>>
>>> http://blogs.sun.com/jrose/entry/fixnums_in_the_vm
>>>
>>> That, standard fast multiprecision arithmetic, and tail call
>>> optimization are the wish list for me.
>> I don't see how this will help me in Groovy. We use the Java types, so
>> there is no need to represent a 20 bit integer.
>
> It doesn't help you now. It'd help you in a new VM that has this
> trick :-)
>
> John says that instead of having an arbitrary object pointer to
> represent java.lang.Integer instances allocated on the heap, you could
> have a specially tagged object pointer that'd be treated by the code
> as a pointer to a java.lang.Integer.
[...]

>
> So, it'd allow very low-cost representation of all java.lang.Integer
> objects whose value fits in 20 bits or so. But it'd be a VM level
> optimization, not something you could observe on a higher level. You'd
> keep using Integer.valueOf() and Integer.intValue() etc.

ok, I understand that part... but how fast is this compared to primitive
ints? If it doubles the speed we have so far using Integer, then it is
nice, but not enough.

Brian Frank

unread,
Apr 30, 2008, 8:52:13 AM4/30/08
to JVM Languages
I've definitely felt the pain trying to deal with longs and doubles.
I gave up and just keep everything boxed.

But one technique I've used extensively which has really helped is
heavy use of interned/cached boxed objects. For example any integer
in the range of -256 to 1024 is interned (which is a much bigger range
than Integer.valueOf interns). I also intern all the ASCII strings
from " " to "~", empty arrays for every type, etc.
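A minimal sketch of the widened interning cache described above, using the -256 to 1024 range mentioned; the class and method names are invented for illustration.

```java
// Sketch of a widened Integer interning cache: boxes for a hot range are
// created once at startup, so "boxing" in that range is an array index
// rather than an allocation.
public class IntCache {
    private static final int LOW = -256, HIGH = 1024;
    private static final Integer[] CACHE = new Integer[HIGH - LOW + 1];
    static {
        for (int i = LOW; i <= HIGH; i++) {
            CACHE[i - LOW] = i;  // boxed once, at class-initialization time
        }
    }

    public static Integer valueOf(int i) {
        if (i >= LOW && i <= HIGH) {
            return CACHE[i - LOW];   // hot path: array index, no allocation
        }
        return Integer.valueOf(i);   // out of range: box as usual
    }
}
```

Identity comparisons on in-range values then always succeed, which also lets code use `==` internally where it knows both sides came from the cache.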

John Wilson

unread,
Apr 30, 2008, 9:37:37 AM4/30/08
to jvm-la...@googlegroups.com

I'm rather unsure about the value of making changes like this to the
JVM. The timescale from now to when they become usable is rather long
(2-3 years to get into a released JVM then another 2-3 years before I
can rely on most of my target audience having the JVM in production).

What I'm seeing, via this list and others, is that people are making
great strides in understanding how to utilise features in the current
JVM to make dynamic languages faster. The JRuby and Groovy teams seem
to be making progress which may well make this work unnecessary for
them. I'm concerned that a proposal like this will make use of some
redundant information which could be used by some other JVM feature,
and that by the time it's in the field and usable the problem will have
been solved another way. (I'm also slightly terrified of building
something based on address alignment and assuming that the behaviour
of the hardware will be the same in 5-10 years time).

Whilst Method Handles are quite interesting, I'd rather see effort being
expended in making java.lang.reflect.Method faster and more useful (we
discussed allowing downcasting to types which avoided the need to box
the parameters and unbox the result at one point on this list), if
only on the basis that this could be in the next version of the JVM as
an incremental improvement to reflection rather than as a change to
support "dynamic languages".

John Wilson

Charles Oliver Nutter

unread,
Apr 30, 2008, 9:48:20 AM4/30/08
to jvm-la...@googlegroups.com

JRuby uses the same technique (as does Ruby) for -127 to 128. And more
expensive literals (like literal Bignums) are initialized on load. It
definitely does make a big difference.

- Charlie

John Rose

unread,
Apr 30, 2008, 2:59:41 PM4/30/08
to jvm-la...@googlegroups.com
On Apr 30, 2008, at 6:37 AM, John Wilson wrote:

On 4/30/08, Attila Szegedi <szeg...@gmail.com> wrote:

 On 2008.04.30., at 11:49, Jochen Theodorou wrote:

I don't see how this will help me in Groovy. We use the Java types, so
there is no need to represent a 20 bit integer.


It doesn't help you now. It'd help you in a new VM that has this
 trick :-)

Right, thanks Attila.  Fixnums are under the covers.  The best optimizations (usually) are.

In general, new bytecodes are not necessary for performance, since compilers are good at treating well-known static methods as macro-instructions.  When bytecode changes are justified, it's because existing workarounds have high simulation overheads in time and bytecode space.

For a small but representative example, consider ldc of a class constant.  The old JDK 1.1 code uses a static semi-anonymous variable and fast-slow CFG diamond in the bytecodes.  This is verbose, and the verbosity makes it harder for the JIT to see what is going on.  The standard code today is an ldc CONSTANT_Class, which every JIT can robustly optimize.
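For readers who haven't seen the old idiom: a hand-written sketch of roughly what a class literal compiled to before `ldc CONSTANT_Class` existed. The synthetic field and helper names mimic javac's generated ones but are illustrative.

```java
// Pre-ldc expansion of "String.class": a cached static field plus a
// slow path through Class.forName -- the "fast-slow CFG diamond" in the
// bytecodes described above.
public class OldClassLiteral {
    static Class<?> class$java$lang$String;  // synthetic cache field, starts null

    static Class<?> class$(String name) {    // synthetic slow-path helper
        try {
            return Class.forName(name);
        } catch (ClassNotFoundException e) {
            throw new NoClassDefFoundError(e.getMessage());
        }
    }

    // What "String.class" expanded to: test the cache, branch, maybe call
    // the helper, then store the result.
    public static Class<?> stringClassOldStyle() {
        return class$java$lang$String != null
                ? class$java$lang$String
                : (class$java$lang$String = class$("java.lang.String"));
    }

    // A modern compiler emits a single "ldc CONSTANT_Class" for this.
    public static Class<?> stringClassNewStyle() {
        return String.class;
    }
}
```

The old form is both bulkier and opaque to the JIT; the single `ldc` is trivially optimizable.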

I'm rather unsure about the value of making changes like this to the
JVM. The timescale from now to when they become useable is rather long
(2-3 years to get into a released JVM then another 2-3 years before I
can rely on most of my target audience having the JVM in production).

That's how the JVM game has been played for 10 years now:  Major optimizations like loop transformation or compressed oops or fixnums or escape analysis take years to work through the pipeline.  Over time, JVM performance increases as new features deploy, each one after its own gestation period.  Depending on the time scales your project contemplates, it may or may not be useful to know what JVM optimizations are in the pipeline.  It is useful for language implementors to know the directions JVM implementors are taking on problems they care about, and useful for JVM implementors to talk with their users about what optimizations are on the table.  Today we're talking about fixnums.  A year or two ago we were talking about other optimizations now delivered.

been solved another way. (I'm also slightly terrified of building
something based on address alignment and assuming that the behaviour
of the hardware will be the same in 5-10 years time).

There have almost always been slack bits in machine addresses, and will certainly be slack bits in the future 64-bit world.  It's a 40-year-old tactic with plenty of life left in it.  Check my blog entry; it assumes only the presence of slack bits, not any particular position of them in the address word.  In a pinch, you could repurpose any unmapped portion of the address space as code points for fixnums.  Messy, but these optimizations are decades mature, and today a dozen CPU cycles can be faster than waiting for your memory hierarchy to cough up the data.

Whist Method Handles are quite interesting I'd rather see effort being
expended in making java.lang.reflect.Method faster and more useful (we
discussed allowing downcasting to types which avoided the meed to box
the parameters, and unbox the result at one point on this list). If
only on the basis that this could be in the next version of the JVM as
an incremental improvement to refection rather than as a change to
support "dynamic languages".

Those discussions contributed integrally to the method handle design; thank you.  MethodHandle is the downcast type that supports the direct, unboxed call.

The present unboxed alternative (many interfaces + many classes) scales poorly; see Gilad's slide #20 in:

http://blogs.sun.com/roller/resources/gbracha/JAOO2005.pdf

Even if such an interface/class pair were optimized down to 100 words it would still be 10x more expensive in space than a method handle, and no faster in call sequence.  (Data point:  Pack200 shrinks a one-method anonymous adapter class to about 90 bytes.  That's a robust lower bound; the JVM has to expand it before it's usable.)  Space still matters these days, because a computer runs in its cache.

Method handles will make reflection faster, and also less necessary.  Implementors will take their pick between compatibility and bleeding edge optimization, or take both with a switch setting at startup.

Best wishes,
-- John

John Wilson

unread,
Apr 30, 2008, 3:48:11 PM4/30/08
to jvm-la...@googlegroups.com
On 4/30/08, John Rose <John...@sun.com> wrote:
[snip]

>
> Whist Method Handles are quite interesting I'd rather see effort being
> expended in making java.lang.reflect.Method faster and more useful (we
> discussed allowing downcasting to types which avoided the meed to box
> the parameters, and unbox the result at one point on this list). If
> only on the basis that this could be in the next version of the JVM as
> an incremental improvement to refection rather than as a change to
> support "dynamic languages".
> Those discussions contributed integrally to the method handle design; thank
> you. MethodHandle is the downcast type that supports the direct, unboxed
> call.
>
> The present unboxed alternative (many interfaces + many classes) scales
> poorly; see Gilad's slide #20 in:
>
> http://blogs.sun.com/roller/resources/gbracha/JAOO2005.pdf
>
> Even if such an interface/class pair were optimized down to 100 words it
> would still be 10x more expensive in space than a method handle, and no
> faster in call sequence, than a method handle. (Data point: Pack200
> shrinks a one-method anonymous adapter class to about 90 bytes. That's a
> robust lower bound; the JVM has to expand it before it's usable.) Space
> still matters these days, because a computer runs in its cache.
>
> Method handles will make reflection faster, and also less necessary.
> Implementors will take their pick between compatibility and bleeding edge
> optimization, or take both with a switch setting at startup.
Perhaps I was being naive about how downcasting of Method would be done:

It's obviously possible to construct a naming scheme for interfaces
which allows you to determine the method signature of the single method
it contains from its name. For the purposes of this we only need to
have method signatures with primitive types and Object.

The system classloader can dynamically generate these interfaces on demand.

The number of different flavours of interface required in any one
program would be reasonably modest and, I would imagine, would grow
more slowly than the size of the program.

So, choosing a not very practical way of encoding the name of the
interface, I could call write(char[], int, int) on a Writer by
generating the following code:

(("void foo (Object, int, int)")mymethod).invoke(myWriterInstance,
myCharArray, 0, 30)

Where "void foo (Object, int, int)" would be the name of the
dynamically created interface for all void methods which took an
Object and two ints.

This removes the need for boxing/unboxing and really should not lead
to an excessive number of interfaces being generated.
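A hedged sketch of the computable-name idea: erase every reference type in a signature to Object and build the interface name from what remains. The exact name format below is invented (and receiver handling is left aside); the point is only that a classloader could derive the interface from the name on demand.

```java
import java.lang.reflect.Method;

// Derive a canonical single-method-interface name from an erased
// signature, in the spirit of the scheme described above.
public class SignatureNames {
    static String erase(Class<?> t) {
        return t.isPrimitive() ? t.getName() : "Object";
    }

    // Build the interface's name: erased return type, a fixed method
    // name, and the erased parameter list.
    public static String interfaceNameFor(Method m) {
        StringBuilder sb = new StringBuilder(erase(m.getReturnType()));
        sb.append(" foo (");
        Class<?>[] params = m.getParameterTypes();
        for (int i = 0; i < params.length; i++) {
            if (i > 0) sb.append(", ");
            sb.append(erase(params[i]));
        }
        return sb.append(")").toString();
    }

    // Demo: the name derived for java.io.Writer.write(char[], int, int).
    public static String writerWriteName() {
        try {
            Method m = java.io.Writer.class.getMethod(
                    "write", char[].class, int.class, int.class);
            return interfaceNameFor(m);
        } catch (NoSuchMethodException e) {
            throw new AssertionError(e);
        }
    }
}
```

Since the name is a pure function of the erased signature, any two call sites with the same shape share one interface, which is why the total count stays modest.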

Am I overlooking something?

John Wilson

Attila Szegedi

unread,
Apr 30, 2008, 4:58:33 PM4/30/08
to jvm-la...@googlegroups.com
On 2008.04.30., at 20:59, John Rose wrote:

> On Apr 30, 2008, at 6:37 AM, John Wilson wrote:
>
>> I'm rather unsure about the value of making changes like this to the
>
>> JVM. The timescale from now to when they become useable is rather
>> long
>> (2-3 years to get into a released JVM then another 2-3 years before I
>> can rely on most of my target audience having the JVM in production).
>
> That's how the JVM game has been played for 10 years now: Major
> optimizations like loop transformation or compressed oops or fixnums
> or escape analysis take years to work through the pipeline. Over
> time, JVM performance increases as new features deploy, each one
> after its own gestation period. Depending on the time scales your
> project contemplates, it may or may not be useful to know what JVM
> optimizations are in the pipeline. It is useful for language
> implementors to know the directions JVM implementors are taking on
> problems they care about, and useful for JVM implementors to talk
> with their users about what optimizations are on the table. Today
> we're talking about fixnums. A year or two ago we were talking
> about other optimizations now delivered.

That pace is pretty natural. I remember reading in the Clock of The
Long Now book how most dynamic systems consist of layers that operate
at different paces. The book illustrated this with human civilization,
fashion/art being the highest (fastest changing, least long-term
power), nature being the lowest (slowest changing, but tectonic in
long-term power and momentum), and there are few in between; the full
list was: fashion, business, infrastructure, government, culture,
nature. Those higher up get more immediate attention and focus from
community, those lower down have more momentum and power.

The ecosystems we work in (and on) in the IT industry are similarly
layered even if they're admittedly smaller in scale than a full-blown
civilization :-). JVM and its optimizations are infrastructure. It
evolves slower than the layers built on top of it (frameworks/
libraries, applications), but the effect of the changes is big on the
layers above it, and a good optimization will be significant for
existing systems also even if they don't adapt to it explicitly.

Oh well, I guess I'd better stop the offtopic musings...

Attila.

John Rose

unread,
Apr 30, 2008, 5:51:06 PM4/30/08
to jvm-la...@googlegroups.com
On Apr 30, 2008, at 12:48 PM, John Wilson wrote:

This removes the need for boxing/unboxing and really should not lead

to an excessive number of interfaces being generated.


That schema of interfaces is about the same as is being proposed for Java closures by Neal Gafter.

It's not too many for the JVM, though there are too many of them to know by name, hence Neal's function type syntax.  The compromise with object types (which are quantified as generic type variables) leads to inserted casts.  The casts can sometimes be a performance hit (hence the JVM's use of strongly typed reference arguments).

Underneath the N interfaces there are M (M>N) classes, one for each target method.  With method handles, for M target methods, you have M little objects, not M classes.  M classes is excessive, whether or not you control the number N of call-signature interfaces.

More thoughts on reflect.Method:

There's a compatibility problem with making reflect.Method polymorphic:  It is final.  There are ways to fudge that, such as making the constructor JVM-private.

There's a performance problem with making reflect.Method the direct receiver of an invoke, since Method is a very rich class; they are probably 10x larger than a minimal method handle.  The effort of constructing one (even if you make some stuff lazy) goes far beyond the intrinsic cost of naming a method.

But reflect.Method could contain a function type object ("closure"):

{PrintStream,int,int=>void} caller = mymethod.getCaller();
caller.invoke(System.out, 0, 30);

And, in turn, the closure could use method handles under the cover to provide direct access, without the need for M classes.  There would be N closure-to-method-handle implementations.  Each implementation would invoke an underlying method handle with the right signature.
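This thread predates a released method handle API, but the closure-delegating-to-a-handle idea can be sketched with `java.lang.invoke` as it eventually shipped in Java 7: one interface per call signature (N of them), each with a single implementation that delegates to a MethodHandle, rather than one generated class per target method (M of them). The interface and method names here are invented.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class HandleBackedClosure {
    // One of the N call-signature interfaces: (Object, Object, int, int) -> void.
    public interface VoidObjObjIntInt {
        void invoke(Object receiver, Object arg, int a, int b) throws Throwable;
    }

    // One implementation per signature; only the method handle varies
    // per target method, so there are N classes, not M.
    public static VoidObjObjIntInt caller(MethodHandle mh) {
        return (recv, arg, a, b) -> mh.invoke(recv, arg, a, b);
    }

    // Demo: call Writer.write(char[], int, int) through the signature
    // interface -- the int arguments pass through without boxing.
    public static String demo() {
        try {
            MethodHandle write = MethodHandles.lookup().findVirtual(
                    java.io.Writer.class, "write",
                    MethodType.methodType(void.class,
                            char[].class, int.class, int.class));
            java.io.StringWriter w = new java.io.StringWriter();
            caller(write).invoke(w, "hello world".toCharArray(), 0, 5);
            return w.toString();
        } catch (Throwable t) {
            throw new AssertionError(t);
        }
    }
}
```
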

Or (I don't know if it could be made to work, but it's worth thinking about) method handles could interoperate more tightly with closures, by having each individual method handle somehow take on the appropriate function interface type.

-- John

Charles Oliver Nutter

unread,
Apr 30, 2008, 7:11:05 PM4/30/08
to jvm-la...@googlegroups.com

That sounds pretty exciting. Can you elaborate at all? One strategy I
had thought of for at least a subset of operations was compiling both
direct calls and dynamic calls into the bytecode. That approach
unfortunately has a major down side: LOTS more bytecode generated.
Instead, for at least the primitive operators I have specialized call
sites that can skip method lookup when the target type is, for example,
Fixnum, and the methods have not been overridden, as Ruby 1.9 does. It
would give us a small additional boost, but inline caching and
shortening the call path has done far more for us up to this point.
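The specialized call sites described here amount to a monomorphic inline cache. A hedged sketch of the core idea follows, with invented types; a real JRuby or Groovy call site carries far more state (guards, invalidation on method redefinition, and so on).

```java
import java.util.Map;

// Monomorphic inline cache: remember the last receiver class and the
// method resolved for it; fall back to a full lookup on a miss.
public class InlineCacheSite {
    public interface Target { Object call(Object receiver, Object arg); }

    private Class<?> cachedClass;    // receiver class seen on the last call
    private Target   cachedTarget;   // method resolved for that class
    private final Map<Class<?>, Target> methodTable;  // stands in for the full lookup

    public InlineCacheSite(Map<Class<?>, Target> methodTable) {
        this.methodTable = methodTable;
    }

    public Object call(Object receiver, Object arg) {
        if (receiver.getClass() == cachedClass) {
            return cachedTarget.call(receiver, arg);       // hit: no lookup
        }
        Target t = methodTable.get(receiver.getClass());   // miss: full lookup
        cachedClass = receiver.getClass();                 // remember for next time
        cachedTarget = t;
        return t.call(receiver, arg);
    }
}
```

A real implementation also needs a way to invalidate the cache when methods are redefined, which this sketch omits.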

> so you say you can theoretically do calculations in JRuby as fast as in
> Java? Or how much would you say are you slower than Java?

No, we're certainly not doing calculations as fast as in Java. If you
compare us to ints, we're much slower, slower enough that any advantage
we have over Ruby 1.8 is not significant. My point is that people using
JRuby are using it for Ruby, and at the moment there's small enough
demand for higher performance that native primitives isn't worth the
effort. At some point in the future, it might be.

I have a different question for you: Since Groovy lets you easily write
some code in Groovy, some in Java...why the recent interest in making
primitive math faster? I think many Groovyists would say "write it in
Java", and there's probably other Groovy-specific areas that would be a
more broadly applicable use of the time. Am I misunderstanding something?

- Charlie

Charles Oliver Nutter

unread,
Apr 30, 2008, 7:13:48 PM4/30/08
to jvm-la...@googlegroups.com
John Wilson wrote:
> What I'm seeing, via this list and others, is that people are making
> great strides in understanding how to utilise features in the current
> JVM to make dynamic languages faster. The JRuby and Groovy teams seem
> to be making progress which may well make this work unnecessary for
> them. I'm concerned that a proposal like this will make use of some
> redundant information which could be used by some other JVM feature
> and by the time it's in the field and usable the problem will have
> been solved another way. (I'm also slightly terrified of building
> something based on address alignment and assuming that the behaviour
> of the hardware will be the same in 5-10 years time).

Speaking for JRuby, the proposed changes have me giddy with
anticipation. Sure, there are a lot of these things that we can do (and
often have done), but they're painful, less efficient than they should
be, and often
have other costs like excess permgen use or heap for lots of generated
code. We need help here for these things to be really long-term viable,
because better performance, better integration, and better
implementations are all going to depend on incrementally improving the
way we do things; that's going to be hard when we hit a wall because the
JVM is no longer improving. We're pushing the leading edge...and we need
to make sure our pushing has an effect, even if it takes a while to
trickle down.

- Charlie

Jochen Theodorou

unread,
Apr 30, 2008, 7:45:14 PM4/30/08
to jvm-la...@googlegroups.com
Charles Oliver Nutter schrieb:
> Jochen Theodorou wrote:
[...]

>> I plan a major semantic change to Groovy... and this change aims to not
>> to having to pass the values through the whole system, instead let the
>> callsite handle this locally and with direct access to the values.
>> method handles will be extremely useful here, but even without them we
>> can do much.
>
> That sounds pretty exciting. Can you elaborate at all?

OK, let me try to explain what I'm thinking of... The current system in
Groovy works like this: you have a narrow API, with some core that
actually selects and executes the method call. Ng is more or less the
same, but with a wide API. What I plan for the future is to no longer
let the core execute the methods; instead it returns handles to the
call site, and the call site calls the method for us.

This design is very much oriented toward invokedynamic, but we came up
with this before invokedynamic. Of course MethodHandles, as described by
John Rose, will come in very handy here. Most of what can be done today
with monkey patching and categories fits well into this new way. I also
plan to make a MetaClass no longer replaceable, though mutating it will
still be allowed. The downside is that if you want, for example, to
write code that reacts to every method call, you have to put it in a
MetaMethod. But much of what is done today will work without change, I
think.

I think this approach will allow a narrow API, with the core selecting
the methods but not executing them. The actual call structure will be
shallow, and caching can be done in lots of places.

> One strategy I
> had though of for at least a subset of operations was compiling both
> direct calls and dynamic calls into the bytecode. That approach
> unfortunately has a major down side: LOTS more bytecode generated.
> Instead, for at least the primitive operators I have specialized call
> sites that can skip method lookup when the target type is, for example,
> Fixnum, and the methods have not been overridden, as Ruby 1.9 does. It
> would give us a small additional boost, but inline caching and
> shortening the call path has done far more for us up to this point.

We plan on doing so too... but only for a few cases that can be
expected. In fact, in Groovy the user can give type information, so if
he does, we can use that to predict methods and their result types. I
plan such actions also for calls to private methods. This way the
bytecode won't be that bloated.

>> so you say you can theoretically do calculations in JRuby as fast as in
>> Java? Or how much would you say are you slower than Java?
>
> No, we're certainly not doing calculations as fast as in Java. If you
> compare us to ints, we're much slower, slower enough that any advantage
> we have over Ruby 1.8 is not significant. My point is that people using
> JRuby are using it for Ruby, and at the moment there's small enough
> demand for higher performance that native primitives isn't worth the
> effort. At some point in the future, it might be.

To tell the truth, Groovy is fast enough for me, even if it is
sometimes 5-100 times slower than Java. It is quite easy to get the
speed up very much. But a language is not only about what the
implementors want, and a community-driven language like Groovy
especially not. Groovy is no academic language where you write papers
when you have a good idea. Instead, a language is also much about
politics, and if the public demands more speed, then we will do our
best. Also, there are people afraid of dynamic languages, and we need
to show them that they don't have to be slow just because they are
dynamic.

> I have a different question for you: Since Groovy lets you easily write
> some code in Groovy, some in Java...why the recent interest in making
> primitive math faster? I think many Groovyists woulds say "write it in
> Java", and there's probably other Groovy-specific areas that would be a
> more broadly applicable use of the time. Am I misunderstanding something?

Well, in a benchmark like the Alioth Shootout you are not allowed to use
this obvious solution. That gives bad press. And since a language is so
much about politics, you have to handle bad press somehow.

Charles Oliver Nutter

unread,
Apr 30, 2008, 9:18:31 PM4/30/08
to jvm-la...@googlegroups.com
Jochen Theodorou wrote:
> ok, let me try to explain what I think of... The current system in
> Groovy works like this: you have a narrow API, with some core that
> actually selects and executes the method call. Ng is more or less the
> same, but with a wide API. What I plan for the future is not no longer
> let the core execute the methods, instead they return handles to the
> call site and the call site will call the method for us.

Yes, I recall the discussions when this was implemented on trunk. And
from my own tests, it definitely had improved performance, but I haven't
done a wide range of testing (as I'm sure you have, e.g. grails and
otherwise). One concern that occurs to me is how this affects the
locality of the call site. Where in JRuby, the call site is never more
than a field access away, in Groovy it's retrieved from the same long
pipeline. So that pipeline has to be doing some amount of "getting in
the way" even if the call site encapsulates and eliminates a certain
portion of it. Or am I misunderstanding? This doesn't seem as much like
a call site optimization as simply currying a portion of the lookup
process into an object you then cache at the metaclass level for future
calls (and removing if there are changes).

Perhaps a stack trace of a typical call through one of your "call sites"
would help illustrate the effect better?

> This design is very much oriented at invokedynamic, but we came up with
> this before invokednymic. Of course MethodHandles, such as described by
> John Rose will come in very handy here. Most of what can be done today
> with monkey patching and categories fits well in his new way. I plan
> also to restrict a MetaClass to be no longer replaceable, but mutating
> it is allowed. The downside of this is, that if you want for example
> write code that reacts to each method call, that you have to put that in
> a MetaMethod. But much of what is done today will work without change I
> think.

That is a *big* change for the language, I think, but in my opinion a
very good one (and of course we've talked about this in the past). I
believe that Groovy's ability to not only replace methods (EMC) and
install categories, but to also wholesale replace metaclasses with
custom implementations, often implemented in Groovy themselves, is a
major barrier to optimization. I don't see the value in categories
myself, so I won't go there. But in my opinion EMC should be the only
MC, enabled by default everywhere, with ruby-like hooks to augment its
behavior and no option for replacement. Then you're in a far better
position to install more optimistic optimizations.

> I think this approach will allow a narrow API, with the core selecting
> the method, but not executing them. The actual call structure will be
> shallow and caching can be done at lots of places
>

> We plan on doing so too.. But only for a few cases that can be expected.
> In fact in Groovy the user can give type information, so if he does we
> can use that to predict methods and their result types. I plan such
> actions also for calls to private methods. This way the bytecode won't
> be that bloated

You'd be surprised. How big is a typical Groovy method in bytecode
right now? I'd wager a substantial portion of that is call
overhead...can you afford to double the size of some subset of operations?

Here's a simple JRuby fib method, minus about 15 bytecodes worth of
preamble:

public org.jruby.runtime.builtin.IRubyObject
method__0$RUBY$fib_ruby(org.jruby.runtime.ThreadContext,
org.jruby.runtime.builtin.IRubyObject,
org.jruby.runtime.builtin.IRubyObject[], org.jruby.runtime.Block);
Code:
.... preamble ....
45: aload_1
46: iconst_3
47: invokestatic #40; //Method
setPosition:(Lorg/jruby/runtime/ThreadContext;I)V
50: aload_0
51: getfield #89; //Field site1:Lorg/jruby/runtime/CallSite;
54: aload_1
55: aload 11
57: aload 6
59: invokestatic #95; //Method
org/jruby/RubyFixnum.two:(Lorg/jruby/Ruby;)Lorg/jruby/RubyFixnum;
62: invokevirtual #74; //Method
org/jruby/runtime/CallSite.call:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
65: invokeinterface #101, 1; //InterfaceMethod
org/jruby/runtime/builtin/IRubyObject.isTrue:()Z
70: ifeq 83
73: aload_1
74: iconst_4
75: invokestatic #40; //Method
setPosition:(Lorg/jruby/runtime/ThreadContext;I)V
78: aload 11
80: goto 145
83: aload_1
84: bipush 6
86: invokestatic #40; //Method
setPosition:(Lorg/jruby/runtime/ThreadContext;I)V
89: aload_0
90: getfield #106; //Field site2:Lorg/jruby/runtime/CallSite;
93: aload_1
94: aload_0
95: getfield #111; //Field site3:Lorg/jruby/runtime/CallSite;
98: aload_1
99: aload_2
100: aload_0
101: getfield #116; //Field site4:Lorg/jruby/runtime/CallSite;
104: aload_1
105: aload 11
107: aload 6
109: invokestatic #95; //Method
org/jruby/RubyFixnum.two:(Lorg/jruby/Ruby;)Lorg/jruby/RubyFixnum;
112: invokevirtual #74; //Method
org/jruby/runtime/CallSite.call:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
115: invokevirtual #74; //Method
org/jruby/runtime/CallSite.call:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
118: aload_0
119: getfield #119; //Field site5:Lorg/jruby/runtime/CallSite;
122: aload_1
123: aload_2
124: aload_0
125: getfield #122; //Field site6:Lorg/jruby/runtime/CallSite;
128: aload_1
129: aload 11
131: aload 6
133: invokestatic #125; //Method
org/jruby/RubyFixnum.one:(Lorg/jruby/Ruby;)Lorg/jruby/RubyFixnum;
136: invokevirtual #74; //Method
org/jruby/runtime/CallSite.call:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
139: invokevirtual #74; //Method
org/jruby/runtime/CallSite.call:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
142: invokevirtual #74; //Method
org/jruby/runtime/CallSite.call:(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/builtin/IRubyObject;)Lorg/jruby/runtime/builtin/IRubyObject;
145: areturn

Now this bytecode is pretty tight. There are some special-case methods
for Fixnum 1 and 2, CallSite objects to encapsulate some boilerplate
call-wrapping logic, and "setPosition" calls to update the Ruby stack
trace, but otherwise we've managed to boil it down a lot. And it's still
a lot of code. I've been doing a bytecode audit recently to make sure
all bytecode generated is as clean as possible, and this is the result
at the moment (trunk code). What's a comparable fib method in Groovy
look like with the new call site stuff?

Of course I'm not saying to go for it...I'm going to try to do the same
thing with profiling data gathered during interpretation, if I can find
a reasonable way to shrink the bytecode duplication to a reasonable
level. But I think tricks that depend on type annotations are really not
in the spirit of the language...and if possible I would help you explore
ways to optimize normal dynamic invocation more first, because I think
that's where the most generally applicable gains are going to come from.

> to say the truth, Groovy is fast enough for me, even if it is sometimes
> 5-100 times slower than Java. It is quite easy to get the speed very
> much up. But a language is not only about what the implementors want and
> a community driven language like Groovy especially not. Groovy is no
> academic language where you write papers when you have a good idea.
> Instead a language is also much about politics, and if the public
> demands more speed, then we will do our best. Also here are people
> afraid of dynamic languages and we need o show them, that they don't
> need to be slow, just because they are dynamic

...


> Well, in a benchmark like the Alioth Shootout you are not allowed to use
> this obvious solution. That gives bad press. And since a language is so
> much about politics, you have to handle bad press somehow

I hate having to worry about performance, but I love optimizing it. The
world is far too performance obsessed, but there are reasons for it. I
would strongly caution against optimizations designed to make specific
benchmarks fast, even if the political gains would be substantial. Ruby
1.9 added fast-path Fixnum math operators and ended up looking great on
a lot of benchmarks. Then more and more complaints started to come in
that they resulted in slowing down *everything non-Fixnum type* because
of the extra typechecking involved.

- Charlie

Charles Oliver Nutter

unread,
May 1, 2008, 3:25:05 AM5/1/08
to jvm-la...@googlegroups.com
Charles Oliver Nutter wrote:
> Now this bytecode is pretty tight. There are some special-case methods
> for Fixnum 1 and 2, CallSite objects to encapsulate some boilerplate
> call-wrapping logic, and "setPosition" calls to update the Ruby stack
> trace, but otherwise we've managed to boil it down a lot. And it's still
> a lot of code. I've been doing a bytecode audit recently to make sure
> all bytecode generated is as clean as possible, and this is the result
> at the moment (trunk code). What's a comparable fib method in Groovy
> look like with the new call site stuff?

I had an itch, so I updated my working copy of groovy and gave fib a
compile, and I must say I'm very impressed with the progress.

The bytecode is probably about as tight as JRuby's. You've got CallSite
in all the same places, Integer caches, basically all the same stuff as
JRuby's output. Kudos for that, it's a vast improvement over the old
code. It's amazing how similar the bytecode looks to JRuby's now...we're
truly living in parallel dimensions.

There's a few comments and questions:

- JRuby currently doesn't do "multihosting" using the classloader
hierarchy; instead, we have a simple org.jruby.Ruby object that
represents a given runtime. This means we pass Ruby through most stacks
to get at things like Fixnum caches. Statics are most definitely out in
that scenario. We might be able to do the classloader version in the
future...we shall see. I would suppose the current system in Groovy
means you must make sure not to have Groovy at multiple levels in the
classloader hierarchy, yes? It seems like with all those statics there's
a strong chance of having two Groovys lower in the hierarchy load
something from higher up and step on each other. I've wanted to try
isolating JRuby instances by classloader, but there hasn't been time.

- JRuby takes some level of perf hit from the pre/post-method setup and
the method preamble, which loads more into local variables than does
Groovy's. This is largely because of a set of features Ruby has that
require more than what Groovy provides; namely, nested closures have
heap-based nested scopes, public/private/protected are methods, a
binding can be pulled off at any time and passed around to access local
variables and other state, and several more. They're features that give
great flexibility to Ruby, but which are extremely difficult to optimize
for. I have more tricks I'll be putting in future versions of JRuby, but
they'll take a bit of time. Ruby's a tough language to implement.

Any thoughts on these?

I ran through some of the Alioth benchmarks, and they're definitely a
lot better. 1.6 ought to be a good release for you.

One question on CallSite, and admittedly I could probably get this from
digging in the source... When I followed the discussions about call
sites a few months ago, it seemed like they had to be constructed by
hand on a case-by-case basis for specific types. So for example, you had
to write the CallSite code to handle Integer +, -, *, etc, and if you
didn't hand-write a CallSite, it would not be available. Has that changed?

- Charlie

John Wilson

unread,
May 1, 2008, 4:07:56 AM5/1/08
to jvm-la...@googlegroups.com
On 5/1/08, Jochen Theodorou <blac...@gmx.org> wrote:
>
> Charles Oliver Nutter schrieb:
> > Jochen Theodorou wrote:
[...]
> ok, let me try to explain what I think of... The current system in
> Groovy works like this: you have a narrow API, with some core that
> actually selects and executes the method call. Ng is more or less the
> same, but with a wide API. What I plan for the future is to no longer
> let the core execute the methods, instead they return handles to the
> call site and the call site will call the method for us.

I have experimented with something like that. Basically selecting a
method with one call to the runtime system and executing it by making
a call to the object returned. My main motivation for trying this was
to minimise the number of extra stack frames used for each call (i.e.
to minimise the size of the stack trace printed when an exception is
uncaught).

I have not found that a straightforward implementation helps with
performance (in fact it was slower with the initial implementation). I
think that this is because method selection is quite complicated (you
have to look at the types of the parameters and take Categories and
Monkey patching into account). So the JIT, in general, will not be
able to work out which method proxy object is returned from the
selection call. This means that it is not able to do any significant
inlining at the call site. Also you have to pass the actual parameters
twice, once to the selection method and once to the method proxy.

One approach I'm looking at is to pre-select the method proxy and then
pass the proxy to a checking mechanism in the runtime system which
checks to see if the proxy is still the best match and if so makes the
call via it. The reasoning behind this is that the checking process is
actually simpler than the selection process and the JIT may be able to
do lots more inlining. I'm still working on this.
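The "pre-select, then cheaply re-check" scheme John describes can be sketched roughly as follows. All names here (MethodProxy, CachedCallSite) are invented for illustration, not Ng or Groovy APIs, and the selection logic is a stand-in:

```java
// Hypothetical sketch: the guard is simple enough that a JIT has a chance
// to inline through it, unlike the full selection logic (categories,
// monkey patching, ...).
interface MethodProxy {
    boolean stillApplies(Object receiver);       // cheap validity check
    Object invoke(Object receiver, Object arg);  // direct call, no reflection
}

class CachedCallSite {
    private MethodProxy cached;

    Object call(Object receiver, Object arg) {
        MethodProxy p = cached;
        if (p == null || !p.stillApplies(receiver)) {
            p = select(receiver);  // expensive full method selection
            cached = p;
        }
        return p.invoke(receiver, arg);
    }

    // stand-in for the real multi-step selection process
    private MethodProxy select(Object receiver) {
        final Class<?> expected = receiver.getClass();
        return new MethodProxy() {
            public boolean stillApplies(Object r) { return r.getClass() == expected; }
            public Object invoke(Object r, Object a) { return r.toString() + a; }
        };
    }
}
```

The point is that after the first call the site only pays for the guard, not for the selection.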


>
> This design is very much oriented at invokedynamic, but we came up with
> this before invokednymic. Of course MethodHandles, such as described by
> John Rose will come in very handy here. Most of what can be done today
> with monkey patching and categories fits well in his new way. I plan
> also to restrict a MetaClass to be no longer replaceable, but mutating
> it is allowed. The downside of this is, that if you want for example
> write code that reacts to each method call, that you have to put that in
> a MetaMethod. But much of what is done today will work without change I
> think.

You know that I think that making the Class MetaClass mapping
immutable is a *very* good idea. However if you really are going to
propose this you need to raise it on the Groovy lists now. It's a huge
breaking change and, whilst it will get my enthusiastic support, is
going to cause a lot of problems for existing code.

John Wilson

Jochen Theodorou

unread,
May 1, 2008, 5:58:42 AM5/1/08
to jvm-la...@googlegroups.com
Charles Oliver Nutter schrieb:

> Jochen Theodorou wrote:
>> ok, let me try to explain what I think of... The current system in
>> Groovy works like this: you have a narrow API, with some core that
>> actually selects and executes the method call. Ng is more or less the
>> same, but with a wide API. What I plan for the future is to no longer
>> let the core execute the methods, instead they return handles to the
>> call site and the call site will call the method for us.
>
> Yes, I recall the discussions when this was implemented on trunk.

it is a bit different on trunk... you can see this as an experimental
version of parts of this. The problem is that the current protocol would
normally not allow this. So we pollute the protocol with something that
should not be there. In terms of design and specification that is a big
problem, so 2.0 is thought to give a clean solution. Also some parts can
no longer be done by the old model. For example if a property is
requested and it should be tested whether private access is allowed, then
we currently have to allow that in general, because the protocol loses
the information needed to test if the access is ok, or not.

> And
> from my own tests, it definitely had improved performance, but I haven't
> done a wide range of testing (as I'm sure you have, e.g. grails and
> otherwise). One concern that occurs to me is how this affects the
> locality of the call site.

In 1.5.x the call site does a method call to ScriptByteCodeAdapter,
which can be inlined. From there on it gets to the MetaClass through
some quite complicated code, and I doubt there is much inlining done. In
the end a reflective method is selected and called, and I am sure this
cannot be inlined. Well... to be frank, I think inlining could be a
problem for ScriptByteCodeAdapter, because the method there is
megamorphic already. With call site caches we might not be able to
inline the method selection parts, or the parts validating a method, but
at least we get a mostly monomorphic call site.

> Where in JRuby, the call site is never more
> than a field access away, in Groovy it's retrieved from the same long
> pipeline.

no... the call site itself is a field access too.

> So that pipeline has to be doing some amount of "getting in
> the way" even if the call site encapsulates and eliminates a certain
> portion of it. Or am I misunderstanding? This doesn't seem as much like
> a call site optimization as simply currying a portion of the lookup
> process into an object you then cache at the metaclass level for future
> calls (and removing if there are changes).

Of course the purpose of the call site cache is to not go through the
complete pipeline again. We select the method one time and unless the
MetaClass has been changed, there is no reason to go through it again.
The default MetaClass does not allow changes, so there is no problem.
EMC does allow changes, but we then ask EMC for the changes. Methods
from other MetaClasses are then simply not cached. Well there is also
ClosureMetaClass of course ;)

We had a cache at the MetaClass for a long time. It did store the
argument types, the method name, and some other things. But creating a
key for this cache and asking the cache had been quite slow. In
ClosureMetaClass I removed the cache and instead used the special
properties of the Closure to write a specialized and simplified method
selection. And this is up to 40% faster than the version with the
cache. And it is not that the cache doesn't bring any benefit. It makes
method calls faster, but in case of a closure I can use such simple method
selection algorithms that the cache is much slower, even without a miss.
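A version-stamp guard is one way to picture the invalidation scheme described here: the cached selection stays valid until the metaclass mutates. This is a hedged sketch with invented names (SimpleMetaClass, VersionedCallSite), not the actual Groovy implementation:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch: the metaclass bumps a version counter on every mutation,
// and call sites revalidate their cached selection against it.
class SimpleMetaClass {
    final AtomicLong version = new AtomicLong();

    void addMethod() { version.incrementAndGet(); }  // any change invalidates caches

    String select(String name) {                     // stand-in for method selection
        return name + "@v" + version.get();
    }
}

class VersionedCallSite {
    private String cachedTarget;
    private long cachedVersion = -1;

    String call(SimpleMetaClass mc, String name) {
        long v = mc.version.get();
        if (cachedTarget == null || v != cachedVersion) { // cheap check, no full selection
            cachedTarget = mc.select(name);
            cachedVersion = v;
        }
        return cachedTarget;
    }
}
```

Until someone mutates the metaclass, repeated calls never repeat the selection work.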

> Perhaps a stack trace of a typical call through one of your "call sites"
> would help illustrate the effect better?

ok, this code:
def foo() {throw new Exception("call site!")}
foo()

results in a trace like this:

> Caught: java.lang.Exception: call site!
> java.lang.Exception: call site!
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at org.codehaus.groovy.reflection.CachedConstructor.invoke(CachedConstructor.java:70)
> at org.codehaus.groovy.runtime.callsite.ConstructorSite$ConstructorSiteNoUnwrapNoCoerce.invoke(ConstructorSite.java:84)
> at org.codehaus.groovy.runtime.callsite.CallSite.callConstructor(CallSite.java:142)
> at test.foo(test.groovy:48)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.codehaus.groovy.runtime.callsite.PogoMetaMethodSite$PogoCachedMethodSiteNoUnwrapNoCoerce.invoke(PogoMetaMethodSite.java:182)
> at org.codehaus.groovy.runtime.callsite.CallSite.callCurrent(CallSite.java:130)
> at test.run(test.groovy:49)

test.groovy:49 is where foo() is called, test.groovy:48 is where the
method foo() begins. As you can see, between these two there is only
reflection code. After that is code for creating the exception.. which
could be a bit improved.

>> This design is very much oriented at invokedynamic, but we came up with
>> this before invokednymic. Of course MethodHandles, such as described by
>> John Rose will come in very handy here. Most of what can be done today
>> with monkey patching and categories fits well in his new way. I plan
>> also to restrict a MetaClass to be no longer replaceable, but mutating
>> it is allowed. The downside of this is, that if you want for example
>> write code that reacts to each method call, that you have to put that in
>> a MetaMethod. But much of what is done today will work without change I
>> think.
>
> That is a *big* change for the language, I think, but in my opinion a
> very good one (and of course we've talked about this in the past). I
> believe that Groovy's ability to not only replace methods (EMC) and
> install categories, but to also wholesale replace metaclasses with
> custom implementations, often implemented in Groovy themselves, is a
> major barrier to optimization.

well true... life would be a lot easier without these. But I think
the replacement with a custom metaclass is the only thing we might throw
out.

> I don't see the value in categories
> myself, so I won't go there.

I also plan a new kind of category, one that is lexically scoped. We may
then remove the old categories... not sure yet what we will do with them.

> But in my opinion EMC should be the only
> MC, enabled by default everywhere, with ruby-like hooks to augment its
> behavior and no option for replacement. Then you're in a far better
> position to install more optimistic optimizations.

this is planned... there are some things in EMC making life a bit
difficult. Normally in MetaClassImpl, each and every MetaClass is a
replaceable and sole construct. One MetaClass does not need another to
select a method. This has the advantage of being able to remove and
recreate a MetaClass on demand, for example if memory is low. EMC is not
collectable. Also EMC might do lookups to the parent... well, we have to
rework these parts and see if it is ok to have the parents or not, and
if it is worth splitting the MetaClass into a part that can be collected
and one that can't because it contains user-made modifications.

>> I think this approach will allow a narrow API, with the core selecting
>> the method, but not executing them. The actual call structure will be
>> shallow and caching can be done at lots of places
>>
>> We plan on doing so too.. But only for a few cases that can be expected.
>> In fact in Groovy the user can give type information, so if he does we
>> can use that to predict methods and their result types. I plan such
>> actions also for calls to private methods. This way the bytecode won't
>> be that bloated
>
> You'd be surprised. How big is a typical Groovy method in bytecode
> right now? I'd wager a substantial portion of that is call
> overhead...can you afford to double the size of some subset of operations?

If the tests show that it is not faster, then we won't do it.

def fib(n) {
    if (n < 2) return 1
    return fib(n-1) + fib(n-2)
}

> public fib(Ljava/lang/Object;)Ljava/lang/Object;
> TRYCATCHBLOCK L0 L1 L1 groovy/lang/GroovyRuntimeException
> L0
> INVOKESTATIC fib.$getCallSiteArray ()[Lorg/codehaus/groovy/runtime/callsite/CallSite;
> ASTORE 2
> L2
> LINENUMBER 2 L2
> ALOAD 1
> GETSTATIC fib.$const$0 : Ljava/lang/Integer;
> INVOKESTATIC org/codehaus/groovy/runtime/ScriptBytecodeAdapter.compareLessThan (Ljava/lang/Object;Ljava/lang/Object;)Z
> IFEQ L3
> L4
> LINENUMBER 2 L4
> GETSTATIC fib.$const$1 : Ljava/lang/Integer;
> ARETURN
> GOTO L3
> L3
> LINENUMBER 3 L3
> ALOAD 2
> LDC 1
> AALOAD
> L5
> LINENUMBER 3 L5
> ALOAD 2
> LDC 2
> AALOAD
> ALOAD 0
> L6
> LINENUMBER 3 L6
> ALOAD 2
> LDC 3
> AALOAD
> ALOAD 1
> GETSTATIC fib.$const$1 : Ljava/lang/Integer;
> INVOKEVIRTUAL org/codehaus/groovy/runtime/callsite/CallSite.callBinop (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
> INVOKESTATIC org/codehaus/groovy/runtime/ArrayUtil.createArray (Ljava/lang/Object;)[Ljava/lang/Object;
> INVOKEVIRTUAL org/codehaus/groovy/runtime/callsite/CallSite.callCurrent (Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
> L7
> LINENUMBER 3 L7
> ALOAD 2
> LDC 4
> AALOAD
> ALOAD 0
> L8
> LINENUMBER 3 L8
> ALOAD 2
> LDC 5
> AALOAD
> ALOAD 1
> GETSTATIC fib.$const$0 : Ljava/lang/Integer;
> INVOKEVIRTUAL org/codehaus/groovy/runtime/callsite/CallSite.callBinop (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
> INVOKESTATIC org/codehaus/groovy/runtime/ArrayUtil.createArray (Ljava/lang/Object;)[Ljava/lang/Object;
> INVOKEVIRTUAL org/codehaus/groovy/runtime/callsite/CallSite.callCurrent (Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
> INVOKEVIRTUAL org/codehaus/groovy/runtime/callsite/CallSite.callBinop (Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
> ARETURN
> L9
> GOTO L10
> L1
> INVOKESTATIC org/codehaus/groovy/runtime/ScriptBytecodeAdapter.unwrap (Lgroovy/lang/GroovyRuntimeException;)Ljava/lang/Throwable;
> ATHROW
> L10
> NOP
> LOCALVARIABLE this Lfib; L0 L9 0
> LOCALVARIABLE n Ljava/lang/Object; L0 L9 1
> MAXSTACK = 7
> MAXLOCALS = 3

that's around 67 lines. If I remove the labels and line number entries
as well as the epilogue I get 45 lines/instructions.

> Of course I'm not saying to go for it...I'm going to try to do the same
> thing with profiling data gathered during interpretation, if I can find
> a reasonable way to shrink the bytecode duplication to a reasonable
> level. But I think tricks that depend on type annotations are really not
> in the spirit of the language...and if possible I would help you explore
> ways to optimize normal dynamic invocation more first, because I think
> that's where the most generally applicable gains are going to come from.

I think with call site caching we are doing quite well already. I see
more of a problem in the continuous boxing actions... For example, same
method, but this time with ints:

int fib(int n) {
    if (n < 2) return 1
    return fib(n-1) + fib(n-2)
}

will contain code boxing the int parameter:

> INVOKESTATIC org/codehaus/groovy/runtime/typehandling/DefaultTypeTransformation.box (I)Ljava/lang/Object;

and code to transform the value:

> INVOKESTATIC fib.$get$$class$java$lang$Integer ()Ljava/lang/Class;
> INVOKESTATIC org/codehaus/groovy/runtime/ScriptBytecodeAdapter.castToType (Ljava/lang/Object;Ljava/lang/Class;)Ljava/lang/Object;
> CHECKCAST java/lang/Integer
> INVOKESTATIC org/codehaus/groovy/runtime/typehandling/DefaultTypeTransformation.intUnbox (Ljava/lang/Object;)I
> IRETURN

of course two times, because we have two returns. Even for this "return
1" we first create an Integer and then unbox it. Well, even if the
Integer object is cached it still is something we can optimize away.

If the code were like this:

int fib(int n) {
    if (n < 2) return 1
    int a = fib(n-1)
    int b = fib(n-2)
    return a + b
}

then we would have two more casts, but no unboxing, since we do not do
so for local variables.
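The box-then-unbox round trip described above can be made concrete in plain Java. fibSlow mimics (roughly, and only for illustration) what the generated dynamic code effectively does, while fibFast is what a primitive-aware compiler could emit directly:

```java
class BoxingDemo {
    // roughly what the dynamic path does: box the int parameter, compare
    // via the wrapper, and box even the constant before unboxing it again
    static int fibSlow(int n) {
        Object boxed = Integer.valueOf(n);
        if (((Integer) boxed).intValue() < 2) {
            Object one = Integer.valueOf(1);    // "return 1" still creates/fetches an Integer
            return ((Integer) one).intValue();  // ...only to unbox it immediately
        }
        return fibSlow(n - 1) + fibSlow(n - 2);
    }

    // the all-primitive version: no wrappers, no casts
    static int fibFast(int n) {
        if (n < 2) return 1;
        return fibFast(n - 1) + fibFast(n - 2);
    }
}
```

Both compute the same values; the slow version just pays for a wrapper allocation (or at best a cache lookup) on every step.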

>> to say the truth, Groovy is fast enough for me, even if it is sometimes
>> 5-100 times slower than Java. It is quite easy to get the speed very
>> much up. But a language is not only about what the implementors want and
>> a community driven language like Groovy especially not. Groovy is no
>> academic language where you write papers when you have a good idea.
>> Instead a language is also much about politics, and if the public
>> demands more speed, then we will do our best. Also here are people
>> afraid of dynamic languages and we need to show them that they don't
>> need to be slow, just because they are dynamic
> ...
>> Well, in a benchmark like the Alioth Shootout you are not allowed to use
>> this obvious solution. That gives bad press. And since a language is so
>> much about politics, you have to handle bad press somehow
>
> I hate having to worry about performance, but I love optimizing it. The
> world is far too performance obsessed, but there are reasons for it. I
> would strongly caution against optimizations designed to make specific
> benchmarks fast, even if the political gains would be substantial. Ruby
> 1.9 added fast-path Fixnum math operators and ended up looking great on
> a lot of benchmarks. Then more and more complaints started to come in
> that they resulted in slowing down *everything non-Fixnum type* because
> of the extra typechecking involved.

sure, but still our "dream" is to be able to support native primitive
base operations. We will keep an eye on what happens to other code.

bye blackdrag

Jochen Theodorou

unread,
May 1, 2008, 6:18:01 AM5/1/08
to jvm-la...@googlegroups.com
Charles Oliver Nutter schrieb:
[...]
> - JRuby currently doesn't do "multihosting" using the classloader
> hierarchy; instead, we have a simple org.jruby.Ruby object that
> represents a given runtime. This means we pass Ruby through most stacks
> to get at things like Fixnum caches. Statics are most definitely out in
> that scenario. We might be able to do the classloader version in the
> future...we shall see. I would suppose the current system in Groovy
> means you must make sure not to have Groovy at multiple levels in the
> classloader hierarchy, yes? It seems like with all those statics there's
> a strong chance of having two Groovys lower in the hierarchy load
> something from higher up and step on each other. I've wanted to try
> isolating JRuby instances by classloader, but there hasn't been time.

Two active Groovy versions means having a parent knowing Groovy and a
child knowing a different Groovy. That also means the class loader is
violating the loader constraints, or at least the way a classloader
should work. But ok, you get more or less the same scenario with
siblings in the classloader tree. As long as these two do not try to
transport objects to each other there is no problem. If they do, then a
GroovyObject from the one is not recognized by the other. That means the
MetaClass will be disabled. But since the class is besides this more or
less a Java class, it will be handled as such. I think that is perfectly
legal. We once had problems with this kind of scenario, because we did
create a Reflector for each class. You can imagine this as a class with
a single method with a giant switch in it making direct method calls. So
to speak... a poor man's MethodHandle. Since we make direct method calls
we have to do casts, which will cause class loading, and class loading by
name. Now if there is something unusual in the classloader tree, this
often failed with very strange exceptions. Most of them were caused by
class duplication and the like. So we decided to remove the Reflector,
not only because of that problem, but also because it wasn't any faster
than Reflection anymore. Since then I am a bit careful with class
generation at runtime.

Besides that... when you start Groovy from the command line, you
usually have two Groovys active, because RootLoader will load Groovy
again, even though it is part of Groovy. But of course nearly nothing of
the other Groovy is used.

> - JRuby takes some level of perf hit from the pre/post-method setup and
> the method preamble, which loads more into local variables than does
> Groovy's. This is largely because of a set of features Ruby has that
> require more than what Groovy provides; namely, nested closures have
> heap-based nested scopes, public/private/protected are methods, a
> binding can be pulled off at any time and passed around to access local
> variables and other state, and several more. They're features that give
> great flexibility to Ruby, but which are extremely difficult to optimize
> for. I have more tricks I'll be putting in future versions of JRuby, but
> they'll take a bit of time. Ruby's a tough language to implement.

I can imagine

[...]


> I ran through some of the Alioth benchmarks, and they're definitely a
> lot better. 1.6 ought to be a good release for you.

we hope so ;)

> One question on CallSite, and admittedly I could probably get this from
> digging in the source... When I followed the discussions about call
> sites a few months ago, it seemed like they had to be constructed by
> hand on a case-by-case basis for specific types. So for example, you had
> to write the CallSite code to handle Integer +, -, *, etc, and if you
> didn't hand-write a CallSite, it would not be available. Has that changed?

I think you got that only partially right. We have binary operations on
the call sites; these can be linked directly to methods that do for
example int+int. The advantage is that you do not have to create an
extra array to store the arguments. That already can make quite a
difference. Anyway, int+int is directly available as such a binop, thus
it can be used without transformations and without Reflection from the
call site. If it is not available, then we fall back to the normal
Reflection based code. Such an optimization would maybe not be needed if
we had MethodHandles.
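A minimal sketch of such a binop fast path, with invented names (the real classes live in org.codehaus.groovy.runtime.callsite and work differently in detail): the int+int case is handled directly on the site, skipping both the Object[] argument array and reflection, with a fallback for everything else:

```java
// Hedged illustration of a call-site binop: a direct int+int path plus a
// general fallback. Not the actual Groovy implementation.
abstract class BinOpSite {
    abstract Object callBinop(Object a, Object b);
}

class IntPlusSite extends BinOpSite {
    Object callBinop(Object a, Object b) {
        if (a instanceof Integer && b instanceof Integer) {
            // fast path: no Object[] allocation, no reflective dispatch
            return ((Integer) a).intValue() + ((Integer) b).intValue();
        }
        return slowPath(a, b);
    }

    private Object slowPath(Object a, Object b) {
        // the general path would pack the arguments into an array and go
        // through reflection-based method selection; stand-in logic here
        return ((Number) a).doubleValue() + ((Number) b).doubleValue();
    }
}
```

The guard costs two instanceof checks; the saving is the argument-array allocation and the reflective call on every int+int.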

bye blackdrag

Jochen Theodorou

unread,
May 1, 2008, 6:25:06 AM5/1/08
to jvm-la...@googlegroups.com
John Wilson schrieb:

> On 5/1/08, Jochen Theodorou <blac...@gmx.org> wrote:
>> Charles Oliver Nutter schrieb:
>> > Jochen Theodorou wrote:
> [...]
>> ok, let me try to explain what I think of... The current system in
>> Groovy works like this: you have a narrow API, with some core that
>> actually selects and executes the method call. Ng is more or less the
>> same, but with a wide API. What I plan for the future is to no longer
>> let the core execute the methods, instead they return handles to the
>> call site and the call site will call the method for us.
[...]

> I have not found that a straightforward implementation helps with
> performance (in fact it was slower with the initial implementation). I
> think that this is because method selection is quite complicated (you
> have to look at the types of the parameters and take Categories and
> Monkey patching into account). So the JIT, in general, will not be
> able to work out which method proxy object is returned from the
> selection call. This means that it is not able to do any significant
> inlining at the call site. Also you have to pass the actual parameters
> twice, once to the selection method and once to the method proxy.

the purpose of this is not to have inlining of the method selection, it
is to be able to avoid method selection. Inlining at the call site is
impossible if you have to use Reflection, which is our old version. The
new version gives a more direct path; I am not sure if inlining happens,
but it sure is faster.

> One approach I'm looking at is to pre select the method proxy and then
> pass the proxy to a checking mechanism in the runtime system which
> check to see if the proxy is still the best match and if so makes the
> call via it. The reasoning behind this is that the checking process is
> actually simpler than the selection process and the JIT may be able to
> do lost more inlining. I'm still working on this.

that isn't so much different from what we do for call sites, only that we
don't do this in the core, but in the call site object.

>> This design is very much oriented at invokedynamic, but we came up with
>> this before invokednymic. Of course MethodHandles, such as described by
>> John Rose will come in very handy here. Most of what can be done today
>> with monkey patching and categories fits well in his new way. I plan
>> also to restrict a MetaClass to be no longer replaceable, but mutating
>> it is allowed. The downside of this is, that if you want for example
>> write code that reacts to each method call, that you have to put that in
>> a MetaMethod. But much of what is done today will work without change I
>> think.
>
> You know that I think that making the Class MataClass mapping
> immutable is a *very* good idea. However if you really are going to
> propose this you need to raise it on the Groovy lists now. It's a huge
> breaking change and, whilst it will get my enthusiastic support, is
> going to cause a lot of problems for existing code.

Before I go to the lists I need to see how the current way can be
emulated. But yes, it is a huge breaking change; that's why we schedule
this for 2.0.

bye blackdrag

John Wilson

unread,
May 1, 2008, 6:30:43 AM5/1/08
to jvm-la...@googlegroups.com
On 4/30/08, John Rose <John...@sun.com> wrote:
>


I have done some work on optimising method calls when you are able to
control the bytecode generated for the target (in this case Ng code
calling Ng classes). I have documented it here
http://docs.google.com/Doc?id=ah76zbd6xsx2_9ck33c8dp if you are
interested.

It seems to work well and is a lot faster than using reflection and it
solves the problem of making super.foo() calls via the MetaClass.

So I'm only really looking at reflection to dispatch calls to Java
methods. I have a scheme for doing a similar sort of method dispatch
for Java classes but it requires run time bytecode generation and a
new class for every Java class.

John Wilson

Charles Oliver Nutter

unread,
May 1, 2008, 2:17:12 PM5/1/08
to jvm-la...@googlegroups.com
John Rose wrote:
> Or (I don't know if it could be made to work, but it's worth thinking
> about) method handles could interoperate more tightly with closures, by
> having each individual method handle somehow take on the appropriate
> function interface type.

I think this could lead to some kind of "handle explosion" where we have
(too) many handles designed to take many differently-structured
interface types. Too parametric? Obviously JRuby and Groovy take a very
simple approach now, and require all closures be of a single type and
its subtypes. If that were the idea you're talking about, where there's
a single closure interface everyone could implement, it would probably
be valuable for language implementers to more tightly interop...but I
can't imagine it would be pretty from Java.

- Charlie

John Wilson

unread,
May 1, 2008, 2:33:50 PM5/1/08
to jvm-la...@googlegroups.com

Yes Java closures are a looming interop problem. Groovy, Ng and JRuby
would, I imagine, have no major problem handling Java closures. Going
the other way would probably require wrapping in a Java closure.

It will be nice when the Java closure definition is fixed. I have not
followed it too closely. I suppose it would be too much to ask that
java.lang.Closure be an abstract base class or (better still) an
interface.

John Wilson

Rémi Forax

unread,
May 1, 2008, 6:11:01 PM5/1/08
to jvm-la...@googlegroups.com
Charles Oliver Nutter wrote:
To avoid interface explosions, you can erase an object type as Neal
Gafter has proposed:
i.e (String,String => int) is erased to (Object,Object => int)
and add casts at call site.

I've counted only 927 different erased signatures considering
all public methods of all public classes of rt.jar.
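For illustration, here is roughly what one erased shape looks like in modern Java terms; the interface name ObjObjToInt is invented, and a lambda stands in for the proposed closure syntax:

```java
// One erased shape, (Object,Object => int), can serve every
// (X,Y => int) closure: the casts are inserted where the closure
// body uses its parameters.
interface ObjObjToInt {
    int apply(Object a, Object b);
}

class ErasureDemo {
    // conceptually a (String,String => int) closure, compiled against
    // the erased interface with the casts added back
    static final ObjObjToInt LENGTH_DIFF =
        (a, b) -> ((String) a).length() - ((String) b).length();
}
```

Since the shape ignores the concrete reference types, the 927 signatures Rémi counts are the distinct arities and primitive/reference patterns, not the full cross product of types.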

Rémi


Charles Oliver Nutter

unread,
May 1, 2008, 6:31:55 PM5/1/08
to jvm-la...@googlegroups.com
Rémi Forax wrote:
> To avoid interface explosions, you can erase an object type as Neal
> Gafter has proposed:
> i.e (String,String => int) is erased to (Object,Object => int)
> and add casts at call site.
>
> I've counted only 927 different erased signatures considering
> all public methods of all public classes of rt.jar.

That's pretty good...I must have missed that in the proposal. I presume
that could be drastically lowered by playing varargs tricks for argument
counts over some threshold (perhaps high).

- Charlie
