Effect of partial typing on the Fibonacci's function

8 views
Skip to first unread message

Rémi Forax

unread,
Nov 1, 2009, 9:04:18 AM11/1/09
to jvm-la...@googlegroups.com
It can interest some of you,
I've written a blog entry on a newly language which allow gradual typing:
http://weblogs.java.net/blog/forax/archive/2009/11/01/tales-four-fibonaccis

Rémi

Charles Oliver Nutter

unread,
Nov 1, 2009, 1:22:17 PM11/1/09
to jvm-la...@googlegroups.com
Very nice! I want to actually implement this in Duby+Surinx merged
into a single language, but also start implementing it for JRuby. We
are moving toward an SSA-based compiler IR for future versions of
JRuby which may make it easier to propagate type information when
possible.

Rémi Forax

unread,
Nov 1, 2009, 2:52:56 PM11/1/09
to jvm-la...@googlegroups.com
Le 01/11/2009 19:22, Charles Oliver Nutter a écrit :
> Very nice! I want to actually implement this in Duby+Surinx merged
> into a single language, but also start implementing it for JRuby. We
> are moving toward an SSA-based compiler IR for future versions of
> JRuby which may make it easier to propagate type information when
> possible.
>

Hi Charles,
Fortunately, you're exist. I have a reader :)
When I was writing this entry I was wondering how many people
will really take a look to bytecode dumps.

In pseudo, I don't do any type propagation, i.e
no more that what a classical typechecking do.
I just do only one thing to avoid some dynamic cast,
I back-propagate expected type from the node to
the leaf. i.e if a function is dynamic but is assigned
to an int, I encode that this function need an int
as return type.

I wonder if a SSA-based compiler IR is a good idea,
you will need something like a back transformation
to come back to a bytecode form (without phis).
And as far as I know, the transformation that you can do
on a SSA form are low levels i.e you will have lot
of informations but no way to encode them in the bytecode.

Moreover in case of JRuby you have a lot more than
just types, you can do some kind of type profiling because
your code need first to be interpreted.
And so you can propage these informations to your bytecode
compiler.

cheers,
Rémi

Charles Oliver Nutter

unread,
Nov 1, 2009, 3:30:39 PM11/1/09
to jvm-la...@googlegroups.com
On Sun, Nov 1, 2009 at 2:52 PM, Rémi Forax <fo...@univ-mlv.fr> wrote:
> I wonder if a SSA-based compiler IR is a good idea,
> you will need something like a back transformation
> to come back to a bytecode form (without phis).
> And as far as I know, the transformation that you can do
> on a SSA form are low levels i.e you will have lot
> of informations but no way to encode them in the bytecode.
>
> Moreover in case of JRuby you have a lot more than
> just types, you can do some kind of type profiling because
> your code need first to be interpreted.
> And so you can propage these informations to your bytecode
> compiler.

The SSA transformations in the absence of profiling are to help us
reduce literal fixnum loops into int loops, fixnum math into primitive
math, and so on. Obviously these cases only work if we can statically
detect the types of values and make assumptions (or add guards) for
numeric operators. But we also plan to add type profiling to our
interpreter (or make the current compiler a first-tier compiler) so we
can feed profile data into the compiler. With that, we can potentially
produce an optimized version of a given method body that runs its
guards ahead of time and then goes straight to optimized behavior.
There are some guards we can never eliminate, but we can get rid of a
lot of the current overhead of Fixnum objects if we can tell we're
using Fixnums and numeric operators have not been modified.

- Charlie

Jochen Theodorou

unread,
Nov 2, 2009, 8:11:55 AM11/2/09
to jvm-la...@googlegroups.com
Rémi Forax schrieb:

> Le 01/11/2009 19:22, Charles Oliver Nutter a écrit :
>> Very nice! I want to actually implement this in Duby+Surinx merged
>> into a single language, but also start implementing it for JRuby. We
>> are moving toward an SSA-based compiler IR for future versions of
>> JRuby which may make it easier to propagate type information when
>> possible.
>>
>
> Hi Charles,
> Fortunately, you're exist. I have a reader :)
> When I was writing this entry I was wondering how many people
> will really take a look to bytecode dumps.

Oh you would wonder... In fact we call it optional typing in Groovy, but
it is really very much like gradual typing. Only the compiler is not
using the type information to speed up the program. We will change that
for Groovy 1.8.

I should maybe try to explain a difference between the Java type system
and the one Groovy uses. In Groovy a method call on an object with Type
T is not limited to the methods declared for this type. Instead the
runtime type is used. Since Groovy supports overloading of methods too,
that means you cannot relay on information you find in the type, unless
it is a really a direct match. If you have an Object typed variable x
and do x.foo(), and the runtime type of x will be Foo, which has a foo()
method, then the call will succeed. Since we at compile time don't know
the type Foo, we cannot generate a direct method call for that. And now
optional typing in 1.8 will come into play. You can type the variable as
Foo and the compiler will be able to generate a direct method call on x
using Foo. Also in Groovy you don't need casts when you assign something
to a typed variable. Instead Groovy will add an implicit cast. This way
Groovy ensures, that the variable will contain an object of at least
that type all the time. It is like declaring an upper bound kind of (if
parents are above).

> In pseudo, I don't do any type propagation, i.e
> no more that what a classical typechecking do.
> I just do only one thing to avoid some dynamic cast,
> I back-propagate expected type from the node to
> the leaf. i.e if a function is dynamic but is assigned
> to an int, I encode that this function need an int
> as return type.

that is different from what Groovy does. If the method would return any
other number, it may still work, depending on range and kind of that
number. But it needs a conversion of course. Instead in 1.8 we will try
finding the called method like Java Groovy does not really support
different return types. So if we know the name, the receiver and the
argument types, we can maybe find a method that directly matches does
requirements. If we do, then we can generate a direct method call. We
will then also know the return type of course. You course is more easy,
since you don't need to search the other class.

On the summit I did show that Groovy can be extremely slow for the
Fibonacci example. But that is because we first have to get loose of all
the synchronization stuff we currently do in the runtime. That is there
for a good reason, just that it does make Groovy slower even in the
single threaded cases.


bye blackdrag


--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Jeremy

unread,
Nov 2, 2009, 11:40:16 AM11/2/09
to JVM Languages
Neat stuff Rémi!

One question for you... you say that the types don't help with
unboxing. That surprised me,
as in the prototype I'm working on, if something has type "int", then
it's unboxed. At the points
where something is coerced from type any to int, copy the underlying
integer into the unboxed
int. What's preventing you from doing the same?

Cheers,
Jeremy

On Nov 1, 7:04 am, Rémi Forax <fo...@univ-mlv.fr> wrote:
> It can interest some of you,
> I've written a blog entry on a newly language which allow gradual typing:http://weblogs.java.net/blog/forax/archive/2009/11/01/tales-four-fibo...
>
> Rémi

Charles Oliver Nutter

unread,
Nov 2, 2009, 11:40:41 AM11/2/09
to jvm-la...@googlegroups.com
On Mon, Nov 2, 2009 at 7:11 AM, Jochen Theodorou <blac...@gmx.org> wrote:
> Oh you would wonder... In fact we call it optional typing in Groovy, but
> it is really very much like gradual typing. Only the compiler is not
> using the type information to speed up the program. We will change that
> for Groovy 1.8.
>
> I should maybe try to explain a difference between the Java type system
> and the one Groovy uses. In Groovy a method call on an object with Type
> T is not limited to the methods declared for this type.  Instead the
> runtime type is used. Since Groovy supports overloading of methods too,
> that means you cannot relay on information you find in the type, unless
> it is a really a direct match. If you have an Object typed variable x
> and do x.foo(), and the runtime type of x will be Foo, which has a foo()
> method, then the call will succeed. Since we at compile time don't know
> the type Foo, we cannot generate a direct method call for that. And now
> optional typing in 1.8 will come into play. You can type the variable as
> Foo and the compiler will be able to generate a direct method call on x
> using Foo. Also in Groovy you don't need casts when you assign something
> to a typed variable. Instead Groovy will add an implicit cast. This way
> Groovy ensures, that the variable will contain an object of at least
> that type all the time. It is like declaring an upper bound kind of (if
> parents are above).

Won't this break existing behavior? Currently, even if you declare a
type, it will still dispatch based on the runtime set of methods for
that type. So in your example, if I have replaced the foo() method on
Foo with some other piece of code, will statically declaring my
variables as type Foo dispatch to the original foo() method or the one
I've replaced? This is the conundrum I face in Ruby, where we already
can gather type feedback to know a variable is always X, but still
need to check if X is being modified so our direct calls are valid.
Some of this exists already in JRuby; if you call a numeric method
directly against a Fixnum (boxed 64-bit integer), it will go straight
in and even inline the logic. Upcoming compiler work will also start
to add type and method guards to inlined/optimized versions of code.
But the mutability of classes common to both Ruby and Groovy means
unguarded direct calls will probably always be suspect.

> On the summit I did show that Groovy can be extremely slow for the
> Fibonacci example. But that is because we first have to get loose of all
> the synchronization stuff we currently do in the runtime. That is there
> for a good reason, just that it does make Groovy slower even in the
> single threaded cases.

What synchronization stuff do you deal with in the fib example?
Perhaps I missed some of that during your talk. JRuby currently does
no synchronization at all during a simple algorithm like fib. Is it
because you do not have your own set of core types, like Fixnum is our
64-bit numeric box? We are also exploring the near-term possibility of
making JRuby allow arbitrary Object to pass through the system without
a wrapper, and for that we'll start to find solutions to metaclass
caching, etc. A mix of wrapped and unwrapped may be best: currently we
pay a metaclass lookup cost only once, when the object enters Ruby,
and from then on it's just a field access on the wrapper.

- Charlie

Rémi Forax

unread,
Nov 2, 2009, 12:55:32 PM11/2/09
to jvm-la...@googlegroups.com
Le 02/11/2009 17:40, Jeremy a écrit :
> Neat stuff Rémi!
>
> One question for you... you say that the types don't help with
> unboxing. That surprised me,
>

What I wanted to say is that a mixed model,
some variables are 'any' some others are int,
has a cost which is more close to the full dynamic model
than the full typed model.

This is because boxings are required at all edges between
the typed world and the untyped world.

You don't have a nice linear function that says if I type a new varaible,
I can gain x% of performance.

> as in the prototype I'm working on, if something has type "int", then
> it's unboxed. At the points
> where something is coerced from type any to int, copy the underlying
> integer into the unboxed
> int. What's preventing you from doing the same?
>

That what I do or I haven't understand what you mean.

> Cheers,
> Jeremy
>

Cheers,
Rémi

John Wilson

unread,
Nov 2, 2009, 12:40:51 PM11/2/09
to jvm-la...@googlegroups.com
2009/11/2 Charles Oliver Nutter <hea...@headius.com>:


It's a good question!

It's virtually impossible to do effective type inferencing with Groovy
semantics. E.G. with

int a = 1
int b = 2
def x = a + b

you have no idea of the type of x. Indeed if this code is executed at
the same time in two different threads x can end up being two
different types (because of Categories).

It's, at least theoretically, possible to do speculative execution of
some classes of Groovy expression and, if you can do a cheap test to
ensure that the operators have not been overloaded, you can get
execution speeds within an order of magnitude of straight Java.

I have never seen any way of doing unguarded direct method calls in
Groovy. It is possible to generate synthetic helper methods on a
compiled Groovy class which can be safely called directly but these
helper methods need to do checks before they can pass the call on to
the target method; so you are, in effect, doing guarded calls.

Because, in the general case, Groovy needs to examine the type of each
parameter at runtime before dispatching the method (because Groovy
supports a method dispatch mechanism very similar to Java's) this
requires another set of guards: E.G.

class C {
int foo(int p) {
return p + 2
}
}

int x = 1
int y = 2


new C().foo(x + y)

there is no way you can do a direct dispatch to foo on the instance of
C. Firstly you don't know the type of the result of x + y and secondly
you don't know what methods have been been monkey patched onto
instances of C.

The method dispatch problem is really difficult in Groovy because of
the need to implement Java style method selection based on the type of
the actual parameters (In Groovy it's based on the type of the actual
parameter not the declared type of the variable holding the actual
parameter - but that makes it even more difficult).

Still, Jochen has been thinking about this for longer than I have - so
maybe he's seen something I've missed.

John Wilson

Rémi Forax

unread,
Nov 2, 2009, 1:17:00 PM11/2/09
to jvm-la...@googlegroups.com

In september, just at the end of the summit, we (or I and)
the JSR292 EG group talk a lot about Jochen presentation
(JRuby exhibits exactly the same problem).

Yes currently you have to check some metaclass invariant
and that check cost a lot because it need to be a volatile check.

But I think John proposes a very good solution that need to
be implemented to be sure.
The magic word here is invalidation.

The idea is to let the code without guard but to be able
to trigger a VM safepoint. At this safepoint, all or some
callsites will be invalided, i.e the callsite target will be flushed
and the callsite will be in state that will require
to call the bootstrap method to re-initialise its target.

The thread that has triggered the invalidation will hold a lock
to change the hierarchy, i.e do all language specific modifications.
If during this step another thread want to call a callsite target,
because the callsite was invalided it will require to call the
bootstrap method again. And to acheive proper synchronisation,
the BSM should try to take the lock beore installing a new target.
If the lock is held by the thread that has required invalidation,
the thread will just wait on the lock..

If invalidation works, you can get ride all of guards against
metaclass modification. And if a metaclass modification
occurs, all callsite targets will be flushed and new target
will need to be reinstalled.

So please guys, push to have a real invalidation implementation
and don't add guards. Let the VM do the magics :)

> Because, in the general case, Groovy needs to examine the type of each
> parameter at runtime before dispatching the method (because Groovy
> supports a method dispatch mechanism very similar to Java's) this
> requires another set of guards: E.G.
>
> class C {
> int foo(int p) {
> return p + 2
> }
> }
>
> int x = 1
> int y = 2
>
>
> new C().foo(x + y)
>
> there is no way you can do a direct dispatch to foo on the instance of
> C. Firstly you don't know the type of the result of x + y and secondly
> you don't know what methods have been been monkey patched onto
> instances of C.
>
> The method dispatch problem is really difficult in Groovy because of
> the need to implement Java style method selection based on the type of
> the actual parameters (In Groovy it's based on the type of the actual
> parameter not the declared type of the variable holding the actual
> parameter - but that makes it even more difficult).
>
> Still, Jochen has been thinking about this for longer than I have - so
> maybe he's seen something I've missed.
>
> John Wilson
>

Rémi

Charles Oliver Nutter

unread,
Nov 2, 2009, 1:01:22 PM11/2/09
to jvm-la...@googlegroups.com
On Mon, Nov 2, 2009 at 11:40 AM, John Wilson <tugw...@gmail.com> wrote:
>
> 2009/11/2 Charles Oliver Nutter <hea...@headius.com>:
>> Won't this break existing behavior? Currently, even if you declare a
>> type, it will still dispatch based on the runtime set of methods for
>> that type. So in your example, if I have replaced the foo() method on
>> Foo with some other piece of code, will statically declaring my
>> variables as type Foo dispatch to the original foo() method or the one
>> I've replaced? This is the conundrum I face in Ruby, where we already
...

> It's a good question!
>
> It's virtually impossible to do effective type inferencing with Groovy
> semantics. E.G. with
>
> int a = 1
> int b = 2
> def x = a + b
>
> you have no idea of the type of x. Indeed if this code is executed at
> the same time in two different threads x can end up being two
> different types (because of Categories).

Categories should be abolished, in my opinion, because they seem to be
the final sticking point for many potential improvements in Groovy.
But that's just me :)

In JRuby we don't have categories to contend with, but we do need to
be mindful of core class modifications. It is *possible* to modify the
numeric operator methods on Fixnum, but it is done so rarely in
practice that we usually don't check for it. JRuby also has an
advantage over Groovy in that we do not compile everything ahead of
time, so we can (and soon will) use profile-driven optimizations to
improve these cases, but we'll still need to have guards somewhere and
we'll still have to be fairly conservative since we can't fix-up a
currently-executing method if necessary.

> It's, at least theoretically, possible to do speculative execution of
> some classes of Groovy expression and, if you can do a cheap test to
> ensure that the operators have not been overloaded, you can get
> execution speeds within an order of magnitude of straight Java.

That's essentially what we do in JRuby now, and when running in our
fastest mode we are easily within an order of magnitude of Java for
fully-boxed math. We are nowhere close to comparing to primitive math,
which will require our newer compiler and likely also require
profile-driven optz.

> I have never seen any way of doing unguarded direct method calls in
> Groovy. It is possible to generate synthetic helper methods on a
> compiled Groovy class which can be safely called directly but these
> helper methods need to do checks before they can pass the call on to
> the target method; so you are, in effect, doing guarded calls.

JRuby essentially does this right now for math operators, checking
whether the type of the receive is Fixnum and directly dispatching.
But it's not a good general-purpose solution.

> The method dispatch problem is really difficult in Groovy because of
> the need to implement Java style method selection based on the type of
> the actual parameters (In Groovy it's based on the type of the actual
> parameter not the declared type of the variable holding the actual
> parameter - but that makes it even more difficult).

Ahh yes, this is a detail I had forgotten. Because Groovy supports
method overloading, dispatch is a much harder problem than in Ruby,
and may always be slower (at least in the presence of overloads).

For ruby-to-ruby calls in JRuby, I need only to look up a method of a
given name to find the method object I dispatch to. There's no
inspection of the parameters during dispatch at all. Generally this
means that user code needs to either check the types of incoming
arguments (using kind_of? or === or similar), or check whether they
support certain operations (using respond_to? or just calling
directly), but it does make dispatch, caching, and inlining much
simpler. As a result, ruby-to-ruby calls are generally faster than
ruby-to-java, even when the java invocations don't require coercing
any objects. Of course, all ruby-to-ruby calls avoid argument boxing,
which is impossible if you use reflection (as I believe Groovy does
for a substantial number of calls right now).

For ruby-to-java calls in JRuby, we maintain a table of incoming
arguments types and the correct target method to dispatch for those
types. The table is keyed off an aggregate hash value of the actual
argument types. This currently happens within the method object we
cache at the call site, but in the future we will actually maintain a
PIC at the call site that can quickly choose one of the most
recently-dispatched method objects.

My hope with gradual typing in JRuby is that we could start explicitly
compiling optimized and non-optimized versions of method bodies with
only trivial guards at the top. The assumption would be that if you're
explicitly typing a method, you expect the behavior at the start of
the method to match the behavior at the end of the method, and so you
don't check whether things like Fixnum#+ are changing during the
execution of the method body. That would allow us to compile numeric
logic down to primitive logic in many cases. But I expect the bulk of
these optimizations are only going to help numeric algorithms, since
our object-to-object dispatch in Ruby is only a bit slower than
non-inlined Java object-to-object dispatch right now.

- Charlie

Charles Oliver Nutter

unread,
Nov 2, 2009, 1:14:00 PM11/2/09
to jvm-la...@googlegroups.com
On Mon, Nov 2, 2009 at 12:17 PM, Rémi Forax <fo...@univ-mlv.fr> wrote:
> The idea is to let the code without guard but to be able
> to trigger a VM safepoint. At this safepoint, all or some
> callsites will be invalided, i.e the callsite target will be flushed
> and the callsite will be in state that will require
> to call the bootstrap method to re-initialise its target.
>
> The thread that has triggered the invalidation will hold a lock
> to change the hierarchy, i.e do all language specific modifications.
> If during this step another thread want to call a callsite target,
> because the callsite was invalided it will require to call the
> bootstrap method again. And to acheive proper synchronisation,
> the BSM should try to take the lock beore installing a new target.
> If the lock is held by the thread that has required invalidation,
> the thread will just wait on the lock..
>
> If invalidation works, you can get ride all of guards against
> metaclass modification. And if a metaclass modification
> occurs, all callsite targets will be flushed and new target
> will need to be reinstalled.
>
> So please guys, push to have a real invalidation implementation
> and don't add guards. Let the VM do the magics :)

This is how JRuby's call site logic used to work, and we'd love to
return to that logic. The problem we ran into (and I believe the
problem that the DLR guys ran into when they tried the same thing) was
that unless you access the inline cache under lock, you can't
guarantee that threads won't step on each other. The two can't run
concurrently without synchronization since the calling thread might
cache a method at the exact moment another thread invalidates,
resulting in a stale cache that won't ever be invalidated. It was
faster, but we had actual reported bugs from that thread race.

If it's possible to have the VM do this for us (correctly), it would
certainly be a big help. That is also predicated on invokedynamic
being able to do automatically all these optimizations we want to do
by hand. For example, invokedynamic doing "fib" against our boxed
Fixnum objects needs to be able to reduce it to integer math to be
able to compete with the profile-driven "manual" optimizations we
would do at a bytecode level. It would be ideal if we could just wire
up all invokedynamic call paths and know that those paths will receive
all the optimizations we expect from Java dispatches, but I don't know
if (or when) that will be the case.

(It's not that I don't have high confidence in the VM guys, it's just
that I also need a solution for non-indy JVMs :) I'm very much looking
forward to indy-based optimizations that would be difficult or
impossible for us to do at a bytecode level)

- Charlie

Jochen Theodorou

unread,
Nov 2, 2009, 2:00:14 PM11/2/09
to jvm-la...@googlegroups.com
Charles Oliver Nutter schrieb:
[...]

> Currently, even if you declare a
> type, it will still dispatch based on the runtime set of methods for
> that type. So in your example, if I have replaced the foo() method on
> Foo with some other piece of code, will statically declaring my
> variables as type Foo dispatch to the original foo() method or the one
> I've replaced?

to the replaced. I intend to let the compiler create a path where
different types are assumed or ensured. If the compiler sees these types
are modified or wrongly guessed, then it has to use the alternative
path. replacing/adding a method modifies the type on which the method is
replaced/added. Actually I know this will possibly make the methods
bigger and I won't know if that effect will bypass the gain for a direct
dispatch. I have to see how much makes sense once I have it working. but
surely hotspot will have some work here.

> This is the conundrum I face in Ruby, where we already
> can gather type feedback to know a variable is always X, but still
> need to check if X is being modified so our direct calls are valid.

well for groovy that check itself is currently a problem, since it
involves checking synchronized data. If you do that for each and every
operation, than that means poor hotspot can do almost nothing and is
almost as slow as in the interpreted mode. For example in 1.7 I noticed
that if an int is boxed and then unboxed, then hotspot can eliminate the
boxing. But if before the unboxing a volatile field is accessed for
example, then hotspot won't do that. It is also a barrier for inlining.

> Some of this exists already in JRuby; if you call a numeric method
> directly against a Fixnum (boxed 64-bit integer), it will go straight
> in and even inline the logic. Upcoming compiler work will also start
> to add type and method guards to inlined/optimized versions of code.
> But the mutability of classes common to both Ruby and Groovy means
> unguarded direct calls will probably always be suspect.

Well in Groovy it is reasonable to assume for a large set of classes
that the class is not modified. Even working on a subset of methods
might be possible. That works so well in Groovy because here you still
tend to declare classes and not construct them at runtime. This means we
can do that not only for numbers and other types known by the runtime,
but also for classes the user declared. And of course some methods can
get a fast path... I think of each, find, toString and such.

Jochen Theodorou

unread,
Nov 2, 2009, 2:05:58 PM11/2/09
to jvm-la...@googlegroups.com
John Wilson schrieb:
[...]

> It's virtually impossible to do effective type inferencing with Groovy
> semantics. E.G. with
>
> int a = 1
> int b = 2
> def x = a + b

I can assume x is int. But I am not free in assuming that the plus
method is really hat I assume and that x will get the right type. So I
need alternate paths.

[...]


> I have never seen any way of doing unguarded direct method calls in
> Groovy. It is possible to generate synthetic helper methods on a
> compiled Groovy class which can be safely called directly but these
> helper methods need to do checks before they can pass the call on to
> the target method; so you are, in effect, doing guarded calls.

yes. I don't intend to change the semantics of Groovy in that area, so
every call is guarded in some way. But how guarding is done is important
too.

[...]


> Still, Jochen has been thinking about this for longer than I have - so
> maybe he's seen something I've missed.

I found out the guarding itself is not so bad. hotspot can remove many
of those checks. The important part here is that the guards are in the
call site or else it is very fast anything but monomorphic.

Charles Oliver Nutter

unread,
Nov 2, 2009, 2:14:20 PM11/2/09
to jvm-la...@googlegroups.com
On Mon, Nov 2, 2009 at 1:00 PM, Jochen Theodorou <blac...@gmx.org> wrote:
> to the replaced. I intend to let the compiler create a path where
> different types are assumed or ensured. If the compiler sees these types
> are modified or wrongly guessed, then it has to use the alternative
> path. replacing/adding a method modifies the type on which the method is
> replaced/added. Actually I know this will possibly make the methods
> bigger and I won't know if that effect will bypass the gain for a direct
> dispatch. I have to see how much makes sense once I have it working. but
> surely hotspot will have some work here.

We have already begun to do this in some cases in JRuby, and will be
expanding on it more in the 1.5 work over the next few months. We will
do essentially what you describe, but based on runtime-profiled types.
If we see that an incoming argument is always a Fixnum, we'll try to
emit an optimized Fixnum path. This will be helped substantially be a
new compiler that can perform similar optimizations to Hotspot at a
Ruby level if we can feed it enough type information. Ideally we'll be
able to get runtime JRuby code to run as fast as static-typed code for
many cases.

I'm hoping to avoid the problems of large method bodies by breaking
those bodies up into several synthetic methods. The fast path would be
in the main body, with slow paths in alternative bodies. The whole
thing would be stitched together with guards and static dispatches, so
it should all inline appropriately. Early results look very good;
we're able to get simple benchmarks within only a few times Java
performance.

> well for groovy that check itself is currently a problem, since it
> involves checking synchronized data. If you do that for each and every
> operation, than that means poor hotspot can do almost nothing and is
> almost as slow as in the interpreted mode. For example in 1.7 I noticed
> that if an int is boxed and then unboxed, then hotspot can eliminate the
> boxing. But if before the unboxing a volatile field is accessed for
> example, then hotspot won't do that. It is also a barrier for inlining.

By "checking synchronized data" you mean checking a volatile field,
yes? We do the same, but it has been reduced to a single volatile
field per invocation. Eliminating the check improves performance, but
not as much as eliminating other dispatch overhead like argument
boxing or artificial stack-trace maintenance.

> Well in Groovy it is reasonable to assume for a large set of classes
> that the class is not modified. Even working on a subset of methods
> might be possible. That works so well in Groovy because here you still
> tend to declare classes and not construct them at runtime. This means we
>  can do that not only for numbers and other types known by the runtime,
> but also for classes the user declared. And of course some methods can
> get a fast path... I think of each, find, toString and such.

We will make some of those assumptions, but generally only for numeric
algorithms. "each" and friends are all commonly reimplemented by
users, so they'll either need to be hand-inlined (via input from our
new compiler) or we'll have to hope that JVMs can catch up with
closure-based languages and inline closure calls across intermediate
megamorphic methods.

- Charlie

Jochen Theodorou

unread,
Nov 2, 2009, 2:28:13 PM11/2/09
to jvm-la...@googlegroups.com
Charles Oliver Nutter schrieb:

my numbers show, that on integer based benchmarks Groovy could be around
30% slower than Java. The GSOC project which realizes a bytecode level
optimizing compiler for Groovy shows this is very possible. The only
reasons we don't go with that solution is that it requires Groovy to
start two runtimes and as a library we cannot do that. So it would be
only an option if Groovy is run from the command line. And for for
example Grails this does not really look like an option.

[...]


> By "checking synchronized data" you mean checking a volatile field,
> yes? We do the same, but it has been reduced to a single volatile
> field per invocation. Eliminating the check improves performance, but
> not as much as eliminating other dispatch overhead like argument
> boxing or artificial stack-trace maintenance.

one volatile is enough to cause problems to hotspot. I found that in
Java 1.7 for example boxing is much less of a problem. If hotspot finds
the path primitive, boxed, unboxed, then it removes those two surplus
operations. So boxing by itself would not be a problem. But is you
access the volatile before unboxing, then this path no longer works and
hotspot won't remove the surplus boxing. Artificial stack-trace
maintenance is something we don't do, so there is no problem for us.

>> Well in Groovy it is reasonable to assume for a large set of classes
>> that the class is not modified. Even working on a subset of methods
>> might be possible. That works so well in Groovy because here you still
>> tend to declare classes and not construct them at runtime. This means we
>> can do that not only for numbers and other types known by the runtime,
>> but also for classes the user declared. And of course some methods can
>> get a fast path... I think of each, find, toString and such.
>
> We will make some of those assumptions, but generally only for numeric
> algorithms. "each" and friends are all commonly reimplemented by
> users, so they'll either need to be hand-inlined (via input from our
> new compiler) or we'll have to hope that JVMs can catch up with
> closure-based languages and inline closure calls across intermediate
> megamorphic methods.

why is it megamorphic for you guys?

John Wilson

unread,
Nov 2, 2009, 2:28:10 PM11/2/09
to jvm-la...@googlegroups.com
2009/11/2 Jochen Theodorou <blac...@gmx.org>:

>
> John Wilson schrieb:
> [...]
>> It's virtually impossible to do effective type inferencing with Groovy
>> semantics. E.G. with
>>
>> int a = 1
>> int b = 2
>> def x = a + b
>
> I can assume x is int. But I am not free in assuming that the plus
> method is really hat I assume and that x will get the right type. So I
> need alternate paths.

No, you can't assume x is int. For example, it's perfectly reasonable
to override the arithmetic operations to return long if the operation
would otherwise overflow. There is nothing in the Groovy language that
says that the result of adding two ints is an int (or at least there
wasn't a couple of yeas ago - has that changed?).

>
> [...]
>> I have never seen any way of doing unguarded direct method calls in
>> Groovy. It is possible to generate synthetic helper methods on a
>> compiled Groovy class which can be safely called directly but these
>> helper methods need to do checks before they can pass the call on to
>> the target method; so you are, in effect, doing guarded calls.
>
> yes. I don't intend to change the semantics of Groovy in that area, so
> every call is guarded in some way. But how guarding is done is important
> too.
>
> [...]
>> Still, Jochen has been thinking about this for longer than I have - so
>> maybe he's seen something I've missed.
>
> I found out the guarding itself is not so bad. hotspot can remove many
> of those checks. The important part here is that the guards are in the
> call site or else it is very fast anything but monomorphic.


Yes, Hotspot can be pretty good at optimising out guards. My
experience is that they don't actually have to be at the call site -
Hotspot inlines code quite readily so if you put it in a helper method
it gets inlined quite quickly and then treated in the same way as if
it was generated at the call site.

John Wilson

Jochen Theodorou

unread,
Nov 2, 2009, 2:32:53 PM11/2/09
to jvm-la...@googlegroups.com
John Wilson schrieb:

> 2009/11/2 Jochen Theodorou <blac...@gmx.org>:
>> John Wilson schrieb:
>> [...]
>>> It's virtually impossible to do effective type inferencing with Groovy
>>> semantics. E.G. with
>>>
>>> int a = 1
>>> int b = 2
>>> def x = a + b
>> I can assume x is int. But I am not free in assuming that the plus
>> method is really hat I assume and that x will get the right type. So I
>> need alternate paths.
>
> No, you can't assume x is int. For example, it's perfectly reasonable
> to override the arithmetic operations to return long if the operation
> would otherwise overflow. There is nothing in the Groovy language that
> says that the result of adding two ints is an int (or at least there
> wasn't a couple of yeas ago - has that changed?).

You got me a bit wrong. I can assume it to generate a fast path at
compile time. I cannot relay on that assumption, which means I have to
check that before I go down that path.

[...]


>> I found out the guarding itself is not so bad. hotspot can remove many
>> of those checks. The important part here is that the guards are in the
>> call site or else it is very fast anything but monomorphic.
>
> Yes, Hotspot can be pretty good at optimising out guards. My
> experience is that they don't actually have to be at the call site -
> Hotspot inlines code quite readily so if you put it in a helper method
> it gets inlined quite quickly and then treated in the same way as if
> it was generated at the call site.

well yes, I should correct myself here. The guard should just not do the
method call later ;)

Jeremy

unread,
Nov 2, 2009, 2:54:15 PM11/2/09
to JVM Languages


On Nov 2, 10:55 am, Rémi Forax <fo...@univ-mlv.fr> wrote:
> Le 02/11/2009 17:40, Jeremy a écrit :
>
> > Neat stuff Rémi!
>
> > One question for you... you say that the types don't help with
> > unboxing. That surprised me,
>
> What I wanted to say is that a mixed model,
> some variables are 'any' some others are int,
> has a cost which is more close to the full dynamic model
> than the full typed model.
>
> This is because boxings are required at all edges between
> the typed world and the untyped world.

Yes.

> You don't have a nice linear function that says if I type a new varaible,
> I can gain x% of performance.

Right, it's much more complicated than that.

> > as in the prototype I'm working on, if something has type "int", then
> > it's unboxed. At the points
> > where something is coerced from type any to int, copy the underlying
> > integer into the unboxed
> > int. What's preventing you from doing the same?
>
> That what I do or I haven't understand what you mean.
>

Great! Thanks for the clarification.

Cheers,
Jeremy


> > Cheers,
> > Jeremy
>
> Cheers,
> Rémi

Charles Oliver Nutter

unread,
Nov 2, 2009, 2:57:08 PM11/2/09
to jvm-la...@googlegroups.com
On Monday, November 2, 2009, Jochen Theodorou <blac...@gmx.org> wrote:
> my numbers show, that on integer based benchmarks Groovy could be around
> 30% slower than Java. The GSOC project which realizes a bytecode level
> optimizing compiler for Groovy shows this is very possible. The only
> reasons we don't go with that solution is that it requires Groovy to
> start two runtimes and as a library we cannot do that. So it would be
> only an option if Groovy is run from the command line. And for for
> example Grails this does not really look like an option.

That is unfortunate. Because JRuby has an interpreted mode, adding the
profile driven optimizations will not be at all difficult for us.
Shouldn't you be able to modify your compiler to gather profile data?
I am also planning to do that for JRuby since many users want to be
able to ship only .class files.

Or perhaps someone could wirte a Groovy interpreter? JRuby's
interpreter obviously does not perform as well as the compiled code,
but it still manages to be faster than the C versions of Ruby and
nearly as fast as Groovy, plus we can gather profile data to make the
eventual compiled result much faster.

> one volatile is enough to cause problems to hotspot. I found that in
> Java 1.7 for example boxing is much less of a problem. If hotspot finds
> the path primitive, boxed, unboxed, then it removes those two surplus
> operations. So boxing by itself would not be a problem. But is you
> access the volatile before unboxing, then this path no longer works and
> hotspot won't remove the surplus boxing.

Yes, I do realize this, and discussions with John Rose often led us to
the conclusion that JRuby's next perf boost may come from eliminating
those volatile checks. But it is very hard to eliminate them when we
can't safely "push" updates out to call sites (see my earlier email on
"active" call site invalidation). Perhaps I will take another look and
see if I can find a way.

Artificial stack-trace
> maintenance is something we don't do, so there is no problem for us.

Yes, thankfully we have managed to get stack maintenance pretty fast,
fast enough that we are as fast or faster than groovy on most
microbenchmarks. But we are hoping to allow users to turn it off very
soon, which can double or triple the performance of method
invocations. A simple fib(30) goes from 0.17s to 0.9s inthat mode.
Hopefully it will be stable soon...and maybe even default.

> why is it megamorphic for you guys?

It is megamorphic for everyone. Each and times and their Ilk are all
ping-ponging through a method that receives a closure. Since there are
many callers of these methods, the closure passed varies widely and
the eventual invocation of that closure is polymorphic. On Hotspot, at
least, this prevents the closure from being unlined all the way back
to the caller of 'each' and so you don't get the full set of
optimizations.

I've talked with a few JVM engineers about this, and most agree it is
simply a case that current JVMs have not been tuned to optimize, and
that with a bit of work this case could online too. For now, however,
we are looking at doing inlining optimizations for closure-receiving
methods at a bytecode level, to allow JVMs to inline through them.

Have you done much exploration into how well Groovy code is inlining
in the JVM? My explorations with JRuby have shown me that with certain
JRuby flags turned on, I can get everything to inline, even across
dynamic calls, even into user-created code. My next goal is to
eliminate anything that interferes with inlined optimizations. Fun
times ahead reading assembly output :)

Rémi Forax

unread,
Nov 2, 2009, 3:35:40 PM11/2/09
to jvm-la...@googlegroups.com

That why invalidation of call site need to be done at a safe point
by the VM i.e in a place where you know that no Java code can run.
You can only have race between invalidated call site in order
to call the bootstrap method. That why the BSM code need
to be protected with a lock.

> If it's possible to have the VM do this for us (correctly), it would
> certainly be a big help. That is also predicated on invokedynamic
> being able to do automatically all these optimizations we want to do
> by hand. For example, invokedynamic doing "fib" against our boxed
> Fixnum objects needs to be able to reduce it to integer math to be
> able to compete with the profile-driven "manual" optimizations we
> would do at a bytecode level. It would be ideal if we could just wire
> up all invokedynamic call paths and know that those paths will receive
> all the optimizations we expect from Java dispatches, but I don't know
> if (or when) that will be the case.
>

Current Java dispatch don't do multi-dispatch.
This is basically what you want when you do Fixnum math.
What indy propose is to let write you own kind of inline cache
at call site in order to be able to emulate multi-dispatch.

If you want more, bug John to have a new adapter named
multidispatch that takes a set of method handle and do
multidispatch on it. The problem here is that you have to anwser
to that question: what is the algorithm you want to use
to do this kind of dispatch.

VM knows how to do virtual call inlining, inline cache,
polymorphic (at least bimorphic) inline cache etc.
But these optimisations are not good for solving the
multi-dispatch problem.

> (It's not that I don't have high confidence in the VM guys, it's just
> that I also need a solution for non-indy JVMs :) I'm very much looking
> forward to indy-based optimizations that would be difficult or
> impossible for us to do at a bytecode level)
>

Non-indy VMs, don't burn resource on that,
please use the backport :)

> - Charlie
>

Rémi

John Wilson

unread,
Nov 2, 2009, 3:22:45 PM11/2/09
to jvm-la...@googlegroups.com
2009/11/2 Jochen Theodorou <blac...@gmx.org>:
>
> John Wilson schrieb:
>> 2009/11/2 Jochen Theodorou <blac...@gmx.org>:
>>> John Wilson schrieb:
>>> [...]
>>>> It's virtually impossible to do effective type inferencing with Groovy
>>>> semantics. E.G. with
>>>>
>>>> int a = 1
>>>> int b = 2
>>>> def x = a + b
>>> I can assume x is int. But I am not free in assuming that the plus
>>> method is really hat I assume and that x will get the right type. So I
>>> need alternate paths.
>>
>> No, you can't assume x is int. For example, it's perfectly reasonable
>> to override the arithmetic operations to return long if the operation
>> would otherwise overflow. There is nothing in the Groovy language that
>> says that the result of adding two ints is an int (or at least there
>> wasn't a couple of yeas ago - has that changed?).
>
> You got me a bit wrong. I can assume it to generate a fast path at
> compile time. I cannot relay on that assumption, which means I have to
> check that before I go down that path.


Yes, that's fine. If you can cheaply check that no numeric semantics
have been changed and there are no function calls in the expression
you can basically generate the same code as javac. This works very
well for several common benchmarks :)

John Wilson

Jochen Theodorou

unread,
Nov 2, 2009, 4:12:45 PM11/2/09
to jvm-la...@googlegroups.com
Charles Oliver Nutter schrieb:

> On Monday, November 2, 2009, Jochen Theodorou <blac...@gmx.org> wrote:
>> my numbers show, that on integer based benchmarks Groovy could be around
>> 30% slower than Java. The GSOC project which realizes a bytecode level
>> optimizing compiler for Groovy shows this is very possible. The only
>> reasons we don't go with that solution is that it requires Groovy to
>> start two runtimes and as a library we cannot do that. So it would be
>> only an option if Groovy is run from the command line. And for for
>> example Grails this does not really look like an option.
>
> That is unfortunate. Because JRuby has an interpreted mode, adding the
> profile driven optimizations will not be at all difficult for us.
> Shouldn't you be able to modify your compiler to gather profile data?
> I am also planning to do that for JRuby since many users want to be
> able to ship only .class files.
>
> Or perhaps someone could wirte a Groovy interpreter? JRuby's
> interpreter obviously does not perform as well as the compiled code,
> but it still manages to be faster than the C versions of Ruby and
> nearly as fast as Groovy, plus we can gather profile data to make the
> eventual compiled result much faster.

Such an interpreter would still need a static class construct, so the
interpreter would interpret only the contents of the method. That means
the method would first create some kind of tree and evaluate that, with
optionally storing the tree. This imposes of course a big startup cost.
Without having such a tree it is not easy to tell if an interpreter
could have any advantage at runtime. But frankly I thought about this
too. I thought that maybe it would be good to replace the method
contents and such an interpreter could be of help here. This is kind of
like the bytecode based version, only that we don't need an agent. The
additional data structure looks quite surplus to me but if the VM does
not offer more... Of course replacing the contents of the method means
invalidation of the method to some extend and here the question is how
to handle multiple threads. And in the multiple threads are the point
that worry me. Because in the current way I think I found a solution
that works without using any explicit volatiles. If the method is
replaced, then I don't see how that should work without a volatile. And
then it would be a volatile per method call again. Something I identify
as root of evil in Groovy atm. In Groovy we don't have a runtime per
thread, so I don't think that works for us.

With invokedynamic there would be the solution, that we make a dynamic
method call to the contents of the method, with the interpreter being
the default. I we replace the interpreter, then we can invalidate the
callsite. So with invokedynamic this may make sense, without it does not
really.

>> one volatile is enough to cause problems to hotspot. I found that in
>> Java 1.7 for example boxing is much less of a problem. If hotspot finds
>> the path primitive, boxed, unboxed, then it removes those two surplus
>> operations. So boxing by itself would not be a problem. But is you
>> access the volatile before unboxing, then this path no longer works and
>> hotspot won't remove the surplus boxing.
>
> Yes, I do realize this, and discussions with John Rose often led us to
> the conclusion that JRuby's next perf boost may come from eliminating
> those volatile checks. But it is very hard to eliminate them when we
> can't safely "push" updates out to call sites (see my earlier email on
> "active" call site invalidation). Perhaps I will take another look and
> see if I can find a way.

wellI think I found a way for Groovy that eliminates all volatile
usages, but that depends on how multi threading is done in Groovy/Java
and on the java memory model.

>> Artificial stack-trace
>> maintenance is something we don't do, so there is no problem for us.
>
> Yes, thankfully we have managed to get stack maintenance pretty fast,
> fast enough that we are as fast or faster than groovy on most
> microbenchmarks. But we are hoping to allow users to turn it off very
> soon, which can double or triple the performance of method
> invocations. A simple fib(30) goes from 0.17s to 0.9s inthat mode.
> Hopefully it will be stable soon...and maybe even default.

may I ask what you even need it for?

>> why is it megamorphic for you guys?
>
> It is megamorphic for everyone. Each and times and their Ilk are all
> ping-ponging through a method that receives a closure. Since there are
> many callers of these methods, the closure passed varies widely and
> the eventual invocation of that closure is polymorphic. On Hotspot, at
> least, this prevents the closure from being unlined all the way back
> to the caller of 'each' and so you don't get the full set of
> optimizations.

You mean for example the "each" method I guess, not the method calling
each. Well you could imagine that the method calling each first inlines
the each method and then the closure. Hotspot would make many lifes more
easy if that would be done.

[...]


> Have you done much exploration into how well Groovy code is inlining
> in the JVM? My explorations with JRuby have shown me that with certain
> JRuby flags turned on, I can get everything to inline, even across
> dynamic calls, even into user-created code. My next goal is to
> eliminate anything that interferes with inlined optimizations. Fun
> times ahead reading assembly output :)

your hotspot does inline across volatiles?

Robert Fischer

unread,
Nov 2, 2009, 5:28:52 PM11/2/09
to jvm-la...@googlegroups.com
If I'm understanding, Jochen is working with a definition of "assume" that's different from everyone
else's. I'd use the term "guess" for where he uses "assume" — that is, I can guess it is an int and
then default to other behavior if it turns out otherwise.

~~ Robert Fischer, Smokejumper IT Consulting.
Enfranchised Mind Blog http://EnfranchisedMind.com/blog

Grails Expert Retainer Services
http://smokejumperit.com/grails-retainer/

Jochen Theodorou

unread,
Nov 2, 2009, 6:24:28 PM11/2/09
to jvm-la...@googlegroups.com
Robert Fischer schrieb:

> If I'm understanding, Jochen is working with a definition of "assume" that's different from everyone
> else's. I'd use the term "guess" for where he uses "assume" — that is, I can guess it is an int and
> then default to other behavior if it turns out otherwise.

well, I am no native speaker so be my guest and replace assume with guess ;)

bye Jochen

hlovatt

unread,
Nov 3, 2009, 7:18:30 AM11/3/09
to JVM Languages
If you treat the problem as a multiple-dispatch problem, can't you?

1. Issue all code without using the type information, as though
everything was Object in Java lingo. Doesn't stop the compiler from
issuing type errors, but not worth optimizing compiler output.

2. Use whatever method resolution algorithm you need to find the
desired method and put the found method in a method handle.

3. If a new method is loaded, invalidate all the handles. The new
methods are loaded by the class loader and therefore this is a thread-
safe operation and you don't need any additional locking.

4. Surround the call to the method with a try catch block that looks
for a class cast exception; if the exception occurs, then under a lock
invalidate the method handles and retry. To reduce code bloat, isolate
pure code and group this into one try block. EG if p1 and p2 are pure
functions then:

try { p1( ... ); p2( ... ); }
catch (ClassCatchException notUsed) { lock(); invalidateMethodHandles
(); p1( ... ); p2( ... ); release(); }

(Better still, the sequences p1( ... ); p2( ... ); could be replaced
by a function call.)

I am writing this post as a question because it is what I am thinking
about doing, so I am interested if anyone shoots the idea down.

-- Howard.

On Nov 1, 3:04 pm, Rémi Forax <fo...@univ-mlv.fr> wrote:
> It can interest some of you,
> I've written a blog entry on a newly language which allow gradual typing:http://weblogs.java.net/blog/forax/archive/2009/11/01/tales-four-fibo...
>
> Rémi

Jochen Theodorou

unread,
Nov 3, 2009, 8:41:49 AM11/3/09
to jvm-la...@googlegroups.com
hlovatt schrieb:

> If you treat the problem as a multiple-dispatch problem, can't you?
>
> 1. Issue all code without using the type information, as though
> everything was Object in Java lingo. Doesn't stop the compiler from
> issuing type errors, but not worth optimizing compiler output.

that depends on your language. In Groovy we have cases were you would
need static type information

> 2. Use whatever method resolution algorithm you need to find the
> desired method and put the found method in a method handle.
>
> 3. If a new method is loaded, invalidate all the handles. The new
> methods are loaded by the class loader and therefore this is a thread-
> safe operation and you don't need any additional locking.

this sounds like the wrong order... method lookup in the sense of
invokedynamic+bootstrap is done if the callsite has not yet a method or
the method was invalidated. Usually you want to do just a simple check
if that method is still valid. If you now do a complete method
selection, then this is certainly no simple check and will most probably
cost you big times, unless your method selection process is really
really fast.

> 4. Surround the call to the method with a try catch block that looks
> for a class cast exception; if the exception occurs, then under a lock
> invalidate the method handles and retry. To reduce code bloat, isolate
> pure code and group this into one try block. EG if p1 and p2 are pure
> functions then:
>
> try { p1( ... ); p2( ... ); }
> catch (ClassCatchException notUsed) { lock(); invalidateMethodHandles
> (); p1( ... ); p2( ... ); release(); }
>
> (Better still, the sequences p1( ... ); p2( ... ); could be replaced
> by a function call.)

This assumes p1 and p2 cannot normally cause such an exception. If that
assumption is right for your language I don't know. In case of Groovy it
certainly is not.

bye blackdrag

John Rose

unread,
Nov 22, 2009, 4:19:03 PM11/22/09
to jvm-la...@googlegroups.com
On Nov 2, 2009, at 11:14 AM, Charles Oliver Nutter wrote:

> we'll have to hope that JVMs can catch up with
> closure-based languages and inline closure calls across intermediate
> megamorphic methods.

Yes, this needs work. The pattern is of an algorithm method (like ArrayList.contains) with a crucial megamorphic site that is usually monomorphic with respect to a caller of the algorithm.

(If the algorithm is a working with function pointers, we can call it a combinator also.)

There ought to be a way to recognize this pattern, customize the algorithm code, and propagate the needed type information from the caller (or its caller or ...) to the megamorphic site.

One way to handle this would be to have a mark on such methods @InlineWithProfile directing the JVM to split the inlined method *and* its profile, for each call site. This might be worth an experiment. But the other missing bit would be to have the JVM automatically mark such methods. I have some further ideas about this, and hope some day to post some sketches to the mlvm repo.

Another way would be what you may do with JRuby, taking control of bytecode generation and forcing the inlining of the algorithms.

- Joh
Reply all
Reply to author
Forward
0 new messages