Google Groups Home
Help | Sign in
Lessons learned from JRuby compiler work
There are currently too many topics in this group that display first. To make this topic appear first, remove this option from another topic.
There was an error processing your request. Please try again.
flag
  Messages 1 - 25 of 54 - Collapse all   Newer >
The group you are posting to is a Usenet group. Messages posted to this group will make your email address visible to anyone on the Internet.
Your reply message has not been sent.
Your post was successful
Charles Oliver Nutter  
View profile
(1 user)  More options Oct 1 2007, 5:52 pm
From: Charles Oliver Nutter <charles.nut...@sun.com>
Date: Mon, 01 Oct 2007 16:52:43 -0500
Local: Mon, Oct 1 2007 5:52 pm
Subject: Lessons learned from JRuby compiler work
Ok...I'm firing a few shots across the bow of the Java machine here, and
I'd love to hear all input. But I wanted to do a retrospective of a
number of things I've learned during the implementation of the JRuby
compiler.

#1. We need lightweight method objects

In JRuby, where methods can be defined and discarded on a whim, we have
to be able to generate lightweight objects on demand that can be garbage
collected as easily as any other object. In our case, this means that
the JRuby JIT must generate single-method classes at runtime in their
own GCable classloaders. So if we JIT a thousand methods, that's a
thousand tiny classes in a thousand (not-so-tiny) classloaders. It works
great in practice, except for two details:

- classloaders are not cheap to spin up (I'd even say "expensive")
- classloaders take up too much memory

The situation is aggravated by our desire to do more and more code
generation:

- generating direct-invoking stubs to bind methods into named slots on
the MOP
- generating call adapters that make use of inline caching information
to optimize multiple fast paths
- generating switch-based "fast dispatchers" on demand that can omit
dynamic method lookup entirely
- generating type and arity-specific call interfaces throughout the call
chain to allow more specificity in dispatch

The more we choose to generate, the more the expensive cost of a
classloader-per-generated-class becomes painful. What we really must
have sooner rather than later is a way to define lightweight, GCable
autonomous methods that we can loosely bind together and not have to
bend over backwards to manage and dereference. We need GCable classes,
and better yet we need super-lightweight classes to represent autonomous
methods. NEED.

#2. We need VM implementers to better-publicize what works and what
doesn't when writing bare-metal JVM bytecode, as far as appropriate
design patterns, performance considerations, and maybe most important of
all, what NOT to do.

For JRuby, we've managed to gather bits and piece of information that
have largely seemed to help. I've blogged about them, we've discussed
them on these lists, and in general it seems the information is getting
out. But there needs to be a place we can point to official information
about how and why HotSpot and other JVMs optimize code, since as
language implementers we have a much larger interest in crafting our
bytecode, method lookup, and call chain in more optimal ways.

The JRuby story may be illustrative here:

JRuby has traditionally been an interpreted-only implementation.
Interpreters work well and are easy to implement, but there's a key
characteristic that makes them difficult to optimize on the JVM: by
definition, an interpreter loop is extremely polymorphic at the call
sites (I've heard the term "megamorphic" being batted around lately).
Because all calls to other methods must pass through one or two spots in
the interpreter, it's much more difficult for the JVM to optimize the
call path.

To remedy this in the compiler, we've reduced the distance between calls
to only a few methods. A call site first calls into a generic piece of
code that all call sites use. That call site, then immediately calls
back into a piece of generated code. So the distance between unique code
segments is only a couple calls, and HotSpot is able to recognize that.
We hope to improve the situation by generating even the call site
adapters, allowing the call path to contain almost entirely monomorphic
calls (if we can get around the problems in #1 above).

However learning that this is a good way to do things has only come from
bits and pieces of discussions. There needs to be a document somewhere.

#3. Bytecode generation is a lot easier than I expected, even with
verification errors and the like. However, there should be a bytecode
generation library as part of Java proper, along the lines of ASM.

I don't like that we have to ship ASM. This seems like something that
should just "be there" already. CLR has a library for generating IL as
part of the core library, and so should the JVM.

I'd also like to see some attention paid to the usability of raw
bytecode generation libraries. For example, the following code:

mv.visitVarInsn(ALOAD, 0);
mv.visitFieldInsn(GETFIELD, mnamePath, "$scriptObject",
                   cg.ci(Object.class));
mv.visitTypeInsn(CHECKCAST, cg.p(type));
mv.visitVarInsn(ALOAD, 1);
mv.visitVarInsn(ALOAD, 2);
mv.visitVarInsn(ALOAD, 3);
mv.visitMethodInsn(INVOKEVIRTUAL, typePath, method, cg.sig(
        RubyKernel.IRUBY_OBJECT, cg.params(ThreadContext.class,
        RubyKernel.IRUBY_OBJECT, IRubyObject[].class)));
mv.visitInsn(ARETURN);

The "cg" object basically takes a type and returns various structures
for ASM; I know there's a Type class that does something similar already
in ASM. However, by using a wrapper, the above code becomes the following:

mv.aload(0);
mv.getfield(mnamePath, "$scriptObject", cg.ci(Object.class));
mv.checkcast(cg.p(type));
mv.aload(1);
mv.aload(2);
mv.aload(3);
mv.invokevirtual(typepath, method, ...)
mv.areturn();

And yes, there's a few utility classes in ASM that provide a similar
interface, but unfortunately they also start to introduce more Javaisms
and hide more of the actual generation process.

By wrapping the above interface in Ruby, we can even end up with
something akin to an assembly source file (and I plan to create such a
Ruby library based on ASM very soon):

aload 0
getfield SomeClass, "$scriptObject", SomeOtherType
checkcast SomeCastType
aload 1
aload 2
aload 3
invokevirtual AnotherType, "method", [Param1, Param2] => ReturnType
areturn

And yes, I know there are JVM bytecode assemblers out there that could
do this as well, but they're not as imperative/programmatic as an
internal DSL could be.

I guess the bottom line here is that I'd love to see JVM bytecode
officially exposed as a language in its own right, with a standard
format, a standard built-in library for generating/assembling it, and
all the tools folks doing native processor assembly have come to expect.

....

I'll cut it off here and make another list as thoughts form, but I'd
love to hear discussions about all these points.

- Charlie


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
John Cowan  
View profile
 More options Oct 1 2007, 6:37 pm
From: "John Cowan" <johnwco...@gmail.com>
Date: Mon, 1 Oct 2007 18:37:43 -0400
Local: Mon, Oct 1 2007 6:37 pm
Subject: Re: Lessons learned from JRuby compiler work
On 10/1/07, Charles Oliver Nutter <charles.nut...@sun.com> wrote:

> In JRuby, where methods can be defined and discarded on a whim, we have
> to be able to generate lightweight objects on demand that can be garbage
> collected as easily as any other object. In our case, this means that
> the JRuby JIT must generate single-method classes at runtime in their
> own GCable classloaders.

It took me quite a while to figure out what "G Cable" meant.

> an interpreter loop is extremely polymorphic at the call
> sites (I've heard the term "megamorphic" being batted around lately).

I think the Self people invented that term, but it's obviously wrong: a
monomorphic call has one form, a polymorphic call has many forms,
and a megamorphic call has a big form?  I think not.

I think "polymorphic" should be used as the opposite of "monomorphic"
in general.   If there are only a few possible types, then "oligomorphic"
is the obvious word.  That leaves us needing a word for "many types".
A guy I knew who was a recycled teacher of classics wrote to me thus:

# One prefix that suggests itself is 'perisso-' from Greek 'perissos' =
# beyond the regular number, prodigious, excessive.  It is found in the
# English words 'perissology' (verbiage, pleonasm) and 'perissosyllabic'
# (having an additional syllable).
#
# On the downside, the Greek word was used as a technical arithmetic term to
# mean "odd" (as opposed to "even") and occurs with this meaning in the
# English adjective 'perissodactyl' (having an odd number of toes).  To
# someone who knew this term, 'perissomorphic' might, I guess, suggest
# 'having an odd number of methods'.
#
# I rather like 'perissomorphic'; but if the possible ambiguity in meaning is
# not welcome, why not resurrect, so to speak, the ancient prefix 'pletho-'
# (<-- ple:thos [neut] = multitude, large number), found in the ancient Greek
# compounds 'ple:thokhoros' = a gathering of a large number of dancers  :)
#
# The word 'plethora' is not uncommon and makes an obvious connexion with the
# prefix.  So if you don't like 'perissomorphic', maybe 'plethomorphic' would
# do?

I favor "plethomorphic".  So on this principle we have monomorphic,
oligomorphic, and plethomorphic call sites.

> To remedy this in the compiler, we've reduced the distance between calls
> to only a few methods. A call site first calls into a generic piece of
> code that all call sites use. That call site, then immediately calls
> back into a piece of generated code.

This is very like indirect threaded code in Forth.

> I guess the bottom line here is that I'd love to see JVM bytecode
> officially exposed as a language in its own right, with a standard
> format, a standard built-in library for generating/assembling it, and
> all the tools folks doing native processor assembly have come to expect.

+1

--
GMail doesn't have rotating .sigs, but you can see mine at
http://www.ccil.org/~cowan/signatures


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Daniel Green  
View profile
 More options Oct 1 2007, 7:00 pm
From: "Daniel Green" <october...@gmail.com>
Date: Mon, 1 Oct 2007 19:00:12 -0400
Local: Mon, Oct 1 2007 7:00 pm
Subject: Re: Lessons learned from JRuby compiler work
I love this group, it gives the JVM language community a forum to come
together, share ideas, and solve problems that face all of us. Now
that the JVM has been freed, isn't it only logical that the next step
is to contribute code? We have some heavy hitters on this list and
some genuine genius, we could get a lot accomplished. Why not start an
initiative that focuses on solving the sort of problems Charlie
presented? We would all benefit from something like that.

On 10/1/07, John Cowan <johnwco...@gmail.com> wrote:


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Randall R Schulz  
View profile
 More options Oct 1 2007, 7:45 pm
From: Randall R Schulz <rsch...@sonic.net>
Date: Mon, 1 Oct 2007 16:45:02 -0700
Local: Mon, Oct 1 2007 7:45 pm
Subject: Re: Lessons learned from JRuby compiler work
On Monday 01 October 2007 14:52, Charles Oliver Nutter wrote:

> Ok...I'm firing a few shots across the bow of the Java machine here,
> and I'd love to hear all input. But I wanted to do a retrospective of
> a number of things I've learned during the implementation of the
> JRuby compiler.

In all likelihood you're already aware of this, but Martin Odersky, the
chief principal of the Scala language development effort has some
distinct ideas about how to improve the JVM for the next generation of
programming languages.

If you have not done so, you should compare notes with him!

> ...

Randall Schulz

    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Chanwit Kaewkasi  
View profile
 More options Oct 1 2007, 8:08 pm
From: "Chanwit Kaewkasi" <chan...@gmail.com>
Date: Tue, 2 Oct 2007 07:08:58 +0700
Local: Mon, Oct 1 2007 8:08 pm
Subject: Re: Lessons learned from JRuby compiler work
Hi,

> own GCable classloaders. So if we JIT a thousand methods, that's a
> thousand tiny classes in a thousand (not-so-tiny) classloaders. It works
> great in practice, except for two details:

> - classloaders are not cheap to spin up (I'd even say "expensive")
> - classloaders take up too much memory

Don't know if this idea would work:

Let's say I'd like to have N generated classes per a class loader
(rather than 1 by 1).
I've got  CL1 = [C1, C2, C3, C4, C5, C6].
After that I need to keep 2 classes C5, C6.
I redefine C5, C6 again in a new class loader CL2, then I later do GC CL1.

Do you think this can work in practice ?

Cheers,

Chanwit

--
Chanwit Kaewkasi
PhD Student,
Centre for Novel Computing
School of Computer Science
The University of Manchester
Oxford Road
Manchester
M13 9PL, UK


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Charles Oliver Nutter  
View profile
 More options Oct 1 2007, 10:10 pm
From: Charles Oliver Nutter <charles.nut...@sun.com>
Date: Mon, 01 Oct 2007 21:10:44 -0500
Local: Mon, Oct 1 2007 10:10 pm
Subject: Re: Lessons learned from JRuby compiler work

Daniel Green wrote:
> I love this group, it gives the JVM language community a forum to come
> together, share ideas, and solve problems that face all of us. Now
> that the JVM has been freed, isn't it only logical that the next step
> is to contribute code? We have some heavy hitters on this list and
> some genuine genius, we could get a lot accomplished. Why not start an
> initiative that focuses on solving the sort of problems Charlie
> presented? We would all benefit from something like that.

What an excellent idea! :)

http://code.google.com/p/jvm-language-runtime/

I'm hoping to have more time to start pulling out bits of JRuby into
this common location, and encouraging others to do the same. There's
already a bit there from Jython, a package cache for things like runtime
import xxx.*

- Charlie


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Charles Oliver Nutter  
View profile
 More options Oct 1 2007, 10:12 pm
From: Charles Oliver Nutter <charles.nut...@sun.com>
Date: Mon, 01 Oct 2007 21:12:44 -0500
Local: Mon, Oct 1 2007 10:12 pm
Subject: Re: Lessons learned from JRuby compiler work

I think it could certainly work, but it could be quite a bit of churn,
not to mention having to either regenerate C5 and C6 or hold onto the
raw bytecode that represents them. It's a bit of a hack...but I think it
could reduce the number of classloaders necessary.

I guess the other tricky bit is knowing which classes aren't being
referenced anymore. I shouldn't have to try to track that myself.

- Charlie


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
John Cowan  
View profile
 More options Oct 1 2007, 11:38 pm
From: "John Cowan" <johnwco...@gmail.com>
Date: Mon, 1 Oct 2007 23:38:55 -0400
Local: Mon, Oct 1 2007 11:38 pm
Subject: Re: Lessons learned from JRuby compiler work
On 10/1/07, Charles Oliver Nutter <charles.nut...@sun.com> wrote:

> I guess the other tricky bit is knowing which classes aren't being
> referenced anymore. I shouldn't have to try to track that myself.

Finalizers would help, except they help only once.

--
GMail doesn't have rotating .sigs, but you can see mine at
http://www.ccil.org/~cowan/signatures


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Patrick Wright  
View profile
 More options Oct 2 2007, 1:55 am
From: "Patrick Wright" <pdoubl...@gmail.com>
Date: Tue, 2 Oct 2007 07:55:45 +0200
Local: Tues, Oct 2 2007 1:55 am
Subject: Re: Lessons learned from JRuby compiler work
Just a few random comments:

1) Bytecode generation: since there are already a few libraries that
target this area (ASM, BCEL, Jasmin) it seems ripe for
standardization. The trick would be getting enough time from the
contributors to run an expert group. But sounds like a great idea.

2) Lightweight method instances: are you proposing a Hotspot
optimization or a change to the JVM spec? I assume the former, since
I'm not sure what that would look like in the spec, e.g. what exactly
you would mandate.

3) It seems one generally tricky issue is how to talk about
optimization in this context without making assumptions about which
JVM you're running on, which I thought was something we were aiming
for, e.g. it seems a little iffy to depend on optimizations which are
only present in Hotspot or IBMs JVM or some other. How will your code
run in another type of JIT, like CACAO? How do we reason about
optimizations across JVMs? Understanding, of course, that you
(Charles) work for Sun.

4) Re: bytecode generation, Jasmin looks cool, http://jasmin.sourceforge.net/
"Jasmin is an assembler for the Java Virtual Machine. It takes ASCII
descriptions of Java classes, written in a simple assembler-like
syntax using the Java Virtual Machine instruction set. It converts
them into binary Java class files, suitable for loading by a Java
runtime system."

Regards
Patrick


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Per Bothner  
View profile
(1 user)  More options Oct 2 2007, 2:23 am
From: Per Bothner <p...@bothner.com>
Date: Mon, 01 Oct 2007 23:23:40 -0700
Local: Tues, Oct 2 2007 2:23 am
Subject: Re: Lessons learned from JRuby compiler work

Patrick Wright wrote:
> 1) Bytecode generation: since there are already a few libraries that
> target this area (ASM, BCEL, Jasmin) it seems ripe for
> standardization. The trick would be getting enough time from the
> contributors to run an expert group. But sounds like a great idea.

Don't forget gnu.bytecode, which I wrote for Kawa (starting in 1996):
http://www.gnu.org/software/kawa/api/gnu/bytecode/package-summary.html
It uses arrays extensively, which makes it very efficient,
It also has many useful features for compiler writers.

Jasmin is (primarily?) an assembler, which I don't think you want to
use.

BCEL seems really inefficient for code generation.  It creates an Object
for each instruction, which is pretty heavy-weight.  (OTOH this can be
useful for analysis and peephole optimization.)

ASM from I've seen of it seems quite elegant - I rather like its
use of the visitor pattern.  However, I haven't used it myself.

I'd suggest starting with ASM, but maybe adding some of the
features and conveniences of gnu.bytecode and other toolkits.
--
        --Per Bothner
p...@bothner.com   http://per.bothner.com/


    Reply to author    Forward  
You must Sign in before you can post messages.
To post a message you must first join this group.
Please update your nickname on the subscription settings page before posting.
You do not have the permission required to post.
Patrick Wright  
View profile
 More options Oct 2 2007, 2:34 am
From: "Patrick Wright" <pdoubl...@gmail.com>
Date: Tue, 2 Oct 2007 08:34:35 +0200
Local: Tues, Oct 2 2007 2:34 am
Subject: Re: Lessons learned from JRuby compiler work

> Don't forget gnu.bytecode, which I wrote for Kawa (starting in 1996):
> http://www.gnu.org/software/kawa/api/gnu/bytecode/package-summary.html
> It uses arrays extensively, which makes it very efficient,
> It also has many useful features for compiler writers.

Thanks! I didn't realize that was available.

I added a page for this Google group listing the projects we've
mentioned so far.