Lessons learned from JRuby compiler work


Charles Oliver Nutter

Oct 1, 2007, 5:52:43 PM
to jvm-la...@googlegroups.com
Ok...I'm firing a few shots across the bow of the Java machine here, and
I'd love to hear all input. But I wanted to do a retrospective of a
number of things I've learned during the implementation of the JRuby
compiler.

#1. We need lightweight method objects

In JRuby, where methods can be defined and discarded on a whim, we have
to be able to generate lightweight objects on demand that can be garbage
collected as easily as any other object. In our case, this means that
the JRuby JIT must generate single-method classes at runtime in their
own GCable classloaders. So if we JIT a thousand methods, that's a
thousand tiny classes in a thousand (not-so-tiny) classloaders. It works
great in practice, except for two details:

- classloaders are not cheap to spin up (I'd even say "expensive")
- classloaders take up too much memory

The situation is aggravated by our desire to do more and more code
generation:

- generating direct-invoking stubs to bind methods into named slots on
the MOP
- generating call adapters that make use of inline caching information
to optimize multiple fast paths
- generating switch-based "fast dispatchers" on demand that can omit
dynamic method lookup entirely
- generating type and arity-specific call interfaces throughout the call
chain to allow more specificity in dispatch


The more we choose to generate, the more painful the cost of a
classloader-per-generated-class becomes. What we really must
have sooner rather than later is a way to define lightweight, GCable
autonomous methods that we can loosely bind together and not have to
bend over backwards to manage and dereference. We need GCable classes,
and better yet we need super-lightweight classes to represent autonomous
methods. NEED.
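
For reference, here's a bare-bones sketch of the pattern we use today
(with invented names, not the actual JRuby classes): one throwaway loader
per generated class, so the class can be collected as soon as the method
object that holds it is.

// One throwaway loader per generated class; when nothing references the
// loaded Class (or an instance of it), the class and its loader can both
// be garbage collected.
class OneShotClassLoader extends ClassLoader {
    OneShotClassLoader(ClassLoader parent) {
        super(parent);
    }

    Class<?> define(String name, byte[] bytecode) {
        return defineClass(name, bytecode, 0, bytecode.length);
    }
}

class MethodFactory {
    // 'bytecode' would come out of ASM or a similar generator
    static Object loadGeneratedMethod(String className, byte[] bytecode)
            throws Exception {
        OneShotClassLoader loader =
            new OneShotClassLoader(MethodFactory.class.getClassLoader());
        Class<?> cls = loader.define(className, bytecode);
        // the returned instance is the only thing keeping class + loader alive
        return cls.newInstance();
    }
}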

#2. We need VM implementers to better publicize what works and what
doesn't when writing bare-metal JVM bytecode: appropriate
design patterns, performance considerations, and maybe most important of
all, what NOT to do.

For JRuby, we've managed to gather bits and pieces of information that
have largely seemed to help. I've blogged about them, we've discussed
them on these lists, and in general it seems the information is getting
out. But there needs to be a place we can point to for official information
about how and why HotSpot and other JVMs optimize code, since as
language implementers we have a much larger interest in crafting our
bytecode, method lookup, and call chain in more optimal ways.

The JRuby story may be illustrative here:

JRuby has traditionally been an interpreted-only implementation.
Interpreters work well and are easy to implement, but there's a key
characteristic that makes them difficult to optimize on the JVM: by
definition, an interpreter loop is extremely polymorphic at the call
sites (I've heard the term "megamorphic" being batted around lately).
Because all calls to other methods must pass through one or two spots in
the interpreter, it's much more difficult for the JVM to optimize the
call path.

To remedy this in the compiler, we've reduced the distance between calls
to only a few methods. A call site first calls into a generic piece of
code that all call sites use. That call site then immediately calls
back into a piece of generated code. So the distance between unique code
segments is only a couple calls, and HotSpot is able to recognize that.
We hope to improve the situation by generating even the call site
adapters, allowing the call path to contain almost entirely monomorphic
calls (if we can get around the problems in #1 above).
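
To make that concrete, here's a rough sketch of the shape of such a call
site adapter with a simple monomorphic inline cache (the names are
illustrative, not our actual classes):

// Each call site caches the last method it resolved; as long as the
// receiver's type stays the same, every call goes straight from this
// small class into the generated method body.
interface CompiledMethod {
    Object call(Object receiver, Object[] args);
}

interface MetaClass {
    CompiledMethod lookup(String name);
}

interface RObject {
    MetaClass getMetaClass();
}

class CallSite {
    private final String name;
    private MetaClass cachedType;        // type seen on the last call
    private CompiledMethod cachedMethod; // method resolved for that type

    CallSite(String name) { this.name = name; }

    Object call(RObject receiver, Object[] args) {
        MetaClass type = receiver.getMetaClass();
        if (type != cachedType) {          // cache miss: do the dynamic lookup
            cachedMethod = type.lookup(name);
            cachedType = type;
        }
        return cachedMethod.call(receiver, args); // cache hit: direct dispatch
    }
}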

However, learning that this is a good way to do things has only come from
bits and pieces of discussions. There needs to be a document somewhere.

#3. Bytecode generation is a lot easier than I expected, even with
verification errors and the like. However, there should be a bytecode
generation library as part of Java proper, along the lines of ASM.

I don't like that we have to ship ASM. This seems like something that
should just "be there" already. CLR has a library for generating IL as
part of the core library, and so should the JVM.

I'd also like to see some attention paid to the usability of raw
bytecode generation libraries. For example, the following code:

mv.visitVarInsn(ALOAD, 0);
mv.visitFieldInsn(GETFIELD, mnamePath, "$scriptObject",
cg.ci(Object.class));
mv.visitTypeInsn(CHECKCAST, cg.p(type));
mv.visitVarInsn(ALOAD, 1);
mv.visitVarInsn(ALOAD, 2);
mv.visitVarInsn(ALOAD, 3);
mv.visitMethodInsn(INVOKEVIRTUAL, typePath, method, cg.sig(
RubyKernel.IRUBY_OBJECT, cg.params(ThreadContext.class,
RubyKernel.IRUBY_OBJECT, IRubyObject[].class)));
mv.visitInsn(ARETURN);

The "cg" object basically takes a type and returns various structures
for ASM; I know there's a Type class that does something similar already
in ASM. However, by using a wrapper, the above code becomes the following:

mv.aload(0);
mv.getfield(mnamePath, "$scriptObject", cg.ci(Object.class));
mv.checkcast(cg.p(type));
mv.aload(1);
mv.aload(2);
mv.aload(3);
mv.invokevirtual(typePath, method, ...)
mv.areturn();

And yes, there's a few utility classes in ASM that provide a similar
interface, but unfortunately they also start to introduce more Javaisms
and hide more of the actual generation process.
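
Such a wrapper is trivial to build on top of ASM; here's a rough sketch of
what I mean (illustrative only, not our actual helper class):

import org.objectweb.asm.MethodVisitor;
import static org.objectweb.asm.Opcodes.*;

// Forwards to ASM's MethodVisitor but names the methods after the opcodes,
// so generator code reads like the bytecode it emits.
class SkinnyMethodVisitor {
    private final MethodVisitor mv;

    SkinnyMethodVisitor(MethodVisitor mv) { this.mv = mv; }

    void aload(int index)               { mv.visitVarInsn(ALOAD, index); }
    void areturn()                      { mv.visitInsn(ARETURN); }
    void checkcast(String internalName) { mv.visitTypeInsn(CHECKCAST, internalName); }

    void getfield(String owner, String name, String desc) {
        mv.visitFieldInsn(GETFIELD, owner, name, desc);
    }

    void invokevirtual(String owner, String name, String desc) {
        mv.visitMethodInsn(INVOKEVIRTUAL, owner, name, desc);
    }
}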

By wrapping the above interface in Ruby, we can even end up with
something akin to an assembly source file (and I plan to create such a
Ruby library based on ASM very soon):

aload 0
getfield SomeClass, "$scriptObject", SomeOtherType
checkcast SomeCastType
aload 1
aload 2
aload 3
invokevirtual AnotherType, "method", [Param1, Param2] => ReturnType
areturn

And yes, I know there are JVM bytecode assemblers out there that could
do this as well, but they're not as imperative/programmatic as an
internal DSL could be.

I guess the bottom line here is that I'd love to see JVM bytecode
officially exposed as a language in its own right, with a standard
format, a standard built-in library for generating/assembling it, and
all the tools folks doing native processor assembly have come to expect.

....

I'll cut it off here and make another list as thoughts form, but I'd
love to hear discussions about all these points.

- Charlie

John Cowan

Oct 1, 2007, 6:37:43 PM
to jvm-la...@googlegroups.com
On 10/1/07, Charles Oliver Nutter <charles...@sun.com> wrote:

> In JRuby, where methods can be defined and discarded on a whim, we have
> to be able to generate lightweight objects on demand that can be garbage
> collected as easily as any other object. In our case, this means that
> the JRuby JIT must generate single-method classes at runtime in their
> own GCable classloaders.

It took me quite a while to figure out what "G Cable" meant.

> an interpreter loop is extremely polymorphic at the call
> sites (I've heard the term "megamorphic" being batted around lately).

I think the Self people invented that term, but it's obviously wrong: a
monomorphic call has one form, a polymorphic call has many forms,
and a megamorphic call has a big form? I think not.

I think "polymorphic" should be used as the opposite of "monomorphic"
in general. If there are only a few possible types, then "oligomorphic"
is the obvious word. That leaves us needing a word for "many types".
A guy I knew who was a recycled teacher of classics wrote to me thus:

# One prefix that suggests itself is 'perisso-' from Greek 'perissos' =
# beyond the regular number, prodigious, excessive. It is found in the
# English words 'perissology' (verbiage, pleonasm) and 'perissosyllabic'
# (having an additional syllable).
#
# On the downside, the Greek word was used as a technical arithmetic term to
# mean "odd" (as opposed to "even") and occurs with this meaning in the
# English adjective 'perissodactyl' (having an odd number of toes). To
# someone who knew this term, 'perissomorphic' might, I guess, suggest
# 'having an odd number of methods'.
#
# I rather like 'perissomorphic'; but if the possible ambiguity in meaning is
# not welcome, why not resurrect, so to speak, the ancient prefix 'pletho-'
# (<-- ple:thos [neut] = multitude, large number), found in the ancient Greek
# compounds 'ple:thokhoros' = a gathering of a large number of dancers :)
#
# The word 'plethora' is not uncommon and makes an obvious connexion with the
# prefix. So if you don't like 'perissomorphic', maybe 'plethomorphic' would
# do?

I favor "plethomorphic". So on this principle we have monomorphic,
oligomorphic, and plethomorphic call sites.

> To remedy this in the compiler, we've reduced the distance between calls
> to only a few methods. A call site first calls into a generic piece of
> code that all call sites use. That call site then immediately calls
> back into a piece of generated code.

This is very like indirect threaded code in Forth.

> I guess the bottom line here is that I'd love to see JVM bytecode
> officially exposed as a language in its own right, with a standard
> format, a standard built-in library for generating/assembling it, and
> all the tools folks doing native processor assembly have come to expect.

+1

--
GMail doesn't have rotating .sigs, but you can see mine at
http://www.ccil.org/~cowan/signatures

Daniel Green

Oct 1, 2007, 7:00:12 PM
to jvm-la...@googlegroups.com
I love this group, it gives the JVM language community a forum to come
together, share ideas, and solve problems that face all of us. Now
that the JVM has been freed, isn't it only logical that the next step
is to contribute code? We have some heavy hitters on this list and
some genuine genius, we could get a lot accomplished. Why not start an
initiative that focuses on solving the sort of problems Charlie
presented? We would all benefit from something like that.

Randall R Schulz

Oct 1, 2007, 7:45:02 PM
to jvm-la...@googlegroups.com
On Monday 01 October 2007 14:52, Charles Oliver Nutter wrote:
> Ok...I'm firing a few shots across the bow of the Java machine here,
> and I'd love to hear all input. But I wanted to do a retrospective of
> a number of things I've learned during the implementation of the
> JRuby compiler.

In all likelihood you're already aware of this, but Martin Odersky, the
principal of the Scala language development effort, has some
distinct ideas about how to improve the JVM for the next generation of
programming languages.

If you have not done so, you should compare notes with him!


> ...


Randall Schulz

Chanwit Kaewkasi

Oct 1, 2007, 8:08:58 PM
to jvm-la...@googlegroups.com
Hi,

> own GCable classloaders. So if we JIT a thousand methods, that's a
> thousand tiny classes in a thousand (not-so-tiny) classloaders. It works
> great in practice, except for two details:
>
> - classloaders are not cheap to spin up (I'd even say "expensive")
> - classloaders take up too much memory

Don't know if this idea would work:

Let's say I'd like to have N generated classes per class loader
(rather than one per loader).
I've got CL1 = [C1, C2, C3, C4, C5, C6].
After that I only need to keep 2 classes, C5 and C6.
I redefine C5 and C6 again in a new class loader CL2, and then let CL1 be GC'd.

Do you think this can work in practice?
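
Roughly, in code, the idea would be something like this (the names are
made up, just to illustrate):

import java.util.HashMap;
import java.util.Map;

// Keep the raw bytecode around; when only a few classes in the current
// loader are still needed, redefine those in a fresh loader and drop the
// old loader so it can be collected.
class BatchClassLoader extends ClassLoader {
    BatchClassLoader(ClassLoader parent) { super(parent); }

    Class<?> define(String name, byte[] bytecode) {
        return defineClass(name, bytecode, 0, bytecode.length);
    }
}

class GeneratedClassPool {
    private BatchClassLoader current =
        new BatchClassLoader(getClass().getClassLoader());
    private final Map<String, byte[]> bytecodeByName = new HashMap<String, byte[]>();

    Class<?> define(String name, byte[] bytecode) {
        bytecodeByName.put(name, bytecode);
        return current.define(name, bytecode);
    }

    // Rebuild the survivors in a new loader; once callers re-resolve their
    // references, nothing points at the old loader and it becomes GC-able.
    Map<String, Class<?>> compact(Iterable<String> survivors) {
        BatchClassLoader fresh = new BatchClassLoader(getClass().getClassLoader());
        Map<String, Class<?>> redefined = new HashMap<String, Class<?>>();
        for (String name : survivors) {
            redefined.put(name, fresh.define(name, bytecodeByName.get(name)));
        }
        current = fresh;
        return redefined;
    }
}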

Cheers,

Chanwit

--
Chanwit Kaewkasi
PhD Student,
Centre for Novel Computing
School of Computer Science
The University of Manchester
Oxford Road
Manchester
M13 9PL, UK

Charles Oliver Nutter

Oct 1, 2007, 10:10:44 PM
to jvm-la...@googlegroups.com
Daniel Green wrote:
> I love this group, it gives the JVM language community a forum to come
> together, share ideas, and solve problems that face all of us. Now
> that the JVM has been freed, isn't it only logical that the next step
> is to contribute code? We have some heavy hitters on this list and
> some genuine genius, we could get a lot accomplished. Why not start an
> initiative that focuses on solving the sort of problems Charlie
> presented? We would all benefit from something like that.

What an excellent idea! :)

http://code.google.com/p/jvm-language-runtime/

I'm hoping to have more time to start pulling out bits of JRuby into
this common location, and encouraging others to do the same. There's
already a bit there from Jython, a package cache for things like runtime
import xxx.*

- Charlie

Charles Oliver Nutter

Oct 1, 2007, 10:12:44 PM
to jvm-la...@googlegroups.com
Chanwit Kaewkasi wrote:
> Hi,
>
>> own GCable classloaders. So if we JIT a thousand methods, that's a
>> thousand tiny classes in a thousand (not-so-tiny) classloaders. It works
>> great in practice, except for two details:
>>
>> - classloaders are not cheap to spin up (I'd even say "expensive")
>> - classloaders take up too much memory
>
> Don't know if this idea would work:
>
> Let's say I'd like to have N generated classes per class loader
> (rather than one per loader).
> I've got CL1 = [C1, C2, C3, C4, C5, C6].
> After that I only need to keep 2 classes, C5 and C6.
> I redefine C5 and C6 again in a new class loader CL2, and then let CL1 be GC'd.
>
> Do you think this can work in practice?

I think it could certainly work, but it could be quite a bit of churn,
not to mention having to either regenerate C5 and C6 or hold onto the
raw bytecode that represents them. It's a bit of a hack...but I think it
could reduce the number of classloaders necessary.

I guess the other tricky bit is knowing which classes aren't being
referenced anymore. I shouldn't have to try to track that myself.

- Charlie

John Cowan

Oct 1, 2007, 11:38:55 PM
to jvm-la...@googlegroups.com
On 10/1/07, Charles Oliver Nutter <charles...@sun.com> wrote:

> I guess the other tricky bit is knowing which classes aren't being
> referenced anymore. I shouldn't have to try to track that myself.

Finalizers would help, except they help only once.

Patrick Wright

Oct 2, 2007, 1:55:45 AM
to jvm-la...@googlegroups.com
Just a few random comments:

1) Bytecode generation: since there are already a few libraries that
target this area (ASM, BCEL, Jasmin) it seems ripe for
standardization. The trick would be getting enough time from the
contributors to run an expert group. But sounds like a great idea.

2) Lightweight method instances: are you proposing a Hotspot
optimization or a change to the JVM spec? I assume the former, since
I'm not sure what that would look like in the spec, e.g. what exactly
you would mandate.

3) It seems one generally tricky issue is how to talk about
optimization in this context without making assumptions about which
JVM you're running on (which I thought was something we were aiming
for). E.g. it seems a little iffy to depend on optimizations which are
only present in Hotspot or IBM's JVM or some other. How will your code
run in another type of JIT, like CACAO? How do we reason about
optimizations across JVMs? Understanding, of course, that you
(Charles) work for Sun.

4) Re: bytecode generation, Jasmin looks cool, http://jasmin.sourceforge.net/
"Jasmin is an assembler for the Java Virtual Machine. It takes ASCII
descriptions of Java classes, written in a simple assembler-like
syntax using the Java Virtual Machine instruction set. It converts
them into binary Java class files, suitable for loading by a Java
runtime system."


Regards
Patrick

Per Bothner

Oct 2, 2007, 2:23:40 AM
to jvm-la...@googlegroups.com
Patrick Wright wrote:
> 1) Bytecode generation: since there are already a few libraries that
> target this area (ASM, BCEL, Jasmin) it seems ripe for
> standardization. The trick would be getting enough time from the
> contributors to run an expert group. But sounds like a great idea.

Don't forget gnu.bytecode, which I wrote for Kawa (starting in 1996):
http://www.gnu.org/software/kawa/api/gnu/bytecode/package-summary.html
It uses arrays extensively, which makes it very efficient.
It also has many useful features for compiler writers.

Jasmin is (primarily?) an assembler, which I don't think you want to
use.

BCEL seems really inefficient for code generation. It creates an Object
for each instruction, which is pretty heavy-weight. (OTOH this can be
useful for analysis and peephole optimization.)

ASM, from what I've seen of it, seems quite elegant - I rather like its
use of the visitor pattern. However, I haven't used it myself.

I'd suggest starting with ASM, but maybe adding some of the
features and conveniences of gnu.bytecode and other toolkits.
--
--Per Bothner
p...@bothner.com http://per.bothner.com/

Patrick Wright

Oct 2, 2007, 2:34:35 AM
to jvm-la...@googlegroups.com
> Don't forget gnu.bytecode, which I wrote for Kawa (starting in 1996):
> http://www.gnu.org/software/kawa/api/gnu/bytecode/package-summary.html
> It uses arrays extensively, which makes it very efficient.
> It also has many useful features for compiler writers.

Thanks! I didn't realize that was available.

I added a page for this Google group listing the projects we've
mentioned so far.

>
> Jasmin is (primarily?) an assembler, which I don't think you want to
> use.

I thought, if one already knew the bytecode one wanted to generate,
that spitting out a simplified assembler would be more direct than
going through a Java API. You would know (much) better than I, though.
Jamaica also seems like an option in this case.


Regards
Patrick

Per Bothner

Oct 2, 2007, 3:01:32 AM
to jvm-la...@googlegroups.com
Patrick Wright wrote:
>> Jasmin is (primarily?) an assembler, which I don't think you want to
>> use.
>
> I thought, if one already knew the bytecode one wanted to generate,
> that spitting out a simplified assembler would be more direct than
> going through a Java API. You would know (much) better than I, though.
> Jamaica also seems like an option in this case.

Quite the opposite. If you have a compiler written in Java, and need
to generate bytecode, it's wasteful to write out assembler and then
parse it. It's about as easy and much more efficient to use
the Java API.

John Wilson

Oct 2, 2007, 4:09:57 AM
to jvm-la...@googlegroups.com
On 10/1/07, Charles Oliver Nutter <charles...@sun.com> wrote:
>
> Ok...I'm firing a few shots across the bow of the Java machine here, and
> I'd love to hear all input. But I wanted to do a retrospective of a
> number of things I've learned during the implementation of the JRuby
> compiler.

Lots of good stuff in the message, Charlie!

I have some thoughts on the other parts which I'll post separately but
the bytecode generation standardisation idea is at the top of my list.

I wonder how feasible it would be to define a standard AST as well?
I'm thinking in terms of a set of interfaces/abstract classes which
represent a class. Each language implementation would
implement/subclass these to produce a concrete AST (a rather confusing
term I'm afraid).

In my ideal world there would be an introspection API which allowed me
to get the AST for an arbitrary class and manipulate it and then ask
the manipulated AST to give me a new Class object which implemented
the AST. (Paranoid Classes could refuse to supply their ASTs, of
course).

This would allow non bytecode mavens to do AOP and Lisp macro style
manipulation of arbitrary classes.

This suggestion does not, of course, replace yours as we would need a
standard way of generating bytecodes to allow the ASTs to turn
themselves into classes.

John Wilson

Patrick Wright

Oct 2, 2007, 4:43:26 AM
to jvm-la...@googlegroups.com
> I wonder how feasible it would be to define a standard AST as well?

One note along these lines is that in order to produce the Jackpot
refactoring tool, there was some coordination between the Jackpot team
and the javac team on the structure of the AST. There is, however, no
JSR for the AST (the compiler API doesn't define one, AFAIK). That
would be a Java AST, in any case. The one Jackpot uses is the one from
Sun's javac (and so is particular to that implementation).


Patrick

Geert Bevin

Oct 2, 2007, 5:34:09 AM
to jvm-la...@googlegroups.com
>> 1) Bytecode generation: since there are already a few libraries that
>> target this area (ASM, BCEL, Jasmin) it seems ripe for
>> standardization. The trick would be getting enough time from the
>> contributors to run an expert group. But sounds like a great idea.

I'd definitely support that and would be interested in helping out.
Bytecode generation and instrumentation is becoming increasingly
important, and for many projects out there it's almost as important as
the Java language itself. One can really see it growing, since
projects now include ASM in their own package structure to ensure
that the right version is available and that no clashes with other
versions occur.

> Don't forget gnu.bytecode, which I wrote for Kawa (starting in 1996):
> http://www.gnu.org/software/kawa/api/gnu/bytecode/package-summary.html
> It uses arrays extensively, which makes it very efficient.
> It also has many useful features for compiler writers.
>
> Jasmin is (primarily?) an assembler, which I don't think you want to
> use.

The problem with Jasmin as I see it is that it only addresses a one-time
generation of bytecode. Typically you generate bytecode
dynamically in bits and pieces, driven by other logic. Also, I don't
think it has any solution for instrumentation of existing bytecode.

> ASM, from what I've seen of it, seems quite elegant - I rather like its
> use of the visitor pattern. However, I haven't used it myself.
>
> I'd suggest starting with ASM, but maybe adding some of the
> features and conveniences of gnu.bytecode and other toolkits.

I agree with this. I've been having very good results with ASM on the
projects that I've used it on. The visitor pattern is neat and helps
keep the footprint low. A nice feature also is that you can chain
visitors and nicely isolate instrumentations in their own visitor
classes.

Geert

--
Geert Bevin
Terracotta - http://www.terracotta.org
Uwyn "Use what you need" - http://uwyn.com
RIFE Java application framework - http://rifers.org
Music and words - http://gbevin.com

Matt Fowles

Oct 2, 2007, 10:16:37 AM
to jvm-la...@googlegroups.com
All~

The company that I work for would really like to be able to target a
standard AST and then let the existing javac do its work...

Matt

Charles Oliver Nutter

Oct 2, 2007, 12:18:06 PM
to jvm-la...@googlegroups.com
Patrick Wright wrote:
> Just a few random comments:
>
> 1) Bytecode generation: since there are already a few libraries that
> target this area (ASM, BCEL, Jasmin) it seems ripe for
> standardization. The trick would be getting enough time from the
> contributors to run an expert group. But sounds like a great idea.

I'd hate to have to run anything through an expert group; more
productive in early days might be to try to get the different groups to
have a few conversations about how to combine efforts. Then once we have
something done or well on its way, JCP-type hassles would be a lot
easier to cope with.

> 2) Lightweight method instances: are you proposing a Hotspot
> optimization or a change to the JVM spec? I assume the former, since
> I'm not sure what that would look like in the spec, e.g. what exactly
> you would mandate.

This could be done entirely in the VM, if there were a way to spin up
small classes and say "please please PLEASE GC this class when it's no
longer referenced; I don't care about the characteristics of other
classloaders that keep all classes hard-referenced". Of course a VM spec
that makes it a little less painful to generate such lightweight classes
in Java code would be fine, but I'm largely generating this raw in
bytecode anyway.

> 3) It seems one generally tricky issue is how to talk about
> optimization in this context without making assumptions about which
> JVM you're running on, which I thought was something we were aiming
> for, e.g. it seems a little iffy to depend on optimizations which are
> only present in Hotspot or IBMs JVM or some other. How will your code
> run in another type of JIT, like CACAO? How do we reason about
> optimizations across JVMs? Understanding, of course, that you
> (Charles) work for Sun.

I want to know how they all work, and I would be extremely surprised if
there's not a ton of common patterns across all major JVM
implementations. I certainly don't want to target a specific JVM or JVM
revision's optimizations, but without knowing *any* good patterns we're
fumbling in the dark.

> 4) Re: bytecode generation, Jasmin looks cool, http://jasmin.sourceforge.net/
> "Jasmin is an assembler for the Java Virtual Machine. It takes ASCII
> descriptions of Java classes, written in a simple assembler-like
> syntax using the Java Virtual Machine instruction set. It converts
> them into binary Java class files, suitable for loading by a Java
> runtime system."

Jasmin is a great assembler, and there are others out there; but I want
something in which I can also have imperative code-generation logic, like
looping over an AST data structure and generating code for each node. Raw
assembly is close, but it doesn't allow embedding. An internal DSL, even
one that's just a friendly Java API, would be more practical.

- Charlie

ekul...@gmail.com

Oct 2, 2007, 12:32:01 PM
to JVM Languages
Charles Oliver Nutter wrote:
> #3. Bytecode generation is a lot easier than I expected, even with
> verification errors and the like. However, there should be a bytecode
> generation library as part of Java proper, along the lines of ASM.
>
> I don't like that we have to ship ASM. This seems like something that
> should just "be there" already. CLR has a library for generating IL as
> part of the core library, and so should the JVM.

It sounds like we need a new JSR for that :-)

> I'd also like to see some attention paid to the usability of raw
> bytecode generation libraries. For example, the following code:
>
> mv.visitVarInsn(ALOAD, 0);
> mv.visitFieldInsn(GETFIELD, mnamePath, "$scriptObject",
> cg.ci(Object.class));
> mv.visitTypeInsn(CHECKCAST, cg.p(type));
> mv.visitVarInsn(ALOAD, 1);
> mv.visitVarInsn(ALOAD, 2);
> mv.visitVarInsn(ALOAD, 3);
> mv.visitMethodInsn(INVOKEVIRTUAL, typePath, method, cg.sig(
> RubyKernel.IRUBY_OBJECT, cg.params(ThreadContext.class,
> RubyKernel.IRUBY_OBJECT, IRubyObject[].class)));
> mv.visitInsn(ARETURN);
>
> The "cg" object basically takes a type and returns various structures
> for ASM; I know there's a Type class that does something similar already
> in ASM.

Any particular reason for not using Type class from ASM directly?

> However, by using a wrapper, the above code becomes the following:
>
> mv.aload(0);
> mv.getfield(mnamePath, "$scriptObject", cg.ci(Object.class));
> mv.checkcast(cg.p(type));
> mv.aload(1);
> mv.aload(2);
> mv.aload(3);
> mv.invokevirtual(typePath, method, ...)
> mv.areturn();
>
> And yes, there's a few utility classes in ASM that provide a similar
> interface, but unfortunately they also start to introduce more Javaisms
> and hide more of the actual generation process.

I guess we owe an explanation for those verbose mv.visit*Insn()
methods. :-)
This is basically a class size optimization merged with caller-side
dispatch. The latter provides grouping for opcode instructions, based
on their parameter types (in other words, how those instructions are
represented in the bytecode), and it helps to improve performance
because ASM doesn't need to do that dispatching. Such tricks allow us to
keep the ASM core jar under 42k.

As Charlie mentioned, there is a GeneratorAdapter that provides a very
similar API to the second code snippet above, but I'd like to hear
more about those Javaisms. Charlie, can you please elaborate a bit more?

regards,
Eugene

ekul...@gmail.com

Oct 2, 2007, 12:35:56 PM
to JVM Languages
On Oct 2, 2:23 am, Per Bothner <p...@bothner.com> wrote:

> Don't forget gnu.bytecode, which I wrote for Kawa (starting in 1996):
> http://www.gnu.org/software/kawa/api/gnu/bytecode/package-summary.html
> It uses arrays extensively, which makes it very efficient.
> It also has many useful features for compiler writers.

Per, I think I already bugged you about this a while ago, but can you
please remind me what features from gnu.bytecode you are referring to?

BTW, ASM also uses arrays extensively for its internal structures, as
well as minimizing String conversion overhead.

regards,
Eugene


Attila Szegedi

Oct 3, 2007, 9:29:08 AM
to jvm-la...@googlegroups.com
Not referenced as in, unreachable? You use java.lang.ref.PhantomReference
for that.
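
A minimal sketch of how that could look (illustrative only): track each
generated method object with a phantom reference, and poll the queue to
find out when it has become unreachable.

import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashMap;
import java.util.Map;

class MethodReaper {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<Object>();
    // PhantomReference.get() always returns null, so keep metadata on the side
    private final Map<Reference<?>, String> names = new HashMap<Reference<?>, String>();

    void track(Object generatedMethod, String name) {
        names.put(new PhantomReference<Object>(generatedMethod, queue), name);
    }

    // Call periodically (or from a background thread) to find dead methods.
    void drain() {
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            String dead = names.remove(ref);
            // 'dead' names a generated method that is now unreachable; its
            // bookkeeping (bytecode cache, loader slot, etc.) can be dropped.
        }
    }
}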

Attila.

On Tue, 02 Oct 2007 05:38:55 +0200, John Cowan <johnw...@gmail.com>
wrote:

Attila Szegedi

Oct 3, 2007, 10:03:46 AM
to jvm-la...@googlegroups.com
Yeah, Rhino suffers from this too - every function gets compiled to a
separate Java class implementing org.mozilla.javascript.Function, and if
you want its instances to be separately garbage collectible (which you
want, since they're just ordinary objects in our languages), you need to
load each one of them through its own class loader, since class lifecycle
is tied to its class loader.

The whole business of classes not being collectible until their loader gets
collected has to do with guaranteeing single execution of the class'
static initializers. If we could get Sun to change the JVM spec so it'd
allow "loose class loaders" that allow their loaded classes to get GCed,
but can only load classes that don't have <clinit>(), I think that'd help.

(I'm still thinking in terms of classes, as I don't think we could get Sun
to accept anything smaller than a class as the smallest unit of executable
content that can be loaded into JVM.)

Alternatively, you can go for all sorts of (admittedly artificial)
lumping. I.e. you could use a single class loader per compilation unit
when you know that you're compiling multiple functions from the same source
(i.e. a single .rb file). You get fewer class loaders, but as a tradeoff all
function classes generated from the compilation unit become collectible
only as a group. Precompilation is another alternative -- Rhino has a
command-line compiler (jsc) that produces .class files. If you bundle them
in a JAR, they'll obviously all get loaded through whichever class loader
reads the JAR file.

Attila.

Charles Oliver Nutter

Oct 3, 2007, 10:08:22 AM
to jvm-la...@googlegroups.com
Attila Szegedi wrote:
> Not referenced as in, unreachable? You use java.lang.ref.PhantomReference
> for that.

As in using Weak or Soft references to hold the classes. If some type of
classloader had some kind of weak reference to the classes it contains,
it would allow spinning up just the class rather than an entire
garbage-collectable classloader to contain it (as is the case right now
in JRuby).

- Charlie

Matt Fowles

Oct 3, 2007, 10:15:04 AM
to jvm-la...@googlegroups.com
All~

Could you write your own class loader that uses PhantomReferences to
allow Classes to be GCed?

Matt

Charles Oliver Nutter

Oct 3, 2007, 11:09:29 AM
to jvm-la...@googlegroups.com
Attila Szegedi wrote:
> The whole business of classes not being collectible until their loader gets
> collected has to do with guaranteeing single execution of the class'
> static initializers. If we could get Sun to change the JVM spec so it'd
> allow "loose class loaders" that allow their loaded classes to get GCed,
> but can only load classes that don't have <clinit>(), I think that'd help.

I think the counterpoint to that requirement would be "I don't care".
I'm using <clinit> in my compiled classes to initialize things like Ruby
string objects, Ruby stack trace positioning information and so on. I
really would be happy as a clam if <clinit> ran again...and of course
the assertion that <clinit> is only going to run once is impossible to
guarantee anyway, since people in our case end up loading the same
classes in separate classloaders anyway. Perhaps people are doing things
with side effects in <clinit>...that's their problem, not mine. It seems
quite unfortunate to have an artificial limitation on class GC to
support a few bad designs. It's time to Free the Class!

> (I'm still thinking in terms of classes, as I don't think we could get Sun
> to accept anything smaller than a class as the smallest unit of executable
> content that can be loaded into JVM.)

A class would be fine, with two caveats:

- GC-able without 1:1 classloaders
- easy to generate really light classes that, for example, will only
ever contain one method and (maybe) no <clinit>.

If classes can be GC-able and super lightweight...I have no concerns.
And both seem to be implementation details we ought to be able to solve.

> Alternatively, you can go for all sorts of (admittedly artificial)
> lumping. I.e. you could use a single class loader per compilation unit
> when you know that you're compiling multiple functions from a same source
> (i.e. single .rb file). You get less class loaders, but as a tradeoff all
> function classes generated from the compilation unit become collectible
> only as a group. Precompilation is another alternative -- Rhino has a
> command-line compiler (jsc) that produces .class files. If you bundle them
> in a JAR, they'll obviously all get loaded through whichever class loader
> reads the JAR file.

Yes, I'm considering various chunking mechanisms. The tricky bit for me
is that when a Ruby script starts executing, none of its Ruby classes
exist and none of the methods defined therein are bound. And it's binding
classes at runtime that ends up chewing up a class+classloader, one per
method in the system, so I can have a "Callable" object that goes
straight into the precompiled method without reflection. And of course
there's methods defined in 'eval', which will not be compiled
immediately but may JIT later.

So there may be a way to get chunking working, but as you say it's very
artificial and potentially very complicated to do in the extremely
late-bound Ruby.

- Charlie

Charles Oliver Nutter

Oct 3, 2007, 11:11:33 AM
to jvm-la...@googlegroups.com
Matt Fowles wrote:
> All~
>
> Could you write your own class loader that uses PhantomReferences to
> allow Classes to be GCed?

Some classloader, somewhere in the classloader hierarchy, eventually has
to define what Java considers a "class". There's no way to write a
classloader that, for example, does the definition of class data in
memory all by itself. And this is where the class leak
happens..."somewhere in there" there's a list of classes held in what
the JVM sees as a classloader, and never shall the twain be parted.

- Charlie

John Wilson

Oct 3, 2007, 12:20:41 PM
to jvm-la...@googlegroups.com

I'm looking at a way of minimising the classes generated for closures
and I'm thinking of compiling the closure body as a synthetic static
method in the enclosing class. The Closure object would then be an
instance of a generic closure class which dispatches to the static
method via reflection.
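
In rough code, the generic closure class I have in mind would look
something like this (a sketch only; the names and the assumed signature of
the synthetic static method are invented):

import java.lang.reflect.Method;

// The closure body lives as a synthetic static method on the enclosing
// class; this one reusable class dispatches to it reflectively.
class ReflectedClosure {
    private final Method body;       // the synthetic static method
    private final Object[] captured; // captured environment, passed through

    ReflectedClosure(Method body, Object[] captured) {
        this.body = body;
        this.captured = captured;
    }

    Object call(Object[] args) throws Exception {
        // static method, so the receiver is null; the body is assumed to be
        // declared as: static Object body(Object[] captured, Object[] args)
        return body.invoke(null, new Object[] { captured, args });
    }
}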

It occurs to me that this could be generalised to allow the generation
of lightweight method objects. If we had a way of dynamically adding
static methods to some utility class, returning an instance of
java.lang.reflect.Method, then these could be used as lightweight method
objects.

Imagine a class java.util.DynaHome with a single method Method
makeMethod(byte[]). Calling that method with some bytecode would add
the static method to java.util.DynaHome. It would have an arbitrary
unique name, and an instance of Method would be returned which allows
the method to be called. When the instance of Method is GCd, the method
is removed from java.util.DynaHome.

I have absolutely no idea how feasible this is, but I think it, or
something like it, would be pretty useful.

John Wilson

Randall R Schulz

Oct 3, 2007, 12:37:03 PM
to jvm-la...@googlegroups.com
On Wednesday 03 October 2007 09:20, John Wilson wrote:
> ...

>
> I'm looking at a way of minimising the classes generated for closures
> and I'm thinking of compiling the closure body as a synthetic static
> method in the enclosing class. The Closure object would then be an
> instance of a generic closure class which dispatches to the static
> method via reflection.

Again, (if you have not already) I recommend you open a discussion or
bring into this one Martin Odersky and / or other principals from the
Scala project. From my rather limited and naive understanding, they are
dealing with some of the very same issues. I'm pretty sure Scala has
the issue of large numbers of classes representing closures.

In addition to cross-fertilizing implementation ideas, it seems that by
combining your voices you might have better luck prevailing upon those
that determine the JVM specifications to accommodate your needs.


> ...
>
> John Wilson


Randall Schulz

Per Bothner

Oct 3, 2007, 12:39:05 PM
to jvm-la...@googlegroups.com
Matt Fowles wrote:
> The company that I work for would really like to be able to target a
> standard AST and then let the existing javac do its work...

The JavaFX Script compiler targets an extension of the javac AST.
The extended AST then gets translated to a pure javac AST (except for
an extension for "BlockExpression", which I hope we can get
merged into javac proper).

The javac AST is very focused on Java. For example it strictly
separates expressions and statements, which is not ideal for an
expression language, such as JavaFX Script (or Ruby).

Maybe after the JavaFX Script compiler stabilizes, and after
both have been moved to Mercurial, we can meet with the javac
compiler groups and discuss ideas for changes. It's hard to
generalize from one language, but with two languages we can
make a start. This is complicated by the legal requirement
that javac compiles Java and nothing but Java, so any modified
or extended features have to remain inaccessible when the
compiler is invoked as javac.

David Pollak

Oct 3, 2007, 12:49:25 PM
to jvm-la...@googlegroups.com
Randall,

Some members of the Scala team are on this list.  I forward the key discussions to Martin, Lex, etc.  I know they are aware of the discussions and have from time to time chimed in.

Thanks,

David
--
lift, the secure, simple, powerful web framework
http://liftweb.net

Per Bothner

Oct 3, 2007, 9:33:05 PM
to jvm-la...@googlegroups.com
ekul...@gmail.com wrote:
> On Oct 2, 2:23 am, Per Bothner <p...@bothner.com> wrote:
>
>> Don't forget gnu.bytecode, which I wrote for Kawa (starting in 1996):
>> http://www.gnu.org/software/kawa/api/gnu/bytecode/package-summary.html
>> It uses arrays extensively, which makes it very efficient.
>> It also has many useful features for compiler writers.
>
> Per, I think I already bugged you about this while ago, but can you
> please remind what features from gnu.bytecode you are referring to?

Some of these are no-brainers, and ASM probably has them too.
But here are some convenience features that I find useful:

* Help in selecting the right load instruction: For example emitPushInt
takes an arbitrary int, and selects between iconst_m1 .. iconst_5,
bipush, sipush, or ldc/ldc_w with automatic constant pool entry
creation (a sketch of this selection logic follows the list).

* Similarly, emitPushString(String) handles Strings longer than 2^16
chars by automatically splitting them into smaller segments and
concatenating them.

* Help in constant pool management - does not create duplicate
constant pool entries.

* Automatically generates code to initialize a primitive array from
a compile-time constant array.

* Emits line numbers and LocalVariableTable attributes.

* Emits JSR-45 "SourceDebugExtension" attribute.

* Semi-automatic local variable management.

* Handles try-catch-finally *expressions*, which can push a result
(in both try and catch clauses) onto JVM stack. Also saves and
restores any values already on the JVM stack.

* Support for simple self-tail-calls (jump to "beginning" of
function, which could be an inlined function).

* Help for emitting/managing switch statements, automatically
selecting the "best" instruction.

* Fixups and fragments: You can emit code into a "bytecode fragment",
and then re-arrange the fragments. This is useful for avoiding needless
jumps. (This probably doesn't matter for Hotspot, but it reduces code
size and might help for simpler VMs.) Re-ordering fragments is
non-trivial, because the size of (say) a switch instruction depends on
alignment. Kawa also automatically generates "long jumps" - when the
jump offset becomes more than will fit into a signed 16-bit offset,
Kawa will generate long 32-bit jumps, with trampolines as needed.
(This is not really tested, and it has limited usefulness given that
method-bodies are in practice limited to 2^16-1 bytes, but at least
Kawa will allow jumps that are larger than 2^15, which effectively
doubles the maximum method size.)
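
As a concrete example of the first item above (emitPushInt), the selection
logic boils down to something like this (sketched here against ASM rather
than gnu.bytecode's own API):

import org.objectweb.asm.MethodVisitor;
import static org.objectweb.asm.Opcodes.*;

class IntPusher {
    // Pick the smallest instruction that can push the given int constant.
    static void pushInt(MethodVisitor mv, int value) {
        if (value >= -1 && value <= 5) {
            mv.visitInsn(ICONST_0 + value);          // iconst_m1 .. iconst_5
        } else if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            mv.visitIntInsn(BIPUSH, value);          // one-byte operand
        } else if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            mv.visitIntInsn(SIPUSH, value);          // two-byte operand
        } else {
            mv.visitLdcInsn(Integer.valueOf(value)); // constant pool entry
        }
    }
}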

Randall R Schulz

Oct 3, 2007, 2:53:01 PM
to jvm-la...@googlegroups.com
On Wednesday 03 October 2007 09:49, David Pollak wrote:
> Randall,
>
> Some members of the Scala team are on this list. I forward the key
> discussions to Martin, Lex, etc. I know they are aware of the
> discussions and have from time to time chimed in.

Great. I guess that comes under "if you have not already."


> Thanks,
>
> David


RRS

Charles Oliver Nutter

Oct 4, 2007, 10:08:36 AM
to jvm-la...@googlegroups.com
Per Bothner wrote:
> Maybe after the JavaFX Script compiler stabilizes, and after
> both have been moved to Mercurial, we can meet with the javac
> compiler groups and discuss ideas for changes. It's hard to
> generalize from one language, but with two languages we can
> make a start. This is complicated by the legal requirement
> that javac compiles Java and nothing but Java, so any modified
> or extended features have to remain inaccessible when the
> compiler is invoked as javac.

That would be most welcome. Having another dynamic language (say, Ruby?)
represented in that discussion would be very helpful as well, since
there are some unique complexities specific to dynamic languages
compiling on the JVM.

- Charlie

Charles Oliver Nutter

Oct 4, 2007, 10:43:21 AM
to jvm-la...@googlegroups.com
John Wilson wrote:
> I'm looking at a way of minimising the classes generated for closures
> and I'm thinking of compiling the closure body as a synthetic static
> method in the enclosing class. The Closure object would then be an
> instance of a generic closure class which dispatches to the static
> method via reflection.

This is how JRuby compiles closures. I will do a review of the JRuby
compiler design below.

> It occurs to me that this could be generalised to allow the generation
> of lightweight method objects. If we had a way of dynamically adding
> static methods to some utility class, returning an instance of
> java.lang.reflect.Method, then these could be used as lightweight method
> objects.
>
> Imagine a class java.util.DynaHome with a single method Method
> makeMethod(byte[]). Calling that method with some bytecode would add
> the static method to java.util.DynaHome. It would have an arbitrary
> unique name, and an instance of Method would be returned which allows
> the method to be called. When the instance of Method is GCd, the method
> is removed from java.util.DynaHome.
>
> I have absolutely no idea how feasible this is, but I think it, or
> something like it, would be pretty useful.

I suppose it would be just fine if it were possible to add methods to
anything at all. Lacking that...

So, JRuby compiler design 101.

JRuby compiles Ruby code to Java bytecode. Once complete, there's no
interpretation done, except for eval calls. Evaluated code never gets
compiled; however, if the eval defines a method that's called enough, it
will also eventually get JIT compiled to bytecode. JRuby is a mixed-mode
engine.

Given a single input .rb file, JRuby produces a single output .class
file. This was a key design goal I wanted for the compiler; other
languages (including Groovy) and other Ruby implementations (including
XRuby) produce numerous classes from an input file; in some cases,
dozens and dozens of classes if the input file is very large and
complex. JRuby produces one .class file.

JRuby compiles from the same AST it interprets from. There is a first
pass over the AST before compilation to determine certain runtime
characteristics:

- does a method have closures in it?
- does a method have calls to eval or other scope and frame-aware methods?
- does a method have class definitions in it?
- does a method define other methods?
- .... and so on

Based on this pass, we determine scoping characteristics of all code in
the method, selectively choosing pure heap-based variables or pure
stack-based variables. Only methods and leaf closures without eval,
closures, etc. can use normal stack-based local variables. Performance is
significantly faster with stack variables.
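
In rough code, that pre-pass amounts to something like this (the node and
class names here are invented for illustration, not the real JRuby AST
types):

// Walk a method body once and decide whether its locals can live on the
// plain Java stack or must be hoisted into a heap-allocated scope.
class ScopeAnalysis {
    private boolean hasClosure;
    private boolean hasEval;

    void visit(Node node) {
        if (node instanceof BlockNode) hasClosure = true;    // closure/block
        if (node instanceof EvalCallNode) hasEval = true;    // eval, binding, etc.
        for (Node child : node.children()) {
            visit(child);
        }
    }

    boolean canUseStackVariables() {
        // only simple bodies (no closures, no eval or frame-aware calls)
        // qualify for normal stack-based locals
        return !hasClosure && !hasEval;
    }
}

// minimal AST interfaces assumed for the sketch
interface Node { Iterable<Node> children(); }
interface BlockNode extends Node {}
interface EvalCallNode extends Node {}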

The resulting class file from JRuby contains at a minimum methods to start:

- a normal main() method for running from the command line (grabs a
default JRuby runtime and launches itself)
- a load() instance method that represents a normal top-level loading of
the script into a runtime. This performs pre/post script setup and teardown.
- a run() instance method that represents a bare execution of the
script's contents. This is used by the JIT, where setup/teardown is
handled outside the JITed code on a method-by-method basis
- a __file__() method that represents the body of the script. This is
where script execution eventually starts.

Then, depending on the contents of the file, additional methods are added:

- normal method definition bodies become Java methods
- class/module bodies become Java methods
- closure bodies become Java methods
- rescue/ensure bodies become synthetic methods
- if the normal top-level script method is too long, it's split every
500 top-level syntactic elements and chained (we did run into one large
flat file that broke the method size limit). We do not yet perform
chaining on normal method bodies, because we have not encountered any
that are too large.

Of these, only class bodies, rescue/ensure bodies, and chained top-level
script methods get directly invoked during script execution. The others
are bound into the MOP at runtime.

Binding occurs in one of two ways:

- by generating a small stub class that implements DynamicMethod and
invokes the target method on the target script directly
- by doing the same with reflection (broken now due to lack of use; will
be fixed for 1.1)

In our testing, generating stub "invoker" classes has always been faster
than reflection, especially on older JVMs. For the time being, that's
the preferred way to bind methods, but I'm going to get reflection-based
binding working again for limited/restricted environments like applets.
With reflection-based binding and pre-compiled Ruby code with no evals,
JIT compilation could be completely turned off and no classes would ever
be generated in memory by JRuby.
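
For illustration, the reflection-based binding path amounts to something
like this (the interface and names here are simplified stand-ins, not the
real DynamicMethod API):

import java.lang.reflect.Method;

// Wraps a compiled script method in an object the MOP can store in a
// method table and invoke without generating any new classes.
interface BoundMethod {
    Object call(Object self, Object[] args);
}

class ReflectedBinding implements BoundMethod {
    private final Object scriptInstance; // the compiled script object
    private final Method target;         // the generated Java method to call

    ReflectedBinding(Object scriptInstance, Method target) {
        this.scriptInstance = scriptInstance;
        this.target = target;
    }

    public Object call(Object self, Object[] args) {
        try {
            return target.invoke(scriptInstance, new Object[] { self, args });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}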

So then here's a walkthrough of a simple script:

# we enter into the script body in the __file__ method
# require would first look for .rb files, then try to load .class
require 'foo'

# normal code in the method body
puts 'here we go'

# upon encountering a method def, a new method is started in the class
def bar
# this is a simple method body, and would use stack-based vars
puts 'hello'
end
# once the method has been compiled, binding code is added to __file__

# class definitions become methods as well, building the class
class MyClass
# this is code in the body of the class
puts 'here'

# a method in the class is compiled like any other method body
def something(a, b = 2, *c, &block)
# this method has all four param types:
# normal, optional, "rest" or varargs, and block argument
# the compiler generates code to assign these from an incoming
# IRubyObject[]

# this method has a closure, so it would use heap-based vars
# ... but the closure would use stack vars, since it's a simple leaf
1.times { puts 'in closure' }
end
# method is completed, bound into the class we're building
end
# end of class definition; __file__ code invokes the class body directly

# any begin block or method body with a rescue/ensure attached will
# be compiled as a synthetic method. This also necessarily means that
# method bodies containing rescue/ensure must be heap-based.
begin
puts 'rescue me'
rescue
puts 'rescued!'
ensure
puts 'ensured!'
end

A sample run of the JRuby compiler:

~/NetBeansProjects/jruby $ jruby sample_script.rb
here we go
here
rescue me
ensured!

~/NetBeansProjects/jruby $ jrubyc sample_script.rb
Compiling file "sample_script.rb" as class "sample_script"

~/NetBeansProjects/jruby $ ls -l sample_script.*
-rw-r--r-- 1 headius headius 8396 Oct 4 09:38 sample_script.class
-rw-r--r-- 1 headius headius 1449 Oct 4 09:38 sample_script.rb

~/NetBeansProjects/jruby $ export
CLASSPATH=lib/jruby.jar:lib/asm-3.0.jar:lib/jna.jar:.

~/NetBeansProjects/jruby $ java sample_script
here we go
here
rescue me
ensured!

The resulting .class file is attached for your enjoyment!

Shall I continue? I can discuss the inline cache, the call adapters we
generate for dynamic dispatch, the fast switch-based dispatcher, how the
JIT and interpreter work together, or any other details anyone would like.

- Charlie

sample_script.class

Dudley Flanders

Oct 4, 2007, 12:59:21 PM
to jvm-la...@googlegroups.com

On Oct 4, 2007, at 9:43 AM, Charles Oliver Nutter wrote:
> Shall I continue? I can discuss the inline cache, the call adapters we
> generate for dynamic dispatch, the fast switch-based dispatcher,
> how the
> JIT and interpreter work together, or any other details anyone
> would like.

Please do continue. I'd like to know more about all of the above, so
anything you feel like explaining would be awesome.

:dudley

Daniel Green

Oct 4, 2007, 1:08:45 PM
to jvm-la...@googlegroups.com
> > Shall I continue? I can discuss the inline cache, the call adapters we
> > generate for dynamic dispatch, the fast switch-based dispatcher,
> > how the
> > JIT and interpreter work together, or any other details anyone
> > would like.
>
> Please do continue. I'd like to know more about all of the above, so
> anything you feel like explaining would be awesome.
>

Hehe, yes, awesome indeed! This would make a very interesting article,
actually, although I bet this is probably already written up in a spec
somewhere.

Charles Oliver Nutter

Oct 4, 2007, 1:30:46 PM
to jvm-la...@googlegroups.com
Daniel Green wrote:
>>> Shall I continue? I can discuss the inline cache, the call adapters we
>>> generate for dynamic dispatch, the fast switch-based dispatcher,
>>> how the
>>> JIT and interpreter work together, or any other details anyone
>>> would like.
>> Please do continue. I'd like to know more about all of the above, so
>> anything you feel like explaining would be awesome.
>>
>
> Hehe, yes, awesome indeed! This would make a very interesting article,
> actually, although I bet this is probably already written up in a spec
> somewhere.

You'd think that, wouldn't you. This is all just from my brain
though...I'm making this whole "JRuby compiler" thing up as I go.

A spec would be nice.

- Charlie

hlovatt

Oct 4, 2007, 4:54:42 PM
to JVM Languages
I haven't tried this, so I cannot speak from experience - this is just
a thought.

To enable class redefinitions why not run the JVM that is running
JRuby in debugging mode and hot swap the class. See:

http://java.sun.com/j2se/1.4.2/docs/guide/jpda/jvmdi-spec.html#RedefineClasses

This way new methods/closures can be added/removed at runtime.

Brian Goetz

Oct 4, 2007, 5:28:11 PM
to jvm-la...@googlegroups.com
Finalizers have serious performance costs that would probably be
counterproductive here.

John Cowan wrote:
> On 10/1/07, Charles Oliver Nutter <charles...@sun.com> wrote:
>

Patrick Wright

Oct 5, 2007, 1:02:34 AM
to jvm-la...@googlegroups.com
On 10/4/07, hlovatt <howard...@gmail.com> wrote:
>
> I haven't tried this, so I cannot speak from experiance - this is just
> a thought.
>
> To enable class redefinitions why not run the JVM that is running
> JRuby in debugging mode and hot swap the class. See:

AFAIK, it's VM-dependent how far it's supported. In the Sun VM, you
can swap a class if the method body is changed, but if method
signatures are changed, or fields are added or removed, the change is
rejected. I believe some version of the IBM VM allows for more
flexibility; I don't know where the others stack up. It's one feature
people have asked for more flexibility on, but apparently it's a tough
one to implement correctly. There was at least one research project at
Sun, or sponsored by Sun, a few years back on the topic. As I
understood it, the general problem is what you do with other class
instances that have references to the class you're swapping, in case
the swap puts them in some new, possibly inconsistent state as far as
those references are concerned.

A more complete "class hotswapping" feature for "dynamic" languages
has been suggested in the dynamic-language-support JSR, but I haven't
heard yet whether they will commit to it.


Patrick

Attila Szegedi

Oct 5, 2007, 4:18:33 AM
to jvm-la...@googlegroups.com
No can do. java.lang.ClassLoader has a Vector field named "classes",
defined as:

// The classes loaded by this class loader. The only purpose of this table
// is to keep the classes from being GC'ed until the loader is GC'ed.
private Vector classes = new Vector();

It also has a method:

void addClass(Class c) {
classes.addElement(c);
}

that is invoked by *native* JVM code to add a class to the class loader
when it is loaded. Being a package-private method, it can't be overridden
outside the package, and you also can't define a new class within the
java.lang.* package since it is sealed. Oh, and it is also pretty much
specific to Sun JRE too :-)

Mind you, it *might* be possible to do horrible hackery, like retrieving
the "classes" field through reflection, with Field.setAccessible(false),
and then clearing it from time to time. Needless to say, it would *not*
work in any sane enterprise scenario when JVM runs under a security
manager.

The other way to get around this would be to run java with a tampered
rt.jar in which ClassLoader.addClass() is modified to be a no-op.

Again, all these solutions either require you to run without a security
manager, or assume a Sun JRE, or both.

However, also notice that the "classes" field is only used for the sole
purpose of preventing GC of the class. I.e. it is not used by
findLoadedClass, which actually delegates to native findLoadedClass0()
method, hinting that there's further association of loaded class to its
class loader somewhere in JVMs native guts. Meaning it might not be a good
idea to try to pull the rug out from under it, as it could well be assuming that
the classes can't be GCed until the loader can be.

You can say I gave some thought to this idea already :-)

Attila.

On Wed, 03 Oct 2007 16:15:04 +0200, Matt Fowles <matt....@gmail.com>
wrote:

--
home: http://www.szegedi.org
weblog: http://constc.blogspot.com
Visit Szegedi Butterfly fractals at:
http://www.szegedi.org/fractals/butterfly/index.html

Attila Szegedi

Oct 5, 2007, 4:35:20 AM
to jvm-la...@googlegroups.com
On Wed, 03 Oct 2007 17:09:29 +0200, Charles Oliver Nutter
<charles...@sun.com> wrote:

>
> Attila Szegedi wrote:
>
> I think the counterpoint to that requirement would be "I don't care".
> I'm using <clinit> in my compiled classes to initialize things like Ruby
> string objects, Ruby stack trace positioning information and so on. I
> really would be happy as a clam if <clinit> ran again...and of course
> the assertion that <clinit> is only going to run once is impossible to
> guarantee anyway, since people in our case end up loading the same
> classes in separate classloaders anyway.

Those end up being two separate classes then -- you'd get a
ClassCastException if you tried to cast an instance of one into the other.
Class identity is always class loader scoped, so you still get one
guaranteed <clinit> per class, 'cause as far as JVM is concerned, loading
the same bytecode in two class loaders results in two different classes.

> Perhaps people are doing things
> with side effects in <clinit>...that's their problem, not mine. It seems
> quite unfortunate to have an artificial limitation on class GC to
> support a few bad designs. It's time to Free the Class!
>

This actually sounds like it'd make sense to add a new class annotation in
the next Java release, i.e. @Disposable. The JVM would not create strong
references from the class with this annotation to the class loader. (It
might actually want to create a soft reference, though, to minimize the
number of reloads.)
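
For what it's worth, the marker itself would be trivial to declare -- all
the interesting work would be in the JVM. Nothing like this exists today,
so the retention and target below are just one guess at how it might look:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.CLASS) // present in the class file, where the VM could honor it
@Target(ElementType.TYPE)
public @interface Disposable {
}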

I'm not sure whether this could actually fly -- I myself don't see why
not, but maybe some JVM folks with more insight on this list could tell us
if there'd be some dire consequence to having this feature?

Attila.

Charles Oliver Nutter

unread,
Oct 5, 2007, 5:26:11 AM10/5/07
to jvm-la...@googlegroups.com
Attila Szegedi wrote:
> Those end up being two separate classes then -- you'd get a
> ClassCastException if you tried to cast an instance of one into the other.
> Class identity is always class loader scoped, so you still get one
> guaranteed <clinit> per class, 'cause as far as JVM is concerned, loading
> the same bytecode in two class loaders results in two different classes.

Fine by me :) I don't want them to be the same anyway. They'll all
implement the same interface from a higher classloader, so problem solved.

> This actually sounds like it'd make sense to add a new class annotation in
> the next Java release, say @Disposable. The JVM would not create strong
> references from a class with this annotation to its class loader. (It
> might actually want to create a soft reference, though, to minimize the
> number of reloads.)
>
> I'm not sure whether this could actually fly -- I myself don't see why
> not, but maybe some JVM folks with more insight on this list could tell us
> if there'd be some dire consequence to having this feature?

I remember either John Rose or Kenneth Russell claiming it ought to be
possible, but I don't know if it went any further than that...

- Charlie

Rémi Forax

unread,
Oct 5, 2007, 10:10:27 AM10/5/07
to jvm-la...@googlegroups.com
Attila Szegedi a écrit :

> No can do. java.lang.ClassLoader has a private field, "classes",
> defined as:
>
> // The classes loaded by this class loader. The only purpose of this table
> // is to keep the classes from being GC'ed until the loader is GC'ed.
> private Vector classes = new Vector();
>
> It also has a method:
>
> void addClass(Class c) {
> classes.addElement(c);
> }
>
> that is invoked by *native* JVM code to add a class to the class loader
> when it is loaded. Being a package-private method, it can't be overridden
> outside the package, and you also can't define a new class within the
> java.lang.* package since it is sealed. Oh, and it is pretty much
> specific to the Sun JRE too :-)
>
> Mind you, it *might* be possible to do horrible hackery, like retrieving
> the "classes" field through reflection, with Field.setAccessible(true),
> and then clearing it from time to time. Needless to say, it would *not*
> work in any sane enterprise scenario where the JVM runs under a security
> manager.
>
The last time I tried a similar trick, the VM crashed :)

> The other way to get around this would be to run java with a tampered
> rt.jar in which ClassLoader.addClass() is modified to be a no-op.
>
> Again, all these solutions either require you to run without a security
> manager, or assume a Sun JRE, or both.
>
> However, also notice that the "classes" field exists for the sole
> purpose of preventing GC of the class. I.e. it is not used by
> findLoadedClass, which actually delegates to the native findLoadedClass0()
> method, hinting that there's a further association of a loaded class to its
> class loader somewhere in the JVM's native guts. Meaning it might not be a
> good idea to try to pull the rug out from under it, as the VM could well be
> assuming that the classes can't be GCed until the loader can be.
>
> You can say I gave some thought to this idea already :-)
>
> Attila.
>
Rémi

Matt Fowles

unread,
Oct 5, 2007, 10:26:29 AM10/5/07
to jvm-la...@googlegroups.com
Attila~

Thanks for the detailed response, that definitely clears the picture
up for me a lot.

Matt

hlovatt

unread,
Oct 7, 2007, 4:46:51 PM10/7/07
to JVM Languages
> AFAIK, it's VM-dependent how far it's supported. In the Sun VM, you
> can swap a class if the method body is changed, but if method
> signatures are changed, or fields are added or removed, the change is
> rejected.

This is probably sufficient. Imagine each JRuby class, including
JRuby's Object class, inheriting from:

abstract class AbstractJRObject {
    List<JRObject> addedFields = null; // Used to store references to fields added at runtime

    JRObject addedMethod( int methodID, JRObject... args ) {
        // call method_missing, and add a new method if that is what method_missing does
        // return the value from the new method
    }

    // other methods from JRuby's Object class that are known at compile time
}

Then for JRuby's Object class you initially have:

class JRObject extends AbstractJRObject {}

But when, at runtime, a method with id 123 (say) is added, JRObject is
replaced with:

class JRObject extends AbstractJRObject {
    @Override JRObject addedMethod( int methodID, JRObject... args ) {
        switch ( methodID ) {
            case 123:
                // body of the added method; return its value here
        }
        return super.addedMethod( methodID, args ); // unknown id: fall back
    }
}

Methods known at compile time are called via object.name( ... ), but
methods unknown at compile time are called via
object.addedMethod( methodID, ... )
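
To make the calling convention concrete, a generated call site under this
scheme might look roughly like the following. This is illustrative only:
length() and the id 123 are made up, and it assumes length() is one of the
compile-time methods JRObject declares.

class CallSiteSketch {
    static JRObject invoke( JRObject receiver, JRObject arg ) {
        JRObject known = receiver.length();                 // known at compile time: plain virtual call
        JRObject added = receiver.addedMethod( 123, arg );  // added at runtime: dispatch by method id
        return added;
    }
}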

Matt Fowles

unread,
Oct 8, 2007, 10:59:01 PM10/8/07
to jvm-la...@googlegroups.com
Howard~

How does this approach handle redefinition of a compile time method?
What about removal of one?

Matt

hlovatt

unread,
Oct 9, 2007, 5:13:46 PM10/9/07
to JVM Languages
@Matt,

You can redefine compile-time methods. E.g. if a method is removed then,
provided the method exists in the super class, its body is replaced with:

return super.name( ... )

A similar technique to the one above is used if the removed method is
in a mix-in. If the removed method isn't in the super class or a mix-in,
then:

return addedMethod( methodID, ... )

Howard.

Matt Fowles

unread,
Oct 9, 2007, 5:55:25 PM10/9/07
to jvm-la...@googlegroups.com
Howard~

I am not certain I follow your explanation, so I will try to rephrase
my question.

If a method is known at compile time (and thus called by
object.name(...)), how can that method be replaced (or removed) at
run time, such that previously compiled code still executes the new
version?

After thinking about it a bit, I suppose that the compiler can insert
a test at the beginning of all the statically compiled methods to
check whether the method has been removed or replaced, and delegate to
the dynamic dispatch mechanism in that event.

Matt

hlovatt

unread,
Oct 10, 2007, 1:49:44 AM10/10/07
to JVM Languages
@Matt,

Sorry my explanation wasn't that clear. Suppose you have:

class A extends JRObject {
    JRObject m() { ... }
}

class B extends A {
    @Override JRObject m() { ... }
}

Then you remove B.m, in which case you replace B with:

class B extends A {
    @Override JRObject m() { return super.m(); } // m is declared at compile time in super class A
}

And if you then remove A.m you replace A with:

class A extends JRObject {
    JRObject m() { return addedMethod( <methodID for m()>, null ); } // m isn't in A's superclass
}

Hope that is a better explanation,

Howard.

On Oct 10, 7:55 am, "Matt Fowles" <matt.fow...@gmail.com> wrote:
> Howard~
>
> I am not certain I follow your explanation, so I will try to rephrase
> my question.
>
> If a method is known at compile time (and thus called by
> object.name(...)), how can that method be replaced (or removed) at
> run time, such that previously compiled code still executes the new
> version?
>
> After thinking about it a bit, I suppose that the compiler can insert
> a test at the beginning of all the statically compiled methods to
> check whether the method has been removed or replaced, and delegate to
> the dynamic dispatch mechanism in that event.
>
> Matt
>

Matt Fowles

unread,
Oct 10, 2007, 10:37:46 AM10/10/07
to jvm-la...@googlegroups.com
Howard~

That makes sense assuming you know how to 'replace B'. That is the
step I am unclear on. I thought that the JVM did not generally allow
you to replace a class with a new definition of it (although I could
be mistaken).

One could insert some amount of magic checking at the beginning of
methods to handle this case:

class A extends JRObject {
    JRObject m() {
        if (<m was removed or replaced>) {
            return addedMethod( <methodID for m()>, null );
        }
        ...
    }
}

class B extends A {
    @Override JRObject m() {
        if (<m was removed or replaced>) {
            return addedMethod( <methodID for m()>, null );
        }
        ...
    }
}

Matt

hlovatt

unread,
Oct 11, 2007, 1:31:14 AM10/11/07
to JVM Languages
Matt,

> That makes sense assuming you know how to 'replace B'. That is the
> step I am unclear on. I thought that the JVM did not generally allow
> you to replace a class with a new definition of it (although I could
> be mistaken).

In debug mode some JVMs allow classes and/or methods to be replaced.
Sometimes, as in Sun's JVM, the class can only be replaced with a class
that has the same shape -- method signatures and fields unchanged.
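
Outside of the debugger, the java.lang.instrument API (Java 5+) exposes the
same capability to an agent. A rough sketch -- the agent jar needs
Can-Redefine-Classes: true in its manifest, and on Sun's VM the new bytecode
may still only differ in method bodies:

import java.lang.instrument.ClassDefinition;
import java.lang.instrument.Instrumentation;

public class RedefineAgent {
    private static Instrumentation inst;

    // invoked by the VM when started with -javaagent:redefineagent.jar
    public static void premain(String args, Instrumentation i) {
        inst = i;
    }

    // swap in a new class file for an already-loaded class
    public static void redefine(Class<?> clazz, byte[] newBytecode) throws Exception {
        inst.redefineClasses(new ClassDefinition[] { new ClassDefinition(clazz, newBytecode) });
    }
}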

> One could insert some amount of magic checking at the beginning of
> methods to handle this case:

I am far from a Ruby expert, but my reading of remove_method is that
it removes the method from the class but not from super classes.
Therefore if the method that is removed is in a super class at compile
time then you should call the super class version, i.e. replace the
method body with:

return super.name( ... );

You have to treat mix-ins like super classes. If the method isn't in
the super class or in a mix-in at compile time then you call
addedMethod, in case the method has been added since compilation
and/or the method is to be added back into the class.

Therefore I don't think you need to keep track of which methods have
been added or not; you are either calling a super method or
addedMethod.

Cheers,

Howard.

Charles Oliver Nutter

unread,
Oct 2, 2007, 12:22:59 PM10/2/07
to jvm-la...@googlegroups.com
John Wilson wrote:
> I wonder how feasible it would be to define a standard AST as well?
> I'm thinking in terms of a set of interfaces/abstract classes which
> represent a class. Each language implementation would
> implement/subclass these to produce a concrete AST (a rather confusing
> term I'm afraid).

Unfortunately, this ignores one key point: my Ruby-compiled code, for
example, would never be representable in anything resembling a Java AST.
In fact, it's practically impossible to decompile it even now, and I
suspect we're going to see that's the case in many other compiled
non-Java languages on the JVM. Microsoft's DLR has made attempts to have
a common AST structure (or as they call it, an abstract semantic tree),
but from what I understand only IronPython is using it, with IronRuby
forced to use the AST coming out of a YACC/Bison-based parser. A common
AST would be impressive, but I think it's probably almost impossible to
find something we can all use to represent our languages.

Of course I'm also a little biased at this point; I've sadly gotten to
the point where I can emit sequences of 25-50 raw bytecodes and have
them work the first time, so I'm starting to lose touch with a
non-bytecode way of life :)

- Charlie

Matt Fowles

unread,
Nov 2, 2007, 1:51:28 PM11/2/07
to jvm-la...@googlegroups.com
Charles~

I think such an AST would be both useful and applicable. For example,
the compiler that my company writes can emit Java code which we can
then send to javac. The serialization followed by re-parsing of that code
represents a non-trivial amount of our startup time. If we could go
straight from our AST to a Java AST (which I imagine are not too
different), we could save a lot of I/O.

Also, an AST with the rough semantics of Java bytecode (but without
arbitrary limits) would be a reasonable candidate as well.
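
A rough sketch of what a few node types for such a model might look like --
the names and fields here are purely illustrative, not a proposal:

import java.util.ArrayList;
import java.util.List;

class AstClass {
    String name;
    String superName;
    List<AstMethod> methods = new ArrayList<AstMethod>();
}

class AstMethod {
    String name;
    List<AstInsn> body = new ArrayList<AstInsn>(); // no 64KB method-size ceiling here
}

class AstInsn {
    String op;         // e.g. "load", "call", "return"
    Object[] operands;
}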

Matt
