Ok...I'm firing a few shots across the bow of the Java machine here, and I'd love to hear all input. But I wanted to do a retrospective of a number of things I've learned during the implementation of the JRuby compiler.
#1. We need lightweight method objects
In JRuby, where methods can be defined and discarded on a whim, we have to be able to generate lightweight objects on demand that can be garbage collected as easily as any other object. In our case, this means that the JRuby JIT must generate single-method classes at runtime in their own GCable classloaders. So if we JIT a thousand methods, that's a thousand tiny classes in a thousand (not-so-tiny) classloaders. It works great in practice, except for two details:
- classloaders are not cheap to spin up (I'd even say "expensive") - classloaders take up too much memory
The situation is aggravated by our desire to do more and more code generation:
- generating direct-invoking stubs to bind methods into named slots on the MOP - generating call adapters that make use of inline caching information to optimize multiple fast paths - generating switch-based "fast dispatchers" on demand that can omit dynamic method lookup entirely - generating type and arity-specific call interfaces throughout the call chain to allow more specificity in dispatch
The more we choose to generate, the more the expensive cost of a classloader-per-generated-class becomes painful. What we really must have sooner rather than later is a way to define lightweight, GCable autonomous methods that we can loosely bind together and not have to bend over backwards to manage and dereference. We need GCable classes, and better yet we need super-lightweight classes to represent autonomous methods. NEED.
#2. We need VM implementers to better-publicize what works and what doesn't when writing bare-metal JVM bytecode, as far as appropriate design patterns, performance considerations, and maybe most important of all, what NOT to do.
For JRuby, we've managed to gather bits and piece of information that have largely seemed to help. I've blogged about them, we've discussed them on these lists, and in general it seems the information is getting out. But there needs to be a place we can point to official information about how and why HotSpot and other JVMs optimize code, since as language implementers we have a much larger interest in crafting our bytecode, method lookup, and call chain in more optimal ways.
The JRuby story may be illustrative here:
JRuby has traditionally been an interpreted-only implementation. Interpreters work well and are easy to implement, but there's a key characteristic that makes them difficult to optimize on the JVM: by definition, an interpreter loop is extremely polymorphic at the call sites (I've heard the term "megamorphic" being batted around lately). Because all calls to other methods must pass through one or two spots in the interpreter, it's much more difficult for the JVM to optimize the call path.
To remedy this in the compiler, we've reduced the distance between calls to only a few methods. A call site first calls into a generic piece of code that all call sites use. That call site, then immediately calls back into a piece of generated code. So the distance between unique code segments is only a couple calls, and HotSpot is able to recognize that. We hope to improve the situation by generating even the call site adapters, allowing the call path to contain almost entirely monomorphic calls (if we can get around the problems in #1 above).
However learning that this is a good way to do things has only come from bits and pieces of discussions. There needs to be a document somewhere.
#3. Bytecode generation is a lot easier than I expected, even with verification errors and the like. However, there should be a bytecode generation library as part of Java proper, along the lines of ASM.
I don't like that we have to ship ASM. This seems like something that should just "be there" already. CLR has a library for generating IL as part of the core library, and so should the JVM.
I'd also like to see some attention paid to the usability of raw bytecode generation libraries. For example, the following code:
The "cg" object basically takes a type and returns various structures for ASM; I know there's a Type class that does something similar already in ASM. However, by using a wrapper, the above code becomes the following:
And yes, there's a few utility classes in ASM that provide a similar interface, but unfortunately they also start to introduce more Javaisms and hide more of the actual generation process.
By wrapping the above interface in Ruby, we can even end up with something akin to an assembly source file (and I plan to create such a Ruby library based on ASM very soon):
And yes, I know there are JVM bytecode assemblers out there that could do this as well, but they're not as imperative/programmatic as an internal DSL could be.
I guess the bottom line here is that I'd love to see JVM bytecode officially exposed as a language in its own right, with a standard format, a standard built-in library for generating/assembling it, and all the tools folks doing native processor assembly have come to expect.
....
I'll cut it off here and make another list as thoughts form, but I'd love to hear discussions about all these points.
On 10/1/07, Charles Oliver Nutter <charles.nut...@sun.com> wrote:
> In JRuby, where methods can be defined and discarded on a whim, we have > to be able to generate lightweight objects on demand that can be garbage > collected as easily as any other object. In our case, this means that > the JRuby JIT must generate single-method classes at runtime in their > own GCable classloaders.
It took me quite a while to figure out what "G Cable" meant.
> an interpreter loop is extremely polymorphic at the call > sites (I've heard the term "megamorphic" being batted around lately).
I think the Self people invented that term, but it's obviously wrong: a monomorphic call has one form, a polymorphic call has many forms, and a megamorphic call has a big form? I think not.
I think "polymorphic" should be used as the opposite of "monomorphic" in general. If there are only a few possible types, then "oligomorphic" is the obvious word. That leaves us needing a word for "many types". A guy I knew who was a recycled teacher of classics wrote to me thus:
# One prefix that suggests itself is 'perisso-' from Greek 'perissos' = # beyond the regular number, prodigious, excessive. It is found in the # English words 'perissology' (verbiage, pleonasm) and 'perissosyllabic' # (having an additional syllable). # # On the downside, the Greek word was used as a technical arithmetic term to # mean "odd" (as opposed to "even") and occurs with this meaning in the # English adjective 'perissodactyl' (having an odd number of toes). To # someone who knew this term, 'perissomorphic' might, I guess, suggest # 'having an odd number of methods'. # # I rather like 'perissomorphic'; but if the possible ambiguity in meaning is # not welcome, why not resurrect, so to speak, the ancient prefix 'pletho-' # (<-- ple:thos [neut] = multitude, large number), found in the ancient Greek # compounds 'ple:thokhoros' = a gathering of a large number of dancers :) # # The word 'plethora' is not uncommon and makes an obvious connexion with the # prefix. So if you don't like 'perissomorphic', maybe 'plethomorphic' would # do?
I favor "plethomorphic". So on this principle we have monomorphic, oligomorphic, and plethomorphic call sites.
> To remedy this in the compiler, we've reduced the distance between calls > to only a few methods. A call site first calls into a generic piece of > code that all call sites use. That call site, then immediately calls > back into a piece of generated code.
This is very like indirect threaded code in Forth.
> I guess the bottom line here is that I'd love to see JVM bytecode > officially exposed as a language in its own right, with a standard > format, a standard built-in library for generating/assembling it, and > all the tools folks doing native processor assembly have come to expect.
I love this group, it gives the JVM language community a forum to come together, share ideas, and solve problems that face all of us. Now that the JVM has been freed, isn't it only logical that the next step is to contribute code? We have some heavy hitters on this list and some genuine genius, we could get a lot accomplished. Why not start an initiative that focuses on solving the sort of problems Charlie presented? We would all benefit from something like that.
On 10/1/07, John Cowan <johnwco...@gmail.com> wrote:
> On 10/1/07, Charles Oliver Nutter <charles.nut...@sun.com> wrote:
> > In JRuby, where methods can be defined and discarded on a whim, we have > > to be able to generate lightweight objects on demand that can be garbage > > collected as easily as any other object. In our case, this means that > > the JRuby JIT must generate single-method classes at runtime in their > > own GCable classloaders.
> It took me quite a while to figure out what "G Cable" meant.
> > an interpreter loop is extremely polymorphic at the call > > sites (I've heard the term "megamorphic" being batted around lately).
> I think the Self people invented that term, but it's obviously wrong: a > monomorphic call has one form, a polymorphic call has many forms, > and a megamorphic call has a big form? I think not.
> I think "polymorphic" should be used as the opposite of "monomorphic" > in general. If there are only a few possible types, then "oligomorphic" > is the obvious word. That leaves us needing a word for "many types". > A guy I knew who was a recycled teacher of classics wrote to me thus:
> # One prefix that suggests itself is 'perisso-' from Greek 'perissos' = > # beyond the regular number, prodigious, excessive. It is found in the > # English words 'perissology' (verbiage, pleonasm) and 'perissosyllabic' > # (having an additional syllable). > # > # On the downside, the Greek word was used as a technical arithmetic term to > # mean "odd" (as opposed to "even") and occurs with this meaning in the > # English adjective 'perissodactyl' (having an odd number of toes). To > # someone who knew this term, 'perissomorphic' might, I guess, suggest > # 'having an odd number of methods'. > # > # I rather like 'perissomorphic'; but if the possible ambiguity in meaning is > # not welcome, why not resurrect, so to speak, the ancient prefix 'pletho-' > # (<-- ple:thos [neut] = multitude, large number), found in the ancient Greek > # compounds 'ple:thokhoros' = a gathering of a large number of dancers :) > # > # The word 'plethora' is not uncommon and makes an obvious connexion with the > # prefix. So if you don't like 'perissomorphic', maybe 'plethomorphic' would > # do?
> I favor "plethomorphic". So on this principle we have monomorphic, > oligomorphic, and plethomorphic call sites.
> > To remedy this in the compiler, we've reduced the distance between calls > > to only a few methods. A call site first calls into a generic piece of > > code that all call sites use. That call site, then immediately calls > > back into a piece of generated code.
> This is very like indirect threaded code in Forth.
> > I guess the bottom line here is that I'd love to see JVM bytecode > > officially exposed as a language in its own right, with a standard > > format, a standard built-in library for generating/assembling it, and > > all the tools folks doing native processor assembly have come to expect.
On Monday 01 October 2007 14:52, Charles Oliver Nutter wrote:
> Ok...I'm firing a few shots across the bow of the Java machine here, > and I'd love to hear all input. But I wanted to do a retrospective of > a number of things I've learned during the implementation of the > JRuby compiler.
In all likelihood you're already aware of this, but Martin Odersky, the chief principal of the Scala language development effort has some distinct ideas about how to improve the JVM for the next generation of programming languages.
If you have not done so, you should compare notes with him!
> own GCable classloaders. So if we JIT a thousand methods, that's a > thousand tiny classes in a thousand (not-so-tiny) classloaders. It works > great in practice, except for two details:
> - classloaders are not cheap to spin up (I'd even say "expensive") > - classloaders take up too much memory
Don't know if this idea would work:
Let's say I'd like to have N generated classes per a class loader (rather than 1 by 1). I've got CL1 = [C1, C2, C3, C4, C5, C6]. After that I need to keep 2 classes C5, C6. I redefine C5, C6 again in a new class loader CL2, then I later do GC CL1.
Do you think this can work in practice ?
Cheers,
Chanwit
-- Chanwit Kaewkasi PhD Student, Centre for Novel Computing School of Computer Science The University of Manchester Oxford Road Manchester M13 9PL, UK
Daniel Green wrote: > I love this group, it gives the JVM language community a forum to come > together, share ideas, and solve problems that face all of us. Now > that the JVM has been freed, isn't it only logical that the next step > is to contribute code? We have some heavy hitters on this list and > some genuine genius, we could get a lot accomplished. Why not start an > initiative that focuses on solving the sort of problems Charlie > presented? We would all benefit from something like that.
I'm hoping to have more time to start pulling out bits of JRuby into this common location, and encouraging others to do the same. There's already a bit there from Jython, a package cache for things like runtime import xxx.*
>> own GCable classloaders. So if we JIT a thousand methods, that's a >> thousand tiny classes in a thousand (not-so-tiny) classloaders. It works >> great in practice, except for two details:
>> - classloaders are not cheap to spin up (I'd even say "expensive") >> - classloaders take up too much memory
> Don't know if this idea would work:
> Let's say I'd like to have N generated classes per a class loader > (rather than 1 by 1). > I've got CL1 = [C1, C2, C3, C4, C5, C6]. > After that I need to keep 2 classes C5, C6. > I redefine C5, C6 again in a new class loader CL2, then I later do GC CL1.
> Do you think this can work in practice ?
I think it could certainly work, but it could be quite a bit of churn, not to mention having to either regenerate C5 and C6 or hold onto the raw bytecode that represents them. It's a bit of a hack...but I think it could reduce the number of classloaders necessary.
I guess the other tricky bit is knowing which classes aren't being referenced anymore. I shouldn't have to try to track that myself.
1) Bytecode generation: since there are already a few libraries that target this area (ASM, BCEL, Jasmin) it seems ripe for standardization. The trick would be getting enough time from the contributors to run an expert group. But sounds like a great idea.
2) Lightweight method instances: are you proposing a Hotspot optimization or a change to the JVM spec? I assume the former, since I'm not sure what that would look like in the spec, e.g. what exactly you would mandate.
3) It seems one generally tricky issue is how to talk about optimization in this context without making assumptions about which JVM you're running on, which I thought was something we were aiming for, e.g. it seems a little iffy to depend on optimizations which are only present in Hotspot or IBMs JVM or some other. How will your code run in another type of JIT, like CACAO? How do we reason about optimizations across JVMs? Understanding, of course, that you (Charles) work for Sun.
4) Re: bytecode generation, Jasmin looks cool, http://jasmin.sourceforge.net/ "Jasmin is an assembler for the Java Virtual Machine. It takes ASCII descriptions of Java classes, written in a simple assembler-like syntax using the Java Virtual Machine instruction set. It converts them into binary Java class files, suitable for loading by a Java runtime system."
Patrick Wright wrote: > 1) Bytecode generation: since there are already a few libraries that > target this area (ASM, BCEL, Jasmin) it seems ripe for > standardization. The trick would be getting enough time from the > contributors to run an expert group. But sounds like a great idea.
Jasmin is (primarily?) an assembler, which I don't think you want to use.
BCEL seems really inefficient for code generation. It creates an Object for each instruction, which is pretty heavy-weight. (OTOH this can be useful for analysis and peephole optimization.)
ASM from I've seen of it seems quite elegant - I rather like its use of the visitor pattern. However, I haven't used it myself.
I'd suggest starting with ASM, but maybe adding some of the features and conveniences of gnu.bytecode and other toolkits. -- --Per Bothner p...@bothner.com http://per.bothner.com/