Fwd: Project proposal: Multi-Language VM


Patrick Wright

Oct 10, 2007, 2:57:42 AM
to jvm-la...@googlegroups.com
Forwarded from the OpenJDK announce mailing list. I didn't want to
steal John's thunder (since he'll probably post here as well) but this
is so cool I thought others on the list would want to hear about it
ASAP.


---------- Forwarded message ----------
From: John Rose <John...@sun.com>
Date: Oct 9, 2007 11:27 PM
Subject: Project proposal: Multi-Language VM
To: anno...@openjdk.java.net


Hello world. I propose a new OpenJDK project[1],
the Multi-Language VM, to be abbreviated "mlvm",
and to be sponsored by the HotSpot group[2].

This project will be open for prototyping
JVM features aimed at efficiently supporting
languages other than Java.

The emphasis will be on completing the existing
bytecode and execution architecture with general
purpose extensions, as opposed to a new feature
for just one language, or adjoining an unrelated
new execution model.

The emphasis will also be on work which removes
"pain points" already observed by implementors
of successful or influential languages, as opposed
to more speculative work on unproven features or
niche languages.

Virtual machines produced by this project will
be standards-conforming, in that they will not change
the meaning or behavior of existing Java classes
and classfile formats. They may define variations
or extensions of the class format, or new kinds of
objects, whose meaning and behavior are beyond
the scope of current Java and JVM specifications.

However, these extended codes and data structures
will interoperate as much as possible with Java objects.

In addition, as a way of delimiting separate prototyping
efforts, each new feature will come with a switch which
turns it off, and that switch will be "off" by default.
This is the approach used in the Kitchen Sink Language
project.[3]

This proposal refines and completes a partial proposal
I sent earlier this year to the HotSpot project,
a proposal for a "Kitchen Sink VM"[4]. The present
proposal is more specifically directed at supporting
new languages (i.e., those languages which are
new to the JVM).

Here are some examples of features that could be
prototyped in this project, if developers were found
who are willing and able:

- tail calls and tail recursion [5]
- continuations and coroutines [6]
- tuples and value-oriented types [7]
- lightweight method objects [8]
- runtime support for closures [9]
- invokedynamic [10]

Prototyping for JSR 292[11] is likely to occur as
a part of this project. Note that none of the above
suggested features is specific to any single language.

As the current OpenJDK Project guidelines request,
please send followups to the discussion list.[12]

Thanks very much for your attention to this matter,

-- John Rose
http://blogs.sun.com/jrose/

[1] http://openjdk.java.net/projects/
[2] http://openjdk.java.net/groups/hotspot/
[3] http://ksl.dev.java.net/ (Kitchen Sink Language)
[4] http://mail.openjdk.java.net/pipermail/hotspot-dev/2007-July/000091.html (Kitchen Sink VM)
[5] http://blogs.sun.com/jrose/entry/tail_calls_in_the_vm
[6] http://lambda-the-ultimate.org/node/1002 (Continuations for Java)
[7] http://blogs.sun.com/jrose/entry/tuples_in_the_vm
[8] http://groups.google.com/group/jvm-languages/t/dbc3a4a382868904
(Lightweight Methods)
[9] http://www.javac.info/ (Java Closures)
[10] http://groups.google.com/group/jvm-languages/web/implementation-of-multimethods-in-jvm-languages
[11] http://jcp.org/en/jsr/detail?id=292#2 (Original JSR 292 request)
[12] http://mail.openjdk.java.net/mailman/listinfo/discuss

John Rose

Oct 10, 2007, 3:24:25 PM
to jvm-la...@googlegroups.com, Patrick Wright
Thanks, Patrick.

Many of the ideas floated on this list deserve to be
prototyped on the OpenJDK mlvm.

If I had a decade's leisure, I'd try them all personally.
Right now I have time mainly for dynamic invoke work,
treating neighboring ideas (like other invocation features)
as targets of opportunity. Stay tuned...

If you want to get into the sources, get used to Mercurial.
The mlvm will be a Mercurial workspace.

Best,
-- John

P.S. The shift of OpenJDK to Mercurial is happening as we speak:
http://weblogs.java.net/blog/kellyohair/archive/2007/10/openjdk_mercuri_4.html

This is an amazing transition for us at Sun. We've been using
SCCS/Teamware since the beginning of time, and now we're
switching to an open-source, dynamic-language-based SCM
system. I've been a Teamware fan for a long time, and I'm glad
there's finally a modern replacement for it. (The key property
of both systems is fully symmetric distribution; you can do all
your code management disconnected from the local and/or
global nets. There is no special server that claims to know
everything. When you do a pull you get all the bits.)

Neal Gafter

Oct 10, 2007, 7:18:23 PM
to jvm-la...@googlegroups.com
My pet peeve is arithmetic. In the JVM, you either get efficiency and
arithmetic modulo some power of 2 (which is incorrect for most
applications) or you get inefficiency. A number of the Puzzles in the
Java Puzzlers book are based on this unfortunate feature of the Java
language that is baked into the JVM. In the pre-Java days, dynamic
languages had efficient arbitrary precision arithmetic (typically
using tagging or exceptions on overflow). I feel we've moved
backwards in this respect. It would be a very good thing if our
favorite VM allowed us to have the best of both worlds as in days of
old.

Patrick Wright

Oct 11, 2007, 12:27:18 AM
to jvm-la...@googlegroups.com
On 10/11/07, Neal Gafter <neal....@gmail.com> wrote:
>
> My pet peeve is arithmetic. In the JVM, you either get efficiency and

Neal

Would the work from the Java Grande group apply, or is there other
relevant research?

Thanks
Patrick

Neal Gafter

Oct 11, 2007, 2:53:41 AM
to jvm-la...@googlegroups.com

I'm not aware of any work on improving the performance/correctness
tradeoff of integral arithmetic on the Java platform. If you're
generating byte-code for the Java VM, you have to choose between good
performance but limited precision with silent overflow or poor
performance but flexible precision. The general problem was "solved"
long ago, including in the StrongTalk system that was a predecessor to
HotSpot. Unfortunately those results have not been applied to the Java
platform.
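
To make the tradeoff concrete, here is a minimal sketch of the two choices a bytecode generator has today. The class and method names are illustrative and do not come from the thread.

```java
import java.math.BigInteger;

final class OverflowTradeoff {
    // Fast choice: primitive int arithmetic, which wraps silently modulo 2^32.
    static int fastAdd(int a, int b) {
        return a + b;
    }

    // Precise choice: BigInteger never overflows, but every operation
    // goes through heap-allocated objects.
    static BigInteger preciseAdd(int a, int b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(fastAdd(Integer.MAX_VALUE, 1));    // prints -2147483648
        System.out.println(preciseAdd(Integer.MAX_VALUE, 1)); // prints 2147483648
    }
}
```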

Attila Szegedi

Oct 12, 2007, 5:49:59 AM
to jvm-la...@googlegroups.com
Wow.

Continuations!
Lightweight methods!

You have me hooked. I guess lightweight methods will get JRuby folks
hooked as well really fast :-)

As for multilinguality in a managed runtime -- at JAOO this year, Charlie
Nutter, Erik Meijer, and myself were interviewed by Kresten Thorup about
dynamic languages on managed runtimes, and one of the ideas that came up
was that (analogously to how today you have class loader mechanisms to
load executable content dynamically into the runtime) it would be great to
have a runtime where you can go one level lower and actually load
metaobject protocols into the runtime dynamically. I don't know yet
whether this makes sense (it feels to me it does), but is definitely
something to think about.

Of course, there's always the danger of losing focus if the goal is too
broad; see history of Parrot for terrifying examples. Although I'm
optimistic -- building on a foundation of the HotSpot JVM architecture
gets you immediately past lots of project infancy problems.

Attila.

--
home: http://www.szegedi.org
weblog: http://constc.blogspot.com


James Abley

Oct 12, 2007, 11:26:43 AM
to jvm-la...@googlegroups.com

Any other cool features from StrongTalk that haven't made it into the
JVM yet? StrongTalk, from what I've been able to find about it (and of
that, the stuff that doesn't go completely over my head), sounds
awesome.

James

John Rose

Oct 13, 2007, 1:34:18 AM
to jvm-la...@googlegroups.com
On Oct 12, 2007, at 2:49 AM, Attila Szegedi wrote:

> it would be great to
> have a runtime where you can go one level lower and actually load
> metaobject protocols into the runtime dynamically. I don't know yet
> whether this makes sense (it feels to me it does), but is definitely
> something to think about.

There are projects to soften up the JVM's object architecture,
to give it a MOP, but that sort of thing probably requires
a redesign from the ground up.

With HotSpot, I hope to make the bytecode set more flexible,
so that it can be used as a type-safe, GC-able assembly language,
and then build MOPs on top.

But (thinking here...) there might be MOP-like capabilities we could
push downward into the current hardwired runtime. For example,
it might be reasonable, given method handles, to have a VM-level
operation for creating a class and populating its vtable with closures.
(I once wrote a Scheme system which did this to C++ base classes.)

But that's probably secondary, compared with a primary feature of
method handles. (With purely structural (signature-based) calling
sequence and provision for making closures, of course.)
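
For readers arriving later: JSR 292 eventually shipped in Java 7 as the `java.lang.invoke` package. The sketch below uses that later API, not anything that existed when this was written, to show the two properties named above: a handle is typed purely by its signature, and binding an argument yields a closure-like handle. The class name and the `greet` method are made up for illustration.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class MethodHandleSketch {
    static String greet(String name) {
        return "hello, " + name;
    }

    // Looks up greet by name plus signature; the handle's type is (String)String.
    static MethodHandle greeter() throws ReflectiveOperationException {
        MethodType signature = MethodType.methodType(String.class, String.class);
        return MethodHandles.lookup().findStatic(MethodHandleSketch.class, "greet", signature);
    }

    // Binding the leading argument gives a closure-like handle of type ()String.
    static MethodHandle closureOver(String name) throws ReflectiveOperationException {
        return MethodHandles.insertArguments(greeter(), 0, name);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle closure = closureOver("mlvm");
        System.out.println(closure.type());
        System.out.println((String) closure.invokeExact());
    }
}
```

Note that `invokeExact` checks the call site's signature against the handle's type, which is the signature-based check described in the next paragraph.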

The fast out-of-line dynamic calling sequence in HotSpot is called
a monomorphic inline cache. It involves having the caller pass a
token (expected callee class) which the callee checks quickly on
method entry. (This is called the "verified entry point", or VEP,
as opposed to the "unverified entry point", or UEP.)
Calling a method handle probably requires a similar check,
of a receiver signature rather than a receiver class.
(After all, closures are classified by their signatures.)
Introducing such a calling sequence into the JVM would
be very profitable.
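
A rough Java model of a monomorphic inline cache may help. This is only a sketch: in HotSpot the check is machine code at the callee's entry point, whereas here it is done at the call site for simplicity, and every name below (`MonomorphicCallSite`, `Resolver`, `misses`) is hypothetical.

```java
import java.util.function.Function;

final class MonomorphicCallSite {
    // Slow path: full method lookup for a receiver class.
    interface Resolver {
        Function<Object, String> lookup(Class<?> receiverClass);
    }

    private final Resolver resolver;
    private Class<?> expectedClass;           // the "token" checked on the fast path
    private Function<Object, String> target;  // callee cached for expectedClass
    int misses;                               // slow-path count, kept for illustration

    MonomorphicCallSite(Resolver resolver) {
        this.resolver = resolver;
    }

    String call(Object receiver) {
        Class<?> actual = receiver.getClass();
        if (actual != expectedClass) {        // one comparison guards the cached target
            misses++;
            target = resolver.lookup(actual); // re-resolve and re-point the cache
            expectedClass = actual;
        }
        return target.apply(receiver);
    }

    public static void main(String[] args) {
        MonomorphicCallSite site =
                new MonomorphicCallSite(cls -> obj -> cls.getSimpleName() + ":" + obj);
        System.out.println(site.call("a"));
        System.out.println(site.call("b"));   // same class, served from the cache
        System.out.println(site.misses);      // prints 1
    }
}
```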

I'll post more on method handles when I get some blog time.

> Of course, there's always the danger of losing focus if the goal is too
> broad; see history of Parrot for terrifying examples.

Yes. That initial list of ideas, if fully explored, would cost
decades of work. (And that's just my own pick of favorites.)
So there is (as always) a need to choose the most profitable
projects first. I'm working on dynamic invoke, and I agree that
some sort of method handles are also a big need.

You mentioned continuations. I have a low-level design
and a rough implementation sketch for the suspension,
pickling, and resumption of coroutines, but (short on time
and expertise) I left out the all-important part of resumption,
which HotSpot would call on-stack replacement.

HotSpot does perform several kinds of stack frame editing.
That code is not easy to extend. (Surprised?) It would be
lovely to have (someday) a flexible system ("MsfP") for
stack frame management, so we could do things like
reoptimization, evacuation to the heap on overflow,
or work stealing directly from the execution stack.

> Although I'm
> optimistic -- building on a foundation of the HotSpot JVM architecture
> gets you immediately past lots of project infancy problems.

Yes, that's the really appealing thing to me about this.
We've spent 10 years working on this platform,
investing in Java, and now it's time to make that
investment pay off for the newer crop of languages.

One big investment, BTW, is the JITs.
The server compiler is not Java-specific at all.
It has a (now-classic) SSA sea of low-level nodes.
As a good citizen of the JVM, it knows how to leverage
type profiles and invariants provided by the object architecture.
But it's really a pretty general purpose compiler.
(Recently we added some vectorization optimizations.)

-- John

Miles Sabin

Oct 13, 2007, 3:59:02 AM
to jvm-la...@googlegroups.com
John Rose wrote,

> The fast out-of-line dynamic calling sequence in HotSpot is called
> a monomorphic inline cache.

Google, CiteSeer and I know about *poly*morphic inline caching, but not
about *mono*morphic inline caching: any chance of a quick sketch of the
contrast and a pointer to the literature?

Cheers,


Miles

Miles Sabin

Oct 13, 2007, 6:10:24 AM
to jvm-la...@googlegroups.com
I wrote,

> Google, CiteSeer and I know about *poly*morphic inline caching, but
> not about *mono*morphic inline caching: any chance of a quick sketch
> of the contrast and a pointer to the literature?

Actually, google does seem to know about it, and it's just the obvious
restriction to an inline cache for exactly one <class, method> pair.

Apologies for the noise.

Cheers,


Miles

John Cowan

Oct 13, 2007, 1:03:28 PM
to jvm-la...@googlegroups.com
On 10/13/07, Miles Sabin <mi...@milessabin.com> wrote:

> Actually, google does seem to know about it, and it's just the obvious
> restriction to an inline cache for exactly one <class, method> pair.

Quite so, and the idea is that on highly pipelined modern CPUs,
you don't want to pay the price for more than a single conditional
branch.

--
GMail doesn't have rotating .sigs, but you can see mine at
http://www.ccil.org/~cowan/signatures

Rémi Forax

Oct 13, 2007, 1:46:56 PM
to John...@sun.com, jvm-la...@googlegroups.com
John Rose wrote:
and reified generics (generics at runtime) [11]

[11]
http://www.weiqigao.com/blog/2007/01/20/java_generics_let_the_other_shoe_drop.html


>
> Prototyping for JSR 292[11] is likely to occur as
> a part of this project. Note that none of the above
> suggested features is specific to any single language.
>
> As the current OpenJDK Project guidelines request,
> please send followups to the discussion list.[12]
>
> Thanks very much for your attention to this matter,
>
> -- John Rose
> http://blogs.sun.com/jrose/
>
> [1] http://openjdk.java.net/projects/
> [2] http://openjdk.java.net/groups/hotspot/
> [3] http://ksl.dev.java.net/ (Kitchen Sink Language)
> [4]
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2007-July/000091.html
> (Kitchen Sink VM)
> [5] http://blogs.sun.com/jrose/entry/tail_calls_in_the_vm
> [6] http://lambda-the-ultimate.org/node/1002 (Continuations for Java)
> [7] http://blogs.sun.com/jrose/entry/tuples_in_the_vm
> [8] http://groups.google.com/group/jvm-languages/t/dbc3a4a382868904
> (Lightweight Methods)
> [9] http://www.javac.info/ (Java Closures)
> [10]
> http://groups.google.com/group/jvm-languages/web/implementation-of-multimethods-in-jvm-languages
>
> [11] http://jcp.org/en/jsr/detail?id=292#2 (Original JSR 292 request)
> [12] http://mail.openjdk.java.net/mailman/listinfo/discuss

cheers,
Rémi

John Rose

Oct 13, 2007, 3:59:22 PM
to Rémi Forax, jvm-la...@googlegroups.com
On Oct 13, 2007, at 10:46 AM, Rémi Forax wrote:

and reified generics (generics at runtime) [11]


Absolutely.  That's about where I gave up lengthening the list,
because I didn't have a reference handy, and I'm not sure what
the cost will be, and the list seemed long enough for starters.

Thanks for the reference!

Here's another I left off, lacking a reference:
 - interface injection (for languages with traits)

The idea there is to allow an interface J to be marked "injectable".
There is a slow path for asking whether class X implements interface J.
Have some sort of up-call for the first time anybody asks the question
for a particular pair (X, J). The up-call does trait matching or whatever,
and returns a permanent answer.  If the answer is "yes", then the up-call
also returns any necessary adapter methods (assuming X does not
already happen to implement the exact methods required by J).
I believe this would help the implementation of Scala, Fortress, etc.
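
The caching part of that idea can be sketched in plain Java. Everything here is hypothetical (`InjectionCache`, `HasSize`, the structural matcher used as the up-call); a real VM feature would also return adapter methods, which this sketch omits.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

final class InjectionCache {
    interface HasSize { int size(); }          // stands in for an "injectable" interface J

    private final BiPredicate<Class<?>, Class<?>> upCall;   // asked once per (X, J)
    private final Map<List<Class<?>>, Boolean> answers = new HashMap<>();
    int upCalls;                               // counts slow-path up-calls

    InjectionCache(BiPredicate<Class<?>, Class<?>> upCall) {
        this.upCall = upCall;
    }

    boolean implementsInjected(Class<?> x, Class<?> j) {
        List<Class<?>> key = Arrays.asList(x, j);
        Boolean cached = answers.get(key);
        if (cached == null) {                  // first question for this pair
            upCalls++;
            cached = upCall.test(x, j);
            answers.put(key, cached);          // the answer is permanent
        }
        return cached;
    }

    // Example up-call: X matches if it has a public method for each of J's
    // methods (by name and parameter types; return types are not checked).
    static boolean structurallyMatches(Class<?> x, Class<?> j) {
        for (Method required : j.getMethods()) {
            try {
                x.getMethod(required.getName(), required.getParameterTypes());
            } catch (NoSuchMethodException e) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        InjectionCache cache = new InjectionCache(InjectionCache::structurallyMatches);
        System.out.println(cache.implementsInjected(java.util.ArrayList.class, HasSize.class));
        System.out.println(cache.implementsInjected(String.class, HasSize.class));
    }
}
```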

As a related more obvious one, schema-preserving method injection
would help.  The idea there is to allow new methods to be adjoined,
but without changing existing override relationships.  This would
make it much easier to build adapters on pre-existing classes,
and would cleanly cover nasty cases like the ones mentioned
by John Wilson on this list.  (Q: How do you get a reflective
super.m()?  A: You don't, if the class has already been loaded.
A2: You adjoin an adapter method with an invokespecial.)
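
As an illustration of the A2 answer, the sketch below shows an adapter method that performs the non-virtual super call, plus the same call obtained reflectively through `MethodHandles.Lookup.findSpecial` from the `java.lang.invoke` API that later shipped in Java 7. The adapter here is written into the subclass's source rather than injected after loading, which is exactly the limitation method injection would lift. Class names are illustrative.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class SuperCallSketch {
    static class Base {
        String m() { return "Base.m"; }
    }

    static final class Derived extends Base {
        @Override
        String m() { return "Derived.m"; }

        // Adapter: javac compiles super.m() to an invokespecial of Base.m.
        String superM() { return super.m(); }

        // Reflective equivalent; findSpecial only succeeds when the lookup
        // is made from Derived itself (the "special caller").
        static MethodHandle superMHandle() throws ReflectiveOperationException {
            return MethodHandles.lookup().findSpecial(
                    Base.class, "m", MethodType.methodType(String.class), Derived.class);
        }
    }

    public static void main(String[] args) throws Throwable {
        Derived d = new Derived();
        System.out.println(d.m());        // prints Derived.m
        System.out.println(d.superM());   // prints Base.m
        System.out.println((String) Derived.superMHandle().invoke(d));
    }
}
```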

Best,
-- John

Brian Goetz

Oct 14, 2007, 6:58:15 PM
to jvm-la...@googlegroups.com
> I'm not aware of any work on improving the performance/correctness
> tradeoff of integral arithmetic on the Java platform. If you're
> generating byte-code for the Java VM, you have to choose between good
> performance but limited precision with silent overflow or poor
> performance but flexible precision. The general problem was "solved"
> long ago, including in the StrongTalk system that was a predecessor to
> HotSpot. Unfortunately those results have not been applied to the Java
> platform.

Neal, could you sketch out the correct general solution, for those of us
not familiar with Strongtalk?

Neal Gafter

Oct 14, 2007, 8:20:21 PM
to jvm-la...@googlegroups.com

There are many possible solutions, depending on your hardware architecture.  One solution is to compile to arithmetic instructions that trap on overflow; on overflow JIT-rewrite the code to use extended precision.
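
A library-level approximation of that strategy, for illustration only: `Math.addExact` (added to the JDK in Java 8, long after this thread) throws on overflow, and the handler falls back to extended precision. This imitates the shape of trap-then-promote; it is not a JIT rewrite, and the class name is made up.

```java
import java.math.BigInteger;

final class PromotingAdd {
    // Returns a Long while the sum fits in 64 bits, otherwise a BigInteger.
    static Number add(long a, long b) {
        try {
            return Math.addExact(a, b);            // fast path, throws on overflow
        } catch (ArithmeticException overflow) {
            // Slow path: redo the operation in extended precision.
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
    }

    public static void main(String[] args) {
        System.out.println(add(1, 2));               // prints 3
        System.out.println(add(Long.MAX_VALUE, 1));  // prints 9223372036854775808
    }
}
```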

Rémi Forax

Oct 15, 2007, 2:57:02 PM
to John Rose, jvm-la...@googlegroups.com
John Rose wrote:

> On Oct 13, 2007, at 10:46 AM, Rémi Forax wrote:
>
>> and reified generics (generics at runtime) [11]
>>
>> [11]
>> http://www.weiqigao.com/blog/2007/01/20/java_generics_let_the_other_shoe_drop.html
>
> Absolutely. That's about where I gave up lengthening the list,
> because I didn't have a reference handy, and I'm not sure what
> the cost will be, and the list seemed long enough for starters.
about the cost and how to implement that, this paper gives some insights:
http://portal.acm.org/citation.cfm?id=1244286
>
> Thanks for the reference!
>
...
>
> Best,
> -- John
Rémi

John Rose

Oct 15, 2007, 4:55:40 PM
to jvm-la...@googlegroups.com
Another great reference.

To my VM-tuned ears, it sounds like a job for "class splitting".

I.e., the JVM currently has a one-to-one relation between
bytecoded classes and the klass IDs which are stored
in object headers. This need not be the case.

I know of several possible reasons to split classes:
- saving parameters from erasure
- distinguishing between instances with different creation paths
(constructors, etc.)
- distinguishing between optimized and general-case instances
(short & long number formats, etc.)
- distinguishing between immediate and ordinary objects
- adding instance-specific methods (Ruby, etc.)
- other behavioral customizations on well-known instances like enums

The cost tradeoffs are the usual balance between copying hot
information and indirecting to shared information.
Sharing vtables requires some sort of extra check on method call.

In the extreme case of immediate (non-oop) pseudo-pointers,
the processing of object references is somewhat complicated
by detecting non-oop tag bits. (E.g., if a subrange of Integer
were encoded into an immediate pointer, a corresponding
klass split from Integer would make sure never to indirect the
'this' pointer and instead decode the 'value' field from a bitfield
therein.)
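
The bit-level trick can be modeled in Java by treating a `long` as a machine word. This is a sketch under assumptions, not HotSpot's actual layout: the tag position, the 63-bit payload, and all names are illustrative.

```java
final class TaggedWord {
    // Real object addresses are aligned, so their low bits are zero;
    // setting the lowest bit marks the word as an immediate integer.
    private static final long TAG_BIT = 1L;
    static final long MAX_IMMEDIATE = Long.MAX_VALUE >> 1;
    static final long MIN_IMMEDIATE = Long.MIN_VALUE >> 1;

    static boolean isImmediate(long word) {
        return (word & TAG_BIT) != 0;
    }

    // Packs a 63-bit signed value into a tagged word.
    static long encode(long value) {
        if (value < MIN_IMMEDIATE || value > MAX_IMMEDIATE) {
            throw new IllegalArgumentException("does not fit in 63 bits: " + value);
        }
        return (value << 1) | TAG_BIT;
    }

    // Decodes the value from the bitfield without dereferencing anything;
    // the arithmetic shift restores the sign.
    static long decode(long word) {
        return word >> 1;
    }

    public static void main(String[] args) {
        long word = encode(-5);
        System.out.println(isImmediate(word));    // prints true
        System.out.println(decode(word));         // prints -5
        System.out.println(isImmediate(0x1000L)); // prints false (looks like an aligned address)
    }
}
```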

These design questions need to be explored in a VM-centric way.
By that I mean what low-level general purpose API will allow
applications (or optimization packages) to organize split classes,
in such a way that use cases like the above are well-served.

-- John

P.S. "oop" comes from Smalltalk and means "ordinary object pointer",
as opposed to a primitive value bit-encoded into a pseudo-pointer.
In HotSpot, all oops have 2 or 3 low zero bits, so it's a relatively
simple matter to set one of those bits to distinguish pseudo-pointers.
On 64-bit systems the possibilities are impressive.

Patrick Wright

Oct 16, 2007, 9:23:38 AM
to jvm-la...@googlegroups.com
> about the cost and how to implement that, this paper gives some insights:
> http://portal.acm.org/citation.cfm?id=1244286

That link is for the ACM, which (although a great resource) requires a
subscription to view. I contacted Mirko Viroli, one of the authors,
and he sent me PDFs of that paper and an earlier one he worked on.
They're posted in our Files section
http://groups.google.com/group/jvm-languages/files


Regards
Patrick

Randall R Schulz

Oct 16, 2007, 9:30:41 AM
to jvm-la...@googlegroups.com
On Tuesday 16 October 2007 06:23, Patrick Wright wrote:
> > about the cost and how to implement that, this paper gives some
> > insights: http://portal.acm.org/citation.cfm?id=1244286
>
> That link is for the ACM, which (although a great resource) requires a
> subscription to view.

At $100 per year, it's a bargain for anybody with an ongoing need to
access research publications.

But my experience with other restricted publishers, say Springer or
Elsevier, is that with a little Web searching, you can often find other
copies (sometimes only pre-publication drafts) on the wild Web.


> I contacted Mirko Viroli, one of the authors,
> and he sent me PDFs of that paper and an earlier one he worked on.

That often works, too. I once got a Japanese researcher to send me a print
copy of a publication for which he had no digital counterpart!


> They're posted in our Files section
> http://groups.google.com/group/jvm-languages/files
>
>
> Regards
> Patrick


Randall Schulz

Patrick Wright

Oct 16, 2007, 9:46:04 AM
to jvm-la...@googlegroups.com
Everybody:

My bad. Immediately after posting the two files, Mirko wrote me to
tell me they were actually under ACM copyright. I checked, and they
are (it is in the footnote). I have removed the links. This was a
misunderstanding on my part in my email with Mirko.

If you downloaded these since I posted them, please observe the
copyright and do not post them on servers or distribute in similar
fashion (see copyright notice on first page, footnote).

Sorry for the trouble--I actually did a Google sweep yesterday but
could find no other copies (outside the ACM).


Patrick

Rémi Forax

Oct 16, 2007, 9:45:42 AM
to jvm-la...@googlegroups.com
Patrick Wright wrote:
Oops, sorry: my web browser 'magically' gives me access to ACM papers
when I browse from my office.
>
> Regards
> Patrick
>
Rémi

Randall R Schulz

Oct 16, 2007, 9:53:33 AM
to jvm-la...@googlegroups.com
On Tuesday 16 October 2007 06:45, Rémi Forax wrote:
> ...

>
> Oops, sorry: my web browser 'magically' gives me access to ACM papers
> when I browse from my office.

It's not your browser, it's an agreement between your university and the
ACM. Stanford has this arrangement with many publishers, including
Springer and Elsevier. From their network you can retrieve digital
versions of all their publications (all those distributed in digital
form, i.e.).

> ...
>
> Rémi


Randall Schulz

Charles Oliver Nutter

Oct 18, 2007, 1:39:03 PM
to jvm-la...@googlegroups.com
John Cowan wrote:
> On 10/13/07, Miles Sabin <mi...@milessabin.com> wrote:
>
>> Actually, google does seem to know about it, and it's just the obvious
>> restriction to an inline cache for exactly one <class, method> pair.
>
> Quite so, and the idea is that on highly pipelined modern CPUs,
> you don't want to pay the price for more than a single conditional
> branch.

JRuby's inline cache is currently monomorphic, and after removing a few
roadblocks it appears that HotSpot has really picked that up and run
with it. However I have a PIC patch hanging around that improved
polymorphic dispatch by almost 50%, and didn't appear to impact
monomorphic inline caching perf at all. We will probably put it in place
some time soon, along with a "hotness" measure to allow re-sorting the
cache occasionally.
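
For readers unfamiliar with the idea, a polymorphic inline cache with a "hotness" counter might look roughly like the following. This is a hypothetical sketch and not JRuby's code or the patch mentioned above; all names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class PolymorphicCallSite {
    private static final class Entry {
        final Class<?> type;
        final Function<Object, String> target;
        int hits = 1;                          // the call that created the entry counts
        Entry(Class<?> type, Function<Object, String> target) {
            this.type = type;
            this.target = target;
        }
    }

    private final Function<Class<?>, Function<Object, String>> resolver; // slow lookup
    private final List<Entry> entries = new ArrayList<>();
    private final int capacity;

    PolymorphicCallSite(Function<Class<?>, Function<Object, String>> resolver, int capacity) {
        this.resolver = resolver;
        this.capacity = capacity;
    }

    String call(Object receiver) {
        Class<?> actual = receiver.getClass();
        for (Entry e : entries) {              // checked in order, so order matters
            if (e.type == actual) {
                e.hits++;
                return e.target.apply(receiver);
            }
        }
        Function<Object, String> target = resolver.apply(actual);
        if (entries.size() < capacity) {
            entries.add(new Entry(actual, target));
        }
        return target.apply(receiver);
    }

    // Occasional re-sort: move the hottest receiver types to the front.
    void resort() {
        entries.sort((a, b) -> Integer.compare(b.hits, a.hits));
    }

    Class<?> firstCached() {
        return entries.isEmpty() ? null : entries.get(0).type;
    }

    public static void main(String[] args) {
        PolymorphicCallSite site =
                new PolymorphicCallSite(cls -> obj -> cls.getSimpleName() + ":" + obj, 4);
        site.call("a");
        for (int i = 0; i < 3; i++) site.call(i);
        site.resort();
        System.out.println(site.firstCached().getSimpleName()); // prints Integer
    }
}
```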

- Charlie
