Common compiler infrastructure?


Charles Oliver Nutter

Dec 17, 2009, 11:50:20 AM
to JVM Languages
Ok, I think it's time we formally got some of our compiler mavens
together to create a common AOT compiler infrastructure we can all
plug into. This was discussed briefly at JVM-L, but I'm not sure
anything has come of it yet.

Right now I believe there are at least two separate efforts to build
joint compilers: the Groovy efforts to joint compile Groovy+Java,
Groovy+Scala, and Groovy+Java+Scala; and the Scala efforts to joint
compile Scala+Java.

Obviously we other JVM languages would like to join in the fun.

The common infrastructure would be responsible for resolving methods
(perhaps pluggable), getting lists of provided types and methods from
languages (definitely pluggable), and providing needed types and
methods to languages. Structurally, it seems like it wouldn't be
difficult to do.

I am now interested in this since this is essentially *all* JRuby
would need to have a full ahead-of-time compiler that can produce
normal Java classes, and it seems silly to either depend on
Groovy/Scala joint compilers (our distribution is already too big) or
to write this all completely from scratch.

So what's the deal, folks? Anyone else interested in this? Can we
cannibalize an existing compiler and make the type
provision/requisition logic per-language-pluggable?

- Charlie

Matt Fowles

Dec 17, 2009, 11:56:07 AM
to jvm-la...@googlegroups.com
Charlie~

Also worth considering are the runtime/dev-time aspects of such a system.  It would be nice to have some amount of common infrastructure so that not every language needs to implement a full set of Eclipse/IntelliJ plugins from scratch.

The two should almost certainly be separate projects, but it would be nice if the joint compiler had enough hooks that IDEs could use them.

Matt


Chuck Esterbrook

Dec 17, 2009, 12:01:48 PM
to jvm-la...@googlegroups.com
On Thu, Dec 17, 2009 at 8:50 AM, Charles Oliver Nutter
<hea...@headius.com> wrote:
> So what's the deal, folks? Anyone else interested in this?

Definitely. And not just for languages already running on the JVM, but
for languages that haven't gotten there yet. Like mine. :-)

I'm curious how close ASM is to fulfilling this role?


-Chuck
--
http://chuckesterbrook.com/

Charles Oliver Nutter

Dec 17, 2009, 12:02:11 PM
to jvm-languages
On Thu, Dec 17, 2009 at 10:56 AM, Matt Fowles <matt....@gmail.com> wrote:
> Charlie~
> Also worth considering is the runtime/dev time aspects of such a system.  It
> would be nice to have some amount of infrastructure common so not every
> language needs to implement a full set of Eclipse/IntelliJ plugins from
> scratch.
> The two should almost certainly be separate projects, but it would be nice
> if the joint compiler had enough hooks that IDEs could use them.

Yes, absolutely!

I may be naive, but what we really want here is simply a set of common
protocols for doing the following:

* Requesting from a language what types and methods are provided by a
set of source files
* Providing to a language services to look up types from other languages
* Resolution and error handling when no languages provide a type or
multiple languages incompatibly provide the same type
* Eventually generating the dumb bytecode once all participating
languages have satisfied their type dependencies

Given this, IDE support would be a natural extension.
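
To make that concrete, here is a minimal sketch of what such a per-language plugin protocol might look like in Java. Every name in it is hypothetical -- nothing like this exists yet -- and it glosses over error reporting, classpath handling, and incremental resolution:

import java.io.File;
import java.util.Set;

// Hypothetical protocol between the coordinator and each language's compiler
// plugin; all names here are invented for illustration only.
interface LanguagePlugin {
    // Fully qualified names of the Java-visible types these sources will provide.
    Set<String> providedTypes(Set<File> sources);

    // Fully qualified names of the Java types these sources need from elsewhere.
    Set<String> requestedTypes(Set<File> sources);

    // Lets the plugin ask which types other languages (or the classpath) provide.
    void connect(TypeOracle oracle);

    // Called only once every participant has satisfied its type dependencies.
    void emitBytecode(File outputDirectory);
}

interface TypeOracle {
    // "Yes, I have that type" or "no, I don't have that type".
    boolean isProvided(String qualifiedName);
}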

And it's important to point out that "common compiler infrastructure"
has nothing to do with code generation...it's just a type-wrangler
that says "yes, I have that type" or "no, I don't have that type" and
then triggers all the language-specific bits to eventually cough up
their bytecode. That can't really be hard, can it?

- Charlie

Charles Oliver Nutter

Dec 17, 2009, 12:04:22 PM
to jvm-languages
On Thu, Dec 17, 2009 at 11:01 AM, Chuck Esterbrook
<chuck.es...@gmail.com> wrote:
> Definitely. And not just for languages already running on the JVM, but
> for languages that haven't gotten there yet. Like mine.  :-)
>
> I'm curious how close ASM is to fulfilling this role?

ASM is entirely orthogonal, actually. There's no need to force a
specific bytecode-generation backend on any of the participating
languages; they simply need to support the protocols I describe in my
previous email. The "compiler" itself really just needs to act as a
type-managing intermediate between languages and a starter gun for
producing bytecode once all participating languages are happy. Whether
you eventually use ASM after that is up to you.

- Charlie

Patrick Wright

Dec 17, 2009, 12:08:14 PM
to jvm-la...@googlegroups.com
Charlie

It might be worthwhile to open a channel to the JetBrains devs, since
they contributed some code IIRC for Groovy/Java compilation. Since a
bunch of their stuff is FOSS now, there might be a starting point
there (plus insight into what they learned).


Patrick

Ryan Slobojan

Dec 17, 2009, 12:15:38 PM
to jvm-la...@googlegroups.com
Hi,

Along similar lines, what's being done on the .Net side of the fence around this? I'm just curious if there is prior art or experience that can be drawn upon to speed things up/avoid traps, or whether the .Net ecosystem is in the same state as the JVM ecosystem is for these sorts of capabilities.

Thanks,

Ryan Slobojan

Werner Schuster (murphee)

Dec 17, 2009, 12:22:07 PM
to jvm-la...@googlegroups.com
Ryan Slobojan wrote:
> Hi,
>
> Along similar lines, what's being done on the .Net side of the fence around this?

DLR - the basis for all Iron* languages;

Patrick Wright

Dec 17, 2009, 12:28:10 PM
to jvm-la...@googlegroups.com

Sure, but how does a non-Microsoft CLR language project integrate with
e.g. the C# compiler--for example, the Boo language?

Charles Oliver Nutter

Dec 17, 2009, 12:30:14 PM
to jvm-languages

As far as I know, DLR does nothing for type resolution across
languages because it's...you know...dynamic :)

- Charlie

Werner Schuster (murphee)

Dec 17, 2009, 12:44:46 PM
to jvm-la...@googlegroups.com
Charles Oliver Nutter wrote:
>
> As far as I know, DLR does nothing for type resolution across
> languages because it's...you know...dynamic :)
>
Haven't looked at DLR in a while, but it allowed language impls to build
IL by building expression trees,
including handling all kinds of naughty business with debugger metadata
etc.
So at least that kind of boilerplate seems to be shared - which is
already miles ahead of what's shared in the JVM language world (ie.
nuthin').

Note: take with a grain of salt, as I said: been ages since I looked at
the DLR.

Jochen Theodorou

Dec 17, 2009, 4:19:16 PM
to jvm-la...@googlegroups.com
Charles Oliver Nutter wrote:
[...]

> So what's the deal, folks? Anyone else interested in this? Can we
> cannibalize an existing compiler and make the type
> provision/requisition logic per-language-pluggable?

We can try. I think a common approach would serve a good purpose, in
that type inferencing could be done at a higher level, and there are
different ways to do this as well. If you want to, we can try using the
Groovy compiler for the cannibalization. It would be for Groovy 2.0, in
which I want to do a big rewrite of many parts anyway. But the AST we
use is quite complex and the design is not the best in all parts. So maybe
there is a better AST out there. At least with the Groovy AST I know how
to do things ;)

bye Jochen

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/

Charles Oliver Nutter

Dec 17, 2009, 4:31:06 PM
to jvm-languages
On Thu, Dec 17, 2009 at 3:19 PM, Jochen Theodorou <blac...@gmx.org> wrote:
> We can try. I think a common approach would serve a high purpose in the
> matter that type inferencing could be done on a higher level and there
> are different ways to this as well. If you want to, we can try using the
> Groovy compiler for the cannibalization. It would be for Groovy 2.0, in
> which I want to do a big rewrite of many parts anyway. But the AST we
> use is quite complex and the design is not best in all parts. So maybe
> there is a better AST out there. At least with the Groovy AST I know how
> to do things ;)

I'm not even sure a common AST is required. Maybe I'm not making
myself clear enough?

The lingua franca for languages compiling together is a common set of
types they can use to collaborate through. In almost every case on the
JVM, that means we have to choose to present normal Java types to each
other (though in the future, we could present our own type systems,
but that's out of scope for this discussion). What I see as a common
compiler infrastructure is a set of mirror types (perhaps the same
ones used by the Annotation Processing Tool (apt) or by javac itself)
that all languages can produce and consume with their compiler
plugins. So for the following Ruby code:

class MyClass < my.package.Foo
  def foo(a: java.lang.String, b: my.package.Bar)
    ...
  end
end

Compiling this would produce a mirror class called MyClass (provided
by this source) that extends a mirror class called my.package.Foo
(requested from the high-level compiler stuff). MyClass contains one
method foo (provided) that takes two arguments java.lang.String and
my.package.Bar (requested).

Once all this has been resolved, the mirror types are used to generate
the bytecode for this code. There would be no exchanging of ASTs
necessary.

What I see as a high-level joint compiler is really a type-wrangler,
that can field requests for types from plugins and provide types to
plugins so that each language participating in compilation gets what
it needs (or else doesn't, and errors out). Then once everyone is
satisfied that existing libraries or other jointly-compiled code
provides all the types they need, they proceed to emit bytecode or
.class files. It's far simpler than having a shared AST or shared
nothing of per-language types...it's just a shared notion of Java
types and which Java types each language's source provides or
consumes.
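
As a rough sketch of how little the wrangler itself would need to be (hypothetical names again, and ignoring incremental resolution, error recovery and inferred types for the moment):

import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Stand-in for whatever per-language plugin interface we settle on; invented here.
interface Participant {
    Collection<String> provides();   // Java-visible types this language's sources declare
    Collection<String> requests();   // Java types this language's sources reference
    void emitBytecode();             // called only after every request is satisfied
}

class TypeWrangler {
    static void compile(Collection<Participant> participants, ClassLoader classpath) {
        // 1. Collect what every language provides, failing on duplicates.
        Map<String, Participant> providers = new HashMap<String, Participant>();
        for (Participant p : participants) {
            for (String type : p.provides()) {
                if (providers.put(type, p) != null) {
                    throw new IllegalStateException("type provided by more than one language: " + type);
                }
            }
        }
        // 2. Check that every requested type comes from some participant or the classpath.
        for (Participant p : participants) {
            for (String type : p.requests()) {
                String resource = type.replace('.', '/') + ".class";
                if (!providers.containsKey(type) && classpath.getResource(resource) == null) {
                    throw new IllegalStateException("no language or classpath entry provides: " + type);
                }
            }
        }
        // 3. Fire the starter gun: everyone emits their own bytecode however they like.
        for (Participant p : participants) {
            p.emitBytecode();
        }
    }
}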

- Charlie

Jim White

Dec 17, 2009, 5:24:35 PM
to jvm-la...@googlegroups.com
Charles Oliver Nutter wrote:

> Ok, I think it's time we formally got some of our compiler mavens
> together to create a common AOT compiler infrastructure we can all
> plug into. This was discussed briefly at JVM-L, but I'm not sure
> anything has come of it yet.

> ...

You don't mention javax.tools, but it seems to me that it almost does
what we want here. And since it is properly named (javax.tools rather
than java.tools) it can be supported on JDK 1.5 (just like JSR-223).

For example, it provides the key interface for supplying source files
and class files given class names (JavaFileManager.getJavaFileForInput).

The trouble of course is that it is only concerned with a single source
file type. But that may be OK because each compiler can have a
JavaFileManager in which JavaFileObject.Kind.SOURCE is its respective
source.

Naturally the Kind.CLASS files would be the collective set, mapping the
stubs out as it goes along if using sequential compilation. Or perhaps
you'd stub everything and then compile each language with all the stubs
except its own? That would be simplest and easiest to analyze.

But if you've got enough memory (including distributed compilation) then
multiple compilers' JavaCompiler.CompilationTasks might be run in
parallel. Whether that would wind up doing the right thing
dependency-wise I'm not sure without working through a design and
examples. Certainly stubs simplify such things and avoid recursion.
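
For what it's worth, serving other languages' stubs to javac through javax.tools could be as small as the sketch below. It is only a sketch: a real implementation would also have to cover list()/inferBinaryName(), which javac uses when scanning the classpath, and the map of stub bytes is assumed to come from the other compilers.

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.Map;
import javax.tools.*;

// Hands in-memory stub classes to javac (or anything else that speaks JavaFileManager).
class StubFileManager extends ForwardingJavaFileManager<StandardJavaFileManager> {
    private final Map<String, byte[]> stubs; // binary class name -> stub .class bytes

    StubFileManager(StandardJavaFileManager delegate, Map<String, byte[]> stubs) {
        super(delegate);
        this.stubs = stubs;
    }

    @Override
    public JavaFileObject getJavaFileForInput(Location location, String className,
                                              JavaFileObject.Kind kind) throws IOException {
        if (kind == JavaFileObject.Kind.CLASS && stubs.containsKey(className)) {
            final byte[] bytes = stubs.get(className);
            URI uri = URI.create("mem:///" + className.replace('.', '/') + ".class");
            return new SimpleJavaFileObject(uri, kind) {
                @Override
                public InputStream openInputStream() {
                    return new ByteArrayInputStream(bytes);
                }
            };
        }
        return super.getJavaFileForInput(location, className, kind);
    }
}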

I'm actually more interested in more general dependency schemes based on
RDF, but having something for JVM languages would be dandy and would be
useful in IFCX Wings.

Jim
---
http://www.ifcx.org

Jochen Theodorou

Dec 17, 2009, 5:36:49 PM
to jvm-la...@googlegroups.com
Charles Oliver Nutter wrote:
[...]
> I'm not even sure a common AST is required. Maybe I'm not making
> myself clear enough?

but then what of the compiler you want to cannibalize?

> The lingua franca for languages compiling together is a common set of
> types they can use to collaborate through. In almost every case on the
> JVM, that means we have to choose to present normal Java types to each
> other (though in the future, we could present our own type systems,
> but that's out of scope for this discussion). What I see as a common
> compiler infrastructure is a set of mirror types (perhaps the same
> ones used by the Annotation Processing Tool (apt) or by javac itself)
> that all languages can produce and consume with their compiler
> plugins. So for the following Ruby code:
>
> class MyClass < my.package.Foo
> def foo(a: java.lang.String, b: my.package.Bar)
> ...
> end
> end
>
> Compiling this would produce a mirror class called MyClass (provided
> by this source) that extends a mirror class called my.package.Foo
> (requested from the high-level compiler stuff). MyClass contains one
> method foo (provided) that takes two arguments java.lang.String and
> my.package.Bar (requested).

For me that is part of the AST. Oh well, ok, maybe our AST does
represent a bit more than the bare program. Basically I understand it so
that you want to have a data structure that can represent the classes
MyClass and Foo, and in which MyClass will have a foo method with typed
parameters. In Groovy this is done by ClassNode, MethodNode and Parameter.

> Once all this has been resolved, the mirror types are used to generate
> the bytecode for this code. There would be no exchanging of ASTs
> necessary.

But we need common data structures, or else communication will be a
problem. And that data structure could very well be part of the AST... so
far my thinking. If it is not part of the AST, then you kind of have to
generate a second AST... or call it a communication data structure used to
communicate between the different languages. For me the AST is already a
communication data structure inside the compiler.

> What I see as a high-level joint compiler is really a type-wrangler,
> that can field requests for types from plugins and provide types to
> plugins so that each language participating in compilation gets what
> it needs (or else doesn't, and errors out). Then once everyone is
> satisfied that existing libraries or other jointly-compiled code
> provides all the types they need, they proceed to emit bytecode or
> .class files. It's far simpler than having a shared AST or shared
> nothing of per-language types...it's just a shared notion of Java
> types and which Java types each language's source provides or
> consumes.

Well ok, here it comes into play that Groovy, in its static aspects, has
almost the same type system as Java.

Still... all that is needed is to write this communication protocol
then, or not?

bye blackdrag

Miles Sabin

Dec 18, 2009, 4:10:29 AM
to jvm-la...@googlegroups.com
On Thu, Dec 17, 2009 at 9:31 PM, Charles Oliver Nutter
<hea...@headius.com> wrote:
> On Thu, Dec 17, 2009 at 3:19 PM, Jochen Theodorou <blac...@gmx.org> wrote:
>> use is quite complex and the design is not best in all parts. So maybe
>> there is a better AST out there. At least with the Groovy AST I know how
>> to do things ;)
>
> I'm not even sure a common AST is required. Maybe I'm not making
> myself clear enough?

I think that we should ensure that a common AST is *not* required. As you say,

> The lingua franca for languages compiling together is a common set of
> types they can use to collaborate through.

and that's all we need.

Cheers,


Miles

--
Miles Sabin
tel: +44 (0)7813 944 528
skype: milessabin
http://www.chuusai.com/
http://twitter.com/milessabin

Miles Sabin

Dec 18, 2009, 4:12:35 AM
to jvm-la...@googlegroups.com
On Thu, Dec 17, 2009 at 4:50 PM, Charles Oliver Nutter
<hea...@headius.com> wrote:
> So what's the deal, folks? Anyone else interested in this?

Yes, definitely.

> Can we cannibalize an existing compiler and make the type
> provision/requisition logic per-language-pluggable?

This should be doable for Scala as a compiler plugin.

Miles Sabin

Dec 18, 2009, 4:25:13 AM
to jvm-la...@googlegroups.com
On Thu, Dec 17, 2009 at 4:56 PM, Matt Fowles <matt....@gmail.com> wrote:
> Also worth considering is the runtime/dev time aspects of such a system.  It
> would be nice to have some amount of infrastructure common so not every
> language needs to implement a full set of Eclipse/IntelliJ plugins from
> scratch.
>
> The two should almost certainly be separate projects, but it would be nice
> if the joint compiler had enough hooks that IDEs could use them.
> Matt

I'm not so sure about this.

There is definitely a benefit to sharing infrastructure between IDEs
*for a given language* (eg. there is a fair amount of sharing between
the Scala tooling for Eclipse, Netbeans and IDEA), but I don't think
this carries over to multiple languages in a way that Charlie's kind
of proposal can address. That's because the bulk of that sharing would
overlap with the IDE's own frameworks at one end and collide with
language differences at the other (ie. the most interesting tooling is
very language specific).

Here's an example. To enable cross language search and refactoring in
Eclipse all participating languages need to hook into the JDT's
indexer. Doing this is inescapably Eclipse-specific, and beyond the
basic job of mapping non-Java language symbols into Java there's
really nothing much to factor out which isn't either IDE-specific or
language-specific.

Miles Sabin

Dec 18, 2009, 4:29:26 AM
to jvm-la...@googlegroups.com
On Thu, Dec 17, 2009 at 10:36 PM, Jochen Theodorou <blac...@gmx.org> wrote:
> Charles Oliver Nutter schrieb:
> [...]
>> I'm not even sure a common AST is required. Maybe I'm not making
>> myself clear enough?
>
> but then what of the compiler you want to cannibalize?

Each language that wants to play has to contribute its own
type-wrangling component. We need one for Java, one for Scala, one for
JRuby, one for Groovy, etc. ...

John Cowan

Dec 18, 2009, 11:16:06 AM
to jvm-la...@googlegroups.com
On Fri, Dec 18, 2009 at 4:29 AM, Miles Sabin <mi...@milessabin.com> wrote:
> On Thu, Dec 17, 2009 at 10:36 PM, Jochen Theodorou <blac...@gmx.org> wrote:
>> Charles Oliver Nutter schrieb:
>> [...]
>>> I'm not even sure a common AST is required. Maybe I'm not making
>>> myself clear enough?
>>
>> but then what of the compiler you want to cannibalize?
>
> Each language that wants to play has to contribute its own
> type-wrangling component. We need one for Java, one for Scala, one for
> JRuby, one for Groovy, etc. ...

I'm still not grasping this. Could you sketch one of these? My mind
works better with concrete/operational examples these days.

>
> Cheers,
>
>
> Miles

--
GMail doesn't have rotating .sigs, but you can see mine at
http://www.ccil.org/~cowan/signatures

Charles Oliver Nutter

Dec 18, 2009, 1:41:04 PM
to jvm-languages
Good to hear from you, Miles!

On Fri, Dec 18, 2009 at 3:25 AM, Miles Sabin <mi...@milessabin.com> wrote:
> There is definitely a benefit to sharing infrastructure between IDEs
> *for a given language* (eg. there is a fair amount of sharing between
> the Scala tooling for Eclipse, Netbeans and IDEA), but I don't think
> this carries over to multiple languages in a way that Charlie's kind
> of proposal can address. That's because the bulk of that sharing would
> overlap with the IDE's own frameworks at one end and collide with
> language differences at the other (ie. the most interesting tooling is
> very language specific).

Yes, I agree here. It may be useful as the dirt-simplest dumb tooling
for an IDE, so that you can have mixed-language projects all compiling
together, but that's not the key benefit of an IDE. You'd still need
per-language tooling for the language's own specific details for it to
be useful.

It's also worth pointing out that this joint compiler probably
wouldn't be the *only* compilation phase for any of these languages,
since we still will produce our own "abnormal" bytecode and .class
files that have nothing to do with our Java-facing types. So it's
mostly an orchestration agent for the types we *do* expose and
consume. And again, this is what all the current joint compilers
actually give you, so let's do that once and be done with it.

> Here's an example. To enable cross language search and refactoring in
> Eclipse all participating languages need to hook into the JDT's
> indexer. Doing this is inescapably Eclipse-specific, and beyond the
> basic job of mapping non-Java language symbols into Java there's
> really nothing much to factor out which isn't either IDE-specific or
> language-specific.

Perhaps someone from IntelliJ will chime in about how they do their
cross-language refactoring, but it's a much, much more complicated
process than just getting languages to compile together when they may
have cross-language Java type exposure. It's definitely out of scope
for what I want...I just want a base compiler structure that I can
provide a few Ruby-specific plugins for and know that cross-compiling
with any other plugged language will just work. That's "easy".

- Charlie

Ted Neward

Dec 20, 2009, 4:24:37 AM
to jvm-la...@googlegroups.com
DLR isn't quite the same creature.

From a "compiler" perspective, there's a more in-depth project called CCI
(Common Compiler Infrastructure) up on Codeplex. But I think it's more
in-depth than Charlie's looking for.

Ted Neward
Java, .NET, XML Services
Consulting, Teaching, Speaking, Writing
http://www.tedneward.com


Ted Neward

Dec 20, 2009, 4:26:56 AM
to jvm-la...@googlegroups.com
Yes, but the expression trees have to follow a certain set of rules, AFAIK,
which tend to be the sorts of things that dynamic languages look for. The
DLR, perhaps not surprisingly, is really geared entirely towards the
execution of those expression trees, not for coughing up compiled types.

Ted Neward
Java, .NET, XML Services
Consulting, Teaching, Speaking, Writing
http://www.tedneward.com

> -----Original Message-----
> From: jvm-la...@googlegroups.com [mailto:jvm-
> lang...@googlegroups.com] On Behalf Of Werner Schuster (murphee)
> Sent: Thursday, December 17, 2009 9:45 AM
> To: jvm-la...@googlegroups.com
> Subject: Re: [jvm-l] Common compiler infrastructure?
>


Per Bothner

Dec 20, 2009, 8:16:47 PM
to jvm-la...@googlegroups.com, Charles Oliver Nutter
On 12/17/2009 09:02 AM, Charles Oliver Nutter wrote:
> I may be naive, but what we really want here is simply a set of common
> protocols for doing the following:
>
> * Requesting from a language what types and methods are provided by a
> set of source files
> * Providing to a language services to look up types from other languages

If there are no dependency cycles, and the dependencies are known, then
it is easy: If files in language A depend on files in language B, but
not vice versa, just compile the files in language B before those in
language A, and then have language A's compiler read the B .class files.
We all do this, at least in the case that B==Java.

To handle dependency cycles or unknown dependencies one can use a "compile
for export" option. This just generates a stub .class file that
contains no code, or private fields, or anything else that is not
part of the "public contract" of the module. The key is that it
should be possible to generate this without reading any other
class files, and so cycles or other dependencies aren't a problem.

Generating the stub for Java is in theory easy: You can generate
stub methods by replacing all method bodies with "return null"
or whatever is appropriate for the method's return type, and
otherwise not checking for type compatibility and so on. The import
statements allow replacing class names by the fully-qualified names.

There are some complications: one is that import-on-demand does need
to search for other classes. So you need one extra step, as below.

So the multi-language compiler-driver does:
(1) Figure out the set of source files to (possibly) compile.
(2) For each source file, figure out the compiler to use, presumably
using extensions, though it could also use mime-types or looking at
the source file itself or some property database.
(3) For each source file, ask its compiler the set of class names
it might generate, using some to-be-specified simple protocol.
(4) For each source file, compile it "for export". I.e. generate
the stub .classes mentioned above, leaving them in some temporary
location (ideally a "MemoryFileSystem").
(5) For each source file, compile it for real, setting up the class
path to search the above temporary location first. When it needs to
import definitions from another file or class, it just reads the
stub class in "the normal way". The compiler should have access to
the map generated in (3), and should have a mechanism to invoke
the corresponding compiler "out of order". For example, a Scheme
library may want to execute a macro at compile time, and that
macro may depend on methods in some other class; thus it would
be nice to have a "compile this file first" mechanism.

This approach still means an API for invoking compilers with various
options, but we don't need to invent a protocol for representing
types and modules - we just use the .class files.
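
To give a feel for how little a stub contains, here is roughly what such a
"compile for export" pass could emit for the MyClass example from earlier in the
thread, sketched with a recent ASM (any bytecode library would do; the Foo no-arg
constructor and the String return type are assumptions of this sketch):

import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import static org.objectweb.asm.Opcodes.*;

// Stub for: public class MyClass extends my.package.Foo, with a single method
// foo(String, my.package.Bar) whose body is a fabricated "return null".
class StubEmitter {
    static byte[] stubForMyClass() {
        ClassWriter cw = new ClassWriter(0);
        cw.visit(V1_5, ACC_PUBLIC | ACC_SUPER, "MyClass", null, "my/package/Foo", null);

        // Default constructor delegating to the superclass (assumed accessible).
        MethodVisitor init = cw.visitMethod(ACC_PUBLIC, "<init>", "()V", null, null);
        init.visitCode();
        init.visitVarInsn(ALOAD, 0);
        init.visitMethodInsn(INVOKESPECIAL, "my/package/Foo", "<init>", "()V", false);
        init.visitInsn(RETURN);
        init.visitMaxs(1, 1);
        init.visitEnd();

        // The return type is whatever the source declares; only the signature
        // matters to the other compilers, the body is a fabricated "return null".
        MethodVisitor foo = cw.visitMethod(ACC_PUBLIC, "foo",
                "(Ljava/lang/String;Lmy/package/Bar;)Ljava/lang/String;", null, null);
        foo.visitCode();
        foo.visitInsn(ACONST_NULL);
        foo.visitInsn(ARETURN);
        foo.visitMaxs(1, 3);
        foo.visitEnd();

        cw.visitEnd();
        return cw.toByteArray();
    }
}
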
--
--Per Bothner
p...@bothner.com http://per.bothner.com/

Martin C. Martin

Dec 20, 2009, 8:37:47 PM
to jvm-la...@googlegroups.com, Charles Oliver Nutter

Per Bothner wrote:
> On 12/17/2009 09:02 AM, Charles Oliver Nutter wrote:
>> I may be naive, but what we really want here is simply a set of common
>> protocols for doing the following:
>>
>> * Requesting from a language what types and methods are provided by a
>> set of source files
>> * Providing to a language services to look up types from other languages
>

> > To handle dependency cycles or unknown dependencies one can use a "compile
> for export" option. This just generates a stub .class file, that
> contains no code, or private fields, or anything else that is not
> part of the "public contract" of the module. The key is that it
> should be possible to generate this without reading any other
> class files, and so cycles or other dependencies aren't a problem.

Charlie's point is that generating a .class file is overkill. If we
just need to know a list of types & methods, then we just need a
function that takes source files and returns such a list. We don't need
to go all the way to generating valid .class files.


>
> Generating the stub for Java is in theory easy: You can generate
> stub method bodies by replacing all method bodies by "return null"
> or whatever is appropriate for the method's return type, and
> otherwise not checking for type compatibility and so.

Why bother? Why not just stop once you've parsed enough to know the
classes & methods, and return some representation of those?

> There are some complications: one is that import-on-depend does need
> to search for other classes. So you need one extra step, as below.
>
> So the multi-language compiler-driver does:
> (1) Figure out the set of source files to (possibly) compile.
> (2) For each source file, figure out the compiler to use, presumably
> using extensions, though it could also use mime-types or looking at
> the source file itself or some property datebase.
> (3) For each source file, ask its compiler the set of class names
> it might generate, using some to-be-specified simple protocol.

Agreed.

> (4) For each source file, compile it "for export". I.e. generate
> the stub .classes mentioned above, leaving them in some temporary
> location (ideally a "MemoryFileSystem").
> (5) For each source file, compile it for real, setting up the class
> path to search the above temporary location first. When it needs to
> import definitions from another file or class, it just reads the
> sub class in "the normal way". The compiler should have access to
> the map generated in (3), and should have a mechanism to invoke
> the corresponding compiler "out of order". For example, a Scheme
> library may want to execute a macro at compile time, and that
> macro may depend on methods in some other class; thus it would
> be nice to have a "compile this file first" mechanism.
>
> This approach still means an API invoking compilers with various
> options, but we don't need to invent a protocol for representing
> types and modules - we just use the .class files.

True, this approach saves compiler writers from figuring out another
representation for types and modules. However, compiler writers already
need such a representation, e.g. if a single Groovy file contains
classes A and B, and A refers to B and vice versa. Groovy has a
ClassNode, and distinguishes between primary ClassNodes (that the
compiler is compiling) and wrapper ClassNodes that wrap a java.lang.Class.

Also, this approach makes extra work for compiler writers in generating
a .class stub. Charlie is proposing a common lingua franca for these
representations of classes, so we don't have to generate the stub.

Best,
Martin

Per Bothner

Dec 20, 2009, 9:28:00 PM
to jvm-la...@googlegroups.com, Martin C. Martin
On 12/20/2009 05:37 PM, Martin C. Martin wrote:

>
> Per Bothner wrote:
>> To handle dependency cycles or unknown dependencies one can use a "compile
>> for export" option. This just generates a stub .class file, ...

>
> Charlie's point is that generating a .class file is overkill. If we
> just need to know a list of types & methods, then we just need a

> function that takes source files and returns such a list. We don't need
> to go all the way to generating valid .class files.
>
> Why bother? Why not just stop once you've parsed enough to know the
> classes & methods, and return some representation of those?

The problem is picking a representation. A .class file is one valid
well-understood representation. An API might be better, but designing
one for multiple languages is probably not feasible. (Remember CORBA.)

I think the best choice for an API may be javax.lang.model.*.
It's very Java-centric, but that's probably the best we can do.
It has the big advantage that javac already implements it.

> True, this approach saves compiler writers from figuring out another
> representation for types and modules. However, compiler writers already
> need such a representation, e.g. if a single Groovy file contains
> classes A and B, and A refers to B and vice versa. Groovy has a
> ClassNode, and distinguishes between primary ClassNodes (that the
> compiler is compiling) and wrapper ClassNodes that wrap a java.lang.Class.

Indeed. Kawa uses gnu.bytecode.ClassType for the same.

The problem is we all have *different* representations, so that does
not help with inter-language interoperability. And unless you get javac
to implement the same representation, you have a problem: A JVM lingua
franca not supported by Java has limited use ...

> Also, this approach makes extra work for compiler writers in generating
> a .class stub. Charlie is proposing a common lingua franca for these
> representations of classes, so we don't have to generate the stub.

I won't hold my breath, but look forward to it.

I think an API that builds on (extends) javax.lang.model may make sense,
since that means interoperating with javac.

Charles Oliver Nutter

Dec 21, 2009, 4:21:44 AM
to jvm-languages, Martin C. Martin
On Sun, Dec 20, 2009 at 8:28 PM, Per Bothner <p...@bothner.com> wrote:
> The problem is picking a representation.  A .class file is one valid
> well-understood representation.  An API might be better, but designing
> one for multiple languages is probably not feasible.  (Remember CORBA.)
>
> I think the best choice for an API may be javax.lang.model.*.
> It very Java-centric, but that's probably the best we can do.
> It has the big advantage that javac already implements it.

I was actually thinking along the lines of the mirror API that the
Annotation Processing Tool uses:

http://java.sun.com/j2se/1.5.0/docs/guide/apt/mirror/overview-summary.html

Again, I may be naïve but it seems to me we could easily form a common
mirror type system that we'd feed to the orchestration logic. Starting
with the mirror API would be pretty easy, since it's basically *all*
interfaces.

Going back to the example I gave Jochen:

class MyClass < my.package.Foo
  def foo(a: java.lang.String, b: my.package.Bar)
    ...
  end
end

This script would produce a ClassDeclaration for MyClass, with a
ClassType referencing (symbolically) my.package.Foo. We'd be providing
MyClass, and the compiler would see that we're consuming
my.package.Foo and look for a declaration elsewhere. Our MyClass
ClassDeclaration would provide one method "foo" with two
ParameterDeclarations pointing at ClassTypes for java.lang.String and
my.package.Bar. The compiler would see these and add them to the list
of produced and consumed types.

I'll see if I can mock up a prototype over the holidays, or at least a
compiler plugin for JRuby that can produce these interfaces.
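
For illustration, the data such a plugin would hand over for the snippet above is tiny; the little value classes below are invented stand-ins for whichever mirror interfaces (apt-style or javax.lang.model) we end up agreeing on:

import java.util.Arrays;
import java.util.List;

// Illustration-only stand-ins for the real mirror interfaces.
class TypeRef {                       // a symbolic reference: "requested" until someone provides it
    final String qualifiedName;
    TypeRef(String qualifiedName) { this.qualifiedName = qualifiedName; }
}

class MethodDecl {
    final String name;
    final List<TypeRef> parameterTypes;
    MethodDecl(String name, List<TypeRef> parameterTypes) {
        this.name = name;
        this.parameterTypes = parameterTypes;
    }
}

class ClassDecl {                     // a "provided" type
    final String qualifiedName;
    final TypeRef superclass;
    final List<MethodDecl> methods;
    ClassDecl(String qualifiedName, TypeRef superclass, List<MethodDecl> methods) {
        this.qualifiedName = qualifiedName;
        this.superclass = superclass;
        this.methods = methods;
    }
}

class MirrorExample {
    // What the Ruby snippet boils down to: MyClass is provided; Foo, String and Bar are requested.
    static ClassDecl myClass() {
        return new ClassDecl("MyClass",
                new TypeRef("my.package.Foo"),
                Arrays.asList(new MethodDecl("foo",
                        Arrays.asList(new TypeRef("java.lang.String"),
                                      new TypeRef("my.package.Bar")))));
    }
}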

- Charlie

>> True, this approach saves compiler writers from figuring out another
>> representation for types and modules.  However, compiler writers already
>> need such a representation, e.g. if a single Groovy file contains
>> classes A and B, and A refers to B and vice versa.  Groovy has a
>> ClassNode, and distinguishes between primary ClassNodes (that the
>> compiler is compiling) and wrapper ClassNodes that wrap a java.lang.Class.
>
> Indeed.  Kawa uses gnu.bytecode.ClassType for the same.
>
> The problem is we all have *different* representations, so that does
> not help with inter-language interoperability.  And unless you get javac
> to implement the same representation, you have a problem: A JVM lingua
> franca not supported by Java has limited use ...
>
>> Also, this approach makes extra work for compiler writers in generating
>> a .class stub.  Charlie is proposing a common lingua franca for these
>> representations of classes, so we don't have to generate the stub.
>
> I won't help my breath, but look forward to it.
>
> I think an API that builds on (extends) javax.lang.model may make sense,
> since that means interoperating with javac.
> --
>        --Per Bothner
> p...@bothner.com   http://per.bothner.com/
>

Miles Sabin

Dec 21, 2009, 4:56:53 AM
to jvm-la...@googlegroups.com, Martin C. Martin
On Mon, Dec 21, 2009 at 9:21 AM, Charles Oliver Nutter
<hea...@headius.com> wrote:
> On Sun, Dec 20, 2009 at 8:28 PM, Per Bothner <p...@bothner.com> wrote:
>> The problem is picking a representation.  A .class file is one valid
>> well-understood representation.  An API might be better, but designing
>> one for multiple languages is probably not feasible.  (Remember CORBA.)
>>
>> I think the best choice for an API may be javax.lang.model.*.
>> It's very Java-centric, but that's probably the best we can do.
>> It has the big advantage that javac already implements it.
>
> I was actually thinking along the lines of the mirror API that the
> Annotation Processing Tool uses:
>
> http://java.sun.com/j2se/1.5.0/docs/guide/apt/mirror/overview-summary.html
>
> Again, I may be naïve but it seems to me we could easily form a common
> mirror type system that we'd feed to the orchestration logic. Starting
> with the mirror API would be pretty easy, since it's basically *all*
> interfaces.

I like the sound of this ... good choice :-)

Jim White

Dec 21, 2009, 7:34:01 AM
to jvm-la...@googlegroups.com
Charles Oliver Nutter wrote:

> On Sun, Dec 20, 2009 at 8:28 PM, Per Bothner <p...@bothner.com> wrote:
>
>>The problem is picking a representation. A .class file is one valid
>>well-understood representation. An API might be better, but designing
>>one for multiple languages is probably not feasible. (Remember CORBA.)
>>
>>I think the best choice for an API may be javax.lang.model.*.
>>It's very Java-centric, but that's probably the best we can do.
>>It has the big advantage that javac already implements it.
>
>
> I was actually thinking along the lines of the mirror API that the
> Annotation Processing Tool uses:
>
> http://java.sun.com/j2se/1.5.0/docs/guide/apt/mirror/overview-summary.html

That private API has been replaced by the standard Java extension API
Per referred to (javax.annotation.processing.Processor processes
javax.lang.model.element.TypeElements).

http://java.sun.com/javase/6/docs/technotes/guides/apt/index.html
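
As a tiny, real-API illustration of that: javac already hands a javax.lang.model view of whatever it is compiling to any annotation processor on its path, so a coordinator could harvest the Java-provided types with nothing more exotic than something like this:

import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.annotation.processing.SupportedSourceVersion;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;
import javax.lang.model.util.ElementFilter;
import javax.tools.Diagnostic;

// Reports every root type javac is compiling; where those names go (e.g. into
// the hypothetical coordinator discussed in this thread) is up to the caller.
@SupportedAnnotationTypes("*")
@SupportedSourceVersion(SourceVersion.RELEASE_6)
public class ProvidedTypeCollector extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement type : ElementFilter.typesIn(roundEnv.getRootElements())) {
            processingEnv.getMessager().printMessage(Diagnostic.Kind.NOTE,
                    "provides: " + type.getQualifiedName());
        }
        return false; // claim nothing; we only observe
    }
}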

And as I said, if you want to do something like Groovy's unified
compiler, it is done by generating .class files (as Per also suggests)
and could be implemented in a standard manner integrated with javac
using javax.tools. That doesn't mean that .class files have to be
generated, because JavaFileManagers can provide the bytes any way they
please.

And if you want JDK 1.5 compatibility, the javax classes can be jarred
up as needed, like is done for JSR-223 with LiveTribe's implementation
(which includes public Maven 2 artifacts). It is used by both Groovy
and Jython for compiling on JDK 1.5.

Jim

Rich Hickey

Dec 21, 2009, 7:43:21 AM
to JVM Languages

On Dec 20, 8:37 pm, "Martin C. Martin" <mar...@martincmartin.com>
wrote:


> Per Bothner wrote:
> > On 12/17/2009 09:02 AM, Charles Oliver Nutter wrote:
> >> I may be naive, but what we really want here is simply a set of common
> >> protocols for doing the following:
>
> >> * Requesting from a language what types and methods are provided by a
> >> set of source files
> >> * Providing to a language services to look up types from other languages
>
> > To handle dependency cycles or unknown dependencies one can use a "compile
> > for export" option. This just generates a stub .class file, that
> > contains no code, or private fields, or anything else that is not
> > part of the "public contract" of the module. The key is that it
> > should be possible to generate this without reading any other
> > class files, and so cycles or other dependencies aren't a problem.
>
> Charlie's point is that generating a .class file is overkill. If we
> just need to know a list of types & methods, then we just need a
> function that takes source files and returns such a list. We don't need
> to go all the way to generating valid .class files.
>
>
>
> > Generating the stub for Java is in theory easy: You can generate
> > stub method bodies by replacing all method bodies by "return null"
> > or whatever is appropriate for the method's return type, and
> > otherwise not checking for type compatibility and so.
>
> Why bother? Why not just stop once you've parsed enough to know the
> classes & methods, and return some representation of those?
>

Generating .class files isn't necessarily 'overkill'. I agree with
Per, this is a standard representation. Given such .class files in the
classpath, all of our existing compilers and tools will 'just work',
right now. Reflection will work, etc.

>
> > (4) For each source file, compile it "for export". I.e. generate
> > the stub .classes mentioned above, leaving them in some temporary
> > location (ideally a "MemoryFileSystem").
> > (5) For each source file, compile it for real, setting up the class
> > path to search the above temporary location first. When it needs to
> > import definitions from another file or class, it just reads the
> > sub class in "the normal way". The compiler should have access to
> > the map generated in (3), and should have a mechanism to invoke
> > the corresponding compiler "out of order". For example, a Scheme
> > library may want to execute a macro at compile time, and that
> > macro may depend on methods in some other class; thus it would
> > be nice to have a "compile this file first" mechanism.
>
> > This approach still means an API invoking compilers with various
> > options, but we don't need to invent a protocol for representing
> > types and modules - we just use the .class files.
>
> True, this approach saves compiler writers from figuring out another
> representation for types and modules. However, compiler writers already
> need such a representation, e.g. if a single Groovy file contains
> classes A and B, and A refers to B and vice versa. Groovy has a
> ClassNode, and distinguishes between primary ClassNodes (that the
> compiler is compiling) and wrapper ClassNodes that wrap a java.lang.Class.
>

In Clojure, when in such a situation, I generate stub classes in
memory, just as Per describes, and for the same reason, everything
downstream just works.

> Also, this approach makes extra work for compiler writers in generating
> a .class stub. Charlie is proposing a common lingua franca for these
> representations of classes, so we don't have to generate the stub.
>

There is extra work in any case, as no one is yet generating what
Charlie is proposing. Also, 'compiling with options' seems more
amenable to implementation via ant et al, vs programmatic invocation
of and interaction with compilers via an API.

Rich

David MacIver

Dec 21, 2009, 10:20:16 AM
to jvm-la...@googlegroups.com
2009/12/21 Rich Hickey <richh...@gmail.com>:

>> Why bother?  Why not just stop once you've parsed enough to know the
>> classes & methods, and return some representation of those?
>>
>
> Generating .class files isn't necessarily 'overkill'. I agree with
> Per, this is a standard representation. Given such .class files in the
> classpath, all of our existing compilers and tools will 'just work',
> right now. Reflection will work, etc.

One issue to bear in mind is that this potentially generates a lot of
stubs where it doesn't need to. With Scala at least the generation of
a large number of small class files is a big performance hit.
Potentially doubling the number generated (even if the version
generated in the first pass is much smaller) is going to slow down the
compilation process quite a lot, potentially for no reason at all: If
another language never requests that class then there was no need to
generate the stub. This could easily be addressed by an API (and if
that API wants to generate class stubs, that's not a problem) which
asks for classes rather than generating them all up front.

Another potential issue to worry about with Scala is that type
inference across language boundaries introduces you to a world of
pain. As long as only one language involved is type inferred (e.g.
Java + Scala) this isn't a big deal, but when you've got cyclic type
inference dependencies between languages you're going to suffer.

James Iry

Dec 21, 2009, 10:21:07 AM
to jvm-la...@googlegroups.com, Charles Oliver Nutter
This seems like a clean approach in general, but it needs some refinement to deal with languages with type inference.   As a simple example, a bit of Scala code can say

class SomeScalaClass {
   def foo(x : SomeNonScalaClass) = x.something()
}

The return type of foo depends on the return type of SomeNonScalaClass#something. If that's the only dependency then compiling in order solves the problem. But if SomeNonScalaClass is written in a type-inferring language and it has another method that depends on the return type of SomeScalaClass#foo, then neither side can figure things out in one pass over the whole .class. The dependency chain is only breakable by being able to ask for typing of individual methods.

Which brings me to a pathological case.  It's entirely possible to have

class SomeScalaClass {
   def something(x : SomeNonScalaClass, n : Int) = if (n == 0) 48 else x.something(this, n - 1)
}

// this is a class written in a hypothetical non-Scala, but type-inferring language
class SomeNonScalaClass
   define something[x : SomeScalaClass, n : Int]
      if (n == 0)
         return 900
      else
         return x.something(this, n - 1)

Obviously a silly example, but it does illustrate that mutual recursion needs to be dealt with, perhaps by punting and insisting that one side or the other annotate a type.


Charles Oliver Nutter

Dec 21, 2009, 1:39:46 PM
to jvm-languages
On Mon, Dec 21, 2009 at 6:34 AM, Jim White <j...@pagesmiths.com> wrote:
> Charles Oliver Nutter wrote:
>> I was actually thinking along the lines of the mirror API that the
>> Annotation Processing Tool uses:
>>
>> http://java.sun.com/j2se/1.5.0/docs/guide/apt/mirror/overview-summary.html
>
> That private API has been replaced by the standard Java extension API
> Per referred to (javax.annotation.processing.Processor processes
> javax.lang.model.element.TypeElements).
>
> http://java.sun.com/javase/6/docs/technotes/guides/apt/index.html

Ok, now I see where the right bits go in the javax.lang stuff...I was
thrown off initially because there are far fewer interfaces, but they
appear to have been condensed a bit (ExecutableElement for all
method-like things, and so on). So using javax.lang plus a jarred-up
backport version of it would be just as good as apt, and javac would
speak it already if you used the right javac. We're making progress
now :)

The "javac implements these interfaces" bit would ned to be figured
out for Java 5 users, of course.

> And as I said, if you want to do something like Groovy's unified
> compiler, it is done by generating .class files (as Per also suggests)
> and could be implemented in a standard manner integrated with javac
> using javax.tools.  That doesn't mean that .class files have to be
> generated, because JavaFileManagers can provide the bytes any way they
> please.

So your suggestion would be to use the javax.tools interfaces and
produce dummy stubs live for the compilation process? That's not bad I
guess, but it seems like it would be a lot cleaner and less hacky to
generate the appropriate interfaces and coordinate them directly. Need
to think about that a bit.

In any case, there still needs to be a top-level compiler coordinating
the generation of either stub .class files or javax.lang.element data
and then eventually triggering the "final" compilation, and that
top-level needs to be agnostic of any of the languages. So we're still
in the same boat as far as needing a coordinator.

- Charlie

Charles Oliver Nutter

Dec 21, 2009, 1:50:13 PM
to jvm-languages
On Mon, Dec 21, 2009 at 6:43 AM, Rich Hickey <richh...@gmail.com> wrote:
> Generating .class files isn't necessarily 'overkill'. I agree with
> Per, this is a standard representation. Given such .class files in the
> classpath, all of our existing compilers and tools will 'just work',
> right now. Reflection will work, etc.

I'm not sure where reflection comes into this, unless you're referring
to one way the top-level compiler stuff would get class data (and
reflection may be a particularly poor way, since it has to *load* the
classes and requires that they be valid/complete to do so, rather than
just reading the data format). Generally we're talking about .class
data and Java-centric type structures across languages. What do you
need reflection for at compile time?

> In Clojure, when in such a situation, I generate stub classes in
> memory, just as Per describes, and for the same reason, everything
> downstream just works.

Sure, I'm sure generating dumb stubs can work, but it essentially
means having to do almost the entire compile process twice for every
language:

* All languages generate stubs into a common location (possibly
in-memory and combined with classpath)
* All languages generate final code based on that common location

And we *still* need a coordinator since we all have different ways to
generate stubs or trigger a final compilation.

I don't really care much about the data format. Stub .class files are
essentially just a richer and less flexible version of the
mirror/javax.lang stuff, but certainly contain all the data we need
(and potentially a lot of data we don't need...or can't generate
without a full compile?) so they'd probably be fine for a simple first
pass. But are you saying that having a common compiler infrastructure
that actually speaks mirror interfaces rather than .class data
*wouldn't* be good to have?

- Charlie

>
>> Also, this approach makes extra work for compiler writers in generating
>> a .class stub.  Charlie is proposing a common lingua franca for these
>> representations of classes, so we don't have to generate the stub.
>>
>
> There is extra work in any case, as no one is yet generating what
> Charlie is proposing. Also, 'compiling with options' seems more
> amenable to implementation via ant et al, vs programmatic invocation
> of and interaction with compilers via an API.
>
> Rich
>

Rich Hickey

Dec 21, 2009, 3:24:56 PM
to JVM Languages

On Dec 21, 1:50 pm, Charles Oliver Nutter <head...@headius.com> wrote:


> On Mon, Dec 21, 2009 at 6:43 AM, Rich Hickey <richhic...@gmail.com> wrote:
> > Generating .class files isn't necessarily 'overkill'. I agree with
> > Per, this is a standard representation. Given such .class files in the
> > classpath, all of our existing compilers and tools will 'just work',
> > right now. Reflection will work, etc.
>
> I'm not sure where reflection comes into this, unless you're referring
> to one way the top-level compiler stuff would get class data (and
> reflection may be a particularly poor way, since it has to *load* the
> classes and requires that they be valid/complete to do so, rather than
> just reading the data format). Generally we're talking about .class
> data and Java-centric type structures across languages. What do you
> need reflection for at compile time?
>

You are missing the point. .class files *are* a data format. And
everything in the Java ecosystem understands them already. APIs *are
not* a data format, they are a means for live communication between
program entities.

> > In Clojure, when in such a situation, I generate stub classes in
> > memory, just as Per describes, and for the same reason, everything
> > downstream just works.
>
> Sure, I'm sure the generating dumb stubs can work, but it essentially
> means having to do almost the entire compile process twice for every
> language:
>

Dumb stubs vs smart stubs?

That doesn't follow. We will have to do precisely and just as much
work as is necessary to determine our types and method signatures,
then generate something (either a stub .class file, *or* an instance
of something implementing these javax interfaces, which in turn need
more instances of other interfaces, essentially forcing each of us to
duplicate the reflection API, ugh, why?)

> * All languages generate stubs into a common location (possibly
> in-memory and combined with classpath)
> * All languages generate final code based on that common location
>
> And we *still* need a coordinator since we all have different ways to
> generate stubs or trigger a final compilation.
>

Yes. But with Per's suggestion, that process is significantly more
decoupled, and allows for more reuse of existing capabilities.

> I don't really care much about the data format. Stub .class files are
> essentially just a richer and less flexible version of the
> mirror/javax.lang stuff, but certainly contain all the data we need
> (and potentially a lot of data we don't need...or can't generate
> without a full compile?)

Really, I don't know what you are talking about. Generating a class
file is a simple thing for all of us to do, and we already do it. A
'stub' classfile doesn't need anything other than fabricated returns.
There is no "potentially a lot of..."

And again, javax.lang is not a data format.

> so they'd probably be fine for a simple first
> pass. But are you saying that having a common compiler infrastructure
> that actually speaks mirror interfaces rather than .class data
> *wouldn't* be good to have?
>

I'm saying it would be premature to dismiss Per's suggestion with
hyperbole about *dumb* stubs and "a lot of xxx we don't need". There
is tremendous value in:

an actual data format
that everyone already consumes
and everyone already produces

Being able to point a tool at some .class files and get a javax.lang
model would be great, but that should be a single job (if not done
already) we all could leverage, vs each of us having to implement
javax.lang interfaces, and programmatic access to same, directly.
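
For reference, the class-file-reading half of that single job is nearly trivial; a sketch with a recent ASM (mapping the result onto javax.lang.model elements is the real work and is left out here):

import java.io.IOException;
import java.io.InputStream;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

// Walks a .class file and recovers its public contract without loading the class.
class ClassContractReader {
    static void dump(InputStream classFile) throws IOException {
        new ClassReader(classFile).accept(new ClassVisitor(Opcodes.ASM9) {
            @Override
            public void visit(int version, int access, String name, String signature,
                              String superName, String[] interfaces) {
                System.out.println("class " + name + " extends " + superName);
            }

            @Override
            public MethodVisitor visitMethod(int access, String name, String descriptor,
                                             String signature, String[] exceptions) {
                if ((access & Opcodes.ACC_PUBLIC) != 0) {
                    System.out.println("  method " + name + descriptor);
                }
                return null; // no need to visit bodies
            }
        }, ClassReader.SKIP_CODE);
    }
}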

Rich

John Cowan

Dec 21, 2009, 3:52:31 PM
to jvm-la...@googlegroups.com
On Mon, Dec 21, 2009 at 3:24 PM, Rich Hickey <richh...@gmail.com> wrote:

> I'm saying it would be premature to dismiss Per's suggestion with
> hyperbole about *dumb* stubs and "a lot of xxx we don't need". There
> is tremendous value in:
>
>  an actual data format
>  that everyone already consumes
>  and everyone already produces

I agree. Because my compiler is not (yet) self-hosting, I use "javap
-public" to get what I need out of .class files, so I definitely want
an offline way of getting information about classes I don't generate.

In any case, the issue of how to consume a remote class is not
trivial. I know how to consume Java classes and expose them, but
classes compiled by a non-Java compiler may have invocation and use
patterns that I don't know. This is especially true in "foo module =
Java class" compilation strategies, including my own -- a random JVM
programming language wouldn't know what to do (more importantly, what
not to do) with one of my .class files.

>
> Being able to point a tool at some .class files and get a javax.lang
> model would be great, but that should be a single job (if not done
> already) we all could leverage, vs each of us having to implement
> javax.lang interfaces, and programmatic access to same, directly.
>
> Rich
>


--

Charles Oliver Nutter

Dec 21, 2009, 3:54:24 PM
to jvm-languages
Perhaps we should take a deep breath before proceeding. Ready?

On Mon, Dec 21, 2009 at 2:24 PM, Rich Hickey <richh...@gmail.com> wrote:
> On Dec 21, 1:50 pm, Charles Oliver Nutter <head...@headius.com> wrote:
>> I'm not sure where reflection comes into this, unless you're referring
>> to one way the top-level compiler stuff would get class data (and
>> reflection may be a particularly poor way, since it has to *load* the
>> classes and requires that they be valid/complete to do so, rather than
>> just reading the data format). Generally we're talking about .class
>> data and Java-centric type structures across languages. What do you
>> need reflection for at compile time?
>>
>
> You are missing the point. .class file *are* a data format. And
> everything in the Java ecosystem understands them already. APIs *are
> not* a data format, they are a means for live communication between
> program entities.

Eventually a shared compiler has to work with some API against some
"data" to coordinate different languages and the types they consume or
produce. Whether that data comes from class files (a common format) or
just appears to the coordinator as a set of interface-implementing
datatypes (backed by per-language data), I don't care. But as the
inference case shows, it is not always possible to generate stubs in a
single shot.

>> Sure, I'm sure that generating dumb stubs can work, but it essentially
>> means having to do almost the entire compile process twice for every
>> language:
>>
>
> Dumb stubs vs smart stubs?
>
> That doesn't follow. We will have to do precisely and just as much
> work as is necessary to determine our types and method signatures,
> then generate something (either a stub .class file, *or* an instance
> of something implementing these javax interfaces, which in turn need
> more instances of other interfaces, essentially forcing each of us to
> duplicate the reflection API, ugh, why?)

If your claim is that we can just have all languages generate stub
classes, point all languages at the stub classes, and have them
regenerate real classes...then I think you're wrong. James Iry's case
of type inference in Scala shows that much, since the return type of a
given method (which you would need to put in the stub) depends on the
return value of some other method (which may not be available at
stub-generation time or which may have circular dependencies).

Stubs work great if everyone knows (at least symbolically) what types
all their Java-facing types will return and receive, and can determine
that based on existing on-disk or in-memory data. That's obvious, and
that's how, for example, the Groovy joint compiler works (since it and
Java do not infer return types in the way that Scala does). Is that
all we want?

>> * All languages generate stubs into a common location (possibly
>> in-memory and combined with classpath)
>> * All languages generate final code based on that common location
>>
>> And we *still* need a coordinator since we all have different ways to
>> generate stubs or trigger a final compilation.
>>
>
> Yes. But with Per's suggestion, that process is significantly more
> decoupled, and allows for more reuse of existing capabilities.

Sure, sounds great! I don't think I ever said it was wrong...just that
it may not cover necessary cases for the languages under discussion.
Specifically, Scala can't necessarily generate complete stubs without
some interaction with the other languages and their type definitions,
which could be as-yet-uncompiled.

>> I don't really care much about the data format. Stub .class files are
>> essentially just a richer and less flexible version of the
>> mirror/javax.lang stuff, but certainly contain all the data we need
>> (and potentially a lot of data we don't need...or can't generate
>> without a full compile?)
>
> Really, I don't know what you are talking about. Generating a class
> file is a simple thing for all of us to do, and we already do it. A
> 'stub' classfile doesn't need anything other than fabricated returns.
> There is no "potentially a lot of..."

You are correct if, as stated previously, it's possible to generate
stubs in isolation without a give-and-take mediation between
languages.
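
And for that isolated case I agree the stub itself is cheap -- little more
than fabricated returns. Roughly, with ASM (untested sketch; the class
name and method descriptor are invented):

    import org.objectweb.asm.ClassWriter;
    import org.objectweb.asm.MethodVisitor;
    import org.objectweb.asm.Opcodes;

    public class StubEmitter {
        // Emits a stub class whose single public method has a fabricated return.
        // The class name and descriptor here are invented, purely for illustration.
        public static byte[] emitStub() {
            ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_MAXS);
            cw.visit(Opcodes.V1_5, Opcodes.ACC_PUBLIC, "com/example/FooStub", null,
                    "java/lang/Object", null);
            MethodVisitor mv = cw.visitMethod(Opcodes.ACC_PUBLIC, "bar",
                    "()Ljava/lang/String;", null, null);
            mv.visitCode();
            mv.visitInsn(Opcodes.ACONST_NULL); // the fabricated return value
            mv.visitInsn(Opcodes.ARETURN);
            mv.visitMaxs(0, 0); // recomputed via COMPUTE_MAXS
            mv.visitEnd();
            cw.visitEnd(); // constructors, fields, etc. would be stubbed the same way
            return cw.toByteArray();
        }
    }

The hard part isn't emitting that; it's knowing what descriptor to put in
it, which is exactly where the mediation question comes in.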

> And again, javax.lang is not a data format.

It's an API to data. The format behind the scenes is irrelevant as
far as the API is concerned. Isn't that the point of such an API?

>> so they'd probably be fine for a simple first
>> pass. But are you saying that having a common compiler infrastructure
>> that actually speaks mirror interfaces rather than .class data
>> *wouldn't* be good to have?
>>
>
> I'm saying it would be premature to dismiss Per's suggestion with
> hyperbole about *dumb* stubs and "a lot of xxx we don't need". There
> is tremendous value in:
>
>  an actual data format
>  that everyone already consumes
>  and everyone already produces

I never dismissed it. Please don't put words in my mouth. I only
suggested that what I had in mind was per-language impls of the mirror
API. I know of the existing efforts that generate stubs, and I know
that approach works well for the straightforward case.

By "dumb" I didn't mean "stupid idea"...I meant "raw" or "simple" or
"basic". Generating stub classes obviously can work as long as the
participating languages are able to generate stubs without interacting
with each other. And that may be enough for most use cases. But
perhaps it's not enough to cover the cases that the current (for
example) Scala/Java joint compiler is able to cover?

> Being able to point a tool at some .class files and get a javax.lang
> model would be great, but that should be a single job (if not done
> already) we all could leverage, vs each of us having to implement
> javax.lang interfaces, and programmatic access to same, directly.

And if .class files are enough, I will happily concede that they're
the best approach for the problem. Like I said...I don't care!
Ultimately a joint compiler that knows how to work with all those
.class files (or with a mirror API to language-specific data) and
trigger a subsequent "complete" compile still needs to be there, which
is the point of this thread.

We're just brainstorming here :)

- Charlie

Per Bothner

unread,
Dec 21, 2009, 5:23:01 PM12/21/09
to jvm-la...@googlegroups.com, Charles Oliver Nutter
On 12/21/2009 12:54 PM, Charles Oliver Nutter wrote:

> If your claim is that we can just have all languages generate stub
> classes, point all languages at the stub classes, and have them
> regenerate real classes...then I think you're wrong. James Iry's case
> of type inference in Scala shows that much, since the return type of a
> given method (which you would need to put in the stub) depends on the
> return value of some other method (which may not be available at
> stub-generation time or which may have circular dependencies).

If the return type of an exported method depends on type inference
and that inference may have circular cross-module dependencies,
my reaction is: Don't do that. You can't define a stable API
that way, at least not a library API. It might be OK for a
module-private method (in the sense of JSR-294 - i.e. a group of
classes that go together), but the use-case where you might
need to infer return types across language boundaries seems
pretty low-priority.

> Stubs work great if everyone knows (at least symbolically) what types
> all their Java-facing types will return and receive, and can determine
> that based on existing on-disk or in-memory data. That's obvious, and
> that's how, for example, the Groovy joint compiler works (since it and
> Java do not infer return types in the way that Scala does). Is that
> all we want?

It may be good enough - or at least a good start.

> And if .class files are enough, I will happily concede that they're
> the best approach for the problem. Like I said...I don't care!
> Ultimately a joint compiler that knows how to work with all those
> .class files (or with a mirror API to language-specific data) and
> trigger a subsequent "complete" compile still needs to be there, which
> is the point of this thread.

And I'll happily concede using a standard API might work better.
A key issue is how easy it is to fit javac into the framework.
In theory we have the option of adding functionality to OpenJDK,
but of course we really want something that can work with Java 6
(or even Java 5, though that's lower priority).

Charles Oliver Nutter

unread,
Dec 21, 2009, 5:58:39 PM12/21/09
to Per Bothner, jvm-languages
On Mon, Dec 21, 2009 at 4:23 PM, Per Bothner <p...@bothner.com> wrote:
> If the return type of an exported method depends on type inference
> and that inference may have circular cross-module dependencies,
> my reaction is:  Don't do that.  You can't define a stable API
> that way, at least not a library API.  It  might be OK for a
> module-private method (in the sense of JSR-294 - i.e. a group of
> classes that go together), but the use-case where you might
> need to infer return types across language boundaries seems
> pretty low-priority.

Sure, I agree with that. It's an outlier, in any case.

>> And if .class files are enough, I will happily concede that they're
>> the best approach for the problem. Like I said...I don't care!
>> Ultimately a joint compiler that knows how to work with all those
>> .class files (or with a mirror API to language-specific data) and
>> trigger a subsequent "complete" compile still needs to be there, which
>> is the point of this thread.
>
> And I'll happily concede using a standard API might work better.
> A key issue is how easy it is to fit javac into the framework.
> In theory we have the option of adding functionality to OpenJDK,
> but of course we really want something that can work with Java 6
> (or even Java 5, though that's lower priority).

I think we need more than that: there are other non-Sun-javac Java
compilers out there that would presumably want to participate. Maybe
presenting stubs will be good enough, but having a standard
interface for each language that can 1. generate stubs and 2. trigger
final compilation would at least be necessary. And having a
per-language API that could get mirrored types out would be even
better, since that feeds into IDE support for recognizing where type
and method declarations are coming from without having to regenerate
stubs all the time.

I don't think it would be very hard to implement the javax.lang
interfaces for any language that wants to expose Java types. If we
ignore the circular type-inference problem for a moment, that would be
almost as easy to do as generating stubs.

- Charlie


Rich Hickey

unread,
Dec 21, 2009, 6:25:03 PM12/21/09
to JVM Languages

On Dec 21, 3:54 pm, Charles Oliver Nutter <head...@headius.com> wrote:
> Perhaps we should take a deep breath before proceeding. Ready?
>

> On Mon, Dec 21, 2009 at 2:24 PM, Rich Hickey <richhic...@gmail.com> wrote:
> > On Dec 21, 1:50 pm, Charles Oliver Nutter <head...@headius.com> wrote:
> >> I'm not sure where reflection comes into this, unless you're referring
> >> to one way the top-level compiler stuff would get class data (and
> >> reflection may be a particularly poor way, since it has to *load* the
> >> classes and requires that they be valid/complete to do so, rather than
> >> just reading the data format). Generally we're talking about .class
> >> data and Java-centric type structures across languages. What do you
> >> need reflection for at compile time?
>
> > You are missing the point. .class files *are* a data format. And
> > everything in the Java ecosystem understands them already. APIs *are
> > not* a data format, they are a means for live communication between
> > program entities.
>
> Eventually a shared compiler has to work with some API against some
> "data" to coordinate different languages and the types they consume or
> produce. Whether that data comes from class files (a common format) or
> just appears to the coordinator as a set of interface-implementing
> datatypes (backed by per-language data), I don't care.

But it's not just the coordinator, it's also the consumers (other
compilers). And we already know how to consume type information from
class files.

> But as the
> inference case shows, it is not always possible to generate stubs in a
> single shot.
>

When, how, with what granularity, and how often to generate type
information seems to me orthogonal to how that information is
represented. Is there any evidence that the javax.lang representation
is more suitable for languages with type inference?

> If your claim is that we can just have all languages generate stub
> classes, point all languages at the stub classes, and have them
> regenerate real classes...then I think you're wrong.

That isn't my claim. I am just saying, when you are generating Java
type information, class files are a lingua franca.

> James Iry's case
> of type inference in Scala shows that much, since the return type of a
> given method (which you would need to put in the stub) depends on the
> return value of some other method (which may not be available at
> stub-generation time or which may have circular dependencies).
>

I fail to see how stubs vs javax.lang affects this. If you are going
to have difficulty producing a stub you'll have difficulty producing a
javax.lang descriptor as well.

> >> And we *still* need a coordinator since we all have different ways to
> >> generate stubs or trigger a final compilation.
>
> > Yes. But with Per's suggestion, that process is significantly more
> > decoupled, and allows for more reuse of existing capabilities.
>
> Sure, sounds great! I don't think I ever said it was wrong...just that
> it may not cover necessary cases for the languages under discussion.
> Specifically, Scala can't necessarily generate complete stubs without
> some interaction with the other languages and their type definitions,
> which could be as-yet-uncompiled.
>

So we're going to build some dynamic network of compilers cooperating
to figure things out? I don't think that's covered by javax.lang. The
multiple type-inferring languages problem seems like a research topic
to me.

> > And again, javax.lang is not a data format.
>
> It's an API to data. The format is behind the scenes is irrelevant as
> far as the API is concerned. Isn't that the point of such an API?
>

Except that everyone has to implement that API when they already know
how to deal with at least one format for the data already. APIs are
different from data - we might interact with XML via a DOM API but we
don't build systems by connecting their DOMs to each other. Going
through a static data representation allows us to decouple execution.

Let's say you use javax.lang in the coordinator, and it asks a
compiler 'do you have type X.Y.Z in this file?' and it says 'yes', and
the coordinator asks for the type info and the compiler gives it a
javax.lang thingy. Now what? How does the coordinator convey that to
any other compilers? It's not data, it's a live object in the
coordinator process.

Rich

Charles Oliver Nutter

unread,
Dec 21, 2009, 7:14:13 PM12/21/09
to jvm-languages
On Mon, Dec 21, 2009 at 5:25 PM, Rich Hickey <richh...@gmail.com> wrote:
>> Eventually a shared compiler has to work with some API against some
>> "data" to coordinate different languages and the types they consume or
>> produce. Whether that data comes from class files (a common format) or
>> just appears to the coordinator as a set of interface-implementing
>> datatypes (backed by per-language data), I don't care.
>
> But it's not just the coordinator, it's also the consumers (other
> compilers). And we already know how to consume type information from
> class files.
>
>> But as the
>> inference case shows, it is not always possible to generate stubs in a
>> single shot.
>>
>
> When, how, with what granularity, and how often to generate type
> information seems to me orthogonal to how that information is
> represented. Is there any evidence that the javax.lang representation
> is more suitable for languages with type inference?

I don't know yet :)

I don't know whether the inference case is a requirement.

I don't know whether the fine-grained online reflective querying of
type information from arbitrary languages' files is a requirement.

I'm exploring options.

>> If your claim is that we can just have all languages generate stub
>> classes, point all languages at the stub classes, and have them
>> regenerate real classes...then I think you're wrong.
>
> That isn't my claim. I am just saying, when you are generating Java
> type information, class files are a lingua franca.

A lingua franca, yes, but a cumbersome one to deal with if your goal
is to do more than just suck in source files and spit out class files.

I'm interested in that simple case, certainly, but I am also
interested in the possibility of a richer API for per-language
compilers that might make it possible for IDEs (as an example) to see
across languages. You could probably do some of that with stubs, but
it seems like it would require regenerating stubs on every save,
rather than simply re-requesting javax.lang representations for a file
or subset of a file.

Can you point us toward the code in Clojure that handles compiling
circular references in generated Java classes?

>> James Iry's case
>> of type inference in Scala shows that much, since the return type of a
>> given method (which you would need to put in the stub) depends on the
>> return value of some other method (which may not be available at
>> stub-generation time or which may have circular dependencies).
>>
>
> I fail to see how stubs vs javax.lang affects this. If you are going
> to have difficulty producing a stub you'll have difficulty producing a
> javax.lang descriptor as well.

Except that you don't have to produce the javax.lang descriptor all at
once, like you do with the class files. The method declarations could
be generated lazily:

1. Ask all languages what class declarations they provide
2. Ask all languages what types they need to consume to proceed with
determining their method declarations, and presumably have the type
information available
3. Ask the languages to proceed with producing method declarations
based on the now-available set of types

And at some point after this, each language will ideally have resolved
all types it needs for its class/method declarations and can proceed
to generate the final class format.
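
In rough interface form, that might look something like this (every name
here is invented, just to show the shape -- not any existing API):

    import java.util.Set;
    import javax.lang.model.element.TypeElement;

    // Purely illustrative: the shape a phased coordinator contract might take.
    interface LanguageCompiler {
        Set<String> providedTypeNames();           // phase 1: classes we will provide
        Set<String> requiredTypeNames();           // phase 2: types we must see first
        void resolveSignatures(TypeLookup lookup); // phase 3: commit method declarations
        void generateClasses(TypeLookup lookup);   // final pass: emit the real .class files
    }

    interface TypeLookup {
        // Backed by .class files on disk, generated stubs, or another
        // language's in-memory declarations.
        TypeElement find(String fullyQualifiedName);
    }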

Again, just a napkin sketch of how this might work. I will grant that
any initial work on a joint compiler would be well-served by basing
type discovery on class files and generated stubs, since that's the
simple case and it covers common usage.

> So we're going to build some dynamic network of compilers cooperating
> to figure things out? I don't think that's covered by javax.lang. The
> multiple type-inferring languages problem seems like a research topic
> to me.

Yes, it may be. It may also be a useful discussion to have here, and
there may be aspects that apply to practical concerns (like IDE
support for type knowledge across multiple languages' sources).

Don't get too hung up on javax.lang as being the only potential API
either. This is an open discussion about the features, challenges, and
possible solutions for joint compiling multiple JVM languages at the
same time.

>> It's an API to data. The format behind the scenes is irrelevant as
>> far as the API is concerned. Isn't that the point of such an API?
>>
>
> Except that everyone has to implement that API when they already know
> how to deal with at least one format for the data already. APIs are
> different from data - we might interact with XML via a DOM API but we
> don't build systems by connecting their DOMs to each other. Going
> through a static data representation allows us to decouple execution.
>
> Let's say you use javax.lang in the coordinator, and it asks a
> compiler 'do you have type X.Y.Z in this file?' and it says 'yes', and
> the coordinator asks for the type info and the compiler gives it a
> javax.lang thingy. Now what? How does the coordinator convey that to
> any other compilers? It's not data, it's a live object in the
> coordinator process.

Perhaps I didn't describe this well enough in earlier emails.

The coordinator would basically get two things from the language compilers:

1. What types and methods do you provide (perhaps in separate phases)
2. What types and methods do you consume (again, perhaps in separate phases)

The coordinator does as much or as little as is necessary to ensure
the consumed types match up with the provided types, which could
certainly mean triggering stubs to generate if necessary. The language
compilers could upcall into the coordinator to ask for information on
types they hope to consume, with the coordinator providing that type
information (potentially without ever generating stubs) by just using
the information it has gathered from other languages (and yes, from
real class and jar files already on disk).

And the language-specific compiler bits wouldn't necessarily even have
to be reimplemented for each language if generating stubs would be
enough for a given language. In that case, there could be a base
"StubbedClassDeclaration" that knows how to ask a language to generate
its stubs and reflectively inspect them.
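
Something like this, maybe (all names invented, untested):

    import java.io.File;
    import java.lang.reflect.Method;
    import java.net.URL;
    import java.net.URLClassLoader;

    // Illustrative only: a fallback base for languages that just generate
    // stubs rather than implementing a richer mirror API themselves.
    public abstract class StubbedClassDeclaration {
        // Each language supplies its own stub generation; returns the classpath
        // root directory the stub .class files were written under.
        protected abstract File generateStubs(String typeName) throws Exception;

        // The coordinator then inspects the stub reflectively for its signatures.
        public Method[] declaredMethods(String typeName) throws Exception {
            URL[] path = { generateStubs(typeName).toURI().toURL() };
            return new URLClassLoader(path).loadClass(typeName).getDeclaredMethods();
        }
    }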

- Charlie

Rémi Forax

unread,
Dec 22, 2009, 6:07:03 AM12/22/09
to jvm-la...@googlegroups.com
On 22/12/2009 00:25, Rich Hickey wrote:

If you have a type representing a type variable, it is.

[...]

> Rich
>

Rémi

