Right now I believe there are at least two separate efforts to build
joint compilers: the Groovy efforts to joint compile Groovy+Java,
Groovy+Scala, and Groovy+Java+Scala; and the Scala efforts to joint
compile Scala+Java.
Obviously we other JVM languages would like to join in the fun.
The common infrastructure would be responsible for resolving methods
(perhaps pluggable), getting lists of provided types and methods from
languages (definitely pluggable), and providing needed types and
methods to languages. Structurally, it seems like it wouldn't be
difficult to do.
I am now interested in this since this is essentially *all* JRuby
would need to have a full ahead-of-time compiler that can produce
normal Java classes, and it seems silly to either depend on
Groovy/Scala joint compilers (our distribution is already too big) or
to write this all completely from scratch.
So what's the deal, folks? Anyone else interested in this? Can we
cannibalize an existing compiler and make the type
provision/requisition logic per-language-pluggable?
- Charlie
--
You received this message because you are subscribed to the Google Groups "JVM Languages" group.
To post to this group, send email to jvm-la...@googlegroups.com.
To unsubscribe from this group, send email to jvm-language...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/jvm-languages?hl=en.
Definitely. And not just for languages already running on the JVM, but
for languages that haven't gotten there yet. Like mine. :-)
I'm curious how close ASM is to fulfilling this role?
-Chuck
--
http://chuckesterbrook.com/
Yes, absolutely!
I may be naive, but what we really want here is simply a set of common
protocols for doing the following:
* Requesting from a language what types and methods are provided by a
set of source files
* Providing to a language services to look up types from other languages
* Resolution and error handling when no languages provide a type or
multiple languages incompatibly provide the same type
* Eventually generating the dumb bytecode once all participating
languages have satisfied their type dependencies
Given this, IDE support would be a natural extension.
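To make the protocol list concrete, here is a minimal Java sketch of what those protocols might reduce to. All interface and method names here are invented for illustration; this is not an existing API:

```java
import java.util.*;

// Hypothetical sketch of the protocol above: each language front-end reports
// the fully-qualified names its sources provide and the names they consume;
// a neutral resolver reports any requested type that nobody provides.
public class JointCompileSketch {

    public interface LanguagePlugin {
        Set<String> providedTypes();   // types this language's sources define
        Set<String> requestedTypes();  // types it needs from other languages
    }

    // Returns the requested types satisfied neither by another plugin nor by
    // the existing classpath; a non-empty result is a resolution error.
    public static Set<String> unresolved(List<LanguagePlugin> plugins,
                                         Set<String> classpathTypes) {
        Set<String> provided = new HashSet<>(classpathTypes);
        for (LanguagePlugin p : plugins) provided.addAll(p.providedTypes());
        Set<String> missing = new TreeSet<>();
        for (LanguagePlugin p : plugins)
            for (String t : p.requestedTypes())
                if (!provided.contains(t)) missing.add(t);
        return missing;
    }
}
```

The last bullet (triggering bytecode generation) would only fire once `unresolved` comes back empty for every participating language.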
And it's important to point out that "common compiler infrastructure"
has nothing to do with code generation...it's just a type-wrangler
that says "yes, I have that type" or "no, I don't have that type" and
then triggers all the language-specific bits to eventually cough up
their bytecode. That can't really be hard, can it?
- Charlie
ASM is entirely orthogonal, actually. There's no need to force a
specific bytecode-generation backend on any of the participating
languages; they simply need to support the protocols I describe in my
previous email. The "compiler" itself really just needs to act as a
type-managing intermediate between languages and a starter gun for
producing bytecode once all participating languages are happy. Whether
you eventually use ASM after that is up to you.
- Charlie
It might be worthwhile to open a channel to the JetBrains devs, since
they contributed some code IIRC for Groovy/Java compilation. Since a
bunch of their stuff is FOSS now, there might be a starting point
there (plus insight into what they learned).
Patrick
Along similar lines, what's being done on the .NET side of the fence around this? I'm curious whether there is prior art or experience that can be drawn on to speed things up and avoid traps, or whether the .NET ecosystem is in the same state as the JVM ecosystem for these sorts of capabilities.
Thanks,
Ryan Slobojan
DLR - the basis for all Iron* languages;
Sure, but how does a non-Microsoft CLR language project integrate with
e.g. the C# compiler--for example, the Boo language?
As far as I know, DLR does nothing for type resolution across
languages because it's...you know...dynamic :)
- Charlie
Note: take with a grain of salt, as I said: been ages since I looked at
the DLR.
We can try. I think a common approach would be valuable in that type
inferencing could be done at a higher level, and there are different
ways to do this as well. If you want to, we can try using the Groovy
compiler for the cannibalization. It would be for Groovy 2.0, in
which I want to do a big rewrite of many parts anyway. But the AST we
use is quite complex and the design is not the best in all parts. So
maybe there is a better AST out there. At least with the Groovy AST I
know how to do things ;)
bye Jochen
--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/
I'm not even sure a common AST is required. Maybe I'm not making
myself clear enough?
The lingua franca for languages compiling together is a common set of
types they can use to collaborate through. In almost every case on the
JVM, that means we have to choose to present normal Java types to each
other (though in the future, we could present our own type systems,
but that's out of scope for this discussion). What I see as a common
compiler infrastructure is a set of mirror types (perhaps the same
ones used by the Annotation Processing Tool (apt) or by javac itself)
that all languages can produce and consume with their compiler
plugins. So for the following Ruby code:
class MyClass < my.package.Foo
  def foo(a: java.lang.String, b: my.package.Bar)
    ...
  end
end
Compiling this would produce a mirror class called MyClass (provided
by this source) that extends a mirror class called my.package.Foo
(requested from the high-level compiler stuff). MyClass contains one
method foo (provided) that takes two arguments java.lang.String and
my.package.Bar (requested).
Once all this has been resolved, the mirror types are used to generate
the bytecode for this code. There would be no exchanging of ASTs
necessary.
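The provided/requested split above could be modeled with a couple of small records, sketched here in Java. These are invented types for illustration, not the actual apt mirror interfaces:

```java
import java.util.*;

// Hypothetical mirror model for the Ruby example above: compiling the source
// yields one provided mirror (MyClass) plus symbolic references to the types
// it consumes. No AST crosses the language boundary, only these names.
public class MirrorSketch {
    public record MirrorClass(String name, String superclass, List<MirrorMethod> methods) {}
    public record MirrorMethod(String name, List<String> parameterTypes) {}

    // What a JRuby compiler plugin might hand to the joint compiler.
    public static MirrorClass myClassMirror() {
        return new MirrorClass("MyClass", "my.package.Foo",
            List.of(new MirrorMethod("foo",
                List.of("java.lang.String", "my.package.Bar"))));
    }

    // The types this mirror consumes, which must be resolved elsewhere.
    public static Set<String> requestedTypes(MirrorClass c) {
        Set<String> out = new TreeSet<>();
        out.add(c.superclass());
        for (MirrorMethod m : c.methods()) out.addAll(m.parameterTypes());
        return out;
    }
}
```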
What I see as a high-level joint compiler is really a type-wrangler,
that can field requests for types from plugins and provide types to
plugins so that each language participating in compilation gets what
it needs (or else doesn't, and errors out). Then once everyone is
satisfied that existing libraries or other jointly-compiled code
provides all the types they need, they proceed to emit bytecode or
.class files. It's far simpler than a shared AST or a shared-nothing
world of per-language types...it's just a shared notion of Java
types and which Java types each language's source provides or
consumes.
- Charlie
> Ok, I think it's time we formally got some of our compiler mavens
> together to create a common AOT compiler infrastructure we can all
> plug into. This was discussed briefly at JVM-L, but I'm not sure
> anything has come of it yet.
> ...
You don't mention javax.tools, but it seems to me that it almost does
what we want here. And since it is properly named (javax.tools rather
than java.tools) it can be supported on JDK 1.5 (just like JSR-223).
For example, it provides the key interface for supplying source files
and class files given class names (JavaFileManager.getJavaFileForInput).
The trouble of course is that it is only concerned with a single source
file type. But that may be OK because each compiler can have a
JavaFileManager in which JavaFileObject.Kind.SOURCE is its respective
source.
Naturally the Kind.CLASS files would be the collective set, mapping the
stubs out as it goes along if using sequential compilation. Or perhaps
you'd stub everything and then compile each language with all the stubs
except the ones for it? That would be simplest and easiest to analyze.
But if you've got enough memory (including distributed compilation) then
multiple compiler JavaCompiler.CompilationTasks might be run in
parallel. Whether that would wind up doing the right thing
dependency-wise I'm not sure without working through a design and
examples. Certainly stubs simplify such things and avoid recursion.
I'm actually more interested in more general dependency schemes based on
RDF, but having something for JVM languages would be dandy and would be
useful in IFCX Wings.
Jim
---
http://www.ifcx.org
but then what of the compiler you want to cannibalize?
> The lingua franca for languages compiling together is a common set of
> types they can use to collaborate through. In almost every case on the
> JVM, that means we have to choose to present normal Java types to each
> other (though in the future, we could present our own type systems,
> but that's out of scope for this discussion). What I see as a common
> compiler infrastructure is a set of mirror types (perhaps the same
> ones used by the Annotation Processing Tool (apt) or by javac itself)
> that all languages can produce and consume with their compiler
> plugins. So for the following Ruby code:
>
> class MyClass < my.package.Foo
>   def foo(a: java.lang.String, b: my.package.Bar)
>     ...
>   end
> end
>
> Compiling this would produce a mirror class called MyClass (provided
> by this source) that extends a mirror class called my.package.Foo
> (requested from the high-level compiler stuff). MyClass contains one
> method foo (provided) that takes two arguments java.lang.String and
> my.package.Bar (requested).
for me that is part of the AST. oh well, ok, maybe our AST does
represent a bit more than the bare program. Basically I understand it so
that you want to have a data structure that can represent the classes
MyClass and Foo, and in which MyClass will have a foo method with typed
parameters. In Groovy this is done by ClassNode, MethodNode and Parameter.
> Once all this has been resolved, the mirror types are used to generate
> the bytecode for this code. There would be no exchanging of ASTs
> necessary.
but common data structures, or else communication will be a problem. And
that data structure could very well be part of the AST... so far my
thinking. If it is not part of the AST, then you kind of have to generate
a second AST... or call it a communication data structure used to
communicate between different languages. For me the AST is already a
communication data structure inside the compiler.
> What I see as a high-level joint compiler is really a type-wrangler,
> that can field requests for types from plugins and provide types to
> plugins so that each language participating in compilation gets what
> it needs (or else doesn't, and errors out). Then once everyone is
> satisfied that existing libraries or other jointly-compiled code
> provides all the types they need, they proceed to emit bytecode or
> .class files. It's far simpler than a shared AST or a shared-nothing
> world of per-language types...it's just a shared notion of Java
> types and which Java types each language's source provides or
> consumes.
well ok, here it comes into play that Groovy's static type system is
almost the same as Java's.
Still... all that is needed is to write this communication protocol
then, or not?
bye blackdrag
I think that we should ensure that a common AST is *not* required. As you say,
> The lingua franca for languages compiling together is a common set of
> types they can use to collaborate through.
and that's all we need.
Cheers,
Miles
--
Miles Sabin
tel: +44 (0)7813 944 528
skype: milessabin
http://www.chuusai.com/
http://twitter.com/milessabin
Yes, definitely.
> Can we cannibalize an existing compiler and make the type
> provision/requisition logic per-language-pluggable?
This should be doable for Scala as a compiler plugin.
I'm not so sure about this.
There is definitely a benefit to sharing infrastructure between IDEs
*for a given language* (eg. there is a fair amount of sharing between
the Scala tooling for Eclipse, Netbeans and IDEA), but I don't think
this carries over to multiple languages in a way that Charlie's kind
of proposal can address. That's because the bulk of that sharing would
overlap with the IDE's own frameworks at one end and collide with
language differences at the other (ie. the most interesting tooling is
very language specific).
Here's an example. To enable cross language search and refactoring in
Eclipse all participating languages need to hook into the JDT's
indexer. Doing this is inescapably Eclipse-specific, and beyond the
basic job of mapping non-Java language symbols into Java there's
really nothing much to factor out which isn't either IDE-specific or
language-specific.
Each language that wants to play has to contribute its own
type-wrangling component. We need one for Java, one for Scala, one for
JRuby, one for Groovy, etc. ...
I'm still not grasping this. Could you sketch one of these? My mind
works better with concrete/operational examples these days.
--
GMail doesn't have rotating .sigs, but you can see mine at
http://www.ccil.org/~cowan/signatures
On Fri, Dec 18, 2009 at 3:25 AM, Miles Sabin <mi...@milessabin.com> wrote:
> There is definitely a benefit to sharing infrastructure between IDEs
> *for a given language* (eg. there is a fair amount of sharing between
> the Scala tooling for Eclipse, Netbeans and IDEA), but I don't think
> this carries over to multiple languages in a way that Charlie's kind
> of proposal can address. That's because the bulk of that sharing would
> overlap with the IDE's own frameworks at one end and collide with
> language differences at the other (ie. the most interesting tooling is
> very language specific).
Yes, I agree here. It may be useful as the dirt-simplest dumb tooling
for an IDE, so that you can have mixed-language projects all compiling
together, but that's not the key benefit of an IDE. You'd still need
per-language tooling for the language's own specific details for it to
be useful.
It's also worth pointing out that this joint compiler probably
wouldn't be the *only* compilation phase for any of these languages,
since we still will produce our own "abnormal" bytecode and .class
files that have nothing to do with our Java-facing types. So it's
mostly an orchestration agent for the types we *do* expose and
consume. And again, this is what all the current joint compilers
actually give you, so let's do that once and be done with it.
> Here's an example. To enable cross language search and refactoring in
> Eclipse all participating languages need to hook into the JDT's
> indexer. Doing this is inescapably Eclipse-specific, and beyond the
> basic job of mapping non-Java language symbols into Java there's
> really nothing much to factor out which isn't either IDE-specific or
> language-specific.
Perhaps someone from IntelliJ will chime in about how they do their
cross-language refactoring, but it's a much, much more complicated
process than just getting languages to compile together when they may
have cross-language Java type exposure. It's definitely out of scope
for what I want...I just want a base compiler structure that I can
provide a few Ruby-specific plugins for and know that cross-compiling
with any other plugged language will just work. That's "easy".
- Charlie
From a "compiler" perspective, there's a more in-depth project called CCI
(Common Compiler Infrastructure) up on CodePlex. But I think it goes
deeper than what Charlie's looking for.
Ted Neward
Java, .NET, XML Services
Consulting, Teaching, Speaking, Writing
http://www.tedneward.com
> -----Original Message-----
> From: jvm-la...@googlegroups.com [mailto:jvm-
> lang...@googlegroups.com] On Behalf Of Werner Schuster (murphee)
> Sent: Thursday, December 17, 2009 9:45 AM
> To: jvm-la...@googlegroups.com
> Subject: Re: [jvm-l] Common compiler infrastructure?
If there are no dependency cycles, and the dependencies are known, then
it is easy: If files in language A depend on files in language B, but
not vice versa, just compile the files in language B before those in
language A, and then have language A's compiler read the B .class files.
We all do this, at least in the case that B==Java.
To handle dependency cycles or unknown dependencies one can use a "compile
for export" option. This just generates a stub .class file, that
contains no code, or private fields, or anything else that is not
part of the "public contract" of the module. The key is that it
should be possible to generate this without reading any other
class files, and so cycles or other dependencies aren't a problem.
Generating the stub for Java is in theory easy: You can generate
stub method bodies by replacing all method bodies by "return null"
or whatever is appropriate for the method's return type, and
otherwise not checking for type compatibility and so on. The import
statements allow replacing class names by their fully-qualified names.
There are some complications: one is that import-on-depend does need
to search for other classes. So you need one extra step, as below.
So the multi-language compiler-driver does:
(1) Figure out the set of source files to (possibly) compile.
(2) For each source file, figure out the compiler to use, presumably
using extensions, though it could also use mime-types or looking at
the source file itself or some property database.
(3) For each source file, ask its compiler the set of class names
it might generate, using some to-be-specified simple protocol.
(4) For each source file, compile it "for export". I.e. generate
the stub .classes mentioned above, leaving them in some temporary
location (ideally a "MemoryFileSystem").
(5) For each source file, compile it for real, setting up the class
path to search the above temporary location first. When it needs to
import definitions from another file or class, it just reads the
stub class in "the normal way". The compiler should have access to
the map generated in (3), and should have a mechanism to invoke
the corresponding compiler "out of order". For example, a Scheme
library may want to execute a macro at compile time, and that
macro may depend on methods in some other class; thus it would
be nice to have a "compile this file first" mechanism.
This approach still means an API invoking compilers with various
options, but we don't need to invent a protocol for representing
types and modules - we just use the .class files.
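In outline, the driver loop described above might look like the following Java sketch. The per-language compiler back-ends are stubbed out and every name is invented; only the orchestration order (stubs for all files first, then real compilation) is the point:

```java
import java.util.*;

// Hypothetical sketch of the multi-language compiler driver: pick a compiler
// per file by extension, stub everything "for export", then compile for real
// with all stubs visible.
public class DriverSketch {

    public interface Compiler {
        Set<String> classNames(String sourceFile);   // step (3)
        void compileForExport(String sourceFile);    // step (4): stubs only
        void compile(String sourceFile);             // step (5): real bytecode
    }

    // Step (2): choose the compiler by file extension.
    public static String languageOf(String file) {
        return file.substring(file.lastIndexOf('.') + 1);
    }

    // Steps (4) and (5); returns a log of what ran, in order.
    public static List<String> run(Map<String, Compiler> byExtension, List<String> sources) {
        List<String> log = new ArrayList<>();
        for (String s : sources) {                   // stub pass over all files
            byExtension.get(languageOf(s)).compileForExport(s);
            log.add("stub:" + s);
        }
        for (String s : sources) {                   // real pass, stubs on the class path
            byExtension.get(languageOf(s)).compile(s);
            log.add("real:" + s);
        }
        return log;
    }
}
```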
--
--Per Bothner
p...@bothner.com http://per.bothner.com/
Per Bothner wrote:
> On 12/17/2009 09:02 AM, Charles Oliver Nutter wrote:
>> I may be naive, but what we really want here is simply a set of common
>> protocols for doing the following:
>>
>> * Requesting from a language what types and methods are provided by a
>> set of source files
>> * Providing to a language services to look up types from other languages
>
> To handle dependency cycles or unknown dependencies one can use a "compile
> for export" option. This just generates a stub .class file, that
> contains no code, or private fields, or anything else that is not
> part of the "public contract" of the module. The key is that it
> should be possible to generate this without reading any other
> class files, and so cycles or other dependencies aren't a problem.
Charlie's point is that generating a .class file is overkill. If we
just need to know a list of types & methods, then we just need a
function that takes source files and returns such a list. We don't need
to go all the way to generating valid .class files.
>
> Generating the stub for Java is in theory easy: You can generate
> stub method bodies by replacing all method bodies by "return null"
> or whatever is appropriate for the method's return type, and
> otherwise not checking for type compatibility and so on.
Why bother? Why not just stop once you've parsed enough to know the
classes & methods, and return some representation of those?
> There are some complications: one is that import-on-depend does need
> to search for other classes. So you need one extra step, as below.
>
> So the multi-language compiler-driver does:
> (1) Figure out the set of source files to (possibly) compile.
> (2) For each source file, figure out the compiler to use, presumably
> using extensions, though it could also use mime-types or looking at
> the source file itself or some property database.
> (3) For each source file, ask its compiler the set of class names
> it might generate, using some to-be-specified simple protocol.
Agreed.
> (4) For each source file, compile it "for export". I.e. generate
> the stub .classes mentioned above, leaving them in some temporary
> location (ideally a "MemoryFileSystem").
> (5) For each source file, compile it for real, setting up the class
> path to search the above temporary location first. When it needs to
> import definitions from another file or class, it just reads the
> stub class in "the normal way". The compiler should have access to
> the map generated in (3), and should have a mechanism to invoke
> the corresponding compiler "out of order". For example, a Scheme
> library may want to execute a macro at compile time, and that
> macro may depend on methods in some other class; thus it would
> be nice to have a "compile this file first" mechanism.
>
> This approach still means an API invoking compilers with various
> options, but we don't need to invent a protocol for representing
> types and modules - we just use the .class files.
True, this approach saves compiler writers from figuring out another
representation for types and modules. However, compiler writers already
need such a representation, e.g. if a single Groovy file contains
classes A and B, and A refers to B and vice versa. Groovy has a
ClassNode, and distinguishes between primary ClassNodes (that the
compiler is compiling) and wrapper ClassNodes that wrap a java.lang.Class.
Also, this approach makes extra work for compiler writers in generating
a .class stub. Charlie is proposing a common lingua franca for these
representations of classes, so we don't have to generate the stub.
Best,
Martin
The problem is picking a representation. A .class file is one valid
well-understood representation. An API might be better, but designing
one for multiple languages is probably not feasible. (Remember CORBA.)
I think the best choice for an API may be javax.lang.model.*.
It's very Java-centric, but that's probably the best we can do.
It has the big advantage that javac already implements it.
> True, this approach saves compiler writers from figuring out another
> representation for types and modules. However, compiler writers already
> need such a representation, e.g. if a single Groovy file contains
> classes A and B, and A refers to B and vice versa. Groovy has a
> ClassNode, and distinguishes between primary ClassNodes (that the
> compiler is compiling) and wrapper ClassNodes that wrap a java.lang.Class.
Indeed. Kawa uses gnu.bytecode.ClassType for the same.
The problem is we all have *different* representations, so that does
not help with inter-language interoperability. And unless you get javac
to implement the same representation, you have a problem: A JVM lingua
franca not supported by Java has limited use ...
> Also, this approach makes extra work for compiler writers in generating
> a .class stub. Charlie is proposing a common lingua franca for these
> representations of classes, so we don't have to generate the stub.
I won't hold my breath, but look forward to it.
I think an API that builds on (extends) javax.lang.model may make sense,
since that means interoperating with javac.
I was actually thinking along the lines of the mirror API that the
Annotation Processing Tool uses:
http://java.sun.com/j2se/1.5.0/docs/guide/apt/mirror/overview-summary.html
Again, I may be naïve but it seems to me we could easily form a common
mirror type system that we'd feed to the orchestration logic. Starting
with the mirror API would be pretty easy, since it's basically *all*
interfaces.
Going back to the example I gave Jochen:
class MyClass < my.package.Foo
  def foo(a: java.lang.String, b: my.package.Bar)
    ...
  end
end
This script would produce a ClassDeclaration for MyClass, with a
ClassType referencing (symbolically) my.package.Foo. We'd be providing
MyClass, and the compiler would see that we're consuming
my.package.Foo and look for a declaration elsewhere. Our MyClass
ClassDeclaration would provide one method "foo" with two
ParameterDeclarations pointing at ClassTypes for java.lang.String and
my.package.Bar. The compiler would see these and add them to the list
of produced and consumed types.
I'll see if I can mock up a prototype over the holidays, or at least a
compiler plugin for JRuby that can produce these interfaces.
- Charlie
>> True, this approach saves compiler writers from figuring out another
>> representation for types and modules. However, compiler writers already
>> need such a representation, e.g. if a single Groovy file contains
>> classes A and B, and A refers to B and vice versa. Groovy has a
>> ClassNode, and distinguishes between primary ClassNodes (that the
>> compiler is compiling) and wrapper ClassNodes that wrap a java.lang.Class.
>
> Indeed. Kawa uses gnu.bytecode.ClassType for the same.
>
> The problem is we all have *different* representations, so that does
> not help with inter-language interoperability. And unless you get javac
> to implement the same representation, you have a problem: A JVM lingua
> franca not supported by Java has limited use ...
>
>> Also, this approach makes extra work for compiler writers in generating
>> a .class stub. Charlie is proposing a common lingua franca for these
>> representations of classes, so we don't have to generate the stub.
>
> I won't hold my breath, but look forward to it.
>
> I think an API that builds on (extends) javax.lang.model may make sense,
> since that means interoperating with javac.
> --
> --Per Bothner
> p...@bothner.com http://per.bothner.com/
>
I like the sound of this ... good choice :-)
> On Sun, Dec 20, 2009 at 8:28 PM, Per Bothner <p...@bothner.com> wrote:
>
>>The problem is picking a representation. A .class file is one valid
>>well-understood representation. An API might be better, but designing
>>one for multiple languages is probably not feasible. (Remember CORBA.)
>>
>>I think the best choice for an API may be javax.lang.model.*.
>>It's very Java-centric, but that's probably the best we can do.
>>It has the big advantage that javac already implements it.
>
>
> I was actually thinking along the lines of the mirror API that the
> Annotation Processing Tool uses:
>
> http://java.sun.com/j2se/1.5.0/docs/guide/apt/mirror/overview-summary.html
That private API has been replaced by the standard Java extension API
Per referred to (javax.annotation.processing.Processor processes
javax.lang.model.element.TypeElements).
http://java.sun.com/javase/6/docs/technotes/guides/apt/index.html
And as I said, if you want to do something like Groovy's unified
compiler, it is done by generating .class files (as Per also suggests)
and could be implemented in a standard manner integrated with javac
using javax.tools. That doesn't mean that .class files have to be
generated, because JavaFileManagers can provide the bytes any way they
please.
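The "JavaFileManagers can provide the bytes any way they please" hook can be sketched concretely: a ForwardingJavaFileManager that captures compiled class bytes in a map instead of writing files. This is a rough illustration, not a full joint-compiler file manager:

```java
import javax.tools.*;
import java.io.*;
import java.net.URI;
import java.util.*;

// Sketch: compile a Java source with the system javac, but intercept class
// output so the bytes land in an in-memory map. A joint compiler could serve
// each language's stub classes from such a store without touching disk.
public class MemoryClassStore {

    static class MemClass extends SimpleJavaFileObject {
        final ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        MemClass(String name) { super(URI.create("mem:///" + name + ".class"), Kind.CLASS); }
        @Override public OutputStream openOutputStream() { return bytes; }
    }

    static class Src extends SimpleJavaFileObject {
        final String code;
        Src(String name, String code) {
            super(URI.create("string:///" + name + Kind.SOURCE.extension), Kind.SOURCE);
            this.code = code;
        }
        @Override public CharSequence getCharContent(boolean ignoreErrors) { return code; }
    }

    // Compiles `source`; returns the binary names of the classes produced.
    public static Set<String> compileToMemory(String className, String source) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        Map<String, MemClass> store = new TreeMap<>();
        JavaFileManager fm = new ForwardingJavaFileManager<StandardJavaFileManager>(
                javac.getStandardFileManager(null, null, null)) {
            @Override public JavaFileObject getJavaFileForOutput(Location loc, String name,
                    JavaFileObject.Kind kind, FileObject sibling) {
                MemClass mc = new MemClass(name);
                store.put(name, mc);
                return mc;
            }
        };
        if (!javac.getTask(null, fm, null, null, null, List.of(new Src(className, source))).call())
            throw new IllegalStateException("compilation failed");
        return store.keySet();
    }
}
```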
And if you want JDK 1.5 compatibility, the javax classes can be jarred
up as needed, like is done for JSR-223 with LiveTribe's implementation
(which includes public Maven 2 artifacts). It is used by both Groovy
and Jython for compiling on JDK 1.5.
Jim
On Dec 20, 8:37 pm, "Martin C. Martin" <mar...@martincmartin.com>
wrote:
> Per Bothner wrote:
> > On 12/17/2009 09:02 AM, Charles Oliver Nutter wrote:
> >> I may be naive, but what we really want here is simply a set of common
> >> protocols for doing the following:
>
> >> * Requesting from a language what types and methods are provided by a
> >> set of source files
> >> * Providing to a language services to look up types from other languages
>
> > To handle dependency cycles or unknown dependencies one can use a "compile
> > for export" option. This just generates a stub .class file, that
> > contains no code, or private fields, or anything else that is not
> > part of the "public contract" of the module. The key is that it
> > should be possible to generate this without reading any other
> > class files, and so cycles or other dependencies aren't a problem.
>
> Charlie's point is that generating a .class file is overkill. If we
> just need to know a list of types & methods, then we just need a
> function that takes source files and returns such a list. We don't need
> to go all the way to generating valid .class files.
>
>
>
> > Generating the stub for Java is in theory easy: You can generate
> > stub method bodies by replacing all method bodies by "return null"
> > or whatever is appropriate for the method's return type, and
> > otherwise not checking for type compatibility and so on.
>
> Why bother? Why not just stop once you've parsed enough to know the
> classes & methods, and return some representation of those?
>
Generating .class files isn't necessarily 'overkill'. I agree with
Per, this is a standard representation. Given such .class files in the
classpath, all of our existing compilers and tools will 'just work',
right now. Reflection will work, etc.
>
> > (4) For each source file, compile it "for export". I.e. generate
> > the stub .classes mentioned above, leaving them in some temporary
> > location (ideally a "MemoryFileSystem").
> > (5) For each source file, compile it for real, setting up the class
> > path to search the above temporary location first. When it needs to
> > import definitions from another file or class, it just reads the
> > stub class in "the normal way". The compiler should have access to
> > the map generated in (3), and should have a mechanism to invoke
> > the corresponding compiler "out of order". For example, a Scheme
> > library may want to execute a macro at compile time, and that
> > macro may depend on methods in some other class; thus it would
> > be nice to have a "compile this file first" mechanism.
>
> > This approach still means an API invoking compilers with various
> > options, but we don't need to invent a protocol for representing
> > types and modules - we just use the .class files.
>
> True, this approach saves compiler writers from figuring out another
> representation for types and modules. However, compiler writers already
> need such a representation, e.g. if a single Groovy file contains
> classes A and B, and A refers to B and vice versa. Groovy has a
> ClassNode, and distinguishes between primary ClassNodes (that the
> compiler is compiling) and wrapper ClassNodes that wrap a java.lang.Class.
>
In Clojure, when in such a situation, I generate stub classes in
memory, just as Per describes, and for the same reason: everything
downstream just works.
> Also, this approach makes extra work for compiler writers in generating
> a .class stub. Charlie is proposing a common lingua franca for these
> representations of classes, so we don't have to generate the stub.
>
There is extra work in any case, as no one is yet generating what
Charlie is proposing. Also, 'compiling with options' seems more
amenable to implementation via ant et al, vs programmatic invocation
of and interaction with compilers via an API.
Rich
One issue to bear in mind is that this potentially generates a lot of
stubs where it doesn't need to. With Scala at least the generation of
a large number of small class files is a big performance hit.
Potentially doubling the number generated (even if the version
generated in the first pass is much smaller) is going to slow down the
compilation process quite a lot, potentially for no reason at all: If
another language never requests that class then there was no need to
generate the stub. This could easily be addressed by an API (and if
that API wants to generate class stubs, that's not a problem) which
asks for classes rather than generating them all up front.
Another potential issue to worry about with Scala is that type
inference across language boundaries introduces you to a world of
pain. As long as only one language involved is type inferred (e.g.
Java + Scala) this isn't a big deal, but when you've got cyclic type
inference dependencies between languages you're going to suffer.
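The request-driven alternative suggested above (stubs generated only when another language actually asks for a type) could reduce to a protocol shaped roughly like this. All names here (`ClassStubProvider`, `CountingProvider`) are invented for illustration, not an existing API; the byte arrays stand in for real stub classfiles.

```java
import java.util.*;

// Hypothetical request-driven stub provider: cheap name listing up front,
// stub bytes materialized lazily on first request, so unrequested classes
// never cost a second class-file generation pass.
interface ClassStubProvider {
    Set<String> providedTypeNames();        // cheap: names only
    Optional<byte[]> stubFor(String fqcn);  // lazy: bytes on demand
}

// Toy implementation that records which stubs were actually materialized.
class CountingProvider implements ClassStubProvider {
    final Map<String, byte[]> generated = new HashMap<>();
    public Set<String> providedTypeNames() { return Set.of("a.Foo", "a.Bar"); }
    public Optional<byte[]> stubFor(String fqcn) {
        if (!providedTypeNames().contains(fqcn)) return Optional.empty();
        // Stub bytes are generated at most once per requested type.
        return Optional.of(generated.computeIfAbsent(fqcn, n -> new byte[0]));
    }
}
```

If `a.Bar` is never requested, its stub is never built, which is exactly the performance concern raised above.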
Ok, now I see where the right bits go in the javax.lang stuff...I was
thrown off initially because there are far fewer interfaces, but they
appear to have been condensed a bit (ExecutableElement for all
method-like things, and so on). So using javax.lang plus a jarred-up
backport version of it would be just as good as apt, and javac would
speak it already if you used the right javac. We're making progress
now :)
The "javac implements these interfaces" bit would need to be figured
out for Java 5 users, of course.
> And as I said, if you want to do something like Groovy's unified
> compiler, it is done by generating .class files (as Per also suggests)
> and could be implemented in a standard manner integrated with javac
> using javax.tools. That doesn't mean that .class files have to be
> generated, because JavaFileManagers can provide the bytes any way they
> please.
So your suggestion would be to use the javax.tools interfaces and
produce dummy stubs live for the compilation process? That's not bad I
guess, but it seems like it would be a lot cleaner and less hacky to
generate the appropriate interfaces and coordinate them directly. Need
to think about that a bit.
In any case, there still needs to be a top-level compiler coordinating
the generation of either stub .class files or javax.lang.element data
and then eventually triggering the "final" compilation, and that
top-level needs to be agnostic of any of the languages. So we're still
in the same boat as far as needing a coordinator.
- Charlie
I'm not sure where reflection comes into this, unless you're referring
to one way the top-level compiler stuff would get class data (and
reflection may be a particularly poor way, since it has to *load* the
classes and requires that they be valid/complete to do so, rather than
just reading the data format). Generally we're talking about .class
data and Java-centric type structures across languages. What do you
need reflection for at compile time?
> In Clojure, when in such a situation, I generate stub classes in
> memory, just as Per describes, and for the same reason, everything
> downstream just works.
Sure, I'm sure generating dumb stubs can work, but it essentially
means having to do almost the entire compile process twice for every
language:
* All languages generate stubs into a common location (possibly
in-memory and combined with classpath)
* All languages generate final code based on that common location
And we *still* need a coordinator since we all have different ways to
generate stubs or trigger a final compilation.
I don't really care much about the data format. Stub .class files are
essentially just a richer and less flexible version of the
mirror/javax.lang stuff, but certainly contain all the data we need
(and potentially a lot of data we don't need...or can't generate
without a full compile?) so they'd probably be fine for a simple first
pass. But are you saying that having a common compiler infrastructure
that actually speaks mirror interfaces rather than .class data
*wouldn't* be good to have?
- Charlie
>
>> Also, this approach makes extra work for compiler writers in generating
>> a .class stub. Charlie is proposing a common lingua franca for these
>> representations of classes, so we don't have to generate the stub.
>>
>
> There is extra work in any case, as no one is yet generating what
> Charlie is proposing. Also, 'compiling with options' seems more
> amenable to implementation via ant et al, vs programmatic invocation
> of and interaction with compilers via an API.
>
> Rich
>
On Dec 21, 1:50 pm, Charles Oliver Nutter <head...@headius.com> wrote:
> On Mon, Dec 21, 2009 at 6:43 AM, Rich Hickey <richhic...@gmail.com> wrote:
> > Generating .class files isn't necessarily 'overkill'. I agree with
> > Per, this is a standard representation. Given such .class files in the
> > classpath, all of our existing compilers and tools will 'just work',
> > right now. Reflection will work, etc.
>
> I'm not sure where reflection comes into this, unless you're referring
> to one way the top-level compiler stuff would get class data (and
> reflection may be a particularly poor way, since it has to *load* the
> classes and requires that they be valid/complete to do so, rather than
> just reading the data format). Generally we're talking about .class
> data and Java-centric type structures across languages. What do you
> need reflection for at compile time?
>
You are missing the point. .class files *are* a data format. And
everything in the Java ecosystem understands them already. APIs *are
not* a data format, they are a means for live communication between
program entities.
> > In Clojure, when in such a situation, I generate stub classes in
> > memory, just as Per describes, and for the same reason, everything
> > downstream just works.
>
> Sure, I'm sure generating dumb stubs can work, but it essentially
> means having to do almost the entire compile process twice for every
> language:
>
Dumb stubs vs smart stubs?
That doesn't follow. We will have to do precisely and just as much
work as is necessary to determine our types and method signatures,
then generate something (either a stub .class file, *or* an instance
of something implementing these javax interfaces, which in turn need
more instances of other interfaces, essentially forcing each of us to
duplicate the reflection API, ugh, why?)
> * All languages generate stubs into a common location (possibly
> in-memory and combined with classpath)
> * All languages generate final code based on that common location
>
> And we *still* need a coordinator since we all have different ways to
> generate stubs or trigger a final compilation.
>
Yes. But with Per's suggestion, that process is significantly more
decoupled, and allows for more reuse of existing capabilities.
> I don't really care much about the data format. Stub .class files are
> essentially just a richer and less flexible version of the
> mirror/javax.lang stuff, but certainly contain all the data we need
> (and potentially a lot of data we don't need...or can't generate
> without a full compile?)
Really, I don't know what you are talking about. Generating a class
file is a simple thing for all of us to do, and we already do it. A
'stub' classfile doesn't need anything other than fabricated returns.
There is no "potentially a lot of..."
And again, javax.lang is not a data format.
> so they'd probably be fine for a simple first
> pass. But are you saying that having a common compiler infrastructure
> that actually speaks mirror interfaces rather than .class data
> *wouldn't* be good to have?
>
I'm saying it would be premature to dismiss Per's suggestion with
hyperbole about *dumb* stubs and "a lot of xxx we don't need". There
is tremendous value in:
an actual data format
that everyone already consumes
and everyone already produces
Being able to point a tool at some .class files and get a javax.lang
model would be great, but that should be a single job (if not done
already) we all could leverage, vs each of us having to implement
javax.lang interfaces, and programmatic access to same, directly.
Rich
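The "point a tool at some .class files and get a javax.lang model" job Rich describes largely exists in javac itself: the supported `com.sun.source.util.JavacTask` API exposes a `javax.lang.model` view of anything on the classpath, without ever loading classes through reflection. A hedged sketch (the dummy compilation unit exists only to give javac something to analyze):

```java
import com.sun.source.util.JavacTask;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.lang.model.util.Elements;
import javax.tools.*;
import java.net.URI;
import java.util.List;

public class MirrorFromClassFiles {
    // Minimal in-memory source, only so javac has a compilation unit to chew on.
    static class Src extends SimpleJavaFileObject {
        Src() { super(URI.create("string:///Dummy.java"), Kind.SOURCE); }
        public CharSequence getCharContent(boolean ignore) { return "class Dummy {}"; }
    }

    /** Returns the javax.lang.model element for a class on the classpath,
        read from .class data rather than via reflection. */
    public static TypeElement mirror(String fqcn) throws Exception {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        StandardJavaFileManager fm = javac.getStandardFileManager(null, null, null);
        JavacTask task = (JavacTask) javac.getTask(null, fm, null, null, null,
                List.of(new Src()));
        task.analyze(); // populate javac's symbol tables
        Elements elements = task.getElements();
        return elements.getTypeElement(fqcn);
    }

    public static void main(String[] args) throws Exception {
        TypeElement runnable = mirror("java.lang.Runnable");
        for (Element e : runnable.getEnclosedElements())
            System.out.println(e.getKind() + " " + e.getSimpleName());
    }
}
```

This is the "single job we all could leverage" shape: class files stay the data format, and one tool turns them into mirror objects for whoever wants them.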
> I'm saying it would be premature to dismiss Per's suggestion with
> hyperbole about *dumb* stubs and "a lot of xxx we don't need". There
> is tremendous value in:
>
> an actual data format
> that everyone already consumes
> and everyone already produces
I agree. Because my compiler is not (yet) self-hosting, I use "javap
-public" to get what I need out of .class files, so I definitely want
an offline way of getting information about classes I don't generate.
In any case, the issue of how to consume a remote class is not
trivial. I know how to consume Java classes and expose them, but
classes compiled by a non-Java compiler may have invocation and use
patterns that I don't know. This is especially true in "foo module =
Java class" compilation strategies, including my own -- a random JVM
programming language wouldn't know what to do (more importantly, what
not to do) with one of my .class files.
>
> Being able to point a tool at some .class files and get a javax.lang
> model would be great, but that should be a single job (if not done
> already) we all could leverage, vs each of us having to implement
> javax.lang interfaces, and programmatic access to same, directly.
>
> Rich
>
On Mon, Dec 21, 2009 at 2:24 PM, Rich Hickey <richh...@gmail.com> wrote:
> On Dec 21, 1:50 pm, Charles Oliver Nutter <head...@headius.com> wrote:
>> I'm not sure where reflection comes into this, unless you're referring
>> to one way the top-level compiler stuff would get class data (and
>> reflection may be a particularly poor way, since it has to *load* the
>> classes and requires that they be valid/complete to do so, rather than
>> just reading the data format). Generally we're talking about .class
>> data and Java-centric type structures across languages. What do you
>> need reflection for at compile time?
>>
>
> You are missing the point. .class files *are* a data format. And
> everything in the Java ecosystem understands them already. APIs *are
> not* a data format, they are a means for live communication between
> program entities.
Eventually a shared compiler has to work with some API against some
"data" to coordinate different languages and the types they consume or
produce. Whether that data comes from class files (a common format) or
just appears to the coordinator as a set of interface-implementing
datatypes (backed by per-language data), I don't care. But as the
inference case shows, it is not always possible to generate stubs in a
single shot.
>> Sure, I'm sure generating dumb stubs can work, but it essentially
>> means having to do almost the entire compile process twice for every
>> language:
>>
>
> Dumb stubs vs smart stubs?
>
> That doesn't follow. We will have to do precisely and just as much
> work as is necessary to determine our types and method signatures,
> then generate something (either a stub .class file, *or* an instance
> of something implementing these javax interfaces, which in turn need
> more instances of other interfaces, essentially forcing each of us to
> duplicate the reflection API, ugh, why?)
If your claim is that we can just have all languages generate stub
classes, point all languages at the stub classes, and have them
regenerate real classes...then I think you're wrong. James Iry's case
of type inference in Scala shows that much, since the return type of a
given method (which you would need to put in the stub) depends on the
return value of some other method (which may not be available at
stub-generation time or which may have circular dependencies).
Stubs work great if everyone knows (at least symbolically) what types
all their Java-facing types will return and receive, and can determine
that based on existing on-disk or in-memory data. That's obvious, and
that's how, for example, the Groovy joint compiler works (since it and
Java do not infer return types in the way that Scala does). Is that
all we want?
>> * All languages generate stubs into a common location (possibly
>> in-memory and combined with classpath)
>> * All languages generate final code based on that common location
>>
>> And we *still* need a coordinator since we all have different ways to
>> generate stubs or trigger a final compilation.
>>
>
> Yes. But with Per's suggestion, that process is significantly more
> decoupled, and allows for more reuse of existing capabilities.
Sure, sounds great! I don't think I ever said it was wrong...just that
it may not cover necessary cases for the languages under discussion.
Specifically, Scala can't necessarily generate complete stubs without
some interaction with the other languages and their type definitions,
which could be as-yet-uncompiled.
>> I don't really care much about the data format. Stub .class files are
>> essentially just a richer and less flexible version of the
>> mirror/javax.lang stuff, but certainly contain all the data we need
>> (and potentially a lot of data we don't need...or can't generate
>> without a full compile?)
>
> Really, I don't know what you are talking about. Generating a class
> file is a simple thing for all of us to do, and we already do it. A
> 'stub' classfile doesn't need anything other than fabricated returns.
> There is no "potentially a lot of..."
You are correct if, as stated previously, it's possible to generate
stubs in isolation without a give-and-take mediation between
languages.
> And again, javax.lang is not a data format.
It's an API to data. The format behind the scenes is irrelevant as
far as the API is concerned. Isn't that the point of such an API?
>> so they'd probably be fine for a simple first
>> pass. But are you saying that having a common compiler infrastructure
>> that actually speaks mirror interfaces rather than .class data
>> *wouldn't* be good to have?
>>
>
> I'm saying it would be premature to dismiss Per's suggestion with
> hyperbole about *dumb* stubs and "a lot of xxx we don't need". There
> is tremendous value in:
>
> an actual data format
> that everyone already consumes
> and everyone already produces
I never dismissed it. Please don't put words in my mouth. I only
suggested that what I had in mind was per-language impls of the mirror
API. I know of the existing efforts that generate stubs, and I know
that approach works well for the straightforward case.
By "dumb" I didn't mean "stupid idea"...I meant "raw" or "simple" or
"basic". Generating stub classes obviously can work as long as the
participating languages are able to generate stubs without interacting
with each other. And that may be enough for most use cases. But
perhaps it's not enough to cover the cases that the current (for
example) Scala/Java joint compiler is able to cover?
> Being able to point a tool at some .class files and get a javax.lang
> model would be great, but that should be a single job (if not done
> already) we all could leverage, vs each of us having to implement
> javax.lang interfaces, and programmatic access to same, directly.
And if .class files are enough, I will happily concede that they're
the best approach for the problem. Like I said...I don't care!
Ultimately a joint compiler that knows how to work with all those
.class files (or with a mirror API to language-specific data) and
trigger a subsequent "complete" compile still needs to be there, which
is the point of this thread.
We're just brainstorming here :)
- Charlie
> If your claim is that we can just have all languages generate stub
> classes, point all languages at the stub classes, and have them
> regenerate real classes...then I think you're wrong. James Iry's case
> of type inference in Scala shows that much, since the return type of a
> given method (which you would need to put in the stub) depends on the
> return value of some other method (which may not be available at
> stub-generation time or which may have circular dependencies).
If the return type of an exported method depends on type inference
and that inference may have circular cross-module dependencies,
my reaction is: Don't do that. You can't define a stable API
that way, at least not a library API. It might be OK for a
module-private method (in the sense of JSR-294 - i.e. a group of
classes that go together), but the use-case where you might
need to infer return types across language boundaries seems
pretty low-priority.
> Stubs work great if everyone knows (at least symbolically) what types
> all their Java-facing types will return and receive, and can determine
> that based on existing on-disk or in-memory data. That's obvious, and
> that's how, for example, the Groovy joint compiler works (since it and
> Java do not infer return types in the way that Scala does). Is that
> all we want?
It may be good enough - or at least a good start.
> And if .class files are enough, I will happily concede that they're
> the best approach for the problem. Like I said...I don't care!
> Ultimately a joint compiler that knows how to work with all those
> .class files (or with a mirror API to language-specific data) and
> trigger a subsequent "complete" compile still needs to be there, which
> is the point of this thread.
And I'll happily concede using a standard API might work better.
A key issue is how easy it is to fit javac into the framework.
In theory we have the option of adding functionality to OpenJDK,
but of course we really want something that can work with Java 6
(or even Java 5, though that's lower priority).
Sure, I agree with that. It's an outlier, in any case.
>> And if .class files are enough, I will happily concede that they're
>> the best approach for the problem. Like I said...I don't care!
>> Ultimately a joint compiler that knows how to work with all those
>> .class files (or with a mirror API to language-specific data) and
>> trigger a subsequent "complete" compile still needs to be there, which
>> is the point of this thread.
>
> And I'll happily concede using a standard API might work better.
> A key issue is how easy it is to fit javac into the framework.
> In theory we have the option of adding functionality to OpenJDK,
> but of course we really want something that can work with Java 6
> (or even Java 5, though that's lower priority).
I think we need more than that: there are other non-Sun-javac Java
compilers out there that would presumably want to participate. Maybe
presenting stubs will be good enough, but having a standard
interface for each language that can 1. generate stubs and 2. trigger
final compilation would at least be necessary. And having a
per-language API that could get mirrored types out would be even
better, since that feeds into IDE support for recognizing where type
and method declarations are coming from without having to regenerate
stubs all the time.
I don't think it would be very hard to implement the javax.lang
interfaces for any language that wants to expose Java types. If we
ignore the circular type-inference problem for a moment, that would be
almost as easy to do as generating stubs.
- Charlie
On Dec 21, 3:54 pm, Charles Oliver Nutter <head...@headius.com> wrote:
> Perhaps we should take a deep breath before proceeding. Ready?
>
> On Mon, Dec 21, 2009 at 2:24 PM, Rich Hickey <richhic...@gmail.com> wrote:
> > On Dec 21, 1:50 pm, Charles Oliver Nutter <head...@headius.com> wrote:
> >> I'm not sure where reflection comes into this, unless you're referring
> >> to one way the top-level compiler stuff would get class data (and
> >> reflection may be a particularly poor way, since it has to *load* the
> >> classes and requires that they be valid/complete to do so, rather than
> >> just reading the data format). Generally we're talking about .class
> >> data and Java-centric type structures across languages. What do you
> >> need reflection for at compile time?
>
> > You are missing the point. .class files *are* a data format. And
> > everything in the Java ecosystem understands them already. APIs *are
> > not* a data format, they are a means for live communication between
> > program entities.
>
> Eventually a shared compiler has to work with some API against some
> "data" to coordinate different languages and the types they consume or
> produce. Whether that data comes from class files (a common format) or
> just appears to the coordinator as a set of interface-implementing
> datatypes (backed by per-language data), I don't care.
But it's not just the coordinator, it's also the consumers (other
compilers). And we already know how to consume type information from
class files.
> But as the
> inference case shows, it is not always possible to generate stubs in a
> single shot.
>
When, how, with what granularity, and how often to generate type
information seems to me orthogonal to how that information is
represented. Is there any evidence that the javax.lang representation
is more suitable for languages with type inference?
> If your claim is that we can just have all languages generate stub
> classes, point all languages at the stub classes, and have them
> regenerate real classes...then I think you're wrong.
That isn't my claim. I am just saying, when you are generating Java
type information, class files are a lingua franca.
> James Iry's case
> of type inference in Scala shows that much, since the return type of a
> given method (which you would need to put in the stub) depends on the
> return value of some other method (which may not be available at
> stub-generation time or which may have circular dependencies).
>
I fail to see how stubs vs javax.lang affects this. If you are going
to have difficulty producing a stub you'll have difficulty producing a
javax.lang descriptor as well.
> >> And we *still* need a coordinator since we all have different ways to
> >> generate stubs or trigger a final compilation.
>
> > Yes. But with Per's suggestion, that process is significantly more
> > decoupled, and allows for more reuse of existing capabilities.
>
> Sure, sounds great! I don't think I ever said it was wrong...just that
> it may not cover necessary cases for the languages under discussion.
> Specifically, Scala can't necessarily generate complete stubs without
> some interaction with the other languages and their type definitions,
> which could be as-yet-uncompiled.
>
So we're going to build some dynamic network of compilers cooperating
to figure things out? I don't think that's covered by javax.lang. The
multiple type-inferring languages problem seems like a research topic
to me.
> > And again, javax.lang is not a data format.
>
> It's an API to data. The format behind the scenes is irrelevant as
> far as the API is concerned. Isn't that the point of such an API?
>
Except that everyone has to implement that API when they already know
how to deal with at least one format for the data. APIs are
different from data - we might interact with XML via a DOM API but we
don't build systems by connecting their DOMs to each other. Going
through a static data representation allows us to decouple execution.
Let's say you use javax.lang in the coordinator, and it asks a
compiler 'do you have type X.Y.Z in this file?' and it says 'yes', and
the coordinator asks for the type info and the compiler gives it a
javax.lang thingy. Now what? How does the coordinator convey that to
any other compilers? It's not data, it's a live object in the
coordinator process.
Rich
I don't know yet :)
I don't know whether the inference case is a requirement.
I don't know whether the fine-grained online reflective querying of
type information from arbitrary languages' files is a requirement.
I'm exploring options.
>> If your claim is that we can just have all languages generate stub
>> classes, point all languages at the stub classes, and have them
>> regenerate real classes...then I think you're wrong.
>
> That isn't my claim. I am just saying, when you are generating Java
> type information, class files are a lingua franca.
A lingua franca, yes, but a cumbersome one to deal with if your goal
is to do more than just suck in source files and spit out class files.
I'm interested in that simple case, certainly, but I am also
interested in the possibility of a richer API for per-language
compilers that might make it possible for IDEs (as an example) to see
across languages. You could probably do some of that with stubs, but
it seems like it would require regenerating stubs on every save,
rather than simply re-requesting javax.lang representations for a file
or subset of a file.
Can you point us toward the code in Clojure that handles compiling
circular references in generated Java classes?
>> James Iry's case
>> of type inference in Scala shows that much, since the return type of a
>> given method (which you would need to put in the stub) depends on the
>> return value of some other method (which may not be available at
>> stub-generation time or which may have circular dependencies).
>>
>
> I fail to see how stubs vs javax.lang affects this. If you are going
> to have difficulty producing a stub you'll have difficulty producing a
> javax.lang descriptor as well.
Except that you don't have to produce the javax.lang descriptor all at
once, like you do with the class files. The method declarations could
be generated lazily:
1. Ask all languages what class declarations they provide
2. Ask all languages what types they need to consume to proceed with
determining their method declarations, and presumably have the type
information available
3. Ask the languages to proceed with producing method declarations
based on the now-available set of types
And at some point after this, each language will ideally have resolved
all types it needs for its class/method declarations and can proceed
to generate the final class format.
Again, just a napkin sketch of how this might work. I will grant that
any initial work on a joint compiler would be well-served by basing
type discovery on class files and generated stubs, since that's the
simple case and it covers common usage.
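The phased exchange sketched above (declare, request, resolve) might reduce to an interface like the following. Every name here (`LanguageCompiler`, `Coordinator`) is a placeholder invented for this napkin sketch, not a proposed API.

```java
import java.util.*;

// Phase 1: each language says what it provides.
// Phase 2: each language says what it needs to proceed.
// The coordinator matches the two before anyone produces final code.
interface LanguageCompiler {
    Set<String> declaredTypes();   // phase 1: what do you provide?
    Set<String> requiredTypes();   // phase 2: what do you consume?
}

class Coordinator {
    /** Types some compiler needs that no participating compiler declares,
        which must therefore come from the classpath or be reported as errors. */
    static Set<String> unresolved(List<LanguageCompiler> compilers) {
        Set<String> provided = new HashSet<>();
        for (LanguageCompiler c : compilers) provided.addAll(c.declaredTypes());
        Set<String> missing = new TreeSet<>();
        for (LanguageCompiler c : compilers)
            for (String t : c.requiredTypes())
                if (!provided.contains(t)) missing.add(t);
        return missing;
    }
}
```

A phase 3 ("produce method declarations now that the types are known") would hang off the same interface; it is omitted here since the thread hasn't settled what it returns.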
> So we're going to build some dynamic network of compilers cooperating
> to figure things out? I don't think that's covered by javax.lang. The
> multiple type-inferring languages problem seems like a research topic
> to me.
Yes, it may be. It may also be a useful discussion to have here, and
there may be aspects that apply to practical concerns (like IDE
support for type knowledge across multiple languages' sources).
Don't get too hung up on javax.lang as being the only potential API
either. This is an open discussion about the features, challenges, and
possible solutions for joint compiling multiple JVM languages at the
same time.
>> It's an API to data. The format behind the scenes is irrelevant as
>> far as the API is concerned. Isn't that the point of such an API?
>>
>
> Except that everyone has to implement that API when they already know
> how to deal with at least one format for the data. APIs are
> different from data - we might interact with XML via a DOM API but we
> don't build systems by connecting their DOMs to each other. Going
> through a static data representation allows us to decouple execution.
>
> Let's say you use javax.lang in the coordinator, and it asks a
> compiler 'do you have type X.Y.Z in this file?' and it says 'yes', and
> the coordinator asks for the type info and the compiler gives it a
> javax.lang thingy. Now what? How does the coordinator convey that to
> any other compilers? It's not data, it's a live object in the
> coordinator process.
Perhaps I didn't describe this well enough in earlier emails.
The coordinator would basically get two things from the language compilers:
1. What types and methods do you provide (perhaps in separate phases)
2. What types and methods do you consume (again, perhaps in separate phases)
The coordinator does as much or as little as is necessary to ensure
the consumed types match up with the provided types, which could
certainly mean triggering stubs to generate if necessary. The language
compilers could upcall into the coordinator to ask for information on
types they hope to consume, with the coordinator providing that type
information (potentially without ever generating stubs) by just using
the information it has gathered from other languages (and yes, from
real class and jar files already on disk).
And the language-specific compiler bits wouldn't necessarily even have
to be reimplemented for each language if generating stubs would be
enough for a given language. In that case, there could be a base
"StubbedClassDeclaration" that knows how to ask a language to generate
its stubs and reflectively inspect them.
- Charlie
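The hypothetical "StubbedClassDeclaration" mentioned above could be as small as a wrapper that answers "what does this type provide?" by plain reflection over an already-generated stub class. A sketch only; the class name comes from the thread, and the signature format is an assumption for illustration.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.*;

// Language-agnostic base declaration: the language generates its stub class,
// and this wrapper reflectively inspects it on the coordinator's behalf.
class StubbedClassDeclaration {
    private final Class<?> stub;
    StubbedClassDeclaration(Class<?> stub) { this.stub = stub; }

    String typeName() { return stub.getName(); }

    /** Public method signatures the stub exports, as "returnType name" strings. */
    List<String> providedMethods() {
        List<String> sigs = new ArrayList<>();
        for (Method m : stub.getDeclaredMethods())
            if (Modifier.isPublic(m.getModifiers()))
                sigs.add(m.getReturnType().getName() + " " + m.getName());
        Collections.sort(sigs);
        return sigs;
    }
}
```

Languages that can stub themselves get this behavior for free; only languages needing the give-and-take mediation (the inference case) would have to implement the richer interface directly.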
If you have a type representing a type variable, it is.
[...]
> Rich
>
Rémi