compiler for multiple languages

Jochen Theodorou

Feb 1, 2016, 7:56:44 AM
to jvm-la...@googlegroups.com
Hi,

I was wondering if anyone knows of an effort to create a compiler for
the Java world that can understand multiple programming languages, be
it using some IR or other means. I am aware of no such effort,
really, and outside Java I know only gcc and llvm doing things remotely
like that.

My ideal would be a compiler that allows mixing multiple languages to
some extent without having to use separate, independent compilation steps
for each language. I would be interested in that even if there is no
actual implementation and only a research project (but then the paper
has to be freely available).

Having spent a lot of time with compilers, I know very well that such a
thing is really not an easy task and that there are numerous problems to
solve... but that is exactly why I am looking for available resources:
so as not to make all those mistakes again and again.

bye Jochen

Alessio Stalla

Feb 1, 2016, 8:03:27 AM
to jvm-la...@googlegroups.com

Truffle/Graal? I haven't followed it closely, so I don't know if it matches what you're searching for.
It would be a great thing to have, sure.

--
You received this message because you are subscribed to the Google Groups "JVM Languages" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jvm-language...@googlegroups.com.
To post to this group, send email to jvm-la...@googlegroups.com.
Visit this group at https://groups.google.com/group/jvm-languages.
For more options, visit https://groups.google.com/d/optout.

Jochen Theodorou

Feb 1, 2016, 9:02:59 AM
to jvm-la...@googlegroups.com
Truffle/Graal.... hmm... I am looking for a compiler to bytecode, not a
VM though. And since Graal is the VM and Truffle the interface to the VM
in a graph-like manner, I see no real solution for the actual
compilation part. That makes it great for an interpreter, but I don't
want to interface with the GraalVM here. Am I missing something?

Alexander Bertram

Feb 1, 2016, 9:21:39 AM
to JVM Languages
Have you looked at the Soot project? It's very close to this:
http://sable.github.io/soot/

It provides a few intermediate languages such as Jimple and BAF.

The original version of gcc-bridge used Soot as a backend, but the API is a bit cumbersome to work with so in the most recent reworking we use ASM to generate code directly from Gimple, which is GCC's Intermediate Representation and our input language.

-Alex

Jochen Theodorou

Feb 1, 2016, 10:02:20 AM
to jvm-la...@googlegroups.com

On 01.02.2016 15:21, Alexander Bertram wrote:
> Have you looked at the Soot project? It's very close to this:
> http://sable.github.io/soot/
>
> It provides a few intermediate languages such as Jimple and BAF.

this looks interesting, yes... though because of the LGPL (see
http://www.apache.org/legal/resolved.html#category-x) I might not be
able to actually use any of that... but it is still good information
about the approach

> The original version of gcc-bridge used Soot as a backend, but the API
> is a bit cumbersome to work with so in the most recent reworking we use
> ASM to generate code directly from Gimple, which is GCC's Intermediate
> Representation and our input language.

can you explain a bit why it got cumbersome? How have the build times
been with that compiler infrastructure?

bye Jochen

Alexander Bertram

Feb 1, 2016, 11:32:51 AM
to JVM Languages

The build times with Soot were definitely an issue. Soot is still first and foremost an analysis framework so on startup it builds this enormous
data structure of all Java classes reachable from the input, which ends up including most of the JRE.

I also found it difficult to build the Jimple IR programmatically. Soot makes extensive use of singletons so it's not as easy as 

SomeASTNode node = buildFunctionNode(inputAst)
writeClassFile(node)

For this reason, I ended up generating a Jimple text file which I fed into Soot to compile to bytecode. This was both slow and meant having to 
mess around with composing text files as output.

I've found the ASM Tree library to be quite nice to work with by contrast, and it's made it possible to include some niceties such as debugging info,
for example stack traces that include C source references:

java.lang.ArrayIndexOutOfBoundsException: -1
at org.renjin.gcc.array.sum10(array.c:29)
at org.renjin.gcc.array.test(array.c:16)

(I was pleasantly surprised that it was then possible to step line by line through Fortran code in IntelliJ.)
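
For context, a sketch of why such traces are possible at all: the file name and line numbers in a stack frame come from the class file's SourceFile attribute and LineNumberTable, and the JVM does not check that they refer to a .java file. With ASM these are typically set through ClassVisitor.visitSource and MethodVisitor.visitLineNumber; whether gcc-bridge does exactly that is an assumption here. The snippet below uses only the JDK, with the hypothetical class name SourceReferenceDemo, to show that a frame pointing at array.c is an ordinary StackTraceElement:

```java
final class SourceReferenceDemo {

    // Builds the frame the JVM would report for bytecode whose class file names
    // "array.c" as its source and maps the failing instruction to the given line.
    static StackTraceElement frame(String method, int line) {
        return new StackTraceElement("org.renjin.gcc.array", method, "array.c", line);
    }

    public static void main(String[] args) {
        RuntimeException e = new ArrayIndexOutOfBoundsException("-1");
        // Replace the real frames with the two frames from the trace above.
        e.setStackTrace(new StackTraceElement[] { frame("sum10", 29), frame("test", 16) });
        e.printStackTrace();
    }
}
```

Nothing here generates bytecode; it only illustrates that the source name in a frame is an opaque string as far as the runtime is concerned.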

What I do miss after switching away from Soot are tools to "optimize" the resulting bytecode, for example, when you have something like:

ILOAD 3
ICONST_1
IADD
ISTORE 3

Which can be replaced with 

IINC 3 1

My informal testing seems to suggest that these kinds of optimizations don't have much impact on runtime performance, but they can
mean the difference between a 170k classfile and a 28k classfile, which I do care about.

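I've partially addressed this with a small number of "peephole" optimizations that run on ASM's MethodNode structure:
https://github.com/bedatadriven/renjin/blob/841ffcd2498f618fe50d490ecb12e054ea52fb97/tools/gcc-bridge/compiler/src/main/java/org/renjin/gcc/peephole/IntegerIncrement.java

Basically the approach described here:
http://users.sdsc.edu/~mtikir/publications/papers/technicalreport.pdf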

This might be extracted to a nice little library that could be shared among compilers.
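
A minimal sketch of the increment rewrite described above, under stated assumptions: instructions are modelled as plain strings rather than ASM's MethodNode and InsnList, the class and method names (IncrementPeephole, foldIncrements) are hypothetical, and no jump target is assumed to point into the middle of a matched window. It is not the gcc-bridge implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: each instruction is its textual form, e.g. "ILOAD 3".
// This is an illustration only, not ASM's InsnList/MethodNode API.
final class IncrementPeephole {

    // Replaces every "ILOAD n, ICONST_1, IADD, ISTORE n" window with "IINC n 1".
    // Assumption: no jump target points into the middle of a window.
    static List<String> foldIncrements(List<String> code) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < code.size()) {
            if (isIncrementAt(code, i)) {
                String slot = code.get(i).substring("ILOAD ".length());
                out.add("IINC " + slot + " 1");
                i += 4; // the four matched instructions collapse into one
            } else {
                out.add(code.get(i));
                i++;
            }
        }
        return out;
    }

    // True if the four instructions starting at index i load a local, add one,
    // and store the result back to the same local variable slot.
    private static boolean isIncrementAt(List<String> code, int i) {
        if (i + 3 >= code.size() || !code.get(i).startsWith("ILOAD ")) {
            return false;
        }
        String slot = code.get(i).substring("ILOAD ".length());
        return code.get(i + 1).equals("ICONST_1")
            && code.get(i + 2).equals("IADD")
            && code.get(i + 3).equals("ISTORE " + slot);
    }

    public static void main(String[] args) {
        List<String> code = List.of(
            "ILOAD 3", "ICONST_1", "IADD", "ISTORE 3", "ILOAD 3", "IRETURN");
        System.out.println(foldIncrements(code)); // [IINC 3 1, ILOAD 3, IRETURN]
    }
}
```

A window is only folded when the load and the store use the same slot; a sequence that stores to a different local is left untouched.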


-Alex

Jochen Theodorou

Feb 2, 2016, 4:55:35 AM
to jvm-la...@googlegroups.com
On 01.02.2016 17:32, Alexander Bertram wrote:
>
> The build times with Soot were definitely an issue. Soot is still first
> and foremost an analysis framework so on startup it builds this enormous
> data structure of all Java classes reachable from the input, which ends
> up including most of the JRE.

that is quite bad... the JRE is big

> I also found it difficult to build the Jimple IR programmatically. Soot
> makes extensive use of singletons so it's not as easy as
>
> SomeASTNode node = buildFunctionNode(inputAst)
> writeClassFile(node)
>
> For this reason, I ended up generating a Jimple text file which I fed
> into Soot to compile to bytecode. This was both slow and meant having to
> mess around with composing text files as output.
>
> I've found the ASM Tree library to be quite nice to work with by contrast,
> and it's made it possible to include some niceties such as debugging info,
> for example stack traces that include C source references:
>
> java.lang.ArrayIndexOutOfBoundsException: -1
> at org.renjin.gcc.array.sum10(array.c:29)
> at org.renjin.gcc.array.test(array.c:16)
>
> (I was pleasantly surprised that it was then possible to step line by line
> through Fortran code in IntelliJ.)

since you worked with gcc as well, I am wondering whether gcc actually
allows joint compilation of multiple languages. Assume, for example,
that there is one part in Java and one in Groovy: a Java class A extends
a Groovy class B, which extends a Java class C, and C has one field of
type A and another of type B... bad structure, I know, just for the sake
of the example. Unless you do a true joint compilation, in which the
classes are resolved within the same compilation, you cannot compile
this, because the compilation can no longer be split into separate steps
per language. Does gcc provide something here? IR alone does not solve
this, unless class resolution were done on the IR code, and given that
the IR is almost assembler I find that hard to believe, since too much
information is already lost at that point.

So assuming this won't work I am playing around with the following idea...

* each language will generate its own private AST, which is able to
interface with an AST based on javax.lang.model
* the compiler will provide a class resolution structure, which each
language must implement
* optionally we generate ASM tree structures as a kind of IR
* generate bytecode directly or from the IR (the language plugin decides
what it wants to do)
* output to files or in memory...

naturally this would still be a very Java-oriented compiler, but I
imagine that with such a structure you could do a lot of things... and
easily extend it for use in Java-based IDEs, for example.
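
A rough sketch of the shared class resolution from the list above, assuming two phases: every language first declares its types, then all references are resolved against one common table. TypeStub, LanguagePlugin, and JointCompiler are hypothetical names, not an existing API, and a real resolver would also have to cover methods, generics, and visibility.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// All names here are hypothetical; this is not an existing compiler API.

/** What a language reports about a type before any reference is resolved. */
final class TypeStub {
    final String name;
    final String superName;        // null if no supertype is declared
    final List<String> fieldTypes; // type names of the declared fields

    TypeStub(String name, String superName, List<String> fieldTypes) {
        this.name = name;
        this.superName = superName;
        this.fieldTypes = fieldTypes;
    }
}

/** One plugin per source language. */
interface LanguagePlugin {
    // Phase 1: list the declared types without looking at other languages.
    List<TypeStub> declareTypes();
}

final class JointCompiler {
    // Phase 2: resolve every referenced name against the shared table.
    // Returns the names that no participating language declared.
    static Set<String> unresolved(List<LanguagePlugin> plugins) {
        Map<String, TypeStub> table = new HashMap<>();
        for (LanguagePlugin plugin : plugins) {
            for (TypeStub stub : plugin.declareTypes()) {
                table.put(stub.name, stub);
            }
        }
        Set<String> missing = new TreeSet<>();
        for (TypeStub stub : table.values()) {
            List<String> refs = new ArrayList<>(stub.fieldTypes);
            if (stub.superName != null) {
                refs.add(stub.superName);
            }
            for (String ref : refs) {
                if (!table.containsKey(ref)) {
                    missing.add(ref);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Java side: A extends B; C has fields of type A and B.
        LanguagePlugin javaPart = () -> List.of(
            new TypeStub("A", "B", List.of()),
            new TypeStub("C", null, List.of("A", "B")));
        // Groovy side: B extends C.
        LanguagePlugin groovyPart = () -> List.of(new TypeStub("B", "C", List.of()));

        System.out.println(unresolved(List.of(javaPart, groovyPart))); // joint: []
        System.out.println(unresolved(List.of(javaPart)));             // Java alone: [B]
    }
}
```

With both plugins in the same run the cycle A -> B -> C -> A resolves; with either language compiled alone, a name is left over, which is the situation that forces joint compilation.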

[...]
> I've partially addressed this with a small number of "peephole"
> optimizations that run on ASM's MethodNode structure:
> https://github.com/bedatadriven/renjin/blob/841ffcd2498f618fe50d490ecb12e054ea52fb97/tools/gcc-bridge/compiler/src/main/java/org/renjin/gcc/peephole/IntegerIncrement.java
>
> Basically the approach described here:
> http://users.sdsc.edu/~mtikir/publications/papers/technicalreport.pdf
>
> This might be extracted to a nice little library that could be shared
> among compilers.

yes, that would be a good idea.

bye Jochen