Interpreting a language on the JVM


John Wilson

Nov 3, 2009, 9:17:16 AM
to jvm-la...@googlegroups.com
On Rémi's recent thread Charlie talked about the advantages of having
an interpreted mode in a language implementation to allow profiling
before deciding on code generation strategies. I'm starting a new
thread to ask implementation questions about this.

I think this is interesting but I'd like to hear some more about the
actual mechanics of doing it. There's no problem in writing an
interpreter, of course. It's pretty trivial for any language which has
a MOP. The problem I have is how to deal with a class in language X
which subclasses a non trivial (i.e. not java.lang.Object) Java
object.

If I have a Java class

class C {
    public void foo() {
        bar();
    }

    public void bar() {
        // some stuff
    }
}

and then I have an X class

class D extends C {
    public void bar() {
        // some other stuff
    }
}

For every instance of D I can instantiate C and delegate calls to it
in the interpreter. The MOP can use its normal reflection black magic
to get at things like protected instance variables and methods.
However, I can't get the call to bar in foo to call back into the
interpreter.

The only thing I can think of is to generate code to subclass C with
a shim class whose helper methods allow the interpreter to decide how
calls are to be dispatched:

class C$shim extends C {
    public void foo() {
        if (D has a matching method) {
            // invoke the interpreter
        } else {
            super.foo();
        }
    }

    public void bar() {
        if (D has a matching method) {
            // invoke the interpreter
        } else {
            super.bar();
        }
    }
}
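
A minimal runnable sketch of the shim idea above. The `Interpreter` class, its method names and the `log` field are hypothetical and only there to make the dispatch visible; nothing here is taken from JRuby or Groovy.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

final class ShimDemo {
    /** Stand-in for the compiled Java class C. */
    static class C {
        final StringBuilder log = new StringBuilder();
        public void foo() { bar(); }               // virtual call: lands in the shim
        public void bar() { log.append("C.bar"); }
    }

    /** Hypothetical interpreter holding the method bodies of the X-language class D. */
    static final class Interpreter {
        private final Map<String, Consumer<C>> bodies = new HashMap<>();
        void define(String name, Consumer<C> body) { bodies.put(name, body); }
        boolean hasMethod(String name) { return bodies.containsKey(name); }
        void invoke(String name, C self) { bodies.get(name).accept(self); }
    }

    /** The shim: every overridable method asks the interpreter first. */
    static final class CShim extends C {
        private final Interpreter interp;
        CShim(Interpreter interp) { this.interp = interp; }

        @Override public void foo() {
            if (interp.hasMethod("foo")) interp.invoke("foo", this); else super.foo();
        }

        @Override public void bar() {
            if (interp.hasMethod("bar")) interp.invoke("bar", this); else super.bar();
        }
    }

    static String run() {
        Interpreter d = new Interpreter();
        d.define("bar", self -> self.log.append("D.bar")); // D overrides only bar
        C obj = new CShim(d);
        obj.foo();   // C.foo -> this.bar() -> shim -> interpreter
        return obj.log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because `foo` in C makes an ordinary virtual call to `bar`, the call reaches the shim's override and so gets back to the interpreter, which is the part that plain delegation cannot do.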

Is this a problem in JRuby, Charlie? If so do you have a nicer way of
dealing with it?

John Wilson

Daniel Hicks

Nov 19, 2009, 9:45:11 PM
to JVM Languages
I suppose different JVMs tackle it different ways, but on the IBM
iSeries JVM we kept a virtual function table for each class and called
through that. Interpreted functions contained a pointer to the
interpreter in the VFT, and a pointer to the method table entry in the
"this" class was passed as a hidden parameter, to identify the method
to the interpreter. A fairly straight-forward setup.

The only real complication was parameter mapping between the
interpreter stack and the standard register-based scheme for compiled
code. "Glue" code was used for that.
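
The table layout described above can be imitated in plain Java as a rough illustration. This is only a simulation of the idea (a per-class table whose interpreted slots all point at one interpreter entry point, with the method entry passed as the hidden parameter); the names `Vft`, `MethodEntry` and `Code` are made up here and say nothing about how the iSeries JVM was actually written.

```java
final class VftDemo {
    /** A table slot: a code pointer plus metadata identifying the method. */
    static final class MethodEntry {
        final String name;
        final Code code;
        MethodEntry(String name, Code code) { this.name = name; this.code = code; }
    }

    /** Code reachable through the table; the entry is passed as the "hidden parameter". */
    interface Code {
        Object call(MethodEntry entry, Object receiver, Object[] args);
    }

    /** Single shared interpreter entry point: it learns which method to run from the entry. */
    static final Code INTERPRETER =
        (entry, receiver, args) -> "interpreted " + entry.name + "/" + args.length;

    /** Per-class virtual function table. */
    static final class Vft {
        private final MethodEntry[] slots;
        Vft(MethodEntry... slots) { this.slots = slots; }

        Object invoke(int slot, Object receiver, Object... args) {
            MethodEntry e = slots[slot];
            return e.code.call(e, receiver, args); // callers never know which kind it is
        }
    }

    static Vft build() {
        Code compiledFoo = (entry, receiver, args) -> "compiled " + entry.name;
        return new Vft(new MethodEntry("foo", compiledFoo),
                       new MethodEntry("bar", INTERPRETER));
    }
}
```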

John Wilson

Nov 21, 2009, 4:38:05 PM
to jvm-la...@googlegroups.com
2009/11/20 Daniel Hicks <jvm...@gmail.com>:
> I suppose different JVMs tackle it different ways, but on the IBM
> iSeries JVM we kept a virtual function table for each class and called
> through that.  Interpreted functions contained a pointer to the
> interpreter in the VFT, and a pointer to the method table entry in the
> "this" class was passed as a hidden parameter, to identify the method
> to the interpreter.  A fairly straight-forward setup.

Thanks, Daniel

But I was trying to understand how to do the interpretation above the
level of the JVM (e.g. so the same code would work on Windows and on
Android). I can't see any way of doing an interpretive
implementation of a class which subclasses an arbitrary compiled
class.


John Wilson

Daniel Hicks

Nov 21, 2009, 6:58:23 PM
to JVM Languages
I'm quite confident that I don't understand what you're trying to do,
but I would think that using JVMTI would be your best bet at some sort
of portable interface that doesn't involve modifying the JVM. Or, of
course, you could always take the open source Sun JVM and target it
for both Windows and Android, to have common low-level source for both
platforms.

(But of course there are profiling tools available on the JVM that you
can use without the need to do any of this, depending on what you're
trying to profile. Or the Instrumentation interface can be used to
rewrite classes to do whatever you want, and still run without
interpretation or breakpoints.)
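
For reference, the Instrumentation route mentioned above has roughly the following shape. This is a sketch, not a working profiler: the class names are invented, the transformer only records class names, and returning null tells the JVM to keep the original bytes. A real agent also needs a `Premain-Class` entry in its jar manifest.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ProfilingAgent {
    /** Records class names as they are loaded; returns null, meaning "leave the bytes alone". */
    static final class RecordingTransformer implements ClassFileTransformer {
        final List<String> seen = Collections.synchronizedList(new ArrayList<>());

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            seen.add(className);
            return null; // a real agent would return rewritten bytecode here
        }
    }

    /** Called by the JVM when started with -javaagent:agent.jar (jar name is illustrative). */
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new RecordingTransformer());
    }
}
```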

With any scheme (except perhaps rewriting) you do need to inhibit the
JIT inlining of methods you'll want to later interpret. While I don't
offhand know the best way to do this, I'm confident that there are
several approaches, since inhibiting inlining is key to several of the
builtin debugging/monitoring schemes.

logi...@gmail.com

Nov 22, 2009, 11:06:47 AM
to jvm-la...@googlegroups.com
 
Some JVM languages have problems left to solve:

---------------------------------------------Problem1---------------------------------------------
P1) Whole-program type inference, to allow the use of JVM primitives for numeric types and math.
           (Even for Strings and other types we tend to use a HolderForATypicalJREClass.)
 
Some strategies used (so far the compilers that get closest to optimal have done the hard ones, like S5 and S1):

S1) Some people had to "force themselves" to use Java primitives and, when they could not get away with it, at least used
 some JRE version of a java.lang.PrimitiveHolder. They stayed frustrated for a few hours but finally conceded that this was no longer their problem.

S2) Some (like myself) used my.lang.PrimitiveHolder, consoling ourselves with "Java did it!" or "when it's time, we will *eventually* fix this to use primitives" or "what knucklehead thought 16 bits was enough to represent any character?" or "they forgot unsigned byte... I need a special holder that helps mark this as so".

S3) Create a couple of generic HolderForJavaObject classes and optimize the use of reflection,
           or make HolderForWidelyUsedJREInterface(s).

S4) Guess what the user will likely use based on the types from MyLang:
       MyLangJavaNetSocket, MyLangJavaReader, MyLangJavaWriter, etc.

S5) Or enforce strong typing and never cheat by using a HolderForATypicalJREClass... force my compiler to emit invokevirtual.

S6) And/or make my MyLangWhatever implement JRE interfaces. Later on, work on passing JRE objects through.

S7) Pick out a set of well-documented interfaces from something external like CORBA,
      and make my compiler target only that.

* I know I've missed a few. (We could try to keep enumerating them on this list.)
 
 
---------------------------------------------Problem2---------------------------------------------
P2) Excluding P1, if the user had just written their application in .java, in many cases
          the bytecode resulting from their code would be a tad better than ours.

S8) invokestatic to MyLangLibrary.tooWierdOfFeatureToCompileSmartly(MyLangASTLikeObject obj) {... written in Java ...}

S9) Make the user just write Java in a new, feature-rich syntax.

S10) Inline S8 into bytecode. (Not a strategy, but a lot of people think it is; profiling might tell otherwise: "method too large to JIT".)

S11) Have 2nd-4th passes that optimize the intermediate results before the bytecode representation is emitted.

S12) Some may even run bytecode optimizers like Soot.
      One JVM language even had to write its own post-processor: GJIT (http://code.google.com/p/gjit/).


* I know I've missed a few. (We could try to keep enumerating them on this list.)
 
 
---------------------------------------------Problem3---------------------------------------------
P3)   Maintaining and improving our compilers to solve P1 & P2 better.
 
 
S13) Improve the built-in functions of our runtime.

S14) Use javap/JAD/JODE on the emitted output. (I think we are constantly doing this, but...)

!!!!Now the reason for my email!!!!

S15) Create a .java emitter branch.

  Instead of emitting bytecode, make the compiler emit Java code that would have produced the same bytecode
   (JVM language permitting. Can someone on the list please enumerate what they cannot translate to .java?)

          - Compile an entire user program into a tree of .java files.
          - Decide whether that is how you would have written the program in .java.
                If not, see how many simple changes you could make at the compiler/translator level.
          - Get an overview of how many changes you'd like to make at the compiler (like S1 through S9) that would just be too grueling.
          - Could you write an AST transform/refactoring tool targeting this .java source to improve it?
                If so, can it be incorporated into the compiler as a pass?
          - Now that the program is a .java program, include your runtime and make a Java project out of it.
             Since you are a Java hacker, you can experiment and profile more easily to find which optimizations are worth it.
             With that information, feed the changes back into the emitter/compiler.
 
IMO most compiler writers try to do all of the above through the typical S14 plus profiling, and quite often they even reach the best case.
Most JVM languages go directly from source to bytecode, because
  we want to get to execution as quickly as possible, or so that we can advertise a real compiler, or because it makes users feel more secure about their proprietary code.

But regardless, they feel they never reach "whole program" efficiency. I think that is because of the limited scope of S14.
I suspect the more time one spends in the java-emitter branch of their JVM language (dealing with entire user programs),
   the better their compiler trunk will become.
 
Here is an example of a compiler-as-emitter: an entire user program was translated from SubLisp to .java.

line 117 of http://larkc.svn.sourceforge.net/viewvc/larkc/trunk/platform/src/com/cyc/cycjava/cycl/constant_reader.java?revision=254&view=markup

It uses a for/next loop on a boxed Fixnum, but if that were fixed, line 118, which only consumes Fixnums, would have to be updated as well.
And so the cycle begins: replacing Fixnums with primitive 'long's (or maybe 'int's if array index accesses are found in the call tree).
This code could be manipulated with some heavy Java AST manipulation tools, the same way bytecode could have been.
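
To make the boxed-versus-primitive cycle concrete, here is a small sketch. The `Fixnum` class below is a hypothetical stand-in, not the class from the linked file; the point is only that once the loop counter becomes a `long`, everything that consumed the boxed value has to change with it.

```java
final class FixnumDemo {
    /** Hypothetical boxed integer, in the spirit of the Fixnum mentioned above. */
    static final class Fixnum {
        final long value;
        Fixnum(long value) { this.value = value; }
        Fixnum add(Fixnum other) { return new Fixnum(value + other.value); }
        boolean lessThan(Fixnum other) { return value < other.value; }
    }

    /** What a naive emitter might produce: every step allocates a new box. */
    static Fixnum sumBoxed(Fixnum limit) {
        Fixnum one = new Fixnum(1);
        Fixnum total = new Fixnum(0);
        for (Fixnum i = new Fixnum(0); i.lessThan(limit); i = i.add(one)) {
            total = total.add(i);
        }
        return total;
    }

    /** After the rewrite: the counter, the accumulator and the consumer all use long. */
    static long sumPrimitive(long limit) {
        long total = 0;
        for (long i = 0; i < limit; i++) {
            total += i;
        }
        return total;
    }
}
```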
 
I just bring all this up in case people are looking for new ideas and might sometimes find it easier to work with Java in their compiler improvement workflows.
 
Myself, once I have gotten the .java down to what I think is best, it's just as easy to compile it and then take the resulting bytecode as the target for the compiler.
 
 
 
 

Matt Fowles

Nov 22, 2009, 12:46:18 PM
to jvm-la...@googlegroups.com
logicmoo~

The company I work for actually does exactly what you are describing.
We have extended Janino and fixed a bunch of bugs in it to support our
needs.

http://code.google.com/p/janino-streambase/

But in essence, we build an EventFlow-specific AST, which we transform
into a composite AST (we make very heavy use of value types, having
most of our primitives expand into multiple local variables or fields).
We then transform that AST into a Janino AST, which we use to produce
either bytecode directly or Java code to aid in debugging.

The ability to generate the java code (complete with indicating lines
in our compiler that generated the code) has been immensely useful for
us. We have also seen very strong performance through our use of
value types.
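
One way to get generated Java that points back at the generator, sketched under assumptions: `TracingEmitter` and `genAdd` are invented names, and this is not how the Janino-based compiler described above does it. Each emitted line is tagged with the method and line of the generator code that produced it, taken from the current stack trace.

```java
final class TracingEmitter {
    private final StringBuilder out = new StringBuilder();

    /** Appends one line of Java source, tagged with the generator location that emitted it. */
    void emit(String javaLine) {
        // Frame 0 is emit() itself; frame 1 is whichever generator method called it.
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        out.append(javaLine)
           .append("  // from ").append(caller.getMethodName())
           .append(':').append(caller.getLineNumber())
           .append('\n');
    }

    /** Example generator method for an addition node. */
    void genAdd(String target, String left, String right) {
        emit("long " + target + " = " + left + " + " + right + ";");
    }

    String source() {
        return out.toString();
    }
}
```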

Matt

Jim White

Nov 22, 2009, 1:13:26 PM
to jvm-la...@googlegroups.com
John Wilson wrote:

> On Rémi's recent thread Charlie talked about the advantages of having
> an interpreted mode in a language implementation to allow profiling
> before deciding on code generation strategies. I'm starting a new
> thread to ask implementation questions about this.
>
> I think this is interesting but I'd like to hear some more about the
> actual mechanics of doing it. There's no problem in writing an
> interpreter, of course. It's pretty trivial for any language which has
> a MOP. The problem I have is how to deal with a class in language X
> which subclasses a non trivial (i.e. not java.lang.Object) Java
> object.
>
> if I have a Java class
>
> class C {
> public void foo() {
> bar();
> }
>
> public void bar() {
> // some stuff
> }
> }
>
> then I have a X class
>
> class D extends C {
> public void bar() {
> // some other stuff
> }
> }
>
> For every instance of D I can instantiate C and delegate calls to it
> in the interpreter. The MOP can use its normal refection black magic
> to get at things like protected instance variables and methods.
> However I can't get the call of bar in foo to call the interpreter
> back.
> ...

I think using naive code generation and AOP is a better way to approach
this problem. That wouldn't be constrained by your MOP (or whatever
interpreter strategy) only being able to deal with the calls it
dispatches directly.

That is essentially what HotSpot does as well. Generate simple code
quickly, profile it, then regenerate with optimization in the critical
areas. Coupled with hot class redefinition ala JRebel, you would then
have a pretty comprehensive solution that could work with many different
JVM languages.
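
The "generate simple code, profile, regenerate" cycle can be illustrated in miniature. In this sketch the two implementations are just lambdas supplied up front and the profile is a plain call counter; `TieredFunction` is a made-up name, and the class is not thread-safe.

```java
import java.util.function.IntUnaryOperator;

final class TieredFunction {
    private final IntUnaryOperator simple;     // quickly generated, unoptimized version
    private final IntUnaryOperator optimized;  // version produced once the method is hot
    private final int threshold;
    private int calls;

    TieredFunction(IntUnaryOperator simple, IntUnaryOperator optimized, int threshold) {
        this.simple = simple;
        this.optimized = optimized;
        this.threshold = threshold;
    }

    int apply(int x) {
        if (calls < threshold) {
            calls++;                        // cheap profiling: just count invocations
            return simple.applyAsInt(x);
        }
        return optimized.applyAsInt(x);     // hot: use the regenerated code
    }

    boolean isHot() {
        return calls >= threshold;
    }
}
```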

Jim

Robert Fischer

Nov 22, 2009, 1:15:45 PM
to jvm-la...@googlegroups.com
What, exactly, is gained by compiling to Java and then compiling the
Java to bytecode? Are there optimizing compilers out there for Java
source code => byte code that you can leverage? If not, then is there a
particular example of a place where it's easier to generate Java code
than byte code? I just don't see what you're gaining, although I'm
intrigued.

~~ Robert.

Matt Fowles

Nov 22, 2009, 1:30:37 PM
to jvm-la...@googlegroups.com

Robert~

Ease of debugging, mostly.  Our production environment always goes straight to bytecode, but when we are debugging the compiler it is much nicer to be able to step through Java code and get links from it back to the compiler code that generated it.

Matt


John Wilson

Nov 22, 2009, 1:40:26 PM
to jvm-la...@googlegroups.com
2009/11/22 Jim White <j...@pagesmiths.com>:
> John Wilson wrote:
[snip]

>
> I think using naive code generation and AOP is a better way to approach
> this problem.  That wouldn't be constrained by your MOP (or whatever
> interpreter strategy) only being able to deal with the calls it
> dispatches directly.
>
> That is essentially what HotSpot does as well.  Generate simple code
> quickly, profile it, then regenerate with optimization in the critical
> areas.  Coupled with hot class redefinition ala JRebel, you would then
> have a pretty comprehensive solution that could work with many different
> JVM languages.


Thanks for your suggestions. I probably didn't explain the issue as
well as I should have:)

Unfortunately there are situations where it is just not possible to
generate code of any sort on the fly. For example the security
restrictions in your target environment may forbid it. You may be
running on an Android phone which runs Java but does not use JVM
bytecodes (and is not, strictly, a JVM but is an attractive target
nonetheless).

My original question was triggered by Charlie talking about the use of
interpretation in JRuby, and I wondered if the JRuby guys had a
desperately clever way of getting round the problems of interpreting a
class which extends an arbitrary compiled class (if it just extends
Object there is no problem).

I can see how to get round the problems if I'm allowed to generate a
shim class which redirects calls to the interpreter. It may be that the
best bet is to precompile shim classes for every public non-final
class in the JDK and stick them in the runtime JAR. You still have a
problem with subclassing non JVM compiled classes but it gets you
further.
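
As a rough idea of how a build step might pick shim candidates with reflection (a sketch only; `ShimCandidates` is a made-up helper, it looks at public methods only, and a real tool would also need to handle protected methods and check for an accessible constructor):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Set;
import java.util.TreeSet;

final class ShimCandidates {
    /** A class can be shimmed if it is a public, non-final class (not an interface). */
    static boolean canShim(Class<?> c) {
        int mods = c.getModifiers();
        return Modifier.isPublic(mods) && !Modifier.isFinal(mods) && !c.isInterface();
    }

    /** Names of public instance methods a shim could override (protected ones are ignored here). */
    static Set<String> overridable(Class<?> c) {
        Set<String> names = new TreeSet<>();
        for (Method m : c.getMethods()) {
            int mods = m.getModifiers();
            if (!Modifier.isFinal(mods) && !Modifier.isStatic(mods)) {
                names.add(m.getName());
            }
        }
        return names;
    }
}
```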

John Wilson

John Wilson

Nov 22, 2009, 1:47:14 PM
to jvm-la...@googlegroups.com
2009/11/22 Matt Fowles <matt....@gmail.com>:
> Robert~
>
> Ease of debugging mostly.  Our production environment always goes straight
> to bytecode, but when we are debugging the compiler it is much nicer to be
> able to step through java code and get links from it back to compiler code
> the generated it.


We looked at this in the early days of Groovy. It may be a decent route
for your language but it wasn't great for Groovy. The problem is that
there are some Java features (e.g. checked exceptions) which are
enforced by the compiler but not the JVM. If your language shares all
the restrictions that the Java compiler enforces then it's a perfectly
good option. Groovy doesn't share many of the Java restrictions so it
didn't seem worth it.
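
The point about checked exceptions being a compiler rule rather than a JVM rule can be seen from Java itself with the well-known generic "sneaky throw" trick. The class and method names below are illustrative.

```java
import java.io.IOException;

final class CheckedDemo {
    @SuppressWarnings("unchecked")
    static <E extends Throwable> void sneakyThrow(Throwable t) throws E {
        throw (E) t; // unchecked cast: erased at runtime, so nothing is verified
    }

    /** Declares no checked exceptions, yet throws IOException at runtime. */
    static void undeclared() {
        CheckedDemo.<RuntimeException>sneakyThrow(new IOException("not declared"));
    }

    static boolean threwIOException() {
        try {
            undeclared();
            return false;
        } catch (Exception e) {
            return e instanceof IOException;
        }
    }
}
```

The JVM runs this without complaint, which is why a language that generates bytecode directly does not have to model checked exceptions at all.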

Also generating bytecodes is not that hard once you get up the
learning curve a bit so I think most people end up just going straight
to bytecode because it's easier in the end.

John Wilson

Tobias Ivarsson

Nov 22, 2009, 1:48:59 PM
to jvm-la...@googlegroups.com
Interesting that you mention this. For debugging the compiler, maybe. I find, however, that when working in a language, any language, I want my debugger to point to locations in the source I wrote, and not in some generated intermediate language that I don't care about for getting the job done. ANTLR is a great example of this. Debugging an ANTLR grammar is hard work because of the extra step I need to go through to map the locations my debugger reports to the actual locations in the grammar when I single-step through the parser.

/Tobias


Martin C. Martin

Nov 22, 2009, 1:52:54 PM
to jvm-la...@googlegroups.com
Another route is to generate bytecodes, but make sure the generated
bytecodes are the same as what a Java compiler would generate. Then you
can use a java decompiler on them. Probably doesn't help in the
debugger, but does help you understand what your compiler generates.

Best,
Martin

Matt Fowles

Nov 22, 2009, 2:05:59 PM
to jvm-la...@googlegroups.com

Tobias~

I agree with you completely about debugging the source language.  We have a source level debugger for our language that is far more useful for that.  But most of my time is spent extending and debugging the compiler itself.

Matt


Matt Fowles

Nov 22, 2009, 2:09:42 PM
to jvm-la...@googlegroups.com

John~

You can easily get around the checked exceptions by just generating all your methods with 'throws Exception'.  But in principle, I agree that it is not necessarily a good fit for all languages, and things like ASM make generating bytecode fairly easy.

Matt


John Wilson

Nov 22, 2009, 4:05:19 PM
to jvm-la...@googlegroups.com
2009/11/22 Matt Fowles <matt....@gmail.com>:
> John~
>
> You can easily get around the checked exceptions by just generating all your
> methods with 'throws Exception'.

Yes, but that makes it very messy if your users want to call your
methods from Java (lots of unnecessary try-catches).
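
A tiny illustration of that messiness, with invented names: the generated method can never actually fail, but a hand-written Java caller still has to wrap the call.

```java
final class ThrowsExceptionDemo {
    /** Stand-in for a method emitted by a translator that adds 'throws Exception' everywhere. */
    static int generatedAnswer() throws Exception {
        return 42;
    }

    /** A hand-written Java caller is forced to handle an exception that never occurs. */
    static int javaCaller() {
        try {
            return generatedAnswer();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```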

Sometimes you want to do things like generate synthetic methods which
you can't do at all in Java.

John Wilson

Rémi Forax

Nov 22, 2009, 5:06:50 PM
to jvm-la...@googlegroups.com
For the language pseudo (the one with gradual typing)
I use a similar approach.

My compiler generates a javac AST and then I call the javac algorithms
to generate bytecode. It took me only one week to be able
to compile the different versions of fibonacci.

If I want to debug something in the compiler, instead of
generating the bytecode, I use the pretty printer of javac,
which generates Java source.

I chose this option because pseudo is almost a subset of Java +
some dynamic magic, so generating a Java AST is not a limitation.

For some constructs of the pseudo language (involving closures/lambdas)
that don't exist in Java, I generate an invokedynamic call to
escape at runtime and provide the missing bits in the runtime of
the language.
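
For readers who have not seen the runtime side of an invokedynamic call, a bootstrap method looks roughly like this. The sketch uses the `java.lang.invoke` API as it was later standardized in Java 7, which is an assumption relative to this thread; `IndyRuntime` and `greet` are made-up names, and the bootstrap is called by hand because Java source cannot emit the instruction directly.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class IndyRuntime {
    /** Runtime routine standing in for a construct the source language lacks. */
    static String greet(String who) {
        return "hello " + who;
    }

    /** Bootstrap method: the JVM would call this the first time the call site is linked. */
    public static CallSite bootstrap(MethodHandles.Lookup lookup, String name, MethodType type)
            throws ReflectiveOperationException {
        MethodHandle target = lookup.findStatic(IndyRuntime.class, name, type);
        return new ConstantCallSite(target);
    }

    /** Simulates linking and invoking one call site by hand. */
    static String demo() {
        try {
            MethodType type = MethodType.methodType(String.class, String.class);
            CallSite site = bootstrap(MethodHandles.lookup(), "greet", type);
            return (String) site.dynamicInvoker().invokeExact("JVM");
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }
}
```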

Rémi



Jochen Theodorou

Nov 24, 2009, 3:16:04 AM
to jvm-la...@googlegroups.com
logi...@gmail.com wrote:
>
> Some jvm languages have problems left to solve:
[...]

Just to add some points... I think the main question here is whether you
decide to go with the Java model or not. For example, if the bytecode you
output is just there to control an interpreter, then breakpoints at the
bytecode level might be useless. Such a language may have its own debugger
and a "foreign" stack, if any; "foreign" in the sense that the Java
stack is not really part of the stack of your language. Such a
language usually doesn't use the Java object model either. I think early
JRuby might be a good example of such a language, but I cannot tell for
sure. Charles may know more about that, of course.

Such a language usually also has the problem of wrapping each and every
object of the language in wrappers, or of accessing the objects through
language-specific interfaces. If classic Java classes are generated,
it is usually only to be able to interact with Java itself. Extending
classes or interfaces in a round-trip manner is usually a pain here. Just
think of overriding an overloaded method.

Now Groovy is different, because Groovy uses the same stack as Java, the
same classes and almost the same object model. So of course you can use
any Java debugger to debug Groovy as well. Since the changes to the
object model should be understood as an "add-on", Groovy does not need
special interfaces to work. So wrappers of any kind are normally not needed.

Wrappers are still needed when using reflection, and for primitive types, of
course. There is also still the numeric math problem, which is
bigger than we initially thought. Currently the operand stack in a
method will contain only objects, including boxed integers for example.
Groovy 1.8 might change that, if it proves to be worth the try.

> ---------------------------------------------Problem1---------------------------------------------
> P1) Whole program type inference to allow use of jvm primitives for
> their numeric types and math.
> (Even the use of Strings and other types we tend to be
> a HolderForATypicalJREClass)
>
> Some strategies used: (So far the compilers that get closest to
> optimal have done the hard ones like S5 and S1,)
>
> S1) Some people had to "force themselves" to use java primitives. And
> when they could not get away with it, at least used
> some JRE version of a java.lang.PrimitiveHolder. Stayed frustrated
> for a few hours but finally conceded that this was no longer their problem.

In Groovy int and Integer are almost aliases. The pain is then for the
language writers ;) In fact, if you declare a local variable of type
int, Groovy will store an object of type Integer there. That is not
100% native.

> S2) Some (like myself) used my.lang.PrimitiveHolder. Consoling myself
> with "java did it!" or "when its time, *eventually* will fix this to use
> primitives" or "what knucklehead thought 16 bits is enough to represent
> any character?" or "they forgot unsigned byte.. I need a special holder
> that helps mark this as so"

I think you mean numeric types differing from what Java provides. This is
a decision for your language... but what if you have to interface with
Java and your type system does not contain the types Java knows? You
will have to convert all the time, maybe even losing information in the
process.

> S3) Create a couple generic HolderForJavaObject and optimize the use of
> reflection
> or make HolderForWidelyUsedJREInterface(s)

In my experience this does not really pay off, especially if such
holders contain generic logic and cause the creation of a
megamorphic call site.

> S4) Guess what the user will likely use based on the types from MyLang:
> MyLangJavaNetSocket, MyLangJavaReader, MyLangJavaWriter, etc

Well, in Groovy this is absolutely out of the question. We think this kind
of interfacing is just butt ugly. If you can hide it completely, then it
is OK, but if the user must use those, then no. And for Groovy we look
here not only at the Groovy side, but at the Java side as well. One design
goal for Groovy is to make interfacing Groovy/Java as seamless as
possible in both directions.

> S5) Or enforce strong typing and never cheat using a
> HolderForATypicalJREClass... force my compiler to invoke-virtual

You can use JVM objects and not use strong typing on them? How does this
work? Ah well, maybe my definition of strong typing is different. For
me it means you cannot change the type of an object to something else
without creating a new one. In the case of OO languages this is usually
softened so that type changes are allowed if the target is an implemented
interface or a parent class. On the static side it is the downcast which
softens the system, but it still requires the runtime check. Anyway, S5
kind of implies you are not really using the Java object model.
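
The runtime check behind a downcast is easy to see in a few lines of Java (the class name is illustrative): the cast compiles for any Object, and the JVM verifies the actual type only when the cast executes.

```java
final class DowncastDemo {
    static String describe(Object value) {
        try {
            Integer n = (Integer) value; // compiles for any Object; checked when executed
            return "Integer " + n;
        } catch (ClassCastException e) {
            return "not an Integer";
        }
    }
}
```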

[...]
> * Know I've missed a few (We could try on this list to keep enumerating
> them)

The basic rule for the JVM is, I think, that if you want not to have any
problems, then your language needs to be 100% compatible with Java.
Groovy is not, so at some small points we have problems. Scala is/was
not; I don't know the current state. Still, if you are 100% compatible with
a language as huge as Java, then this implies so many things that there
is hardly anything in your language you can make different.

> ---------------------------------------------Problem2---------------------------------------------
> P2) Excluding P1, if some user would have just written
> their application in .java in many cases,
> The bytecode that would have resulted from their code would be
> a tad better than ours.

It is not really the bytecode that is better; it is the HotSpot
compiler, which knows the patterns the Java compiler emits better than
other patterns. And this is really important only if you are
creating patterns HotSpot cannot handle. I think all interpreted
languages are excluded here already. I think this point mostly targets
performance... but there are so many other places where you can easily
lose performance too. Optimizing a language runtime for the JVM is a
tough job.

> S8) invoke static
> to MyLangLibrary.tooWierdOfFeatureToCompileSmartly(MyLangASTLikeObject
> obj) {... written in java ..}

What would that be? I found that sometimes I want to do things the
bytecode allows, but Java as a language does not. In that case it would not
help me. And in other cases I can emit the bytecode I need myself too.
Also, this kind of implies having a runtime AST of some kind. In that
case you are well on the way to making an interpreter, including all the
stack information problems.

> S9) Make the user just write java in a new featurefull syntax

I think surprisingly many want actually this.

> S10) inline S8 into bytecode (Not a strategy: But allot of people think
> it is.. profiling might tell otherwise.. ("method too large to JIT"))

Which reminds me.. is there an easy way to tell a method is too large to
JIT?

> S11) have 2nd-4th passes that optimize their intermediate proposals
> before bytecode representation is emitted.

That usually requires a compiled language, probably with a long-running
compiler. But since the optimizations are mostly done by HotSpot, I don't
see much point in doing that. Often enough it has happened that an old
optimization strategy turned out to slow things down, because a new
generation of HotSpot learned to optimize the original construct.

> S12) Some may even run bytecode optimizations like SOOT
> One JVM lang even had to write their own post process: GJIT
> (http://code.google.com/p/gjit/),

"even had to" is not really right. Groovy works perfectly without it.
The target of this is to make Groovy faster. If you think of the
bytecode as control code for an interpreter, then GJIT is there to emit
optimized interpreter code by removing some checks, boxing, method
selection and all that. This can easily turn into debugging hell if
the bytecode the optimizer emits has an error. That is because this
bytecode exists only at runtime and you cannot even take a look at it.
Calling the Java compiler here is just too slow.

Still, we probably would have made it the standard for Groovy if it were
not for the problem that you need a second VM to start the agent, and the
security model may forbid the attachment of that agent to the other VM.
Imagine for example trying this on Google App Engine.

Since we want to be able to write frameworks in Groovy that are used
from Java, we have to have a Groovy that can be used as a kind of library.
Starting a second VM is not an option for that. And then Groovy
would be fast from the command line, but slow as a library? We don't want
that. Not to mention that we may have to handle multiple Groovy versions
in parallel at some point in the future.

For almost the same reason, bytecode weaving in a normal class loader does
not work. The class might be loaded through a loader I don't control;
then there is nothing I can do.

[...]
> ---------------------------------------------Problem3---------------------------------------------
> P3) Maintaining and improving our compilers to solve P1 & P2 better.
>
>
> S13) Improve the built in functions of our runtime
>
> S14) Use JAVAP/JAD/JODE on the emitted output. (I think we of are
> constantly doing this.. but...)

I don't use JAVAP/JAD/JODE for that, since there is no clear
transformation from Groovy-generated bytecode to Java.
I think you would need a language that is semantically almost exactly
like Java and allows a transformation to Java source without having to add
information and without losing information. How do you make a Groovy-style
method call from Java? Just using runtime code may ignore, for example,
runtime bytecode generation.

> I suspect the more time one spends in the java-emitter branch of their
> jvm language (dealing with entire user programs),
> the better their compiler trunk will become.

Of course I sometimes write a construct in Java, compile it and then have
my compiler emit equivalent bytecode. But the Java compiler is not
really an optimizing compiler, so there are no iteration steps to be
done to further optimize anything.

> Here is an example of a complier-as-emitter.. An entire user's program
> was translated from SubLisp to .java
>
> line 117 of
> http://larkc.svn.sourceforge.net/viewvc/larkc/trunk/platform/src/com/cyc/cycjava/cycl/constant_reader.java?revision=254&view=markup
> <http://larkc.svn.sourceforge.net/viewvc/larkc/trunk/platform/src/com/cyc/cycjava/cycl/constant_reader.java?revision=254&view=markup>
>
> It uses a for/next look on a boxed Fixnum. but if that was fixed.. line
> 118 which only consumes Fixnums has to be updated as well.
> And so the cycle begins, replacing fixnums with primitive 'long's (or
> maybe 'int's if array index accesses will be found in the call tree).
> This code could be manipulated with some heavy java AST manipulation
> tools. Same way bytecode could have been..
>
> I just bring all this up in case people are looking for new ideas and
> might find sometimes easier to work with java in their compiler
> improvement workflows.
>
> Myself, once I got the .java down that I thought best its just as easy
> to compile and then get the best bytecode version for a compiler.

Well, in Groovy

for (def i in x) {
    println i
}

means something like:

for (Iterator it = x.><iterator(); it.hasNext();) {
    Object i = it.next();
    this.><println(i);
}

I used >< to mark Groovy method calls, which are not Java method calls.
In Groovy, for example, iterator() and various println methods are defined
on Object already. I could have written:

for (Object i : runtime.getIterable(x)) {
    this.><println(i);
}

but this would require a runtime method that was not needed before and a
wrapper that was not needed before.
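
To show what such a translation costs in plain Java, here is a sketch of the first form with the Groovy-style calls replaced by a runtime helper. `LoopRuntime` and `iteratorFor` are hypothetical and are not Groovy's real runtime; in particular, treating a non-iterable value as a one-element sequence is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class LoopRuntime {
    /** Hypothetical runtime helper: finds an Iterator for an arbitrary object. */
    static Iterator<?> iteratorFor(Object x) {
        if (x instanceof Iterable) return ((Iterable<?>) x).iterator();
        if (x instanceof Object[]) return Arrays.asList((Object[]) x).iterator();
        return Collections.singletonList(x).iterator(); // single value: iterate once
    }

    /** Java rendering of: for (def i in x) { println i }, collecting instead of printing. */
    static List<String> loop(Object x) {
        List<String> printed = new ArrayList<>();
        for (Iterator<?> it = iteratorFor(x); it.hasNext();) {
            Object i = it.next();
            printed.add(String.valueOf(i));
        }
        return printed;
    }
}
```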

bye blackdrag

--
Jochen "blackdrag" Theodorou
The Groovy Project Tech Lead (http://groovy.codehaus.org)
http://blackdragsview.blogspot.com/
