Generic typing information is present in Java 1.5 class files, and I'd like to see even a single optimization that you can perform from a source parse that you can't from a bytecode parse. In general, I have trouble seeing advantages to parsing Java source code over Java bytecode, whereas I can name several disadvantages off the top of my head.
Perhaps I know not what I'm talking about. I thought all the generic type information was elided in the bytecode, but I didn't think about the fact that it could be available in the headers or something like that.
As far as source vs. bytecode is concerned, I've always been of the impression that you could get most everything from parsing bytecode. At times, though, I've seen decompilers generate lots of weird labels and jumps, turning for loops into while loops, etc. I don't know if this is actually a theoretical issue or a practical one. Miguel has a lot of experience trying to do this on .NET, so he probably knows more about it than I do.
joel.
On 6/15/07, Toby Reyelts <to...@google.com> wrote:
As I understand it, the generated bytecode loses generic type information via type erasure (yes, it is still present in method signatures, etc.).
In general, I have trouble seeing advantages to parsing Java source code over Java bytecode, whereas I can name several disadvantages off the top of my head.
You are operating at a lower level, dealing with smaller operations. Given that JS is roughly the same level of abstraction as the Java source, to get efficient JS generation you would essentially have to piece together bytecode ops to get back to the corresponding Java source. (In theory, ops from different statements could be intermixed, which would complicate this, although they don't seem to be in practice.) Essentially, we would be adding the complexity and fragility of a bytecode decompiler to the JS compile process.
On 6/15/07, John Tamplin <j...@google.com> wrote:
On 6/15/07, Toby Reyelts <to...@google.com> wrote:
If you have a generic type declaration in your source code, it's available in the class file. This holds for classes, methods, fields, parameters, and locals.
Java bytecode is nearly the same level of abstraction as source code, just generally more explicit. For example, instead of having to do complicated name-resolution lookup involving scopes and import statements to determine which type an identifier represents, Java bytecode contains a fully-qualified reference to the type. This is simpler - not more complex.
Rather than hand-waving in the abstract, why not give some concrete examples of specific optimizations that can only occur from a source code parse?
Essentially, we would be adding the complexity and fragility of a bytecode decompiler to the JS compile process.
Java bytecode has been more stable than the Java language. I think that speaks for itself. It also sounds like you believe that the goal of parsing bytecode would be to generate the same parse tree we'd have gotten had we parsed the original source, which is inaccurate.
My primary interest in responding to this thread was to point out the misinformation, rather than getting into a full-blown debate about the two approaches (although I'm up for that at any point). I'd just like us to have our facts straight, particularly on issues that we could likely end up commenting about in public.
From: Toby Reyelts <to...@google.com>
If you have a generic type declaration in your source code, it's available in the class file. This holds for classes, methods, fields, parameters, and locals.
Java bytecode is nearly the same level of abstraction as source code, just generally more explicit. For example, instead of having to do complicated name-resolution lookup involving scopes and import statements to determine which type an identifier represents, java bytecode contains a fully-qualified reference to the type. This is simpler - not more complex. Rather than hand-waving in the abstract, why not give some concrete examples of specific optimizations that can only occur from a source code parse?
I think we should move this conversation to GWT-Contrib. My followup to follow.
Forwarded Conversation
From: John Tamplin <j...@google.com>
To: Toby Reyelts <to...@google.com>
Date: Fri, Jun 15, 2007 at 11:09 AM
On 6/15/07, Toby Reyelts <to...@google.com> wrote:
In general, I have trouble seeing advantages to parsing Java source code over Java bytecode, whereas I can name several disadvantages off the top of my head.
You are operating at a lower level, dealing with smaller operations. Given that JS is roughly the same level of abstraction as the Java source, to get efficient JS generation you would have to essentially piece together bytecode ops (which theoretically could be intermixed from different statements complicating this, although they don't seem to be in practice) to get back to the corresponding Java source. Essentially, we would be adding the complexity and fragility of a bytecode decompiler to the JS compile process.
I see several issues with compiling from bytecode:
1) IIRC, generic type information is only available for certain things (method declarations, not fields, not return types, not method locals)
2) type erasure generates spurious bridge methods that are either not needed or different from what GWT would generate
3) inner class/local class/anonymous inner class transformation is not necessarily how GWT wants to do it
4) JSNI information won't be preserved (now what, force people to move all JSNI code to external .js files or XML?) This is the deal breaker.
5) autoboxing!
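The bridge methods in item 2 can be observed with plain reflection. This is a minimal sketch, not GWT code: `Version` and `bridgeMethodNames` are hypothetical names made up for the illustration. Because `Comparable.compareTo` erases to `compareTo(Object)`, javac adds a second `compareTo` to `Version` that casts its argument and delegates to the one written in source, and `Method.isBridge()` flags it.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not part of GWT.
class Version implements Comparable<Version> {
    final int major;

    Version(int major) {
        this.major = major;
    }

    // The only compareTo written in source. javac also emits a
    // compareTo(Object) bridge that casts and delegates to this method.
    public int compareTo(Version other) {
        return Integer.compare(major, other.major);
    }
}

class BridgeMethodDemo {
    // Collects the names of compiler-generated bridge methods declared on a class.
    static List<String> bridgeMethodNames(Class<?> type) {
        List<String> names = new ArrayList<String>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isBridge()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Lists the one bridge method javac added to Version.
        System.out.println(bridgeMethodNames(Version.class));
    }
}
```

A bytecode front end would have to recognize such methods and decide whether to drop or keep them, which is the extra work the list item is pointing at.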
I also think it's not going to speed anything up. Right now, GWT parses source, and runs visitors over the AST to perform its optimizations. Now if you feed it bytecode, it will essentially need to rebuild an intermediate representation in memory amenable to optimization passes, which essentially means parsing bytecode and constructing trees and graphs from it. Not only that, but all the optimizations will have to be rewritten to deal with a new, lower-level IR format.
Has it even been determined that GWT spends most of its time parsing?
On 6/15/07, Scott Blum <sco...@google.com> wrote:
I think we should move this conversation to GWT-Contrib. My followup to follow.
Forwarded Conversation
From: John Tamplin <j...@google.com>
To: Toby Reyelts <to...@google.com>
Date: Fri, Jun 15, 2007 at 11:09 AM
On 6/15/07, Toby Reyelts <to...@google.com> wrote:
In general, I have trouble seeing advantages to parsing Java source code over Java bytecode, whereas I can name several disadvantages off the top of my head.
You are operating at a lower level, dealing with smaller operations. Given that JS is roughly the same level of abstraction as the Java source, to get efficient JS generation you would have to essentially piece together bytecode ops (which theoretically could be intermixed from different statements complicating this, although they don't seem to be in practice) to get back to the corresponding Java source. Essentially, we would be adding the complexity and fragility of a bytecode decompiler to the JS compile process.
Also, javac introduces synthetic methods to get around some access-related differences between Java bytecode and Java source. For example, an inner class accessing a private member of the containing class will usually cause javac to create a synthetic access method with a more accessible scope.
JavaScript doesn't have the enforced scoping rules the JVM has, but unless you introduced decompiler inference logic, I think you're more likely to produce less optimal JavaScript because of the "quirks" javac introduced.
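A small sketch of how to see those accessors, assuming nothing beyond the JDK: `Outer`, `Inner`, and `syntheticMethodNames` are hypothetical names for illustration. With the javac releases current for this thread, the listing typically contains a name such as `access$000`; compilers targeting JDK 11 or later use nest-based access control and generally emit no accessor, so the list may be empty there.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; class and member names are illustrative only.
class Outer {
    private int counter = 41;

    class Inner {
        int next() {
            // Reads a private field of the enclosing class. Older javac
            // versions route this through a synthetic static accessor on Outer.
            return counter + 1;
        }
    }
}

class SyntheticAccessorDemo {
    // Collects the names of compiler-generated (synthetic) methods on a class.
    static List<String> syntheticMethodNames(Class<?> type) {
        List<String> names = new ArrayList<String>();
        for (Method m : type.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // The contents depend on the compiler version and target level.
        System.out.println(syntheticMethodNames(Outer.class));
        System.out.println(new Outer().new Inner().next());
    }
}
```

A bytecode-based translator would see a call to the accessor rather than a direct field read, and would need to inline it to get back to what the source said.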
On 6/15/07, Toby Reyelts <to...@google.com> wrote:
If you have a generic type declaration in your source code, it's available in the class file. This holds for classes, methods, fields, parameters, and locals.
Good to know.
Java bytecode is nearly the same level of abstraction as source code, just generally more explicit. For example, instead of having to do complicated name-resolution lookup involving scopes and import statements to determine which type an identifier represents, Java bytecode contains a fully-qualified reference to the type. This is simpler - not more complex.
It's not really our problem at the moment, as JDT does all this for us. It also does additional things for us, like ensuring that the code is error free, that all necessary classes are available, and that the set of classes is internally consistent (for example, that the user didn't use JRE APIs that aren't in our JRE). We'd need alternative solutions for these.
Rather than hand-waving in the abstract, why not give some concrete examples of specific optimizations that can only occur from a source code parse?
I don't think we could answer this question without really diving into it.
Java bytecode has been more stable than the Java language. I think that speaks for itself. It also sounds like you believe that the goal of parsing bytecode would be to generate the same parse tree we'd have gotten had we parsed the original source, which is inaccurate.
I'm not sure I follow. Are you saying that the parse tree we get from source is inaccurate (I'm not sure how it could be) or that there's no need to get the same AST from byte code that we could have gotten from source?
If we require a "special" kind of compilation from source to generate class files with enough information, then we've sort of just pushed the problem around.
No, generic type information is available on all of those entities. (It might help to review sections 4.4.4 and 4.8.13 of the JVMS).
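One way to check the claim for fields and return types without reading the class file by hand is the reflection API, which reads the same signature data javac stores in the class file. This is only a sketch: `Holder`, `firstTypeArgument`, and the other names are hypothetical. Locals are not reachable through reflection; as far as I know, javac records their generic types in the `LocalVariableTypeTable` attribute only when compiling with local-variable debug info (`-g` or `-g:vars`).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

class GenericSignatureDemo {
    // Hypothetical class used only for illustration.
    static class Holder {
        List<Integer> numbers = new ArrayList<Integer>();

        List<String> names() {
            return new ArrayList<String>();
        }
    }

    // Returns the first type argument of a parameterized type,
    // for example java.lang.Integer for List<Integer>.
    static String firstTypeArgument(Type type) {
        ParameterizedType parameterized = (ParameterizedType) type;
        return ((Class<?>) parameterized.getActualTypeArguments()[0]).getName();
    }

    // Generic type of a field, recovered from the compiled class.
    static String fieldTypeArgument() {
        try {
            Field field = Holder.class.getDeclaredField("numbers");
            return firstTypeArgument(field.getGenericType());
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Generic return type of a method, recovered from the compiled class.
    static String returnTypeArgument() {
        try {
            Method method = Holder.class.getDeclaredMethod("names");
            return firstTypeArgument(method.getGenericReturnType());
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fieldTypeArgument());  // java.lang.Integer
        System.out.println(returnTypeArgument()); // java.lang.String
    }
}
```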
3) inner class/local class/anonymous inner class transformation is not necessarily how GWT wants to do it
How is that relevant? We don't do a 1 to 1 transformation of the source, why should we be doing a 1 to 1 transformation of the bytecode?
4) JSNI information won't be preserved (now what, force people to move all JSNI code to external .js files or XML?) This is the deal breaker.
No, we can continue to do a source-code parse for JSNI. JSNI would be stored in class files as an additional attribute.
Has it even been determined that GWT spends most of its time parsing?
Yes, in fact, I've performed several profiling runs against both the GWT compiler and hosted mode with important results that I'd like to use towards speeding up both hosted mode and unit tests. I encourage anybody who's interested to also post their own results. Please be sure to include your JVM version + flags, operating system, hardware configuration, profiler, etc...
On 6/15/07, Toby Reyelts <to...@google.com> wrote:
No, generic type information is available on all of those entities. (It might help to review sections 4.4.4 and 4.8.13 of the JVMS).
I agree that the class file format allows for it. The question is: does javac actually do anything with these optional attributes?
I just compiled the following to test before posting my original message.
import java.util.*;

public class t {
    public static void main(String[] arg) {
        ArrayList<Integer> al = new ArrayList<Integer>();
        System.out.println(al);
    }
}
An examination of the compiled bytecode shows that the type java.lang.Integer appears nowhere.
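The same effect can be seen at runtime. The sketch below is an illustration under my own naming (`ErasureDemo` and `sameRuntimeClass` are hypothetical): two locals declared with different type arguments end up with the identical runtime class, because the type argument of a local is not part of the object or of the executed instructions.

```java
import java.util.ArrayList;

class ErasureDemo {
    // Two locals with different declared type arguments share one runtime class.
    static boolean sameRuntimeClass() {
        ArrayList<Integer> ints = new ArrayList<Integer>();
        ArrayList<String> strings = new ArrayList<String>();
        return ints.getClass() == strings.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // true
    }
}
```

Whether `java.lang.Integer` shows up in the class file for a local depends on compiler flags: by default javac does not emit local-variable debug info, so a dump with `javap -verbose` would not be expected to show a `LocalVariableTypeTable` entry unless the class was compiled with `-g`.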
On 6/15/07, Scott Blum <sco...@google.com> wrote: