Hi,

I followed the debate on emulation strategy closely. I didn't participate in that discussion because I didn't feel I had any useful input to provide. Still, I appreciate all the thoughts expressed in that thread because they led me to think about this problem for a while.

I decided to do a little experiment:

grek-imac:gwtJavaEmul grek$ ./prepare_jars.sh
Doesn't this override the boot classpath for the compiler itself? In your
example, what if the scala compiler tried to call String.format
somewhere...wouldn't that now blow up?
- Stephen
Ah, right. Thanks for pointing that out, Eric.
I'd mentioned it earlier in the scala emulation thread but forgot about
it. You're right, the scala compiler can't just be given special
client-side jar files, as the correct classpath is only known by looking
through gwt.xml files.
- Stephen
Huh, looking at the scalac shell script, I guess it doesn't. I didn't
anticipate that given the name of the flag. Hm.
- Stephen
I haven't followed this closely enough to understand the merits of the
different approaches. That said, we should think very carefully about
anything (however small) that increases the burden of migrating a GWT
project to Scala. Developers already face organizational resistance to
migrating to new technologies, so anything we can do to ease the
transition is arguably worthwhile.
~Aaron
That is what I thought as well, however, looking at the 2.9's scalac
shell script, I don't think it overrides the VM's bootclasspath for
either -bootclasspath or -javabootclasspath--both options are passed in
as command line args to nsc.Main.
This explains why the compiler itself doesn't blow up if String.format
isn't available in the jars passed to the -bootclasspath or
-javabootclasspath arguments.
It seems that, in line with what Lex described as standard compiler
design, the scala compiler can distinguish between the compiler
classpath and the program classpath. I wonder why Scala IDE is (or was)
different, then. Maybe it's due to binary incompatibility in things like
the pickled type signatures, which may or may not be parseable between
different versions of the scala-compiler.
> Having said all of that, -javabootclasspath allows one to override
> Java's standard library for resolving dependencies in compiled Scala
> code and that's exactly what we need.
Agreed, I see what you're doing. And I can see how it will work for
cobbling together our forked scala-library.jar.
...but what about code in GWT that is emulated? We'd have to get the
special bytecode for those files into a jar file and onto scalac's
classpath. And what about code in the user's projects that are emulated?
Their bytecode is also not available in jar files.
Basically, I consider what you're pursuing an acceptable short-term hack
for somehow getting scala-library bytecode into the GWT toolchain. But
it's not a long-term solution for emulated/super-sourced code, which is
what solves these problems more generally.
(Even if we say you can't *write* super-sourced code in scala-gwt,
scala-gwt .scala files will almost certainly consume super-sourced
code. It is more than just the scala library we have to worry about.)
I don't want to take away from your compile output showing the 100 or so
errors in the scala library that call APIs unsupported by GWT. That is
really awesome and will be helpful for our progress. But I'm not
convinced it's a final solution.
- Stephen
> This might be confusing but makes some sense. You have -bootclasspath
> is the option that modifies boot classpath for VM that is running
> scala compiler.
Sure.
In a regular GWT project (or GWT library code), you might have a class
that you want to have different implementations of on the server-side
vs. the client-side. Either because you don't control the
original .java source (and it uses reflection/whatever), and still want
to use it in your GWT client-side code, or because you just really do
want different implementations (for perf/environment/etc. reasons) on
the client vs. server.
So, in GWT, you do this today via super-sourcing [1]--basically write
two versions of the file. Your project would look like:
* com/foo/myapp/blah/Foo.java <- server-side
* com/foo/myapp/supersrc/com/foo/myapp/blah/Foo.java <- client-side
I'm not a huge fan of the package rerooting (the duplication of the
com/foo/myapp directories within the supersrc), but nonetheless, that's
how it works.
So, when GWT goes to compile your app, it reads the app.gwt.xml, which
says <super-source path="supersrc"/>, so it then passes the 2nd Foo.java
to the ecj compiler, and uses the resulting bytecode/AST throughout
your app instead of the 1st Foo.java.
So, when compiling client-side code that calls Foo, it's important that
the compiler resolves the calls against the right Foo.java.
However, the Scala IDE will never know about this--by itself, scalac
will always pick the 1st Foo.java, whose API may technically be different
from the 2nd Foo.java, and so potentially lead to undefined behavior
(like the errors we've seen in UnifyAst).
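
To make the lookup rule above concrete, here is a minimal Java sketch of
"the super-sourced file wins". It is a toy model only, not GWT's actual
implementation: the class name SuperSourceSketch, the method
resolveClientSources, and the plain string paths are all hypothetical,
and the super-source root is passed in directly instead of being read
from a gwt.xml file.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of super-source lookup; NOT GWT's real implementation. */
final class SuperSourceSketch {

    /**
     * Maps each logical path (e.g. "com/foo/myapp/blah/Foo.java") to the
     * physical file that should be compiled for the client side.
     *
     * regularFiles: paths under the normal source root (hypothetical input)
     * superFiles:   paths under the super-source root (hypothetical input)
     * superRoot:    the rerooting prefix, e.g. "com/foo/myapp/supersrc/"
     */
    static Map<String, String> resolveClientSources(
            List<String> regularFiles, List<String> superFiles, String superRoot) {
        Map<String, String> byLogicalPath = new LinkedHashMap<>();
        // Start from the regular (server-side) sources.
        for (String file : regularFiles) {
            byLogicalPath.put(file, file);
        }
        for (String file : superFiles) {
            if (!file.startsWith(superRoot)) {
                throw new IllegalArgumentException(file + " is not under " + superRoot);
            }
            // Reroot: strip the super-source prefix to recover the logical path.
            String logical = file.substring(superRoot.length());
            // The super-sourced version replaces the regular one for the client.
            byLogicalPath.put(logical, file);
        }
        return byLogicalPath;
    }
}
```

With the two Foo.java files from the example layout, the logical path
com/foo/myapp/blah/Foo.java maps to the copy under supersrc/, while any
file without a super-sourced twin maps to itself. A standalone scalac
that only sees the regular source root has no equivalent of this map.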
That's what I meant by asking how the -javabootclasspath approach would
solve other emulated code--these "2nd"/client-side-only Foo.java's that
exist both within the GWT library itself and within users' projects.
Stepping back, note the similarity between this "two versions of
Foo.java" and exactly what we're facing with the scala-library
fork--two versions of Option.scala, List.scala, etc.
This is why I'm asserting that embedding the scalac compiler inside of
GWT is the more general solution--it can handle emulated scala-library
code, emulated java-library code, plus emulated GWT library code, plus
emulated user code. It handles all emulation (super sourcing) and not
just the two specific jars of scala-library and gwt-lang.
Additionally, the embedding approach shouldn't be hard to implement as
GWT already knows how to go looking for the "right" version of .java
(or .scala) files, as they have all the gwt.xml parsing + directory
scanning built. And, after the embedded compiler (either ecj or scalac)
hands GWT back bytecode+ASTs, GWT already knows how to keep the
client-side versions of these separate (on classpaths/etc.) from the
server-side versions.
(This is what DevMode's CompilingClassLoader does, for example.)
So, my point has been that GWT has already solved the forking,
emulation, dual-version problem. And not just for scala lib or java
lib--for any lib. Such that, in my opinion, scalagwt would achieve the
best integration by just playing along with the embedded approach.
- Stephen
[1]: http://code.google.com/webtoolkit/doc/latest/DevGuideOrganizingProjects.html
...so, if I invoke scalac with something like:
java -cp sl.jar:sc.jar nsc.Main scala/Option.scala java/lang/Object.java ...
I have an unmolested scala-library (sl) and scala-compiler (sc) jar on the JVM
classpath--can't the compiler look up all of its symbols/definitions/whatever in
the JVM's scala-library?
And then when compiling molested files, like a forked Object.java, etc., only
worry about the text source code/ASTs?
Is this supposed to work? Or is the split between compiler classpath and
user code not that simple?
- Stephen
> https://github.com/paulp/scala-full/blob/master/src/compiler/scala/reflect/internal/Definitions.scala
>
> Then you can see lots of absolute paths. If something gets moved and
> compiler expects it in different package in a library all kind of
> wrong things can happen (ideally, it should crash but not always this
> is the case).
So, I was perhaps complicating things by having scalac try to compile .scala
files with a forked java/lang/Object.java (instead of a forked
Object.class).
*Normally* what will happen is .gwtar files will be used to fetch the forked
but already-compiled Object.class file, so it'll end up:
* bin/java/lang/Object.class <-forked
* source/scala/whatever
* target/ <- bytecode|jribble output here
And I call scalac with a -javabootclasspath of bin/.
This seems to work almost as expected. If I try to use String.format in
.scala files in source/, I get a nice error:
scalac error /tmp/gwt-scalac/source/scalatest/client/Foo.scala: value format is not a member of object java.lang.String
However, this is followed by an AssertionError that stops the compilation
from continuing: https://gist.github.com/1128407. Oddly, if I invoke
scalac directly from the command line, this assertion doesn't happen--I
see the "format is not a member" error and then scalac exits gracefully.
*Previously* what had happened is that, due to javac vs. Eclipse
serialVersionUID issues, I couldn't load the .gwtar file, and my
unit cache was empty, so I had no pre-compiled Object.class fork
available, so it looked like:
* bin/ <- nothing here
* source/java/lang/Object.java
* source/scala/whatever
And this is what blew up with the NPE in Global.Run.compileLate. If
we can't handle this case, we may be fine, as the user should normally
be able to pull in Object.class from the .gwtar file, but, nonetheless,
if it's an easy fix, it would be cool to see this work too.
(If scalac can see Object.java without blowing up, then next the
JdtCompiler would compile it to Object.class and put it in the unit
cache so that on the next compilation round, it'd end up in bin/
as Object.class instead of source/ as Object.java.)
> > Is this supposed to work? Or if the split between compiler
> > classpath and user code not that simple?
>
> I'm not really sure if I follow the question. I don't understand what
> should be loaded from where.
In my naive understanding, there are two things: A) .class files the
compiler needs to run itself, and B) .class/.java/.scala files it needs
to resolve/type check the user code.
I'm hoping that the A) can be the JVM classpath, and B) can be the
arguments passed to -javabootclasspath + bin/ + source/ files.
And that if the files in bin/ and source/ drift quite a bit from the
JVM's scala-library/scala-compiler, we'll still be okay, and the
compiler won't blow up, and can resolve our user code against the types
in bin/ and source/ even if those types vary from its own version of
those types.
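
As a rough illustration of the A)/B) split, the Java sketch below only
assembles a command line and does not run anything. The class and
method names (ScalacCommandSketch, buildCommand) and the example paths
are hypothetical; scala.tools.nsc.Main is the full name of the compiler
entry point abbreviated as nsc.Main earlier in the thread, and
-javabootclasspath and -d are existing scalac options. Whether scalac
really keeps the two classpaths apart is exactly the open question
here, so treat this as a statement of intent and not as verified
behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: builds a scalac invocation as a list of arguments. */
final class ScalacCommandSketch {

    static List<String> buildCommand(
            String compilerClasspath,   // A) jars the compiler needs to run itself
            String forkedClassesDir,    // B) forked .class files, e.g. "bin/"
            String outputDir,           // where compiled output goes, e.g. "target/"
            List<String> sourceFiles) { // B) user .scala/.java sources
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        // A) The JVM classpath: unmodified scala-library and scala-compiler.
        cmd.add("-cp");
        cmd.add(compilerClasspath);
        cmd.add("scala.tools.nsc.Main");
        // B) Passed as a plain argument to the compiler, so it is meant to
        //    affect how user code is resolved, not how the compiler runs.
        cmd.add("-javabootclasspath");
        cmd.add(forkedClassesDir);
        cmd.add("-d");
        cmd.add(outputDir);
        cmd.addAll(sourceFiles);
        return cmd;
    }
}
```

For example, buildCommand("sl.jar:sc.jar", "bin/", "target/", ...) with
the Foo.scala path from the error message above yields a command whose
JVM classpath never mentions bin/, which is the separation I'm hoping
the compiler respects.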
- Stephen
> Yes, scala compiler can build symbols out of Java files. The question is if
> it's going to load those Java files if it sees class files in its classpath
> (e.g. through javabootclasspath).
-Ylog-classpath is nice.
I don't think the definitions are looked up within the JVM's scala-library--I
think it's trying to use my user code's source path.
Here's some output:
https://gist.github.com/1128652
It is failing because the scala.Boolean symbol ends up being NoSymbol.
Because the scala package object has nothing in it except for
ScalaObject.
Which (unfortunately) makes sense, because I'm using a very minimal
scala source tree:
* /tmp/gwt-scala/source/scala/ScalaObject.scala <- this is my only scala
file for this test
So, to me it looks like the symbol definitions are resolved against the
user code's classpath (which right now has hardly anything on it). I
don't think we want that--it's likely our scala library user code
(whether we embed scalac or not) won't have all of the necessary
symbols.
In other words, the scala-compiler is in fact placing restrictions on
what the user code's scala-library needs to look like, even if
technically the compiler is executing with its own scala-library.
So, maybe I just need to throw a more complete scala library at
this and see what happens.
- Stephen