Hi all,
I have some partial fixes for SI-3452, together with an analysis of the
remaining problems and some questions; Paul Phillips urged me to discuss them
here rather than on JIRA. I'm not sure whether I should introduce myself, since
I have never posted here before; you will find a short introduction at the end.
My patches are [on GitHub][Commits]; after discussion, I expect to turn (part
of) them into a pull request. Reviewers who might be interested are @odersky
(value classes), @magarciaEPFL and @gkossakowski (backend), and probably whoever
contributed to erasure and generic signatures.
The patches introduce no test failures, except for a timeout during partest
(which seems to happen only locally, not on the build bot, but the two setups
differ too much to be sure). All tests seem to compile correctly; the error is
quite unspecific, so I have not been able to investigate it yet:
```
/Users/pgiarrusso/Documents/Research/Sorgenti/scala/build.xml:1953:
java.lang.RuntimeException: Test suite finished with 1 case failing:
worker timed out; adding failed test [TIMOUT]
```
I tried to make sure that one can read only part of this mail and still get the
gist. I also split it into sections for readability, using Markdown syntax as
on GitHub (please tell me if you prefer something else).
# Problem statement
To solve issue SI-3452, the JVM generic signatures emitted by Scalac (simply
_signatures_ below) must always be consistent with the method descriptors in the
emitted bytecode. When they disagree for a method, Java code cannot call that
method: `javac` computes the descriptor to invoke by erasing the generic
signature, so a mismatch makes `javac` emit a call to the wrong descriptor.
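The requirement can be made concrete with a toy model of what `javac` does with a signature: erase each type in it (drop the type arguments, replace a type variable by its bound) and build the descriptor from the result. This is only a sketch under simplifying assumptions (every type variable has a single class bound, no arrays or primitives); `JType`, `descriptorFromSignature` and the other names are hypothetical and do not correspond to Scalac's internals.

```scala
object SignatureCheck {
  // Toy model of types appearing in JVM generic signatures (hypothetical, not scalac's API).
  sealed trait JType
  final case class ClassType(binaryName: String, args: List[JType] = Nil) extends JType
  final case class TypeVar(name: String, bound: JType) extends JType

  val JObject: JType = ClassType("java/lang/Object")

  final case class MethodSig(params: List[JType], result: JType)

  // Erasure as javac computes it: drop type arguments and
  // replace a type variable by the erasure of its bound.
  def erase(t: JType): String = t match {
    case ClassType(name, _) => s"L$name;"
    case TypeVar(_, bound)  => erase(bound)
  }

  // The descriptor javac derives from a generic signature when it emits a call.
  def descriptorFromSignature(sig: MethodSig): String =
    sig.params.map(erase).mkString("(", "", ")") + erase(sig.result)

  // Java code can call the method only if this holds.
  def consistent(sig: MethodSig, emittedDescriptor: String): Boolean =
    descriptorFromSignature(sig) == emittedDescriptor
}
```

For example, a signature `search(String): C<Object>` yields the descriptor `(Ljava/lang/String;)LC;`, which disagrees with an emitted `(Ljava/lang/Object;)LC;`, so a Java call site would reference a method that does not exist.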
Various bugs cause mismatches; since I have only just started hacking on the
compiler, I have fixed the low-hanging fruit. I have also enabled `-Ycheck:jvm`,
applied [Paul's patch][PaulFix], and investigated the remaining warnings and the
test-case failures introduced by the patch.
During this investigation I discovered that Scalac's erasure maps value classes
to `Object` (incorrectly?), unlike the generic signature algorithm; this seems
relevant beyond this bug, although it might be intended. More details on this,
and on other problems, follow in the next section.
# Bugs
1. Value classes (@odersky): the erasure algorithm erases them to `Object`,
while signatures leave them as-is. It's not clear to me why they should be
erased to `Object`, and SIP 15 does not seem to describe such a behavior; I also
tried to STFW, without success. On the other hand, the same is done for
primitive value types, which is also confusing; I can only imagine
binary-compatibility reasons. If this is intended, I guess I could fix the
algorithm that computes signatures instead.
2. `Nothing` vs `Nothing$`: whoever implemented `prepareSigMap` assumed that
erasure maps `Nothing` to `Nothing$`, which sounds reasonable; for some unclear
reason, however, that mapping happens during code generation. I'm not sure where
to fix the problem: I feel that the `Nothing` -> `Nothing$` transformation
belongs in erasure, but I implemented a less invasive fix in `prepareSigMap`
(which could be simplified a bit further).
3. Intersection types are erased incorrectly by the generic signature algorithm,
implemented in `erasure.javaSig`. Paul and I have not yet agreed on the correct
behavior.
I could easily [fix][2] `erasure.javaSig`, which computes the generic
signatures, so that it is consistent with the erasure algorithm specified by
SLS 3.7 and implemented by Scalac.
However, `erasure.erasure`, which is used during the check, still gives an
incorrect result, causing a spurious warning; reading the code does not reveal
why, only that a fair amount of code duplication seems to be involved.
4. Paul's patch fixes most of the remaining problems, but introduces some
regressions (discussed below). I identified an algorithm that should solve
them, but I am not sure how to implement it.
5. There is at least one further warning that I have not investigated yet.
Bugs 3 and 4 are discussed in more detail in the following sections.
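To illustrate bug 2, here is a minimal sketch, assuming the descriptor side maps `Nothing` to `scala.runtime.Nothing$` (and `Null` to `scala.runtime.Null$`) and the signature side must reproduce exactly the same mapping. Routing both generators through one shared function (the hypothetical `jvmClassName` below) makes them agree by construction, which is the argument for doing the mapping once, in erasure. All names are illustrative and do not reflect how `prepareSigMap` is actually structured.

```scala
object NothingMapping {
  // Toy Scala types (hypothetical, not scalac's Type).
  sealed trait SType
  case object NothingType extends SType
  case object NullType extends SType
  final case class Named(fqn: String) extends SType

  // Single source of truth for the JVM class that represents a Scala type.
  def jvmClassName(t: SType): String = t match {
    case NothingType => "scala/runtime/Nothing$"
    case NullType    => "scala/runtime/Null$"
    case Named(fqn)  => fqn.replace('.', '/')
  }

  // Descriptor side.
  def descriptorType(t: SType): String = "L" + jvmClassName(t) + ";"

  // Signature side: for a non-generic class type it has the same shape as the descriptor.
  def signatureType(t: SType): String = "L" + jvmClassName(t) + ";"

  // Hypothetical buggy signature generator that bypasses the shared mapping for Nothing.
  def buggySignatureType(t: SType): String = t match {
    case NothingType => "Lscala/Nothing;"
    case other       => signatureType(other)
  }
}
```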
## How to work on the compiler?
I could not investigate the runtime behavior, since I have not yet set up
Scalac in an IDE with debugging support: my attempt with Scala IDE
2.1-M1-2.10 failed utterly. I might try IntelliJ next, but I would welcome
suggestions or documentation.
I'm also aware of Eugene Burmako's nice [guide to reflection][ReflectionGuide];
I just have not had time to study it yet.
# Intersection types
In this section I explain the problem with intersection types and the solution I
implemented. I am confident that my solution is correct, but I have not yet
convinced Paul.
Consider the following declarations:
```scala
abstract class A
trait B extends A
class Test {
def f[T](t: T) = new A with B { }
}
```
In this code, `f` receives a generic signature that is inconsistent with its
descriptor (the result of erasure), and this generic signature prevents Java
code from calling `f`.
According to SLS 3.7, the erasure of the result type of `f`, `A with B`, is `B`:
when computing the intersection dominator, `A` is filtered out immediately
because it is a supertype of `B`.
To me it seemed clear that generic signatures should follow the same rule.
However, before I found the SLS rule, Paul had argued that in this example
`B <: A` is not visible from Java (trait `B` is compiled to an interface, and a
Java interface cannot extend class `A`), so from Java's point of view neither
`B` nor `A` is more specific; I had missed this. I did not understand whether he
was suggesting a different resolution. Paul, what do you think?
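The SLS 3.7 rule can be sketched on a toy class table: keep the components of the intersection that are not supertypes of another component, then prefer a class that is not a trait, falling back to the first remaining component. `Cls`, `isSubClass` and `intersectionDominator` are hypothetical names; the model handles only named classes and traits.

```scala
object IntersectionErasure {
  // Toy class table (hypothetical): name, whether it is a trait, direct parents.
  final case class Cls(name: String, isTrait: Boolean, parents: List[Cls] = Nil)

  // Reflexive-transitive subclass relation over the toy table.
  def isSubClass(c: Cls, d: Cls): Boolean =
    c == d || c.parents.exists(isSubClass(_, d))

  // Intersection dominator of T1 with ... with Tn, following SLS 3.7:
  // drop every Ti that is a supertype of some other Tj; among the rest,
  // pick a class that is not a trait if there is one, else the first.
  def intersectionDominator(ts: List[Cls]): Cls = {
    val candidates = ts.filterNot(t => ts.exists(u => u != t && isSubClass(u, t)))
    candidates.find(!_.isTrait).getOrElse(candidates.head)
  }
}
```

With `A` a class and `B` a trait extending `A`, `intersectionDominator(List(a, b))` returns `b`, as the SLS prescribes. Paul's objection corresponds to the fact that, seen from Java, the `parents` edge from `B` to `A` does not exist, since `B` becomes an interface that does not extend `A`.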
# Paul's fix and the regressions it causes
Unrelated to intersection types, Paul [fixed][PaulFix] the two biggest problems
with the bug at hand, but his fix introduces regressions. I managed to analyze
the problem and devise a conceptual solution, but implementing it is not easy
for me, so I thought I had better propose the algorithm for now.
The visible regression is that, according to `javac`, `AbstractFunction1` does
not implement all `Function1` methods because of errors in generic signatures,
as shown by the failing test cases [detailed by Paul][PaulFix]. That error,
however, involves specialized signatures, so it is more instructive to check
that `javac` rejects code such as:
```java
public static Traversable<A> trav = new scala.collection.AbstractTraversable<A>() {};
```
for the wrong reason. For instance, it complains that `AbstractTraversable`
does not implement `seq: Traversable<A>`, which is abstract in the `Traversable`
interface; `AbstractTraversable` implements `seq: Traversable<Object>` instead,
which is incompatible because generics are invariant for `javac`.
`AbstractTraversable` also leaves `foreach` unimplemented, but that is
irrelevant here.
Consider the following code (originally by Paul, slightly altered):
```scala
trait C[T]
trait Search2[M] {
def search(input: M): C[Int] = null
def filter(input: M => M): C[Int] = null
def crazy(i1: M, i2: M => M): C[Int] = null
}
object StringSearch2 extends Search2[String]
```
When a class or object (like `StringSearch2`) extends a trait, the mixin
composition phase must create forwarder methods and assign them a descriptor and
a generic signature. The generic signature must, however, erase to the
descriptor, which in turn must match the descriptor of the method in the trait.
The problem arises when the inheriting class instantiates a type parameter of
the trait with a concrete type: can we simply replace the occurrences of the
parameter with that type? Not always. If we did so for `search`, its signature
would become `search(String): C[Object]`, which does not match the descriptor in
`Search2`, namely `search(Object)`. The algorithm computing descriptors already
gets this right; the one computing generic signatures does not.
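The mismatch can be reproduced in a toy model, assuming only two kinds of types (type parameters and applied type constructors) and a readable descriptor notation rather than the JVM's `(Ljava/lang/Object;)LC;` syntax. Substituting `String` for `M` everywhere, as a naive implementation would, changes the erased parameter of `search`. All names are hypothetical.

```scala
object NaiveSubstitution {
  // Toy Scala types before erasure (hypothetical, not scalac's Type).
  sealed trait SType
  final case class TParam(name: String) extends SType
  final case class TCon(name: String, args: List[SType] = Nil) extends SType

  final case class Method(name: String, params: List[SType], result: SType)

  // Descriptor-style erasure: type parameters become Object, type arguments vanish.
  def erase(t: SType): String = t match {
    case TParam(_)     => "Object"
    case TCon(name, _) => name
  }

  def descriptor(m: Method): String =
    m.params.map(erase).mkString(s"${m.name}(", ",", ")") + erase(m.result)

  // Naive substitution of concrete types for type parameters, at every position.
  def subst(t: SType, env: Map[String, SType]): SType = t match {
    case TParam(n)        => env.getOrElse(n, t)
    case TCon(name, args) => TCon(name, args.map(subst(_, env)))
  }

  def substMethod(m: Method, env: Map[String, SType]): Method =
    m.copy(params = m.params.map(subst(_, env)), result = subst(m.result, env))
}
```

With `M` := `String`, the trait's `search` erases to `search(Object)C`, while the naively substituted forwarder erases to `search(String)C`.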
Naked occurrences of type parameters, like the one in `search`, erase to
`Object` in descriptors, and signatures must preserve this: such occurrences
must either be left as-is or be erased to `Object`. Paul's change ensures this,
although I don't understand how. In fact, his change seems to erase all type
parameters to `Object`; while I understand the description of his change and of
`asSeenFrom` (the method he uses), why it makes this difference is far from
obvious to me.
For non-naked occurrences of type parameters, like the one in `filter`, we have
the opposite problem. On the one hand, they simply disappear from descriptors,
so there is no need to erase them to `Object` when a concrete type is
substituted, as Paul's change does. On the other hand, doing so is in fact
harmful: for `javac`, a method overrides another only if its descriptor _and_
its generic signature are at least as specific, and since generic type
constructors are invariant for `javac`, their type arguments must match.
Hence the constraint here is the opposite of the one for naked occurrences, and
we must preserve the old behavior. Moreover, we cannot reuse code from the
erasure algorithm, since it never needs to handle this case.
Method `crazy` simply shows that both cases can appear in the same signature.
To solve this problem, it seems necessary to create generic signatures by
substituting type arguments into the Scala type signatures (before erasure)
according to the ideas outlined above: the substitution must distinguish naked
from non-naked occurrences of type parameters. However, I'd like to start from
an existing implementation of type substitution, given the number of cases to
handle and the potential problem of name capture; I suspect some variant of
`SubstMap`/`TypeMap` could serve as a starting point.
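The substitution proposed above can be sketched on toy types: a type parameter occurring at the top level of a parameter or result type (a naked occurrence) becomes `Object`, while occurrences inside type arguments are replaced by the concrete type. This is a hypothetical illustration, not code written against Scalac's `SubstMap`/`TypeMap`: it ignores name capture, bounds, arrays, refinements and existentials, and it leaves aside the separate `C[Int]` versus `C[Object]` question.

```scala
object ForwarderSignatures {
  // Toy Scala types before erasure (hypothetical, not scalac's Type/TypeMap).
  sealed trait SType
  final case class TParam(name: String) extends SType
  final case class TCon(name: String, args: List[SType] = Nil) extends SType

  final case class Method(name: String, params: List[SType], result: SType)

  val ObjectT: SType = TCon("Object")

  // Non-naked occurrences (inside type arguments): substitute the concrete type,
  // because javac compares type arguments invariantly.
  def substNonNaked(t: SType, env: Map[String, SType]): SType = t match {
    case TParam(n)        => env.getOrElse(n, t)
    case TCon(name, args) => TCon(name, args.map(substNonNaked(_, env)))
  }

  // Naked occurrence (the whole parameter or result type): a substituted type
  // parameter becomes Object, so the signature still erases to the trait's descriptor.
  def substTopLevel(t: SType, env: Map[String, SType]): SType = t match {
    case TParam(n) if env.contains(n) => ObjectT
    case other                        => substNonNaked(other, env)
  }

  def forwarderSignature(m: Method, env: Map[String, SType]): Method =
    Method(m.name, m.params.map(substTopLevel(_, env)), substTopLevel(m.result, env))

  // Readable rendering of toy types and methods.
  def showType(t: SType): String = t match {
    case TParam(n)        => n
    case TCon(name, Nil)  => name
    case TCon(name, args) => args.map(showType).mkString(s"$name[", ",", "]")
  }

  def showMethod(m: Method): String =
    m.params.map(showType).mkString(s"${m.name}(", ", ", ")") + ": " + showType(m.result)
}
```

For `StringSearch2` (`M` := `String`) this yields `search(Object): C[Int]`, `filter(Function1[String,String]): C[Int]` and `crazy(Object, Function1[String,String]): C[Int]`: naked occurrences keep the descriptor inherited from `Search2`, while the type arguments of `Function1` match what `javac` expects for the inherited method.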
By the way, in this code I find it confusing that `C[Int]` appears as
`C[Object]` in signatures instead of the more precise `C[java.lang.Integer]`,
although the same happens for derived value classes (where it is just as
confusing). This is not documented in SLS 3.7 either. I can only think of
compatibility reasons (I believe boxing used different wrapper classes before
2.8, but I can no longer find where I read that).
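The two policies for primitive value types in type-argument position can be contrasted in a few lines. `currentTypeArg` models the behavior described above (as observed, not taken from Scalac's source), and `boxedTypeArg` the more precise alternative using the Java wrapper class; both names, like `render`, are hypothetical.

```scala
object TypeArgPolicy {
  // Java wrapper classes for Scala's primitive value types (Unit left out).
  val boxes: Map[String, String] = Map(
    "Boolean" -> "java.lang.Boolean", "Byte" -> "java.lang.Byte",
    "Short" -> "java.lang.Short", "Char" -> "java.lang.Character",
    "Int" -> "java.lang.Integer", "Long" -> "java.lang.Long",
    "Float" -> "java.lang.Float", "Double" -> "java.lang.Double")

  // Behavior described in the mail: a primitive value type used as a type argument becomes Object.
  def currentTypeArg(arg: String): String =
    if (boxes.contains(arg)) "Object" else arg

  // More precise alternative: the wrapper class that holds the boxed values at run time.
  def boxedTypeArg(arg: String): String = boxes.getOrElse(arg, arg)

  // Render a one-argument applied type under a given policy, e.g. C[Object].
  def render(con: String, arg: String, policy: String => String): String =
    s"$con[${policy(arg)}]"
}
```

Under these assumptions, `render("C", "Int", currentTypeArg)` gives `C[Object]` while `render("C", "Int", boxedTypeArg)` gives `C[java.lang.Integer]`; non-primitive arguments are unaffected by either policy.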
# Who I am
I use Scala for research as a PhD student under the supervision of Klaus
Ostermann. I have encountered and reported various Scalac/SBT bugs during my
research (which is not about Scalac stability, but about developing DSELs).
I have previous experience in open source development; among other things, as a
Linux kernel maintainer of a peripheral subsystem (UserModeLinux). My homepage
is available [here][Home].