https://lampsvn.epfl.ch/trac/scala/changeset/20311
https://issues.scala-lang.org/browse/SI-2876
https://lampsvn.epfl.ch/trac/scala/changeset/20490
That was for views. The issue in 2876 doesn't look so bad from this
vantage, but as I recall there was a more serious compiler bug which
was never resolved which made the whole process too tedious.
It's probably doable, but in my experience one tends to discover new
or (worse) old and unfixed compiler bugs when attempting these things,
so you need to be prepared for battle.
https://lampsvn.epfl.ch/trac/scala/changeset/20311
Date: Wed Dec 23 17:57:36 2009 +0000
Created team of private[collection] abstract classes
and traits in scala.collection.views. Factored boilerplate
and base Transformed traits out of *ViewLike classes.
Executive summary and motivation:
4812029 Dec 23 09:47 scala-library.jar // before
4604150 Dec 23 09:24 scala-library.jar // after
Direct size savings of 4.5%. Review by odersky.
% date
Tue Oct 18 14:39:09 PDT 2011
% ls -l lib/scala-library.jar
-rw-r--r-- 1 paulp admin 9824041 Oct 16 09:02 lib/scala-library.jar
The bug I was talking about is this.
https://issues.scala-lang.org/browse/SI-2897
I trust I'm allowed to call that a bug.
"This week, on Behind the Bytecode... remember 200K meant something?"
Yes, absolutely. (But also take my responses to mean: please see that
the test suite completely passes from scratch, that is "ant all.clean
test" gets all the way to the part where it says yay.)
As it says in the ticket, "this bug may have stood between me and a
solution to SI-2876." In other words, I was restructuring the views to
get the linearization right, and ran directly into that bug. Not
having the luxury of shaving that particular yak, I reverted.
Oh, of course. Well, after throwing only a couple hours of tweaking at it:
$ ls -l dists/scala-2.10.0.r25850-b20111018225736/lib/scala-library.jar
-rw-r--r-- 1 tvierling tvierling 8697074 2011-10-18 23:13 dists/scala-2.10.0.r25850-b20111018225736/lib/scala-library.jar
$ ls -l build/pack/lib/scala-library.jar
-rw-r--r-- 1 tvierling tvierling 7825478 2011-10-19 01:48 build/pack/lib/scala-library.jar
The main changes so far: adding AbstractIterator, AbstractIterable, AbstractSet, and some *ViewLike.AbstractTransformed.
Looks like I can probably trim another 200-400k from what I see remaining so far in the collections tree. Will post more status later in the week after giving it some hefty workouts with both the tests and eyeball-glazing proofreading (mainly to ensure that the Abstract* types don't accidentally leak out into public API through methods' return type signatures).
Some of this leads me to wonder whether certain specific types (e.g., AbstractIterator) should be part of the public API after all, as convenience classes for the user. For now, they're confined to private[collection], but if they prove useful and not a speed burden, that's another debate we can have after-the-fact.
Great!
I am hopeful that the reduced indirection will also lead to improved
runtime performance. It will be interesting to do some benchmarking.
Best,
Ismael
There is also the JIT inlining aspect. HotSpot relies on bytecode size
to compute what to inline and the indirection introduced by the trait
forwarders may cause problems. I haven't had the chance to do specific
benchmarking around this, but I've seen some performance results that
could be explained by that.
Best,
Ismael
The bytecode inlining budget, yes. There is also a limit to the number
of methods HotSpot will inline, which is also a potential issue.
If HotSpot relied on the generated code to compute its inlining
budget, then things would be different (and probably better, although
there are other complications in that case).
Best,
Ismael
There's still more to trim and other tasks to do. After I'm done injecting all the 'Abstract...' layers, I'll review it for possible type linearization issues, run the ant tests, then post it for review and community benchmarking (hell, I'm not sure what to test!) on my webserver: a vanilla dist, the source diff, and a dist with the diff applied.
While doing this, I ran across some cases outside of scala.collection that were pretty useful as well. In particular, the compiler, upon seeing { (x) => foo }, emits an anon class that extends scala.runtime.AbstractFunction1; but explicit inheritances from (A => B) instantiate all the trait stubs in Function1 (and there are a lot of them, thanks to @specialized). So I was able to trim even more by making use of AbstractFunction1 explicitly in a few places.
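As a rough illustration of the AbstractFunction1 point, here is a minimal sketch. The class and object names (DoublerViaTrait, DoublerViaClass, FunctionDemo) are made up for this example; only Function1 and scala.runtime.AbstractFunction1 come from the library. Both classes behave identically for callers; the difference discussed above is only in where the trait stubs end up in the bytecode, which this sketch does not measure.

```scala
import scala.runtime.AbstractFunction1

// Hypothetical example classes; only Function1 and AbstractFunction1 are library types.

// Inherits from the Function1 trait directly, as in "explicit inheritance from (A => B)".
final class DoublerViaTrait extends (Int => Int) {
  def apply(x: Int): Int = x * 2
}

// Extends the library's abstract class instead, so the trait's concrete
// methods are inherited from AbstractFunction1 rather than mixed in here.
final class DoublerViaClass extends AbstractFunction1[Int, Int] {
  def apply(x: Int): Int = x * 2
}

object FunctionDemo {
  // From the caller's point of view both are ordinary Int => Int values.
  val viaTrait: Int => Int = new DoublerViaTrait
  val viaClass: Int => Int = new DoublerViaClass
}
```

Comparing the two resulting class files (for instance with javap) would be one way to see whether the difference matters on a given compiler version.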
So, the library jar is getting kinda bulky, especially after the addition of parallel collections. I've been noticing that there's a whole lot of repetitive trait-stub instantiations -- scala/collection/Iterator$$anon$* are great examples of these (they're all different subtypes of Iterator itself) -- and I wonder if anyone has thought to create some library-private abstract classes to flatten some of the bytecode bloat.
To explain what I did, how it managed to reduce scala-library.jar by about a meg and a half, and how this could be applicable to other codebases, I blogged the gory details here: http://blog.duh.org/2011/11/scala-pitfalls-trait-bloat.html
Amazing. Interesting: can the compiler compute a common minimal set of
traits for a set of classes, to do such an optimization automatically?
Offhand, I think that's a freaking great idea.
On Thu, Dec 1, 2011 at 5:48 AM, Pavel Pavlov <pavel.e...@gmail.com> wrote:
> What if the compiler will create 'abstract class Foo$AC extends Foo' for
> every trait Foo?
I'm wary of this increasing size in the standard library, though it
would indeed help the case of user code.
It makes me wonder if perhaps an annotation to enable this behavior
per-trait might be a better choice.
--
-- Todd Vierling <t...@duh.org> <t...@pobox.com>
Yes, I know. I like the idea, but I wonder if it will cause more
strangeness than it solves. Proof of concept would probably be the
only way to find out.
> It wouldn't need to emit the Foo$AC if it's a purely virtual trait.
True, but nearly all traits in the standard library have at least one
implemented method. I would definitely prefer overloading Foo$class
instead, though, since it will already be in use (for static methods)
and would mean one less class file added to the mix.
I agree, it was this aspect which was so immediately appealing. All
those forwarders add a ton of weight. Also, my (unsubstantiated)
guess is hotspot will do much better at optimizing with an instance
method calling static in the same class vs. a forwarder passing 'this'
to a separate class. At least, I seriously doubt it'll do worse.
One issue with this is that it makes it unavailable to users unless
they're willing to extend a class with a '$' in the name, which should
definitely make them very nervous. So I think there should be a non-$
class which extends the $class class, the name of which could be
derived automatically or could be supplied by the annotation.
Oh, and that reminds me, it might give us a shot at a subset of these:
https://issues.scala-lang.org/browse/SI-2296
Basically, we get burned on accessing protected members in java
classes because the jvm requires you be in a subclass of the actual
class whereas in scala the code performing the access, if defined in a
trait, will not be in a subclass but called through a forwarder.
On Thu, Dec 1, 2011 at 12:44 PM, Paul Phillips <pa...@improving.org> wrote:
> One issue with this is that it makes it unavailable to users unless
> they're willing to extend a class with a '$' in the name, which should
> definitely make them very nervous. So I think there should be a non-$
> class which extends the $class class, the name of which could be
> derived automatically or could be supplied by the annotation.
I believe the point would be for the compiler to detect the presence*
of the backing class and pull it in automatically by linearization
order: first inheritance entry. So inheritance would still be simply
"class MyClass extends MyTrait ...".
(* assuming that this pre-instantiation might not be universal to all
traits - see my and Martin's notes about possibly controlling this
code generation via annotation.)
I don't know about doing it everywhere. Many traits are meant to be mixed into other traits before they are instantiated. Examples are all ...Like traits in the collection classes. Generating an abstract class for each of them would be pure waste, since no class ever inherits from them in first position.
On the other hand, maybe an annotation could do the trick of giving more precise control to implementers without having to do a lot of typing. Something like
@classbacked trait Iterator { ... }
There's one tricky issue: A static method (like the ones in the implementation class) may not collide with a dynamic method (like the ones in the abstract class), where collide means "have the same name and type signature". Collisions are possible, for instance in the following case:
@classbacked trait T {
  def foo(x: T) = ???
  def foo() = ???
}
This will generate:
class T$class {
  def foo(x: T) = T$class.foo(this, x)
  def foo() = T$class.foo(this)
  static def foo(_this: T, x: T) = ???
  static def foo(_this: T) = ???
}
Note the collision between the first instance foo and the last static foo.
The compiler needs to check for collisions when generating T$class.
Fortunately, there's a remedy: In case of collision, simply do not generate the instance member.
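To make the collision concrete, here is a hand-written approximation in plain Scala of what the combined class would contain. It is only a sketch: @classbacked does not exist, all names (Greeter, GreeterImpl, AbstractGreeter, Person) are invented for the example, and the static half and the instance half are kept in separate definitions, which is exactly why no collision arises here.

```scala
// Hypothetical names throughout; this is a manual approximation, not compiler output.
trait Greeter {
  def name: String
  def greet(): String
  def greet(other: Greeter): String
}

// Stands in for the static methods of the implementation class.
object GreeterImpl {
  def greet(self: Greeter): String = "hello from " + self.name
  def greet(self: Greeter, other: Greeter): String =
    self.name + " greets " + other.name
}

// Stands in for the instance forwarders, written once instead of per subclass.
abstract class AbstractGreeter extends Greeter {
  def greet(): String = GreeterImpl.greet(this)
  // As an instance method this has the signature greet(Greeter), the same as
  // the static GreeterImpl.greet(self) above. If both lived in one class file
  // they would collide, which is the case the compiler would have to detect.
  def greet(other: Greeter): String = GreeterImpl.greet(this, other)
}

final class Person(val name: String) extends AbstractGreeter
```

Under the remedy described above, the colliding instance forwarder would simply not be generated in the combined class, and subclasses would get an ordinary per-class forwarder for that one method.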
> I am not a big fan of generating useless code but generate-by-default
> approach have one appealing property, I think:
> No matter how many times you use (extend) a trait you'll have only one copy
> of forwarders.
Only in cases where it's the first trait. For multiple inheritance
cases, the additional traits will always have additional forwarders
(there's no way around this).
> Interesting, how many there are such cases comparing to single inheritance
> of traits?
Look at any collection class's inheritance hierarchy. :)
That doesn't make the idea bad; it just clarifies that the generated
forwarders will still happen in some cases.
Out of general curiosity, two questions:
Is it jar-size or permgen-size that is the issue?
If the former, why not generate the classes at runtime?
BR,
John
Bytecode generation should be avoided if possible, as it prevents:
- targeting alternative VMs (Android Dalvik)
- ahead-of-time compilation (gcj)
- linkage needed to run obfuscation/optimization tools (RetroGuard)
It can also make debugging significantly more difficult, and may bloat
JIT memory footprint.
Ok, fair enough. I guess most concerns could be addressed by different compilation backends or an optional postprocessing step, but I can see how it's enough of an issue to not warrant the added complexity.
Thanks for the info!
John
There is this feature with default interface implementations [1] that
AFAIK requires a modified JVM, but... should drastically reduce the size
of the Scala library.
Maybe it's better to wait a little for Java 8 and have it solved
for good? Or at least have a separate Java 8 version of the standard
Scala library?
Roman
[1] http://www.wiki.jvmlangsummit.com/images/a/a1/2011_Goetz_Extension_Slides.pdf
Scala may benefit from this feature but it will be some years until
the Java 8 JVM would feasibly be a target for Scala.
--
Johannes
-----------------------------------------------
Johannes Rudolph
http://virtual-void.net
How would the compiler know the difference?
In my (maybe temporary?) changes to the collection classes, there is a
hierarchy that looks like this:
abstract class AbstractTraversable[+A] extends Traversable[A]
abstract class AbstractIterable[+A] extends AbstractTraversable[A]
  with Iterable[A]
abstract class AbstractSeq[+A] extends AbstractIterable[A] with Seq[A]
This does involve an inheritance hierarchy because the number of new
forwarders at each level is non-trivial. However, I didn't create an
AbstractTraversableOnce because its methods are almost completely
overridden at both the Iterator and Traversable subtrait levels. I
also didn't create classes for Gen* because those are also mostly
overridden, and not directly inherited.
If this were done only as the following, the bytecode (and number of
redundant forwarders) would be notably larger:
abstract class AbstractTraversable[+A] extends Traversable[A]
abstract class AbstractIterable[+A] extends Iterable[A]
abstract class AbstractSeq[+A] extends Seq[A]
This is why I prefer the idea of a compiler annotation. It allows for
finer-grained control over the process, so that there is not a
proliferation of either extra superclasses or redundant forwarders.
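The layering idea can be tried on a toy hierarchy. This is a sketch with invented names (Shape, Polygon, AbstractShape, AbstractPolygon, Square), not the collections code: each abstract class extends the abstract class of the level above and adds only its own trait, so the concrete methods of Shape are picked up once at the AbstractShape level.

```scala
// Hypothetical toy hierarchy mirroring the chained layout described above.
trait Shape {
  def area: Double
  def describe: String = s"shape with area $area"
}

trait Polygon extends Shape {
  def sides: Int
  def summary: String = s"$sides-sided $describe"
}

// Each layer extends the previous abstract class and adds one trait.
abstract class AbstractShape extends Shape
abstract class AbstractPolygon extends AbstractShape with Polygon

// A concrete class inherits from the deepest abstract layer in first position.
final class Square(side: Double) extends AbstractPolygon {
  def area: Double = side * side
  def sides: Int = 4
}
```

How much this saves depends on how the compiler encodes traits, so the effect would need to be measured on the compiler version in use.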
On Thu, Dec 1, 2011 at 9:43 PM, Pavel Pavlov <pavel.e...@gmail.com> wrote:
> I've just noticed that I haven't stated this clearly enough:
> There's no need to generate forwarders for traits used as first supertrait
> in another traits, as well as in classes.
That's the issue here: There's a much more complicated hierarchy than
just one or two traits. It's pretty extensive inside scala.collection,
and can result in some pretty deep inheritance trees for the final
concrete classes.
So I'm not so sure that it's a clear win. It could even be
detrimental (to actual runtime overhead) to create abstract class
layers for everything, including traits like TraversableLike which are
never meant to be inherited directly.
3) bigger class metadata, bigger number of VMTs/IMTs.
...
The third issue, however, can be a problem for VMs such as Dalvik. Here we factor out many identical forwarder methods and pay for this with an increased size of metadata and an increased number of VMTs.
Note however, that overall count of loaded classes will not increase - we already load T.class & T$class.class for each inherited interface, and initialize every T$class on object creation (as we call each T$class.$init$). The overall number of declared methods in all those classes will also not be increased substantially. Thus more VMTs (more precisely, same number of bigger VMTs) seems to be major potential problem for me.
...
Its contents at bytecode level can be approximated here by contents of the "abstract class GenTraversableOnce$AC[+T] extends AnyRef with GenTraversableOnce[T]" plus current contents of the GenTraversableOnce$class.
...
If we declare its forwarders class as "abstract class Traversable$AC[+T] extends AnyRef with Traversable[T]" ...
But if we replace "AnyRef" here with first Traversable supertrait's forwarder class (TraversableLike$AC), ...
...
Losing weight for Traversable requires creation of class TraversableLike$AC, which I just defined as "abstract class TraversableLike$AC[+T, +R] extends AnyRef with TraversableLike[T, R]".
On Thu, Dec 1, 2011 at 15:38, martin odersky <martin....@epfl.ch> wrote:
>
>
> On Thu, Dec 1, 2011 at 11:48 AM, Pavel Pavlov <pavel.e...@gmail.com>
> wrote:
>>
>> What if the compiler will create 'abstract class Foo$AC extends Foo' for
>> every trait Foo?
>> Then it will be able to rewrite every definition of the form 'class C
>> extends Foo ...' to 'class C extends Foo$AC ...', i.e. replace only the
>> first supertrait by generated superclass.
>> This will catch all the uses like 'new Iterator {...}' or 'C extends A =>
>> B' in both library and user's code.
>> Comparing to the now used scheme, extra code will be generated only for
>> those traits which are never used as first superclass.
>>
>> Moreover, it makes sense to combine Foo$class and Foo$AC into one class.
>> This will help to further reduce bytecode size, as most of constant pool
>> entries will be unified in these classes.
>> So, Foo$class will contain two versions of every non-abstract Foo's
>> method: static method with original method's body (as now) and instance
>> method with stub that invokes that static.
>>
>> What do you think of this?
>>
>
> I don't know about doing it everywhere. Many traits are meant to be mixed
> into other traits before they are instantiated. Examples are all ...Like
> traits in the collection classes. Generating an abstract class for each of
> them would be pure waste, since no class ever inherits from them in first
> position.
>
> On the other hand, maybe an annotation could do the trick of giving more
> precise control to implementers without having to do a lot of typing.
> Something like
>
> @classbacked trait Iterator { ... }
>
> And in that case, I agree we should try to combine the abstract class with
> the implementation class. So both abstract class and implementation class
> could be called Iterator$class, which is neat - no new classfiles!
>
> There's one tricky issue: A static method (like the ones in the
> implementation class) may not collide with a dynamic method (like the ones
> in the abstract class), where collide means "have the same name and type
> signature". Collisions are possible, for instance in the following case:
>
> @classbacked trait T {
> def foo(x: T) = ???
> def foo() = ???
> }
>
> This will generate:
>
> class T$class {
> def foo(x: T) = T$class.foo(this, x)
> def foo() = T$class.foo(this)
>
> static def foo(_this: T, x: T) = ???
> static def foo(_this: T) = ???
> }
>
> Note the collision between the first instance foo and the last static foo.
>
> The compiler needs to check for collisions when generating T$class.
> Fortunately, there's a remedy: In case of collision, simply do not generate
> the instance member.
>
> Cheers
>
> -- Martin
--
Daniel C. Sobral
I travel to the future all the time.
A question on Stack Overflow led me to a new thought on this matter.
Assume the following got implemented:
1. A class is generated for every trait.
2. If something "extends" a trait, then the corresponding class is
used instead of forwarders.
Now, assume this _also_ gets implemented:
3. Add constructor parameters to traits.
In such a case, would there be any point in keeping "class" as a
language construct instead of an implementation detail?
The same way it is already solved:
scala> trait A[X, Y]
defined trait A
scala> trait B[T] extends A[T, Int]
defined trait B
scala> trait C[T] extends A[T, Double]
defined trait C
scala> trait D[T1, T2] extends B[T1] with C[T2]
<console>:10: error: illegal inheritance;
trait D inherits different type instances of trait A:
A[T2,Double] and A[T1,Int]
trait D[T1, T2] extends B[T1] with C[T2]
^
By making it illegal.
Note that this doesn't impose any additional restriction over what
classes already offer. Classes *already* make this kind of
double-inheritance illegal by making it impossible.
However, say you have
trait A[T](foo: T)
trait B[T] extends A[T]
trait C[T] extends B[T] with A[T]
class D[T] extends C[T] with B[T]
Where's the correct place to initialize A.foo?
This is the reason C++ has virtual base classes, something that I feel
would really complicate Scala if added.
It sounds very intriguing. Somehow I never liked the "early initialization"
syntax. The possibility of providing trait constructor parameters and
secondary constructors in the way you described is a lot more pleasant to my
eyes.
If it does not interact badly with other features this would be a very nice
addition. And as it mimics super class constructor calls it's in itself not a
new syntactic feature.
Nice idea!
Greetings
Bernd
Well, since no one is passing a parameter to A, this is plainly incorrect.
If someone is passing parameters to A, then apply the same rules as
Scala applies for *type* parameters (with the appropriate changes) as
shown above.
You don't get my point: There's two places where A[T] is inherited.
Which one should have the initialization parameter?
If you don't fully grok the problem introduced by having
initialization parameters for traits, read up on C++ virtual base
classes.
I think the only possible alternative is to make value parameters to
traits optional and to check that only one instance of a trait is
parameterized. For instance, this would be OK:
trait A(a: T)
trait B(b: T) extends A // no parameters here
trait C(c: T) extends A(c) with B(c)
trait D extends B(d1) with C(d2)
And so would this:
trait A(a: T)
trait B(b: T) extends A(b)
trait C(c: T) extends A with B(c)
trait D extends B(d1) with C(d2)
But this would be illegal:
trait A(a: T)
trait B(b: T) extends A(b)
trait C(c: T) extends A(c) with B(c)
trait D extends B(d1) with C(d2)
Another possibility (which we explored earlier) would be to
translate trait parameters to abstract vals and let linearization
decide which definition is current. That would have accepted the last
code with `a` in A initialized to d2. So the alternative is more
general, but it's also considerably harder to implement and I am not
sure it's what one wants in the end.
Cheers
-- Martin
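The linearization-based alternative can be approximated in today's Scala by writing the "parameters" as abstract members by hand. The sketch below is a simplification and makes two assumptions that are not in the message above: it uses defs rather than vals to sidestep initialization-order questions, and C does not pass its parameter on to B. With those caveats, it shows that `a` ends up as d2, because C comes later than B in the linearization of D.

```scala
// Hand-written stand-in for "trait A(a: T)" etc.; T is fixed to Int here.
trait A { def a: Int }

// "trait B(b: T) extends A(b)": B supplies a from its own parameter b.
trait B extends A {
  def b: Int
  def a: Int = b
}

// "trait C(c: T) extends A(c) with B(...)": C overrides a with its parameter c.
trait C extends B {
  def c: Int
  override def a: Int = c
}

// "trait D extends B(d1) with C(d2)", written as a class so it can be instantiated.
final class D(d1: Int, d2: Int) extends B with C {
  def b: Int = d1
  def c: Int = d2
}
```

Here `new D(1, 2).a` resolves to C's definition, so it yields the value passed as d2.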
What if we allow secondary (but not primary!) constructors in traits?
I grok, and I have answered, and you seem completely oblivious to my
point. So, I'll try a different tack.
A type parameter is a parameter.
It's a special kind of parameter, sure, but it still is a parameter.
So, replace "type parameter" by "parameter", and think of "[T]" as
"(x)".
So the example I gave, and showed how Scala handles by running it on REPL, was:
trait A[X, Y]
trait B[T] extends A[T, Int]
trait C[T] extends A[T, Double]
trait D[T1, T2] extends B[T1] with C[T2]
Translated it would become:
trait A(x, y)
trait B(b) extends A(b, 1) // replacing Int with 1
trait C(c) extends A(c, 2) // replacing Double with 2
trait D(b, c) extends B(b) with C(c)
And the example I based it on, which was the "diamond problem example", was:
trait A(val x: Int, val y: Int)
trait B(b: Int) extends A(b, 1)
trait C(c: Int) extends A(c, 2)
trait D(b: Int, c: Int) extends B(b) with C(c)
See? Same thing. And it would be treated exactly how Scala already
treats the first example, aside from obvious differences between types
and values.
Not necessarily. Let's take your example:
trait A(val x: Int, val y: Int)
trait B(b: Int) extends A(b, 1) // A.x = B.b, A.y = 1
trait C(c: Int) extends A(c, 2) // A.x = C.c, A.y = 2
trait D(b: Int, c: Int, f: Int => Int, g: Int => Int) extends B(f(b))
with C(g(c))
B.b = f(b)
C.c = g(c)
A.x = B.b = f(D.b)
A.x = C.c = g(D.c) // not direct parameter passing, so invalid
A.y = 1
A.y = 2 // different, so invalid, but, given your correction for it to
mean 1, this would be allowed
Note that even if C.c were equal to f(D.c) or f(D.b), it would _still_
be disallowed. However, this would be ok:
trait A(val x: Int, val y: Int)
trait B(b: Int) extends A(b, 1)
trait C(c: Int) extends A(c, 1) // A.x = C.c, A.y = 1
trait D(b: Int, c: Int, f: Int => Int, g: Int => Int) extends B(b) with C(b)
A.x = B.b = D.b
A.x = C.c = D.b
A.y = 1
A.y = 1
There's no need to evaluate anything here. For any trait, all
constructor parameters for parent traits will either be a literal,
equal to one of the trait's own parameters, or something else.
Literals can be compared, a trait's own parameters can be traced, and
everything else should simply be disallowed in case of conflict.
Note that this can deal with everything that is possible in Scala
right now, since Scala simply does not allow a class constructor
parameter to be initialized by more than one value.
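The declaration-site rule described here is simple enough to sketch as a small checker. Everything below is hypothetical (the Arg encoding and the InitCheck object are invented for illustration): each initialization path for a parent parameter is classified as a literal, a parameter of the declaring trait, or "something else", and several paths are accepted only if they are all the same literal or all trace back to the same parameter.

```scala
// Hypothetical encoding of one initialization path for a parent's parameter.
sealed trait Arg
final case class Lit(value: Int) extends Arg     // a literal such as 1
final case class Param(name: String) extends Arg // traced to the trait's own parameter
case object Other extends Arg                    // any other expression, e.g. f(b)

object InitCheck {
  // Two paths agree only if they are the same literal or the same parameter.
  private def same(x: Arg, y: Arg): Boolean = (x, y) match {
    case (Lit(v1), Lit(v2))     => v1 == v2
    case (Param(n1), Param(n2)) => n1 == n2
    case _                      => false
  }

  // Zero or one path is never a conflict; several paths must all agree.
  def allowed(paths: List[Arg]): Boolean = paths match {
    case Nil           => true
    case first :: rest => rest.forall(same(first, _))
  }
}
```

For the accepted example above, A.x has the paths List(Param("b"), Param("b")) and A.y has List(Lit(1), Lit(1)), so both pass. In the rejected example, A.x has List(Other, Other) and A.y has List(Lit(1), Lit(2)), so both fail.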
> 2) How do you deal with side effects?
Side effects fall into the "something else" case. If there's any
conflict, where conflict is taken to mean more than one initialization
path, then it is disallowed.
>
> Consider an example:
>
>
> trait A(val x: Int, val y: Int)
> trait B(b: Int) extends A(b, 1)
> trait C(c: Int) extends A(c, 2)
> trait D(b: Int, c: Int, f: Int => Int, g: Int => Int) extends B(f(b)) with
> C(g(c))
This is declared invalid here, at declaration site.
>
> def foo = 123
> def bar(x: Int) = x
> def baz(x: Int) = { launchMissiles(); x }
>
> val y = new D(123, foo, bar, bar) // is this correct or not?
> val x = new D(123, 123, baz, baz) // and this?
It doesn't matter if it _could_ be correct at usage site or not. It is
allowed or disallowed at declaration.
I'll grant that the rules are more complex than the simple rules that
class offers. However, I think eliminating "class" altogether would
result in an overall _simplification_ of the language. You add the
initialization rules for values, and get rid of everything about
classes.
trait A(a: T)
trait B(b: T) extends A(b)
trait C(c: T) extends A(c) with B(c)
trait D extends B(d1) with C(d2)
throwing some sort of exception if d1 != d2. I think that accords well with
the general intuition that there's only one A in a D, so that that A
can be initialized just one way. But it would be a pain to implement,
and you're still left with the question of whether it's really what
you want.
A