I am currently looking at adding Scala support to the SLF4J library.
Previous work by Brian Clapper [1, 2] and Heiko Seeberger [3]
makes heavy use of "by-name" parameters [4].
Looking more closely, I noticed that at each code location where a
method taking by-name parameters is called, an inner class is generated.
More precisely, for the object A defined below, scalac will generate a
class for the evaluation of the expression {"i = " + i} and another
class for the evaluation of {"Done"}.
package a
object A {
  def foo(s: => String): Unit = { println(s) }
  def main(in: Array[String]): Unit = {
    for (i <- 0 to 1000) {
      foo("i = " + i)
    }
    foo("Done")
  }
}
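To make the mechanism concrete, here is a minimal sketch of roughly what a by-name parameter turns into, assuming the Scala 2.x encoding described above (the parameter becomes a scala.Function0 and each call site passes an instance of its own anonymous class). The names `ByNameDesugared`, `fooByName` and `fooDesugared` are hypothetical; `fooDesugared` is a hand-written stand-in for the compiler's translation, not actual scalac output.

```scala
object ByNameDesugared {
  // Source form: the argument expression is evaluated only when `s` is used.
  def fooByName(s: => String): String = s

  // Hand-written approximation of the translated signature: the by-name
  // parameter is represented as a Function0 that is applied on use.
  def fooDesugared(s: Function0[String]): String = s.apply()

  def main(args: Array[String]): Unit = {
    val i = 42
    println(fooByName("i = " + i))
    // Explicit equivalent of the call above: an anonymous subclass of
    // Function0 that captures `i`, i.e. one extra class for this call site.
    println(fooDesugared(new Function0[String] { def apply(): String = "i = " + i }))
  }
}
```

Compiling this should leave at least one extra class file for the explicit `new Function0[String] { ... }` in `main`, in addition to whatever the compiler generates for the `fooByName` call.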
For a 100'000 line project, assuming the method A.foo is used
frequently, say in 4 out of 100 lines of code, there will be
4000 extra classes in the project. If the 4 percent rule holds, then
a 1'000'000 line project will ship with 40'000 additional classes.
Given that each extra class seems to be about 1 KB in size, 40'000
classes will take about 40 MB of extra disk space.
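The back-of-the-envelope estimate above can be written out as follows. It only encodes Ceki's stated assumptions (a by-name call on 4 of every 100 lines, about 1 KB per generated class); both numbers are suppositions rather than measurements, and `ClassCountEstimate` is a hypothetical name.

```scala
object ClassCountEstimate {
  // Assumption from the text: a by-name call site on `callsPer100Lines`
  // of every 100 lines, each producing one extra class.
  def extraClasses(linesOfCode: Long, callsPer100Lines: Long): Long =
    linesOfCode * callsPer100Lines / 100

  // Assumption from the text: roughly `kbPerClass` kilobytes per class file.
  def extraKilobytes(classes: Long, kbPerClass: Long): Long =
    classes * kbPerClass

  def main(args: Array[String]): Unit =
    for (loc <- Seq(100000L, 1000000L)) {
      val classes = extraClasses(loc, 4)
      println(s"$loc lines -> $classes extra classes, ~${extraKilobytes(classes, 1)} KB")
    }
}
```

Running it prints 4000 classes (~4000 KB) for 100'000 lines and 40000 classes (~40000 KB, about 40 MB) for 1'000'000 lines.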
Can the extra classes generated in support of by-name parameters
significantly degrade performance? I am thinking of longer class loading
times and/or longer application startup times.
Is my concern valid?
--
Ceki
[1] http://www.mail-archive.com/slf4j...@qos.ch/msg00048.html
[2] http://bmc.github.com/grizzled-slf4j/
[3] https://github.com/weiglewilczek/slf4s
[4] http://www.artima.com/pins1ed/control-abstraction.html#9.5
It might be useful for the compiler to optimize away the generation of
a class if the by-name argument is already evaluated, for example
the literal "Done", or a reference to a non-lazy val.
Question for the JVM watchers on the list: will any of the planned JVM
improvements help to reduce the code-size overhead of closures?
-jason
Your concern is valid, but your numbers are kind of made up. I don't
know if "the 4 percent rule" is the name of a real rule or just you
referring to that supposition, but in real life there are no methods
which are called 4 times out of every 100 lines of code in a 100K line
project. Maybe if it is unimaginably bad code, but boy, these people
are missing out on some abstraction and a few thousand extra objects is
not their biggest problem.
Try it with real numbers and it won't look as bad. Is it still a
problem? Maybe. I did an analysis of the by-name usage in trunk at some
point. As I remember it, by far the most frequently used were:
Predef.assert
Option#getOrElse
Everything else trailed far behind. Not long ago I rewrote the Scala
parser to be much, much more beautiful codewise. Unfortunately I also
counted the by-name objects and found I'd taken the count from 100ish to
200ish. I ended up shelving it, while also wondering if I couldn't
eliminate all those objects with inlining. Private by-name methods were
driving the entire object count bump, and I don't (yet) know why their
objects couldn't be eliminated.
private def bippy[T](body: => T) = {
  val saved = something
  try body
  finally something = saved
}
Lots of methods like that. But all that method is trying to do is get
its sticky fingers into the caller:
// I write this
def foo() = bippy({ parseSomething })
// But I mean this:
def foo() = {
  val saved = something
  try parseSomething
  finally something = saved
}
But being a sensible person I don't want to duplicate those three lines
all over the place. Still, the object buys me nothing, so I really
think it ought to be eliminable.
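A runnable version of the pattern, assuming a single mutable field stands in for `something` (the field and the object name `SavedStateSketch` are hypothetical). It shows that the helper form and the hand-inlined form behave identically, including restoring the saved state; whether a particular compiler can then drop the closure object for the helper form depends on its inliner and is not shown here.

```scala
object SavedStateSketch {
  // Hypothetical mutable state standing in for `something` above.
  var something: Int = 0

  // Same shape as bippy: save the state, run the by-name body, restore it.
  private def bippy[T](body: => T): T = {
    val saved = something
    try body
    finally something = saved
  }

  // Caller written with the helper: the block is passed as a by-name
  // argument, which is what allocates the extra object.
  def withHelper(): Int = bippy { something = 99; something * 2 }

  // The hand-inlined form the helper stands for: no closure is needed.
  def inlinedByHand(): Int = {
    val saved = something
    try { something = 99; something * 2 }
    finally something = saved
  }
}
```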
Oh. You got me there. I forget about logging calls because I'm against
logging. We have lost too many old-growth forests already.
That reminds me, I've already faced this situation, and by-name won:
http://www.scala-lang.org/node/825
TL;DR: making the second argument of assert by-name added 3% to the weight
of trunk (215 class files) and made it 2% faster. Grossly
oversimplifying, of course.
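The speedup comes from the fact that a by-name message is only computed when the assertion fails. The sketch below mirrors the shape of `Predef.assert(assertion: Boolean, message: => Any)` in a hypothetical `check` method and counts how often a hypothetical `expensiveMessage()` actually runs.

```scala
object LazyAssertMessage {
  var messagesBuilt = 0

  // Hypothetical stand-in for an expensive description of the failure.
  def expensiveMessage(): String = { messagesBuilt += 1; "invariant violated" }

  // By-name message: the thunk only runs when the assertion fails.
  def check(assertion: Boolean, message: => Any): Unit =
    if (!assertion) throw new AssertionError("assertion failed: " + message)
}
```

The price is the one demonstrated earlier in the thread: every call site that passes a message gets its own thunk.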
original code:
logIt(whatever()) // not by-name: the log message is always evaluated
now add magic implicits:
desugared code:
logIt(new RichLoggingMessage(whatever())) // <- message-generating code called directly if logging is on
logIt(RichLoggingMessage.None) // implicitly replaced by RichLoggingMessage.None if logging is off
Turning logging on and off would require recompilation, though ;/
-------- Original Message --------
> Date: Thu, 5 May 2011 09:48:49 +1000
> From: Ishaaq Chandy <ish...@gmail.com>
> To: Paul Phillips <pa...@improving.org>
> CC: Ceki Gulcu <ce...@qos.ch>, scala...@googlegroups.com
> Subject: Re: [scala-user] Massive use of by-name parameters dangerous?
Using a dummy implicit conversion (and an allocation) to avoid a logging
call when you're recompiling the entire thing and can do whatever you
want is like smuggling McDonald's ketchup packets into a fine restaurant
and feeling like you got away with something.
import scala.annotation.elidable
object Test {
  @elidable(500) def log(msg: => Any): Unit = {
    println("Logging: " + msg)
  }
  def main(args: Array[String]): Unit = {
    log("Called main")
    println("I'm main.")
    log("Main has been called.")
  }
}
% rcscala -nc -Xelide-below 499 ./0505.scala
Logging: Called main
I'm main.
Logging: Main has been called.
% rcscala -nc -Xelide-below 501 ./0505.scala
I'm main.
public void main(java.lang.String[]);
  Code:
   Stack=3, Locals=2, Args_size=2
   0:  aload_0
   1:  new #53; //class Test$$anonfun$main$1
   4:  dup
   5:  invokespecial #54; //Method Test$$anonfun$main$1."<init>":()V
   8:  invokevirtual #56; //Method log:(Lscala/Function0;)V
   11: getstatic #19; //Field scala/Predef$.MODULE$:Lscala/Predef$;
   14: ldc #58; //String I'm main.
   16: invokevirtual #43; //Method scala/Predef$.println:(Ljava/lang/Object;)V
   19: aload_0
   20: new #60; //class Test$$anonfun$main$2
   23: dup
   24: invokespecial #61; //Method Test$$anonfun$main$2."<init>":()V
   27: invokevirtual #56; //Method log:(Lscala/Function0;)V
   30: return
I'd say yes: MethodHandles should be able to replace most FunctionX
classes and instances. You would lift closures to simple methods and
use MethodHandles to reference them and close over values.
I don't know much about how, and in which phases, the Scala compiler
does the lifting to anonymous classes right now, but I guess that
changing the compiler to use MethodHandles will require some major
effort.
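A hand-written illustration of that idea, assuming JDK 7 or later for `java.lang.invoke`: the closure body becomes an ordinary method, and the captured value is pre-bound with `MethodHandles.insertArguments`, producing a zero-argument handle without defining any new class. `MethodHandleClosure`, `liftedBody`, `thunk` and `force` are hypothetical names; this is not how scalac compiles closures, only a sketch of the lift-and-bind step described above. `invokeWithArguments` boxes its arguments and result, so it is not a performance-oriented choice.

```scala
import java.lang.invoke.{MethodHandle, MethodHandles, MethodType}

object MethodHandleClosure {
  // The body of the by-name argument "i = " + i, lifted to an ordinary
  // method; the captured variable `i` becomes an explicit parameter.
  def liftedBody(i: Int): String = "i = " + i

  // Handle to liftedBody with the receiver (this singleton) already bound,
  // leaving a handle of type (int)String.
  private val liftedHandle: MethodHandle =
    MethodHandles.lookup()
      .findVirtual(getClass, "liftedBody",
        MethodType.methodType(classOf[String], classOf[Int]))
      .bindTo(this)

  // "Close over" a value by pre-binding the argument: the result takes no
  // arguments and plays the role of a Function0.
  def thunk(i: Int): MethodHandle =
    MethodHandles.insertArguments(liftedHandle, 0, Int.box(i))

  // Evaluate the zero-argument handle.
  def force(h: MethodHandle): String =
    h.invokeWithArguments().asInstanceOf[String]
}
```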
--
Johannes
-----------------------------------------------
Johannes Rudolph
http://virtual-void.net
If constant expressions could be optimized out and non-constant
expressions output as MethodHandles by scalac, then by-name parameters
could work nicely in the case of logging API like SLF4J. However, these
scalac optimizations are not currently available. If SLF4J's Scala
offering is widely adopted by the Scala community, I fear that in the
absence of said compiler optimizations, by-name parameters will come
back to haunt us in the form of excessively bloated jar files.
I will enter a bug report requesting said scalac optimization. BTW,
thank you all for your informative comments.
--
Ceki
Even if method handles come out in JDK 1.7 or 1.8, the Scala compiler
won't be able to take advantage of them for a good long time, at least
not without a compiler flag, because otherwise using Scala would
require JDK 1.7 or 1.8. Many Java users don't upgrade for a very
long time, so it is important that Scala keep requiring only JDK 1.5 for
a good long time. Therefore, for a good long time, unnecessary use of
by-name parameters will result in the generation of unnecessary class files.
I'm curious how much this matters in practice, because this is a big
area of difference between ScalaTest and specs. One place it shows up
is that ScalaTest's "should" and "must" methods do not take a by-name
parameter, whereas specs' do. This allows specs to have a slightly more
concise syntax in a few places, but at the cost of generating loads
more class files. For example, in specs matchers you can say:
s.charAt(-1) must throwA [IndexOutOfBoundsException]
This works because must takes a by-name to its left side (via the
implicit conversion that adds must). In ScalaTest it is slightly more
verbose because must does not take a by-name. You have to say:
evaluating { s.charAt(-1) } must produce [IndexOutOfBoundsException]
The reason I went this slightly more verbose route was to avoid
generating class files except when they are actually needed. The
trouble with the specs approach is that because must takes a by-name,
you also get a class file when you write:
"Hello world" must have size (11)
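The difference can be sketched with two hypothetical wrappers (`deferred`/`Deferred` and `strict`/`Strict`, which are not the specs or ScalaTest APIs). The by-name wrapper can run its left side inside `try`/`catch`, which is what an exception matcher needs, but every call site pays for a thunk; the strict wrapper receives an already-computed value, so no thunk is created.

```scala
import scala.reflect.ClassTag

object MatcherSketch {
  // By-name left side: `left` is a thunk, so it can be evaluated inside
  // try/catch. Each call site passes a fresh closure for it.
  final class Deferred[A](left: => A) {
    def raises[E <: Throwable](implicit ct: ClassTag[E]): Boolean =
      try { left; false }
      catch { case e: Throwable if ct.runtimeClass.isInstance(e) => true }
  }
  def deferred[A](left: => A): Deferred[A] = new Deferred(left)

  // Strict left side: `left` is already a value when the wrapper is built,
  // so the call site needs no closure.
  final class Strict[A](left: A) {
    def isEqualTo(right: A): Boolean = left == right
  }
  def strict[A](left: A): Strict[A] = new Strict(left)
}
```

Under the encoding discussed in this thread, each `deferred(...)` call site would get its own anonymous class, while `strict(...)` call sites would not.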
But in ScalaTest you don't. So for example, if you compile the first
example in Eric's quick start guide for specs2:
import org.specs2.mutable._
class HelloWorldSpec extends Specification {
  "The 'Hello world' string" should {
    "contain 11 characters" in {
      "Hello world" must have size(11)
    }
    "start with 'Hello'" in {
      "Hello world" must startWith("Hello")
    }
    "end with 'world'" in {
      "Hello world" must endWith("world")
    }
  }
}
You get 16 class files:
HelloWorldSpec$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$1.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$10$$anonfun$apply$11.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$10$$anonfun$apply$12$$anonfun$apply$13.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$10$$anonfun$apply$12.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$10.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$14.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$4.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$5$$anonfun$apply$6.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$5$$anonfun$apply$7$$anonfun$apply$8.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$5$$anonfun$apply$7.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$5.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$9.class
HelloWorldSpec$$anonfun$1.class
HelloWorldSpec.class
But if you compile a similar ScalaTest class:
import org.scalatest.WordSpec
import org.scalatest.matchers.MustMatchers
class HelloWorldSpec extends WordSpec with MustMatchers {
  "The 'Hello world' string" should {
    "contain 11 characters" in {
      "Hello world" must have length (11)
    }
    "start with 'Hello'" in {
      "Hello world" must startWith ("Hello")
    }
    "end with 'world'" in {
      "Hello world" must endWith ("world")
    }
  }
}
You get 5 class files:
HelloWorldSpec$$anonfun$1$$anonfun$apply$mcV$sp$1.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$mcV$sp$2.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$mcV$sp$3.class
HelloWorldSpec$$anonfun$1.class
HelloWorldSpec.class
So what I'm wondering is how much this actually matters in practice. I
have always considered that a design flaw in specs matchers, but I
have never heard a single specs user complain about it. It is possible
that specs users didn't realize it was happening because they only got
a few extra class files per compile, like boiled frogs, or it might
simply not matter in practice. I'm curious what people think.
What I know is that the tests for ScalaTest itself already generate a
very large number of class files, because there's a massive number of
tests in there. As of this morning I count 28,126 class files in
ScalaTest's tests of itself. It takes 25 seconds to put these in a jar
file on my laptop, and 1 minute 52 seconds to delete everything in an
"ant clean". If the number of class files were tripled (i.e., if I
used specs to test ScalaTest) I'm not sure how much longer these build
tasks would take, but I imagine they'd be painfully longer. I'm also
curious how much time the Scala compiler takes to generate these
files, because it takes about 20 minutes to compile the tests. Does
anyone know if any significant time would be spent writing class files
during a compile that generates 28,000 of them?
Bill
----
Bill Venners
Artima, Inc.
http://www.artima.com
The Scala compiler is full of those as well.
Sometimes you can get away with format strings, if the expensive
string computation is in fact '.toString' on some object.
var whatever = 0 // logging threshold; value is illustrative
def log(level: Int, fmt: String, args: Any*): Unit = {
  if (level > whatever)
    println(fmt.format(args: _*))
}
This would call .toString only when logging is enabled.
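A small check of that claim, assuming a hypothetical `threshold` in place of `whatever` and collecting formatted lines in a buffer instead of printing them so the result can be inspected. The hypothetical `Expensive` class counts how often its `toString` runs.

```scala
import scala.collection.mutable.ArrayBuffer

object FormatLogSketch {
  // Hypothetical threshold standing in for `whatever`.
  var threshold = 2
  val lines = ArrayBuffer.empty[String]

  // Arguments are passed as-is; formatting (and thus toString) only
  // happens when the level passes the threshold.
  def log(level: Int, fmt: String, args: Any*): Unit =
    if (level > threshold) lines += fmt.format(args: _*)

  // An argument whose string rendering is expensive; renders are counted.
  final class Expensive {
    var renders = 0
    override def toString: String = { renders += 1; "Expensive(...)" }
  }
}
```

Note that the `args: Any*` sequence is still allocated on every call; only the string rendering is skipped.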
iulian
--
"I hate mountains; they hide the scenery."
Alphonse Allais