Massive use of by-name parameters dangerous?


Ceki Gulcu

May 4, 2011, 9:23:55 AM
to scala...@googlegroups.com

Hello,

I am currently looking at adding Scala support to the SLF4J library.
Previous work done by Brian Clapper [1, 2] and Heiko Seeberger [3]
makes heavy use of "by-name" parameters [4].

Looking more closely, I noticed that for each code location where a
method taking by-name parameters is called, an inner class is generated.

More precisely, for the object A defined below, scalac will generate a
class for the evaluation of the expression {"i = " + i} and another
class for the evaluation of {"Done"}.

package a;
object A {
  def foo(s: => String) { println(s) }
  def main(in: Array[String]) {
    for(i <- 0 to 1000) {
      foo("i = " + i)
    }
    foo("Done")
  }
}
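A rough way to see where those classes come from: a by-name parameter such as `s: => String` is passed as a zero-argument function, so each call site wraps its argument expression in a function value, and the Scala 2 compilers of this era emitted a separate `$$anonfun` class for each such function literal. The sketch below writes the same call both ways. `ByNameDesugar` and its methods are hypothetical names, and the explicit `() => String` version is only an approximation of scalac's translation, not its exact output.

```scala
// Hypothetical names; an approximation of how scalac treats by-name
// parameters, not its exact output.
object ByNameDesugar {
  var evaluations = 0

  // Stands in for an argument expression such as "i = " + i.
  def message(i: Int): String = { evaluations += 1; "i = " + i }

  // Source form: s is evaluated each time it is referenced.
  def fooByName(s: => String): String = s

  // Roughly what the compiler passes instead: a Function0 (a thunk).
  def fooExplicit(s: () => String): String = s()

  // A by-name parameter that is never referenced is never evaluated.
  def ignore(s: => String): Unit = ()
}
```

Both forms defer the argument expression; the difference is only that the by-name version hides the function literal (and the class behind it) at the call site.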


For a 100'000-line project, assuming the method A.foo is used
frequently, say in 4 out of 100 lines of code, there will be
4'000 extra classes in the project. If the 4 percent rule holds, then
a 1'000'000-line project will ship with 40'000 additional classes.
Given that each extra class seems to be about 1 KB, 40'000 classes
will take 40 MB of extra disk space.
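The estimate above is simple arithmetic. A sketch that reproduces it, using only the assumptions stated in the message (4 by-name call sites per 100 lines, roughly 1 KB per generated class, taken here as 1000 bytes):

```scala
// Back-of-envelope estimate; both constants are the assumptions stated above.
object ClassBloatEstimate {
  val callSitesPerHundredLines = 4 // assumed usage density of A.foo
  val bytesPerClass = 1000L        // "about 1 KB" per generated class

  // One generated class per by-name call site.
  def extraClasses(linesOfCode: Long): Long =
    linesOfCode * callSitesPerHundredLines / 100

  def extraBytes(linesOfCode: Long): Long =
    extraClasses(linesOfCode) * bytesPerClass
}
```

With these inputs, 100'000 lines give 4'000 classes and 1'000'000 lines give 40'000 classes, or about 40 MB.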

Can the extra classes generated in support of by-name parameters
significantly degrade performance? I am thinking of longer class-loading
times and/or longer application startup times.

Is my concern valid?

--
Ceki

[1] http://www.mail-archive.com/slf4j...@qos.ch/msg00048.html
[2] http://bmc.github.com/grizzled-slf4j/
[3] https://github.com/weiglewilczek/slf4s
[4] http://www.artima.com/pins1ed/control-abstraction.html#9.5

Jason Zaugg

May 4, 2011, 10:41:33 AM
to Ceki Gulcu, scala...@googlegroups.com
On Wed, May 4, 2011 at 3:23 PM, Ceki Gulcu <ce...@qos.ch> wrote:
> More precisely, for the object A defined below, scalac will generate a
> class for the evaluation of the expression {"i = " + i} and another class
> for the evaluation of {"Done"}.
>
> package a;
> object A {
>  def foo(s: => String) { println(s) }
>  def main(in: Array[String]) {
>    for(i <- 0 to 1000) {
>      foo("i = " + i)
>    }
>    foo("Done")
>  }
> }

It might be useful for the compiler to optimize away the generation of
a class if the by-name argument is already evaluated, for example
the literal "Done", or a reference to a non-lazy val.

Question for the JVM watchers on the list: will any of the planned JVM
improvements help to reduce the code-size overhead of closures?

-jason

Trond Olsen

May 4, 2011, 5:47:45 PM
to Ceki Gulcu, scala...@googlegroups.com
I think it's partly because not everyone is aware of what by-name parameters generate. It doesn't make sense in a logging framework, as I discovered myself.


Paul Phillips

May 4, 2011, 6:03:59 PM
to Ceki Gulcu, scala...@googlegroups.com
On 5/4/11 6:23 AM, Ceki Gulcu wrote:
> Looking more closely, I noticed that for each code location where a
> method taking by-name parameters, an inner class is generated.

Your concern is valid, but your numbers are kind of made up. I don't
know if "the 4 percent rule" is the name of a real rule or just you
referring to that supposition, but in real life there are no methods
which are called 4 times out of every 100 lines of code in a 100K line
project. Maybe if it is unimaginably bad code, but boy, these people
are missing out on some abstraction and a few thousand extra objects is
not their biggest problem.

Try it with real numbers and it won't look as bad. Is it still a
problem? Maybe. I did an analysis of the by-name usage in trunk at some
point. As I remember it, by far the most frequently used were:

Predef.assert
Option#getOrElse

Everything else distantly distant. Not long ago I rewrote the scala
parser to be much, much more beautiful codewise. Unfortunately I also
counted the by-name objects and found I'd taken it from 100ish to
200ish. I ended up shelving it, while also wondering if I couldn't
eliminate all those objects with inlining. They were all private
by-name methods driving the object count bump, and I don't (yet) know
why they couldn't be eliminated.

private def bippy[T](body: => T) = {
  val saved = something
  try body
  finally something = saved
}

Lots of methods like that. But all that method is trying to do is get
its sticky fingers into the caller:

// I write this
def foo() = bippy({ parseSomething })

// But I mean this:
def foo() = {
  val saved = something
  try parseSomething
  finally something = saved
}

But being a sensible person I don't want to duplicate those three lines
all over the place. Still, the object buys me nothing, so I really
think it ought to be eliminable.
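A runnable version of the save/restore pattern, for anyone who wants to compile both forms and compare the class files. `BippyDemo`, `something` and `parseSomething` are hypothetical stand-ins for the parser code described above; whether a given scalac optimizer actually removes the closure in `fooWithHelper` is not claimed here.

```scala
// Hypothetical stand-ins for the parser state and parse method.
object BippyDemo {
  var something: Int = 0

  // Saves shared state, runs the by-name body, then restores the state.
  private def bippy[T](body: => T): T = {
    val saved = something
    try body
    finally something = saved
  }

  private def parseSomething(): Int = {
    something += 10 // mutates shared state while "parsing"
    something
  }

  // Written with the helper: the block is passed as a by-name argument.
  def fooWithHelper(): Int = bippy({ parseSomething() })

  // Hand-inlined equivalent: no function object is passed anywhere.
  def fooInlined(): Int = {
    val saved = something
    try parseSomething()
    finally something = saved
  }
}
```

Both methods return the value computed inside the body and leave `something` as it was before the call.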

Ishaaq Chandy

May 4, 2011, 7:48:49 PM
to Paul Phillips, Ceki Gulcu, scala...@googlegroups.com
I believe Ceki is specifically referring to logging calls, where his 4% rule is definitely not unimaginable. Looking at my own code, logging calls are more prevalent than even Predef.assert calls.

Ishaaq

Paul Phillips

May 4, 2011, 7:51:35 PM
to Ishaaq Chandy, Ceki Gulcu, scala...@googlegroups.com
On 5/4/11 4:48 PM, Ishaaq Chandy wrote:
> I believe Ceki is specifically referring to logging calls - where his 4%
> rule is definitely not unimaginable. Looking at my own code, logging
> calls are more prevalent than even Predef.assert calls

Oh. You got me there. I forget about logging calls because I'm against
logging. We have lost too many old-growth forests already.

Paul Phillips

May 4, 2011, 8:01:22 PM
to Ishaaq Chandy, Ceki Gulcu, scala...@googlegroups.com
On 5/4/11 4:48 PM, Ishaaq Chandy wrote:
> I believe Ceki is specifically referring to logging calls - where his 4%
> rule is definitely not unimaginable. Looking at my own code, logging
> calls are more prevalent than even Predef.assert calls

That reminds me, I've already faced this situation, and by-name won:

http://www.scala-lang.org/node/825

TL;DR: making the second argument of assert by-name added 3% to the weight
of trunk (215 classfiles) and made it 2% faster. Grossly
oversimplifying, of course.
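Why by-name can make assert faster: the failure message is only built when the assertion actually fails, so passing assertions never pay for it. A minimal sketch with a hypothetical `check` helper (not `Predef.assert` itself) and a counter that records how often the message is constructed:

```scala
// Hypothetical assert-like helper; not the Predef.assert implementation.
object AssertSketch {
  var messagesBuilt = 0

  // The message is by-name: it is only evaluated on failure.
  def check(condition: Boolean, message: => Any): Unit =
    if (!condition) throw new AssertionError("assertion failed: " + message)

  // Simulates an expensive diagnostic message.
  def expensiveMessage(): String = {
    messagesBuilt += 1
    "state dump: " + (1 to 1000).mkString(",")
  }
}
```

The trade-off is the one described above: each call site with a non-trivial message gets a function object (and, at the time, a class file), in exchange for skipping the message work on the common, passing path.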

Dennis Haupt

May 5, 2011, 3:29:29 AM
to Ishaaq Chandy, pa...@improving.org, scala...@googlegroups.com, ce...@qos.ch
Isn't it possible to cheat around the by-name parameters by doing an implicit conversion?

original code:
logIt(whatever()) // not by-name; the log message is always evaluated

now add magic implicits:

desugared code:
logIt(new RichLoggingMessage(whatever())) // <- log-message-generating code called directly if logging is on

logIt(RichLoggingMessage.None) // implicitly replaced by RichLoggingMessage.None if logging is off

Turning logging on and off would require recompilation, though ;/

-------- Original Message --------
> Date: Thu, 5 May 2011 09:48:49 +1000
> From: Ishaaq Chandy <ish...@gmail.com>
> To: Paul Phillips <pa...@improving.org>
> CC: Ceki Gulcu <ce...@qos.ch>, scala...@googlegroups.com
> Subject: Re: [scala-user] Massive use of by-name parameters dangerous?

Paul Phillips

May 5, 2011, 3:50:02 AM
to Dennis Haupt, Ishaaq Chandy, scala...@googlegroups.com, ce...@qos.ch
On 5/5/11 12:29 AM, Dennis Haupt wrote:
> isn't it possible to cheat around the by-name parameters by doing an implicit conversion?

Using a dummy implicit conversion (and an allocation) to avoid a logging
call when you're recompiling the entire thing and can do whatever you
want is like smuggling McDonald's ketchup packets into a fine restaurant
and feeling like you got away with something.

import annotation.elidable

object Test {
  @elidable(500) def log(msg: => Any) = {
    println("Logging: " + msg)
  }

  def main(args: Array[String]): Unit = {
    log("Called main")
    println("I'm main.")
    log("Main has been called.")
  }
}

% rcscala -nc -Xelide-below 499 ./0505.scala
Logging: Called main
I'm main.
Logging: Main has been called.

% rcscala -nc -Xelide-below 501 ./0505.scala
I'm main.


public void main(java.lang.String[]);
Code:
Stack=3, Locals=2, Args_size=2
0: aload_0
1: new #53; //class Test$$anonfun$main$1
4: dup
5: invokespecial #54; //Method Test$$anonfun$main$1."<init>":()V
8: invokevirtual #56; //Method log:(Lscala/Function0;)V
11: getstatic #19; //Field scala/Predef$.MODULE$:Lscala/Predef$;
14: ldc #58; //String I'm main.
16: invokevirtual #43; //Method scala/Predef$.println:(Ljava/lang/Object;)V
19: aload_0
20: new #60; //class Test$$anonfun$main$2
23: dup
24: invokespecial #61; //Method Test$$anonfun$main$2."<init>":()V
27: invokevirtual #56; //Method log:(Lscala/Function0;)V
30: return

Johannes Rudolph

May 5, 2011, 4:05:32 AM
to Jason Zaugg, Ceki Gulcu, scala...@googlegroups.com
On Wed, May 4, 2011 at 4:41 PM, Jason Zaugg <jza...@gmail.com> wrote:
> Question for the JVM watchers on the list: will any of the planned JVM
> improvements help to reduce the code-size overhead of closures?

I'd say yes: MethodHandles should be able to replace most of the
FunctionX classes and instances. You would lift closures to simple
methods and use MethodHandles to reference them and close over values.
I don't know too much about how and in which phases the Scala compiler
is doing the lifting to anonymous classes right now but I guess that
changing the compiler to use MethodHandles will require some major
effort.
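The idea of lifting a closure body to an ordinary method and closing over values by binding them into a method handle can be sketched by hand with `java.lang.invoke` (JDK 7 or later). This only illustrates the representation; it is not what scalac generates, and `Lifted` and `MethodHandleSketch` are hypothetical names.

```scala
import java.lang.invoke.{MethodHandle, MethodHandles, MethodType}

// The body of the closure { "i = " + i }, lifted to an ordinary method.
object Lifted {
  def message(i: Int): String = "i = " + i
}

object MethodHandleSketch {
  // One handle for the lifted method, shared by every "closure" value.
  private val messageHandle: MethodHandle =
    MethodHandles.lookup()
      .findVirtual(Lifted.getClass, "message",
        MethodType.methodType(classOf[String], classOf[Int]))
      .bindTo(Lifted) // supply the receiver object

  // "Closing over" i means binding it as a leading argument;
  // no per-closure class file is emitted for this.
  def thunkFor(i: Int): MethodHandle =
    MethodHandles.insertArguments(messageHandle, 0, Int.box(i))

  // Invoke the resulting zero-argument handle.
  def run(thunk: MethodHandle): String =
    thunk.invokeWithArguments().asInstanceOf[String]
}
```

Whether such handles are cheaper at run time than anonymous function classes is a separate question; the sketch only shows that one lifted method plus bound arguments can stand in for many closure classes.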

--
Johannes

-----------------------------------------------
Johannes Rudolph
http://virtual-void.net

Kevin Wright

May 5, 2011, 4:16:56 AM
to Johannes Rudolph, Jason Zaugg, Ceki Gulcu, scala...@googlegroups.com
Here you go, I highlighted anything that might be relevant...


scalac -Xshow-phases
    phase name  id  description
    ----------  --  -----------
        parser   1  parse source into ASTs, perform simple desugaring
         namer   2  resolve names, attach symbols to named trees
packageobjects   3  load package objects
         typer   4  the meat and potatoes: type the trees
superaccessors   5  add super accessors in traits and nested classes
       pickler   6  serialize symbol tables
     refchecks   7  reference/override checking, translate nested objects
  selectiveanf   8  
      liftcode   9  reify trees
  selectivecps  10  
       uncurry  11  uncurry, translate function values to anonymous classes
     tailcalls  12  replace tail calls by jumps
    specialize  13  @specialized-driven class and method specialization
 explicitouter  14  this refs to outer pointers, translate patterns
       erasure  15  erase types, add interfaces for traits
      lazyvals  16  allocate bitmaps, translate lazy vals into lazified defs
    lambdalift  17  move nested functions to top level
  constructors  18  move field definitions into constructors
       flatten  19  eliminate inner classes
         mixin  20  mixin composition
       cleanup  21  platform-specific cleanups, generate reflective calls
         icode  22  generate portable intermediate code
       inliner  23  optimization: do inlining
      closelim  24  optimization: eliminate uncalled closures
           dce  25  optimization: eliminate dead code
           jvm  26  generate JVM bytecode
      terminal  27  The last phase in the compiler chain


Uncurry looks to be your candidate here...
--
Kevin Wright

gtalk / msn : kev.lee...@gmail.com
mail: kevin....@scalatechnology.com
vibe / skype: kev.lee.wright
quora: http://www.quora.com/Kevin-Wright
twitter: @thecoda

"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra

Nils Kilden-Pedersen

May 5, 2011, 8:42:58 AM
to Paul Phillips, Ishaaq Chandy, Ceki Gulcu, scala...@googlegroups.com
I was surprised by the argument for making the second assert argument by-name. Probably because I was under the assumption that Scala's assert would be costless at runtime (on a JIT VM) if assertions are disabled, like Java's.

Does Scala's assert carry a cost, then, even if assertions are disabled?

Ceki Gulcu

May 6, 2011, 10:12:19 AM
to scala...@googlegroups.com
On 05.05.2011 10:05, Johannes Rudolph wrote:
> On Wed, May 4, 2011 at 4:41 PM, Jason Zaugg<jza...@gmail.com> wrote:
>> Question for the JVM watchers on the list: will any of the planned JVM
>> improvements help to reduce the code-size overhead of closures?
>
> I'd say yes: MethodHandles should be able to replace most of all
> FunctionX classes and instances. You would lift closures to simple
> methods and use MethodHandles to reference them and close over values.
> I don't know too much about how and in which phases the Scala compiler
> is doing the lifting to anonymous classes right now but I guess that
> changing the compiler to use MethodHandles will require some major
> effort.

If constant expressions could be optimized out and non-constant
expressions output as MethodHandles by scalac, then by-name parameters
could work nicely in the case of logging APIs like SLF4J. However, these
scalac optimizations are not currently available. If SLF4J's scala
offering is widely adopted by the scala community, I fear that in the
absence of said compiler optimizations, by-name parameters will come
back to haunt us in the form of excessively bloated jar files.

I will enter a bug report requesting said scalac optimization. BTW,
thank you all for your informative comments.
--
Ceki

Bill Venners

May 6, 2011, 12:20:55 PM
to Ceki Gulcu, scala...@googlegroups.com
Hi Ceki,

Even if method handles come out in JDK 1.7 or 1.8, the Scala compiler
won't be able to take advantage of them for a good long time, at least
not without a compiler flag, because otherwise using Scala would
require JDK 1.7 or 1.8. Many users of Java don't upgrade for a very
long time, so it is important that Scala require only JDK 1.5 for a good
long time. Therefore, for a good long time, unnecessary use of by-name
parameters will result in the generation of unnecessary class files.

I'm curious how much this matters in practice, because this is a big
area of difference between ScalaTest and specs. One place it shows up
is that ScalaTest's "should" and "must" methods do not take a by-name
parameter, whereas specs' methods do. This allows specs to have a slightly more
concise syntax in a few places, but at the cost of generating loads
more class files. For example in specs matchers you can say:

s.charAt(-1) must throwA [IndexOutOfBoundsException]

This works because must takes its left side by-name (via the
implicit conversion that adds must). In ScalaTest it is slightly more
verbose because must does not take a by-name. You have to say:

evaluating { s.charAt(-1) } must produce [IndexOutOfBoundsException]

The reason I went this slightly more verbose route was to avoid
generating class files except when they are actually needed. The
trouble with the specs approach is that because must takes a by-name,
you also get a class file when you write:

"Hello world" must have size (11)

But in ScalaTest you don't. So for example, if you compile the first
example in Eric's quick start guide for specs2:

import org.specs2.mutable._

class HelloWorldSpec extends Specification {

  "The 'Hello world' string" should {
    "contain 11 characters" in {
      "Hello world" must have size(11)
    }
    "start with 'Hello'" in {
      "Hello world" must startWith("Hello")
    }
    "end with 'world'" in {
      "Hello world" must endWith("world")
    }
  }
}

You get 16 class files:

HelloWorldSpec$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$1$$anonfun$apply$3.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$1.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$10$$anonfun$apply$11.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$10$$anonfun$apply$12$$anonfun$apply$13.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$10$$anonfun$apply$12.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$10.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$14.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$4.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$5$$anonfun$apply$6.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$5$$anonfun$apply$7$$anonfun$apply$8.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$5$$anonfun$apply$7.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$5.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$9.class
HelloWorldSpec$$anonfun$1.class
HelloWorldSpec.class

But if you compile a similar ScalaTest class:

import org.scalatest.WordSpec
import org.scalatest.matchers.MustMatchers

class HelloWorldSpec extends WordSpec with MustMatchers {

  "The 'Hello world' string" should {
    "contain 11 characters" in {
      "Hello world" must have length (11)
    }
    "start with 'Hello'" in {
      "Hello world" must startWith ("Hello")
    }
    "end with 'world'" in {
      "Hello world" must endWith ("world")
    }
  }
}

You get 5 class files:

HelloWorldSpec$$anonfun$1$$anonfun$apply$mcV$sp$1.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$mcV$sp$2.class
HelloWorldSpec$$anonfun$1$$anonfun$apply$mcV$sp$3.class
HelloWorldSpec$$anonfun$1.class
HelloWorldSpec.class

So what I'm wondering is how much this actually matters in practice. I
have always considered that a design flaw in specs matchers, but I
have never heard a single specs user complain about it. It is possible
that specs users didn't realize it was happening because they only got
a few extra class files per compile, like boiled frogs, or it might
simply not matter in practice. I'm curious what people think.
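The design difference can be reduced to a pair of hypothetical mini-matchers (these are not the specs or ScalaTest APIs, and plain methods stand in for the implicit conversions the real libraries use). One takes its left-hand side by-name, so every use wraps that expression in a function object, even a plain string. The other takes it strictly and defers evaluation only inside an explicit block, mirroring the shape of ScalaTest's `evaluating { ... }`.

```scala
import scala.reflect.ClassTag

// Hypothetical mini-matchers illustrating the two designs.
object MatcherSketch {
  // specs-style: the left-hand side is by-name at every call site.
  final class ByName[A](lhs: => A) {
    def mustEqual(expected: A): Boolean = lhs == expected
    def mustThrow[E <: Throwable](implicit ct: ClassTag[E]): Boolean =
      try { lhs; false }
      catch { case t: Throwable if ct.runtimeClass.isInstance(t) => true }
  }
  def byName[A](lhs: => A): ByName[A] = new ByName(lhs)

  // ScalaTest-style: the left-hand side is evaluated before the call...
  final class Strict[A](lhs: A) {
    def mustEqual(expected: A): Boolean = lhs == expected
  }
  def strict[A](lhs: A): Strict[A] = new Strict(lhs)

  // ...and only an explicit block is deferred (one closure where needed).
  final class Evaluating(block: => Any) {
    def mustProduce[E <: Throwable](implicit ct: ClassTag[E]): Boolean =
      try { block; false }
      catch { case t: Throwable if ct.runtimeClass.isInstance(t) => true }
  }
  def evaluating(block: => Any): Evaluating = new Evaluating(block)
}
```

With `byName("Hello world".length)` a function object is created even though nothing needs deferring; with `strict(...)` none is, and the exception test pays for a closure only inside `evaluating { ... }`.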

What I know is that the tests for ScalaTest itself already generate a
very large number of class files, because there's a massive number of
tests in there. As of this morning I count 28,126 class files in
ScalaTest's tests of itself. It takes 25 seconds to put these in a jar
file on my laptop, and 1 minute 52 seconds to delete everything in an
"ant clean". If the number of class files were tripled (i.e., if I
used specs to test ScalaTest) I'm not sure how much longer these build
tasks would take, but I imagine they'd be painfully longer. I'm also
curious how much time the Scala compiler takes to generate these
files, because it takes about 20 minutes to compile the tests. Does
anyone know if any significant time would be spent writing class files
during a compile that generates 28,000 of them?

Bill
----
Bill Venners
Artima, Inc.
http://www.artima.com

Josh Suereth

May 6, 2011, 1:52:36 PM
to Bill Venners, Ceki Gulcu, scala...@googlegroups.com
Hey guys, I'm not sure this was stated already, but for logging there's a very simple reason why by-name parameters can reduce the overhead of calls:

Creating logging strings can be expensive.

In our Java applications at my previous company, you often found this boilerplate around all logging:

if (logger.isLoggingFatal) {
  logger.log(FATAL, <some expensive string computation>)
}

This drastically improved throughput in our core networking stack, which had logging strewn all over it (a proprietary networking layer between agents and the master).

However, in Scala we can do better. With a by-name message parameter, logger.log(FATAL, <some expensive string computation>) constructs an object (an instance of a compiler-generated class) which holds the mechanism for computing the string. If that log level is not enabled, the by-name argument is never evaluated, resulting in faster, more succinct code. The *real* question to ask here is whether HotSpot can optimise both uses equally efficiently.

I know I'd prefer to use the scala by-name logging framework for convenience.  If the overhead is simply a class-file overhead, well... I already have a ton of those.
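A minimal by-name logger showing the effect described above. `SketchLogger` and `SketchLoggerDemo` are hypothetical and are not SLF4J's API or that of any existing Scala wrapper; a counter records whether the expensive string was ever built.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical logger; not SLF4J's or any existing wrapper's API.
final class SketchLogger(var fatalEnabled: Boolean) {
  val lines = ArrayBuffer.empty[String]

  // The message is by-name: it is only computed if the level is enabled.
  def fatal(msg: => String): Unit =
    if (fatalEnabled) lines += msg
}

object SketchLoggerDemo {
  var computations = 0

  // Simulates an expensive message computation.
  def expensiveString(): String = {
    computations += 1
    (1 to 100).map(_ * 2).mkString("[", ",", "]")
  }
}
```

Note that the caller still allocates a small function object on every call, enabled or not; what the by-name parameter saves is the string construction itself, which is the part the explicit `isLoggingFatal` guard was protecting.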

iulian dragos

May 6, 2011, 6:27:22 PM
to Josh Suereth, Bill Venners, Ceki Gulcu, scala...@googlegroups.com
On Fri, May 6, 2011 at 7:52 PM, Josh Suereth <joshua....@gmail.com> wrote:
> Hey guys, i'm not sure this was stated already, but for logging there's a
> very simple reason that by-name parameters can reduce overhead of calls:
> Creating logging strings can be expensive.
> In our java applications at my previous company, you often found this
> boilerplate around all logging:
> if (logger.isLoggingFatal) {
>   logger.log(FATAL, <some expensive string computation>)
> }

The Scala compiler is full of those as well.

Sometimes you can get away with format strings, if the expensive
string computation is in fact '.toString' on some object.

def log(level: Int, fmt: String, args: Any*) {
  if (level > whatever)
    println(fmt.format(args:_*))
}

this would call .toString only when logging is enabled.
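A runnable version of the format-string approach. `FormatLogSketch`, its `threshold` (a hypothetical stand-in for `whatever` above) and the `Expensive` class are illustrative; output is collected in a buffer instead of printed so the behaviour is easy to check.

```scala
import scala.collection.mutable.ArrayBuffer

object FormatLogSketch {
  var threshold = 2 // hypothetical stand-in for `whatever`
  val output = ArrayBuffer.empty[String]

  // Arguments are passed as-is; formatting (and toString) happens
  // only when the level passes the threshold.
  def log(level: Int, fmt: String, args: Any*): Unit =
    if (level > threshold) output += fmt.format(args: _*)
}

// An argument whose toString is assumed to be expensive (here: just counted).
final class Expensive {
  var toStringCalls = 0
  override def toString: String = { toStringCalls += 1; "Expensive(...)" }
}
```

The arguments themselves are still evaluated eagerly at the call site and wrapped in a varargs sequence; only the conversion to a string is deferred, and no closure class is generated.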

iulian

--
"I hate mountains; they hide the scenery."
Alphonse Allais

Josh Suereth

May 6, 2011, 7:42:46 PM
to iulian dragos, Bill Venners, Ceki Gulcu, scala...@googlegroups.com
It seems in scala, with some judicious @inline of final logging methods, you might be able to optimise this particular use case to look as simple as:

logger.log(FATAL, <some expensive string computation>)

but compile down to:

if (logger.isLoggingFatal) {
  logger.log(FATAL, <some expensive string computation>)
}

That is, if inlining occurs first, then the closure can be eliminated because it does not leave the scope.
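The two shapes can be written side by side. `InlineLogger` and `InlineLoggerDemo` are hypothetical; `@inline` is only a hint to the Scala optimizer, and whether the closure in `viaByName` actually disappears depends on the compiler version and flags, which this sketch does not test. It only checks that the by-name form and the hand-expanded guard behave the same.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical logger illustrating "looks like X, compiles down to Y".
final class InlineLogger(var fatalEnabled: Boolean) {
  val lines = ArrayBuffer.empty[String]

  // Source form: a final, @inline-annotated method with a by-name message.
  @inline final def fatal(msg: => String): Unit =
    if (fatalEnabled) lines += msg
}

object InlineLoggerDemo {
  // What the caller writes.
  def viaByName(log: InlineLogger, n: Int): Unit =
    log.fatal("n = " + n)

  // What we would like the call above to become after inlining:
  // the guard moves into the caller and no function object is needed.
  def handExpanded(log: InlineLogger, n: Int): Unit =
    if (log.fatalEnabled) log.lines += ("n = " + n)
}
```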

I really haven't played with experimental closure elimination or inlining enough. Do you think this is possible? Honestly, with logging, the ability to elide it out for performance and bring it back as needed can be *huge*.