Bad performance of for loop

584 views
Skip to first unread message

wolfgang

unread,
May 20, 2011, 11:48:53 AM5/20/11
to scala-user
Hello,

I evaluated Scala with a numerical program (basically calculating
inner
products of looong Double vectors) and found its performance
significantly
slower than that of the equivalent Java program (the elapsed time of
the
scala program was 4-5 times the elapsed time of the Java program).

So after some research (decompiling the generated class files back to
Java) I
found that the bad performance was caused by the for loops.
Substituting

for (i <- 0 until 1000000) {
//... s += v(i) * w(i)
}

with

var i = 0
while (i < 1000000) {
//...
i += 1
}

made the Scala program as fast as the Java program!

But using "while" instead of "for" is not really nice (the "i" is in
the
outer scope, it is easy to forget the "i += 1" or put it outside of
the
loop). So I tried something like:

def loop(from: Int, to: Int)(op: Int => Unit): Unit = {
var i = from
while (i < to) {
op(i)
i += 1
}
}

loop(0, 1000000) { i =>
{
//..
}
}

but the performance was bad again (maybe for another reason).

Is there anything like a "fast" for loop in Scala?
Are there any alternatives?

Regards,

Wolfgang Seifert

PS: The numerical program stems from the finite-difference-
approximation of
the heat equation and (as usual in numerical mathematics) computing
speed is crtitical.
The Java program (and Scala program with while) is in fact as fast as
the equivalent C/C++ program.

Jason Zaugg

unread,
May 20, 2011, 12:09:28 PM5/20/11
to wolfgang, scala-user
On Fri, May 20, 2011 at 5:48 PM, wolfgang <wolf.s...@web.de> wrote:
> Is there anything like a "fast" for loop in Scala?
> Are there any alternatives?

There are a bunch of old mailing list threads on the subject;
"Optimizing Simple Fors" [1] features an in depth (don't miss the
second page) discussion, with code, benchmarks, and compiler plugins,
between a range of for-loop luminaries.

If you want to live on the edge a bit, you can try ScalaCL [2], a
compiler plugin with both general purpose and GPU optimizations. I've
working on similar problems, and where necessary I just use while
loops.

-jason

[1] http://scala-programming-language.1934581.n4.nabble.com/optimizing-simple-fors-td2545502.html
[2] http://code.google.com/p/scalacl/

Lex

unread,
May 20, 2011, 12:56:59 PM5/20/11
to wolfgang, scala-user
There seems to be a strong reluctance to officially optimize
for-loops. I was holding my breath for almost 2 years now without any
progress. So if you are going to choose Scala then get ready to use
only while loops.

Ismael Juma

unread,
May 20, 2011, 1:05:59 PM5/20/11
to Lex, wolfgang, scala-user
On Fri, May 20, 2011 at 5:56 PM, Lex <lex...@gmail.com> wrote:
> There seems to be a strong reluctance to officially optimize
> for-loops. I was holding my breath for almost 2 years now without any
> progress. So if you are going to choose Scala then get ready to use
> only while loops.

Or use the compiler plugin that does that for you. I agree that we
should have it as part of the distribution (it's a bit funny that so
much work is being done to take advantage of multiple cores when we do
a bad job of taking advantage of a single core), but until then:

http://code.google.com/p/scalacl/wiki/Usage
http://code.google.com/p/scalacl/wiki/ScalaCLPlugin

Best,
Ismael

√iktor Ҡlang

unread,
May 20, 2011, 2:34:33 PM5/20/11
to Lex, wolfgang, scala-user
On Fri, May 20, 2011 at 6:56 PM, Lex <lex...@gmail.com> wrote:
There seems to be a strong reluctance to officially optimize
for-loops. I was holding my breath for almost 2 years now without any
progress. So if you are going to choose Scala then get ready to use
only while loops.


To be honest, there are no "for loops", there are only for-comprehensions.
 



--
Viktor Klang

Akka Tech Lead
Typesafe - Enterprise-Grade Scala from the Experts

Twitter: @viktorklang

Naftoli Gugenheim

unread,
May 20, 2011, 3:26:59 PM5/20/11
to Lex, wolfgang, scala-user
I'm not sure if it was a reluctance to optimize them or a reluctance to take resources away from other work.
In other words, if a patch was submitted would they take it?

Ruediger Keller

unread,
May 20, 2011, 3:53:48 PM5/20/11
to Naftoli Gugenheim, scala-user
2011/5/20 Naftoli Gugenheim <nafto...@gmail.com>:

> I'm not sure if it was a reluctance to optimize them or a reluctance to take
> resources away from other work.
> In other words, if a patch was submitted would they take it?

I don't think so. IIRC Paul already produced a patch that compiled for
comprehensions to while loops for the common cases. But it was not
accepted.

Although I don't remember the details, so maybe there is a chance for
acceptance of such a patch.

Regards,
Rüdiger

Daniel Sobral

unread,
May 20, 2011, 6:06:23 PM5/20/11
to Naftoli Gugenheim, Lex, wolfgang, scala-user
On Fri, May 20, 2011 at 16:26, Naftoli Gugenheim <nafto...@gmail.com> wrote:
> I'm not sure if it was a reluctance to optimize them or a reluctance to take
> resources away from other work.
> In other words, if a patch was submitted would they take it?

By optimizing for loops people really mean optimizing an inlining of
range's foreach. Range's foreach is pretty fast already, but it
invokes a Function1, which is what causes the whole trouble. Ideally,
JVM would optimize _that_, but it doesn't.

To make this more clear, consider this:

for { i <- 1 to 10 } println(i)
val range = 1 to 10; for { i <- range } println(i)
val range = 1 to 10; range.foreach(i => println(i))
val range = 1 to 10; val function = (i: Int) => println(i);
range.foreach(function)

These lines are pretty much equivalent, so what needs optimizing is
"range.foreach(function)".

Now, the concern that has been presented before is that Scala
shouldn't make Range special. An optimization should be general, and
not tied to a library class.

I can see at least two reasons for that. First, Scala should be
extensible through libraries, and by special-casing some classes one
decreases this. For instance, no one would be able to write a
Range-equivalent that performed the same. Second, range loops are very
imperative (particularly with foreach) -- they are mostly used to
handle indices into stuff. A functional style should rely instead on
mapping, zipping, etc. An optimization on range's foreach would _not_
bring any benefit to the functional style, and reduce the incentive to
optimize that.

--
Daniel C. Sobral

I travel to the future all the time.

Naftoli Gugenheim

unread,
May 20, 2011, 6:30:00 PM5/20/11
to Daniel Sobral, Lex, wolfgang, scala-user


On Fri, May 20, 2011 at 6:06 PM, Daniel Sobral <dcso...@gmail.com> wrote:
First, Scala should be
extensible through libraries, and by special-casing some classes one
decreases this. For instance, no one would be able to write a
Range-equivalent that performed the same.

I disagree. We're not talking about decreasing the performance of an extension library, so the extensibility of scala remains the same. Optimizing something built in may give that built-in an edge over something not built in, but it's not a competition. The non-built-in retains the same flexibility as without the optimization.

Rex Kerr

unread,
May 20, 2011, 7:51:43 PM5/20/11
to Daniel Sobral, scala-user
I think you've done a pretty good job of presenting the current reasoning, Daniel, but I think there's another argument to make.


On Fri, May 20, 2011 at 6:06 PM, Daniel Sobral <dcso...@gmail.com> wrote:
On Fri, May 20, 2011 at 16:26, Naftoli Gugenheim <nafto...@gmail.com> wrote:
> I'm not sure if it was a reluctance to optimize them or a reluctance to take
> resources away from other work.
> In other words, if a patch was submitted would they take it?

By optimizing for loops people really mean optimizing an inlining of
range's foreach.

Not if "people" includes me.  I mean optimizing an inlining of Array foreach also.  One can of course use range to index into the array if one must, but that moves from a very inferior solution to Java to a merely somewhat inferior solution for arrays.

Any real solution will do both.  And, as far as I know, any real solution _only_ needs to do both since nothing else is fast enough to matter (i.e. you want to specialize the collection, but that's enough) or is already generic.  The problem with array and range access is that specialization is not enough to avoid performance penalties that range from 30% to 3x (depending on what the JVM manages to inline).
 
Range's foreach is pretty fast already, but it
invokes a Function1, which is what causes the whole trouble. Ideally,
JVM would optimize _that_, but it doesn't.

The Sun JVM does somewhat.  JRockit is considerably slower (~3x IIRC).
 

To make this more clear, consider this:

for { i <- 1 to 10 } println(i)
val range = 1 to 10; for { i <- range } println(i)
val range = 1 to 10; range.foreach(i => println(i))
val range = 1 to 10; val function = (i: Int) => println(i);
range.foreach(function)

These lines are pretty much equivalent, so what needs optimizing is
"range.foreach(function)".

Now, the concern that has been presented before is that Scala
shouldn't make Range special. An optimization should be general, and
not tied to a library class.

Agreed, in that an optimization should handle every relevant case.  Optimizations need not worry about cases where that particular optimization would be irrelevant anyway (e.g. if we can't cover seek-and-read in a file, it doesn't matter).
 
I can see at least two reasons for that. First, Scala should be
extensible through libraries, and by special-casing some classes one
decreases this. For instance, no one would be able to write a
Range-equivalent that performed the same.

Granted, but with a sufficiently potent Range, there would be a vanishingly small use case for a thing that was just like Range except wasn't Range.
 
Second, range loops are very
imperative (particularly with foreach) -- they are mostly used to
handle indices into stuff. A functional style should rely instead on
mapping, zipping, etc. An optimization on range's foreach would _not_
bring any benefit to the functional style, and reduce the incentive to
optimize that.

Which is exactly why it needs to optimize Array also.  If you cover Range and Array, everything else you're likely to run into is already an object, and that provides enough overhead that the additional overhead of a specialized Function1 call is minimal.

  --Rex

Lex

unread,
May 20, 2011, 8:00:32 PM5/20/11
to Naftoli Gugenheim, Daniel Sobral, wolfgang, scala-user
I think the real reason for the lack of for-loop optimization is that
Scala tries very hard to sell functional style. Optimizing for-loops
would be a slippery slope towards a situation where imperative
approach is significantly faster and just as easy to use. That would
encourage imperative style rather than functional.

It makes absolutely no sense to me that this simple case hasn't been
optimized yet. And if something doesn't make sense, it usually
involves politics.

Kevin Wright

unread,
May 20, 2011, 8:06:19 PM5/20/11
to Lex, Naftoli Gugenheim, Daniel Sobral, wolfgang, scala-user
It's a cost/benefit thing, the extra maintainence that this specific optimisation would add just doesn't justify it.  

Especially not when the while loop is a valid workaround and we'll be gaining stuff like virtualised Scala and fixnums in the JVM at some point in the future, which means the optimisation probably wouldn't have too long a lifespan anyway.

There are more profitable fish to be fried here.
--
Kevin Wright

gtalk / msn : kev.lee...@gmail.com
mail: kevin....@scalatechnology.com
vibe / skype: kev.lee.wright
quora: http://www.quora.com/Kevin-Wright
twitter: @thecoda

"My point today is that, if we wish to count lines of code, we should not regard them as "lines produced" but as "lines spent": the current conventional wisdom is so foolish as to book that count on the wrong side of the ledger" ~ Dijkstra

Rex Kerr

unread,
May 20, 2011, 9:44:58 PM5/20/11
to Kevin Wright, scala-user
On Fri, May 20, 2011 at 8:06 PM, Kevin Wright <kev.lee...@gmail.com> wrote:
It's a cost/benefit thing, the extra maintainence that this specific optimisation would add just doesn't justify it.

Especially not when the while loop is a valid workaround and we'll be gaining stuff like virtualised Scala

The overloadable while loops seem...well...I'd like to see how they're implemented to be as fast as actual while loops (and immune to problems with multicasting and object creation and so on).  All the easy implementations seem to me to either require underlying constructs that are not lightweight, or are a poor-man's version of lisp-style macros--but I don't know what Tiark is actually doing, so it could be great in ways that I haven't realized.  The inline methods _would_ be great (it's shocking how much slower a max b and c.isNan are than math.max(a,b) and Double.isNaN(c)), but they're not really applicable here.
 
and fixnums in the JVM at some point in the future

If specialization is already not fast enough, how could fixnums (which have a little bit of overhead above regular primitives) solve the problem?  That's a way to spend only a small penalty instead of a large one with non-specialized code.
 
, which means the optimisation probably wouldn't have too long a lifespan anyway.

There are more profitable fish to be fried here.

For most people, probably so.  Since I spend about 30% of my time coding in Scala writing while loops required for adequate performance, it's a pretty valuable fish to me.  (I'd probably be twice as fast to code for loops, if you count both the reduced boilerplate and the lower rate of errors.  In Java I'm able to code these things significantly faster--but Scala still wins because I'm so much slower in Java with the other 70%.)

Unfortunately, since the speed is mission-critical for me, and the ScalaCL optimization plugin is not officially supported, I have not felt comfortable doing anything but writing the while loops by hand.

Anyway, if it's a matter of maintenance, I suggest not maintaining something else.  For example, I think we can pretty safely throw out -optimise--just remove it, or don't maintain any of it--since this conversation almost never happens:

  A: "Oh, wow, my code is so slow!"
  B: "Turn on -optimise!"
  A: "It's so much faster now, thanks!"

and this conversation happens every month or so:

  A: "Oh, wow, my for loop is so slow!"
  B: "Use a while loop!"
  A: "It's so much faster now!  Thanks, I guess, but it's so clunky!"

Since even Dalvik now has a passable JIT compiler which is getting better all the time, I think the for loop optimization would save way more time (both coder and computation) than the clever things that both the JIT and -optimise can do.

  --Rex

wolfgang

unread,
May 21, 2011, 4:30:02 AM5/21/11
to scala-user
Following your suggestions I tried the ScalaCL compiler.
First it crashed, because I was using Scala 2.9.0.
So I went back to Scala 2.8.1 (had to change back some code) and it
worked.
And it is fast (decompiling shows me a Java for loop now).
But because ScalaCL is not part of the scala distribution ScalaCL is
not an option for me
(I do not want to have the same kind of problems - see crash above -
with every scala upgrade).

Following the discussion I see that there is really a need for a
"fast" for loop (as part of the scala distribution)
and only some "philosophical" arguments against it.
This is not improving the acceptance of Scala, because people either
have to use while loops (slow programming)
or use for loops (and have slow programs).

So I will stick to while loops (or Java?).

Wolfgang



On 21 Mai, 03:44, Rex Kerr <icho...@gmail.com> wrote:

Ruediger Keller

unread,
May 21, 2011, 7:12:45 AM5/21/11
to Rex Kerr, Kevin Wright, scala-user
I recall reading several times on the lists that using -optimize had
no positive effect at all, whereas I don't recall ever reading about
it having any positive effect. So what type of code is actually
optimized by it to a significant degree?

I just tried it a few times on a pet project of mine where I was doing
3D graphics with LWJGL/OpenGL which was *quite* slow at the beginning.
Using -optimize yielded no measurable improvement at all. Rewriting
all the nice for loops as ugly while loops made a HUGE difference. I
don't remember the exact numbers, but a factor of 10 is no
exaggeration, perhaps even more.

And waiting for more optimizations in the JVM isn't really an option.
Nobody knows if there will ever be optimizations that do this for us
and even if, how long will it take until we get them?

Regards,
Rüdiger


2011/5/21 Rex Kerr <ich...@gmail.com>:

Tiark Rompf

unread,
May 21, 2011, 10:00:03 AM5/21/11
to Rex Kerr, Kevin Wright, scala-user
With -optimize, scalac should compile simple for comprehensions to the equivalent of a while loop. If it does not it's a bug (either in the library or in the compiler) and it needs fixing. So if there is indeed a bug, gather solid performance numbers, submit a ticket and continue lobbying to get it fixed.

On May 21, 2011, at 3:44 AM, Rex Kerr wrote:

On Fri, May 20, 2011 at 8:06 PM, Kevin Wright <kev.lee...@gmail.com> wrote:
Especially not when the while loop is a valid workaround and we'll be gaining stuff like virtualised Scala

The overloadable while loops seem...well...I'd like to see how they're implemented to be as fast as actual while loops (and immune to problems with multicasting and object creation and so on).  All the easy implementations seem to me to either require underlying constructs that are not lightweight, or are a poor-man's version of lisp-style macros--but I don't know what Tiark is actually doing, so it could be great in ways that I haven't realized.  


We like to think so ;-) Totally off-topic but here is a little teaser of the things under development. Consider the following code and note how many times it creates or traverses arrays of size n (fill, map, sum):

def square(x: Rep[Double]) = x*x
def mean(xs: Rep[Array[Double]]) = xs.sum / xs.length
def variance(xs: Rep[Array[Double]]) = xs.map(square) / xs.length - square(mean(xs))

val fun = compile { n: Rep[Int] =>

  val array1 = Array.fill(n) { i => 1 }

  val array2 = Array.fill(n) { i => 2*i }

  val array3 = Array.fill(n) { i => array1(i) + array2(i) }

  val m = mean(array3)
  val v = variance(array3)

  println(m)
  println(v)
}

fun(100)

The magic piece here is the 'compile' function. It takes a closure and at runtime generates, compiles and loads, a heavily optimized version of the code, inlining function calls and fusing loops and traversals. In the case above the generated code is this:

class Fun$1 extends ((Int)=>(Unit)) {
  def apply(x0: Int): Unit = {
    var x47 = 0
    var x51 = 0
    var x11 = 0
    while (x11 < x0) {
      val x44 = 2.0*x11
      val x45 = 1.0+x44
      val x50 = x45*x45
      x47 += x45
      x51 += x50
      x11 += 1
    }
    val x48 = x47/x0
    val x49 = println(x48)
    val x52 = x51/x0
    val x53 = x48*x48
    val x54 = x52-x53
    val x55 = println(x54)
    x27
     }
}

Note how all the intermediate arrays are optimized away and there is only a single loop to compute the mean and variance at the same time.

The caveat is that you cannot pass *any* code to 'compile' but only things that are 'virtualized' i.e. lifted to the domain of Rep[_] types. Of course the goal is to lift a reasonable share of the standard library but the most interesting use is for DSLs, which can hook into the compilation process and add both custom expression types and optimizations while maintaining the 'just a library' impression.

An alternative to calling 'compile' and then directly executing the result is to generate the code, save it away and use it later. This approach may be more suited for some DSLs, where you have a well delineated 'DSL program'. Plus, if you generate and run the code with Delite, it will also be parallelized and executed across multi-core CPUs and GPUs.

As I said this is all ongoing development work and it is way to early to make any commitments about releases or availability.

cheers,
- Tiark

some pointers:

Ismael Juma

unread,
May 21, 2011, 10:06:13 AM5/21/11
to Tiark Rompf, Rex Kerr, Kevin Wright, scala-user
Hi Tiark,

On Sat, May 21, 2011 at 3:00 PM, Tiark Rompf <tiark...@epfl.ch> wrote:
> Note how all the intermediate arrays are optimized away and there is only a
> single loop to compute the mean and variance at the same time.

Nice!

> As I said this is all ongoing development work and it is way to early to
> make any commitments about releases or availability.

I wish I could have that yesterday. ;)

Best,
Ismael

Ruediger Keller

unread,
May 21, 2011, 10:39:55 AM5/21/11
to Tiark Rompf, scala-user
This looks pretty awsome!

I guess this is the research that was recently funded by the European
Union, isn't it?

Regards,
Rüdiger


2011/5/21 Tiark Rompf <tiark...@epfl.ch>:

Kevin Wright

unread,
May 21, 2011, 11:14:42 AM5/21/11
to Ruediger Keller, Tiark Rompf, scala-user
On 21 May 2011 15:39, Ruediger Keller <ruedige...@rk42.de> wrote:
This looks pretty awsome!

I guess this is the research that was recently funded by the European
Union, isn't it?

Regards,
Rüdiger



Yes, and before that it was done in collaboration with Stanford PPL.
This stuff predates the euro grant by quite some time...

Daniel Sobral

unread,
May 21, 2011, 11:53:53 AM5/21/11
to Ruediger Keller, Rex Kerr, Kevin Wright, scala-user
I have just tried -optimize with Scala 2.9.0 on a simple foreach loop
on a range, and it produced a remarkably different result than Scala
2.8.1.

--

Kevin Wright

unread,
May 21, 2011, 11:55:50 AM5/21/11
to Daniel Sobral, Ruediger Keller, Rex Kerr, scala-user
Different how?

We want numbers, or it didn't happen!


HamsterofDeath

unread,
May 21, 2011, 12:39:59 PM5/21/11
to scala...@googlegroups.com
we want the benchmark sourcecode.

Ismael Juma

unread,
May 21, 2011, 12:56:37 PM5/21/11
to HamsterofDeath, scala...@googlegroups.com
On Sat, May 21, 2011 at 5:39 PM, HamsterofDeath <h-s...@gmx.de> wrote:
> we want the benchmark sourcecode.

Daniel wrote a blog:

http://dcsobral.blogspot.com/2011/05/scala-29-optimizes-for-comprehensions.html

Best,
Ismael

Ruediger Keller

unread,
May 21, 2011, 1:10:01 PM5/21/11
to Daniel Sobral, scala-user
Oh nice find! I didn't realize this at all.

So, who's going to give us some meaty performance numbers? ;-)

Regards,
Rüdiger

2011/5/21 Daniel Sobral <dcso...@gmail.com>:

Daniel Sobral

unread,
May 21, 2011, 9:02:21 PM5/21/11
to Ruediger Keller, scala-user
I vote for Rex Kerr, or, failing that, Ismael Juma. They are far
superior benchmarkers than I.

Rex Kerr

unread,
May 22, 2011, 9:44:11 AM5/22/11
to Tiark Rompf, scala-user
On Sat, May 21, 2011 at 10:00 AM, Tiark Rompf <tiark...@epfl.ch> wrote:
With -optimize, scalac should compile simple for comprehensions to the equivalent of a while loop.

I think you are placing far too much faith in "equivalent".
 
If it does not it's a bug (either in the library or in the compiler) and it needs fixing. So if there is indeed a bug, gather solid performance numbers, submit a ticket and continue lobbying to get it fixed.

I gathered performance numbers months ago:

On Sun, Sep 19, 2010 at 7:04 PM, Rex Kerr <ich...@gmail.com> wrote:
<snip>

Take a look at these benchmark times (after my signature) comparing loops that are rather cruel to the JVM--the comparison is between while loops with for loops using range ("Limit") and the specialized utility loop function "cfor" (which emulates the C for loop).  Code is attached.  The bottom line is that while loops are fast, whether they are a single loop (While1) or nested multiple times (While3, While5) or are a mishmash of a bunch of different loops (While?).  Relying upon the JVM to optimize generic specialized code is occasionally--well, just look at the timings, especially the "Limit1" and "Cfor1" times in the second block.

That performance is a great improvement over unspecialized stuff for, say, web development or database or somesuch.  For serious numerical work, it's pretty intolerable to get 10x slowdowns because of the particular execution path your code takes de-optimizing a critical loop.

For loops in the special case of while loops _should_ be as fast as they are, but given how common the common cases are, why not write custom code for them?

  --Rex

P.S. Paul--y'might want to try this out on your for loop also.

P.P.S. Here are the timings:

-1243309312 While1 Elapsed: 0.305 s
-1243309312 While3 Elapsed: 0.319 s
-1243309312 While5 Elapsed: 0.438 s
-17609343 While? Elapsed: 1.163 s
-1243309312 Limit1 Elapsed: 0.318 s
-1243309312 Limit3 Elapsed: 0.512 s
-1243309312 Limit5 Elapsed: 1.279 s
-17609343 Limit? Elapsed: 2.603 s
-1243309312  Cfor1 Elapsed: 0.323 s
-1243309312  Cfor3 Elapsed: 0.612 s
-1243309312  Cfor5 Elapsed: 0.883 s
-17609343 Cfor?? Elapsed: 5.648 s

-1243309312 While1 Elapsed: 0.296 s
-1243309312 While3 Elapsed: 0.315 s
-1243309312 While5 Elapsed: 0.435 s
-17609343 While? Elapsed: 1.146 s
-1243309312 Limit1 Elapsed: 3.976 s
-1243309312 Limit3 Elapsed: 0.348 s
-1243309312 Limit5 Elapsed: 1.238 s
-17609343 Limit? Elapsed: 2.695 s
-1243309312  Cfor1 Elapsed: 9.792 s
-1243309312  Cfor3 Elapsed: 0.329 s
-1243309312  Cfor5 Elapsed: 0.855 s
-17609343 Cfor?? Elapsed: 5.608 s

-1243309312 While1 Elapsed: 0.296 s
-1243309312 While3 Elapsed: 0.302 s
-1243309312 While5 Elapsed: 0.314 s
-17609343 While? Elapsed: 1.142 s
-1243309312 Limit1 Elapsed: 3.966 s
-1243309312 Limit3 Elapsed: 0.339 s
-1243309312 Limit5 Elapsed: 1.216 s
-17609343 Limit? Elapsed: 2.689 s
-1243309312  Cfor1 Elapsed: 9.774 s
-1243309312  Cfor3 Elapsed: 0.329 s
-1243309312  Cfor5 Elapsed: 0.826 s
-17609343 Cfor?? Elapsed: 5.606 s

-1243309312 While1 Elapsed: 0.296 s
-1243309312 While3 Elapsed: 0.301 s
-1243309312 While5 Elapsed: 0.316 s
-17609343 While? Elapsed: 1.140 s
-1243309312 Limit1 Elapsed: 3.963 s
-1243309312 Limit3 Elapsed: 0.340 s
-1243309312 Limit5 Elapsed: 1.193 s
-17609343 Limit? Elapsed: 2.683 s
-1243309312  Cfor1 Elapsed: 9.783 s
-1243309312  Cfor3 Elapsed: 0.342 s
-1243309312  Cfor5 Elapsed: 0.837 s
-17609343 Cfor?? Elapsed: 5.628 s


and with 2.9.0.final, things have only gotten _worse_ (here, after warmup):

// These are raw while loops
-1243309312 While1 Elapsed: 0.314 s
-1243309312 While3 Elapsed: 0.301 s
-1243309312 While5 Elapsed: 0.315 s
-17609343 While? Elapsed: 1.153 s

// These are for loops on ranges
-1243309312 Limit1 Elapsed: 0.599 s
-1243309312 Limit3 Elapsed: 1.186 s
-1243309312 Limit5 Elapsed: 2.837 s
-17609343 Limit? Elapsed: 1.530 s

// This is a C-style for built with anonymous functions
-1243309312  Cfor1 Elapsed: 9.796 s
-1243309312  Cfor3 Elapsed: 0.322 s
-1243309312  Cfor5 Elapsed: 0.451 s
-17609343 Cfor?? Elapsed: 5.614 s

The code is attached, again.  The test is called "Cruel" because although it is built out of "simple" for loops, it uses them nested in various ways, which is common in real code but may not be in microbenchmarks.

Bottom line is that for loops are 50% to 9x worse.  -optimise _does_ speed up "Limit?" by a factor of 2, which is nice, but it slows Limit3 down by 30%, so it's a mixed bag.  The anonymous functions would be nearly as good as while loops if only the JVM could keep all the different calls straight and optimize them separately.  The strangely gigantic timings come from the JVM getting paranoid.

Anyway, "compile" looks awesome!  But right now, optimised for loops are not competitive with while loops.

  --Rex

Rex Kerr

unread,
May 22, 2011, 9:55:38 AM5/22/11
to scala-user
Forgot to attach the file here, but I did create a ticket with the benchmark file attached:

  https://issues.scala-lang.org/browse/SI-4633

--Rex

On Sun, May 22, 2011 at 9:44 AM, Rex Kerr <ich...@gmail.com> wrote:
<a bunch of stuff, including a promise of benchmarks of for loops>

Lex

unread,
May 22, 2011, 5:09:12 PM5/22/11
to Rex Kerr, scala-user
+1

I fear that unless enough people request this optimization then
nothing will be done.
So follow this link: https://issues.scala-lang.org/browse/SI-4633,
login and click "Vote".

Piotr Kołaczkowski

unread,
May 23, 2011, 4:42:13 AM5/23/11
to scala...@googlegroups.com, Rex Kerr, scala-user
+1.

BTW: do you know where to vote for JVM megamorphic inlining improvements
by code cloning, as described by Cliff here:
http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem
?

This is what Scala needs very badly.

Regards,
Piotr

W dniu 2011-05-22 23:09, Lex pisze:

Ismael Juma

unread,
May 25, 2011, 1:10:13 PM5/25/11
to wolfgang, scala-user
On Sat, May 21, 2011 at 9:30 AM, wolfgang <wolf.s...@web.de> wrote:
> Following your suggestions I tried the ScalaCL compiler.
> First it crashed, because I was using Scala 2.9.0.

For what is worth, the author released a version compatible with 2.9.0 today:

https://twitter.com/#!/ochafik/status/73435122782838784

Best,
Ismael

Reply all
Reply to author
Forward
0 new messages