There are a bunch of old mailing list threads on the subject;
"Optimizing Simple Fors" [1] features an in depth (don't miss the
second page) discussion, with code, benchmarks, and compiler plugins,
between a range of for-loop luminaries.
If you want to live on the edge a bit, you can try ScalaCL [2], a
compiler plugin with both general purpose and GPU optimizations. I've
working on similar problems, and where necessary I just use while
loops.
-jason
[1] http://scala-programming-language.1934581.n4.nabble.com/optimizing-simple-fors-td2545502.html
[2] http://code.google.com/p/scalacl/
Or use the compiler plugin that does that for you. I agree that we
should have it as part of the distribution (it's a bit funny that so
much work is being done to take advantage of multiple cores when we do
a bad job of taking advantage of a single core), but until then:
http://code.google.com/p/scalacl/wiki/Usage
http://code.google.com/p/scalacl/wiki/ScalaCLPlugin
Best,
Ismael
There seems to be a strong reluctance to officially optimize
for-loops. I was holding my breath for almost 2 years now without any
progress. So if you are going to choose Scala then get ready to use
only while loops.
I don't think so. IIRC Paul already produced a patch that compiled for
comprehensions to while loops for the common cases. But it was not
accepted.
Although I don't remember the details, so maybe there is a chance for
acceptance of such a patch.
Regards,
Rüdiger
By optimizing for loops people really mean optimizing an inlining of
range's foreach. Range's foreach is pretty fast already, but it
invokes a Function1, which is what causes the whole trouble. Ideally,
JVM would optimize _that_, but it doesn't.
To make this more clear, consider this:
for { i <- 1 to 10 } println(i)
val range = 1 to 10; for { i <- range } println(i)
val range = 1 to 10; range.foreach(i => println(i))
val range = 1 to 10; val function = (i: Int) => println(i);
range.foreach(function)
These lines are pretty much equivalent, so what needs optimizing is
"range.foreach(function)".
Now, the concern that has been presented before is that Scala
shouldn't make Range special. An optimization should be general, and
not tied to a library class.
I can see at least two reasons for that. First, Scala should be
extensible through libraries, and by special-casing some classes one
decreases this. For instance, no one would be able to write a
Range-equivalent that performed the same. Second, range loops are very
imperative (particularly with foreach) -- they are mostly used to
handle indices into stuff. A functional style should rely instead on
mapping, zipping, etc. An optimization on range's foreach would _not_
bring any benefit to the functional style, and reduce the incentive to
optimize that.
--
Daniel C. Sobral
I travel to the future all the time.
First, Scala should be
extensible through libraries, and by special-casing some classes one
decreases this. For instance, no one would be able to write a
Range-equivalent that performed the same.
On Fri, May 20, 2011 at 16:26, Naftoli Gugenheim <nafto...@gmail.com> wrote:By optimizing for loops people really mean optimizing an inlining of
> I'm not sure if it was a reluctance to optimize them or a reluctance to take
> resources away from other work.
> In other words, if a patch was submitted would they take it?
range's foreach.
Range's foreach is pretty fast already, but it
invokes a Function1, which is what causes the whole trouble. Ideally,
JVM would optimize _that_, but it doesn't.
To make this more clear, consider this:
for { i <- 1 to 10 } println(i)
val range = 1 to 10; for { i <- range } println(i)
val range = 1 to 10; range.foreach(i => println(i))
val range = 1 to 10; val function = (i: Int) => println(i);
range.foreach(function)
These lines are pretty much equivalent, so what needs optimizing is
"range.foreach(function)".
Now, the concern that has been presented before is that Scala
shouldn't make Range special. An optimization should be general, and
not tied to a library class.
I can see at least two reasons for that. First, Scala should be
extensible through libraries, and by special-casing some classes one
decreases this. For instance, no one would be able to write a
Range-equivalent that performed the same.
Second, range loops are very
imperative (particularly with foreach) -- they are mostly used to
handle indices into stuff. A functional style should rely instead on
mapping, zipping, etc. An optimization on range's foreach would _not_
bring any benefit to the functional style, and reduce the incentive to
optimize that.
It makes absolutely no sense to me that this simple case hasn't been
optimized yet. And if something doesn't make sense, it usually
involves politics.
It's a cost/benefit thing, the extra maintainence that this specific optimisation would add just doesn't justify it.Especially not when the while loop is a valid workaround and we'll be gaining stuff like virtualised Scala
and fixnums in the JVM at some point in the future
, which means the optimisation probably wouldn't have too long a lifespan anyway.There are more profitable fish to be fried here.
I just tried it a few times on a pet project of mine where I was doing
3D graphics with LWJGL/OpenGL which was *quite* slow at the beginning.
Using -optimize yielded no measurable improvement at all. Rewriting
all the nice for loops as ugly while loops made a HUGE difference. I
don't remember the exact numbers, but a factor of 10 is no
exaggeration, perhaps even more.
And waiting for more optimizations in the JVM isn't really an option.
Nobody knows if there will ever be optimizations that do this for us
and even if, how long will it take until we get them?
Regards,
Rüdiger
2011/5/21 Rex Kerr <ich...@gmail.com>:
On Fri, May 20, 2011 at 8:06 PM, Kevin Wright <kev.lee...@gmail.com> wrote:Especially not when the while loop is a valid workaround and we'll be gaining stuff like virtualised Scala
The overloadable while loops seem...well...I'd like to see how they're implemented to be as fast as actual while loops (and immune to problems with multicasting and object creation and so on). All the easy implementations seem to me to either require underlying constructs that are not lightweight, or are a poor-man's version of lisp-style macros--but I don't know what Tiark is actually doing, so it could be great in ways that I haven't realized.
On Sat, May 21, 2011 at 3:00 PM, Tiark Rompf <tiark...@epfl.ch> wrote:
> Note how all the intermediate arrays are optimized away and there is only a
> single loop to compute the mean and variance at the same time.
Nice!
> As I said this is all ongoing development work and it is way to early to
> make any commitments about releases or availability.
I wish I could have that yesterday. ;)
Best,
Ismael
I guess this is the research that was recently funded by the European
Union, isn't it?
Regards,
Rüdiger
2011/5/21 Tiark Rompf <tiark...@epfl.ch>:
This looks pretty awsome!
I guess this is the research that was recently funded by the European
Union, isn't it?
Regards,
Rüdiger
--
Daniel wrote a blog:
http://dcsobral.blogspot.com/2011/05/scala-29-optimizes-for-comprehensions.html
Best,
Ismael
So, who's going to give us some meaty performance numbers? ;-)
Regards,
Rüdiger
2011/5/21 Daniel Sobral <dcso...@gmail.com>:
With -optimize, scalac should compile simple for comprehensions to the equivalent of a while loop.
If it does not it's a bug (either in the library or in the compiler) and it needs fixing. So if there is indeed a bug, gather solid performance numbers, submit a ticket and continue lobbying to get it fixed.
<snip>
Take a look at these benchmark times (after my signature) comparing loops that are rather cruel to the JVM--the comparison is between while loops with for loops using range ("Limit") and the specialized utility loop function "cfor" (which emulates the C for loop). Code is attached. The bottom line is that while loops are fast, whether they are a single loop (While1) or nested multiple times (While3, While5) or are a mishmash of a bunch of different loops (While?). Relying upon the JVM to optimize generic specialized code is occasionally--well, just look at the timings, especially the "Limit1" and "Cfor1" times in the second block.
That performance is a great improvement over unspecialized stuff for, say, web development or database or somesuch. For serious numerical work, it's pretty intolerable to get 10x slowdowns because of the particular execution path your code takes de-optimizing a critical loop.
For loops in the special case of while loops _should_ be as fast as they are, but given how common the common cases are, why not write custom code for them?
--Rex
P.S. Paul--y'might want to try this out on your for loop also.
P.P.S. Here are the timings:
-1243309312 While1 Elapsed: 0.305 s
-1243309312 While3 Elapsed: 0.319 s
-1243309312 While5 Elapsed: 0.438 s
-17609343 While? Elapsed: 1.163 s
-1243309312 Limit1 Elapsed: 0.318 s
-1243309312 Limit3 Elapsed: 0.512 s
-1243309312 Limit5 Elapsed: 1.279 s
-17609343 Limit? Elapsed: 2.603 s
-1243309312 Cfor1 Elapsed: 0.323 s
-1243309312 Cfor3 Elapsed: 0.612 s
-1243309312 Cfor5 Elapsed: 0.883 s
-17609343 Cfor?? Elapsed: 5.648 s
-1243309312 While1 Elapsed: 0.296 s
-1243309312 While3 Elapsed: 0.315 s
-1243309312 While5 Elapsed: 0.435 s
-17609343 While? Elapsed: 1.146 s
-1243309312 Limit1 Elapsed: 3.976 s
-1243309312 Limit3 Elapsed: 0.348 s
-1243309312 Limit5 Elapsed: 1.238 s
-17609343 Limit? Elapsed: 2.695 s
-1243309312 Cfor1 Elapsed: 9.792 s
-1243309312 Cfor3 Elapsed: 0.329 s
-1243309312 Cfor5 Elapsed: 0.855 s
-17609343 Cfor?? Elapsed: 5.608 s
-1243309312 While1 Elapsed: 0.296 s
-1243309312 While3 Elapsed: 0.302 s
-1243309312 While5 Elapsed: 0.314 s
-17609343 While? Elapsed: 1.142 s
-1243309312 Limit1 Elapsed: 3.966 s
-1243309312 Limit3 Elapsed: 0.339 s
-1243309312 Limit5 Elapsed: 1.216 s
-17609343 Limit? Elapsed: 2.689 s
-1243309312 Cfor1 Elapsed: 9.774 s
-1243309312 Cfor3 Elapsed: 0.329 s
-1243309312 Cfor5 Elapsed: 0.826 s
-17609343 Cfor?? Elapsed: 5.606 s
-1243309312 While1 Elapsed: 0.296 s
-1243309312 While3 Elapsed: 0.301 s
-1243309312 While5 Elapsed: 0.316 s
-17609343 While? Elapsed: 1.140 s
-1243309312 Limit1 Elapsed: 3.963 s
-1243309312 Limit3 Elapsed: 0.340 s
-1243309312 Limit5 Elapsed: 1.193 s
-17609343 Limit? Elapsed: 2.683 s
-1243309312 Cfor1 Elapsed: 9.783 s
-1243309312 Cfor3 Elapsed: 0.342 s
-1243309312 Cfor5 Elapsed: 0.837 s
-17609343 Cfor?? Elapsed: 5.628 s
<a bunch of stuff, including a promise of benchmarks of for loops>
I fear that unless enough people request this optimization then
nothing will be done.
So follow this link: https://issues.scala-lang.org/browse/SI-4633,
login and click "Vote".
BTW: do you know where to vote for JVM megamorphic inlining improvements
by code cloning, as described by Cliff here:
http://www.azulsystems.com/blog/cliff/2011-04-04-fixing-the-inlining-problem
?
This is what Scala needs very badly.
Regards,
Piotr
W dniu 2011-05-22 23:09, Lex pisze:
For what is worth, the author released a version compatible with 2.9.0 today:
https://twitter.com/#!/ochafik/status/73435122782838784
Best,
Ismael