@inline final override def foreach[U](f: Int => U) {
  var numIters = numSteps
  if (numIters == -1) {
    if (step == 0) throw new IllegalArgumentException("step cannot be 0.")
    else {
      val word = if (isInclusive) "to" else "until"
      val descr = List(start, word, end, "by", step) mkString " "
      throw new IllegalArgumentException(descr + ": seqs cannot contain more than Int.MaxValue elements.")
    }
  }
Right, and an existing method can be used (as shown below). There is a slight performance degradation, however: accessing lazy vals, even on a cold path, apparently makes the JIT less aggressive, for some reason.
@inline final override def foreach[U](f: Int => U) {
  var numIters = numSteps
  if (numIters == -1) this.length // invoked for its side effect (throws IllegalArgumentException)
Should we automatically synthesize a public accessor for private members
referenced from an @inline method? I think it'd have to have a mangled
name though since subclasses can introduce a different private field
with the same name.
-- Geoff
I have observed the same thing. If there is a lazy val anywhere, good
luck not seeing that minimum 2x difference between while and foreach.
I don't know about everyone else, but I find there are only two
execution profiles among all plausible implementations: foreach is
about the same speed as while, or foreach is half the speed of while.
(OK, there are a bunch that are slower yet.) I would like to
understand that, because the consistency of the 2x factor tells me
it's some very specific, happens-or-doesn't-happen aspect. And in my
experimentation, a lazy val is a sure way to move the needle to
"happens".
Intuitively this is related to the fact that a lazy val involves going
through a monitor - that optimizations are precluded if something
downstream from the method requires coordination - but that's pure
speculation on my part.
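For reference, a lazy val does compile down to a double-checked, monitor-guarded initializer. Roughly this hand-written sketch (the names `_initialized`, `_length`, and `computeLength` are illustrative; the compiler's actual generated code uses bitmap fields and differs between versions):

```scala
// Hand-written approximation of what scalac emits for `lazy val length`:
// a guard flag checked twice, the second time under the object's monitor.
class LazyLength {
  @volatile private var _initialized = false
  private var _length: Int = _

  def length: Int = {
    if (!_initialized) this.synchronized {
      if (!_initialized) {
        _length = computeLength() // runs at most once
        _initialized = true
      }
    }
    _length
  }

  private def computeLength(): Int = 42 // stand-in for the real computation
}
```

Even after initialization, every call still reads the guard flag, and the synchronized block sitting in the method body is plausibly what discourages the JIT from optimizing the enclosing loop, per the speculation above.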
Actually, I got while-like performance without -optimise in the ByOne
case. Better than -optimise as often as not, in fact.
> (1) accessing a lazy val (e.g., Range.length) is a performance killer (maybe
> it prevents JIT compilation?)
>
> (2) invoking the closure arg can be done with or without casting, the cast
> neither hurts nor improves performance significantly:
>
> (f.asInstanceOf[Int => Unit]).apply(i)
> vs
> f(i)
>
> (3) it's desirable to perform that invocation only once, otherwise each
> occurrence can lead to inlining the loop body.
>
> (4) keeping Range.foreach small helps, which means in practice
> pre-processing beforehand, not forgetting "corner cases" such as:
>
> Int.MinValue to Int.MaxValue by (Int.MinValue / 2).abs
> 1 to 1 by Int.MaxValue
> 0 to Int.MinValue by Int.MinValue
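To make point (4) concrete, here is a minimal, hypothetical sketch (`SimpleRange` and `numSteps` are made-up names, not the real scala.collection.immutable.Range source) that precomputes the element count once, in Long space, so the loop itself stays small and the corner cases above come out right:

```scala
// Hypothetical SimpleRange: precompute the element count of an inclusive
// range with Long arithmetic, so foreach is a tiny counter-driven loop.
final case class SimpleRange(start: Int, end: Int, step: Int) {
  require(step != 0, "step cannot be 0.")

  // Element count, overflow-free in Long space.
  private val numSteps: Long = {
    val gap = end.toLong - start.toLong
    if (gap == 0) 1L
    else if ((gap > 0) != (step > 0)) 0L // step points away from end
    else gap / step + 1L
  }

  def foreach(f: Int => Unit): Unit = {
    var i = start
    var k = 0L
    while (k < numSteps) { // counting by k avoids overflow tests on i
      f(i)
      i += step // may wrap after the last element; k already stops the loop
      k += 1
    }
  }
}
```

Under these assumptions the three corner cases above yield 4, 1, and 2 elements respectively, with no intermediate Int overflow.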
Paul has also raised the hypothesis that using a literal increment
instead of adding step may also contribute to speed differences. I can
see where this might make a difference to the JIT: constant-increment
loops are so common that it might special-case them.
I have not tested this hypothesis. I'll look into it.
On the other hand, it seems that subclassing Range gives benefits even
in the absence of changes to RichInt's type signature.
> (5) Daniel Sobral has made available a very complete benchmarking harness.
I'm taking pull requests to improve it further too. I have renamed the
repository, though. The new name is
https://github.com/dcsobral/scala-foreach-benchmark. Paul sent a pull
request to remove the hardcoded path I had in it, which made it really
easy to use.
--
Daniel C. Sobral
I travel to the future all the time.
Me too.
It does.
http://wikis.sun.com/display/HotSpotInternals/PerformanceTechniques
Constants
Use constants when you can.
It's OK to store them in local variables; the SSA representation tends
to keep them visible.
It's OK to store them in static final fields, too.
Static final fields can hold non-primitive, non-string constants.
The class of a static final field value is also constant, as is the
array length (if it's an array).
[Ed: that's what pushed me toward implementing public static final fields.]
Loops
The server compiler likes a loop with an int counter (int i = 0), a
constant stride (i++), and loop-invariant limit (i <= n).
^-----
Loops over arrays work especially well when the compiler can relate
the counter limit to the length of the array(s).
For long loops over arrays, the majority of iterations are free of
individual range checks.
Loops are typically peeled by one iteration, to "shake out" tests
which are loop invariant but execute only on a non-zero tripcount.
Null checks are the key example.
If a loop contains a call, it is best if that call is inlined, so that
loop can be optimized as a whole.
A loop can have multiple exits. Any deoptimization point counts as a loop exit.
If your loop has a rare exceptional condition, consider exiting to
another (slower) loop when it happens.
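The loop shape described above, with an int counter, constant stride, and loop-invariant limit, looks like this in Scala (a plain illustrative array sum, nothing Range-specific):

```scala
// The loop shape the server compiler likes: an Int counter starting at 0,
// a constant stride (+1), and a loop-invariant limit (n). Relating n to
// xs.length is also what lets HotSpot drop per-iteration range checks.
def sum(xs: Array[Int]): Int = {
  var acc = 0
  var i = 0
  val n = xs.length // hoisted, loop-invariant limit
  while (i < n) {
    acc += xs(i)
    i += 1 // constant stride
  }
  acc
}
```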
I don't think it is worth investing more effort in optimizing "foreach", because when used correctly it must perform I/O, which will consume most of the time anyway. In some ways, it sounds natural that one can't beat imperative style on its own ground by abusing functional style. I haven't tried to convert the other tests but, for the greater good, optimizing idiomatic code paths until they keep up with "while" loops encourages best coding practices.
Cheers,
Sébastien
> I don't think it is worth investing more effort in optimizing "foreach" because it must perform I/O if it is used correctly
Or, look at every method implementation in TraversableLike.
Here is my grouped answer:
@Matthew To rephrase it another way, I think there has been enough
time sunk into optimizing foreach (at least I'll stop trying it) and,
as shown in my latest benchmarks, idiomatic code deserves some love
too. I was not trying to win a language popularity contest here, I
just tried to help and then share my conclusions in all honesty. See
also paragraph below.
@Ismael Unless the syntactic-sugar for for-comprehensions changes in
the future, I'd use "while" in Scala to program imperatively at memory
level and mutate in place arrays or vectors in performance critical
code. It has a less nice syntax than the "for" loop in Java but
hopefully the scope can be limited. The reason is that I don't
entertain the thought that "foreach" is equivalent to "for" in Java.
For example, using "for" in Java to compute the sum of the elements in
an array is fast and good style. In Scala, "foreach" is a higher-order
function tailored for I/O. You can do the same as in Java, by mutating
a data structure outside the scope of the effectful function closure,
but it is less efficient than its imperative counterpart at that job.
Fair enough. "foldLeft" is another higher-function like "foreach", but
this one is fit for the job and it can be easily optimized to perform
comparably to the imperative loop. Thinking this way, one can use the
right tool for the right job and make the most out of Scala.
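The foldLeft-versus-foreach distinction above, sketched on a trivial sum:

```scala
val xs = Vector(1, 2, 3, 4)

// foreach forces mutation outside the closure to accumulate a result:
var total = 0
xs.foreach(total += _)

// foldLeft threads the accumulator through the function itself; same
// result, but the higher-order function actually fits the job:
val total2 = xs.foldLeft(0)(_ + _)
```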
@Paul Ha ha, it doesn't leave many choices, does it? Well, I'll answer
that implementing a collection in terms of "foreach" is a design
choice that is questionable at best. See this one for example:
def isEmpty: Boolean = {
  var result = true
  breakable {
    for (x <- this) {
      result = false
      break
    }
  }
  result
}
It is much too complex for what it is, and most of these methods are
overridden in subclasses anyway. Look at every overridden method
implementation in IterableLike for example. So, why bother?
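For instance (an illustrative trait, not the actual TraversableLike or IterableLike code), a subclass with cheap size information just overrides the generic, iteration-based definition outright:

```scala
// Generic default in the style of TraversableLike: answer via iteration.
trait SimpleTraversable[A] {
  def foreach[U](f: A => U): Unit
  def isEmpty: Boolean = {
    var result = true
    // (the real version also breaks out after the first element)
    foreach(_ => result = false)
    result
  }
}

// A subclass that knows its length overrides it with the obvious O(1) answer.
class SimpleIndexed[A](elems: Array[A]) extends SimpleTraversable[A] {
  def foreach[U](f: A => U): Unit = elems.foreach(f)
  override def isEmpty: Boolean = elems.length == 0
}
```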
Cheers!
Sébastien
> In Scala, "foreach" is a higher-order function tailored for I/O.
+100
-- Erik