Performance problem with for-expression

57 views
Skip to first unread message

Rüdiger Klaehn

unread,
Apr 25, 2012, 2:14:58 PM4/25/12
to scala-user
Hi all,

I just had a weird performance problem. I have an (immutable) Map with
string keys and very large objects as values. Since the values are
large, calling hashCode on them can take significant time.

I was iterating through the map with a for-expression looking like
this (in the real code I was doing something with name to get x, but
that is besides the point):

for ((name, value)<-map;x=name) {
println(x)
}

I noticed that this calls the hashCode on the value, which takes a lot
of time. By rewriting the loop like this:

for ((name, value)<-map) {
val x=name
println(x)
}

I don't get any invocation of value.hashCode, which is what I would
expect. What exactly is happening here?

Here is the complete code of the example:

---

package test

case class BigObject(name:String) {
override def hashCode = {
println("hashCode "+this)
name.hashCode
}
}

object ForExpressionTest extends App {

val objects = (0 until 5).map(x=>BigObject(x.toString))
val map = objects.map(x=>x.name->x).toMap
println("val inside loop body:")
for ((name, value)<-map) {
val x=name
println(x)
}
println("variable in for expression:")
for ((name, value)<-map;x=name) {
println(x)
}
}

---
And here is the output:
---

val inside loop body:
4
1
0
2
3
variable in for expression:
hashCode BigObject(4)
hashCode BigObject(1)
hashCode BigObject(4)
hashCode BigObject(1)
hashCode BigObject(0)
hashCode BigObject(2)
hashCode BigObject(3)
0
2
3
4
1

HamsterofDeath

unread,
Apr 25, 2012, 2:32:45 PM4/25/12
to scala...@googlegroups.com
hashcode is called on the tuple. i have never seen the usage of x = name
in a for comprehension, btw.
is it a crippled pattern match like "val 1 = 2"? what does the compiler
turn it to?


variable in for expression:
java.lang.Exception: Stack trace
at java.lang.Thread.dumpStack(Thread.java:1206)
at hstar.randomstuff.BigObject.hashCode(Hashcodetest.scala:12)
at scala.runtime.ScalaRunTime$._hashCode(ScalaRunTime.scala:193)
at scala.Tuple2.hashCode(Tuple2.scala:22)
at scala.collection.immutable.HashMap.elemHashCode(HashMap.scala:62)
at scala.collection.immutable.HashMap.computeHash(HashMap.scala:71)
at scala.collection.immutable.HashMap.$plus(HashMap.scala:53)
at scala.collection.immutable.HashMap.$plus(HashMap.scala:56)
at scala.collection.immutable.Map$Map4.updated(Map.scala:181)
at scala.collection.immutable.Map$Map4.$plus(Map.scala:182)
at scala.collection.immutable.Map$Map4.$plus(Map.scala:167)
at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:28)
at scala.collection.mutable.MapBuilder.$plus$eq(MapBuilder.scala:24)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:194)
at
scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:176)
at
scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:345)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:194)
at scala.collection.immutable.HashMap.map(HashMap.scala:36)
at
hstar.randomstuff.ForExpressionTest$delayedInit$body.apply(Hashcodetest.scala:27)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:60)
at scala.App$$anonfun$main$1.apply(App.scala:60)
at
scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
at scala.collection.immutable.List.foreach(List.scala:45)
at
scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:30)
at scala.App$class.main(App.scala:60)
at hstar.randomstuff.ForExpressionTest$.main(Hashcodetest.scala:17)
at hstar.randomstuff.ForExpressionTest.main(Hashcodetest.scala)

this fixes it:
for ((name, value) <- map; x <- List(name)) {
println(x)

Roland Kuhn

unread,
Apr 25, 2012, 2:40:05 PM4/25/12
to HamsterofDeath, scala...@googlegroups.com
ouch, I didn’t know _that_:

scala> reify(for(i <- 0 to 5; val x = i) println)
warning: there were 1 deprecation warnings; re-run with -deprecation for details
res54: reflect.mirror.Expr[Unit] =
Expr[Unit](scala.this.Predef.intWrapper(0).to(5).map(((i) => {
val x = i;
scala.Tuple2.apply(i, x)
}))(immutable.this.IndexedSeq.canBuildFrom).foreach({
final <synthetic> class $anonfun extends {
def <init>() = {
super.<init>();
()
};
final def apply(x$1) = {
case <synthetic> val x1 = x$1: @unchecked;
case3(){
if (x1.ne(null))
matchEnd2(scala.this.Predef.println())
else
case4()
};
case4(){
matchEnd2(throw new <type ?>(x1))
};
matchEnd2(x){
x
}
}
};
new <type ?>()
}))

That should explain things … (I have come to like the reify macro ;-) )

Regards,

Roland
Roland Kuhn
Typesafe – The software stack for applications that scale.
twitter: @rolandkuhn


Rüdiger Klaehn

unread,
Apr 25, 2012, 3:12:29 PM4/25/12
to Roland Kuhn, HamsterofDeath, scala...@googlegroups.com
Hi Roland,

I guess the reify macro is part of scala 2.10? My scala console
(2.9.1.final) does not know it. The only way to desugar a for loop is
to do scala -Xprint:typer -e "..."

I think that the behavior is very surprising and unintuitive. Is there
a reason the variable declaration can not be just moved into the
closure during for expresison desugaring?

I guess I have to go over many lines of code tomorrow and see if there
are any occurrences of that pattern.

Roland Kuhn

unread,
Apr 25, 2012, 3:17:24 PM4/25/12
to Rüdiger Klaehn, HamsterofDeath, scala...@googlegroups.com
Hi Rüdiger,

On Apr 25, 2012, at 21:12 , Rüdiger Klaehn wrote:

> I guess the reify macro is part of scala 2.10? My scala console
> (2.9.1.final) does not know it. The only way to desugar a for loop is
> to do scala -Xprint:typer -e "..."
>
yes, it’s 2.10.

> I think that the behavior is very surprising and unintuitive. Is there
> a reason the variable declaration can not be just moved into the
> closure during for expresison desugaring?
>
> I guess I have to go over many lines of code tomorrow and see if there
> are any occurrences of that pattern.
>
As I said, I was also surprised. My guess is that the wrapping inside a Tuple2 (and hence the map() operation) is necessary in the general case to make both x and i available to the code down-stream.

Regards,

Roland

Rüdiger Klaehn

unread,
Apr 25, 2012, 3:28:10 PM4/25/12
to Roland Kuhn, HamsterofDeath, scala...@googlegroups.com
On Wed, Apr 25, 2012 at 9:17 PM, Roland Kuhn <goo...@rkuhn.info> wrote:
> Hi Rüdiger,
>
> On Apr 25, 2012, at 21:12 , Rüdiger Klaehn wrote:
>
>> I guess the reify macro is part of scala 2.10? My scala console
>> (2.9.1.final) does not know it. The only way to desugar a for loop is
>> to do scala -Xprint:typer -e "..."
>>
> yes, it’s 2.10.
>
>> I think that the behavior is very surprising and unintuitive. Is there
>> a reason the variable declaration can not be just moved into the
>> closure during for expresison desugaring?
>>
>> I guess I have to go over many lines of code tomorrow and see if there
>> are any occurrences of that pattern.
>>
> As I said, I was also surprised. My guess is that the wrapping inside a Tuple2 (and hence the map() operation) is necessary in the general case to make both x and i available to the code down-stream.
>
Well, I guess it's just another thing one needs to be aware of.

It wouldn't be so bad if it would be wrapped like (k, (v,x)). But an
unnecessary call to hashCode is a real pain in many cases, especially
in a language like scala where almost everything overrides hashCode.
Reply all
Reply to author
Forward
0 new messages