The rationale of Scala MapLike.mapValues() ?

409 views
Skip to first unread message

Jan Goyvaerts

unread,
Mar 19, 2014, 8:11:07 AM3/19/14
to java...@googlegroups.com, scala...@googlegroups.com

I have been chasing a weird bug for some time now - and finally found it. I thought MapLike.mapValues() was a shortcut variant of map(), but to only map the values. Easy and less code.

The ScalaDoc of mapValues reads "Transforms this map by applying a function to every retrieved value". A generic explanation that might mean a lot of things. And surely, until now it always did the expected. I thought I was safe anyway because I made sure to use only immutable maps.

However, mapValues() causes the transformation function to run each time the map value is accessed. Which in my case is a big nono because the transformation takes a lot of time. Hence, doing it just once and store the results.

This is solved by using map() instead. It does the transformation once immediately.

While I understand the need for such functionalities, it also introduces the danger of misunderstanding them. It is poorly documented and the assumption of being safe when using immutable maps. Immutable => no side effects involved => safe. In my opinion, this is not an immutable map - because the values can change. Even though by deed I understand the whole structure is immutable. That's debate I won't even try to start. :-)

So, what's the rationale - Is it part of a concept I don't know yet about ? A concept with which "mapValues" would mean something else than mapping the values alone ?

And more importantly: Are there any other such traps hiding in Scala's API ? :-)

Is there a way to recognize them without checking every method explicitly first ?

Thanks !

Jan


====================================================================

class MapTransformationLearningSpec extends FunSpec {

  var addStatisticsCalled = 0

  describe("a Scala programmer") {
    it("should know how map transformations work") {
      val e = new Variable("e", "energy", classOf[Double])
      val m = new Variable("m", "mass", classOf[Double])
      val c = new Variable("c", "speed of light", classOf[Double])

      val vars = Seq(e, m, c)
      assert(addStatisticsCalled == 0, "addStatistics() was called %d times".format(addStatisticsCalled))
      val map: Map[String, Variable] = vars.map( v => v.name -> v).toMap.mapValues(v => addStatistics(v))

      // the map isn't empty, is it ?
      assert(!map.isEmpty)
      assert(map.size === 3)

      // I expect addStatistics() to be called already - was it ?
      println("Before accessing the statistics: addStatistics() was called %d times".format(addStatisticsCalled))

      // do the operation to check the statistics - does it call addStatistics then ?
      map.values.foreach(v => {
        assert(v.stats != None)
      })

      // do the operation to check the statistics - does it call addStatistics all over again ?
      map.values.foreach(v => {
        assert(v.stats != None)
      })

      // By now it should ideally be called three times only
      println("Before accessing the statistics: addStatistics() was called %d times".format(addStatisticsCalled))

      // At least be called sometime
      assert(addStatisticsCalled == 3, "addStatistics() was called an unexpected number of times: %d".format(addStatisticsCalled))
    }
  }

  def addStatistics(in: Variable): Variable = {
    addStatisticsCalled += 1
    val stat = new Statistics(0, 10, 100)
    in.copy(stats = Some(stat))
  }

}

case class Variable(name: String, title: String, dataType: Class[_], stats: Option[Statistics] = None)

case class Statistics(min: Double, max: Double, count: Int)


Jason Zaugg

unread,
Mar 19, 2014, 8:28:58 AM3/19/14
to Jan Goyvaerts, scala-user
On Wed, Mar 19, 2014 at 1:11 PM, Jan Goyvaerts <java.a...@gmail.com> wrote:

I have been chasing a weird bug for some time now - and finally found it. I thought MapLike.mapValues() was a shortcut variant of map(), but to only map the values. Easy and less code.

The ScalaDoc of mapValues reads "Transforms this map by applying a function to every retrieved value". A generic explanation that might mean a lot of things. And surely, until now it always did the expected. I thought I was safe anyway because I made sure to use only immutable maps.

However, mapValues() causes the transformation function to run each time the map value is accessed. Which in my case is a big nono because the transformation takes a lot of time. Hence, doing it just once and store the results.

This is solved by using map() instead. It does the transformation once immediately.

While I understand the need for such functionalities, it also introduces the danger of misunderstanding them. It is poorly documented and the assumption of being safe when using immutable maps. Immutable => no side effects involved => safe. In my opinion, this is not an immutable map - because the values can change. Even though by deed I understand the whole structure is immutable. That's debate I won't even try to start. :-)

So, what's the rationale - Is it part of a concept I don't know yet about ? A concept with which "mapValues" would mean something else than mapping the values alone ?

Ideally the method name would include the term "view", which would describe its behaviour better.

We've got a ticket to discuss improvements: https://issues.scala-lang.org/browse/SI-4776

Unfortunately these API decisions aren't easy to fix after the fact.
 

And more importantly: Are there any other such traps hiding in Scala's API ? :-)
 
Is there a way to recognize them without checking every method explicitly first ?

Map.filterKeys is another, mentioned on that ticket. 

Any time you call a higher-order method on a Seq, you need to be aware that it might be a `scala.collection.immutable.Stream`, which is lazy.

Example:

def sideEffect(ss: Seq[Int]): Seq[String] = ss.map { x => println(x); x.toString }
sideEffect: (ss: Seq[Int])Seq[String]

scala> sideEffect(Stream(1, 2, 3))
1
res0: Seq[String] = Stream(1, ?)

scala> sideEffect(List(1, 2, 3))
1
2
3
res1: Seq[String] = List(1, 2, 3)

It could also be a View:

scala> sideEffect(List(1, 2, 3).view)
res2: Seq[String] = SeqViewM(...)

So Seq abstracts over evaluation semantics of collections. When dealing with one generically, don't side-effect inside `map` / `filter` etc. Or convert to a concrete collection `.toList` / `.toVector` first (but be aware you might have been passed an infinite lazy stream!)

-jason

Jan Goyvaerts

unread,
Mar 19, 2014, 8:31:40 AM3/19/14
to Jason Zaugg, scala-user
On 03/19/2014 01:28 PM, Jason Zaugg wrote:
On Wed, Mar 19, 2014 at 1:11 PM, Jan Goyvaerts <java.a...@gmail.com> wrote:
...

Ideally the method name would include the term "view", which would describe its behaviour better.

We've got a ticket to discuss improvements: https://issues.scala-lang.org/browse/SI-4776

Unfortunately these API decisions aren't easy to fix after the fact.
IF they can be fixed at all. :-)

 
Map.filterKeys is another, mentioned on that ticket.
I'll remember this one. And yes, I'm using this one also. But apparently without harm this time.:-)


Any time you call a higher-order method on a Seq, you need to be aware that it might be a `scala.collection.immutable.Stream`, which is lazy.
Is that only for Seq ? Or are other collection types also impacted ?

What worries me is that it's so easy to make this mistake. Or to introduce it later because of the type introspection.

Anyway, many thanks for the explanation ! :-)

Jan

Jason Zaugg

unread,
Mar 19, 2014, 8:45:32 AM3/19/14
to Jan Goyvaerts, scala-user
On Wed, Mar 19, 2014 at 1:31 PM, Jan Goyvaerts <java.a...@gmail.com> wrote:
Is that only for Seq ? Or are other collection types also impacted ?

Naturally the supertypes of Seq are the same.

But we can also have views over IndexedSeq.

scala> IndexedSeq(1, 2, 3).view.map{ x => println("!!"); x }
res6: scala.collection.SeqView[Int,Seq[_]] = SeqViewM(...)

scala> .apply(2)
!!
res7: Int = 3 

So in general unless you are dealing with a concrete type (e.g. List, Vector, HashMap), you shouldn't side-effect in functions passed to higher-order methods on that collection. The exception is `foreach`:

scala> IndexedSeq(1, 2, 3).view.foreach(_ => println("!!"))
!!
!!
!!

What worries me is that it's so easy to make this mistake. Or to introduce it later because of the type introspection.

Anyway, many thanks for the explanation ! :-)

Java 8's approach to this is interesting: it makes *all* data structures lazy before letting you use map/filter et al. The motivation for this is not just to flush out bugs with the infrequently encountered lazy collection, as we've been discussing here, but rather to enable parallelization of collection transformations, which also rely on a side-effect free style.

-jason

mar

unread,
Mar 19, 2014, 10:34:26 AM3/19/14
to scala...@googlegroups.com, Jan Goyvaerts
This is also the case (as most probably know) with paulp's psp-view project and.. I must say it kind of makes sense. C# gets a similar idea since all the LINQ-operations work with ienumerables which also defer evaluation until it is needed.

It would be really nice.. But I digress..

Som Snytt

unread,
Mar 19, 2014, 12:07:45 PM3/19/14
to Jan Goyvaerts, scala-user
Fixing it would break the puzzler:


> And more importantly: Are there any other such traps hiding in Scala's API ? :-)

The list of puzzlers suffers for incompleteness.  And maybe lack of a PR budget.  (Taking PR as either public relations or pull requests.)

I like the branding for an eager version suggested on the ticket: mapValuesNow!

It makes me want to call the 800 number on your screen.




--
You received this message because you are subscribed to the Google Groups "scala-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Ryan LeCompte

unread,
Mar 19, 2014, 12:09:30 PM3/19/14
to Som Snytt, Jan Goyvaerts, scala-user
In our code base we have added an 'eagerMapValues' and
'eagerFilterKeys' to our RichMap implicit class. :-) It almost feels
like part of the standard library at this point!

Jan Goyvaerts

unread,
Mar 19, 2014, 12:57:44 PM3/19/14
to scala-user
On 03/19/2014 05:09 PM, Ryan LeCompte wrote:
> In our code base we have added an 'eagerMapValues' and
> 'eagerFilterKeys' to our RichMap implicit class. :-) It almost feels
> like part of the standard library at this point!
Changing the API is of course the way to go. :-)
Reply all
Reply to author
Forward
0 new messages