Examining List.foldRight performance.

630 views
Skip to first unread message

Sean Parsons

unread,
Jan 5, 2013, 6:40:56 PM1/5/13
to scala-i...@googlegroups.com
After seeing this bug: https://issues.scala-lang.org/browse/SI-3295 I thought it best to actually investigate the performance. So I created this benchmark: https://gist.github.com/4464286 As you can see I included the two alternative implementations proposed in the bug report with some minor modifications (.elements became .iterator and I made the list a further parameter to the method) have also been included.

The performance results from Caliper are included in the gist and demonstrate that the two alternatives are likely not suitable candidates as they have proportionally much worse performance with all but the largest of Lists. Looking at the results for .reverse.foldLeft is enlightening as the performance difference between that and the current .foldRight isn't that much for lists smaller than 500 items, gets much closer at 500 items and for 1,000 items and above becomes more performant.

This benchmark however doesn't examine the potential issue of increased garbage due to the .reverse call that happens but replacing the implementation of List.foldRight with List.reverse.foldLeft doesn't look all that bad at all and makes List.foldRight viable for large collections (which is handy if your code only sees a Seq for example).

Paolo G. Giarrusso

unread,
Jan 5, 2013, 9:39:57 PM1/5/13
to scala-i...@googlegroups.com
On performance itself, you overlooked the need to flip the argument to foldRight. Looking at the type signatures should be enough to convince you that (coll foldRight zero)(f) is only equivalent to (coll.reverse foldLeft zero)(flip(f)), and I'd guess flip(f) costs significantly more than f. I'd ask you to rerun the tests.
However, at some point we should be able to inline foldRight in the caller and have flip(f) evaluated at compile-time by the optimizer - I just don't know how close we are (Miguel Garcia is the person to ask).

The rest of my answer will focus on why the bug should indeed be fixed, even if it was closed as WONTFIX. My strongest argument is that some implementations of foldRight have already been fixed, so the current situation is at best incoherent.

The main concern discussed in the ticket is not performance, but correctness. The problem is that different people disagree on the specification of foldRight. But I'd argue that at least the ScalaDocs should be fixed (so I've filed https://issues.scala-lang.org/browse/SI-6922).

I'll quote the current status from John Connor:
> As it stands now the function is useless – using it is a bug waiting to happen.

As I understand it, one of Martin's arguments was that since
> It's obvious that foldright is not tail-recursive
there's no point in fixing it. Now, I don't know in 2010, but at least in 2013 I feel Ben Wing is right:
> In reality, although Martin may personally have a clear idea which functions are and aren't tail-recursively-safe, the average Scala programmer has little or no idea.
By current standards, this should really be documented.

Moreover, Martin argued that one would need to fix also `last`, `reduceRight` and so on. In fact, `TraversableLike.last` is currently not affected; TraversableOnce.{fold,reduce}Right, IndexedSeqOptimized.{fold,reduce}Right are already fixed by using reverse.{fold,reduce}Left, while LinearSeqOptimized.fold/reduceRight have the same bug.

Another of Martin's arguments was that the change would confuse the computational model, but he never fleshed out that argument and I don't get it. I wondered whether an impure operator f might distinguish (coll foldRight zero)(f) from (coll.reverse foldLeft zero)(flip(f)), and I think the answer is "no". Moreover, the fix is already in the library.

Last Martin's argument went on as:
> It would just confuse the computation model, and will not work for streams anyway. Furthermore, you then need to do the same thing for reduceRight, last, and so on and so on. I don't want to go there. I realize that seen in isolation every ticket seems hugely important and therefore one tends to overengineer, at the cost of overall simplicity and consistency. I have to be the guardian of those.

(But take a look at Paul Chiusano's reply, about streams among other things - I need to take another look at it:

Paul Phillips

unread,
Jan 6, 2013, 4:29:20 AM1/6/13
to scala-i...@googlegroups.com
Here's an implementation which appears to me to dominate all other approaches: it is slightly slower than the current implementation for small N, significantly faster for large N, does not blow the stack, and can hardly be judged unreasonably complicated relative to just about anything else in our codebase. It takes advantage of lengthCompare to walk the list for the smaller of the list length and MAX_DEPTH (I used 10) with that small-constant-factor probe being the only overhead relative to the current foldRight (for N < 10) or relative to reverse.foldLeft (for N >= 10).

  def foldRightAlternative4[A, B](list: List[A])(z: B)(f: (A, B) => B): B =
    if (list.lengthCompare(10) < 0) // MAX_DEPTH=10
      list.foldRight(z)(f)
    else
      list.reverse.foldLeft(z)(flip(f))

As previously mentioned, reduceLeft should go through foldLeft and last is already not a problem - and would we have it otherwise? How on earth is anyone going to intuit that a) their sequence is or might be "too large" and b) that this quality is relevant for a call to .last? If I saw code like

  // can't use xs.last; somehow I know I have to be large-list-safe
  // at this particular spot, but not at other spots
  xs.reverse.head

I would think/hope the author's head needed examining. I'd be mortified if it turned out they were using the library as intended.

Sean Parsons

unread,
Jan 6, 2013, 5:34:30 AM1/6/13
to scala-i...@googlegroups.com
I think Stream should be left aside from this, anyone using Stream
should probably limit its exposure outside of it's specific uses due
to the inherent danger of creating infinite Streams or incredibly
large Streams that blow the heap.

It's easy to argue that in our simple examples a lot of people should
be able to see that with (0 to 100000).toList will go bang when
calling foldRight, but as Scala gains popularity what most people will
be seeing is something like Seq[User] returned from a library or a
method from another team. That developer will have no idea that they
shouldn't have used foldRight until 6 months into that code being in
production it blows the stack. Certainly in my case recently I found
that the fastest way to create a collection by appending was to go
through List.newBuilder, so it wouldn't surprise me if one day I used
List.foldRight without noticing this issue was going to happen. Given
that a developer can use a Seq that is actually a List, it would imply
that the use of foldRight should be explicitly discouraged at the
level of Seq down and encouraged for those collections that wont
explode, which seems like a terrible precedent to have.

The gist has been updated with the introduced flip call and it does
have some performance impact, but as you say that could easily vanish
through inlining, interestingly it almost appears as if is being
inlined with Paul's alternative version as that's much faster for
larger lists.

Sean.

Paul Phillips

unread,
Jan 6, 2013, 5:51:08 AM1/6/13
to scala-i...@googlegroups.com


On Sun, Jan 6, 2013 at 2:34 AM, Sean Parsons <seantp...@gmail.com> wrote:
I think Stream should be left aside from this, anyone using Stream
should probably limit its exposure outside of it's specific uses due
to the inherent danger of creating infinite Streams or incredibly
large Streams that blow the heap.

Not to mention that in its lifetime Stream has supplied us with an infinite sequence of pessimizations and performance bugs. The current implementation isn't much of a reason for anything except a new Stream. (A macro-based stream could be lazy in the head with no overhead...)

Sean Parsons

unread,
Jan 8, 2013, 2:10:08 PM1/8/13
to scala-i...@googlegroups.com
Is there a chance of this getting fixed or will it just remain broken forever being a landmine for developers to unwittingly step onto? I can create a PR for it, but if it's just gonna get closed straight away there seems no point.

Sean.

Paul Phillips

unread,
Jan 8, 2013, 2:18:06 PM1/8/13
to scala-i...@googlegroups.com
I have studiously avoided taking a position in the foldRight tickets because I like to take a break from arguing about things, but now that I've offered and measured an implementation and see literally no downside to using it instead, I guess I'm stuck. Please do open a ticket and attach the implementation from this thread. (Or reopen the john conner ticket, which is similar to my implementation.)
Reply all
Reply to author
Forward
0 new messages