Seq aliasing scala.collection.Seq considered dangerous

Heiko Seeberger

Oct 22, 2012, 12:44:26 PM
to scala-i...@googlegroups.com
Hi,

Many Scala users, even very advanced ones, don't know that Seq (as defined in the package object scala)  is *not* an alias for scala.collection.immutable.Seq like many other collection aliases (Set, Map, etc.). Actually Seq is an alias for scala.collection.Seq which could be immutable or mutable.

Therefore, if one does not explicitly import scala.collection.immutable.Seq, mutability can leak into APIs which are considered immutable. This is in contradiction to Scala making it easy to write immutable code. And it simply is dangerous. Any chance this can be fixed?
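For illustration, a minimal sketch of the leak (hypothetical names; this assumes the current behaviour, where unqualified Seq means scala.collection.Seq):

  object SeqLeakDemo extends App {
    // `Seq` below is the scala.Seq alias, i.e. scala.collection.Seq
    class Config(val keys: Seq[String])

    val backing = Array("a", "b")        // an Array is implicitly wrapped as a collection.Seq
    val config  = new Config(backing)    // compiles without complaint
    backing(0)  = "mutated"              // the supposedly immutable Config observes the change
    println(config.keys.mkString(", "))  // prints: mutated, b

    // With scala.collection.immutable.Seq in the signature, the constructor
    // call above would not compile and the leak would be impossible.
  }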

Thanks
Heiko

--

Heiko Seeberger
Twitter: @hseeberger
Company: Typesafe - The software stack for applications that scale
Author of "Durchstarten mit Scala, a tutorial-style Scala book"

Alex Cruise

Oct 22, 2012, 1:38:10 PM
to scala-i...@googlegroups.com
On Mon, Oct 22, 2012 at 9:44 AM, Heiko Seeberger <heiko.s...@gmail.com> wrote:
Many Scala users, even very advanced ones, don't know that Seq (as defined in the package object scala)  is *not* an alias for scala.collection.immutable.Seq like many other collection aliases (Set, Map, etc.). Actually Seq is an alias for scala.collection.Seq which could be immutable or mutable.

Therefore, if one does not explicitly import scala.collection.immutable.Seq, mutability can leak into APIs which are considered immutable. This is in contradiction to Scala making it easy to write immutable code. And it simply is dangerous. Any chance this can be fixed?

Wow, that's... Surprising.

-0xe1a

Daniel Sobral

Oct 22, 2012, 1:56:22 PM
to scala-i...@googlegroups.com
On Mon, Oct 22, 2012 at 2:44 PM, Heiko Seeberger
<heiko.s...@gmail.com> wrote:
> Hi,
>
> Many Scala users, even very advanced ones, don't know that Seq (as defined
> in the package object scala) is *not* an alias for
> scala.collection.immutable.Seq like many other collection aliases (Set, Map,
> etc.). Actually Seq is an alias for scala.collection.Seq which could be
> immutable or mutable.
>
> Therefore, if one does not explicitly import scala.collection.immutable.Seq,
> mutability can leak into APIs which are considered immutable. This is in
> contradiction to Scala making it easy to write immutable code. And it simply
> is dangerous. Any chance this can be fixed?

It would prevent Array from being passed as Seq, which would be very
annoying when doing things around varargs.

--
Daniel C. Sobral

I travel to the future all the time.

Paul Phillips

Oct 22, 2012, 2:29:26 PM
to scala-i...@googlegroups.com

On Mon, Oct 22, 2012 at 10:56 AM, Daniel Sobral <dcso...@gmail.com> wrote:
It would prevent Array from being passed as Seq, which would be very
annoying when doing things around varargs.

There is a cleanup available here which addresses more than one problem.  This has always horrified me:

scala> val xs = Array(1, 2, 3)
xs: Array[Int] = Array(1, 2, 3)

scala> case class Foo(args: Int*) { }
defined class Foo

scala> Foo(xs: _*)
res0: Foo = Foo(WrappedArray(1, 2, 3))

scala> xs(0) = 123

scala> res0
res2: Foo = Foo(WrappedArray(123, 2, 3))

Treating varargs this way, with no way to exclude mutable inputs, taints varargs for all uses.  This is especially annoying because case classes privilege varargs such that you can pattern match with unapplySeq semantics.

scala> res0 match { case Foo(123, xs @ _*) => xs }
res3: Seq[Int] = WrappedArray(2, 3)

So to me, varargs should require immutable.Seq, just as it would if the arguments were being passed individually.  That would mean a defensive copy of the array has to be made.  I would consider this a feature.  If that were to be a performance problem in some contexts, then "don't do that."

Undoubtedly controversial, but an implicit from Array -> immutable.Seq (which copies) would be easy enough to offer.
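A sketch of such a conversion (hypothetical object and method names, not an existing Predef conversion):

  import scala.collection.immutable

  object ArrayToImmutable {
    // Copies on conversion, so later writes to the array are invisible to the Seq.
    implicit def arrayToImmutableSeq[A](xs: Array[A]): immutable.Seq[A] =
      immutable.Vector(xs: _*)
  }

  object Demo extends App {
    import ArrayToImmutable._
    def sum(ns: immutable.Seq[Int]) = ns.sum
    val a = Array(1, 2, 3)
    val copied: immutable.Seq[Int] = a   // conversion (and defensive copy) happens here
    a(0) = 100
    println(sum(copied))                 // still 6: the copy is unaffected
  }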

Jonas Bonér

Oct 22, 2012, 2:30:52 PM
to scala-i...@googlegroups.com
I like that.

>



--
Jonas Bonér
Phone: +46 733 777 123
Home: http://jonasboner.com
Twitter: @jboner

martin odersky

Oct 22, 2012, 3:46:53 PM
to scala-i...@googlegroups.com
I am not sure. Java uses mutable arrays. We will no doubt lose considerable performance in the interest of purity. Can we afford it? Right now, "slow collections" is one of the main arguments Scala detractors put forward.

And, it seems nobody in Java land cares one iota about loss of purity anyway. But they will hang us on the performance argument.

I am not yet decided on this, but just wanted to raise some doubts.

Cheers

 - Martin
 

>



--
Jonas Bonér
Phone: +46 733 777 123
Home: http://jonasboner.com
Twitter: @jboner



--
Martin Odersky
Prof., EPFL and Chairman, Typesafe
PSED, 1015 Lausanne, Switzerland
Tel. EPFL: +41 21 693 6863
Tel. Typesafe: +41 21 691 4967

Daniel Spiewak

Oct 22, 2012, 3:55:36 PM
to scala-i...@googlegroups.com
As someone who cares very deeply about collection performance, I still fall on the "make type Seq = immutable.Seq" side of the fence.  The reason for this is I have no expectation of performance when I'm using varargs functions.  Seriously!  I will never, ever use a varargs function in a hot path, regardless of whether or not I am passing an Array as a Seq (which, incidentally, is something I would never do anyway).  Is this a regression from Java's varargs semantics?  Maybe, but it's something I can live with.

As has been pointed out, Java's varargs are broken.  We shouldn't preserve an extremely dangerous type semantic merely to attempt the preservation of Java's brokenness.

If Scala varargs performance were really a legitimate issue, the very first thing that should be done is create specialized "SeqN" collections for common arities.  In my testing, this results in an order of magnitude improvement, both in space and time.  Since this hasn't been done (and is not even on the radar), my assumption is that varargs are not considered (in general) to be a performance-sensitive feature, and therefore we shouldn't use varargs performance as a reason to restrict improvement in other areas.
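A rough sketch of what one such arity-specialized sequence could look like (hypothetical; nothing like this exists in the standard library):

  import scala.collection.immutable

  // Two-element sequence with no backing array and no builder overhead.
  final class Seq2[+A](a0: A, a1: A) extends immutable.IndexedSeq[A] {
    def length = 2
    def apply(i: Int): A = i match {
      case 0 => a0
      case 1 => a1
      case _ => throw new IndexOutOfBoundsException(i.toString)
    }
  }
  // A library factory (or the compiler) could emit `new Seq2(x, y)` for a
  // two-argument varargs call instead of allocating and wrapping an Array.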

Daniel

Matthew Pocock

Oct 22, 2012, 6:02:40 PM
to scala-i...@googlegroups.com
I'm not sure how often this is an issue in real-life code. Certainly, in my code it's quite rare for me to use the incantation to make a Seq instance into a vararg parameter. How often are varargs parameters pushed in as a Seq instance, rather than supplied by compiler sugar from a list of instances? If varargs were altered to take an immutable Seq, then presumably we could have some sugaring/wrapping of a Java native array as an immutable Seq, and then wash our hands of the whole affair? Native arrays already have to be wrapped in a WrappedArray (or at least did last time I looked).

Matthew
--
Dr Matthew Pocock
Integrative Bioinformatics Group, School of Computing Science, Newcastle University
skype: matthew.pocock
tel: (0191) 2566550

Rex Kerr

Oct 22, 2012, 6:03:13 PM
to scala-i...@googlegroups.com
On Mon, Oct 22, 2012 at 3:55 PM, Daniel Spiewak <djsp...@gmail.com> wrote:
As someone who cares very deeply about collection performance, I still fall on the "make type Seq = immutable.Seq" side of the fence.  The reason for this is I have no expectation of performance when I'm using varargs functions.  Seriously!

Orders of magnitude still matter.  Making a defensive immutable copy of arrays (and other mutable collections) on Seq is a great way to force people to optimize code paths that otherwise wouldn't have required any attention.
 
I will never, ever use a varargs function in a hot path, regardless of whether or not I am passing an Array as a Seq (which, incidentally, is something I would never do anyway).  Is this a regression from Java's varargs semantics?  Maybe, but it's something I can live with.

But you will have to redefine what your "hot path" is in some cases.  If you're using arrays, you already know not to mutate them out from under yourself.  Why double your runtime, or worse, turn an O(1) handoff into an O(n) copy, when you are already aware that you're using arrays and that you have to keep their mutability in mind?
 
As has been pointed out, Java's varargs are broken.  We shouldn't preserve an extremely dangerous type semantic merely to attempt the preservation of Java's brokenness.

They are efficient and work as designed.  You can use Java varargs for moderately (but not extremely) performance-critical stuff.  I don't know how that is broken except inasmuch as everything mutable is broken (i.e. breakable).
 
If Scala varargs performance were really a legitimate issue, the very first thing that should be done is create specialized "SeqN" collections for common arities.

Agreed.  That would be 3-4x faster than the Java varargs method for small arities when dynamic dispatch could be skipped (and assuming specialization, or that you wouldn't use it for primitives).
 
In my testing, this results in an order of magnitude improvement, both in space and time.

Not over Java varargs.  The space usage is comparable to an array.  Time is ~3-4x.  Not that you're likely to run out of memory because of varargs usage in any case.
 
Since this hasn't been done (and is not even on the radar), my assumption is that varargs are not considered (in general) to be a performance-sensitive feature, and therefore we shouldn't use varargs performance as a reason to restrict improvement in other areas.

Okay, but that doesn't mean it's wise to degrade performance by another factor of two-to-N to save you from a problem that you already should be well-aware of.  Let's not make varargs usage the hotspot sooner than we need to.

  --Rex

P.S. We'd also need to add a ClonedArray class in the immutable part of the hierarchy, or the performance could get much, much worse as we run into unavoidable boxing of primitives.  The only reason it's not much, much worse now is that WrappedArray.ofX is effectively specialized.

nafg

Oct 22, 2012, 10:26:52 PM
to scala-i...@googlegroups.com
Maybe varargs should be spec'd to use collection.Seq, not scala.Seq?

Paul Phillips

Oct 23, 2012, 7:34:45 PM
to scala-i...@googlegroups.com


On Mon, Oct 22, 2012 at 3:03 PM, Rex Kerr <ich...@gmail.com> wrote:
 
If Scala varargs performance were really a legitimate issue, the very first thing that should be done is create specialized "SeqN" collections for common arities.

Agreed.  That would be 3-4x faster than the Java varargs method for small arities when dynamic dispatch could be skipped (and assuming specialization, or that you wouldn't use it for primitives).

Can you guys give me some code examples of exactly what you're talking about with this, ideally with highlighted sections labeled "what it does now" and "what it should do" ?

Heiko Seeberger

Oct 24, 2012, 7:26:07 AM
to scala-i...@googlegroups.com
On Oct 22, 2012, at 9:46 PM, martin odersky <martin....@epfl.ch> wrote:

> I am not sure. Java uses mutable arrays. We will no doubt lose considerable performance in the interest of purity. Can we afford it? Right now, "slow collections" is one of the main arguments Scala detractors put forward.
>
> And, it seems nobody in Java land cares one iota about loss of purity anyway. But they will hang us on the performance argument.

I am not concerned about purity (here), but about correctness. Even very, very experienced Scala developers, working for companies that offer commercial support for Scala and the like, have introduced bugs (the possibility of leaking mutability into an immutable API) into their otherwise outstanding software.

In my opinion Seq -> collection.Seq is too dangerous to stay alive. If vararg performance suffers, I consider this less critical.

Heiko

Rex Kerr

Oct 24, 2012, 12:08:07 PM
to scala-i...@googlegroups.com

It's just like Sets, with Set1 through Set4--that's done to improve performance of sets; it's a huge win there, but there are still pretty big wins for other collections in certain use cases.  That's really the key.  The upside is speed for tiny collections, but the downside is that you might have to do dynamic dispatch on more code paths than before.  I don't have time to test that exhaustively, but here's a highlight of the differences:


  def ptime[A](a: => A) = {
    val t0 = System.nanoTime
    val ans = a
    println("Elapsed: %.3f".format((System.nanoTime-t0)*1e-9))
    ans
  }
  def repeat[A](n: Int)(a: => A): A = if (n<=1) a else { a; repeat(n-1)(a) }

  trait ClassImpl { def apply(i: Int): Int }
  class FourInts(a: Int, b: Int, c: Int, d: Int) extends ClassImpl {
    def apply(i: Int) = if (i<2) if (i<1) a else b else if (i<3) c else d
  }
  class SixInts(a: Int, b: Int, c: Int, d: Int, e: Int, f: Int) extends ClassImpl {
    def apply(i: Int) = if (i<3) { if (i<1) a else if (i<2) b else c } else if (i<5) { if (i<4) d else e } else f
  }

  def classimpl = {
    var i,s = 0
    while (i<1000000) {
      val cia = if ((i&1)==1) new SixInts(i-2,i-1,i,i+1,i+2,i+3) else new FourInts(i,i+1,i+2,i+3)
      s += cia(3)
      i += 1
    }
    s
  }

  def vararg(xs: Int*) = xs
  def varargsimpl = {
    var i,s = 0
    while (i<1000000) {
      val va = if ((i&1)==1) vararg(i-2,i-1,i,i+1,i+2,i+3) else vararg(i,i+1,i+2,i+3)
      s += va(3)
      i += 1
    }
    s
  }

  def arrayimpl = {
    var i,s = 0
    while (i<1000000) {
      val ar = (
        if ((i&1)==1) {
          val a = new Array[Int](6); a(0) = i-2; a(1) = i-1; a(2) = i; a(3) = i+1; a(4) = i+2; a(5) = i+3; a
        } else {
          val a = Array(i,i+1,i+2,i+3); a(0) = i; a(1) = i+1; a(2) = i+2; a(3) = i+3; a
        }
      )
      s += ar(3)
      i += 1
    }
    s
  }

  def vectorimpl = {
    var i,s = 0
    while (i<1000000) {
      val vb = Vector.newBuilder[Int]
      val v = (
        if ((i&1)==1) { vb+=i-2; vb+=i-1; vb+=i; vb+=i+1; vb+=i+2; vb+=i+3 }
        else { vb+=i; vb+=i+1; vb+=i+2; vb+=i+3 }
      ).result
      s += v(3)
      i += 1
    }
    s
  }

If you run ptime(repeat(100){ classimpl }) it's about 2x faster than ptime(repeat(100){ varargsimpl }) on 2.10.  It's 4x faster than arrayimpl, which really just points that we need to improve our array handling (it's 8x faster than the naive Array-by-using-companion-apply, not shown here).  It's 12x faster than vectorimpl.  (6x faster than listimpl, not shown here, but which you can imagine; 45x faster than collection.immutable.Seq(...), 55x faster than collection.immutable.IndexedSeq(...).)

Anyway, the current varargs implementation is pretty good--so good that it shows that the array implementation is embarrassingly bad right now.  It could be better yet with custom classes.

  --Rex

P.S. Just to show why Daniel's "I don't use varargs" can be a good point, try this one out:

  def func4(a: Int, b: Int, c: Int, d: Int) = c
  def func6(a: Int, b: Int, c: Int, d: Int, e: Int, f: Int) = c
  def funcimpl = {
    var i,s = 0
    while (i<1000000) {
      s += (if ((i&1)==1) func6(i-2,i-1,i,i+1,i+2,i+3) else func4(i,i+1,i+2,i+3))
      i += 1
    }
    s
  }

This is ~7x faster than classimpl, and 14x faster than what you can get with varargs now.  So, indeed, in a tight inner loop, don't use varargs!  But making it several times slower again is still probably foolish, and if you've already got a Seq, it shouldn't really hurt to pass it along to a varargs function instead of one that explicitly takes a collection (multiple dispatch being a possible exception).

Paul Phillips

Oct 24, 2012, 12:30:31 PM
to scala-i...@googlegroups.com
Thanks a lot for that.  The main reason I asked is that the place I could see where this would matter is a method like this, which indeed was your example:


   def vararg(xs: Int*) = xs

And I never encounter methods with signatures like that. So I was wondering why this would be a subject of great interest. The signature I encounter a lot is this:

  def vararg[T](xs: T*) = xs

This could well be because I don't get out much.

Rex Kerr

Oct 24, 2012, 12:47:26 PM
to scala-i...@googlegroups.com
Sure, that matters less, relatively, because the overhead of the other stuff you're doing is greater.  Then varargs is only about 1.5x slower than classimpl (and classimpl is only about 1.4x slower than funcimpl):

  class X(val x: Int) {
    var next: X = null
  }
  var x0 = new X(0); var x1 = new X(1); var x2 = new X(2); var x3 = new X(3); var x4 = new X(4); var x5 = new X(5)
  x0.next = x1; x1.next = x2; x2.next=x3; x3.next=x4; x4.next=x5; x5.next=x0;
  var x = x0
  def nx = {x=x.next; x}

  trait ClassImpl[A] { def apply(i: Int): A }
  class Four[A](a: A, b: A, c: A, d: A) extends ClassImpl[A] {

    def apply(i: Int) = if (i<2) if (i<1) a else b else if (i<3) c else d
  }
  class Six[A](a: A, b: A, c: A, d: A, e: A, f: A) extends ClassImpl[A] {

    def apply(i: Int) = if (i<3) { if (i<1) a else if (i<2) b else c } else if (i<5) { if (i<4) d else e } else f
  }
 
  def classimpl = {
    var i,s = 0
    while (i<1000000) {
      val cia = if ((i&1)==1) new Six(nx,nx,nx,nx,nx,nx) else new Four(nx,nx,nx,nx)
      s += cia(3).x
      i += 1
    }
    s
  }

  def vararg[A](as: A*) = as

  def varargsimpl = {
    var i,s = 0
    while (i<1000000) {
      val va = if ((i&1)==1) vararg(nx,nx,nx,nx,nx,nx) else vararg(nx,nx,nx,nx)
      s += va(3).x
      i += 1
    }
    s
  }

--Rex

Paul Phillips

Oct 24, 2012, 1:16:43 PM
to scala-i...@googlegroups.com
Okay, using this as my test:

object Test {
  def ptime[A](a: => A) = {
    val t0 = System.nanoTime
    val ans = a
    println("Elapsed: %.3f".format((System.nanoTime-t0)*1e-9))
    ans
  }
  def repeat[A](n: Int)(a: => A): A = if (n<=1) a else { a; repeat(n-1)(a) }

  def vararg1(xs: Int*) = xs
  def vararg2[T](xs: T*) = xs

  def varargsimpl1 = {
    var i,s = 0
    while (i < 1000000) {
      val va = if ((i&1)==1) vararg1(i-2,i-1,i,i+1,i+2,i+3) else vararg1(i,i+1,i+2,i+3)
      s += va(3)
      i += 1
    }
    s
  }
  def varargsimpl2 = {
    var i,s = 0
    while (i < 1000000) {
      val va = if ((i&1)==1) vararg2(i-2,i-1,i,i+1,i+2,i+3) else vararg2(i,i+1,i+2,i+3)
      s += va(3)
      i += 1
    }
    s
  }

  def varargsimpl3 = {
    var i,s = 0
    while (i < 1000000) {
      val va = if ((i&1)==1) vararg2("1", "2", "3", "4", "5", "6") else vararg2("1", "2", "3", "4")
      s += va(3).toInt
      i += 1
    }
    s
  }

  def main(args: Array[String]): Unit = {
    ptime(repeat(200)(varargsimpl1))
    ptime(repeat(200)(varargsimpl2))
    ptime(repeat(200)(varargsimpl3))
  }
}

I moved the line this far with type-and-arity-specialized sequences for varargs:

% qscalac -d /tmp test/files/run/tspecialize-varargs.scala && qscala -cp /tmp Test 
Elapsed: 1.916
Elapsed: 1.832
Elapsed: 3.419

% scalac3 -d /tmp test/files/run/tspecialize-varargs.scala && scala3 -cp /tmp Test 
Elapsed: 2.646
Elapsed: 2.507
Elapsed: 4.266

If you think it should have moved further, or would like to improve the test, I'm all ears.

Rex Kerr

Oct 24, 2012, 1:45:07 PM
to scala-i...@googlegroups.com, Daniel Spiewak
Looks good to me--about what I'd have expected, and pushes the "you must optimize" boundary a nice bit further away.  Maybe Daniel can comment on the "order of magnitude" thing vs. the observed ~4/3.

  --Rex

P.S. Crikey--how'd you get that done so fast?

Daniel Spiewak

Oct 24, 2012, 1:48:54 PM
to Rex Kerr, scala-i...@googlegroups.com
The "order of magnitude" was going off of some benchmarking I did a while back with Anti-XML and small collections.  We discovered that by specializing Vector for cases 0-4, we were able to obtain a massive overall improvement, and an order of magnitude at some call sites.  I hadn't done benchmarking with varargs specifically, and a number of things have changed since I did the original testing with Anti-XML (in the 2.8.x timeline).  ~4/3 sounds within the margin of expectation, given how I was carrying results over from a slightly-related benchmark.

So…can we fix the Seq alias now?  :-)

Daniel

Alex Cruise

Oct 24, 2012, 2:16:30 PM
to scala-i...@googlegroups.com, Rex Kerr
On Wed, Oct 24, 2012 at 10:48 AM, Daniel Spiewak <djsp...@gmail.com> wrote:
So…can we fix the Seq alias now?  :-)
 
Does this have the potential to break existing code?  

-0xe1a

Daniel Spiewak

Oct 24, 2012, 2:17:32 PM
to scala-i...@googlegroups.com, Rex Kerr
Yes, but arguably any such existing code was dubious to begin with.

Daniel

Rex Kerr

Oct 24, 2012, 5:15:17 PM
to Daniel Spiewak, scala-i...@googlegroups.com
On Wed, Oct 24, 2012 at 1:48 PM, Daniel Spiewak <djsp...@gmail.com> wrote:
So…can we fix the Seq alias now?  :-)

Er, no?  You were saying that we should fix Seq because who cares if it's slower because varargs is already way too slow.  Paul just made varargs faster.  That is the opposite of making it a better idea to fix the Seq alias, unless we have benchmarks showing that the immutable Seq solution isn't too slow.  I didn't write any such benchmarks--I only wrote those for tiny collections with varargs since you claimed they were slow.

If Paul's speedup change is also entirely immutable (I didn't quite catch whether that was the case), then we just need one more benchmark to check the large-existing-collection case and see how bad it is.  Then we can make an informed decision.  If it's not entirely immutable, we'd also need to know how fast or slow the immutable solution was.

  --Rex

Daniel Spiewak

Oct 24, 2012, 5:27:22 PM
to scala-i...@googlegroups.com
My point is that the question of performance is an artificial conflation of concerns.  I spend a good deal of time working on super-tight, performance sensitive code in Scala.  If performance is even a question in my mind, I'm simply not going to use varargs in any form.  Whether we fix the Seq alias or not is irrelevant to such code.

In my experience, and in my day job, the Seq alias change is a question of pure correctness and usability of the language.  I raised the point about small varargs sizes in an attempt to illustrate how tangential the entire performance question really is in this discussion.  If varargs performance were such a serious concern before, someone would have sat down and made an attempt to optimize some of these common cases.  If it wasn't a serious concern before, why should it be a concern now?  Much less a concern affecting a language decision which has essentially no impact on serious performance-sensitive code?

There are a lot of things in Scala and in the standard library that are just horrendously slow.  There are obvious things like structural types, but also somewhat less obvious things like BitSet.  Despite the performance problems, these features still exist and are the way they are for a good reason.  People using the language just have to be aware of the implications of these features and design accordingly.  My argument is that varargs has traditionally fallen (and should continue to fall) in the same category: a convenience feature that is used judiciously with eyes open.

Fixing the Seq alias neither rectifies nor exacerbates this situation.

Daniel

Rex Kerr

Oct 24, 2012, 6:01:04 PM
to scala-i...@googlegroups.com
On Wed, Oct 24, 2012 at 5:27 PM, Daniel Spiewak <djsp...@gmail.com> wrote:
My point is that the question of performance is an artificial conflation of concerns.  I spend a good deal of time working on super-tight, performance sensitive code in Scala.

Yes, I do too.
 
  If performance is even a question in my mind, I'm simply not going to use varargs in any form.  Whether we fix the Seq alias or not is irrelevant to such code.

It is irrelevant to the most-heavily-executed bits of code.  But there's a lot _more_ code, the not-the-inner-loop-but-still-executed-kind-of-a-lot code, that one has to worry about.  These things add up, as you have probably noticed from your profiling.  Do I use a for loop or do I use while?  Well, that depends.  I'd rather use the for loop, but in the innermost loop of time-sensitive code I don't even think about it: it's while or recursion, only.  In the next loop out, if the inner loop is small, I _still_ can't use for.  The slower for is, the more inner iterations I need before I can decide to not worry about it and use the compact expressive form instead of the highest-performance form.

Surely you do this also.

If you speed up the library, then you get to use compact expressive code in more places.  If you slow it down, you get to use it in fewer places (or your program just gets slower).
 
In my experience, and in my day job, the Seq alias change is a question of pure correctness and usability of the language.  I raised the point about small varargs sizes in an attempt to illustrate how tangential the entire performance question really is in this discussion.  If varargs performance were such a serious concern before, someone would have sat down and made an attempt to optimize some of these common cases.  If it wasn't a serious concern before, why should it be a concern now?  Much less a concern affecting a language decision which has essentially no impact on serious performance-sensitive code?

Because as you saw, the change _wasn't actually that big_.  Varargs already were pretty good.  If they suddenly got 10x or 100x slower, it might require optimization of code that didn't need it before.

Likewise with anything else taking a collection.Seq that would now have to take a collection.immutable.Seq.  I really don't want to have to make a defensive copy of every array I pass into something annotated with Seq, I don't think.  Maybe a defensive wrapper is okay--but then collection.Seq already *is* a defensive wrapper (well, defensive upcast) because it doesn't have any mutating methods.
 
There are a lot of things in Scala and in the standard library that are just horrendously slow.  There are obvious things like structural types, but also somewhat less obvious things like BitSet.

Yes.  Many maps too.

What I conclude is that things need work, like immutable RB trees did.  They're much improved in 2.10.

I don't conclude that we should make more things slow, just because some other things are slow.
 
Despite the performance problems, these features still exist and are the way they are for a good reason.  People using the language just have to be aware of the implications of these features and design accordingly.

Agreed, and implementing "just-Seq is immutable.Seq" changes all these implications that were hopefully thought about before when it mattered.  It's not something to do lightly if you care about performance.  If there are really no substantive downsides, then I'm all for it.  I prefer immutability by default, mutability by request.  But I'm not at all convinced yet that there are no substantive downsides.

Let me give you an example of something that *might* make a difference for immutable Seq in general (not varargs), picked at random out of a grep "Seq".  I have a line of code that looks like

  def decode(encoded: Seq[String], format: String)

in a little custom base64 library.  I'd get the Seq[String] as s.split(' '), which means I've got an array.  Adding the cost of a single-object wrapper isn't too bad.  What about copying the entire array?  Is that enough to matter for my use case (which is decoding a bunch of small binary data packed into a string)?  Don't know.  Probably not, but I might have to check again if parsing performance goes down too much, especially in the case where I'm only trying to extract a subset of the items (which I can control by altering the format string).
 
My argument is that varargs has traditionally fallen (and should continue to fall) in the same category: a convenience feature that is used judiciously with eyes open.

I think we should fix these things so we can use them as injudiciously as possible.
 
Fixing the Seq alias neither rectifies nor exacerbates this situation.

I think it does/can exacerbate, for the reasons described above.  Again, it's not a no, but it's a proceed-with-care.  I don't think we've done quite enough due diligence to even change the vararg to immutable.Seq, let alone the default Seq to immutable.Seq.

  --Rex

Naftoli Gugenheim

Oct 24, 2012, 7:36:59 PM
to scala-i...@googlegroups.com
Why is the implementation of varargs dependent on what Seq means by default when I write it in my code?
Spec varargs to use scala.collection.Seq, and change the type alias scala.Seq to point to s.c.i.Seq. What's the problem with that?

Daniel Sobral

Oct 24, 2012, 11:37:54 PM
to scala-i...@googlegroups.com
On Wed, Oct 24, 2012 at 9:36 PM, Naftoli Gugenheim <nafto...@gmail.com> wrote:
> Why is the implementation of varargs dependent on what Seq means by default
> when I write it in my code?
> Spec varargs to use scala.collection.Seq, and change the type alias
> scala.Seq to point to s.c.i.Seq. What's the problem with that?

I'm almost sure it was like that at one time before, and people kept
getting confused about "val s: Seq[T] = args" not working.
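A sketch of that confusion, assuming varargs stayed at collection.Seq while the unqualified Seq alias pointed at immutable.Seq (illustrative only):

  object VarargsMismatch {
    def demo[T](args: T*): Unit = {
      // val s: Seq[T] = args               // would not compile: args is only a collection.Seq[T]
      val s: scala.collection.Seq[T] = args // the qualified type users would have to spell out
      println(s.length)
    }
  }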

>
>
> On Wed, Oct 24, 2012 at 7:26 AM, Heiko Seeberger <heiko.s...@gmail.com>
> wrote:
>>
>> On Oct 22, 2012, at 9:46 PM, martin odersky <martin....@epfl.ch>
>> wrote:
>>
>> > I am not sure. Java uses mutable arrays. We will no doubt lose
>> > considerable performance in the interest of purity. Can we afford it? Right
>> > now, "slow collections" is one of the main arguments Scala detractors put
>> > forward.
>> >
> > And, it seems nobody in Java land cares one iota about loss of purity
>> > anyway. But they will hang us on the performance argument.
>>
>> I am not concerned about purity (here), but about correctness. Even very
>> very experienced Scala developers working for some company offering
>> commercial support for Scala and the likes introduced bugs (the possibility
>> of leaking mutability into an immutable API) into their otherwise
>> outstanding software.
>>
>> In my opinion Seq -> collection.Seq is too dangerous to stay alive. If
>> vararg performance suffers, I consider this less critical.
>>
>> Heiko
>>
>



Roland Kuhn

Oct 25, 2012, 1:03:09 AM
to scala-i...@googlegroups.com

On 25 Oct 2012, at 00:01, Rex Kerr wrote:

In my experience, and in my day job, the Seq alias change is a question of pure correctness and usability of the language.  I raised the point about small varargs sizes in an attempt to illustrate how tangential the entire performance question really is in this discussion.  If varargs performance were such a serious concern before, someone would have sat down and made an attempt to optimize some of these common cases.  If it wasn't a serious concern before, why should it be a concern now?  Much less a concern affecting a language decision which has essentially no impact on serious performance-sensitive code?

Because as you saw, the change _wasn't actually that big_.  Varargs already were pretty good.  If they suddenly got 10x or 100x slower, it might require optimization of code that didn't need it before.

Likewise with anything else taking a collection.Seq that would now have to take a collection.immutable.Seq.  I really don't want to have to make a defensive copy of every array I pass into something annotated with Seq, I don't think.  Maybe a defensive wrapper is okay--but then collection.Seq already *is* a defensive wrapper (well, defensive upcast) because it doesn't have any mutating methods.
 
You are arguing besides the main point: all data structures which are not specially imported from collection.mutable are names for the immutable version, except for Seq.

  def makeDecider(trapExit: Seq[Class[_ <: Throwable]]): Decider =
    { case x ⇒ if (trapExit exists (_ isInstance x)) Restart else Escalate }

This seemingly correct code (incidentally written by YT) is in fact a bad bug, since you could pass in an Array[] and happily observe all the breakage which ensues after changing the array’s content afterwards (thinking that the immutable Decider must surely be okay).

collection.Seq not having mutators is not at all a valid defense!

Now ask yourself how many Scala developers would actually have noticed this bug during a review (quite a few very experienced ones actually missed it). Set, Seq, Map, …

The very same argument holds for all the other odd ones (TraversableOnce, Traversable, Iterable, IndexedSeq). Correctness trumps performance, anytime.

Regards,

Roland Kuhn
Typesafe – The software stack for applications that scale.
twitter: @rolandkuhn


Erik Osheim

Oct 25, 2012, 1:12:17 AM
to scala-i...@googlegroups.com
On Thu, Oct 25, 2012 at 07:03:09AM +0200, Roland Kuhn wrote:
> You are arguing besides the main point: all data structures which are
> not specially imported from collection.mutable are names for the
> immutable version, except for Seq.

So if Seq were changed to forbid Array, then what type would one use to
generalize across List, Vector, and Array? Or is it the idea that this
generalization exists that you object to?

-- Erik

Naftoli Gugenheim

Oct 25, 2012, 1:21:26 AM
to scala-i...@googlegroups.com
On Thu, Oct 25, 2012 at 1:12 AM, Erik Osheim <er...@plastic-idolatry.com> wrote:
On Thu, Oct 25, 2012 at 07:03:09AM +0200, Roland Kuhn wrote:
> You are arguing besides the main point: all data structures which are
> not specially imported from collection.mutable are names for the
> immutable version, except for Seq.

So if Seq were changed to forbid Array, then what type would one use to
generalize across List, Vector, and Array?

collection.Seq

Roland Kuhn

Oct 25, 2012, 1:23:30 AM
to scala-i...@googlegroups.com, scala-i...@googlegroups.com


On 25 Oct 2012, at 07:21, Naftoli Gugenheim <nafto...@gmail.com> wrote:



On Thu, Oct 25, 2012 at 1:12 AM, Erik Osheim <er...@plastic-idolatry.com> wrote:
On Thu, Oct 25, 2012 at 07:03:09AM +0200, Roland Kuhn wrote:
> You are arguing besides the main point: all data structures which are
> not specially imported from collection.mutable are names for the
> immutable version, except for Seq.

So if Seq were changed to forbid Array, then what type would one use to
generalize across List, Vector, and Array?

collection.Seq
 
Precisely! Allowing mutable collections should be a conscious choice.

Regards,

Roland

Heiko Seeberger

Oct 25, 2012, 1:34:44 AM
to scala-i...@googlegroups.com
On Oct 25, 2012, at 7:03 AM, Roland Kuhn <goo...@rkuhn.info> wrote:

> Correctness trumps performance, anytime.

Correctness is the exact and only reason why I started this thread.
Amen

Heiko

Kevin Wright

Oct 25, 2012, 1:57:09 AM
to scala-i...@googlegroups.com

+1

Seth Tisue

Oct 25, 2012, 7:37:22 AM
to scala-i...@googlegroups.com
I'm with Heiko on this. It's hard to say which is worse, the
unsafeness or the inconsistency. Combine them, and it's like being
stabbed by your two best friends from two different directions.

This has been bugging me for years. It nags at me practically every
time I type S-e-q, and that's a lot.

Seriously, I'd love it if this were changed.

--
Seth Tisue | Northwestern University | http://tisue.net
developer, NetLogo: http://ccl.northwestern.edu/netlogo/

√iktor Ҡlang

Oct 25, 2012, 7:47:31 AM
to scala-i...@googlegroups.com
On Thu, Oct 25, 2012 at 1:37 PM, Seth Tisue <se...@tisue.net> wrote:
I'm with Heiko on this. It's hard to say which is worse, the
unsafeness or the inconsistency. Combine them, and it's like being
stabbed by your two best friends from two different directions.

This has been bugging me for years. It nags at me practically every
time I type S-e-q, and that's a lot.

Seriously, I'd love it if this were changed.

Also related:

collection.Seq.to[immutable.Seq] doesn't short-circuit if the collection.Seq IS-A immutable.Seq already
 

--
Seth Tisue | Northwestern University | http://tisue.net
developer, NetLogo: http://ccl.northwestern.edu/netlogo/



--
Viktor Klang

Akka Tech Lead
Typesafe - The software stack for applications that scale

Twitter: @viktorklang

Josh Suereth

Oct 25, 2012, 8:43:32 AM
to scala-i...@googlegroups.com

.to never short circuits.

Roland Kuhn

Oct 25, 2012, 8:52:11 AM
to scala-i...@googlegroups.com
Yes, but why is this desirable?

Roland Kuhn
Typesafe – The software stack for applications that scale.
twitter: @rolandkuhn


√iktor Ҡlang

Oct 25, 2012, 8:55:17 AM
to scala-i...@googlegroups.com


On Thu, Oct 25, 2012 at 2:43 PM, Josh Suereth <joshua....@gmail.com> wrote:

.to never short circuits.

The usefulness of it is extremely diminished if I need to write this every time:

def smth(x: Seq[Int]) {
  val realX = x match {
    case x: immutable.Seq[_] => x.asInstanceOf[immutable.Seq[Whathaveyou]]
    case other => other.to[immutable.Seq]
  }
}

Josh Suereth

Oct 25, 2012, 8:56:00 AM
to scala-i...@googlegroups.com
Never said it was desirable.  It's more a limitation currently.  If you have a mechanism/fix for that, please submit!  I have ideas for how to make it so, but none of my initial attempts panned out.   Remember the method was initially called 'buildTo' to denote this.   The other to methods can short circuit because they use the class hierarchy.   The to method will have to use some fancy type tricks on top of the existing fancy type tricks.

√iktor Ҡlang

Oct 25, 2012, 9:01:14 AM
to scala-i...@googlegroups.com
On Thu, Oct 25, 2012 at 2:56 PM, Josh Suereth <joshua....@gmail.com> wrote:
Never said it was desirable.  It's more a limitation currently.  If you have a mechanism/fix for that, please submit!  I have ideas for how to make it so, but none of my initial attempts panned out.   Remember the method was initially called 'buildTo' to denote this.   The other to methods can short circuit because they use the class hierarchy.   The to method will have to use some fancy type tricks on top of the existing fancy type tricks.

Haven't thought much about it, but instinctively I feel like it should be a method on CanBuildFrom, or we reify To and check if this isInstanceOf To

Josh Suereth

Oct 25, 2012, 9:37:02 AM
to scala-i...@googlegroups.com
On Thu, Oct 25, 2012 at 9:01 AM, √iktor Ҡlang <viktor...@gmail.com> wrote:


On Thu, Oct 25, 2012 at 2:56 PM, Josh Suereth <joshua....@gmail.com> wrote:
Never said it was desirable.  It's more a limitation currently.  If you have a mechanism/fix for that, please submit!  I have ideas for how to make it so, but none of my initial attempts panned out.   Remember the method was initially called 'buildTo' to denote this.   The other to methods can short circuit because they use the class hierarchy.   The to method will have to use some fancy type tricks on top of the existing fancy type tricks.

Haven't thought much about it, but instinctively I feel like it should be a method on CanBuildFrom, or we reify To and check if this isInstanceOf To
 
We could reify the ClassTag, but types are not in the standard library.  My instinct said it belongs in CanBuildFrom as well, but that's part of a much more aggressive collection refactoring.

- Josh

Simon Ochsenreither

Oct 25, 2012, 9:50:06 AM
to scala-i...@googlegroups.com

We could reify the ClassTag, but types are not in the standard library.

I have thought a lot about this for a long time ... any reason why we shouldn't move over to Type/Class-tagged collections by default? 4/8 bytes overhead per collection instance vs. primitive arrays and more intuitive pattern matching semantics?

Josh Suereth

Oct 25, 2012, 9:52:55 AM
to scala-i...@googlegroups.com
For Types, it's because Types are a lot of overhead (the whole scala-reflect).

For Classes, it's not a bad idea except for collections of things that aren't quite a class.

val x: Seq[{ def close(): Unit }] = Seq(System.in, System.out)


- Josh

Simon Ochsenreither

Oct 25, 2012, 10:03:09 AM
to scala-i...@googlegroups.com
Could the compiler choose a parent type that's a valid Java class?

√iktor Ҡlang

Oct 25, 2012, 10:06:01 AM
to scala-i...@googlegroups.com


On Thu, Oct 25, 2012 at 4:03 PM, Simon Ochsenreither <simon.och...@gmail.com> wrote:
Could the compiler choose a parent type that's a valid Java class?

How would it handle: def x[T](f: T) = Seq(f)

Rex Kerr

Oct 25, 2012, 10:11:48 AM
to scala-i...@googlegroups.com
On Thu, Oct 25, 2012 at 1:03 AM, Roland Kuhn <goo...@rkuhn.info> wrote:

On 25 Oct 2012, at 00:01, Rex Kerr wrote:

In my experience, and in my day job, the Seq alias change is a question of pure correctness and usability of the language.  I raised the point about small varargs sizes in an attempt to illustrate how tangential the entire performance question really is in this discussion.  If varargs performance were such a serious concern before, someone would have sat down and made an attempt to optimize some of these common cases.  If it wasn't a serious concern before, why should it be a concern now?  Much less a concern affecting a language decision which has essentially no impact on serious performance-sensitive code?

Because as you saw, the change _wasn't actually that big_.  Varargs already were pretty good.  If they suddenly got 10x or 100x slower, it might require optimization of code that didn't need it before.

Likewise with anything else taking a collection.Seq that would now have to take a collection.immutable.Seq.  I really don't want to have to make a defensive copy of every array I pass into something annotated with Seq, I don't think.  Maybe a defensive wrapper is okay--but then collection.Seq already *is* a defensive wrapper (well, defensive upcast) because it doesn't have any mutating methods.
 
You are arguing besides the main point: all data structures which are not specially imported from collection.mutable are names for the immutable version, except for Seq.

I'm arguing a different aspect of the point, I think.  I agree that ideally the defaults would be consistent and would be immutable.  The question is how big of an impact it is to change at this point.  In particular, is it okay to just switch from collection.Seq to collection.immutable.Seq at some point, or does one need to figure out how to have a deprecation period?  If it's an inconsequential change, you just fix it.  My point is that it is _not_ necessarily an inconsequential change for people who have high-performance code, and that the fix is less trivial than many things that are deprecated (like removing a method) because it will presumably massively affect the library interface also, and thus force people to re-optimize if they want to maintain performance.
 

  def makeDecider(trapExit: Seq[Class[_ <: Throwable]]): Decider =
    { case x ⇒ if (trapExit exists (_ isInstance x)) Restart else Escalate }

This seemingly correct code (incidentally written by YT) is in fact a bad bug, since you could pass in an Array[] and happily observe all the breakage which ensues after changing the array’s content afterwards (thinking that the immutable Decider must surely be okay).

collection.Seq not having mutators is not at all a valid defense!

Agreed.  I wasn't really thinking it was enough defense, just that it's as much defense as you can get without having to think about performance a lot.

If you ask whether I would have noticed the code that you wrote above as a bug, the answer is "probably not".  But if you ask instead whether I would have caught a bug where you took an array, passed it in somewhere, and then modified it: probably yes!  If you spend much time working with mutable data structures, it's a good idea to get used to being careful about mutating the data out from under you.  It is kind of a gotcha if you don't use mutable data structures very often, I'll agree, and that's why I agree it would have been better to have Seq be immutable.Seq by default.
 
The very same argument holds for all the other odd ones (TraversableOnce, Traversable, Iterable, IndexedSeq). Correctness trumps performance, anytime.

As long as there is a high-performance route, I agree.  Otherwise, no that's not a given; people are willing to spend some extra thought to gain extra performance if they can, sometimes.  That the computation-speed-minded people are to be ignored in favor of the coding-speed-minded people is not a given.  One of Scala's claims to fame is that it can be used to write high-performance code.  I agree that a bias towards correctness is good, but one has to ask what the tradeoff really is.

In this particular case, I think high-performance options can still exist since collection.Seq still exists.  But do you think we can change the interface of the ~215 instances of Seq in defs in the Scala library without thinking about the impact on performance, or deciding which of those really need to be collection.Seq and which need to be collection.immutable.Seq?

Also, I'm not sure I agree about TraversableOnce etc.--there is something to be said for easy abstraction as well as correctness.

  --Rex

Matthew Pocock

Oct 25, 2012, 10:39:39 AM
to scala-i...@googlegroups.com
Yeah, I think this is part of a more aggressive collection refactoring. My first thought about how to support coll.to[C1, C2] was that there should be a typeclass that captures this operation, CollectionTo, which coll.to delegates to. One instance would require an <:< instance and do the cast, another instance would delegate to a CanBuildFrom instance and handle converting a generic collection to a buildable type, and then perhaps we'd provide hand-rolled ones to handle things like [SortedSet, SortedSet] that need to pipe the ordering through correctly, and so on. This would let us get the right behaviour for the different contexts without runtime instanceof checks or relying on implementors overriding the correct magical hooks or trusting to the vagaries of trait linearisation.
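A minimal sketch of such a typeclass (hypothetical names, assuming the existing CanBuildFrom machinery):

  import scala.collection.generic.CanBuildFrom

  trait CollectionTo[-From, +To] { def convert(from: From): To }

  trait LowPriorityCollectionTo {
    // Generic case: rebuild through a CanBuildFrom, as `to` effectively does today.
    implicit def buildTo[A, C2](implicit cbf: CanBuildFrom[Nothing, A, C2]): CollectionTo[Traversable[A], C2] =
      new CollectionTo[Traversable[A], C2] {
        def convert(from: Traversable[A]): C2 = { val b = cbf(); b ++= from; b.result() }
      }
  }

  object CollectionTo extends LowPriorityCollectionTo {
    // Short-circuit case: the source already conforms to the target, so no copy is made.
    implicit def alreadyConforms[C1, C2](implicit ev: C1 <:< C2): CollectionTo[C1, C2] =
      new CollectionTo[C1, C2] { def convert(from: C1): C2 = ev(from) }
  }

A `to`-style method could then just take an implicit CollectionTo and delegate to `convert`.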

This ship has sailed for 2.10, but it would be really nice to revisit collections once value classes and macros have settled a bit, with a view to getting a clean separation between the different collections interfaces and the data structures that underpin these.

Matthew

--
Dr Matthew Pocock
Integrative Bioinformatics Group, School of Computing Science, Newcastle University
skype: matthew.pocock
tel: (0191) 2566550

√iktor Ҡlang

Oct 25, 2012, 11:07:23 AM
to scala-i...@googlegroups.com
On Thu, Oct 25, 2012 at 4:11 PM, Rex Kerr <ich...@gmail.com> wrote:


On Thu, Oct 25, 2012 at 1:03 AM, Roland Kuhn <goo...@rkuhn.info> wrote:

On 25 Oct 2012, at 00:01, Rex Kerr wrote:

In my experience, and in my day job, the Seq alias change is a question of pure correctness and usability of the language.  I raised the point about small varargs sizes in an attempt to illustrate how tangential the entire performance question really is in this discussion.  If varargs performance were such a serious concern before, someone would have sat down and made an attempt to optimize some of these common cases.  If it wasn't a serious concern before, why should it be a concern now?  Much less a concern affecting a language decision which has essentially no impact on serious performance-sensitive code?

Because as you saw, the change _wasn't actually that big_.  Varargs already were pretty good.  If they suddenly got 10x or 100x slower, it might require optimization of code that didn't need it before.

Likewise with anything else taking a collection.Seq that would now have to take a collection.immutable.Seq.  I really don't want to have to make a defensive copy of every array I pass into something annotated with Seq, I don't think.  Maybe a defensive wrapper is okay--but then collection.Seq already *is* a defensive wrapper (well, defensive upcast) because it doesn't have any mutating methods.
 
You are arguing besides the main point: all data structures which are not specially imported from collection.mutable are names for the immutable version, except for Seq.

I'm arguing a different aspect of the point, I think.  I agree that ideally the defaults would be consistent and would be immutable. 

Yeah

In predef:

  type Map[A, +B] = immutable.Map[A, B]
  type Set[A]     = immutable.Set[A]

exercise for the reader:

  type Seq[A] = ???
 
The question is how big of an impact it is to change at this point.  In particular, is it okay to just switch from collection.Seq to collection.immutable.Seq at some point, or does one need to figure out how to have a deprecation period?  If it's an inconsequential change, you just fix it.  My point is that it is _not_ necessarily an inconsequential change for people who have high-performance code, and that the fix is less trivial than many things that are deprecated (like removing a method) because it will presumably massively affect the library interface also, and thus force people to re-optimize if they want to maintain performance.
 

  def makeDecider(trapExit: Seq[Class[_ <: Throwable]]): Decider =
    { case x ⇒ if (trapExit exists (_ isInstance x)) Restart else Escalate }

This seemingly correct code (incidentally written by YT) is in fact a bad bug, since you could pass in an Array[] and happily observe all the breakage which ensues after changing the array’s content afterwards (thinking that the immutable Decider must surely be okay).

collection.Seq not having mutators is not at all a valid defense!

Agreed.  I wasn't really thinking it was enough defense, just that it's as much defense as you can get without having to think about performance a lot.

If you ask whether I would have noticed the code that you wrote above as a bug, the answer is "probably not".  But if you ask instead whether I would have caught a bug where you took an array, passed it in somewhere, and then modified it: probably yes!  If you spend much time working with mutable data structures, it's a good idea to get used to being careful about mutating the data out from under you.  It is kind of a gotcha if you don't use mutable data structures very often, I'll agree, and that's why I agree it would have been better to have Seq be immutable.Seq by default.
 
The very same argument holds for all the other odd ones (TraversableOnce, Traversable, Iterable, IndexedSeq). Correctness trumps performance, anytime.

As long as there is a high-performance route, I agree.  Otherwise, no that's not a given; people are willing to spend some extra thought to gain extra performance if they can, sometimes.  That the computation-speed-minded people are to be ignored in favor of the coding-speed-minded people is not a given.  One of Scala's claims to fame is that it can be used to write high-performance code.  I agree that a bias towards correctness is good, but one has to ask what the tradeoff really is.

In this particular case, I think high-performance options can still exist since collection.Seq still exists.  But do you think we can change the interface of the ~215 instances of Seq in defs in the Scala library without thinking about the impact on performance, or deciding which of those really need to be collection.Seq and which need to be collection.immutable.Seq?

Also, I'm not sure I agree about TraversableOnce etc.--there is something to be said for easy abstraction as well as correctness.

  --Rex




--
Viktor Klang

Akka Tech Lead
Typesafe - The software stack for applications that scale

Twitter: @viktorklang

Heiko Seeberger

Oct 25, 2012, 11:34:18 AM
to scala-i...@googlegroups.com
OK, it is an official issue now: https://issues.scala-lang.org/browse/SI-6570

IMHO we should generate some deprecation or warning for Scala 2.10 (if this is possible and in line with the release policy) and apply the change in 2.11.

Heiko

Paolo G. Giarrusso

Oct 25, 2012, 12:57:12 PM
to scala-i...@googlegroups.com
In this code:

def foo(varargs: Any*) = println(varargs.getClass())
foo(1, 2, 3) //prints "class scala.collection.mutable.WrappedArray$ofRef".

the compiler could wrap Array(1, 2, 3) in a different, immutable wrapper without any extra copy.
And I'd really like the compiler to do that.
Discussion below.

On Wednesday, October 24, 2012 23:01:06 UTC+1, Rex Kerr wrote:
If you speed up the library, then you get to use compact expressive code in more places.  If you slow it down, you get to use it in fewer places (or your program just gets slower).
I see that argument a lot and usually like it. But I like it when it does not mean worsening the semantics: the right tradeoff is to give the programmer the right tools (think garbage collection) while using very fancy techniques (say, the result of 40 years of research) to make them fast. If you apply this indefinitely, you get something like Self (a much better and extremely cool Smalltalk).

I guess here we might not have those techniques (but see below), and maybe what you suggest is the right decision, but this argument is still much weaker.

In my experience, and in my day job, the Seq alias change is a question of pure correctness and usability of the language.  I raised the point about small varargs sizes in an attempt to illustrate how tangential the entire performance question really is in this discussion.  If varargs performance were such a serious concern before, someone would have sat down and made an attempt to optimize some of these common cases.  If it wasn't a serious concern before, why should it be a concern now?  Much less a concern affecting a language decision which has essentially no impact on serious performance-sensitive code?

Because as you saw, the change _wasn't actually that big_.  Varargs already were pretty good.  If they suddenly got 10x or 100x slower, it might require optimization of code that didn't need it before.

Likewise with anything else taking a collection.Seq that would now have to take a collection.immutable.Seq.  I really don't want to have to make a defensive copy of every array I pass into something annotated with Seq, I don't think.  Maybe a defensive wrapper is okay--but then collection.Seq already *is* a defensive wrapper (well, defensive upcast) because it doesn't have any mutating methods.

Do you pass many arrays as varargs explicitly? In my code, there are many more arrays being passed _implicitly_, which is extremely annoying, because the compiler uses Array by default:

def foo(varargs: Any*) = println(varargs.getClass())
foo(1, 2, 3) //prints "class scala.collection.mutable.WrappedArray$ofRef".

Now, the hot path we're talking about does not include mutating varargs. I'm sure there can be cases when you do that, and maybe important cases, but that's definitely not the most common choice. So, why does the compiler automatically create a _mutable_ wrapper to the array? It could create an immutable wrapper without introducing any copy in the hot path. However, creating such a wrapper is an "unsafe" operation: you can call it only if you know that the raw array is not used afterward and cannot be extracted from the wrapper. But in the above code, Array(1, 2, 3) is not going to be aliased, so you don't need the copy.

Today, otherwise immutable case classes become mutable because of varargs, and that's really bad and serves little performance purpose. And I think this should change even if Seq keeps pointing to collection.Seq.

That's the only thing I take issue with. If you have a mutable sequence, I accept that you really want to pass it to a variadic function.

Paolo

Stefan Zeiger

Oct 26, 2012, 6:48:07 AM
to scala-i...@googlegroups.com
On 2012-10-25 15:37, Josh Suereth wrote:
Haven't thought much about it, but instinctively I feel like it should be a method on CanBuildFrom, or we reify To and check if this isInstanceOf To
 
We could reify the ClassTag, but types are not in the standard library.  My instinct said it belongs in CanBuildFrom as well, but that's part of a much more aggressive collection refactoring.

In that case, I'll just point to my instinct from 4 months ago when this was discussed before: https://groups.google.com/d/msg/scala-internals/q7tVEOItmP8/ADqNrTipxTkJ

Pavel Pavlov

Oct 27, 2012, 7:00:35 AM
to scala-i...@googlegroups.com


On Thursday, October 25, 2012 11:57:12 PM UTC+7, Paolo G. Giarrusso wrote:

Do you pass many arrays as varargs explicitly? In my code, there's many more arrays being passed _implicitly_, which is extremely annoying, because the compiler uses Array by default:

def foo(varargs: Any*) = println(varargs.getClass())
foo(1, 2, 3) //prints "class scala.collection.mutable.WrappedArray$ofRef".

Now, the hot path we're talking about does not include mutating varargs. I'm sure there can be cases when you do that, and maybe important cases, but that's definitely not the most common choice. So, why does the compiler automatically create a _mutable_ wrapper to the array? It could create an immutable wrapper without introducing any copy in the hot path. However, creating such a wrapper is an "unsafe" operation: you can call it only if you know that the raw array is not used afterward and cannot be extracted from the wrapper. But in the above code, Array(1, 2, 3) is not going to be aliased, so you don't need the copy.

 
Why not follow this path all the way and create a fully functional immutable array class in the standard library? Something like this:

========================
// Sketch only: assumed to live inside the scala.collection package (hence private[collection]);
// imports and ClassTag plumbing are approximate, remaining members elided.
import scala.collection.mutable.{ArrayBuilder, Builder}
import scala.reflect.ClassTag

class ImmArray[+T] private (data: Array[T]) extends immutable.IndexedSeq[T] {
  def length = data.length
  def apply(i: Int) = data(i)
  def toArray = data.clone()   // defensive copy on the way out
  ...
}

object ImmArray {
  // "unsafe": the caller promises the array is not aliased anywhere else
  private[collection] def makeNoCopy[T](arr: Array[T]) = new ImmArray(arr)

  def make[T: ClassTag](xs: collection.Seq[T]): ImmArray[T] = xs match {
    case immArray: ImmArray[T] => immArray
    case imm: immutable.Seq[T] => makeNoCopy(imm.toArray)         // already immutable, no defensive copy
    case _                     => makeNoCopy(xs.toArray.clone())  // possibly mutable or sharing its storage; clone to be safe
  }

  def apply[T: ClassTag](xs: T*) = make(xs)

  def fill[T: ClassTag](n: Int)(elem: => T) = makeNoCopy(Array.fill(n)(elem))
  def tabulate[T: ClassTag](n: Int)(f: Int => T) = makeNoCopy(Array.tabulate(n)(f))
  def iterate[T: ClassTag](start: T, len: Int)(f: T => T) = makeNoCopy(Array.iterate(start, len)(f))

  def newBuilder[T: ClassTag]: Builder[T, ImmArray[T]] = ArrayBuilder.make[T]() mapResult makeNoCopy
  ...
}
========================

With it, the only thing the compiler would have to do when passing varargs to a method is to wrap the collected varargs array with `ImmArray.makeNoCopy`.
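Roughly, the desugaring would change as sketched below (this assumes the ImmArray sketch above; `foo` is just an illustrative method, and the "emitted" forms are approximations rather than literal compiler output):

def foo(xs: Any*): Unit = println(xs)

foo(1, 2, 3)
// emitted today, roughly:
//   foo(scala.Predef.genericWrapArray(Array[Any](1, 2, 3)): _*)   // mutable WrappedArray view
// under the proposal, roughly:
//   foo(ImmArray.makeNoCopy(Array[Any](1, 2, 3)): _*)             // same array, immutable wrapper, no copy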

I believe this class will be very useful beyond varargs processing:
 - it can have public creation methods symmetric to those in Array object: apply, fill, tabulate, iterate, etc.
 - it can be "specialized" to avoid boxing of primitives (like WrappedArray.ofXXX)
 - ImmArray can be made covariant on its element type
 - an ImmArrayBuilder that works like ListBuffer (i.e. copies the exported collection lazily) can be implemented; or better, a method `toImmArray` with the same semantics can be added to ArrayBuffer
 - some collection implementations can use `ImmArray.makeNoCopy` internally to export internal arrays they do not mutate
 - (arguably) `makeNoCopy` can be made accessible to those users who need maximum performance while using "trust me, nobody would dare alter this array"-style interfaces (e.g. for Java interop or legacy code).

So, with this class we'd get the fastest/smallest random-access immutable covariant collection backed by an array.
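A hedged usage sketch, assuming the ImmArray outline above compiles roughly as written (the class does not exist in the standard library today):

val xs: ImmArray[Int] = ImmArray(1, 2, 3)   // immutable, array-backed, O(1) indexing
val ys: ImmArray[Any] = xs                  // covariant, unlike Array[Int]
val zs = ImmArray.fill(3)("x")              // factories mirroring the Array companion
println(xs(0) + zs.length)                  // 1 + 3 = 4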

What do you think?


Regards,
Pavel

Paul Phillips

unread,
Oct 27, 2012, 10:21:31 AM10/27/12
to scala-i...@googlegroups.com


On Sat, Oct 27, 2012 at 4:00 AM, Pavel Pavlov <pavel.e...@gmail.com> wrote:
What do you think?

I've thought about implementing an immutable array a number of times; it does seem to fill a need. I only wish we had some kind of uniqueness capability which would allow for promoting an existing array to an immutable array with verification that there are no more references to it running around elsewhere. But it sounds plenty useful just the way you describe it.

Matthew Pocock

unread,
Oct 27, 2012, 10:28:30 AM10/27/12
to scala-i...@googlegroups.com


On Oct 27, 2012 3:21 PM, "Paul Phillips" <pa...@improving.org> wrote:
>
>
>
> On Sat, Oct 27, 2012 at 4:00 AM, Pavel Pavlov <pavel.e...@gmail.com> wrote:
>>
>> What do you think?
>
>
> I've thought about implementing an immutable array a number of times; it does seem to fill a need.

An immutable array wrapper is a useful thing. There are lots of cases where we know that nobody has a reference to the underlying array.

> I only wish we had some kind of uniqueness capability which would allow for promoting an existing array to an immutable array with verification that there are no more references to it running around elsewhere.

Dereference-at-most-once references would be useful for this.

Matthew

Pavel Pavlov

unread,
Oct 27, 2012, 11:25:39 AM10/27/12
to scala-i...@googlegroups.com


2012/10/27 Paul Phillips <pa...@improving.org>


I only wish we had some kind of uniqueness capability which would allow for promoting an existing array to an immutable array with verification that there are no more references to it running around elsewhere.

Then you need something like C++ rvalue references, which will (greatly? somewhat?) complicate the type system. Moreover, AFAIU, to ensure that the uniqueness property holds for a reference the compiler must employ some form of escape analysis (C++ doesn't check this and allows the user to manually declare an object reference/pointer as "safe" with `std::move`).

BTW, what do people think of the great immutability hole also known as the `ListBuffer.readOnly` method? (This method destroys the guarantee that any external List you use is immutable.)

Paul Phillips

unread,
Oct 27, 2012, 12:13:18 PM10/27/12
to scala-i...@googlegroups.com


On Sat, Oct 27, 2012 at 8:25 AM, Pavel Pavlov <pavel.e...@gmail.com> wrote:
BTW, what people think of the great immutability hole also known as `ListBuffer.readOnly` method? (This method destroy guarantees that any external List you use is immutable).

Yikes. I hadn't really noticed it. Seems like a serious bug to me.

scala> val buf = scala.collection.mutable.ListBuffer[Int](1, 2)
buf: scala.collection.mutable.ListBuffer[Int] = ListBuffer(1, 2)

scala> def f(): List[Int] = buf.readOnly
f: ()List[Int]

scala> val innocent = f()
innocent: List[Int] = List(1, 2)

scala> buf ++= (1 to 100)
res4: buf.type = ListBuffer(1, 2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)

scala> innocent
res5: List[Int] = List(1, 2, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100)

Daniel Spiewak

unread,
Oct 27, 2012, 9:33:15 PM10/27/12
to scala-i...@googlegroups.com
This doesn't surprise me, given how ListBuffer is implemented.  Really the only way to avoid this situation while preserving an O(1) immutable copy in the common case would be to change the state of the ListBuffer such that subsequent writes lazily rebuild the internal structure.

Daniel

Paul Phillips

unread,
Oct 27, 2012, 10:02:52 PM10/27/12
to scala-i...@googlegroups.com

On Sat, Oct 27, 2012 at 6:33 PM, Daniel Spiewak <djsp...@gmail.com> wrote:
This doesn't surprise me, give how ListBuffer is implemented.  Really the only way to avoid this situation while preserving a O(1) immutable copy in the common case would be to change the state of the ListBuffer such that subsequent writes lazily rebuild the internal structure.

It already does that. The issue is that readOnly offers a means to get a List, but does not mark the buffer as exported, so the list can still be mutated through the buffer afterwards. If you call toList, you get the same list as you do with readOnly, but the buffer is marked dirty.
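For contrast, a quick check in the same spirit as the session above (variable names are just for illustration) showing why toList does not have this problem: it marks the buffer as exported, so the next write copies instead of mutating the shared cells.

val buf2 = scala.collection.mutable.ListBuffer[Int](1, 2)
val snapshot = buf2.toList    // List(1, 2); the buffer is now marked as exported
buf2 += 3                     // forces an internal copy before appending
println(snapshot)             // still List(1, 2)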

It's completely unnecessary to expose readOnly as far as I can see.

martin odersky

unread,
Oct 28, 2012, 6:44:11 AM10/28/12
to scala-i...@googlegroups.com
Is it used somewhere in the standard library? I agree that if possible we should deprecate it.

Cheers

 - Martin

Pavel Pavlov

unread,
Oct 28, 2012, 7:36:34 AM10/28/12
to scala-i...@googlegroups.com

2012/10/28 martin odersky <martin....@epfl.ch>


On Sun, Oct 28, 2012 at 3:02 AM, Paul Phillips <pa...@improving.org> wrote:

It's completely unnecessary to expose readOnly as far as I can see.

 
Is it used somewhere in the standard library? I agree that if possible we should deprecate it.

First of all, the `readOnly` implementation in ListBuffer is just wrong. The correct one should be either `def readOnly = toList` or `def readOnly: collection.Seq[A] = this`.

Method `readOnly` is declared in BufferLike as:
  /** Provide a read-only view of this buffer as a sequence
   *  @return A sequence which refers to this buffer for all its operations.
   */
  def readOnly: scala.collection.Seq[A] = toSeq

I must say that its exact semantics are unclear to me from this declaration. I see three possibilities here:
A) it should return a snapshot of the current state of the buffer. In that case it should return immutable.Seq, not collection.Seq.
Besides that, I'm not sure that the name `readOnly` is good enough for such semantics. Something like `snapshot` would be better here.

B) it should return a view over the changing state of the buffer. In that case it can be implemented right in BufferLike as:
  final def readOnly: scala.collection.Seq[A] = this
Such an implementation fixes all the problems. However, in that case I wonder whether it just duplicates the views functionality.
So, do we really need such a method? Again, is the name `readOnly` clear enough?

C) it works like (A) or (B), but it is left unspecified which. I believe that's the worst option of all.

In the standard library, `readOnly` has virtually no usages.

To sum up, I'd deprecate this method and reimplement it as in (B).

 

Cheers

 - Martin


Pavel Pavlov

unread,
Oct 28, 2012, 8:20:22 AM10/28/12
to scala-i...@googlegroups.com


On Sunday, October 28, 2012 6:36:37 PM UTC+7, Pavel Pavlov wrote:
B) it should return a view over changing state of the buffer. In that case it can be implemented right in BufferLike as:
  final def readOnly: scala.collection.Seq[A] = this
To summarize it up, I'd deprecate this method and reimplement it as in (B).

If no one has objections, I'll create a pull request against 2.10.0-wip.

Pavel Pavlov

unread,
Oct 28, 2012, 8:50:43 AM10/28/12
to scala-i...@googlegroups.com
2012/10/27 Paul Phillips <pa...@improving.org>


But it sounds plenty useful just the way you describe it.

Ok, then here's a plan we may follow to cure the varargs mutability problem:

1) Add ImmArray class to the standard library
2) Change the way compiler generates varargs: call ImmArray.makeNoCopy to wrap arrays.
3) Change the treatment of `T*` in the spec & compiler: translate `T*` to `immutable.Seq[T]`, not to `collection.Seq[T]` as it's done now.

The first two steps produce binary-incompatible changes, so they can either be done right now or wait for 2.11.
The third step can be done only at a major revision and, moreover, it requires a deprecation cycle:
3a) Implement a deprecation warning in scalac that catches expressions like `foo(xs: _*)` where the type of `xs` does not conform to immutable.Seq and `xs` isn't a varargs argument itself: such code will not compile after step (3).

These steps should fix the varargs problem. If we go this way, we must implement step (3a) right now, before the 2.10.0 release.
Otherwise, we'll have to live with the varargs bug for at least another two major Scala versions.
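To make step (3a) concrete, here is a hypothetical illustration of what the warning would flag (the method and value names are made up for the example):

def sum3(xs: Int*): Int = xs.sum

val arr = Array(1, 2, 3)
sum3(arr: _*)                               // would warn: an Array does not conform to immutable.Seq, breaks after step (3)

val lst = List(1, 2, 3)
sum3(lst: _*)                               // fine: List is already an immutable.Seq

def forward(xs: Int*): Int = sum3(xs: _*)   // fine: re-passing a varargs parameter is exempt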

What do you think of it?

Regards,
Pavel.

P.S. It would be great to fix the Seq aliasing problem as well; however, I cannot see yet how we could establish a deprecation cycle in that case. Any ideas?

Jason Zaugg

unread,
Oct 28, 2012, 9:15:37 AM10/28/12
to scala-i...@googlegroups.com
On Sun, Oct 28, 2012 at 1:50 PM, Pavel Pavlov <pavel.e...@gmail.com> wrote:
> Ok, then there's a plan we may follow to cure varargs mutability problem:
>
> 1) Add ImmArray class to the standard library
> 2) Change the way compiler generates varargs: call ImmArray.makeNoCopy to
> wrap arrays.
> 3) Change the treatment of `T*` in the spec & compiler: translate `T*` to
> `immutable.Seq[T]`, not to `collection.Seq[T]` as it's done now.
>
> First two steps produce binary incompatible changes, so they can be done
> either just now, or we will wait for 2.11.
> Third step can be done only at major revision and moreover, it requires
> deprecation cycle:
> 3a) Implement deprecation warning in scalac, which catch expressions like
> `foo(xs: _*)`, where type of `xs` does not conform to immutable.Seq and `xs`
> isn't varargs argument itself: such code will not compile after step (3).
>
> These steps should fix varargs problem. If we'll go this way, we must
> implement step (3a) right now, before 2.10.0 release.
> Otherwise, we'll have to live with the varargs bug at least for another two
> major Scala versions.

Perhaps you could do it in a single step (more realistically in
2.11.0) by having the compiler rewrite `foo(xs: _*)` (where `xs` does
not conform and is not adaptable to `immutable.Seq[_]`) to
`foo(xs.toImmutableSeq: _*)` and issue the deprecation warning at the
same time. I'd prefer this to a blanket implicit from Seq =>
immutable.Seq.

> P.S. It would be great to fix Seq aliasing problem as well, however I cannot
> see now how we can establish deprecation cycle in that case. Is there any
> ideas?

Yep, it will be painful.

@deprecated("scala.Seq will be retargeted to use scala.immutable.Seq
in Scala 2.12.0. Use `scala.collection.Seq` instead. Sorry about
that...", since = "2.11.0")
type Seq[X] = collection.Seq[X]

-jason

Paul Phillips

unread,
Oct 28, 2012, 9:24:36 AM10/28/12
to scala-i...@googlegroups.com


On Sun, Oct 28, 2012 at 5:50 AM, Pavel Pavlov <pavel.e...@gmail.com> wrote:
These steps should fix varargs problem. If we'll go this way, we must implement step (3a) right now, before 2.10.0 release.
Otherwise, we'll have to live with the varargs bug at least for another two major Scala versions.

Based on the positions being taken in pull request land, you should discard any idea of getting any of this into 2.10.

Paul Phillips

unread,
Oct 28, 2012, 10:01:45 AM10/28/12
to scala-i...@googlegroups.com


On Sun, Oct 28, 2012 at 4:36 AM, Pavel Pavlov <pavel.e...@gmail.com> wrote:
First of all, `readOnly` implementation in ListBuffer is just wrong. Correct one should be `def readOnly = toList` or `def readOnly: collection.Seq = this`.

It is wrong, but if the implementation were "def readOnly = toList" it would be completely redundant. The whole point of the method, ill-advised as it may be, is to not be toList. As witnessed by the comment:

  /** expose the underlying list but do not mark it as exported */
  override def readOnly: List[A] = start

That's the implementation in ListBuffer (not the one inherited from BufferLike). As testified, the only difference between that and toList is the absence of the line "exported = !isEmpty".

Pavel Pavlov

unread,
Oct 28, 2012, 11:10:21 AM10/28/12
to scala-i...@googlegroups.com


2012/10/28 Paul Phillips <pa...@improving.org>

I see. My point is that `ListBuffer.readOnly` overrides the public method `BufferLike.readOnly`, so it must satisfy the LSP. That means it should behave either like a snapshot: (A) `def readOnly = toList`, or like a view: (B) `def readOnly: collection.Seq[A] = this`.
Neither variant is redundant: `def foo(b: Buffer) = b.readOnly` and `def bar(b: Buffer) = b.toList` lead to different results for an arbitrary Buffer implementation (e.g. ArrayBuffer).

BTW, I found that the initial intention of `readOnly` was variant (B):
  /** return a read only alias of this buffer
   */
  def readOnly : Seq[A]
https://github.com/scala/scala/commit/c739e595a326a9288179b270426991ee6c8431cf#diff-26

martin odersky

unread,
Oct 28, 2012, 11:28:26 AM10/28/12
to scala-i...@googlegroups.com
I think you are right. As far as I can tell, the intent was that it provides a view. So, the implementation

  def readOnly = this

is perfectly fine. No need to destroy any invariant.

Cheers

 - Martin

Som Snytt

unread,
Oct 28, 2012, 12:50:49 PM10/28/12
to scala-i...@googlegroups.com

@migration("Life is all about change. People change, APIs change. But the default Seq should be immutable.", changedIn="2.10.0")
 