Many Scala users, even very advanced ones, don't know that Seq (as defined in the package object scala) is *not* an alias for scala.collection.immutable.Seq like many other collection aliases (Set, Map, etc.). Actually Seq is an alias for scala.collection.Seq which could be immutable or mutable.Therefore, if one does not explicitly import scala.collection.immutable.Seq, mutability can leak into APIs which are considered immutable. This is in contradiction to Scala making it easy to write immutable code. And it simply is dangerous. Any chance this can be fixed?
It would prevent Array from being passed as Seq, which would be very
annoying when doing things around varargs.
>
--
Jonas Bonér
Phone: +46 733 777 123
Home: http://jonasboner.com
Twitter: @jboner
As someone who cares very deeply about collection performance, I still fall on the "make type Seq = immutable.Seq" side of the fence. The reason for this is I have no expectation of performance when I'm using varargs functions. Seriously!
I will never, ever use a varargs function in a hot path, regardless of whether or not I am passing an Array as a Seq (which, incidentally, is something I would never do anyway). Is this a regression from Java's varargs semantics? Maybe, but it's something I can live with.
As has been pointed out, Java's varargs are broken. We shouldn't preserve an extremely dangerous type semantic merely to attempt the preservation of Java's brokenness.
If Scala varargs performance were really a legitimate issue, the very first thing that should be done is create specialized "SeqN" collections for common arities.
In my testing, this results in an order of magnitude improvement, both in space and time.
Since this hasn't been done (and is not even on the radar), my assumption is that varargs are not considered (in general) to be a performance-sensitive feature, and therefore we shouldn't use varargs performance as a reason to restrict improvement in other areas.
If Scala varargs performance were really a legitimate issue, the very first thing that should be done is create specialized "SeqN" collections for common arities.
Agreed. That would be 3-4x faster than the Java varargs method for small arities when dynamic dispatch could be skipped (and assuming specialization, or that you wouldn't use it for primitives).
So…can we fix the Seq alias now? :-)
So…can we fix the Seq alias now? :-)
My point is that the question of performance is an artificial conflation of concerns. I spend a good deal of time working on super-tight, performance sensitive code in Scala.
If performance is even a question in my mind, I'm simply not going to use varargs in any form. Whether we fix the Seq alias or not is irrelevant to such code.
In my experience, and in my day job, the Seq alias change is a question of pure correctness and usability of the language. I raised the point about small varargs sizes in an attempt to illustrate how tangential the entire performance question really is in this discussion. If varargs performance were such a serious concern before, someone would have sat down and made an attempt to optimize some of these common cases. If it wasn't a serious concern before, why should it be a concern now? Much less a concern affecting a language decision which has essentially no impact on serious performance-sensitive code?
There are a lot of things in Scala and in the standard library that are just horrendously slow. There are obvious things like structural types, but also somewhat less obvious things like BitSet.
Despite the performance problems, these features still exist and are the way they are for a good reason. People using the language just have to be aware of the implications of these features and design accordingly.
My argument is that varargs has traditionally fallen (and should continue to fall) in the same category: a convenience feature that is used judiciously with eyes open.
Fixing the Seq alias neither rectifies nor exacerbates this situation.
In my experience, and in my day job, the Seq alias change is a question of pure correctness and usability of the language. I raised the point about small varargs sizes in an attempt to illustrate how tangential the entire performance question really is in this discussion. If varargs performance were such a serious concern before, someone would have sat down and made an attempt to optimize some of these common cases. If it wasn't a serious concern before, why should it be a concern now? Much less a concern affecting a language decision which has essentially no impact on serious performance-sensitive code?
Because as you saw, the change _wasn't actually that big_. Varargs already were pretty good. If they suddenly got 10x or 100x slower, it might require optimization of code that didn't need it before.
Likewise with anything else taking a collection.Seq that would now have to take a collection.immutable.Seq. I really don't want to have to make a defensive copy of every array I pass into something annotated with Seq, I don't think. Maybe a defensive wrapper is okay--but then collection.Seq already *is* a defensive wrapper (well, defensive upcast) because it doesn't have any mutating methods.
Roland Kuhn
Typesafe – The software stack for applications that scale.
twitter: @rolandkuhn
On Thu, Oct 25, 2012 at 07:03:09AM +0200, Roland Kuhn wrote:So if Seq were changed to forbid Array, then what type would one use to
> You are arguing besides the main point: all data structures which are
> not specially imported from collection.mutable are names for the
> immutable version, except for Seq.
generalize across List, Vector, and Array?
On Thu, Oct 25, 2012 at 1:12 AM, Erik Osheim <er...@plastic-idolatry.com> wrote:
On Thu, Oct 25, 2012 at 07:03:09AM +0200, Roland Kuhn wrote:So if Seq were changed to forbid Array, then what type would one use to
> You are arguing besides the main point: all data structures which are
> not specially imported from collection.mutable are names for the
> immutable version, except for Seq.
generalize across List, Vector, and Array?collection.Seq
+1
I'm with Heiko on this. It's hard to say which is worse, the
unsafeness or the inconsistency. Combine them, and it's like being
stabbed by your two best friends from two different directions.
This has been bugging me for years. It nags at me practically every
time I type S-e-q, and that's a lot.
Seriously, I'd love it if this were changed.
--
Seth Tisue | Northwestern University | http://tisue.net
developer, NetLogo: http://ccl.northwestern.edu/netlogo/
.to never short circuits.
Roland Kuhn
Typesafe – The software stack for applications that scale.
twitter: @rolandkuhn
.to never short circuits.
Never said it was desirable. It's more a limitation currently. If you have a mechanism/fix for that, please submit! I have ideas for how to make it so, but none of my initial attempts panned out. Remember the method was initially called 'buildTo' to denote this. The other to methods can short circuit because they use the class hierarchy. The to method will have to use some fancy type tricks on top of the existing fancy type tricks.
On Thu, Oct 25, 2012 at 2:56 PM, Josh Suereth <joshua....@gmail.com> wrote:
Never said it was desirable. It's more a limitation currently. If you have a mechanism/fix for that, please submit! I have ideas for how to make it so, but none of my initial attempts panned out. Remember the method was initially called 'buildTo' to denote this. The other to methods can short circuit because they use the class hierarchy. The to method will have to use some fancy type tricks on top of the existing fancy type tricks.Haven't thought much about it, but instinctively I fee like it should be a method on CanBuildFrom, or we reify To and check if this isInstanceOf To
We could reify the ClassTag, but types are not in the standard library.
Could the compiler choose a parent type that's a valid Java class?
25 okt 2012 kl. 00:01 skrev Rex Kerr:In my experience, and in my day job, the Seq alias change is a question of pure correctness and usability of the language. I raised the point about small varargs sizes in an attempt to illustrate how tangential the entire performance question really is in this discussion. If varargs performance were such a serious concern before, someone would have sat down and made an attempt to optimize some of these common cases. If it wasn't a serious concern before, why should it be a concern now? Much less a concern affecting a language decision which has essentially no impact on serious performance-sensitive code?
Because as you saw, the change _wasn't actually that big_. Varargs already were pretty good. If they suddenly got 10x or 100x slower, it might require optimization of code that didn't need it before.
Likewise with anything else taking a collection.Seq that would now have to take a collection.immutable.Seq. I really don't want to have to make a defensive copy of every array I pass into something annotated with Seq, I don't think. Maybe a defensive wrapper is okay--but then collection.Seq already *is* a defensive wrapper (well, defensive upcast) because it doesn't have any mutating methods.
You are arguing besides the main point: all data structures which are not specially imported from collection.mutable are names for the immutable version, except for Seq.
def makeDecider(trapExit: Seq[Class[_ <: Throwable]]): Decider ={ case x ⇒ if (trapExit exists (_ isInstance x)) Restart else Escalate }This seemingly correct code (incidentally written by YT) is in fact a bad bug, since you could pass in an Array[] and happily observe all the breakage which ensues after changing the array’s content afterwards (thinking that the immutable Decider must surely be okay).collection.Seq not having mutators is not at all a valid defense!
The very same argument holds for all the other odd ones (TraverableOnce, Traversable, Iterable, IndexedSeq). Correctness trumps performance, anytime.
On Thu, Oct 25, 2012 at 1:03 AM, Roland Kuhn <goo...@rkuhn.info> wrote:25 okt 2012 kl. 00:01 skrev Rex Kerr:In my experience, and in my day job, the Seq alias change is a question of pure correctness and usability of the language. I raised the point about small varargs sizes in an attempt to illustrate how tangential the entire performance question really is in this discussion. If varargs performance were such a serious concern before, someone would have sat down and made an attempt to optimize some of these common cases. If it wasn't a serious concern before, why should it be a concern now? Much less a concern affecting a language decision which has essentially no impact on serious performance-sensitive code?
Because as you saw, the change _wasn't actually that big_. Varargs already were pretty good. If they suddenly got 10x or 100x slower, it might require optimization of code that didn't need it before.
Likewise with anything else taking a collection.Seq that would now have to take a collection.immutable.Seq. I really don't want to have to make a defensive copy of every array I pass into something annotated with Seq, I don't think. Maybe a defensive wrapper is okay--but then collection.Seq already *is* a defensive wrapper (well, defensive upcast) because it doesn't have any mutating methods.
You are arguing besides the main point: all data structures which are not specially imported from collection.mutable are names for the immutable version, except for Seq.
I'm arguing a different aspect of the point, I think. I agree that ideally the defaults would be consistent and would be immutable.
The question is how big of an impact it is to change at this point. In particular, is it okay to just switch from collection.Seq to collection.immutable.Seq at some point, or does one need to figure out how to have a deprecation period? If it's an inconsequential change, you just fix it. My point is that it is _not_ necessarily an inconsequential change for people who have high-performance code, and that the fix is less trivial than many things that are deprecated (like removing a method) because it will presumably massively affect the library interface also, and thus force people to re-optimize if they want to maintain performance.
def makeDecider(trapExit: Seq[Class[_ <: Throwable]]): Decider ={ case x ⇒ if (trapExit exists (_ isInstance x)) Restart else Escalate }This seemingly correct code (incidentally written by YT) is in fact a bad bug, since you could pass in an Array[] and happily observe all the breakage which ensues after changing the array’s content afterwards (thinking that the immutable Decider must surely be okay).collection.Seq not having mutators is not at all a valid defense!
Agreed. I wasn't really thinking it was enough defense, just that it's as much defense as you can get without having to think about performance a lot.
If you ask whether I would have noticed the code that you wrote above as a bug, the answer is "probably not". But if you ask instead whether I would have caught a bug where you took an array, passed it in somewhere, and then modified it: probably yes! If you spend much time working with mutable data structures, it's a good idea to get used to being careful about mutating the data out from under you. It is kind of a gotcha if you don't use mutable data structures very often, I'll agree, and that's why I agree it would have been better to have Seq be immutable.Seq by default.
The very same argument holds for all the other odd ones (TraverableOnce, Traversable, Iterable, IndexedSeq). Correctness trumps performance, anytime.
As long as there is a high-performance route, I agree. Otherwise, no that's not a given; people are willing to spend some extra thought to gain extra performance if they can, sometimes. That the computation-speed-minded people are to be ignored in favor of the coding-speed-minded people is not a given. One of Scala's claims to fame is that it can be used to write high-performance code. I agree that a bias towards correctness is good, but one has to ask what the tradeoff really is.
In this particular case, I think high-performance options can still exist since collection.Seq still exists. But do you think we can change the interface of the ~215 instances of Seq in defs in the Scala library without thinking about the impact on performance, or deciding which of those really need to be collection.Seq and which need to be collection.immutable.Seq?
Also, I'm not sure I agree about TraversableOnce etc.--there is something to be said for easy abstraction as well as correctness.
--Rex
If you speed up the library, then you get to use compact expressive code in more places. If you slow it down, you get to use it in fewer places (or your program just gets slower).
In my experience, and in my day job, the Seq alias change is a question of pure correctness and usability of the language. I raised the point about small varargs sizes in an attempt to illustrate how tangential the entire performance question really is in this discussion. If varargs performance were such a serious concern before, someone would have sat down and made an attempt to optimize some of these common cases. If it wasn't a serious concern before, why should it be a concern now? Much less a concern affecting a language decision which has essentially no impact on serious performance-sensitive code?
Because as you saw, the change _wasn't actually that big_. Varargs already were pretty good. If they suddenly got 10x or 100x slower, it might require optimization of code that didn't need it before.
Likewise with anything else taking a collection.Seq that would now have to take a collection.immutable.Seq. I really don't want to have to make a defensive copy of every array I pass into something annotated with Seq, I don't think. Maybe a defensive wrapper is okay--but then collection.Seq already *is* a defensive wrapper (well, defensive upcast) because it doesn't have any mutating methods.
Haven't thought much about it, but instinctively I fee like it should be a method on CanBuildFrom, or we reify To and check if this isInstanceOf ToWe could reify the ClassTag, but types are not in the standard library. My instinct said it belongs in CanBuildFrom as well, but that's part of a much more aggressive collection refactoring.
Do you pass many arrays as varargs explicitly? In my code, there's many more arrays being passed _implicitly_, which is extremely annoying, because the compiler uses Array by default:def foo(varargs: Any*) = println(varargs.getClass())foo(1, 2, 3) //prints "class scala.collection.mutable.WrappedArray$ofRef".Now, the hot path we're talking about does not include mutating varargs. I'm sure there can be cases when you do that, and maybe important cases, but that's definitely not the most common choice. So, why does the compiler automatically create a _mutable_ wrapper to the array? It could create an immutable wrapper without introducing any copy in the hot path. However, creating such a wrapper is an "unsafe" operation: you can call it only if you know that the raw array is not used afterward and cannot be extracted from the wrapper. But in the above code, Array(1, 2, 3) is not going to be aliased, so you don't need the copy.
On Oct 27, 2012 3:21 PM, "Paul Phillips" <pa...@improving.org> wrote:
>
>
>
> On Sat, Oct 27, 2012 at 4:00 AM, Pavel Pavlov <pavel.e...@gmail.com> wrote:
>>
>> What do you think?
>
>
> I've thought about implementing an immutable array a number of times; it does seem to fill a need.
An immutable array wrapper is a useful thing. There are lots of cases where we know that nobody has a reference to the underlying array.
> I only wish we had some kind of uniqueness capability which would allow for promoting an existing array to an immutable array with verification that there are no more references to it running around elsewhere.
Dereference at most once references would be useful for this.
Matthew
I only wish we had some kind of uniqueness capability which would allow for promoting an existing array to an immutable array with verification that there are no more references to it running around elsewhere.
BTW, what people think of the great immutability hole also known as `ListBuffer.readOnly` method? (This method destroy guarantees that any external List you use is immutable).
This doesn't surprise me, give how ListBuffer is implemented. Really the only way to avoid this situation while preserving a O(1) immutable copy in the common case would be to change the state of the ListBuffer such that subsequent writes lazily rebuild the internal structure.
On Sun, Oct 28, 2012 at 3:02 AM, Paul Phillips <pa...@improving.org> wrote:It's completely unnecessary to expose readOnly as far as I can see.Is it used somewhere in the standard library? I agree that if possible we should deprecate it.
I must say that its exact semantics is unclear for me from this declaration. I see three possibilities here:/** Provide a read-only view of this buffer as a sequence* @return A sequence which refers to this buffer for all its operations.*/def readOnly: scala.collection.Seq[A] = toSeq
Such implementation fix all the problems. However, in that case I wonder if there's a duplication of views functionality here?final def readOnly: scala.collection.Seq[A] = this
Cheers- Martin
B) it should return a view over changing state of the buffer. In that case it can be implemented right in BufferLike as:
final def readOnly: scala.collection.Seq[A] = this
To summarize it up, I'd deprecate this method and reimplement it as in (B).
But it sounds plenty useful just the way you describe it.
These steps should fix varargs problem. If we'll go this way, we must implement step (3a) right now, before 2.10.0 release.
Otherwise, we'll have to live with the varargs bug at least for another two major Scala versions.
First of all, `readOnly` implementation in ListBuffer is just wrong. Correct one should be `def readOnly = toList` or `def readOnly: collection.Seq = this`.