> - (set should be covariant, but I guess there is nothing that can be
> done about this at this point. Maybe have another version of set?)
I think that having an efficient collections library is tremendously
important. At least for me, having maximum efficiency in the
collections is more important than new language features such as
macros.
OK, copy. Which of the two would have been more appropriate for
the original post?
The question is whether having a slow method available on a collection
where you expect it to be fast is better or worse than not having it
available at all.
Is it really possible to do fundamental changes to the design of the
scala collections at this point without breaking a lot of scala code
out there?
That would be great, as long as it does not break too much existing
source code. What is the timeframe for that new collections library? I
would really like to get some improvements quickly.
> Macros are part of how we'll get efficient.

So macros are going to be involved in the redesign of the collections
library? Just when I started to understand the current design...
> - (set should be covariant, but I guess there is nothing that can be
> done about this at this point. Maybe have another version of set?)
trait Set[A: Equiv]
implicit val SetFunctor = new Functor[Set] {
  def fmap[A, B](s: Set[A])(f: A => B): Set[B]
}
Making set covariant would have required extending Any => Boolean
instead of T => Boolean. So you would not get an error if you have a
Set[Int] and try contains("abc"). But why should you get an error?
Just returning false is a perfectly valid response, since obviously a
set of Int can never contain any string.
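For what it's worth, the asymmetry is visible in the current library (a small check; `contains` on Seq widens its argument type, while `contains` on the invariant Set does not):

```scala
val xs: Seq[Int] = Seq(1, 2, 3)
// Seq is covariant, so contains accepts an unrelated type and just returns false:
val inSeq = xs.contains("abc")   // false, compiles without error

val s: Set[Int] = Set(1, 2, 3)
// Set is invariant; s.contains("abc") would be a compile error
val inSet = s.contains(2)        // true
```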
...
> Making set covariant would have required extending Any => Boolean
> instead of T => Boolean. So you would not get an error if you have a
> Set[Int] and try contains("abc"). But why should you get an error?
I know that there are people that consider any program containing Any
defective. But the way scala is designed even methods like != and ==
use Any. So if you want to avoid Any at any cost it is very hard in
scala.
To be told there is a programmer error at compile time, in preference to silently being given the right answer to the wrong question at runtime, is the reason there are types.
But what about Seq[T]? That has a contains method taking Any so that
Seq[T] can be covariant. Do you consider that a mistake as well?
Because contains is just some method on Seq, whereas contains on Set
is more important?
Maybe because it is 100% of the time not what you meant to write? Or do you spend a lot of time looking for Strings in sets of Ints?
What about Int/BigInt? In the current design equality is not based on the type and I consider that to be a good thing in general.
Making it invariant would break that and would make it considerably harder for equal things to be considered equal.
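Cooperative numeric equality is easy to check against the standard library (== and hashCode agree across Int and BigInt):

```scala
// == on numeric types compares values, not types:
val eqValue = BigInt(1) == 1                      // true
// hashCode agrees, so collections see them as the same element:
val found = Set[Any](BigInt(1), 2).contains(1)    // true
```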
For a language like scala that has a common base type like Any, a Set
of Any makes perfect sense. And Any defines equals, so it is possible
to determine if two values of type Any are considered equal. You have
to rely on the equals method being properly implemented, but that is
true in any case.
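A quick illustration of that point with the current library:

```scala
// A heterogeneous Set[Any]: membership is decided by equals from Any
val s: Set[Any] = Set(1, "one", 2.0)
val hasInt = s.contains(1)         // true
val hasString = s.contains("one")  // true
val missing = s.contains("two")    // false
```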
On Thu, Aug 9, 2012 at 12:49 PM, Chris Marshall <oxbow...@gmail.com> wrote:
> sense at all under *any* circumstances? Scala is (as I understand it)
> currently embarking on a deprecation cycle that will eventually see the
> "removal of equals" from AnyRef. At least, Martin posted a plan for doing
> this a while ago - not sure if it's still the intention

I don't see how you can pull this off without deprecating almost every
piece of scala code out there.

Simple - make the use of == require an implicit type class and then make
sure there is (to begin with) such an implicit instance in scope for any
type. The original conversation is here:
Not sure what the status of it is - last message was over 18 months ago.
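A sketch of that idea (the names Equal and === are illustrative, not from the actual proposal):

```scala
// Hypothetical type-class-based equality; the universal instance
// delegates to == so that, to begin with, every type is covered
trait Equal[A] { def equal(x: A, y: A): Boolean }
object Equal {
  implicit def universal[A]: Equal[A] =
    new Equal[A] { def equal(x: A, y: A): Boolean = x == y }
}
implicit class EqualOps[A](val x: A) {
  def ===(y: A)(implicit ev: Equal[A]): Boolean = ev.equal(x, y)
}
```

The deprecation cycle would then amount to retiring the universal instance and requiring type-specific ones.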
> Because contains is just some method on Seq, whereas contains on Set
> is more important?

Yes, apply (aliased to contains) is the central operation of Set, whereas
contains is indeed "just some method" on Seq. Nobody's saying you don't
sometimes want a covariant Set, but on balance, it is more useful invariant.
Hi all,
all in all I am quite happy with the scala collections library. But
there are quite a few annoyances and inefficiencies I have been
confronted with over the last year. This is an attempt to summarize
them and put them up for discussion. I am working mostly with the
default implementations of Set and Map, so this is where I have the
most remarks.
In general, while the collections "just work" and do exactly what you
expect, they occasionally behave inefficiently and unintuitively
regarding performance and structural sharing. Often methods that you
would expect to work very efficiently for a certain type of collection
instead just use the default implementation from e.g. TraversableLike.
Set:
- filter is easy to implement for HashSet, and it is very important
since it is used by operations that people expect to be fast on a set,
like intersection. Nevertheless filter just uses the default
implementation from TraversableLike, which is very inefficient. I
created an SI for that issue and implemented a filter method on
HashSet. For maximum benefit you would have to implement filter on
Set1..Set4 as well: https://issues.scala-lang.org/browse/SI-6196
- merge exists for HashMap, but not for HashSet. It should be
implemented, and used from union and ++ whenever possible. That would
increase the efficiency of that operation tremendously. A set should
have quick union and intersection methods.
- while implementing filter on HashSet, I also found some minor issues
that result in some inefficiency:
https://issues.scala-lang.org/browse/SI-6197 and
https://issues.scala-lang.org/browse/SI-6198
- intersect should check if both sides are of the same type, and then
do smaller.filter(larger). People should not have to do this manually
to get best performance. See for example this discussion:
https://groups.google.com/forum/?fromgroups#!searchin/scala-user/smaller$20filter$20larger/scala-user/3QJYuUU3frU/YfMTW9mRXsYJ
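The workaround from that discussion can be written as a small helper (illustrative name, not a library method):

```scala
// Filter the smaller set with the larger one, so the cost is
// proportional to the smaller side (a Set is also an A => Boolean)
def fastIntersect[A](a: Set[A], b: Set[A]): Set[A] =
  if (a.size <= b.size) a.filter(b) else b.filter(a)
```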
- (set should be covariant, but I guess there is nothing that can be
done about this at this point. Maybe have another version of set?)
Map:
- since map is very similar to set, it should also have a filter
method that works efficiently instead of just using the one from
TraversableLike. I added one here:
https://issues.scala-lang.org/browse/SI-6200 . Basically the same as
https://issues.scala-lang.org/browse/SI-6196 .
- HashMap has a merge method. There was a bug in it which has been
fixed. https://issues.scala-lang.org/browse/SI-5879 . But almost
nobody will ever use this very efficient method because when people
want to merge two maps they use ++. ++ should check if both arguments
are of the right type and use HashMap.merge if possible.
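(The method is spelled merged on immutable.HashMap.) The behavioral difference, as a sketch against the current signature:

```scala
import scala.collection.immutable.HashMap

val a = HashMap(1 -> 10, 2 -> 20)
val b = HashMap(2 -> 200, 3 -> 300)

// ++ inserts b's entries one by one; on a key collision the right side wins:
val viaPlusPlus = a ++ b
// merged walks both hash tries together and lets the caller resolve collisions:
val viaMerged = a.merged(b)((kv1, kv2) => (kv1._1, kv1._2 + kv2._2))
```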
- mapValues and filterKeys: either make it clear that they are lazy by
using an appropriate return value (MapView?), or make them eager. I
would prefer the latter. There is an old SI for this issue
( https://issues.scala-lang.org/browse/SI-4776 ), but nothing has been
done on it.
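The surprise, as observed on current Scala (mapValues returns a lazy wrapper, so the function is re-run on every access):

```scala
var calls = 0
val m = Map(1 -> "a", 2 -> "b")
val lazyValues = m.mapValues { v => calls += 1; v.toUpperCase }
val before = calls            // 0: nothing has been computed yet
lazyValues(1); lazyValues(1)
val after = calls             // 2: recomputed on each access
// Eager alternative, computing every value exactly once:
val eager = m.map { case (k, v) => (k, v.toUpperCase) }
```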
- updated should not create a new instance if not necessary.
https://issues.scala-lang.org/browse/SI-5139
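A minimal reproduction of the point:

```scala
val m = Map("a" -> 1, "b" -> 2)
// Re-inserting an identical mapping changes nothing...
val m2 = m.updated("a", 1)
// ...so m2 could simply be m (eq); whether it actually is depends on
// the Scala version and map implementation, which is what SI-5139 is about
val same = m2 == m            // always true either way
```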
Vector:
- vector is a brilliant data structure that would be very useful for
number crunching like time series analysis if it wasn't for the boxing
overhead of Vector[Double] etc. Would it be possible to use
@specialized for Vector? And for other collections like Set and Map
for that matter?
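The boxing is observable at runtime with a small check:

```scala
// Vector is generic, so a Vector[Double] holds boxed java.lang.Double
// references rather than raw doubles:
val v: Vector[Double] = Vector(1.0, 2.0, 3.0)
val first: Any = v.asInstanceOf[Vector[Any]](0)
val boxed = first.getClass == classOf[java.lang.Double]
// An Array[Double], by contrast, compiles to a raw double[] with no boxing:
val raw: Array[Double] = Array(1.0, 2.0, 3.0)
```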
In general, care should be taken to use structural sharing whenever
possible. If that means that you have to override some of the generic
methods of TraversableLike or SeqLike, then so be it. For example
this:
scala> val x=Set(1,2,3,4)
x: scala.collection.immutable.Set[Int] = Set(1, 2, 3, 4)
scala> x eq (Set.empty ++ x)
res9: Boolean = false // Set.empty ++ x builds a completely new set,
which can be very slow for large sets!
scala> x eq (x ++ Set.empty)
res10: Boolean = true
This would not happen if ++ used merge when both sides are a Set.
I think that having an efficient collections library is tremendously
important. At least for me, having maximum efficiency in the
collections is more important than new language features such as
macros.
best regards,
Rüdiger