On the behavior of BitSet mapping

127 views
Skip to first unread message

Guanpeng Xu

unread,
Jul 13, 2015, 2:40:05 AM7/13/15
to scala...@googlegroups.com
Hello,

I am a new Scala user and learning the architecture of Scala collections.

In chapter 25 of the book /Programming in Scala/ 2nd edition, it is noted that mapping an Iterable dynamically should keep its dynamic type (page 613):

Welcome to Scala version 2.11.4-20141023-000000-d783face36 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> val list1 = List(1, 2, 3)
list1: List[Int] = List(1, 2, 3)

scala> list1.map(identity)
res0: List[Int] = List(1, 2, 3)

scala> val iter1: Iterable[Int] = List(1, 2, 3)
iter1: Iterable[Int] = List(1, 2, 3)

scala> iter1.map(identity)
res1: Iterable[Int] = List(1, 2, 3)

But the result of mapping a BitSet is not consistent with this behavior:

scala> import scala.collection.immutable.BitSet
import scala.collection.immutable.BitSet

scala> val set1 = BitSet(1, 2, 4)
set1: scala.collection.immutable.BitSet = BitSet(1, 2, 4)

scala> set1.map(identity)
res2: scala.collection.immutable.BitSet = BitSet(1, 2, 4)

scala> val iter2: Iterable[Int] = BitSet(1, 2, 4)
iter2: Iterable[Int] = BitSet(1, 2, 4)

scala> iter2.map(identity)
res3: Iterable[Int] = Set(1, 2, 4)

Is this a design or an issue?  Thanks.

Sincerely yours,
Guanpeng Xu

Seth Tisue

unread,
Jul 20, 2015, 10:44:14 AM7/20/15
to scala...@googlegroups.com, herber...@gmail.com
Looks like a bug to me. There doesn't seem to be a ticket for it yet at https://issues.scala-lang.org.

Stephen Compall

unread,
Jul 20, 2015, 7:59:52 PM7/20/15
to scala...@googlegroups.com
On 7/12/2015 7:45 PM, Guanpeng Xu wrote:
But the result of mapping a BitSet is not consistent with this behavior:

scala> val iter2: Iterable[Int] = BitSet(1, 2, 4)
iter2: Iterable[Int] = BitSet(1, 2, 4)

scala> iter2.map(identity)
res3: Iterable[Int] = Set(1, 2, 4)

Is this a design or an issue?  Thanks.

This behavior looks correct to me.  Once you leave the type-level bitset, all bets are off with regards to the class-level.  Set[Int] is an Iterable[Int].

The fix would be to percolate a special case for "same output as input type" throughout the constellation of implicit CanBuildFroms and the self-builder methods they call everywhere in the collections hierarchy.  I cannot imagine the Scala devs adding such complexity just to provide this guarantee, when you already have a mechanism for obtaining this guarantee that isn't based on documentation or runtime testing: the type system.

Seth Tisue

unread,
Jul 20, 2015, 8:29:12 PM7/20/15
to scala...@googlegroups.com, s...@member.fsf.org
On Monday, July 20, 2015 at 7:59:52 PM UTC-4, Stephen Compall wrote:
The fix would be to percolate a special case for "same output as input type" throughout the constellation of implicit CanBuildFroms and the self-builder methods they call everywhere in the collections hierarchy.  I cannot imagine the Scala devs adding such complexity just to provide this guarantee

Nonetheless, the guarantee is provided. I cannot find a concrete collection other than immutable.BitSet that does *not* behave in accordance with this guarantee. (I didn’t exhaustively test them all, but I tried a dozen.)

That’s my basis for calling this a bug in immutable.BitSet in particular.

The mechanism that provides this guarantee is described in Odersky & Moors, “Fighting Bit Rot with Types (Experience Report: Scala Collections)”, http://lampwww.epfl.ch/~odersky/papers/fsttcs2009.pdf: "The idea is to give the apply method of CanBuildfrom access to the dynamic type of the original collection via its from argument..." It happens without any percolation of special cases throughout the hierarchy.

Rex Kerr

unread,
Jul 20, 2015, 8:34:24 PM7/20/15
to Seth Tisue, scala-user, s...@member.fsf.org
But it doesn't have that type information any more.  It's cast to an Iterable.

The default CBF for Iterable is smart enough to build sets as sets, but not to know that "identity" maps non-negative Ints to non-negative Ints and therefore can retain the BitSet type as backing.

It's not a bug, it's a design decision, and not one that I can envision how to "fix" without a truly massive overhaul.  There _are_ bugs out there, where things don't remember their types (e.g. ListSet).  But this isn't one of them.

  --Rex


--
You received this message because you are subscribed to the Google Groups "scala-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-user+...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Seth Tisue

unread,
Jul 20, 2015, 9:02:34 PM7/20/15
to scala...@googlegroups.com, ich...@gmail.com, s...@member.fsf.org, se...@tisue.net
On Monday, July 20, 2015 at 8:34:24 PM UTC-4, Rex Kerr wrote:
not to know that "identity" maps non-negative Ints to non-negative Ints

Eh? The question of sign doesn't bother it in the case where the precise type is known at compile-time:

scala> collection.immutable.BitSet(1, 2, 3).map(_ + 1)
res0: scala.collection.immutable.BitSet = BitSet(2, 3, 4)

scala> collection.immutable.BitSet(1, 2, 3).map(_ * -1)
java.lang.IllegalArgumentException: requirement failed
  at scala.Predef$.require(Predef.scala:207)
  at scala.collection.mutable.BitSet.add(BitSet.scala:83)

If sign is considered a runtime question here, it should be also be considered a runtime question in the case where the precise type isn't known at compile time.

As for the more general question of how `map` behaves in the absence of precise compile-time type information, I think this discussion may have gone off on the wrong track because the original poster's REPL transcript didn't fully demonstrate the issue. Here is a new transcript that fully demonstrates it:

scala> val set1: Iterable[Int] = collection.immutable.BitSet(1, 2, 4)
set1: Iterable[Int] = BitSet(1, 2, 4)

scala> set1.map(identity).isInstanceOf[collection.immutable.BitSet]
res10: Boolean = false

note the *runtime* check to see what kind of concrete collection we have *at runtime*.

whereas here's how it should behave, and how the Odersky & Moors paper describes it as behaving:

scala> val set2: Iterable[Int] = collection.immutable.HashSet(1, 2, 4)
set2: Iterable[Int] = Set(1, 2, 4)

scala> set2.map(identity).isInstanceOf[collection.immutable.HashSet[_]]
res12: Boolean = true

the compiler doesn't know it's a HashSet, but *the runtime does*, and so we get the right runtime "type".

You can substitute any collection type here for immutable.HashSet and it will behave the same. *Only BitSet* behaves differently. That's why I say it's a bug.

Jason Zaugg

unread,
Jul 20, 2015, 9:20:55 PM7/20/15
to Seth Tisue, scala...@googlegroups.com, ich...@gmail.com, s...@member.fsf.org
On Tue, Jul 21, 2015 at 11:02 AM Seth Tisue <se...@tisue.net> wrote:
On Monday, July 20, 2015 at 8:34:24 PM UTC-4, Rex Kerr wrote:
not to know that "identity" maps non-negative Ints to non-negative Ints

Eh? The question of sign doesn't bother it in the case where the precise type is known at compile-time:

Let’s look at the machinery in more detail by progressively expanding and inlining the two cases:

scala> (List(1): Iterable[Int]).map(identity)
res15: Iterable[Int] = List(1)

scala> (List[Int](1): Iterable[Int]).map[Int, Iterable[Int]](identity[Int])(Iterable.canBuildFrom[Int])
res16: Iterable[Int] = List(1)

scala> val self: Iterable[Int] = List(1); val cbf = Iterable.canBuildFrom[Int]; val builder = cbf.apply(temp); builder += identity(self.head); builder.result()
self: Iterable[Int] = List(1)
cbf: scala.collection.generic.CanBuildFrom[Iterable.Coll,Int,Iterable[Int]] = scala.collection.generic.GenTraversableFactory$anon$1@3212d853
builder: scala.collection.mutable.Builder[Int,Iterable[Int]] = ListBuffer(1)
res17: Iterable[Int] = List(1)

scala> val self: Iterable[Int] = List(1); val cbf = Iterable.canBuildFrom[Int]; val builder = temp.genericBuilder[Int]; builder += identity(self.head); builder.result()
self: Iterable[Int] = List(1)
cbf: scala.collection.generic.CanBuildFrom[Iterable.Coll,Int,Iterable[Int]] = scala.collection.generic.GenTraversableFactory$anon$1@3212d853
builder: scala.collection.mutable.Builder[Int,Iterable[Int]] = ListBuffer(1)
res18: Iterable[Int] = List(1)

Things are similar for BitSet:

scala> val self: Iterable[Int] = BitSet(1); val cbf = Iterable.canBuildFrom[Int]; val builder = temp.genericBuilder[Int]; builder += identity(self.head); builder.result()
self: Iterable[Int] = BitSet(1)

scala> val self: Iterable[Int] = BitSet(1); val cbf = Iterable.canBuildFrom[Int]; val builder = cbf.apply(self); builder += identity(self.head); builder.result()
self: Iterable[Int] = BitSet(1)
cbf: scala.collection.generic.CanBuildFrom[Iterable.Coll,Int,Iterable[Int]] = scala.collection.generic.GenTraversableFactory$anon$1@3212d853
builder: scala.collection.mutable.Builder[Int,Iterable[Int]] = scala.collection.mutable.SetBuilder@5eac24de
res29: Iterable[Int] = Set(1)

scala> val self: Iterable[Int] = BitSet(1); val cbf = Iterable.canBuildFrom[Int]; val builder = self.genericBuilder[Int]; builder += identity(self.head); builder.result()
self: Iterable[Int] = BitSet(1)
cbf: scala.collection.generic.CanBuildFrom[Iterable.Coll,Int,Iterable[Int]] = scala.collection.generic.GenTraversableFactory$anon$1@3212d853
builder: scala.collection.mutable.Builder[Int,Iterable[Int]] = scala.collection.mutable.SetBuilder@7fc9253c
res30: Iterable[Int] = Set(1)

The key difference is:

scala> List(1).genericBuilder[Int]
res32: scala.collection.mutable.Builder[Int,List[Int]] = ListBuffer()

scala> BitSet(1).genericBuilder[Int]
res33: scala.collection.mutable.Builder[Int,scala.collection.immutable.Set[Int]] = scala.collection.mutable.SetBuilder@23bd7935

Noting the interplay between GenericBuilder and GenTraversableTemplate#genericBuilder.

BitSet can’t offer to generically map to another BitSet, it can only do that with an implicit witness of that the resulting element type is an Int. Instead, it has to build the a truly generic collection, Set.

-jason

Stephen Compall

unread,
Jul 20, 2015, 9:27:20 PM7/20/15
to scala...@googlegroups.com
On 7/20/2015 8:29 PM, Seth Tisue wrote:
On Monday, July 20, 2015 at 7:59:52 PM UTC-4, Stephen Compall wrote:
The fix would be to percolate a special case for "same output as input type" throughout the constellation of implicit CanBuildFroms and the self-builder methods they call everywhere in the collections hierarchy.  I cannot imagine the Scala devs adding such complexity just to provide this guarantee

Nonetheless, the guarantee is provided. I cannot find a concrete collection other than immutable.BitSet that does *not* behave in accordance with this guarantee. (I didn’t exhaustively test them all, but I tried a dozen.)

Here are some more that do not behave that way.

Welcome to Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_45).
Type in expressions to have them evaluated.
Type :help for more information.

scala> Map(1 -> 2)
res0: scala.collection.immutable.Map[Int,Int] = Map(1 -> 2)

scala> (res0: Iterable[(Int,Int)]) map identity
res1: Iterable[(Int, Int)] = List((1,2))

scala> collection.immutable.TreeSet(1, 2)
res2: scala.collection.immutable.TreeSet[Int] = TreeSet(1, 2)

scala> (res2: Iterable[Int]) map identity
res3: Iterable[Int] = Set(1, 2)

scala> 1 to 2
res4: scala.collection.immutable.Range.Inclusive = Range(1, 2)

scala> (res4: Iterable[Int]) map identity
res5: Iterable[Int] = Vector(1, 2)

scala> object Blah extends Enumeration {val one, two = Value}
defined object Blah

scala> Blah.values
res6: Blah.ValueSet = Blah.ValueSet(one, two)

scala> (res6: Iterable[Blah.Value]) map identity
res7: Iterable[Blah.Value] = Set(one, two)

scala> Array("one", "two")
res8: Array[String] = Array(one, two)

scala> (Array("one", "two"): Iterable[String]) map identity
res9: Iterable[String] = ArrayBuffer(one, two)

scala> "xyz" map identity
res10: String = xyz

scala> ("xyz": Iterable[Char]) map identity
res11: Iterable[Char] = Vector(x, y, z)

Maybe the last two are unfair.  But they are in the same mold as the others.


The mechanism that provides this guarantee is described in Odersky & Moors, “Fighting Bit Rot with Types (Experience Report: Scala Collections)”, http://lampwww.epfl.ch/~odersky/papers/fsttcs2009.pdf: "The idea is to give the apply method of CanBuildfrom access to the dynamic type of the original collection via its from argument..." It happens without any percolation of special cases throughout the hierarchy.

The reconstruction of values of these classes are special cases, though.  That's why the ordinary machinery relied upon by the mechanism described in the section of the paper to which you refer, the method genericBuilder on the "from" collection (which is universally quantified on the output collection's element type, which should be telling), does not work for these cases.

(Jason Zaugg has elaborated on the specifics in his own message while I composed mine.)

-- 
Stephen Compall
If anyone from the MSA is online, they should watch this flythrough.

Seth Tisue

unread,
Jul 20, 2015, 11:43:53 PM7/20/15
to scala...@googlegroups.com, s...@member.fsf.org
Uncle! It's not just a "bug". (But the situation is rather more subtle than the first round of objections allowed, I think.)

Stephen, thanks for pointing out that there are other collections like BitSet. I had not seen that. Some of your examples bring up other issues, but I think TreeSet and Map are "fair" examples in that the basic issue is the same. That is, the collection has a constraint on the element type. In the case of BitSet, the element type must be Int; for Map, it must be Tuple2; in the case of TreeSet, we need evidence of an Ordering on the element type.

When the compiler has full type information, the CanBuildFrom mechanism takes those constraints into account and does the right thing at compile time. But the runtime builder selection mechanism described in the paper only allows us to inspect the runtime type of the input collection; the element type is erased and unavailable. As Jason writes, we would need an "implicit witness [on] the resulting element type", but the API offers no way to pass that witness in.

So, this isn't fixable just for BitSet. There would need to be API changes. It seems to me those changes shouldn't be impossible — the compiler does know the element type, after all! — but perhaps not worth the trouble?

Thanks, all, for helping me (us?) reach a better understanding.

Seth Tisue

unread,
Jul 20, 2015, 11:57:42 PM7/20/15
to scala...@googlegroups.com, se...@tisue.net, s...@member.fsf.org
P.S. Now that I know to look for TreeSet too, I found this previous similar discussion: http://stackoverflow.com/q/5678682/86485

Guanpeng Xu

unread,
Jul 21, 2015, 4:23:14 AM7/21/15
to scala...@googlegroups.com
FWIW, I did some experiments with the code and discovered the following designs, to which the above behavior is attributed:

  1. Type preservation is done through genericBuilder, which depends on a field of type GenericCompanion of a collection's companion object.  GenericCompanion is a higher kinded type which requires another higher kinded type, +CC[X] <: GenericTraversable[X].

  2. Such companion objects as Set or List define `companion`, since the type of Set[X] or List[X] conforms to the type required by GenericCompanion.

  3. But BitSet is a proper type and Maps takes two types, both do not conform to the type required by GenericCompanion.  Instead, they use `companion` of their superclasses' companion objects, hence the above behavior.

These points may be too specific but they explain why adding such guarantees to BitSets and Maps etc are not trivial.

Best regards,
Guanpeng Xu

Rex Kerr

unread,
Jul 21, 2015, 1:57:21 PM7/21/15
to Seth Tisue, scala-user, s...@member.fsf.org
On Mon, Jul 20, 2015 at 6:02 PM, Seth Tisue <se...@tisue.net> wrote:
On Monday, July 20, 2015 at 8:34:24 PM UTC-4, Rex Kerr wrote:
not to know that "identity" maps non-negative Ints to non-negative Ints

Eh? The question of sign doesn't bother it in the case where the precise type is known at compile-time:

scala> collection.immutable.BitSet(1, 2, 3).map(_ + 1)
res0: scala.collection.immutable.BitSet = BitSet(2, 3, 4)

scala> collection.immutable.BitSet(1, 2, 3).map(_ * -1)
java.lang.IllegalArgumentException: requirement failed
  at scala.Predef$.require(Predef.scala:207)
  at scala.collection.mutable.BitSet.add(BitSet.scala:83)

If sign is considered a runtime question here, it should be also be considered a runtime question in the case where the precise type isn't known at compile time.

Yes, but given that exception, you violate LSP if you make the underlying type of (bitset: Set[Int]).map(- _) be a BitSet.  And that would surely be a bug.  Collections that say that they are a subtype of another but explode on sensible input are not things I like to find in a library.

Also, it's kind of unfortunate that by refining a type from Set[Int] to BitSet, code that works will blow up at runtime.  I'd argue that this is a bug also.  (In fact, I was working on a BitSet replacement that allows negative indices to at least get around this.)

  --Rex

Guanpeng Xu

unread,
Jul 21, 2015, 8:40:45 PM7/21/15
to scala-user, se...@tisue.net, s...@member.fsf.org

It looks like an issue of BitSet itself instead of the collection mechanism to me: BitSet is "partial", in the sense that it declares itself as SortedSet[Int] but in fact it is SortedSet[UnsignedInt], although UnsignedInt does not exist in Java.

How do you think?

Best regards,
Guanpeng Xu
Reply all
Reply to author
Forward
0 new messages