The counterintuitive behavior of Set.sameElements

484 views
Skip to first unread message

Christopher Martin

unread,
Feb 7, 2013, 12:07:16 AM2/7/13
to scala-...@googlegroups.com
 
if (Set(1,2).sameElements(Set(2,1))) then "oh good, sets work" else "curse you #Scala! Foiled again!"

"sameElements" seems like a reasonable name for this method when you're looking at the Iterable trait, but it is very unfortunate that Set has to inherit it with this name.

Part of the confusion arises from the fact that a "Set" in Scala/Java world does not quite line up with the mathematical set-theory definition of a "set", but I think the more striking problem is that sameElements is a very unclear name. Without reading the docs, you really wouldn't know what it does, even in the context of Iterable. Switching your brain into Set thinking just pushes the situation from ambiguous to extremely counterintuitive.

Anyone unfamiliar with the strict definition of this method would think: Of course sets (1,2) and (2,1) "have the same elements"! This is the sort of thing that knee-jerk reactions in the early API learning stages and turns people away from languages.

I think there's a better name for this method that more accurately reflects its semantics: sameIteration. Iterables have the same iteration if their iterators behave in the same way. Sets have the same iteration if their iterators behave in the same way. That makes sense, doesn't it, than to say "Sets have the same elements if they have the same elements in the same order"?

- Chris

Paul Phillips

unread,
Feb 7, 2013, 12:38:39 AM2/7/13
to Christopher Martin, scala-...@googlegroups.com

On Wed, Feb 6, 2013 at 9:07 PM, Christopher Martin <ch.m...@gmail.com> wrote:
Anyone unfamiliar with the strict definition of this method would think: Of course sets (1,2) and (2,1) "have the same elements"!

They do have the same elements. It's a bug. I hope nobody is going to say there's no bug. Either the bug is that sameElements is defined on Iterable, or the bug is that sets with the same elements do not return true for sameElements. I'll take either. But there's no sensible reason to place methods on Iterable which are broken by design on Set.

This particular bug is specific to sets of size 2, 3, and 4. Which makes it more dangerous than it would otherwise be.

scala> Set(1,2,3,4,5,6,7) sameElements Set(5,4,6,3,7,7,7,2,1)
res0: Boolean = true

Christopher Martin

unread,
Feb 7, 2013, 1:07:00 AM2/7/13
to scala-...@googlegroups.com, Christopher Martin
On Thursday, February 7, 2013 12:38:39 AM UTC-5, Paul Phillips wrote:
But there's no sensible reason to place methods on Iterable which are broken by design on Set.

Well, I wouldn't say this method is necessarily broken by design on Set. If you look past the method's name to its description:

Checks if the other iterable collection contains the same elements in the same order as this iterable collection.

This may very well be a legitimate method for a Set to have. Take, for example, two TreeSets with different Orderings. You may want to know whether the iteration orders of the two sets are the same. Your complaint, though, I agree with. The documentation mentions:

Note: might return different results for different runs, unless the underlying collection type is ordered.

The method is broken "by design" on any subtype that is unordered (it just so happens that we often colloquially take "set" and "unordered collection" to be synonymous when they aren't). So if the collection type is unordered, it shouldn't even have this method. The type hierarchy is wrong. This isn't the only method I've had that thought about, though. sameElements and size both come with the caveat:
 
Note: will not terminate for infinite-sized collections.

Is the inheritance of size equally erroneous for any infinite-sized Iterable?
 

Paul Phillips

unread,
Feb 7, 2013, 1:11:34 AM2/7/13
to Christopher Martin, scala-...@googlegroups.com

On Wed, Feb 6, 2013 at 10:07 PM, Christopher Martin <ch.m...@gmail.com> wrote:
Is the inheritance of size equally erroneous for any infinite-sized Iterable?

size is erroneous long before you reach infinity. Erroneous kicks in somewhere around O(n).

Simon Schäfer

unread,
Feb 7, 2013, 5:17:30 AM2/7/13
to scala-debate

On 02/07/2013 07:07 AM, Christopher Martin wrote:
On Thursday, February 7, 2013 12:38:39 AM UTC-5, Paul Phillips wrote:
But there's no sensible reason to place methods on Iterable which are broken by design on Set.

Well, I wouldn't say this method is necessarily broken by design on Set. If you look past the method's name to its description:
I would say its name is broken. Nobody expects from a method called sameElements that it compares the ordering of the elements. The method name is ambiguous with ==. It should be changed to sameOrderedElements or even better to sameByOrdering. The latter one then should expect an implicit parameter list expecting an ordering that makes this method more general.

An alternative would be to kill this method completely and add it to runtime.Tuple2Zipped to allow its usage as (xs,ys).zipped.sameElements, which is not ambiguous anymore (on the other side, instead of sameElements there can be used forall(_==_)).

Christopher Martin

unread,
Feb 7, 2013, 10:13:05 AM2/7/13
to scala-...@googlegroups.com
On Thursday, February 7, 2013 5:17:30 AM UTC-5, Simon Schäfer wrote:
... sameByOrdering. The latter one then should expect an implicit parameter list expecting an ordering that makes this method more general.

Well, now you're talking about something rather different. I think this method, as it is, does have value. It just has a bad name.

On Thursday, February 7, 2013 5:17:30 AM UTC-5, Simon Schäfer wrote:
It should be changed to sameOrderedElements or even better to sameByOrdering.

I'm not a fan of just calling this "order" equality, because that too is misleading. This method's implementation checks iteration order, which is not necessarily the same notion as order in the context of some collection. Take, for example, a circular linked list, with iteration that begins at an arbitrary node. Least surprise would be:

(Circular(1,2,3) sameElements Circular(3,2,1)) == true
(Circular(1,2,3) sameElements Circular(3,3,2,1)) == false
(Circular(1,2,3) sameOrderedElements Circular(3,2,1)) == false
(Circular(1,2,3) sameOrderedElements Circular(3,1,2)) == true
(Circular(...) sameIteration Circular(...)) == { undefined because this type's iteration order is undefined }

Rich Oliver

unread,
Feb 7, 2013, 10:39:55 AM2/7/13
to scala-...@googlegroups.com
What about samePermutation for the method name? What about creating a PreservesOrder trait and putting the method in there? List would inherit from PreserveOrder. Set would not. Then why not kill two birds with one stone and add a Co-variant ListSet to the standard library.

Christopher Martin

unread,
Feb 7, 2013, 11:28:37 AM2/7/13
to scala-...@googlegroups.com
A new trait seems like a good idea to me. Again, though, I think it would be best to avoid making undue assumptions about what order means (harking back to my circular list example). I'd call it something along the lines of DefinedIterationOrder. It would imply only that the iteration order is "meaningful" in some way. For List it would be the list order, for ArrayBuffer it would be the insertion order, for TreeSet it would the sorted order. Likewise you could define a HashSet that preserves insertion order, etc. But Set itself, as you said, would not inherit it.

Chris L

unread,
Jul 11, 2013, 1:48:15 PM7/11/13
to scala-...@googlegroups.com
the behavior is not even predictable ( Scala version 2.10.2):
    scala> Set(1,2) sameElements Set(2,1)
    res3: Boolean = false

    scala> Set(3, 2, 1) sameElements Set(2, 1, 3)
    res2: Boolean = false

    scala> Set(3, 2, 1, 4) sameElements Set(2, 1, 4, 3)
    res1: Boolean = false

    scala> Set(3, 2, 1, 4, 5) sameElements Set(2, 1, 4, 5, 3)
    res0: Boolean = true

    scala> Set(3, 2, 1, 4, 5, 6, 7) sameElements Set(7, 2, 1, 4, 5, 6, 3)
    res4: Boolean = true

any one knows why?




Hanns Holger Rutz

unread,
Jul 11, 2013, 1:52:57 PM7/11/13
to Chris L, scala-...@googlegroups.com
"Checks if the other iterable collection contains the same elements in the same order as this general iterable collection .

Note: might return different results for different runs, unless the underlying collection type is ordered."

This comes from `GenIterableLike`. Of course, it is still confusing because of the method name, who would expect this restriction, and moreover its completely uselessness in sub types such as Set?

btw. Set(1,2) == Set(2,1) // equality works
> --
> You received this message because you are subscribed to the Google Groups "scala-debate" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to scala-debate...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

---
"Si hay reelección, Capriles será reelecto"

martin odersky

unread,
Jul 12, 2013, 6:11:02 AM7/12/13
to Hanns Holger Rutz, Chris L, scala-debate
On Thu, Jul 11, 2013 at 7:52 PM, Hanns Holger Rutz <con...@sciss.de> wrote:
"Checks if the other iterable collection contains the same elements in the same order as this general iterable collection .

Note: might return different results for different runs, unless the underlying collection type is ordered."

This comes from `GenIterableLike`. Of course, it is still confusing because of the method name, who would expect this restriction, and moreover its completely uselessness in sub types such as Set?

btw. Set(1,2) == Set(2,1)   // equality works

I think the original poster in the thread is right. This should be named "sameIteration" not "sameElements".

Cheers

 - Martin

 

On 11 Jul 2013, at 19:48, Chris L wrote:

> the behavior is not even predictable ( Scala version 2.10.2):
>     scala> Set(1,2) sameElements Set(2,1)
>     res3: Boolean = false
>
>     scala> Set(3, 2, 1) sameElements Set(2, 1, 3)
>     res2: Boolean = false
>
>     scala> Set(3, 2, 1, 4) sameElements Set(2, 1, 4, 3)
>     res1: Boolean = false
>
>     scala> Set(3, 2, 1, 4, 5) sameElements Set(2, 1, 4, 5, 3)
>     res0: Boolean = true
>
>     scala> Set(3, 2, 1, 4, 5, 6, 7) sameElements Set(7, 2, 1, 4, 5, 6, 3)
>     res4: Boolean = true
>
> any one knows why?
>
>
>
>
>
> --
> You received this message because you are subscribed to the Google Groups "scala-debate" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to scala-debate...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>

---
"Si hay reelección, Capriles será reelecto"

--
You received this message because you are subscribed to the Google Groups "scala-debate" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-debate...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.





--
Martin Odersky
Prof., EPFL and Chairman, Typesafe
PSED, 1015 Lausanne, Switzerland
Tel. EPFL: +41 21 693 6863
Tel. Typesafe: +41 21 691 4967

Matthew Pocock

unread,
Jul 12, 2013, 9:48:45 AM7/12/13
to martin odersky, Hanns Holger Rutz, Chris L, scala-debate
So, in what situation is 'same iteration' a useful comparison? Is it genuinely used in any places where equality would not be the intended semantics? Is it actually used for non-set collections? It is unclear from the source if two sets with the same elements will in fact have the same iteration order e.g. MySet(1,2,3) and YourSet(1,2,3) could have different iteration orders despite being initialised from the identical sequence of integers.

Matthew

Matthew
Dr Matthew Pocock
Turing ate my hamster LTD

Integrative Bioinformatics Group, School of Computing Science, Newcastle University

skype: matthew.pocock
tel: (0191) 2566550

martin odersky

unread,
Jul 12, 2013, 11:22:28 AM7/12/13
to Matthew Pocock, Hanns Holger Rutz, Chris L, scala-debate
On Fri, Jul 12, 2013 at 3:48 PM, Matthew Pocock <turingate...@gmail.com> wrote:
So, in what situation is 'same iteration' a useful comparison? Is it genuinely used in any places where equality would not be the intended semantics?

I think the main reason for its existence is that equality on arrays is reference equality, not element equality (that's another long discussion why this is so, in which I do not want to get into now.). So there was the need to provide a 
separate function that gives element equality for all sequences. 

Why not define it on Seq only then? In fact, `sameElements` is not alone in this regard; there are quite a few operations on Iterable that behave unpredictably for non-ordered collections. `zip`, `head`, `init`, `foreach` are all such examples. `sameElements` even makes sense sometimes for sets, e.g. if xs and ys are both LinkedHashSets, `xs sameElements ys` means that both sets have the same elements that were entered in the same order. 

Generally, it was decided at some point that collection types should capture neither the property of being ordered, nor the property of being finite. Both were tried but as far as I remember led to types that were deemed too complicated. 

In fact, in light of the array example, I am not sure anymore that sameIteration is preferable to sameElements. It avoids the trap mentioned in this thread but makes it so much harder to find for people who want to get an elementwise comparison of arrays. 

Cheers

 - Martin

martin odersky

unread,
Jul 12, 2013, 11:24:59 AM7/12/13
to Matthew Pocock, Hanns Holger Rutz, Chris L, scala-debate
On Fri, Jul 12, 2013 at 5:22 PM, martin odersky <martin....@epfl.ch> wrote:

In fact, in light of the array example, I am not sure anymore that sameIteration is preferable to sameElements. It avoids the trap mentioned in this thread but makes it so much harder to find for people who want to get an elementwise comparison of arrays. 

If we feel this is important enough to change, then I would favor moving sameElements to Seq. 

Cheers
 
 - Martin


Christopher Martin

unread,
Jul 12, 2013, 11:25:38 AM7/12/13
to scala-...@googlegroups.com, martin odersky, Hanns Holger Rutz, Chris L
On Friday, July 12, 2013 9:48:45 AM UTC-4, Matthew Pocock wrote:
So, in what situation is 'same iteration' a useful comparison? Is it genuinely used in any places where equality would not be the intended semantics? Is it actually used for non-set collections? It is unclear from the source if two sets with the same elements will in fact have the same iteration order e.g. MySet(1,2,3) and YourSet(1,2,3) could have different iteration orders despite being initialised from the identical sequence of integers.


`scala.collection` uses sameElements to implement equals in GenSeqLike, IndexedSeqOptimized, and LinearSeqOptimized. It also has a few uses in `scala.xml`.

The documentation for sameElements does specify that it only works for ordered collections, but I can't find any place where the doc for Set explicitly mentions that its default implementations are unordered.

Christopher Martin

unread,
Jul 12, 2013, 12:41:34 PM7/12/13
to scala-...@googlegroups.com, Matthew Pocock, Hanns Holger Rutz, Chris L
On Friday, July 12, 2013 11:22:28 AM UTC-4, Martin wrote:

Generally, it was decided at some point that collection types should capture neither the property of being ordered, nor the property of being finite. Both were tried but as far as I remember led to types that were deemed too complicated.
 
On Friday, July 12, 2013 11:24:59 AM UTC-4, Martin wrote:

If we feel this is important enough to change, then I would favor moving sameElements to Seq. 


I don't follow your earlier comment, then - Doesn't Seq "capture the property of being ordered"?

- Chris

martin odersky

unread,
Jul 12, 2013, 12:47:00 PM7/12/13
to Christopher Martin, scala-debate, Matthew Pocock, Hanns Holger Rutz, Chris L
No, a LinkedHashMap is ordered, but it is not a Seq. It can't be, because apply is different for maps and for sets.

 - Martin
 
- Chris

Josh Suereth

unread,
Jul 12, 2013, 1:23:53 PM7/12/13
to martin odersky, Hanns Holger Rutz, Chris L, scala-debate
On Fri, Jul 12, 2013 at 6:11 AM, martin odersky <martin....@epfl.ch> wrote:



On Thu, Jul 11, 2013 at 7:52 PM, Hanns Holger Rutz <con...@sciss.de> wrote:
"Checks if the other iterable collection contains the same elements in the same order as this general iterable collection .

Note: might return different results for different runs, unless the underlying collection type is ordered."

This comes from `GenIterableLike`. Of course, it is still confusing because of the method name, who would expect this restriction, and moreover its completely uselessness in sub types such as Set?

btw. Set(1,2) == Set(2,1)   // equality works

I think the original poster in the thread is right. This should be named "sameIteration" not "sameElements".

Maybe this is a consequence from when "elements" would return the iterator...

martin odersky

unread,
Jul 12, 2013, 1:25:13 PM7/12/13
to Josh Suereth, Hanns Holger Rutz, Chris L, scala-debate
I did not dig, but, yes, you might be right.

 - Martin
 
--
You received this message because you are subscribed to the Google Groups "scala-debate" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-debate...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 
Reply all
Reply to author
Forward
0 new messages