__hash__ and ordered vs. unordered collections

51 views
Skip to first unread message

Josh B.

unread,
Nov 20, 2017, 12:47:15 PM11/20/17
to
Suppose we're implementing an immutable collection type that comes in unordered and ordered flavors. Let's call them MyColl and MyOrderedColl.

We implement __eq__ such that MyColl(some_elements) == MyOrderedColl(other_elements) iff set(some_elements) == set(other_elements).

But MyOrderedColl(some_elements) == MyOrderedColl(other_elements) iff list(some_elements) == list(other_elements).

This works just like dict and collections.OrderedDict, in other words.

Since our collection types are immutable, let's say we want to implement __hash__.

We must ensure that our __hash__ results are consistent with __eq__. That is, we make sure that if MyColl(some_elements) == MyOrderedColl(other_elements), then hash(MyColl(some_elements)) == hash(MyOrderedColl(other_elements)).

Now for the question: Is this useful? I ask because this leads to the following behavior:

>>> unordered = MyColl([1, 2, 3])
>>> ordered = MyOrderedColl([3, 2, 1])
>>> s = {ordered, unordered}
>>> len(s)
1
>>> s = {ordered}
>>> unordered in s
True
>>> # etc.

In other words, sets and mappings can't tell unordered and ordered apart; they're treated like the same values.

This is a bit reminiscent of:

>>> s = {1.0}
>>> True in s
True
>>> d = {1: int, 1.0: float, True: bool}
>>> len(d)
1
>>> # etc.

The first time I encountered this was a bit of an "aha", but to be clear, I think this behavior is totally right.

However, I'm less confident that this kind of behavior is useful for MyColl and MyOrderedColl. Could anyone who feels more certain one way or the other please explain the rationale and possibly even give some real-world examples?

Thanks!

Josh

Chris Angelico

unread,
Nov 20, 2017, 1:55:26 PM11/20/17
to
On Tue, Nov 21, 2017 at 4:47 AM, Josh B. <jabr...@gmail.com> wrote:
> Now for the question: Is this useful? I ask because this leads to the following behavior:
>
>>>> unordered = MyColl([1, 2, 3])
>>>> ordered = MyOrderedColl([3, 2, 1])
>>>> s = {ordered, unordered}
>>>> len(s)
> 1
>>>> s = {ordered}
>>>> unordered in s
> True
>>>> # etc.
>
> In other words, sets and mappings can't tell unordered and ordered apart; they're treated like the same values.
>
> However, I'm less confident that this kind of behavior is useful for MyColl and MyOrderedColl. Could anyone who feels more certain one way or the other please explain the rationale and possibly even give some real-world examples?
>

This isn't a consequence of __hash__, it's a consequence of __eq__.
You have declared that MyColl and MyOrderedColl are equal, therefore
only one of them stays in the set.

But what you have is the strangeness of non-transitive equality, which
is likely to cause problems.

>>> unordered = MyColl([1, 2, 3])
>>> ordered1 = MyColl([3, 2, 1])
>>> ordered2 = MyColl([2, 1, 3])

unordered is equal to each of the others, but they're not equal to
each other. So if you put them into a set, you'll get results that
depend on order. Here's a simpler form of non-transitive equality:

>>> class Oddity(int):
... def __eq__(self, other):
... if other - 5 <= self <= other + 5:
... return True
... return int(self) == other
... def __hash__(self):
... return 1
...
>>> x, y, z = Oddity(5), Oddity(10), Oddity(15)
>>> x == y, y == z, x == z
(True, True, False)
>>> {x, y, z}
{5, 15}
>>> {y, x, z}
{10}

Setting __hash__ to a constant value is safe (but inefficient); it's
all based on how __eq__ works. So the question is: are you willing to
accept the bizarre behaviour of non-transitive equality?

ChrisA

MRAB

unread,
Nov 20, 2017, 2:31:40 PM11/20/17
to
On 2017-11-20 17:47, Josh B. wrote:
> Suppose we're implementing an immutable collection type that comes in unordered and ordered flavors. Let's call them MyColl and MyOrderedColl.
>
> We implement __eq__ such that MyColl(some_elements) == MyOrderedColl(other_elements) iff set(some_elements) == set(other_elements).
>
What if there are duplicate elements?

Should that be MyColl(some_elements) == MyOrderedColl(other_elements)
iff len(some_elements) == len(other_elements) and set(some_elements) ==
set(other_elements)?
If MyColl(some_elements) == MyOrderedColl(other_elements), then
len({MyColl(some_elements), MyOrderedColl(other_elements)}) == 1 seems
right.

As for which one is in the set:

>>> {1, 1.0}
{1}
>>> {1.0, 1}
{1.0}

So if MyColl(some_elements) == MyOrderedColl(other_elements), then
{MyColl(some_elements), MyOrderedColl(other_elements)} ==
{MyColl(some_elements)}.

Josh B.

unread,
Nov 20, 2017, 2:50:47 PM11/20/17
to
On Monday, November 20, 2017 at 1:55:26 PM UTC-5, Chris Angelico wrote:
> But what you have is the strangeness of non-transitive equality, which
> is likely to cause problems.

But this is exactly how Python's built-in dict and OrderedDict behave:

>>> od = OrderedDict([(1, 0), (2, 0), (3, 0)])
>>> od2 = OrderedDict([(3, 0), (2, 0), (1, 0)])
>>> ud = dict(od)
>>> od == ud
True
>>> od2 == ud
True
>>> od == od2
False


Given that, it would seem wrong for our MyOrderedColl.__eq__ to not behave similarly.

Or are you suggesting that OrderedDict.__eq__ should not have been implemented this way in the first place?


> So the question is: are you willing to
> accept the bizarre behaviour of non-transitive equality?

Forget what I'm personally willing to do :)
The question here actually is to tease out what Python's existing design is telling us to do.

If it helps, substitute "frozenset" for "MyColl" and "FrozenOrderedSet" for "MyOrderedColl". How would you implement their __eq__ methods? What would be the correct design for our hypothetical frozen(ordered)set library? What would be more useful, intuitive, and usable for our users?

Thanks very much for the good examples and for helping me clarify the question!

Josh B.

unread,
Nov 20, 2017, 2:53:10 PM11/20/17
to
On Monday, November 20, 2017 at 2:31:40 PM UTC-5, MRAB wrote:
> What if there are duplicate elements?
>
> Should that be MyColl(some_elements) == MyOrderedColl(other_elements)
> iff len(some_elements) == len(other_elements) and set(some_elements) ==
> set(other_elements)?

Yes, that's what I meant. Thanks for catching :)

Please let me know if you have any thoughts on how you would design our hypothetical frozen(ordered)set library to behave in these cases.

Chris Angelico

unread,
Nov 20, 2017, 3:17:49 PM11/20/17
to
What I'm saying is that non-transitive equality can cause a lot of
confusion in sets/dicts; since OrderedDict and dict are unhashable,
they won't themselves be problematic, and Python doesn't have a
built-in FrozenOrderedSet. So there isn't really a precedent here, and
it's up to you to decide how you want to deal with this.

Basically, you're going to have to accept one of two situations:
* Either your class doesn't behave the same way dict and OD do
* Or your class, when put into a set, depends on ordering.

Neither is perfect. You have to take your pick between them.

ChrisA

Josh B.

unread,
Nov 21, 2017, 12:44:07 PM11/21/17
to
On Monday, November 20, 2017 at 3:17:49 PM UTC-5, Chris Angelico wrote:
> Neither is perfect. You have to take your pick between them.

Right on, thanks for weighing in, Chris. Your responses have been very helpful.

I wouldn't feel comfortable claiming the authority to make this call alone. But fortunately I reached out to Raymond Hettinger and am delighted to have his guidance, pasted below. Great to have this predicament resolved.

In case of interest, I've implemented Raymond's advice in the latest release of bidict, the bidirectional map library I authored <http://bidict.rtfd.io>. Feedback always welcome.

Thanks,
Josh

---------- Forwarded message ----------
From: Raymond Hettinger <r...@....com>
Date: Mon, Nov 20, 2017 at 4:46 PM
Subject: Re: __hash__ and ordered vs. unordered collections
To: j...@math.brown.edu


If you want to make ordered and unordered collections interoperable, I would just let equality be unordered all the time. You can always provide a separate method for an ordered_comparison.

IMO, the design for __eq__ in collections.OrderedDict was a mistake. It violates the Liskov Substitution Principle which would let ordered dicts always be substituted whereever regular dicts were expected. It is simpler to have comparisons be unordered everywhere.

But there are no perfect solutions. A user of an ordered collection may rightfully expect an ordered comparison, while a user of both collections may rightfully expect them to be mutually substitutable.


Raymond
Reply all
Reply to author
Forward
0 new messages