41 views

Skip to first unread message

Nov 20, 2017, 12:47:15 PM11/20/17

to

Suppose we're implementing an immutable collection type that comes in unordered and ordered flavors. Let's call them MyColl and MyOrderedColl.

We implement __eq__ such that MyColl(some_elements) == MyOrderedColl(other_elements) iff set(some_elements) == set(other_elements).

But MyOrderedColl(some_elements) == MyOrderedColl(other_elements) iff list(some_elements) == list(other_elements).

This works just like dict and collections.OrderedDict, in other words.

Since our collection types are immutable, let's say we want to implement __hash__.

We must ensure that our __hash__ results are consistent with __eq__. That is, we make sure that if MyColl(some_elements) == MyOrderedColl(other_elements), then hash(MyColl(some_elements)) == hash(MyOrderedColl(other_elements)).

Now for the question: Is this useful? I ask because this leads to the following behavior:

>>> unordered = MyColl([1, 2, 3])

>>> ordered = MyOrderedColl([3, 2, 1])

>>> s = {ordered, unordered}

>>> len(s)

1

>>> s = {ordered}

>>> unordered in s

True

>>> # etc.

In other words, sets and mappings can't tell unordered and ordered apart; they're treated like the same values.

This is a bit reminiscent of:

>>> s = {1.0}

>>> True in s

True

>>> d = {1: int, 1.0: float, True: bool}

>>> len(d)

1

>>> # etc.

The first time I encountered this was a bit of an "aha", but to be clear, I think this behavior is totally right.

However, I'm less confident that this kind of behavior is useful for MyColl and MyOrderedColl. Could anyone who feels more certain one way or the other please explain the rationale and possibly even give some real-world examples?

Thanks!

Josh

We implement __eq__ such that MyColl(some_elements) == MyOrderedColl(other_elements) iff set(some_elements) == set(other_elements).

But MyOrderedColl(some_elements) == MyOrderedColl(other_elements) iff list(some_elements) == list(other_elements).

This works just like dict and collections.OrderedDict, in other words.

Since our collection types are immutable, let's say we want to implement __hash__.

We must ensure that our __hash__ results are consistent with __eq__. That is, we make sure that if MyColl(some_elements) == MyOrderedColl(other_elements), then hash(MyColl(some_elements)) == hash(MyOrderedColl(other_elements)).

Now for the question: Is this useful? I ask because this leads to the following behavior:

>>> unordered = MyColl([1, 2, 3])

>>> ordered = MyOrderedColl([3, 2, 1])

>>> s = {ordered, unordered}

>>> len(s)

1

>>> s = {ordered}

>>> unordered in s

True

>>> # etc.

In other words, sets and mappings can't tell unordered and ordered apart; they're treated like the same values.

This is a bit reminiscent of:

>>> s = {1.0}

>>> True in s

True

>>> d = {1: int, 1.0: float, True: bool}

>>> len(d)

1

>>> # etc.

The first time I encountered this was a bit of an "aha", but to be clear, I think this behavior is totally right.

However, I'm less confident that this kind of behavior is useful for MyColl and MyOrderedColl. Could anyone who feels more certain one way or the other please explain the rationale and possibly even give some real-world examples?

Thanks!

Josh

Nov 20, 2017, 1:55:26 PM11/20/17

to

On Tue, Nov 21, 2017 at 4:47 AM, Josh B. <jabr...@gmail.com> wrote:

> Now for the question: Is this useful? I ask because this leads to the following behavior:

>

>>>> unordered = MyColl([1, 2, 3])

>>>> ordered = MyOrderedColl([3, 2, 1])

>>>> s = {ordered, unordered}

>>>> len(s)

> 1

>>>> s = {ordered}

>>>> unordered in s

> True

>>>> # etc.

>

> In other words, sets and mappings can't tell unordered and ordered apart; they're treated like the same values.

>

> Now for the question: Is this useful? I ask because this leads to the following behavior:

>

>>>> unordered = MyColl([1, 2, 3])

>>>> ordered = MyOrderedColl([3, 2, 1])

>>>> s = {ordered, unordered}

>>>> len(s)

> 1

>>>> s = {ordered}

>>>> unordered in s

> True

>>>> # etc.

>

> In other words, sets and mappings can't tell unordered and ordered apart; they're treated like the same values.

>

> However, I'm less confident that this kind of behavior is useful for MyColl and MyOrderedColl. Could anyone who feels more certain one way or the other please explain the rationale and possibly even give some real-world examples?

>

This isn't a consequence of __hash__, it's a consequence of __eq__.
>

You have declared that MyColl and MyOrderedColl are equal, therefore

only one of them stays in the set.

But what you have is the strangeness of non-transitive equality, which

is likely to cause problems.

>>> unordered = MyColl([1, 2, 3])

>>> ordered2 = MyColl([2, 1, 3])

unordered is equal to each of the others, but they're not equal to

each other. So if you put them into a set, you'll get results that

depend on order. Here's a simpler form of non-transitive equality:

>>> class Oddity(int):

... def __eq__(self, other):

... if other - 5 <= self <= other + 5:

... return True

... return int(self) == other

... def __hash__(self):

... return 1

...

>>> x, y, z = Oddity(5), Oddity(10), Oddity(15)

>>> x == y, y == z, x == z

(True, True, False)

>>> {x, y, z}

{5, 15}

>>> {y, x, z}

{10}

Setting __hash__ to a constant value is safe (but inefficient); it's

all based on how __eq__ works. So the question is: are you willing to

accept the bizarre behaviour of non-transitive equality?

ChrisA

Nov 20, 2017, 2:31:40 PM11/20/17

to

On 2017-11-20 17:47, Josh B. wrote:

> Suppose we're implementing an immutable collection type that comes in unordered and ordered flavors. Let's call them MyColl and MyOrderedColl.

>

> We implement __eq__ such that MyColl(some_elements) == MyOrderedColl(other_elements) iff set(some_elements) == set(other_elements).

>

What if there are duplicate elements?
> Suppose we're implementing an immutable collection type that comes in unordered and ordered flavors. Let's call them MyColl and MyOrderedColl.

>

> We implement __eq__ such that MyColl(some_elements) == MyOrderedColl(other_elements) iff set(some_elements) == set(other_elements).

>

Should that be MyColl(some_elements) == MyOrderedColl(other_elements)

iff len(some_elements) == len(other_elements) and set(some_elements) ==

set(other_elements)?

len({MyColl(some_elements), MyOrderedColl(other_elements)}) == 1 seems

right.

As for which one is in the set:

>>> {1, 1.0}

{1}

>>> {1.0, 1}

{1.0}

So if MyColl(some_elements) == MyOrderedColl(other_elements), then

{MyColl(some_elements), MyOrderedColl(other_elements)} ==

{MyColl(some_elements)}.

Nov 20, 2017, 2:50:47 PM11/20/17

to

On Monday, November 20, 2017 at 1:55:26 PM UTC-5, Chris Angelico wrote:

> But what you have is the strangeness of non-transitive equality, which

> is likely to cause problems.

But this is exactly how Python's built-in dict and OrderedDict behave:
> But what you have is the strangeness of non-transitive equality, which

> is likely to cause problems.

>>> od = OrderedDict([(1, 0), (2, 0), (3, 0)])

>>> od2 = OrderedDict([(3, 0), (2, 0), (1, 0)])

>>> ud = dict(od)

>>> od == ud

True

>>> od2 == ud

True

>>> od == od2

False

Given that, it would seem wrong for our MyOrderedColl.__eq__ to not behave similarly.

Or are you suggesting that OrderedDict.__eq__ should not have been implemented this way in the first place?

> So the question is: are you willing to

> accept the bizarre behaviour of non-transitive equality?

The question here actually is to tease out what Python's existing design is telling us to do.

If it helps, substitute "frozenset" for "MyColl" and "FrozenOrderedSet" for "MyOrderedColl". How would you implement their __eq__ methods? What would be the correct design for our hypothetical frozen(ordered)set library? What would be more useful, intuitive, and usable for our users?

Thanks very much for the good examples and for helping me clarify the question!

Nov 20, 2017, 2:53:10 PM11/20/17

to

On Monday, November 20, 2017 at 2:31:40 PM UTC-5, MRAB wrote:

> What if there are duplicate elements?

>

> Should that be MyColl(some_elements) == MyOrderedColl(other_elements)

> iff len(some_elements) == len(other_elements) and set(some_elements) ==

> set(other_elements)?

Yes, that's what I meant. Thanks for catching :)
> What if there are duplicate elements?

>

> Should that be MyColl(some_elements) == MyOrderedColl(other_elements)

> iff len(some_elements) == len(other_elements) and set(some_elements) ==

> set(other_elements)?

Please let me know if you have any thoughts on how you would design our hypothetical frozen(ordered)set library to behave in these cases.

Nov 20, 2017, 3:17:49 PM11/20/17

to

confusion in sets/dicts; since OrderedDict and dict are unhashable,

they won't themselves be problematic, and Python doesn't have a

built-in FrozenOrderedSet. So there isn't really a precedent here, and

it's up to you to decide how you want to deal with this.

Basically, you're going to have to accept one of two situations:

* Either your class doesn't behave the same way dict and OD do

* Or your class, when put into a set, depends on ordering.

Neither is perfect. You have to take your pick between them.

ChrisA

Nov 21, 2017, 12:44:07 PM11/21/17

to

On Monday, November 20, 2017 at 3:17:49 PM UTC-5, Chris Angelico wrote:

> Neither is perfect. You have to take your pick between them.

Right on, thanks for weighing in, Chris. Your responses have been very helpful.
> Neither is perfect. You have to take your pick between them.

I wouldn't feel comfortable claiming the authority to make this call alone. But fortunately I reached out to Raymond Hettinger and am delighted to have his guidance, pasted below. Great to have this predicament resolved.

In case of interest, I've implemented Raymond's advice in the latest release of bidict, the bidirectional map library I authored <http://bidict.rtfd.io>. Feedback always welcome.

Thanks,

Josh

---------- Forwarded message ----------

From: Raymond Hettinger <r...@....com>

Date: Mon, Nov 20, 2017 at 4:46 PM

Subject: Re: __hash__ and ordered vs. unordered collections

To: j...@math.brown.edu

If you want to make ordered and unordered collections interoperable, I would just let equality be unordered all the time. You can always provide a separate method for an ordered_comparison.

IMO, the design for __eq__ in collections.OrderedDict was a mistake. It violates the Liskov Substitution Principle which would let ordered dicts always be substituted whereever regular dicts were expected. It is simpler to have comparisons be unordered everywhere.

But there are no perfect solutions. A user of an ordered collection may rightfully expect an ordered comparison, while a user of both collections may rightfully expect them to be mutually substitutable.

Raymond

Reply all

Reply to author

Forward

0 new messages

Search

Clear search

Close search

Google apps

Main menu