Rich comparison methods don't work in sets?

1 view
Skip to first unread message

Gustavo Narea

unread,
Jun 19, 2009, 3:02:44 PM6/19/09
to
Hello, everyone.

I've noticed that if I have a class with so-called "rich comparison"
methods
(__eq__, __ne__, etc.), when its instances are included in a set,
set.__contains__/__eq__ won't call the .__eq__ method of the elements
and thus
the code below:
"""
obj1 = RichComparisonClass()
obj2 = RichComparisonClass()

set1 = set([obj1])
set2 = set([obj2])

if obj1 == obj2:
print "Objects 1 and 2 are equivalent"
else:
print "Objects 1 and 2 aren't equivalent"

if set1 == set2:
print "Sets 1 and 2 are equivalent"
else:
print "Sets 1 and 2 aren't equivalent"
"""

Would output:
"""
Objects 1 and 2 are equivalent
Sets 1 and 2 aren't equivalent
"""

instead of:
"""
Objects 1 and 2 are equivalent
Sets 1 and 2 are equivalent
"""

How can I work around this? The only solution that comes to my mind is
to
write a function like this:
"""
def same_sets(set1, set2):
if len(set1) != len(set2): return False
for element1 in set1:
found = False
for element2 in set2:
if element1 == element2:
found = True
break;
if not found:
return False
return True
"""

But I see it pretty ugly; also I'd also have to roll out my own
"contains"
(e.g., `element in set`) function.

I expect and hope there's a pythonic way to do this. Does it exist?

Thanks in advance!

Matthew Wilson

unread,
Jun 19, 2009, 3:26:03 PM6/19/09
to
On Fri 19 Jun 2009 03:02:44 PM EDT, Gustavo Narea wrote:
> Hello, everyone.
>
> I've noticed that if I have a class with so-called "rich comparison"
> methods
> (__eq__, __ne__, etc.), when its instances are included in a set,
> set.__contains__/__eq__ won't call the .__eq__ method of the elements
> and thus
> the code below:
> """
> obj1 = RichComparisonClass()
> obj2 = RichComparisonClass()

What does

>>> obj1 is obj2

return? I don't know anything about set internals.

Matt

Peter Otten

unread,
Jun 19, 2009, 3:37:17 PM6/19/09
to
Gustavo Narea wrote:

> I've noticed that if I have a class with so-called "rich comparison"
> methods
> (__eq__, __ne__, etc.), when its instances are included in a set,
> set.__contains__/__eq__ won't call the .__eq__ method of the elements
> and thus
> the code below:
> """
> obj1 = RichComparisonClass()

How is RichComparisonClass implemented? Did you provide a __hash__() method
such that obj1 == obj2 implies hash(obj1) == hash(obj2)? This is necessary
for instances of the class to work properly with sets and dicts.

Peter

Chris Rebert

unread,
Jun 19, 2009, 3:42:49 PM6/19/09
to Gustavo Narea, pytho...@python.org

Note that sets are dict-based and thus use hash tables internally.
Thus, you must implement a sensible __hash__ method.
The default __hash__ provided by class `object` uses the object ID for the hash.

Since id(x) == id(y) iff x is y, and
hash(x) != hash(y) implies x != y,
If you don't implement __hash__, object's implementation causes every
distinct object of your type to be considered unequal a priori, so it
doesn't bother to check any further by calling the comparison methods.

Cheers,
Chris
--
http://blog.rebertia.com

Aaron Brady

unread,
Jun 19, 2009, 5:00:49 PM6/19/09
to
On Jun 19, 12:42 pm, Chris Rebert <c...@rebertia.com> wrote:
> On Fri, Jun 19, 2009 at 12:02 PM, Gustavo Narea<m...@gustavonarea.net> wrote:
> > Hello, everyone.
>
> > I've noticed that if I have a class with so-called "rich comparison"
> > methods
> > (__eq__, __ne__, etc.), when its instances are included in a set,
> > set.__contains__/__eq__ won't call the .__eq__ method of the elements
> > and thus
> > the code below:
snip

You're using sets for uniqueness and faster lookups than a list or
other collection. Uniqueness is determined by hash key first, then by
identity (due to an idiosyncrasy of 'RichCompareBool'), then by rich
equality. If you want to compare to every element, you can't use
sets, because they take short-cuts by omitting many of the
comparisons.

Gustavo Narea

unread,
Jun 20, 2009, 11:50:41 AM6/20/09
to
Hello again, everybody.

Thank you very much for your responses. You guessed right, I didn't
use the __hash__ method (and I forgot to mention that, sorry).

And unfortunately, I think I can't make them hashable, because the
objects are compared based on their attributes, which are in turn
other kind of objects compared based on other attributes. All these
class instances are compared with __eq__/__ne__ and they wrap
relatively complex data which would be hard to attempt to represent
them unambiguously using a 32-bit integer. That's why I'm afraid I
cannot use hashables.

I guess I'll have to use something like the function of my first
post. :(

Thanks,

- Gustavo.

Steven D'Aprano

unread,
Jun 20, 2009, 12:13:04 PM6/20/09
to
Gustavo Narea wrote:

> Hello again, everybody.
>
> Thank you very much for your responses. You guessed right, I didn't
> use the __hash__ method (and I forgot to mention that, sorry).
>
> And unfortunately, I think I can't make them hashable, because the
> objects are compared based on their attributes, which are in turn
> other kind of objects compared based on other attributes. All these
> class instances are compared with __eq__/__ne__ and they wrap
> relatively complex data which would be hard to attempt to represent
> them unambiguously using a 32-bit integer.

There is no need for hash to represent the data unambiguously.

>>> hash(-1)
-2
>>> hash(-2)
-2
>>> hash(2.0**32 - 1)
-2


>>> hash(1)
1
>>> hash(2.0**64 - 1)
1
>>> hash(2.0**64 + 1)
1


>>> hash(2)
2
>>> hash(1+2**32)
2
>>> hash(1+2**64)
2
>>> hash(1+2**128)
2
>>> hash(2.0)
2

The rule is, if x and y are equal, then hash(x) should equal hash(y). This
does NOT imply that if hash(x) == hash(y), then x must equal y, nor is
there any requirement for every unique piece of data to have a unique hash.

--
Steven

MRAB

unread,
Jun 20, 2009, 12:27:45 PM6/20/09
to pytho...@python.org
A hash doesn't have to be unambiguous. It's just a way to reduce the
number of equality checks that have to be made.

Could you create a hash from a tuple of attributes?

If all else fails you could define a hash function that returns a
constant. You would, however, lose the speed advantage that a hash
gives in narrowing down the possible matches.

Gabriel Genellina

unread,
Jun 20, 2009, 12:28:28 PM6/20/09
to pytho...@python.org
En Sat, 20 Jun 2009 12:50:41 -0300, Gustavo Narea <m...@gustavonarea.net>
escribiᅵ:

> Thank you very much for your responses. You guessed right, I didn't
> use the __hash__ method (and I forgot to mention that, sorry).
>
> And unfortunately, I think I can't make them hashable, because the
> objects are compared based on their attributes, which are in turn
> other kind of objects compared based on other attributes. All these
> class instances are compared with __eq__/__ne__ and they wrap
> relatively complex data which would be hard to attempt to represent
> them unambiguously using a 32-bit integer. That's why I'm afraid I
> cannot use hashables.

Combine the hash values of whatever objects are involved in __eq__ but
make sure they cannot change (in that case the hash value would be
invalid).
No need for hash to be unambiguous - objects that compare equal must have
the same hash value, but objects with the same hash value may be different.

--
Gabriel Genellina

Robert Kern

unread,
Jun 20, 2009, 4:54:14 PM6/20/09
to pytho...@python.org
On 2009-06-20 11:27, MRAB wrote:
> A hash doesn't have to be unambiguous. It's just a way to reduce the
> number of equality checks that have to be made.
>
> Could you create a hash from a tuple of attributes?

A typical way to implement this is to make a method that returns a tuple of
attributes to use in both the __eq__ and __hash__ methods.

class Foo(object):

def key(self):
return (self.x, self.y, self.bar)

def __eq__(self, other):
return self.key() == other.key()

def __ne__(self, other):
return not (self == other)

def __hash__(self):
return hash(self.key())


As long as those attributes are also hashable, this usually works well.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Aaron Brady

unread,
Jun 20, 2009, 6:39:00 PM6/20/09
to

Ooo sneaky. +1 fancy.

> You would, however, lose the speed advantage that a hash
> gives in narrowing down the possible matches.

Are there /any/ immutable members of the objects?

Terry Reedy

unread,
Jun 20, 2009, 7:22:59 PM6/20/09
to pytho...@python.org
Gustavo Narea wrote:
> Hello again, everybody.
>
> Thank you very much for your responses. You guessed right, I didn't
> use the __hash__ method (and I forgot to mention that, sorry).
>
> And unfortunately, I think I can't make them hashable, because the
> objects are compared based on their attributes, which are in turn
> other kind of objects compared based on other attributes. All these
> class instances are compared with __eq__/__ne__ and they wrap
> relatively complex data which would be hard to attempt to represent
> them unambiguously using a 32-bit integer. That's why I'm afraid I
> cannot use hashables.

If the result of 'o1 == o2' changes over time, then the objects are not
very suitable as set members.

Robert Kern

unread,
Jun 20, 2009, 7:32:50 PM6/20/09
to pytho...@python.org
On 2009-06-20 18:22, Terry Reedy wrote:

> Gustavo Narea wrote:
>> Hello again, everybody.
>>
>> Thank you very much for your responses. You guessed right, I didn't
>> use the __hash__ method (and I forgot to mention that, sorry).
>>
>> And unfortunately, I think I can't make them hashable, because the
>> objects are compared based on their attributes, which are in turn
>> other kind of objects compared based on other attributes. All these
>> class instances are compared with __eq__/__ne__ and they wrap
>> relatively complex data which would be hard to attempt to represent
>> them unambiguously using a 32-bit integer. That's why I'm afraid I
>> cannot use hashables.
>
> If the result of 'o1 == o2' changes over time, then the objects are not
> very suitable as set members.

Sometimes, I think it would be nice if sets and the other containers could be
modified to accept a user-supplied hash and comparison functions on
construction. A key function like list.sort() would probably also work. Even
when you have objects that aren't hashable in general, there are many times when
you know you will not be modifying them in a given context. Being able to do set
operations on them in that context would be really useful.

Gustavo Narea

unread,
Jun 21, 2009, 8:27:32 AM6/21/09
to
Hi, everyone.

OK, I got it now! The value of the hash is not decisive, as __eq__
will still be called when the hashes match. It's like a filter, for
performance reasons.

It's really nice, I just tried it and it worked.

Thank you very, very much!!

Cheers,

- Gustavo.

Reply all
Reply to author
Forward
0 new messages