Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

What is the correct way to define __hash__?

40 views
Skip to first unread message

Peng Yu

unread,
Oct 12, 2009, 4:45:30 PM10/12/09
to pytho...@python.org
Hi,

I'm wondering what is the general way to define __hash__. I could add
up all the members. But I am wondering if this would cause a
performance issue for certain classes.

Regards,
Peng


#!/usr/bin/env python

class A:
def __init__(self, a, b) :
self._a = a
self._b = b

def __str__(self):
return 'A(%s, %s)' %(self._a, self._b)

__repr__ = __str__

def __cmp__(self, other):
if self._a < other._a:
return -1
elif self._a > other._a:
return 1
elif self._b < other._b:
return -1
elif self._b > other._b:
return 1
else:
return 0

def __hash__(self):
return self._a + self._b

if __name__ == '__main__':

x = A(1, 1)

aset = set()
aset.add(x)
print aset

Peng Yu

unread,
Oct 12, 2009, 4:56:41 PM10/12/09
to pytho...@python.org


What if A has a third member, which is a string? Is there a function
to convert an arbitrary string to an int?

Robert Kern

unread,
Oct 12, 2009, 5:03:49 PM10/12/09
to pytho...@python.org
On 2009-10-12 15:45 PM, Peng Yu wrote:
> Hi,
>
> I'm wondering what is the general way to define __hash__. I could add
> up all the members. But I am wondering if this would cause a
> performance issue for certain classes.

Unless if you are very familiar with the math of hash functions, I don't
recommend that you try to implement one directly. Instead, make a tuple of the
hashable content of your class and return the result of calling hash() on that
tuple. Be sure to make your equality comparison do the right thing.

class A(object):
def __init__(self, a, b):
self.a = a
self.b = b

def _key(self):
# I include the name of the class so as to differentiate between other
# classes that might also have a _key() method. If you have several classes
# or subclasses that are allowed to compare equal to each other, use some
# other common string here.
return (type(self).__name__, a, b)

def __hash__(self):
return hash(self._key())

# Coincidentally, the _key() method can usually be reused for comparisons.
# I recommend doing this for the equality comparisons, at least, when you can
# because of the requirement that two items that compare equal must have the
# same hash value.
def __eq__(self, other):
return self._key() == other._key()

def __ne__(self, other):
return not (self == other)

...

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Christian Heimes

unread,
Oct 12, 2009, 5:10:54 PM10/12/09
to pytho...@python.org
Peng Yu schrieb:

> Hi,
>
> I'm wondering what is the general way to define __hash__. I could add
> up all the members. But I am wondering if this would cause a
> performance issue for certain classes.

> def __hash__(self):
> return self._a + self._b


The hash of a tuple is based on the hash of its values. A common way to
define a hash method is:

def __hash__(self):
return hash((self._a, self._b))

Christian

Peng Yu

unread,
Oct 12, 2009, 6:12:46 PM10/12/09
to pytho...@python.org
On Mon, Oct 12, 2009 at 4:03 PM, Robert Kern <rober...@gmail.com> wrote:
> On 2009-10-12 15:45 PM, Peng Yu wrote:
>>
>> Hi,
>>
>> I'm wondering what is the general way to define __hash__. I could add
>> up all the members. But I am wondering if this would cause a
>> performance issue for certain classes.
>

Do I need to define other 4 comparison operators besides __eq__ and __ne__?

Chris Rebert

unread,
Oct 12, 2009, 6:20:43 PM10/12/09
to Peng Yu, pytho...@python.org

No; dictionaries are unordered and thus don't utilize the non-equality
comparison operators.

Cheers,
Chris
--
http://blog.rebertia.com

Robert Kern

unread,
Oct 12, 2009, 6:29:36 PM10/12/09
to pytho...@python.org

Yes. There are recipes to fill in the rest if you just provide __eq__ and __lt__:

http://code.activestate.com/recipes/576685/

Or you could continue to use the __cmp__ method, but it will be going away in
Python 3.

Steven D'Aprano

unread,
Oct 12, 2009, 10:04:20 PM10/12/09
to
On Mon, 12 Oct 2009 15:45:30 -0500, Peng Yu wrote:

> def __cmp__(self, other):
> if self._a < other._a:
> return -1
> elif self._a > other._a:
> return 1
> elif self._b < other._b:
> return -1
> elif self._b > other._b:
> return 1
> else:
> return 0

This can be simplified to:

return cmp((self._a, self._b), (other._a, other._b))

--
Steven

Chris Rebert

unread,
Oct 12, 2009, 10:16:51 PM10/12/09
to Steven D'Aprano, pytho...@python.org

Assuming you're not using Python 3.x that is.

greg

unread,
Oct 13, 2009, 8:09:23 PM10/13/09
to

If you're using Python 3, you won't be writing a __cmp__
method in the first place.

--
Greg

0 new messages