Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

hashability

4 views
Skip to first unread message

James Stroud

unread,
Aug 11, 2009, 8:54:36 PM8/11/09
to
Hello All,

I wrote the function to test hashability of arbitrary objects. My reason
is that the built-in python (2.5) hashing is too permissive for some
uses. A symptom of this permissiveness comes from the ability to
successfully hash() arbitrary objects:

py> class C(object): pass
...
py> {C():4}[C()]
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
<type 'exceptions.KeyError'>: <__main__.C object at 0xe21610>

The basis for the exception is that the two instances do not have the
same hash() although conceptually they might seem equal to the
unitiated. Were I to re-design python, I'd throw an exception in this
case because of the ill-defined behavior one might expect if a C()
serves as a key for a dict.

To prevent users of one of my libraries from falling into this and
similar traps (which have potentially problematic consequences), I came
up with this test for hashability:

def hashable(k):
try:
hash(k)
except TypeError:
good = False
else:
good = (hasattr(k, '__hash__') and
(hasattr(k, '__eq__') or hasattr(k, '__cmp__')))
return good

It works as I would like for most of the cases I can invent:

py> all(map(hashable, [1,1.0,"",(1,2,3)]))
True
py> any(map(hashable, [None, [1,2], {}, C(), __import__('sys')]))
False

Can anyone think of boundary cases I might be missing with this approach?


James

Carl Banks

unread,
Aug 11, 2009, 9:40:15 PM8/11/09
to
On Aug 11, 5:54 pm, James Stroud <jstr...@mbi.ucla.edu> wrote:
> Hello All,
>
> I wrote the function to test hashability of arbitrary objects. My reason
> is that the built-in python (2.5) hashing is too permissive for some
> uses. A symptom of this permissiveness comes from the ability to
> successfully hash() arbitrary objects:
>
>    py> class C(object): pass
>    ...
>    py> {C():4}[C()]
>    ------------------------------------------------------------
>    Traceback (most recent call last):
>      File "<ipython console>", line 1, in <module>
>    <type 'exceptions.KeyError'>: <__main__.C object at 0xe21610>
>
> The basis for the exception is that the two instances do not have the
> same hash() although conceptually they might seem equal to the
> unitiated. Were I to re-design python, I'd throw an exception in this
> case because of the ill-defined behavior one might expect if a C()
> serves as a key for a dict.

That's arguably the right thing to do.

Personally I've found that being able to use class instances as
hashable objects to be terribly useful (these objects are hashed and
compared by identity, of course), so I don't mind it. But I can
definitely see how this straddles the line between "practicality" and
"face of ambiguity". And if Python didn't do it by default, it would
be little trouble to add the appropriate __eq__ and __hash__ methods.


> To prevent users of one of my libraries from falling into this and
> similar traps (which have potentially problematic consequences),

Even so, I would consider whether some of your users might, like me,
also find this terribly useful, and if so (probably a few will unless
they are all novices), allow them to disable or disregard this check.

> I came
> up with this test for hashability:
>
> def hashable(k):
>    try:
>      hash(k)
>    except TypeError:
>      good = False
>    else:
>      good = (hasattr(k, '__hash__') and
>              (hasattr(k, '__eq__') or hasattr(k, '__cmp__')))
>    return good

I wouldn't call the function "hashable". Class instances like C() are
hashable whether you approve or not. Something like
"deliberately_hashable" would be a better name.


> It works as I would like for most of the cases I can invent:
>
>    py> all(map(hashable, [1,1.0,"",(1,2,3)]))
>    True
>    py> any(map(hashable, [None, [1,2], {}, C(), __import__('sys')]))
>    False
>
> Can anyone think of boundary cases I might be missing with this approach?

It is possible to redefine == operator by defining __ne__ instead of
__eq__, at least on Python 2.5, so you should keep that in mind.


Carl Banks

Asun Friere

unread,
Aug 11, 2009, 10:06:54 PM8/11/09
to
On Aug 12, 10:54 am, James Stroud <jstr...@mbi.ucla.edu> wrote:

> I wrote the function to test hashability of arbitrary objects. My reason
> is that the built-in python (2.5) hashing is too permissive for some
> uses. A symptom of this permissiveness comes from the ability to
> successfully hash() arbitrary objects:

Arbitrary, or anonymous objects and some uses or some users? I'm
can't see why anyone would expect different instance of a class to be
equivalent keys.

> The basis for the exception is that the two instances do not have the
> same hash() although conceptually they might seem equal to the
> unitiated.

Perhaps the best solution would be for the unitiated to correct their
misaprehensions? If you don't understand that you are instantiating a
number of anonymous instances of a class you are missing something
very fundamental.

> Were I to re-design python, I'd throw an exception in this
> case because of the ill-defined behavior one might expect if a C()
> serves as a key for a dict.

Then you couldn't to this:

d = {C():1, C():2, C():3}
a,b,c = d.keys()
d[c]

Anonymous instances are a GoodThing(tm) and they can usually be de-
anonymised if need be.

James Stroud

unread,
Aug 11, 2009, 10:15:07 PM8/11/09
to
Carl Banks wrote:
> On Aug 11, 5:54 pm, James Stroud <jstr...@mbi.ucla.edu> wrote:
>
>> To prevent users of one of my libraries from falling into this and
>> similar traps (which have potentially problematic consequences),
>
> Even so, I would consider whether some of your users might, like me,
> also find this terribly useful, and if so (probably a few will unless
> they are all novices), allow them to disable or disregard this check.

I realize I left out my use. The intent of the function is to flag
objects that will make useful keys for a persistent dictionary. The
{C():4}[C()] example demonstrates why I want to avoid certain types of
keys--I don't want users to do things like {C():4, C():4}, which python
happily allows but falls apart at the level of persistence.

However, I also want to avoid isinstance() and/or type checking in order
to allow maximal flexibility so that users don't have to work too hard
to make their keys.

> I wouldn't call the function "hashable". Class instances like C() are
> hashable whether you approve or not. Something like
> "deliberately_hashable" would be a better name.

I chose "keyable".

>> Can anyone think of boundary cases I might be missing with this approach?
>
> It is possible to redefine == operator by defining __ne__ instead of
> __eq__, at least on Python 2.5, so you should keep that in mind.

Thanks for this hint. I've already put it in.

James

Asun Friere

unread,
Aug 11, 2009, 10:24:39 PM8/11/09
to
On Aug 12, 12:15 pm, James Stroud <jstr...@mbi.ucla.edu> wrote:

> I realize I left out my use. The intent of the function is to flag
> objects that will make useful keys for a persistent dictionary. The
> {C():4}[C()] example demonstrates why I want to avoid certain types of
> keys--I don't want users to do things like {C():4, C():4}, which python
> happily allows but falls apart at the level of persistence.

What am I missing here? How, in terms of persistence, is {C():4, C():
4} conceptually different from {'spam':4, 'ham':4}?

James Stroud

unread,
Aug 12, 2009, 1:32:50 AM8/12/09
to

You should be more imaginative.

James Stroud

unread,
Aug 12, 2009, 1:33:50 AM8/12/09
to
Asun Friere wrote:
> Perhaps the best solution would be for the unitiated to correct their
> misaprehensions

Can you come give a class to my users?

Chris Rebert

unread,
Aug 12, 2009, 1:52:51 AM8/12/09
to Asun Friere, pytho...@python.org
2009/8/11 Asun Friere <afr...@yahoo.co.uk>:

Thought Experiment:
Consider, for each dict, the case of pickling it twice to 2 separate files.
When loaded from both files into the same program, the spam-ham dicts
will work as expected between each other.
The dicts with C()s will not. For example, assuming the unpickled
dicts are `cs1` and `cs2`, cs1[cs2.keys()[0]] will raise KeyError.

Cheers,
Chris
--
http://blog.rebertia.com

Asun Friere

unread,
Aug 12, 2009, 2:18:59 AM8/12/09
to
On Aug 12, 3:32 pm, James Stroud <nospamjstroudmap...@mbi.ucla.edu>
wrote:

> You should be more imaginative.

I'm by no means discounting that there might be some actual problem
you're trying to solve here, but I honestly can't see it.

There really is no need to get personal about this, so rather than
asking for a level of imagination from me, (which I apparently lack),
please just explain to me how {one_instance_of_a_hashable_class : val,
another_instance_of_a_hashable_class :val} is conceptually different
{one_instance_of_class_str: val, another_instance_of_class_str: val},
in terms of persistence.

If I am missing something here, I would actually like to know. If on
the other hand, I'm not, then rather at taking umbrage, you might want
instead to save yourself the effort of solving a non-existent problem?

> Can you come give a class to my users?

No.

However, I think it's fairly central to the notion of a class that it
is a template for creating different instances which themselves have a
unique identity. And that subsequent calls to a class' constructor
ought to create unique instances of that class (subject perhaps to
implementation tricks such as interning). If it is not obvious that {C
():4}[C()] invovles subsequent calls to C's constructor, then that
very example is itself didactic.

Chris Rebert

unread,
Aug 12, 2009, 2:25:04 AM8/12/09
to Asun Friere, pytho...@python.org
2009/8/11 Asun Friere <afr...@yahoo.co.uk>:

Consider the case of pickling the data twice to 2 separate files.


When loaded from both files into the same program, the spam-ham dicts

will work as expected.
The C()s will not. For cs1[cs2.keys()[0]] will raise KeyError.

Chris Rebert

unread,
Aug 12, 2009, 2:26:31 AM8/12/09
to Asun Friere, pytho...@python.org
On Wed, Aug 12, 2009 at 2:25 AM, Chris Rebert<cl...@rebertia.com> wrote:
> 2009/8/11 Asun Friere <afr...@yahoo.co.uk>:

>> On Aug 12, 12:15 pm, James Stroud <jstr...@mbi.ucla.edu> wrote:

Apologies for the possible repeated post. Gmail failed to mark the
draft as sent for some reason.

- Chris

Asun Friere

unread,
Aug 12, 2009, 2:29:41 AM8/12/09
to
On Aug 12, 3:52 pm, Chris Rebert <ch...@rebertia.com> wrote:

> Thought Experiment:
> Consider, for each dict, the case of pickling it twice to 2 separate files.
> When loaded from both files into the same program, the spam-ham dicts
> will work as expected between each other.
> The dicts with C()s will not. For example, assuming the unpickled
> dicts are `cs1` and `cs2`, cs1[cs2.keys()[0]] will raise KeyError.
>

Aha!

cheers.

David Stanek

unread,
Aug 12, 2009, 2:31:55 AM8/12/09
to Asun Friere, pytho...@python.org
On Wed, Aug 12, 2009 at 2:18 AM, Asun Friere<afr...@yahoo.co.uk> wrote:
> On Aug 12, 3:32 pm, James Stroud <nospamjstroudmap...@mbi.ucla.edu>
> wrote:
>
>> You should be more imaginative.
>
> I'm by no means discounting that there might be some actual problem
> you're trying to solve here, but I honestly can't see it.

How about a cache? Hashing by id means nothing across machines or even
process runs on the same machine.

--
David
blog: http://www.traceback.org
twitter: http://twitter.com/dstanek

James Stroud

unread,
Aug 12, 2009, 2:52:17 AM8/12/09
to
Asun Friere wrote:
> On Aug 12, 3:32 pm, James Stroud <nospamjstroudmap...@mbi.ucla.edu>
> wrote:
>
>> You should be more imaginative.
>
> I'm by no means discounting that there might be some actual problem
> you're trying to solve here, but I honestly can't see it.
>
> There really is no need to get personal about this, so rather than
> asking for a level of imagination from me, (which I apparently lack),
> please just explain to me how {one_instance_of_a_hashable_class : val,
> another_instance_of_a_hashable_class :val} is conceptually different
> {one_instance_of_class_str: val, another_instance_of_class_str: val},
> in terms of persistence.

Sorry for being a twit. This list used to be quite nice but some people
on this list got pretty abrasive. I couldn't tell you weren't one of
these abrasive people from your post. I stopped giving the benefit of
the doubt many moons ago and promptly enter attack mode at the slightest
hint of abrasiveness. My attitude probably exacerbates the problem--but
it sure does make me feel better.


Anyway, I think the problem has been described better than I'm able, but
once an object goes to the file system, one can not take its hash value
for granted. It is not possible to rely on the ability to recreate any
arbitrary object de-novo and use the recreation as a key in proxy for
the original object. I'm after this "je ne sais pas que l'appeler"
quality of objects to use as keys (or to generate keys) for persistent
dictionaries. Carl Banks suggested that this quality should not be
called "hashable", so I'm calling it "keyable".

Asun Friere

unread,
Aug 12, 2009, 3:13:50 AM8/12/09
to
On Aug 12, 4:52 pm, James Stroud <nospamjstroudmap...@mbi.ucla.edu>
wrote:

> Sorry for being a twit.

Don't be ridiculous. You haven't been a twit, I have!

I've just had a complete "blonde" moment here (with apologies to any
real blondes out there. My only excuse is that I've been up to 02:30
for the last three nights running (or maybe it's the ageing process, a
cerebrovascular accident or something).

I just checked a class I recently wrote specifically for the purposes
of hashing a dict (in case I had made this error IRL). Wouldn't you
know it, it's subclassed to tuple, and defines both __eq__ and
__cmp__. Luckily when I write production code the guy who knows what
he's doing takes over. And this in an app which compares keys from
different pickled files (representing DB holdings)?! Of all things.

I can't believe how unfathomably stupid I've been. I'm extremely
embarassed. I think I'll just go out and shoot myself now. Or get
some sleep.

Message has been deleted

Asun Friere

unread,
Aug 12, 2009, 3:25:06 AM8/12/09
to
On Aug 12, 5:14 pm, Dennis Lee Bieber <wlfr...@ix.netcom.com> wrote:

> c1 = C()
> c2 = C()
>
> {c1:4}[c2]
>
> to behave? That IS the equivalent of your statement -- two instances are
> two distinctly different entities...
>

Thankyou, that is EXACTLY the mistake I made that sent me off into
lunacy.

At the risk of further embarassment, the answer is that one would
expect it to behave analogously to:

c1 = int(x)
c2 = int(x)

{c1:4}[c2]

Or is my brain still on vacation?

James Stroud

unread,
Aug 12, 2009, 3:33:01 AM8/12/09
to
Dennis Lee Bieber wrote:
> On Tue, 11 Aug 2009 17:54:36 -0700, James Stroud <jst...@mbi.ucla.edu>
> declaimed the following in gmane.comp.python.general:

>
>> ...
>> py> {C():4}[C()]
>> ------------------------------------------------------------
>> Traceback (most recent call last):
>> File "<ipython console>", line 1, in <module>
>> <type 'exceptions.KeyError'>: <__main__.C object at 0xe21610>
>>
>> The basis for the exception is that the two instances do not have the
>> same hash() although conceptually they might seem equal to the
>> unitiated. Were I to re-design python, I'd throw an exception in this
>> case because of the ill-defined behavior one might expect if a C()
>> serves as a key for a dict.
>>
> "A" C()... How would you expect

>
> c1 = C()
> c2 = C()
>
> {c1:4}[c2]
>
> to behave?


This seems like a subjective question. I think I demonstrated how it
*does* behave, which is purely objective--and I know I can't do anything
about that. But the subjective answer to the subjective question is that
I would like an exception to be raised when the dictionary is
constructed. I know an exception doesn't get thrown in the current
incarnation of the python language. That is the objective reality of the
situation which affronts my subjective notions of how reality should be.

That IS the equivalent of your statement -- two instances are
> two distinctly different entities...

Tell that to two different machines on two different days. Then bet the
life of yourself and your nearest and dearest family on that fact and
see whether you really want to take a hash value for granted. If a
property of the python language fails the "bet the lives of your nearest
and dearest on a consistent result" test, I call it "ill defined" and,
subjectively speaking, I prefer exceptions to be thrown--And, by damned,
I'll throw them myself if I have to.

"If it saves one life, it's worth it all."

James


Steven D'Aprano

unread,
Aug 12, 2009, 3:34:33 AM8/12/09
to
On Tue, 11 Aug 2009 17:54:36 -0700, James Stroud wrote:

> Hello All,
>
> I wrote the function to test hashability of arbitrary objects. My reason
> is that the built-in python (2.5) hashing is too permissive for some
> uses. A symptom of this permissiveness comes from the ability to
> successfully hash() arbitrary objects:

No it doesn't.


>>> hash([])


Traceback (most recent call last):

File "<stdin>", line 1, in <module>
TypeError: list objects are unhashable

Python successfully hashes *non*-arbitrary objects, including those that
inherit from object.

> py> class C(object): pass
> ...
> py> {C():4}[C()]
> ------------------------------------------------------------
> Traceback (most recent call last):
> File "<ipython console>", line 1, in <module>
> <type 'exceptions.KeyError'>: <__main__.C object at 0xe21610>

Why would you expect that to succeed? The object you are using as the key
is a different instance than the object you are looking up. Since your
instances don't even compare equal, why would you expect them to hash
equal?

>>> C() == C()
False

> The basis for the exception is that the two instances do not have the
> same hash() although conceptually they might seem equal to the
> unitiated.

Yes, well, the ignorant and uninitiated frequently make mistakes. I feel
your pain.

> Were I to re-design python, I'd throw an exception in this
> case because of the ill-defined behavior one might expect if a C()
> serves as a key for a dict.

It's not ill-defined, it's a perfectly reasonable design you just don't
happen to like. It's quite useful for some.


> To prevent users of one of my libraries from falling into this and
> similar traps (which have potentially problematic consequences), I came
> up with this test for hashability:

That's not a test for hashability. That's a test for hashability and
equality testing together.


> def hashable(k):
> try:
> hash(k)
> except TypeError:
> good = False
> else:
> good = (hasattr(k, '__hash__') and
> (hasattr(k, '__eq__') or hasattr(k, '__cmp__')))
> return good

The test for __hash__ is redundant, given that hash() has already
succeeded.

It will wrongly reject classes that deliberately don't define equality,
for whatever reason, and that *expect* the behaviour you are treating as
an error.

> It works as I would like for most of the cases I can invent:
>
> py> all(map(hashable, [1,1.0,"",(1,2,3)])) True
> py> any(map(hashable, [None, [1,2], {}, C(), __import__('sys')]))
> False

Well there you go -- why on earth would you prohibit None as a dictionary
key??? That's a serious failure.

--
Steven

Steven D'Aprano

unread,
Aug 12, 2009, 5:39:16 AM8/12/09
to
On Wed, 12 Aug 2009 00:33:01 -0700, James Stroud wrote:

> Tell that to two different machines on two different days. Then bet the
> life of yourself and your nearest and dearest family on that fact and
> see whether you really want to take a hash value for granted.

As far as I know, Python doesn't make any guarantees about hashes being
repeatable on different machines, different versions, or even different
runs of the interpreter.


> If a
> property of the python language fails the "bet the lives of your nearest
> and dearest on a consistent result" test, I call it "ill defined" and,
> subjectively speaking, I prefer exceptions to be thrown--And, by damned,
> I'll throw them myself if I have to.
>
> "If it saves one life, it's worth it all."

Depends on the life, and the cost. Would you pay a million dollars from
your own pocket to save the life of a 119 year old with advanced lung
cancer, a bad heart, and a raging infection of Ebola, from choking on a
fish bone?

What if the effort of saving that one life kills two lives? Opportunity
costs are costs too.


--
Steven

James Stroud

unread,
Aug 12, 2009, 1:37:45 PM8/12/09
to
Steven D'Aprano wrote:
> Well there you go -- why on earth would you prohibit None as a dictionary
> key??? That's a serious failure.


roentgen 1% python
Python 2.5 (r25:51908, Sep 20 2006, 17:36:21)
[GCC 3.4.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
py> hash(None)
135543872


mbi136-176 98% /usr/bin/python
Python 2.5.1 (r251:54863, Feb 6 2009, 19:02:12)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
py> hash(None)
2030240

That's why. Read the whole thread. You are one of the abrasive ones.

Chris Rebert

unread,
Aug 12, 2009, 1:50:33 PM8/12/09
to James Stroud, pytho...@python.org
On Wed, Aug 12, 2009 at 1:37 PM, James Stroud<jst...@mbi.ucla.edu> wrote:
> Steven D'Aprano wrote:
>>
>> Well there you go -- why on earth would you prohibit None as a dictionary
>> key??? That's a serious failure.
>
>
> roentgen 1% python
> Python 2.5 (r25:51908, Sep 20 2006, 17:36:21) [GCC 3.4.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> py> hash(None)
> 135543872
>
>
> mbi136-176 98% /usr/bin/python
> Python 2.5.1 (r251:54863, Feb  6 2009, 19:02:12) [GCC 4.0.1 (Apple Inc.
> build 5465)] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> py> hash(None)
> 2030240

Actually, None is a special-case as a built-in singleton value --
there's only ever *exactly one* instance of it in a given interpreter
session. And I'm reasonably sure dict pickles don't store the hash
code of items (the dict gets recreated from scratch), so there's no
problem.

Steven D'Aprano

unread,
Aug 12, 2009, 2:11:40 PM8/12/09
to

I've read the whole thread. Pay close attention:

[steve@ando ~]$ python
Python 2.4.3 (#1, Mar 14 2007, 18:51:08)
[GCC 4.1.1 20070105 (Red Hat 4.1.1-52)] on linux2


Type "help", "copyright", "credits" or "license" for more information.

>>> import pickle
>>> d = {None: 42}
>>> f = open('pickled_dict', 'w')
>>> pickle.dump(d, f)
>>> f.close()
>>>
[steve@ando ~]$ ssh sylar
steve@sylar's password:
Last login: Wed Aug 12 21:44:47 2009
[steve@sylar ~]$ python2.6
Python 2.6.1 (r261:67515, Dec 24 2008, 00:33:13)
[GCC 4.1.2 20070502 (Red Hat 4.1.2-12)] on linux2


Type "help", "copyright", "credits" or "license" for more information.

>>> import pickle
>>> f = open('pickled_dict', 'r')
>>> d = pickle.load(f)
>>> d
{None: 42}


I have successfully pickled a dict using None as a key on one machine
using Python 2.4, and unpickled it on a completely different machine
running Python 2.6.

Still think that pickling None is a problem?

--
Steven

Carl Banks

unread,
Aug 12, 2009, 2:41:51 PM8/12/09
to

FYI: If you need the actual hash value to be consistent across
versions of Python the built hash function won't suffice. The
language doesn't guanrantee they will persist across versions. (IIRC,
there was a change to the hash value of longs at one point related to
int/long unification issues.)

Now, when I saw your explanation I assumed that your persistence
mechanism merely doesn't preserve identity (unlike, say, simple
pickling, which does), meaning that objects that were once identical
might be reconstituted as non-identical (or vice versa), which would
be an issue if these objects are stored in dicts or sets. Equality
must be preserved for dict keys and set members to continue to work
property. However, the actual hash code doesn't need to be preserved.

As was mentioned already, None is guaranteed by the language to be
equal to itself, so equality is preserved and there should be no issue
with it, even if the hash code changes across invocations.

Now, if you are doing something weird with the hash value itself--
which I highly discourage--then all bets are off.


Carl Banks

0 new messages