Sets in Python

sapsi

unread,

Sep 18, 2007, 8:39:47 PM9/18/07

to

Hello,
I recently tried using the set function in Python and was surprised to
find that

a=[ 1, 2,3, [1,2] ]

doesn't work with 'set', throwing TyperError (unhashable exception). I
found out that this is because lists can't be hashed.

So,this implies 'a' cannot be a set in python which i think is quite
unfortunate, after all 'a' does look like a mathematical set.

My question is,
1) Why can't lists be hashed?
and
2) This is not related, but is there i neat way (without pop and list
comprehension) to convert a set into a list? I say neat because i'm
guessing using list comprehension might turn out be slow and there
might be other methods which are faster.

Thank you for your time
SM

Evil Bert

unread,

Sep 18, 2007, 8:48:27 PM9/18/07

to

sapsi wrote:
> 2) This is not related, but is there i neat way (without pop and list
> comprehension) to convert a set into a list? I say neat because i'm
> guessing using list comprehension might turn out be slow and there
> might be other methods which are faster.

a = set([1, 2, 3, 4])
b = list(a)

Raymond Hettinger

unread,

Sep 18, 2007, 8:59:02 PM9/18/07

to

On Sep 18, 5:39 pm, sapsi <saptarshi.g...@gmail.com> wrote:
> I recently tried using the set function in Python and was surprised to
> find that
>
> a=[ 1, 2,3, [1,2] ]
>
> doesn't work with 'set', throwing TyperError (unhashable exception). I
> found out that this is because lists can't be hashed.
> So,this implies 'a' cannot be a set in python which i think is quite
> unfortunate, after all 'a' does look like a mathematical set.

This is written as:

a = set([1, 2, 3, frozenset([1, 2])])

> This is not related, but is there i neat way (without pop and list
> comprehension) to convert a set into a list?

list(a)

Raymond

Asun Friere

unread,

Sep 18, 2007, 9:11:21 PM9/18/07

to

On Sep 19, 10:39 am, sapsi <saptarshi.g...@gmail.com> wrote:

> My question is,
> 1) Why can't lists be hashed?

They are mutable.

Dustan

unread,

Sep 18, 2007, 9:49:09 PM9/18/07

to

On Sep 18, 7:39 pm, sapsi <saptarshi.g...@gmail.com> wrote:
> Hello,
> I recently tried using the set function in Python and was surprised to
> find that
>
> a=[ 1, 2,3, [1,2] ]
>
> doesn't work with 'set', throwing TyperError (unhashable exception). I
> found out that this is because lists can't be hashed.
>
> So,this implies 'a' cannot be a set in python which i think is quite
> unfortunate, after all 'a' does look like a mathematical set.

It is not the variable *a* itself that's a problem when constructing a
set (ie. set(a)); it is the content. set() goes through each of the
items and adds that item to the set. 1, 2, and 3 are valid because
they can be hashed. The next item in the list, however, is [1,2], and
cannot be hashed because it is a mutable list.

The solution is as Raymond Hettinger said:

a = set([1, 2, 3, frozenset([1, 2])])

> My question is,

> 1) Why can't lists be hashed?

They're mutable.

> and
> 2) This is not related, but is there i neat way (without pop and list
> comprehension) to convert a set into a list? I say neat because i'm
> guessing using list comprehension might turn out be slow and there
> might be other methods which are faster.

list(a_set)

> Thank you for your time

You're welcome.

Paddy

unread,

Sep 19, 2007, 3:00:00 AM9/19/07

to

frozenset over turning the embedded list into a tuple?
The tuple would preserve order in the item (1,2)
a = set([1,2,3, (1,2)])

- Paddy.

Francesco Guerrieri

unread,

Sep 19, 2007, 3:35:20 AM9/19/07

to Paddy, pytho...@python.org

On 9/19/07, Paddy <padd...@googlemail.com> wrote:

> frozenset over turning the embedded list into a tuple?
> The tuple would preserve order in the item (1,2)
> a = set([1,2,3, (1,2)])

The OP was probably thinking in mathematical terms as in "the set of
all the possible subsets of the set composed by 1, 2 and 3" and thus
order would not be important.

francesco

Sion Arrowsmith

unread,

Sep 19, 2007, 9:16:48 AM9/19/07

to

sapsi <saptars...@gmail.com> wrote:
> Why can't lists be hashed?

Several people have answered "because they're mutable" without
explaining why mutability precludes hashing. So:

Consider a dict (dicts have been in Python a *lot* longer than
sets, and have the same restriction) which allowed lists as
keys:

d = {}
k = [1, 2]
d[k] = None

Now, if I were to do:

k.append(3)

what would you expect:

d.keys()

to return? Did d magically rehash k when it was modified? Did d[k]
take a copy of k, and if so, how deep was the copy (consider
d[[1, k]] = None followed by a modification to k)? Leaving the hash
unchanged and relying on collision detection to resolve won't work,
since you may go directly for d[[1, 2, 3]] and not spot that
there's already an entry for it since it's been hashed under [1, 2].

"Practicality beats purity" and the design decision was to simply
sidestep these issues by disallowing mutable dict keys. And as the
set implementation is based on the dict implementation, it applies
to sets to.

--
\S -- si...@chiark.greenend.org.uk -- http://www.chaos.org.uk/~sion/
"Frankly I have no feelings towards penguins one way or the other"
-- Arthur C. Clarke
her nu becomeþ se bera eadward ofdun hlæddre heafdes bæce bump bump bump

Karthik Gurusamy

unread,

Sep 19, 2007, 4:58:03 PM9/19/07

to

On Sep 19, 6:16 am, Sion Arrowsmith <si...@chiark.greenend.org.uk>
wrote:

> sapsi <saptarshi.g...@gmail.com> wrote:
> > Why can't lists be hashed?
>
> Several people have answered "because they're mutable" without
> explaining why mutability precludes hashing. So:
>
> Consider a dict (dicts have been in Python a *lot* longer than
> sets, and have the same restriction) which allowed lists as
> keys:
>
> d = {}
> k = [1, 2]
> d[k] = None
>
> Now, if I were to do:
>
> k.append(3)
>
> what would you expect:
>
> d.keys()
>
> to return? Did d magically rehash k when it was modified? Did d[k]
> take a copy of k, and if so, how deep was the copy (consider
> d[[1, k]] = None followed by a modification to k)? Leaving the hash
> unchanged and relying on collision detection to resolve won't work,
> since you may go directly for d[[1, 2, 3]] and not spot that
> there's already an entry for it since it's been hashed under [1, 2].
>
> "Practicality beats purity" and the design decision was to simply
> sidestep these issues by disallowing mutable dict keys. And as the
> set implementation is based on the dict implementation, it applies
> to sets to.

While it's easy to explain the behavior, I think the decision to dis-
allow mutable items as keys is a bit arbitrary. There is no need for
dict to recompute hash (first of all, a user doesn't even need to know
if underneath 'hashing' is used -- the service is just a mapping
between one item to another item).

Since we know hashing is used, all that is needed is, a well-defined
way to construct a hash out of a mutable. "Given a sequence, how to
get a hash" is the problem. If later the given sequence is different,
that's not the dict's problem.

>>> d = {}
a = 10
>>> d[a] = 'foo'
>>> d[5+5] = 'bar'
>>> d[10]
'bar'

aren't the '5+5' which is 10, is different from the previous line's
a?.. so
why not allow similar behavior with lists/other sequence/even other
collections. As long as two objects compare equal the hash-result must
be the same. I guess this takes us to defining the equality operation
for lists-- which I think has a very obvious definition (ie same
length and the ith element of each list compare equal).

So if the list changes, it will result in a different hash and we will
get a hash-miss. I doubt this is in anyway less intuitive than dis-
allowing mutable items as keys.

Karthik

>
> --
> \S -- si...@chiark.greenend.org.uk --http://www.chaos.org.uk/~sion/

Paddy

unread,

Sep 19, 2007, 6:06:48 PM9/19/07

to

On Sep 19, 9:58 pm, Karthik Gurusamy <kar1...@gmail.com> wrote:
>
> Since we know hashing is used, all that is needed is, a well-defined
> way to construct a hash out of a mutable. "Given a sequence, how to
> get a hash" is the problem. If later the given sequence is different,
> that's not the dict's problem.
>

Oh it is possible to construct a hash from a mutable. What is
difficult is creating the same hash when the mutable mutates. Or
indeed working out what it means when a hash key mutates and you
access the dictionary.
Ignoring this gives the programmer a big problem hence the limitation.

I don't think you have a better solution.

- Paddy.

Karthik Gurusamy

unread,

Sep 19, 2007, 7:26:10 PM9/19/07

to

On Sep 19, 3:06 pm, Paddy <paddy3...@googlemail.com> wrote:
> On Sep 19, 9:58 pm, Karthik Gurusamy <kar1...@gmail.com> wrote:
>
> > Since we know hashing is used, all that is needed is, a well-defined
> > way to construct a hash out of a mutable. "Given a sequence, how to
> > get a hash" is the problem. If later the given sequence is different,
> > that's not the dict's problem.
>
> Oh it is possible to construct a hash from a mutable. What is
> difficult is creating the same hash when the mutable mutates.

Why? There is no reason that the dict should maintain the same hash,
after all the user is calling with a different sequence as key (after
the mutation).

There seems to be an underlying assumption that the dictionary key-
>value mapping should somehow maintain the mapping even when the key
changes behind its back.

The contract could very well be, hey if you give me a different
sequence later (by mutating the one you added), don't expect me to
find it in the dictionary.

>Or
> indeed working out what it means when a hash key mutates and you
> access the dictionary.
> Ignoring this gives the programmer a big problem hence the limitation.
>
> I don't think you have a better solution.

But why would a programmer expect to find the match, when his/her code
has changed the sequence (or has somehow let the hash key mutate) from
the time of dictionary addition.

If I did, a = [10, 20] and I did d[a]= 'foo', then a.append(30).
If dict complains key error on d[a] now, I won't be surprised. If I do
d[[10, 20, 30]], I will be surprised if it doesn't find the item. Of
course, in today's behavior the above is syntax error.

Karthik

>
> - Paddy.

Mark Dickinson

unread,

Sep 19, 2007, 8:25:22 PM9/19/07

to

On Sep 19, 7:26 pm, Karthik Gurusamy <kar1...@gmail.com> wrote:
> If I did, a = [10, 20] and I did d[a]= 'foo', then a.append(30).
> If dict complains key error on d[a] now, I won't be surprised. If I do
> d[[10, 20, 30]], I will be surprised if it doesn't find the item. Of
> course, in today's behavior the above is syntax error.

It sounds as though you're proposing something like the following:

>>> k = mylist([1, 2])
>>> d = {k : 'test'}
>>> d[k]
'test'
>>> k.append(3)
>>> d[k]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: [1, 2, 3]

So far, so good. But how do you explain the following to a confused
newcomer?

>>> d.keys()
[[1, 2, 3]]
>>> k in d.keys()
True
>>> k in d
False
>>> d[k]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: [1, 2, 3]

In other words, to repeat Sion Arrowsmith's question, what would you
expect d.keys() to return after a key of d has been modified?

Mark

Steven D'Aprano

unread,

Sep 19, 2007, 10:17:09 PM9/19/07

to

On Wed, 19 Sep 2007 20:58:03 +0000, Karthik Gurusamy wrote:

> While it's easy to explain the behavior, I think the decision to dis-
> allow mutable items as keys is a bit arbitrary. There is no need for
> dict to recompute hash

What???

Of course it does. How else can it look up the key? Because it (somehow)
just recognizes that it has seen the key before? How? By magic?

> (first of all, a user doesn't even need to know
> if underneath 'hashing' is used -- the service is just a mapping between
> one item to another item).

The user doesn't need to know the mechanism, but the dict does. Dicts are
implemented as hash tables. I suppose they could be implemented as
something else (what? linear lists? some sort of tree?) but the same
considerations must be made: the dict must be able to find keys it has
seen before. How is the dict supposed to recognise the key if the key has
changed?

> Since we know hashing is used, all that is needed is, a well-defined way
> to construct a hash out of a mutable. "Given a sequence, how to get a
> hash" is the problem.

Nonsense. That's not the problem. The problem is how to get exactly the
same hash when the sequence has changed.

In other words, if you have two mutable objects M1 and M2, then you
expect:

hash(M1) == hash(M2) if and only if M1 and M2 are equal
hash(M1) != hash(M2) if M1 and M2 are unequal

but also:

if M1 mutates to become equal to M2, hash(M1) must remain the same while
still being different from hash(M2).

That means that hash() now is a non-deterministic function. hash([1,2,3])
will vary according to how the list [1,2,3] was constructed!

Obviously something has to give, because not all of these things are
mutually compatible.

> If later the given sequence is different, that's
> not the dict's problem.

Data structures don't have problems. Programmers do. And language
designers with sense build languages that minimize the programmers
problems, not maximize them.

> So if the list changes, it will result in a different hash and we will
> get a hash-miss. I doubt this is in anyway less intuitive than dis-
> allowing mutable items as keys.

The choices for the language designer are:

(1) Invent some sort of magical non-deterministic hash function which
always does the Right Thing.

(2) Allow the hash of mutable objects to change, which means you can use
mutable objects as keys in dicts but if you change them, you can no
longer find them in the dict. They'll still be there, using up memory,
but you can't get to them.

(3) Simply disallow mutable objects as keys.

Alternative 1 is impossible, as we've seen, because the requirements for
the Right Thing are not mutually compatible.

Alternative (2) leads to hard-to-find, hard-to-diagnose bugs where you
store objects in a dict only for them to mysteriously disappear later.
Worse, it could lead to bugs like the following hypothetical:

>>> M = [1, 2, 3]
>>> D = {M: 'parrot'} # pretend this works
>>> D
{[1, 2, 3]: 'parrot'}
>>> M.append(4)
>>> D
{[1, 2, 3, 4]: 'parrot'}
>>> D[[1, 2, 3, 4]]

Traceback (most recent call last):
File "<stdin>", line 1, in <module>

KeyError: [1, 2, 3, 4]

Try explaining that one to programmers: they can SEE the key in the dict
when they print it, but they can't get it or delete it because the hash
has changed.

Alternative 3 is easy to deal with: simply don't use mutable objects as
keys. That's what Python does. Sure, the programmer sometimes needs to
work around the lack (convert the list into a tuple, or a string, or
pickle it, whatever...) which on rare occasions is hard to do, but at
least there are no mysterious, hard to track down bugs.

--
Steven.

prik...@gmail.com

unread,

Sep 19, 2007, 11:46:08 PM9/19/07

to

In the new model, it should be the value at the time of addition.
That is [1,2] (not [1,2,3]). This does mean a copy of key in
maintained internally in the dict. I think today that's not needed
(just a reference to the key's object is sufficient).

Again, this may not be the most elegant solution; neither seems to be
the current requirement that keys must be immutable. Fundamentally
dict is a mapping; underneath it could even use other elaborate
algorithms (say for integer keys, an avl tree) -- there is no reason
to expose the the quirks of hashing to the programmer.

Karthik
>
> Mark

Bryan Olson

unread,

Sep 19, 2007, 11:58:05 PM9/19/07

to

Karthik Gurusamy wrote:
> While it's easy to explain the behavior, I think the decision to dis-
> allow mutable items as keys is a bit arbitrary.

Furthermore, it's not really true.

class Blurf (object):
def __init__(self, intval):
self.seti(intval)
def seti(self, intval):
self.i = intval

Blurf sure looks mutable, yet Python will happily use Blurfs as
dict keys. The trick is that the dict key is the Blurf's identity,
not its state. We could switch to using state as the key by
implementing methods __hash__ and either __cmp__ or __eq__.

[...]

> why not allow similar behavior with lists/other sequence/even other
> collections. As long as two objects compare equal the hash-result must
> be the same.

That's a good rule to follow in programming one's own classes.
Keep __hash__, __cmp__ and __eq__ consistent.

> I guess this takes us to defining the equality operation
> for lists-- which I think has a very obvious definition (ie same
> length and the ith element of each list compare equal).

Already done. The "==" operator compares lists by their contents;
the "is" operator, by identity.

> So if the list changes, it will result in a different hash and we will
> get a hash-miss. I doubt this is in anyway less intuitive than dis-
> allowing mutable items as keys.

If it's any consolation, you can build your own:

class HashableList (list):
def __hash__(self):
return tuple(self).__hash__()

You'd probably also want to implement slicing.

Bad news: Python 3000 has no immutable type for byte-strings.
The new bytes type cannot serve for dict keys or set members.
Many things one would want to hash are unhashable -- for
example, the results of the hash functions in hashlib.

--
--Bryan

Karthik Gurusamy

unread,

Sep 20, 2007, 12:02:03 AM9/20/07

to

On Sep 19, 7:17 pm, Steven D'Aprano <st...@REMOVE-THIS-

cybersource.com.au> wrote:
> On Wed, 19 Sep 2007 20:58:03 +0000, Karthik Gurusamy wrote:
> > While it's easy to explain the behavior, I think the decision to dis-
> > allow mutable items as keys is a bit arbitrary. There is no need for
> > dict to recompute hash
>
> What???
>
> Of course it does. How else can it look up the key? Because it (somehow)
> just recognizes that it has seen the key before? How? By magic?

You answered it yourself later. For a mapping service, hash is just
one way to do things. What you need is for each item in the
collection, a unique key.
How you go from the key to the value is not something a programmer
needs to know.
Your mind is set on thinking on hash alone and hence you don't see
beyond it.

>
> > (first of all, a user doesn't even need to know
> > if underneath 'hashing' is used -- the service is just a mapping between
> > one item to another item).
>
> The user doesn't need to know the mechanism, but the dict does. Dicts are
> implemented as hash tables. I suppose they could be implemented as
> something else (what? linear lists? some sort of tree?) but the same
> considerations must be made:

Oh yes. If the keys are all integers (or any set of items that can be
ordered), why not an avl. It has guaranteed O(log N) while a hash in
worst case is O(N). Why you want to tie yourself to the drawbacks of
one datastructure? Understand your goal is not to provide a hash; but
to provide a mapping service.

the dict must be able to find keys it has
> seen before. How is the dict supposed to recognise the key if the key has
> changed?
>
> > Since we know hashing is used, all that is needed is, a well-defined way
> > to construct a hash out of a mutable. "Given a sequence, how to get a
> > hash" is the problem.
>
> Nonsense. That's not the problem. The problem is how to get exactly the
> same hash when the sequence has changed.

Yes, if you keep thinking hash is the only tool you got.

>
> In other words, if you have two mutable objects M1 and M2, then you
> expect:
>

No. I don't expect. I expect the hash to be different. Why do you keep
thinking it's the mappings responsibility to take care of a changing
key.

> hash(M1) == hash(M2) if and only if M1 and M2 are equal
> hash(M1) != hash(M2) if M1 and M2 are unequal
>
> but also:
>
> if M1 mutates to become equal to M2, hash(M1) must remain the same while
> still being different from hash(M2).
>
> That means that hash() now is a non-deterministic function. hash([1,2,3])
> will vary according to how the list [1,2,3] was constructed!
>
> Obviously something has to give, because not all of these things are
> mutually compatible.
>
> > If later the given sequence is different, that's
> > not the dict's problem.
>
> Data structures don't have problems. Programmers do. And language
> designers with sense build languages that minimize the programmers
> problems, not maximize them.

Yes, here you talk about a different goal altogether. Here comes the
'arbitrary' part I mentioned.

>
> > So if the list changes, it will result in a different hash and we will
> > get a hash-miss. I doubt this is in anyway less intuitive than dis-
> > allowing mutable items as keys.
>
> The choices for the language designer are:
>
> (1) Invent some sort of magical non-deterministic hash function which
> always does the Right Thing.

Nope, just say if the new sequence is different, you don't find the
item in the dict.

>
> (2) Allow the hash of mutable objects to change, which means you can use
> mutable objects as keys in dicts but if you change them, you can no
> longer find them in the dict. They'll still be there, using up memory,
> but you can't get to them.

In the new model, at the time of addition, you need to remember the
key at that time. If it's a list, you make a copy of the items.

>
> (3) Simply disallow mutable objects as keys.
>
> Alternative 1 is impossible, as we've seen, because the requirements for
> the Right Thing are not mutually compatible.
>
> Alternative (2) leads to hard-to-find, hard-to-diagnose bugs where you
> store objects in a dict only for them to mysteriously disappear later.
> Worse, it could lead to bugs like the following hypothetical:

Of course they can be reached with.. for k in dict...

>
> >>> M = [1, 2, 3]
> >>> D = {M: 'parrot'} # pretend this works
> >>> D
>
> {[1, 2, 3]: 'parrot'}>>> M.append(4)
> >>> D
>
> {[1, 2, 3, 4]: 'parrot'}>>> D[[1, 2, 3, 4]]

No, in the new way, the key still remains [1, 2, 3]
What was changed is M. Not the key given to dict at the time of
addition.
Again I'm not describing today's behavior; it's in the new way.

>
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> KeyError: [1, 2, 3, 4]
>
> Try explaining that one to programmers: they can SEE the key in the dict
> when they print it, but they can't get it or delete it because the hash
> has changed.

No they don't. They see the key at the time of addition ([1,2,3])

>
> Alternative 3 is easy to deal with: simply don't use mutable objects as
> keys. That's what Python does. Sure, the programmer sometimes needs to
> work around the lack (convert the list into a tuple, or a string, or
> pickle it, whatever...) which on rare occasions is hard to do, but at
> least there are no mysterious, hard to track down bugs.

When I first saw key's must'be be mutable, it did appear to me to be
mysterious. There was unnecessary tighter coupling between
implementation details and the service exposed to the programmer. (As
I see it, the root cause of all this is, the dict does not make a copy
of the key at the time of item addition, it just makes a new reference
to the same object)

Karthik

>
> --
> Steven.

Paddy

unread,

Sep 20, 2007, 2:28:35 AM9/20/07

to

On Sep 20, 5:02 am, Karthik Gurusamy <kar1...@gmail.com> wrote:
> In the new model, at the time of addition, you need to remember the
> key at that time. If it's a list, you make a copy of the items.

In other words you ask the dict to freeze any mutable keys given to
it.
Try an implementation and you will find it is impractical. Checking
for mutability then doing deep copies of keys and would consume time
in something that greatly affects the speed of Python as a whole.
Pythons designers give *you* the option of doing this and leave the
underlying dict speedy. I can live with that.

- Paddy.

thebjorn

unread,

Sep 20, 2007, 2:45:45 AM9/20/07

to

On Sep 20, 4:17 am, Steven D'Aprano <st...@REMOVE-THIS-
cybersource.com.au> wrote:
[...]

> Data structures don't have problems. Programmers do.

That's QOTW material :-)

> ... And language

> designers with sense build languages that minimize the programmers
> problems, not maximize them.
>

...

>
> The choices for the language designer are:
>
> (1) Invent some sort of magical non-deterministic hash function which
> always does the Right Thing.
>
> (2) Allow the hash of mutable objects to change, which means you can use
> mutable objects as keys in dicts but if you change them, you can no
> longer find them in the dict. They'll still be there, using up memory,
> but you can't get to them.
>
> (3) Simply disallow mutable objects as keys.

(4) Allow mutable objects as keys, but have undefined (implementation
defined) behavior if keys are mutated.

This would seem a natural extension of the "we're all adults" paradigm
(if I name a variable _foo you should treat it as private).
Unfortunately there is no visual cue in this case, and tracking down
where you relied on undefined behavior is notoriously time-consuming
(just ask your nearest C++ multiplatform programmer).

> Alternative 3 is easy to deal with: simply don't use mutable objects as
> keys. That's what Python does. Sure, the programmer sometimes needs to
> work around the lack (convert the list into a tuple, or a string, or
> pickle it, whatever...) which on rare occasions is hard to do, but at
> least there are no mysterious, hard to track down bugs.

Amen.

-- bjorn

Marc 'BlackJack' Rintsch

unread,

Sep 20, 2007, 3:20:27 AM9/20/07

to

A copy!? That has to be a deep copy. Which would make `dict`\s alot
slower and use more memory. Plus you can't store objects that can't be
copied anymore. That doesn't sound like a good trade off to me.

Ciao,
Marc 'BlackJack' Rintsch

thebjorn

unread,

Sep 20, 2007, 3:50:20 AM9/20/07

to

On Sep 20, 6:02 am, Karthik Gurusamy <kar1...@gmail.com> wrote:
> On Sep 19, 7:17 pm, Steven D'Aprano <st...@REMOVE-THIS-
>

[...]

> > (2) Allow the hash of mutable objects to change, which means you can use
> > mutable objects as keys in dicts but if you change them, you can no
> > longer find them in the dict. They'll still be there, using up memory,
> > but you can't get to them.
>
> In the new model, at the time of addition, you need to remember the
> key at that time. If it's a list, you make a copy of the items.

Eek! Barf! Gag me with a spoon! etc. etc. :-)

And you mean a deep-copy, not just a copy, right?

Or perhaps you were thinking of something like this (mdict ::= mutable
dict):

class mdict(dict):
def __setitem__(self, k, val):
super(mdict,self).__setitem__(`k`, val)
def __getitem__(self, k):
return super(mdict,self).__getitem__(`k`)
def __contains__(self, k):
return super(mdict,self).__contains__(`k`)
def keys(self):
return list(eval(k) for k in super(mdict,self).keys())
def __iter__(self):
for k in super(mdict,self).__iter__():
yield eval(k)
def items(self):
return list((eval(k),v) for k,v in super(mdict,self).items())
def __repr__(self):
items = ', '.join('%s: %s' % (k,repr(v)) for k,v in
self.items())
return '{' + items + '}'

I think it does what you want..?:

>>> m = mdict()
>>> a, b = [], [1,2]
>>> print m
{}
>>> m[a] = a
>>> m[b] = b
>>> m
{[1, 2]: [1, 2], []: []}
>>> m.keys()
[[1, 2], []]
>>> for k in m:
... m[k].append('foo')
...
>>> m
{[1, 2]: [1, 2, 'foo'], []: ['foo']}
>>> m.items()
[([1, 2], [1, 2, 'foo']), ([], ['foo'])]
>>> m.values()
[[1, 2, 'foo'], ['foo']]
>>> a in m
False
>>> a
['foo']
>>> b
[1, 2, 'foo']
>>> [] in m
True
>>> [1,2] in m
True
>>> m[{'key':['val']}] = 'this works too'

It'll work for all keys, k, where eval(`k`) == k, and repr(a) ==
repr(b) when a == b (I'm pretty sure the latter isn't always true for
dicts, although I haven't looked at the implementation.)

-- bjorn

thebjorn

unread,

Sep 20, 2007, 4:01:56 AM9/20/07

to

On Sep 20, 9:50 am, thebjorn <BjornSteinarFjeldPetter...@gmail.com>
wrote:

it's bad form to reply to myself, I know, but

> def __iter__(self):
> for k in super(mdict,self).__iter__():
> yield eval(k)

should probably be

def __iter__(self):
return (eval(k) for k in super(mdict,self).__iter__())

I'm still getting used to the power of generator expressions :-)

-- bjorn

Steve Holden

unread,

Sep 20, 2007, 5:59:06 AM9/20/07

to pytho...@python.org

Karthik Gurusamy wrote:
> On Sep 19, 7:17 pm, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Wed, 19 Sep 2007 20:58:03 +0000, Karthik Gurusamy wrote:
>>> While it's easy to explain the behavior, I think the decision to dis-
>>> allow mutable items as keys is a bit arbitrary. There is no need for
>>> dict to recompute hash
>> What???
>>
>> Of course it does. How else can it look up the key? Because it (somehow)
>> just recognizes that it has seen the key before? How? By magic?
>
> You answered it yourself later. For a mapping service, hash is just
> one way to do things. What you need is for each item in the
> collection, a unique key.
> How you go from the key to the value is not something a programmer
> needs to know.
> Your mind is set on thinking on hash alone and hence you don't see
> beyond it.
>

There's a reason for that: the developers (and particularly Tim Peters)
have, to use a phrase Tim is fond of "optimized the snot" out of dict,
and the mechanism is fundamental to much of Python's internals. So don't
expect to be able to tamper wiht it without adversely affectinf performance.

>>> (first of all, a user doesn't even need to know
>>> if underneath 'hashing' is used -- the service is just a mapping between
>>> one item to another item).
>> The user doesn't need to know the mechanism, but the dict does. Dicts are
>> implemented as hash tables. I suppose they could be implemented as
>> something else (what? linear lists? some sort of tree?) but the same
>> considerations must be made:
>
> Oh yes. If the keys are all integers (or any set of items that can be
> ordered), why not an avl. It has guaranteed O(log N) while a hash in
> worst case is O(N). Why you want to tie yourself to the drawbacks of
> one datastructure? Understand your goal is not to provide a hash; but
> to provide a mapping service.
>

Possibly so, but the goals also include "excellent speed" and "as wide a
set of keys as is (practically) possible".

How would you suggest Python avoids "[tying itself] to the drawbacks of
one data structure"? Implement different strategies according to the key
values used? That'll surely speed things up, not.

Python provides you with plenty of tools to implement optimized data
structures for your own applications. Arguing for making them language
primitives merely reveals your design inexperience.

>
> the dict must be able to find keys it has
>> seen before. How is the dict supposed to recognise the key if the key has
>> changed?
>>
>>> Since we know hashing is used, all that is needed is, a well-defined way
>>> to construct a hash out of a mutable. "Given a sequence, how to get a
>>> hash" is the problem.
>> Nonsense. That's not the problem. The problem is how to get exactly the
>> same hash when the sequence has changed.
>
> Yes, if you keep thinking hash is the only tool you got.
>
>> In other words, if you have two mutable objects M1 and M2, then you
>> expect:
>>
>
> No. I don't expect. I expect the hash to be different. Why do you keep
> thinking it's the mappings responsibility to take care of a changing
> key.
>

Because mappings must do precisely that. Do you actually know what a
mapping *is*?

Right. Simple. And completely impractical.

>> (3) Simply disallow mutable objects as keys.
>>
>> Alternative 1 is impossible, as we've seen, because the requirements for
>> the Right Thing are not mutually compatible.
>>
>> Alternative (2) leads to hard-to-find, hard-to-diagnose bugs where you
>> store objects in a dict only for them to mysteriously disappear later.
>> Worse, it could lead to bugs like the following hypothetical:
>
> Of course they can be reached with.. for k in dict...

But that hardly provides the required mapping features. What on earth
are you thinking?.

>>>>> M = [1, 2, 3]
>>>>> D = {M: 'parrot'} # pretend this works
>>>>> D
>> {[1, 2, 3]: 'parrot'}>>> M.append(4)
>>>>> D
>> {[1, 2, 3, 4]: 'parrot'}>>> D[[1, 2, 3, 4]]
>
> No, in the new way, the key still remains [1, 2, 3]
> What was changed is M. Not the key given to dict at the time of
> addition.
> Again I'm not describing today's behavior; it's in the new way.
>

I doubt this new way, whatever it is, is going to have many disciples.

>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> KeyError: [1, 2, 3, 4]
>>
>> Try explaining that one to programmers: they can SEE the key in the dict
>> when they print it, but they can't get it or delete it because the hash
>> has changed.
>
> No they don't. They see the key at the time of addition ([1,2,3])
>> Alternative 3 is easy to deal with: simply don't use mutable objects as
>> keys. That's what Python does. Sure, the programmer sometimes needs to
>> work around the lack (convert the list into a tuple, or a string, or
>> pickle it, whatever...) which on rare occasions is hard to do, but at
>> least there are no mysterious, hard to track down bugs.
>
> When I first saw key's must'be be mutable, it did appear to me to be
> mysterious. There was unnecessary tighter coupling between
> implementation details and the service exposed to the programmer. (As
> I see it, the root cause of all this is, the dict does not make a copy
> of the key at the time of item addition, it just makes a new reference
> to the same object)
>

Light dawns. Dicts are optimized for PERFORMANCE! And for very good reasons.

regards
Steve
--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://del.icio.us/steve.holden

Sorry, the dog ate my .sigline

Dustan

unread,

Sep 20, 2007, 7:03:12 AM9/20/07

to

On Sep 19, 10:58 pm, Bryan Olson <fakeaddr...@nowhere.org> wrote:
> Bad news: Python 3000 has no immutable type for byte-strings.
> The new bytes type cannot serve for dict keys or set members.
> Many things one would want to hash are unhashable -- for
> example, the results of the hash functions in hashlib.

Are you serious????

Steven D'Aprano

unread,

Sep 20, 2007, 7:46:29 AM9/20/07

to

On Thu, 20 Sep 2007 04:02:03 +0000, Karthik Gurusamy wrote:

> On Sep 19, 7:17 pm, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Wed, 19 Sep 2007 20:58:03 +0000, Karthik Gurusamy wrote:
>> > While it's easy to explain the behavior, I think the decision to dis-
>> > allow mutable items as keys is a bit arbitrary. There is no need for
>> > dict to recompute hash
>>
>> What???
>>
>> Of course it does. How else can it look up the key? Because it
>> (somehow) just recognizes that it has seen the key before? How? By
>> magic?
>
> You answered it yourself later. For a mapping service, hash is just one
> way to do things. What you need is for each item in the collection, a
> unique key.

And does the mapping get access to that unique key (the key's key)? It
can't keep a mapping of object:key, because if it could do that, it
wouldn't need the key, it could just keep object:payload.

Since the mapping can't know the key's key, it has to ask the key, and
that's what the __hash__ method does.

> How you go from the key to the value is not something a programmer needs
> to know.

You are correct. All you need to know is that in Python, you can't use
lists and sets as keys directly.

You only need to know about the details of the way dicts look up keys if
you are writing your own class, and you aren't happy with Python's
default hash for instance objects. It isn't a common task.

> Your mind is set on thinking on hash alone and hence you don't see
> beyond it.

No, these issues exist regardless of the implementation of the mapping.
Whether you use a hash table or a binary tree of some sort, or a linear
linked list, or a database, or folders in a filing cabinet.

The behaviour of mutable objects in a mapping is always problematic,
regardless of the mapping implementation.

>> > (first of all, a user doesn't even need to know if underneath
>> > 'hashing' is used -- the service is just a mapping between one item
>> > to another item).
>>
>> The user doesn't need to know the mechanism, but the dict does. Dicts
>> are implemented as hash tables. I suppose they could be implemented as
>> something else (what? linear lists? some sort of tree?) but the same
>> considerations must be made:
>
> Oh yes. If the keys are all integers (or any set of items that can be
> ordered), why not an avl. It has guaranteed O(log N) while a hash in
> worst case is O(N).

But both the best and average cases are O(1), which beats AVL trees by a
lot. Since dicts are used for Python's internals, they are HIGHLY
optimized and VERY fast. Their O(1) will beat the O(log N) of AVL trees
easily.

Hash tables can also use keys even if they can't be ordered: {1+2j: None}

> Why you want to tie yourself to the drawbacks of one
> datastructure? Understand your goal is not to provide a hash; but to
> provide a mapping service.

No, the goal of a good language designer is to provide the fastest, most
lightweight, simplest, most flexible mapping as a built-in. Any old slow,
heavyweight, complicated, inflexible mapping will not do the job. That
goal is best provided by a hash table.

If people want additional mappings as well, they can be added as
libraries. If those libraries are very useful, they can be added to the
standard library. If they are HUGELY useful, they will become built-ins.

AVL trees are not as flexible or fast as hash tables, and even if they
were, you would *still* need to resolve the difficulties that occur if
the keys mutate.

>> the dict must be able to find keys it has
>> seen before. How is the dict supposed to recognise the key if the key
>> has changed?
>>
>> > Since we know hashing is used, all that is needed is, a well-defined
>> > way to construct a hash out of a mutable. "Given a sequence, how to
>> > get a hash" is the problem.
>>
>> Nonsense. That's not the problem. The problem is how to get exactly the
>> same hash when the sequence has changed.
>
> Yes, if you keep thinking hash is the only tool you got.

Fine, let me re-word it in terms of an AVL.

Suppose you have two lists in an AVL:

L1 = [1, 2, 3]
L2 = [4, 5, 6]
M = avl((L1, True), (L2, False))

The tree M (for mapping) has L1 at the top of the tree, and L2 as the
right node.

But now, EVERY time ANY mutable object changes, Python has to check
whether it is a key in EVERY avl, and if so, re-built the tree. Otherwise
the tree can become corrupt because the AVL invariants are no longer true.

(Consider what happens if we say L1[0] = 999. Now L1 > L2. If you don't
reorder the avl, M[L2] cannot be reached except by an exhaustive search
of every node. That means it is no longer an AVL tree, just an inordered
tree. Might as well save memory and use a linear linked list.)

Needless to say, this will slow down Python just a tad.

Look, you can go back to the archives of 1993 when this was first
discussed. Nothing has changed since then. Mutable keys are still
problematic, regardless of the implementation, and the simplest solution
is to simply prohibit them.

http://www.python.org/search/hypermail/python-1993/0044.html

If you want to use a mutable object as a key, by object identity rather
than value, then the easy way is to wrap it in an instance:

class Wrapper: # don't even need new-style classes
pass # or even an __init__

key = Wrapper()
key.payload = [1, 2, 3]
D = {key: "value"}

But of course you can't look up the dict by value, only by identity. But
that's what you wanted.

Another way is to use this class:

class HashableList(list):
def __hash__(self):
return hash(tuple(self))

Have fun, and remember us when you're debugging your code, because you'll
be doing a lot of it.

--
Steven.

Piet van Oostrum

unread,

Sep 20, 2007, 9:14:38 AM9/20/07

to

>>>>> Steven D'Aprano <st...@REMOVE-THIS-cybersource.com.au> (SD) wrote:

>SD> In other words, if you have two mutable objects M1 and M2, then you
>SD> expect:

>SD> hash(M1) == hash(M2) if and only if M1 and M2 are equal
>SD> hash(M1) != hash(M2) if M1 and M2 are unequal

Huh? Unequal things may hash to the same value. That one of the essential
properties of a hash function.
--
Piet van Oostrum <pi...@cs.uu.nl>
URL: http://www.cs.uu.nl/~piet [PGP 8DAE142BE17999C4]
Private email: pi...@vanoostrum.org

Neil Cerutti

unread,

Sep 20, 2007, 1:03:02 PM9/20/07

to

On 2007-09-20, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
>> In the new model, it should be the value at the time of
>> addition. That is [1,2] (not [1,2,3]). This does mean a copy
>> of key in maintained internally in the dict.
>
> A copy!? That has to be a deep copy. Which would make
> `dict`\s alot slower and use more memory. Plus you can't store
> objects that can't be copied anymore. That doesn't sound like
> a good trade off to me.

Python's dict implementation is so deeply unorthoganal (is that a
word?) to Python's basic assignment semantics and clever type
hierarchy that it's hard to even sensibly promote anything other
than the current implementation without completely redesigning
Python.

--
Neil Cerutti

OKB (not okblacke)

unread,

Sep 20, 2007, 1:05:08 PM9/20/07

to

Steven D'Aprano wrote:

> But of course you can't look up the dict by value, only by
> identity. But that's what you wanted.

Actually, if I understand the OP's examples right, he wants to look
up only by value, not by identity.

--
--OKB (not okblacke)
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is
no path, and leave a trail."
--author unknown

Chris Mellon

unread,

Sep 20, 2007, 1:12:30 PM9/20/07

to pytho...@python.org

It's a little Chicken Little. The current version of py3k has no
immutable bytes type, but there's work being done even as we speak to
implement it.

Chris Mellon

unread,

Sep 20, 2007, 1:18:07 PM9/20/07

to pytho...@python.org

On 9/20/07, OKB (not okblacke) <brenNOS...@nobrenspambarn.net> wrote:
> Steven D'Aprano wrote:
>
> > But of course you can't look up the dict by value, only by
> > identity. But that's what you wanted.
>
> Actually, if I understand the OP's examples right, he wants to look
> up only by value, not by identity.
>

He's switched at least twice, as I understand his writing. Currently,
he appears to want to look up by value, copying lists on insertion,
and is ignoring what happens if you mutate the key.

Terry Reedy

unread,

Sep 20, 2007, 2:01:20 PM9/20/07

to pytho...@python.org

"Chris Mellon" <ark...@gmail.com> wrote in message
news:4866bea60709201012x30f...@mail.gmail.com...

| On 9/20/07, Dustan <Dustan...@gmail.com> wrote:

| It's a little Chicken Little. The current version of py3k has no
| immutable bytes type, but there's work being done even as we speak to
| implement it.

To reinforce this comment: CPython3.0 will not be released for at least 9
months. The current 3.0.a1 is openly *experimental*. 3.0.a2 should be
released within a few weeks with an invitation for anyone concerned with
the final product to experiment with it.

GvR's development strategy has been to start small and add what is clearly
useful (rather than start large and trim). 3.0 is being trimmed bit.
Where too much is trimmed, something can get added back.

Gabriel Genellina

unread,

Sep 21, 2007, 3:25:53 AM9/21/07

to pytho...@python.org

En Thu, 20 Sep 2007 08:46:29 -0300, Steven D'Aprano
<st...@REMOVE-THIS-cybersource.com.au> escribi�:

> Another way is to use this class:
>
> class HashableList(list):
> def __hash__(self):
> return hash(tuple(self))

...and that will stop working as soon as the list is mutated (which is
exactly what you said before)

--
Gabriel Genellina

Bryan Olson

unread,

Sep 22, 2007, 1:17:50 AM9/22/07

to

Gabriel Genellina wrote:
> En Thu, 20 Sep 2007 08:46:29 -0300, Steven D'Aprano
>

>> Another way is to use this class:
>>
>> class HashableList(list):
>> def __hash__(self):
>> return hash(tuple(self))
>
> ...and that will stop working as soon as the list is mutated (which is
> exactly what you said before)

Yup. I had suggested that technique in this thread, but it
doesn't really work. It hashes by state and compares by
state, but the actual key that a dict would store is the
object's identity. If the state has changed by the time
of a dict lookup, the dict will look in the hash-bucket
of the old state, but the object's equality test will
compare against the current state.
Bummer. Sorry.

--
--Bryan

sapsi

unread,

Sep 22, 2007, 10:23:56 AM9/22/07

to

Thank you everyone for the assistance and for the very informative
discussion
Regards
SM