Thanks,
Mattia
Check out the set() data type.
~Sean
Unless the insertion order is important, you could use a set -- where a
second insertion will have no effect.
>>> s = set()
>>> s.add(1)
>>> s.add(2)
>>> s.add(1)
>>> print s
set([1, 2])
Gary Herron
How about using a set instead?
>>> a = {1, 2, 3}
>>> a
{1, 2, 3}
>>> a |= {4}
>>> a
{1, 2, 3, 4}
>>> a |= {4}
>>> a
{1, 2, 3, 4}
>>> _
Cheers,
- Alf
Ok, so you all suggest to use a set. Now the second question, more
interesting. Why can't I insert a list into a set? I mean, I have a
function that returns a list. I call this function several times and
maybe the list returned is the same as another one already returned. I
usually put all this lists into another list. How can I assure that my
list contains only unique lists? Using set does'n work (i.e. the python
interpreter tells me: TypeError: unhashable type: 'list')...
Sets can contain *only* hashable objects, but lists are not hashable
(since they are mutable).
Perhaps, you could convert your list to a tuple first -- tuples *are*
hashable.
>>> s = set()
>>> l = [1,2,3]
>>> s.add(l)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> s.add(tuple(l))
>>> s
set([(1, 2, 3)])
> Using set does'n work (i.e. the python interpreter tells me:
> TypeError: unhashable type: 'list')...
Convert the lists to tuples before adding them. Tuples are hashable.
/W
--
INVALID? DE!
You could also define a custom object that manages a custom ordered
set
class unique_set(object):
def __init__(self,list):
self.list = list
def __add___(self,val):
if val not in self.list
self.list.append(val)
return True
return False
>>> unique_list = unique_set(['a','b','c'])
>>> unique_list.list
['a', 'b', 'c']
>>> unique_list + 'd'
True
>>> unique_list + 'e'
True
>>> unique_list + 'd'
False
>>> unique_list.list
['a', 'b', 'c', 'd', 'e']
>>>
I've used this on a few projects, it makes for wonderfully clean code,
because you can look at your program as an overview without all the
arithmetic behind it.
hope it helps
> You could also define a custom object that manages a custom ordered set
[...]
> I've used this on a few projects, it makes for wonderfully clean code,
> because you can look at your program as an overview without all the
> arithmetic behind it.
Which is all very good, but beware that your "ordered set" implementation
is O(N) for insertions, which may be slow once the set grows large enough.
Also, I'm not sure I like your abuse of the + operator to modify the
object in place and return a flag. It is an API not shared by (as far as
I can see) any other data type in Python.
--
Steven
Could probably just abuse an odict as cleanly. The other option that
leaps to mind is to use a bloom filter and a list.
Geremy Condra
I agree it couuld be more consisten with other object apis,
I also think that if every api has to conform to every other api
nothing will ever get done.
Heres a slightly more familiar version, it returns the value added or
none to conform with other APIs.
class unique_set(object):
def __init__(self,list):
self.list = list
def __add___(self,val):
if val not in self.list
self.list.append(val)
return val
return None
>>> unique_list = unique_set(['a','b','c'])
>>> unique_list.list
['a', 'b', 'c']
>>> unique_list + 'd'
'd'
then a test opperand to verify if a value was inserted could be
if unique_list + 'd':
# done some stuff
but it could also be used in cases like this
unique_added = unique_list + 'd' or 'not added'
this makes the set type hashable.
class Set(set):
__hash__ = lambda self: id(self)
s = Set()
s.add(1)
s.add(2)
s.add(1)
print s
set([1, 2])
d = {}
d[s] = 'voila'
print d
{Set([1,2]):'voila'}
print d[s]
'voila'
although it's not what you've asked about. it's intereting to make set
hashable using __hash__.
Or you could just use frozenset and get the correct semantics:
http://docs.python.org/library/stdtypes.html#frozenset
Cheers,
Chris
--
http://blog.rebertia.com
but if every object has its own distinct API, we will never finish
reading the docs (assuming they do exist for each object).
> this makes the set type hashable.
>
> class Set(set):
> __hash__ = lambda self: id(self)
That's a *seriously* broken hash function.
>>> key = "voila"
>>> d = { Set(key): 1 }
>>> d
{Set(['i', 'a', 'l', 'o', 'v']): 1}
>>> d[ Set(key) ]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
KeyError: Set(['i', 'a', 'l', 'o', 'v'])
--
Steven
of course it is broken as long as it uses it's instance id.
i added this to notify that unhashable can become hashable
implementing __hash__ inside the class. which probably set to None by
default.
Ok, nice example, but I believe that using id() as the hash function can
lead to unexpected collisions.
For dict and set to work correctly, the hash function must conform to
the contract that:
- if A == B then hash(A) == hash(B)
If the id() of two objects differ but their content equal (i.e. they are
two equivalent, but distinct object), they should have the same hash. If
id() is used for the hash of an arbitrary object, the contract will be
broken unless you define A == B in terms of id(A) == id(B).
No, you have that backwards. Using id() as the hash function means that
equal keys will hash unequal -- rather than unexpected collisions, it
leads to unexpected failure-to-collide-when-it-should-collide.
And it isn't a "nice example", it is a terrible example.
Firstly, the example fails to behave correctly. It simply doesn't work as
advertised.
Secondly, and much worse, it encourages people to do something dangerous
without thinking about the consequences. If it is so easy to hash mutable
objects, why don't built-in lists and sets don't have a __hash__ method?
Do you think that the Python developers merely forgot?
No, there is good reason why mutable items shouldn't be used as keys in
dicts or in sets, and this example simply papers over the reasons why and
gives the impression that using mutable objects as keys is easy and safe
when it is neither.
Using mutable objects as keys or set elements leads to surprising,
unexpected behaviour and hard-to-find bugs. Consider the following set
with lists as elements:
L = [1, 2]
s = Set() # set that allows mutable elements
s.add(L)
s.add([1, 2, 3])
So far so good. But what should happen now?
L.append(3)
The set now has two equal elements, which breaks the set invariant that
it has no duplicates.
Putting the problem of duplicates aside, there is another problem:
L = [1, 2]
s = Set([L])
L.append(3)
There are two possibilities: either the hash function of L changes when
the object changes, or it doesn't. Suppose it changes. Then the hash of L
after the append will be different from the hash of L before the append,
and so membership testing (L in s) will fail.
Okay, suppose we somehow arrange matters so that the hash of the object
doesn't change as the object mutates. This will solve the problem above:
we can mutate L as often as we like, and L in s will continue to work
correctly. But now the hash of an object doesn't just depend on it's
value, but on its history. That means that two objects which are
identical can hash differently, and we've already seen this is a problem.
There is one final approach which could work: we give the object a
constant hash function, so that all objects of that type hash
identically. This means that the performance of searches and lookups in
sets and dicts will fall to that of lists. There is no point in paying
all the extra overhead of a dict to get behaviour as slow, or slower,
than a list.
In other words, no matter what you do, using mutable objects as keys or
set elements leads to serious problems that need to be dealt with. It
simply isn't true that all you need to do to make mutable objects usable
in dicts or sets is to add a hash function. That part is trivial.
--
Steven
I agree with you, and in fact I'm inserting tuples in my set. All the
workaroun to use somehow mutable object are poor attemps to solve in a
quick-and-dirty way a difficult problem like hashing. But I think that
during the discussion we have slowly forgot the main topic of my
question, that was insert unique objects in a container. Hash are good to
retrieve items in constant time, and when we are dealing with collisions
we have to provide some solutions, like chaining or open addressing. Note
also that in hash collisions happen and the hash function is used to
retrieve items, not to insert unique items.
>>> s1 = Set([1])
>>> s2 = Set([1])
>>> s1 == s2
True
>>> d = {}
>>> d[s1] = 3
>>> d[s2] = 5
>>> d
{Set([1]): 3, Set([1]): 5}
>>>
Equality is kept for comparison, but what is it worth to hash
them by id?
On the original problem:
You could turn you lists into tuples. This would be clean
and correct.
ciao - chris
--
Christian Tismer :^)<mailto:tis...@stackless.com>
tismerysoft GmbH : Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9A : *Starship* http://starship.python.net/
14109 Berlin : PGP key -> http://wwwkeys.pgp.net/
work +49 30 802 86 56 mobile +49 173 24 18 776 fax +49 30 80 90 57 05
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/