insert unique data in a list

13 views
Skip to first unread message

mattia

unread,
Dec 13, 2009, 11:37:20 AM12/13/09
to
How can I insert non-duplicate data in a list? I mean, is there a
particular option in the creation of a list that permit me not to use
something like:
def append_unique(l, val):
if val not in l:
l.append(val)

Thanks,
Mattia

Sean DiZazzo

unread,
Dec 13, 2009, 12:05:31 PM12/13/09
to

Check out the set() data type.

~Sean

Gary Herron

unread,
Dec 13, 2009, 12:07:08 PM12/13/09
to mattia, pytho...@python.org

Unless the insertion order is important, you could use a set -- where a
second insertion will have no effect.

>>> s = set()
>>> s.add(1)
>>> s.add(2)
>>> s.add(1)
>>> print s
set([1, 2])


Gary Herron

Alf P. Steinbach

unread,
Dec 13, 2009, 12:08:41 PM12/13/09
to
* mattia:

How about using a set instead?

>>> a = {1, 2, 3}
>>> a
{1, 2, 3}
>>> a |= {4}
>>> a
{1, 2, 3, 4}
>>> a |= {4}
>>> a
{1, 2, 3, 4}
>>> _


Cheers,

- Alf

mattia

unread,
Dec 13, 2009, 12:57:47 PM12/13/09
to

Ok, so you all suggest to use a set. Now the second question, more
interesting. Why can't I insert a list into a set? I mean, I have a
function that returns a list. I call this function several times and
maybe the list returned is the same as another one already returned. I
usually put all this lists into another list. How can I assure that my
list contains only unique lists? Using set does'n work (i.e. the python
interpreter tells me: TypeError: unhashable type: 'list')...

Gary Herron

unread,
Dec 13, 2009, 1:20:29 PM12/13/09
to pytho...@python.org

Sets can contain *only* hashable objects, but lists are not hashable
(since they are mutable).

Perhaps, you could convert your list to a tuple first -- tuples *are*
hashable.

>>> s = set()
>>> l = [1,2,3]
>>> s.add(l)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'
>>> s.add(tuple(l))
>>> s
set([(1, 2, 3)])

Andreas Waldenburger

unread,
Dec 13, 2009, 1:19:10 PM12/13/09
to
On 13 Dec 2009 17:57:47 GMT mattia <ger...@gmail.com> wrote:

> Using set does'n work (i.e. the python interpreter tells me:
> TypeError: unhashable type: 'list')...

Convert the lists to tuples before adding them. Tuples are hashable.

/W

--
INVALID? DE!

Fire Crow

unread,
Dec 13, 2009, 1:48:11 PM12/13/09
to

You could also define a custom object that manages a custom ordered
set

class unique_set(object):
def __init__(self,list):
self.list = list

def __add___(self,val):
if val not in self.list
self.list.append(val)
return True
return False

>>> unique_list = unique_set(['a','b','c'])
>>> unique_list.list
['a', 'b', 'c']
>>> unique_list + 'd'
True
>>> unique_list + 'e'
True
>>> unique_list + 'd'
False
>>> unique_list.list
['a', 'b', 'c', 'd', 'e']
>>>

I've used this on a few projects, it makes for wonderfully clean code,
because you can look at your program as an overview without all the
arithmetic behind it.

hope it helps

Steven D'Aprano

unread,
Dec 13, 2009, 3:09:36 PM12/13/09
to
On Sun, 13 Dec 2009 10:48:11 -0800, Fire Crow wrote:

> You could also define a custom object that manages a custom ordered set

[...]


> I've used this on a few projects, it makes for wonderfully clean code,
> because you can look at your program as an overview without all the
> arithmetic behind it.

Which is all very good, but beware that your "ordered set" implementation
is O(N) for insertions, which may be slow once the set grows large enough.

Also, I'm not sure I like your abuse of the + operator to modify the
object in place and return a flag. It is an API not shared by (as far as
I can see) any other data type in Python.


--
Steven

geremy condra

unread,
Dec 13, 2009, 6:04:36 PM12/13/09
to Steven D'Aprano, pytho...@python.org

Could probably just abuse an odict as cleanly. The other option that
leaps to mind is to use a bloom filter and a list.

Geremy Condra

Fire Crow

unread,
Dec 13, 2009, 7:49:27 PM12/13/09
to
> Also, I'm not sure I like your abuse of the + operator to modify the
> object in place and return a flag. It is an API not shared by (as far as
> I can see) any other data type in Python.

I agree it couuld be more consisten with other object apis,

I also think that if every api has to conform to every other api
nothing will ever get done.

Heres a slightly more familiar version, it returns the value added or
none to conform with other APIs.

class unique_set(object):
def __init__(self,list):
self.list = list

def __add___(self,val):
if val not in self.list
self.list.append(val)

return val
return None

>>> unique_list = unique_set(['a','b','c'])
>>> unique_list.list
['a', 'b', 'c']
>>> unique_list + 'd'

'd'

then a test opperand to verify if a value was inserted could be

if unique_list + 'd':
# done some stuff

but it could also be used in cases like this

unique_added = unique_list + 'd' or 'not added'

knifenomad

unread,
Dec 13, 2009, 8:19:17 PM12/13/09
to

this makes the set type hashable.

class Set(set):
__hash__ = lambda self: id(self)

s = Set()
s.add(1)
s.add(2)
s.add(1)

print s
set([1, 2])

d = {}
d[s] = 'voila'

print d
{Set([1,2]):'voila'}

print d[s]
'voila'

knifenomad

unread,
Dec 13, 2009, 8:24:18 PM12/13/09
to
> 'voila'- 원본 텍스트 숨기기 -
>
> - 원본 텍스트 보기 -

although it's not what you've asked about. it's intereting to make set
hashable using __hash__.

Chris Rebert

unread,
Dec 13, 2009, 8:47:45 PM12/13/09
to knifenomad, pytho...@python.org
On Sun, Dec 13, 2009 at 5:19 PM, knifenomad <knife...@gmail.com> wrote:
> On 12월14일, 오전2시57분, mattia <ger...@gmail.com> wrote:
>> Il Sun, 13 Dec 2009 16:37:20 +0000, mattia ha scritto:
>> > How can I insert non-duplicate data in a list? I mean, is there a
>> > particular option in the creation of a list that permit me not to use
>> > something like:
>> > def append_unique(l, val):
>> >     if val not in l:
>> >         l.append(val)
>>
>> Ok, so you all suggest to use a set. Now the second question, more
>> interesting. Why can't I insert a list into a set? I mean, I have a
>> function that returns a list. I call this function several times and
>> maybe the list returned is the same as another one already returned. I
>> usually put all this lists into another list. How can I assure that my
>> list contains only unique lists? Using set does'n work (i.e. the python
>> interpreter tells me: TypeError: unhashable type: 'list')...
>
> this makes the set type hashable.
>
> class Set(set):
>    __hash__ = lambda self: id(self)

Or you could just use frozenset and get the correct semantics:
http://docs.python.org/library/stdtypes.html#frozenset

Cheers,
Chris
--
http://blog.rebertia.com

Lie Ryan

unread,
Dec 13, 2009, 9:06:56 PM12/13/09
to
On 12/14/2009 11:49 AM, Fire Crow wrote:
> I also think that if every api has to conform to every other api
> nothing will ever get done.

but if every object has its own distinct API, we will never finish
reading the docs (assuming they do exist for each object).

Steven D'Aprano

unread,
Dec 13, 2009, 10:42:26 PM12/13/09
to
On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote:


> this makes the set type hashable.
>
> class Set(set):
> __hash__ = lambda self: id(self)


That's a *seriously* broken hash function.

>>> key = "voila"
>>> d = { Set(key): 1 }
>>> d
{Set(['i', 'a', 'l', 'o', 'v']): 1}
>>> d[ Set(key) ]


Traceback (most recent call last):
File "<stdin>", line 1, in <module>

KeyError: Set(['i', 'a', 'l', 'o', 'v'])


--
Steven

knifenomad

unread,
Dec 14, 2009, 12:17:28 AM12/14/09
to
On 12월14일, 오후12시42분, Steven D'Aprano

of course it is broken as long as it uses it's instance id.
i added this to notify that unhashable can become hashable
implementing __hash__ inside the class. which probably set to None by
default.

mattia

unread,
Dec 14, 2009, 12:13:24 PM12/14/09
to

Ok, nice example, but I believe that using id() as the hash function can
lead to unexpected collisions.

Lie Ryan

unread,
Dec 14, 2009, 2:34:57 PM12/14/09
to
On 12/15/2009 4:13 AM, mattia wrote:
>> >
>> > of course it is broken as long as it uses it's instance id. i added this
>> > to notify that unhashable can become hashable implementing __hash__
>> > inside the class. which probably set to None by default.
> Ok, nice example, but I believe that using id() as the hash function can
> lead to unexpected collisions.

For dict and set to work correctly, the hash function must conform to
the contract that:
- if A == B then hash(A) == hash(B)

If the id() of two objects differ but their content equal (i.e. they are
two equivalent, but distinct object), they should have the same hash. If
id() is used for the hash of an arbitrary object, the contract will be
broken unless you define A == B in terms of id(A) == id(B).

Steven D'Aprano

unread,
Dec 14, 2009, 4:53:38 PM12/14/09
to

No, you have that backwards. Using id() as the hash function means that
equal keys will hash unequal -- rather than unexpected collisions, it
leads to unexpected failure-to-collide-when-it-should-collide.

And it isn't a "nice example", it is a terrible example.

Firstly, the example fails to behave correctly. It simply doesn't work as
advertised.

Secondly, and much worse, it encourages people to do something dangerous
without thinking about the consequences. If it is so easy to hash mutable
objects, why don't built-in lists and sets don't have a __hash__ method?
Do you think that the Python developers merely forgot?

No, there is good reason why mutable items shouldn't be used as keys in
dicts or in sets, and this example simply papers over the reasons why and
gives the impression that using mutable objects as keys is easy and safe
when it is neither.

Using mutable objects as keys or set elements leads to surprising,
unexpected behaviour and hard-to-find bugs. Consider the following set
with lists as elements:

L = [1, 2]
s = Set() # set that allows mutable elements
s.add(L)
s.add([1, 2, 3])

So far so good. But what should happen now?

L.append(3)

The set now has two equal elements, which breaks the set invariant that
it has no duplicates.

Putting the problem of duplicates aside, there is another problem:

L = [1, 2]
s = Set([L])
L.append(3)

There are two possibilities: either the hash function of L changes when
the object changes, or it doesn't. Suppose it changes. Then the hash of L
after the append will be different from the hash of L before the append,
and so membership testing (L in s) will fail.

Okay, suppose we somehow arrange matters so that the hash of the object
doesn't change as the object mutates. This will solve the problem above:
we can mutate L as often as we like, and L in s will continue to work
correctly. But now the hash of an object doesn't just depend on it's
value, but on its history. That means that two objects which are
identical can hash differently, and we've already seen this is a problem.

There is one final approach which could work: we give the object a
constant hash function, so that all objects of that type hash
identically. This means that the performance of searches and lookups in
sets and dicts will fall to that of lists. There is no point in paying
all the extra overhead of a dict to get behaviour as slow, or slower,
than a list.

In other words, no matter what you do, using mutable objects as keys or
set elements leads to serious problems that need to be dealt with. It
simply isn't true that all you need to do to make mutable objects usable
in dicts or sets is to add a hash function. That part is trivial.

--
Steven

mattia

unread,
Dec 14, 2009, 6:24:31 PM12/14/09
to

I agree with you, and in fact I'm inserting tuples in my set. All the
workaroun to use somehow mutable object are poor attemps to solve in a
quick-and-dirty way a difficult problem like hashing. But I think that
during the discussion we have slowly forgot the main topic of my
question, that was insert unique objects in a container. Hash are good to
retrieve items in constant time, and when we are dealing with collisions
we have to provide some solutions, like chaining or open addressing. Note
also that in hash collisions happen and the hash function is used to
retrieve items, not to insert unique items.

Christian Tismer

unread,
Dec 15, 2009, 5:17:10 PM12/15/09
to pytho...@python.org
Thoughtless messing with __hash__ is seldom useful.

>>> s1 = Set([1])
>>> s2 = Set([1])
>>> s1 == s2
True
>>> d = {}
>>> d[s1] = 3
>>> d[s2] = 5
>>> d
{Set([1]): 3, Set([1]): 5}
>>>

Equality is kept for comparison, but what is it worth to hash
them by id?

On the original problem:

You could turn you lists into tuples. This would be clean
and correct.

ciao - chris

--
Christian Tismer :^)<mailto:tis...@stackless.com>
tismerysoft GmbH : Have a break! Take a ride on Python's
Johannes-Niemeyer-Weg 9A : *Starship* http://starship.python.net/
14109 Berlin : PGP key -> http://wwwkeys.pgp.net/
work +49 30 802 86 56 mobile +49 173 24 18 776 fax +49 30 80 90 57 05
PGP 0x57F3BF04 9064 F4E1 D754 C2FF 1619 305B C09C 5A3B 57F3 BF04
whom do you want to sponsor today? http://www.stackless.com/

Reply all
Reply to author
Forward
0 new messages