Unique Elements in a List


supe...@gmail.com

May 9, 2005, 6:15:03 PM
to
Is there an easy way to grab the Unique elements from a list?
For Example:
data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

what I am looking for is the unique elements 0.4 and 0.9 with their
index from the list.
Probably something like a Hash Table approach!!
I would like to get this done without unnecessary overhead. And the list
could be essentially anything: strings, floats, ints, etc...

Or is it already available as an attr to a list or an array?
I don't remember seeing anything like that.

supe...@gmail.com

May 9, 2005, 6:36:37 PM
to
OK, I need to be more clear I guess! By unique elements I mean elements
that are non-repeating. So in the above list, 0.4 and 0.9 are unique as
they exist only once in the list.

runes

May 9, 2005, 7:01:02 PM
to
This is not the most beautiful idiom, but it works...

d = {}
for k in data:
    try:
        d[k] += 1
    except:
        d[k] = 1

for k,v in d.items():
    if v == 1:
        print k

Roie Kerstein

May 9, 2005, 6:40:52 PM
to pytho...@python.org
supe...@gmail.com wrote:

> Is there an easy way to grab the Unique elements from a list?
>

Yes.
The set data type.
Look in the Python 2.4 reference.
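For example, with the 2.4 built-in (a rough sketch; note this gives the *distinct*
values only, not the values that occur exactly once, and in arbitrary order):

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
distinct = list(set(data))   # five values, in whatever order the set yields them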
--
Best regards
Roie Kerstein

James Stroud

May 9, 2005, 7:24:39 PM
to pytho...@python.org

On Monday 09 May 2005 03:15 pm, supe...@gmail.com wrote:
> Is there an easy way to grab the Unique elements from a list?

from sets import Set

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

[x for x in Set(data) if data.count(x) == 1]

--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095

http://www.jamesstroud.com/

James Stroud

May 9, 2005, 6:57:54 PM
to pytho...@python.org
On Monday 09 May 2005 03:15 pm, supe...@gmail.com wrote:

from sets import Set

runes

May 9, 2005, 7:47:14 PM
to
Your set-approach is very nice, but if the list of data is huge, it is
rather slow because you'll have to loop through the data list and count
every member.
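To put rough numbers on that, a timeit comparison along these lines could be
used (untested sketch, assuming Python 2.4's timeit and sets modules; the
actual figures will of course depend on the machine and the data):

import timeit

setup = """
from sets import Set
import random
data = [random.random() for i in range(2000)]
"""

# O(N**2): count() rescans the whole list for every distinct element
count_stmt = "[x for x in Set(data) if data.count(x) == 1]"

# O(N): one pass to build the counts, one pass over the dict
dict_stmt = """
counts = {}
for x in data:
    counts[x] = counts.get(x, 0) + 1
unique = [x for x, n in counts.items() if n == 1]
"""

print timeit.Timer(count_stmt, setup).timeit(10)
print timeit.Timer(dict_stmt, setup).timeit(10)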

bearoph...@lycos.com

May 9, 2005, 8:26:16 PM
to
runes:

> d = {}
> for k in data:
>     try:
>         d[k] += 1
>     except:
>         d[k] = 1
>
> for k,v in d.items():
>     if v == 1:
>         print k

For this problem, if duplicated keys are common enough, this version is
probably the faster one.

With this *different* problem, it seems that sometimes the faster
solution is using "in":
http://pyalot.blogspot.com/2005/04/dictionary-speed.html
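For reference, the "in" variant of the same counting loop would look something
like this (sketch, not benchmarked here):

d = {}
for k in data:
    if k in d:
        d[k] += 1
    else:
        d[k] = 1

for k, v in d.items():
    if v == 1:
        print k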

Bye,
Bearophile

Michael J. Fromberger

May 9, 2005, 8:07:27 PM
to
In article <1115676903.4...@z14g2000cwz.googlegroups.com>,
"supe...@gmail.com" <supe...@gmail.com> wrote:

From your comments downthread, it seems you want to find those elements
of the input sequence which occur exactly once, and return not only
these elements, but also their positions.

One reasonable solution might be as follows:

def unique_elts(seq):
    elts = {}
    for pos, elt in enumerate(seq):
        elts.setdefault(elt, []).append(pos)

    return [ (x, p[0]) for (x, p) in elts.iteritems()
             if len(p) == 1 ]

This returns a list of tuples of the form (x, pos), where x is an
element of seq that occurs exactly once, and pos is its index in the
original sequence. This implementation traverses the input sequence
exactly once, and requires storage proportional to the length of the
input.
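Used on the original data it would give something like this (sketch; the exact
float reprs and the order of the pairs may differ, since the result comes out
in dict order):

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
print unique_elts(data)   # e.g. [(0.4..., 3), (0.9..., 7)], in no particular order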

-M

--
Michael J. Fromberger | Lecturer, Dept. of Computer Science
http://www.dartmouth.edu/~sting/ | Dartmouth College, Hanover, NH, USA

Aahz

May 9, 2005, 9:14:43 PM
to
In article <1115679662.5...@f14g2000cwb.googlegroups.com>,

Only if "works" does not include "order preserving".
--
Aahz (aa...@pythoncraft.com) <*> http://www.pythoncraft.com/

"And if that makes me an elitist...I couldn't be happier." --JMS

Paul Rubin

May 9, 2005, 9:23:16 PM
to
"supe...@gmail.com" <supe...@gmail.com> writes:
> Is there an easy way to grab the Unique elements from a list?
> For Example:
> data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

Untested: here's an iterator that gives you the index and value,
without reordering. Uses dicts instead of sets for backwards compatibility.

def unique_elements(data):
    seen = {}
    for n,x in data:
        if x not in seen:
            seen[x] = 1
            yield n,x

Rune Strand

May 9, 2005, 9:30:49 PM
to
You're right. I somehow missed the index part :-o. It's easy to fix
though.
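Something along these lines, for instance (untested sketch: count first, then a
second pass keeps the original order and the index):

counts = {}
for x in data:
    counts[x] = counts.get(x, 0) + 1

result = [(pos, x) for pos, x in enumerate(data) if counts[x] == 1]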

Fredrik Lundh

May 10, 2005, 3:28:27 AM
to pytho...@python.org
Paul Rubin wrote:

> > Is there an easy way to grab the Unique elements from a list?
> > For Example:
> > data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
>
> Untested: here's an iterator that gives you the index and value,
> without reordering. Uses dicts instead of sets for backwards compatibility.
>
> def unique_elements(data):
>     seen = {}
>     for n,x in data:
>         if x not in seen:
>             seen[x] = 1
>             yield n,x

you forgot enumerate()

and if you fix that, you'll notice that the output doesn't quite match
the OP's spec:

> For Example:
> data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
>

> what I am looking for is the unique elements 0.4 and 0.9 with their
> index from the list.

here's a straight-forward variation that gives the specified output,
in the original order:

def unique_elements(data):
    count = {}
    data = list(enumerate(data))
    for n,x in data:
        count[x] = count.setdefault(x, 0) + 1
    for n,x in data:
        if count[x] == 1:
            yield x, n # value with index

depending on the data, it might be more efficient to store the
"last seen index" in a dictionary, and sort the result on the way
out (if necessary). something like

from operator import itemgetter

def unique_elements(data):
    seen = {}; index = {}
    for n, x in enumerate(data):
        if x in seen:
            index.pop(x, None)  # may already have been removed on an earlier duplicate
        else:
            index[x] = seen[x] = n
    index = index.items()
    index.sort(key=itemgetter(1)) # leave this out if order doesn't matter
    return index

could work.

</F>

Edvard Majakari

May 10, 2005, 3:53:35 AM
to
James Stroud <jst...@mbi.ucla.edu> writes:

> from sets import Set
>
> data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
>
> [x for x in Set(data) if data.count(x) == 1]

Um.

...I must have missed something, but I'll post nevertheless:

wouldn't just

[x for x in data if data.count(x) == 1]

suffice? It is also "stable", preserving the order of items. Lemme demo:

>>> [x for x in Set(data) if data.count(x) == 1]
[0.90000000000000002, 0.40000000000000002]

>>> [x for x in data if data.count(x) == 1]
[0.40000000000000002, 0.90000000000000002]

Though I'll admit I also thought of Sets first, because I didn't remember
there was such a nice method list.count().

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!
One day, when he was naughty, Mr Bunnsy looked over the hedge into Farmer
Fred's field and it was full of fresh green lettuces. Mr Bunnsy, however, was
not full of lettuces. This did not seem fair. --Mr Bunnsy has an adventure

Max M

May 10, 2005, 4:01:09 AM
to
Fredrik Lundh wrote:

> depending on the data, it might be more efficient to store the
> "last seen index" in a dictionary, and sort the result on the way
> out (if necessary). something like


from sets import Set

data = list(Set([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]))


--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science

Edvard Majakari

May 10, 2005, 4:26:16 AM
to
"Michael J. Fromberger" <Michael.J....@Clothing.Dartmouth.EDU> writes:

> One reasonable solution might be as follows:
>
> def unique_elts(seq):
> elts = {}
> for pos, elt in enumerate(seq):
> elts.setdefault(elt, []).append(pos)
>
> return [ (x, p[0]) for (x, p) in elts.iteritems()
> if len(p) == 1 ]

This is neat. Being lazy (the wrong way), I've never paid much attention to why
I would need dict.setdefault() when I have things like dict.get(k, [def]). This
was a nice example of a good use for setdefault(), because it works like
dict.get() except it also sets the value if the key didn't exist.

+1 IOTW (idiom of the week).

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

$_ = '456476617264204d616a616b6172692c20612043687269737469616e20'; print
join('',map{chr hex}(split/(\w{2})/)),uc substr(crypt(60281449,'es'),2,4),"\n";

Fredrik Lundh

May 10, 2005, 5:54:59 AM
to pytho...@python.org
Max M wrote:

>> depending on the data, it might be more efficient to store the
>> "last seen index" in a dictionary, and sort the result on the way
>> out (if necessary). something like
>
> from sets import Set
>
> data = list(Set([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]))

read the OP's spec again.

</F>

Bengt Richter

May 10, 2005, 6:22:16 AM
to

You want to be careful of your definitions, especially with floating point values,
which may surprise the uninitiated. Dicts and sets hash numerically equal values
to the same hash, and then do an equality test, so you get legitimate results like:

>>> set([9*.1-8*.1, .1])
set([0.10000000000000001, 0.099999999999999978])
>>> set([123, 123.0, 123L])
set([123])
>>> set([123.0, 123, 123L])
set([123.0])
>>> set([123L, 123, 123.0])
set([123L])


You may want to consider creating representations other than the original data
to use in your uniqueness testing, depending on your definition.
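For instance, a sketch that keys the counting on a rounded representation
instead of the raw floats (the precision here is just an example; pick whatever
"equal" should mean for your data):

data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

counts = {}
for x in data:
    key = round(x, 6)
    counts[key] = counts.get(key, 0) + 1

print [x for x in data if counts[round(x, 6)] == 1]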

If you are happy with the way dicts and sets compare elements, you could do
a dict that keeps counts, or FTHOI (for the heck of it) something different
(hot off the untested griddle, so test before relying on it ;-)

>>> data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
>>> once=set()
>>> more=set()
>>> for el in data:
...     if el in once: more.add(el)
...     else: once.add(el)
...
>>> once-more
set([0.90000000000000002, 0.40000000000000002])

Not the most efficient space-wise, not sure about speed.

Regards,
Bengt Richter

Max M

May 10, 2005, 7:52:30 AM
to

Well if you want to haggle over minor details like a spec, it's easy to
be critical!


data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]

freqs = {}
for position in xrange(len(data)):
    freqs.setdefault(data[position], []).append(position)
unique = [(value, positions[0]) for (value, positions) in freqs.items()
          if len(positions) == 1]

Paul Rubin

May 10, 2005, 1:42:37 PM
to
"Fredrik Lundh" <fre...@pythonware.com> writes:
> you forgot enumerate()

Oh, oops.

> and if you fix that, you'll notice that the output doesn't quite match
> the OP's spec:
>

> > what I am looking for is the unique elements 0.4 and 0.9 with their
> > index from the list.

Oh, I see, he wanted the elements that only occur once. I misunderstood.

Aahz

May 10, 2005, 2:24:25 PM
to
In article <87psvzi...@titan.staselog.com>,

Edvard Majakari <edvar...@majakari.net> wrote:
>James Stroud <jst...@mbi.ucla.edu> writes:
>>
>> from sets import Set
>>
>> data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
>>
>> [x for x in Set(data) if data.count(x) == 1]
>
>Um.
>
>...I must have missed something, but I'll post nevertheless:
>
>wouldn't just
>
>[x for x in data if data.count(x) == 1]
>
>suffice? it is also "stable" preserving order of items. Lemme demo:

Only for small datasets -- this is an O(N^2) algorithm.

Steven Bethard

May 10, 2005, 4:04:18 PM
to
supe...@gmail.com wrote:
> Is there an easy way to grab the Unique elements from a list?
> For Example:
> data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
>
> what I am looking for is the unique elements 0.4 and 0.9 with their
> index from the list.

This is basically the same as Fredrik Lundh's solution, but here it is
anyway:

py> def f(lst):
...     counts = dicttools.counts(lst)
...     return [(i, elem)
...             for i, elem in enumerate(lst)
...             if counts[elem] == 1]
...
py> f([0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9])
[(3, 0.40000000000000002), (7, 0.90000000000000002)]

Where dicttools.counts is defined as:

def counts(iterable, key=None):
    result = {}
    for item in iterable:
        # apply key function if necessary
        if key is None:
            k = item
        else:
            k = key(item)
        # increment key's count
        try:
            result[k] += 1
        except KeyError:
            result[k] = 1
    return result

(I use this so often that I have a module called dicttools to hold it.
You don't actually need the key= argument for your problem, but I just
copied the code directly.)

STeVe

Edvard Majakari

May 12, 2005, 3:48:21 AM
to
aa...@pythoncraft.com (Aahz) writes:

>>[x for x in data if data.count(x) == 1]
>>
>>suffice? it is also "stable" preserving order of items. Lemme demo:
>
> Only for small datasets -- this is an O(N^2) algorithm.

I realized that, but maybe I should've pointed it out too. For the OP if
he/she is unaware - notation O(N^2) (big O n squared) means the computing time
of the algorithm increases exponentially (where exponent is 2) relative to the
size of the input.

Eg. if the input size is i and it takes p seconds to compute it, then given
input size 10*i, the computing time would be 100*p. These notions can apply
for memory usage as well, but the problem in here is the computing time:
list.count() must iterate through the list each time, and as such the loop

[x for x in data if data.count(x) == 1]

iterates through each item in data (x for x in data), and for each item it
will again iterate through each item in data to see how many times it
occurred. If data contains 200 items, this idiom would iterate the structure
40 000 times. With today's computers one wouldn't notice it, unless each item
requires heavy processing (eg. launching a separate process per item
etc). However, if the length of the data can be thousands or even tens of
thousands, this idiom would become unusable. If data contained 75 000 items,
the loop would do 5 625 000 000 iterations, effectively bringing the CPU to a
halt.
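To put numbers on that growth (trivial sketch):

for n in (200, 2000, 75000):
    print n, "items ->", n * n, "inner iterations"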

So, generally one should avoid using exponential O(n^k) (where k > 1)
algorithms, unless a faster O(n) or O(n*lg(n)) solution is impossible or
too complex, or the inputs are known to be small, etc.

Wikipedia has good resources and pointers to related things, see
http://en.wikipedia.org/wiki/Analysis_of_algorithms

--
#!/usr/bin/perl -w
$h={23,69,28,'6e',2,64,3,76,7,20,13,61,8,'4d',24,73,10,'6a',12,'6b',21,68,14,
72,16,'2c',17,20,9,61,11,61,25,74,4,61,1,45,29,20,5,72,18,61,15,69,20,43,26,
69,19,20,6,64,27,61,22,72};$_=join'',map{chr hex $h->{$_}}sort{$a<=>$b}
keys%$h;m/(\w).*\s(\w+)/x;$_.=uc substr(crypt(join('',60,28,14,49),join'',
map{lc}($1,substr $2,4,1)),2,4)."\n"; print;

Scott David Daniels

May 12, 2005, 12:38:19 PM
to
Edvard Majakari wrote:
> I realized that, but maybe I should've pointed it out too. For the OP if
> he/she is unaware - notation O(N^2) (big O n squared) means the computing time
> of the algorithm increases exponentially (where exponent is 2) relative to the
> size of the input.
Normally this is called a polynomial, rather than exponential increase.
Exponential increases are typically of the form (C^N) (they are all
equivalent).
Polynomial times are always characterized by their largest exponent,
so you never call something O(N^3 - N^2), since, as N gets large enough,
the N^2 term shrinks to non-existence.
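To see that shrinking concretely (trivial sketch):

for n in (10, 100, 1000):
    print n, float(n**3 - n**2) / n**3   # ratio approaches 1 as N grows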

http://en.wikipedia.org/wiki/Exponential_time

> ... generally one should avoid using exponential O(n^k) (where k > 1)

Again polynomial, not exponential time. Note that there is no
polynomial time algorithm with (k < 1), since it takes O(n) time
to read the problem.


--Scott David Daniels
Scott....@Acm.Org

Edvard Majakari

May 12, 2005, 3:56:08 PM
to
Scott David Daniels <Scott....@Acm.Org> writes:

> Normally this is called a polynomial, rather than exponential increase.
> Exponential increases are typically of the form (C^N) (they are all
> equivalent).
> Polynomial times are always characterized by their largest exponent,
> so you never call something O(N^3 - N^2), since, as N gets large enough,
> the N^2 term shrinks to non-existence.

Yup, you are, of course, completely correct. I was thinking "the exponent here
is two" and mistakenly called it exponential.

my_text.replace('exponent','polynom'), there :)

The reminder about ignoring terms with a smaller exponent was good, too.

--
# Edvard Majakari Software Engineer
# PGP PUBLIC KEY available Soli Deo Gloria!

$_ = '456476617264204d616a616b6172692c20612043687269737469616e20'; print

Andrew Dalke

May 12, 2005, 10:40:22 PM
to
Scott David Daniels wrote:
> Again polynomial, not exponential time. Note that there is no
> polynomial time algorithm with (k < 1), since it takes O(n) time
> to read the problem.

Being a stickler (I develop software after all :) quantum computers
can do better than that. For example, Grover's algorithm
http://en.wikipedia.org/wiki/Grover%27s_algorithm
for searching an unsorted list solves the problem in O(N**0.5) time.

Being even more picky, I think the largest N that's been tested
so far is on the order of 5.

Andrew
da...@dalkescientific.com

Facundo Batista

May 15, 2005, 12:09:16 PM
to pytho...@python.org
On 5/9/05, James Stroud <jst...@mbi.ucla.edu> wrote:

> > Is there an easy way to grab the Unique elements from a list?

>>> from sets import Set as set
>>> data = [0.1,0.5,0.6,0.4,0.1,0.5,0.6,0.9]
>>> for x in set(data):
...     print x
...
0.5
0.9
0.6
0.4
0.1


. Facundo

Blog: http://www.taniquetil.com.ar/plog/
PyAr: http://www.python.org/ar/

Nyet Ya Koshka

May 17, 2005, 12:32:18 PM
to
> One reasonable solution might be as follows:
>
> def unique_elts(seq):
>     elts = {}
>     for pos, elt in enumerate(seq):
>         elts.setdefault(elt, []).append(pos)
>
>     return [ (x, p[0]) for (x, p) in elts.iteritems()
>              if len(p) == 1 ]
>

Minor tweak to conserve space:

def bachelor_filter(iter_over_hashables):
    B = {}
    for index, elem in enumerate(iter_over_hashables):
        if B.setdefault(elem, index) != index:
            B[elem] = None
    return [pair for pair in B.items() if pair[1] is not None]
