http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/502263
Regards,
Jordan
That looks pretty messy, and it's a quadratic time algorithm (or maybe
worse) because of all the list.index and deletion operations.
This version is also quadratic and passes your test suite, but
might differ in some more complicated cases:
def unique(seq, keepstr=True):
t = type(seq)
if t==str:
t = (list, ''.join)[bool(keepstr)]
seen = []
return t(c for c in seq if (c not in seen, seen.append(c))[0])
Preferable:
def unique(seq, keepstr=True):
t = type(seq)
if t==str:
t = (list, ''.join)[bool(keepstr)]
seen = []
return t(c for c in seq if not (c in seen or seen.append(c)))
Wow, nice! Very cool. :)
Regards,
Jordan
I posted this (attributed to you of course) in the comments section
for the recipe.
Any reason not to use a set for the 'seen' variable? Avoids searching
through a linear list. The input order is preserved because the
return value is created in the generator expression, not by using the
seen variable directly.
def unique2(seq, keepstr=True):
t = type(seq)
if t==str:
t = (list, ''.join)[bool(keepstr)]
seen = set()
return t(c for c in seq if not (c in seen or seen.add(c)))
-- Paul
Yes, the sequence can contain non-hashable elements. See the test
vectors for examples.
It's more terse, but my version is built to be faster in the more
common cases of all hashable or/and all sortable items (while working
in other cases too).
Try your unique on an unicode string, that's probably a bug (keepstr
is being ignored).
Version by Paul Rubin is very short, but rather unreadable too.
Bye,
bearophile
Unicode fix (untested):
def unique(seq, keepstr=True):
t = type(seq)
if t in (unicode, str):
t = (list, t('').join)[bool(keepstr)]
seen = []
return t(c for c in seq if not (c in seen or seen.append(c)))
Case by case optimization (untested):
def unique(seq, keepstr=True):
t = type(seq)
if t in (unicode, str):
t = (list, t('').join)[bool(keepstr)]
try:
remaining = set(seq)
seen = set()
return t(c for c in seq if (c in remaining and
not remaining.remove(c)))
except TypeError: # hashing didn't work, see if seq is sortable
try:
from itertools import groupby
s = sorted(enumerate(seq),key=lambda (i,v):(v,i))
return t(g.next() for k,g in groupby(s, lambda (i,v): v))
except: # not sortable, use brute force
seen = []
return t(c for c in seq if not (c in seen or seen.append(c)))
I don't have Python 2.4 available right now to try either of the above.
Note that all the schemes fail if seq is some arbitrary iterable,
rather one of the built-in sequence types.
I think these iterator approaches get more readable as one becomes
used to them.
This definitely works. I tried to post a message about this a few
hours ago, but I guess it didn't go through. I've already updated the
recipe and comments.
> I think these iterator approaches get more readable as one becomes
> used to them.
I agree. I understood your code in a matter of seconds.
In your case optimized version, in the second try clause using
itertools, it should be like this, shouldn't it?
return t(g.next()[1] for k,g in groupby(s, lambda (i,v): v))
^^^
Regards,
Jordan
I didn't think so but I can't conveniently test it for now. Maybe
tonight.
After playing with it a bit (only have 2.5 on this box), it looks like
you do need to subscript the next() call. For example, the return from
"unique( [[1], [2]] )" is "[(0, [1]), (1, [2])]". I'm not overly
familiar with the functional paradigm or itertools though, so I might
have missed something.
Regards,
Jordan