Sorts strings in a way that seems natural to humans.
If the
strings contain integers, then the integers are
ordered
numerically. For example, sorts ['Team 11', 'Team 3',
'Team 1']
into the order ['Team 1', 'Team 3', 'Team 11'].
Code:
#
---------------------------------------------------------
# natsort.py: Natural string sorting.
#
---------------------------------------------------------
# By Seo Sanghyeon. Some changes by Connelly Barnes.
def try_int(s):
"Convert to integer if possible."
try: return int(s)
except: return s
def natsort_key(s):
"Used internally to get a tuple by which s is
sorted."
import re
return map(try_int, re.findall(r'(\d+|\D+)', s))
def natcmp(a, b):
"Natural string comparison, case sensitive."
return cmp(natsort_key(a), natsort_key(b))
def natcasecmp(a, b):
"Natural string comparison, ignores case."
return natcmp(a.lower(), b.lower())
def natsort(seq, cmp=natcmp):
"In-place natural string sort."
seq.sort(cmp)
def natsorted(seq, cmp=natcmp):
"Returns a copy of seq, sorted by natural string
sort."
import copy
temp = copy.copy(seq)
natsort(temp, cmp)
return temp
Examples:
You can use this code to sort tarball filenames:
>>> natsorted(['ver-1.3.12', 'ver-1.3.3', 'ver-1.2.5',
'ver-1.2.15', 'ver-1.2.3', 'ver-1.2.1'])
['ver-1.2.1', 'ver-1.2.3', 'ver-1.2.5', 'ver-1.2.15',
'ver-1.3.3', 'ver-1.3.12']
Chemical elements:
>>> natsorted(['C1H2', 'C1H4', 'C2H2', 'C2H6', 'C2N',
'C3H6'])
['C1H2', 'C1H4', 'C2H2', 'C2H6', 'C2N', 'C3H6']
Teams:
>>> natsorted(['Team 101', 'Team 58', 'Team 30', 'Team
1'])
['Team 1', 'Team 30', 'Team 58', 'Team 101']
Pass natcasecmp as a second argument for
case-insensitive sorting:
>>> natsorted(['a5', 'A7', 'a15', 'a9', 'A8'],
natcasecmp)
['a5', 'A7', 'A8', 'a9', 'a15']
Enjoy!
Message and source code are in the public domain.
__________________________________
Do you Yahoo!?
Friends. Fun. Try the all-new Yahoo! Messenger.
http://messenger.yahoo.com/
Nice. Can be written (and, hopefully, made a bit more efficient) using
the Decorate-Sort-Undecorate pattern too:
import re
_expr = re.compile(r'(\d+|\D+)')
def try_int(s):
if s.isdigit():
return int(s)
else:
return s
def nat_key(s):
return [try_int(e) for e in _expr.findall(s)]
def nat_nocase_key(s):
return nat_key(s.lower())
def decsorted(seq, key):
decorated = [(key(item), item) for item in seq]
decorated.sort()
return [item for k, item in decorated]
Then you could use decsorted(seq, nat_key) or decsorted(seq,
nat_nocase_key).
If someone is found of one-liners, he can just write nat_key as
def nat_key(s):
return map(lambda s: s.isdigit() and int(s) or s, _expr.findall(s))
Note that 2.4's sorted() and list.sort() will have built-in support for
decorate-sort-undecorate, so you'll just have to throw nat_key or
nat_nocase_key to them, with no need for decsorted.
--
Ciao,
Matteo