Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Code snippet: Natural string sorting

2 views
Skip to first unread message

C. Barnes

unread,
Jun 9, 2004, 1:58:04 AM6/9/04
to pytho...@python.org

Summary:

Sorts strings in a way that seems natural to humans.
If the
strings contain integers, then the integers are
ordered
numerically. For example, sorts ['Team 11', 'Team 3',
'Team 1']
into the order ['Team 1', 'Team 3', 'Team 11'].

Code:

#
---------------------------------------------------------
# natsort.py: Natural string sorting.
#
---------------------------------------------------------

# By Seo Sanghyeon. Some changes by Connelly Barnes.

def try_int(s):
"Convert to integer if possible."
try: return int(s)
except: return s

def natsort_key(s):
"Used internally to get a tuple by which s is
sorted."
import re
return map(try_int, re.findall(r'(\d+|\D+)', s))

def natcmp(a, b):
"Natural string comparison, case sensitive."
return cmp(natsort_key(a), natsort_key(b))

def natcasecmp(a, b):
"Natural string comparison, ignores case."
return natcmp(a.lower(), b.lower())

def natsort(seq, cmp=natcmp):
"In-place natural string sort."
seq.sort(cmp)

def natsorted(seq, cmp=natcmp):
"Returns a copy of seq, sorted by natural string
sort."
import copy
temp = copy.copy(seq)
natsort(temp, cmp)
return temp

Examples:

You can use this code to sort tarball filenames:

>>> natsorted(['ver-1.3.12', 'ver-1.3.3', 'ver-1.2.5',
'ver-1.2.15', 'ver-1.2.3', 'ver-1.2.1'])
['ver-1.2.1', 'ver-1.2.3', 'ver-1.2.5', 'ver-1.2.15',
'ver-1.3.3', 'ver-1.3.12']

Chemical elements:

>>> natsorted(['C1H2', 'C1H4', 'C2H2', 'C2H6', 'C2N',
'C3H6'])
['C1H2', 'C1H4', 'C2H2', 'C2H6', 'C2N', 'C3H6']

Teams:

>>> natsorted(['Team 101', 'Team 58', 'Team 30', 'Team
1'])
['Team 1', 'Team 30', 'Team 58', 'Team 101']

Pass natcasecmp as a second argument for
case-insensitive sorting:

>>> natsorted(['a5', 'A7', 'a15', 'a9', 'A8'],
natcasecmp)
['a5', 'A7', 'A8', 'a9', 'a15']

Enjoy!


Message and source code are in the public domain.




__________________________________
Do you Yahoo!?
Friends. Fun. Try the all-new Yahoo! Messenger.
http://messenger.yahoo.com/

Matteo Dell'Amico

unread,
Jun 9, 2004, 1:27:34 PM6/9/04
to
C. Barnes wrote:
> Summary:
>
> Sorts strings in a way that seems natural to humans.
> If the
> strings contain integers, then the integers are
> ordered
> numerically. For example, sorts ['Team 11', 'Team 3',
> 'Team 1']
> into the order ['Team 1', 'Team 3', 'Team 11'].
[snip]

Nice. Can be written (and, hopefully, made a bit more efficient) using
the Decorate-Sort-Undecorate pattern too:

import re

_expr = re.compile(r'(\d+|\D+)')

def try_int(s):
if s.isdigit():
return int(s)
else:
return s

def nat_key(s):
return [try_int(e) for e in _expr.findall(s)]

def nat_nocase_key(s):
return nat_key(s.lower())

def decsorted(seq, key):
decorated = [(key(item), item) for item in seq]
decorated.sort()
return [item for k, item in decorated]

Then you could use decsorted(seq, nat_key) or decsorted(seq,
nat_nocase_key).

If someone is found of one-liners, he can just write nat_key as

def nat_key(s):
return map(lambda s: s.isdigit() and int(s) or s, _expr.findall(s))

Note that 2.4's sorted() and list.sort() will have built-in support for
decorate-sort-undecorate, so you'll just have to throw nat_key or
nat_nocase_key to them, with no need for decsorted.

--
Ciao,
Matteo

0 new messages