jsaul asked a question about a custom sort, which reminded me of a
problem I often have with sorting.
What is everyones favorite way of sorting descending on a[1], and
ascending on a[0]? (or however you wish, even changing this on
the fly)
I am not a big fan of passing sort() a comparison function, but I
don't see how I could use the Decorate - Sort - Undecorate (DSU)
technique.
Below is how I usually do it, and also a more general method that
generates the function for sort() that lets you choose
the order of the indexes and whether you sort ascending or
descending for each index.
b b b b b b b b h h h h h descending comparison aaaaaaaaaa
"""
import re
t = re.sub(r'[^a-z]+',' ',t.lower()).strip()
d = {}
for w in t.split(' '):
d[w] = d.get(w,0) + 1
def sort_f(a,b):
if a[1] != b[1]:
return -cmp(a[1],b[1])
else:
return cmp(a[0],b[0])
list0 = d.items()
list0.sort(sort_f)
print '* sort by a[1](desc) then a[0]'
for a in list0[:10]: print a
# very clean with lexically nested scopes (Python 2.2)
def make_sort_f(list0):
def f(a,b):
for (i,m) in list0:
if a[i] == b[i]: continue
return m * cmp(a[i],b[i])
return 0
return f
list1 = [ (w,c,len(w)) for (w,c) in d.items() ]
list1.sort(make_sort_f( [(2,-1),(1,-1),(0,1)] ))
print '* sort by a[2](desc) then a[1](desc) then a[0]'
for a in list1[:10]: print a
----------(output)-----------
* sort by a[1](desc) then a[0]
('a', 8)
('b', 8)
('h', 5)
('i', 5)
('sort', 5)
('the', 5)
('of', 4)
('and', 3)
('descending', 3)
('on', 3)
* sort by a[2](desc) then a[1](desc) then a[0]
('descending', 3, 10)
('comparison', 2, 10)
('aaaaaaaaaa', 1, 10)
('undecorate', 1, 10)
('ascending', 2, 9)
('everyones', 1, 9)
('generates', 1, 9)
('technique', 1, 9)
('function', 2, 8)
('changing', 1, 8)
Probably the easiest way would be to convert the strings to their ordinal
values, and for the ascending sort, then negate each ordinal value.
But unless you are having performance problems, the absolute easiest way in
this case is to use a comparison function. That's what they're there for.
DSU is a wonderful thing, but it is not always the best option. Just most of
the time.
Tim Delaney
> What is everyones favorite way of sorting descending on a[1], and
> ascending on a[0]? (or however you wish, even changing this on
> the fly)
>
> I am not a big fan of passing sort() a comparison function, but I
> don't see how I could use the Decorate - Sort - Undecorate (DSU)
> technique.
You can even use a lambda (or just a one-line function) that isn't
_totally_ horrible (I believe a similar pattern is mentioned in the
Cookbook):
>>> L = [(1, 'a'), (2, 'a'), (6, 'b'), (8, 'q'), (2, 'q'), (0, 'z')]
>>> L.sort(lambda x, y: cmp(y[1], x[1]) or cmp(x[0], y[0]))
>>> L
[(0, 'z'), (2, 'q'), (8, 'q'), (6, 'b'), (1, 'a'), (2, 'a')]
The use of or works since the second cmp will only be evaluated if the
first cmp is false, which means it must be zero, or that the values
match. One can obviously also write cmp(y[1], x[1]) as -cmp(x[1],
y[1]), if one considers that more readable.
--
Erik Max Francis / m...@alcyone.com / http://www.alcyone.com/max/
__ San Jose, CA, USA / 37 20 N 121 53 W / &tSftDotIotE
/ \ They make it a desert and call it peace.
\__/ Tacitus
WebVal / http://www.alcyone.com/pyos/webval/
URL scanner, maintainer, and validator in Python.
> t = """
>
> jsaul asked a question about a custom sort, which reminded me of a
> problem I often have with sorting.
>
> What is everyones favorite way of sorting descending on a[1], and
> ascending on a[0]? (or however you wish, even changing this on
> the fly)
>
> I am not a big fan of passing sort() a comparison function, but I
> don't see how I could use the Decorate - Sort - Undecorate (DSU)
> technique.
You could, but you wouldn't necessarily want to, for such specific
needs of mixing ascending and descending sorts. For this task,
there are various approaches with different complication and
performance characteristics -- and even if you do desperately
care about performance, the choice depends on what Python version
you want to run your program with.
N = 100 * 1000
biggie = [ (i//4, str(i)) for i in xrange(N) ]
def asc0desc1(a, b):
return cmp(a[0],b[0]) or cmp(b[1],a[1])
def sortAdByMethod():
result = biggie[:]
result.sort(asc0desc1)
return result
def sortAdByDSUPasses():
temp = [ (it[1],it[0]) for it in biggie ]
temp.sort()
temp.reverse()
temp = [ (temp[i][1], i, temp[i][0]) for i in range(len(temp)) ]
temp.sort()
return [ (a, c) for a, b, c in temp ]
reverser = ''.join([ chr(255-x) for x in range(256) ])
def sortAdByRichDSU():
temp = [ (it[0], it[1].translate(reverser)) for it in biggie ]
temp.sort()
return [ (it[0], it[1].translate(reverser)) for it in temp ]
byMet = sortAdByMethod()
byPas = sortAdByDSUPasses()
byRic = sortAdByRichDSU()
print byMet==byPas, byMet==byRic, byPas==byRic
import time
def timeit(func):
start = time.clock()
func()
stend = time.clock()
return '%.2f %s' % (stend-start, func.__name__)
for func in sortAdByMethod, sortAdByDSUPasses, sortAdByRichDSU:
print timeit(func)
[alex@lancelot alex]$ python2.2 -O twoso.py
1 1 1
3.74 sortAdByMethod
4.17 sortAdByDSUPasses
2.74 sortAdByRichDSU
[alex@lancelot alex]$ python2.3 -O twoso.py
True True True
0.97 sortAdByMethod
2.75 sortAdByDSUPasses
1.88 sortAdByRichDSU
[alex@lancelot alex]$
As you see, pure DSU (with the needed insertion of index to make
the second pass stable -- 2.3's sort IS stable, but you still
need the index to stop items being compared again by the field
already used in the first pass!) is slowest here -- by just a
little in 2.2, by quite a bit in 2.3. The complicated approach
of synthesizing 'decorated strings' whose comparison is the
reverse than the original's, besides lack of generality (quite
serious, btw -- it works here, but it will NOT work when two
strings being compared can have the same prefix but with one
longer than the other!), is only a little faster than the simple
minded "pass a comparison function" in 2.2, and SLOWER than it
in 2.3, by a substantial margin too.
Thus, in practice, I would not suggest DSU for such a mixed
sort unless you *know* that the fields on which you need to sort
in reverse are numbers -- in THAT case, changing sign is easy,
and the DSU approach is simple and fast once again, e.g.:
N = 100 * 1000
biggie = [ (i%4, i//4) for i in xrange(N) ]
def asc0desc1(a, b):
return cmp(a[0],b[0]) or cmp(b[1],a[1])
def sortAdByMethod():
result = biggie[:]
result.sort(asc0desc1)
return result
def sortAdDSU():
temp = [ (it[0],-it[1]) for it in biggie ]
temp.sort()
return [ (it[0],-it[1]) for it in temp ]
byMet = sortAdByMethod()
byDsu = sortAdDSU()
print byMet==byDsu
import time
def timeit(func):
start = time.clock()
func()
stend = time.clock()
return '%.2f %s' % (stend-start, func.__name__)
for func in sortAdByMethod, sortAdDSU:
print timeit(func)
[alex@lancelot alex]$ python2.2 -O twoso.py
1
5.07 sortAdByMethod
2.90 sortAdDSU
[alex@lancelot alex]$ python2.3 -O twoso.py
True
1.39 sortAdByMethod
1.31 sortAdDSU
[alex@lancelot alex]$
Again 2.3's new sort changes things, making the two
approaches just about equivalent even here. But at least
in this case you can adopt DSU with a clear conscience
(AFTER some benchmarking on YOUR specific machine and
problem, of course:-).
BTW, I keep being amazed at how much faster 2.3 is than
2.2 -- sort may be an extreme case, but the timing
ratios seem to be always in 2.3's favor in every case
I've measured, and sometimes substantially so. Let's
hope the release of this speed jewel does not get
delayed by the wish to ALSO tweak a zillion other
things -- the Python community deserves to bask in its
beauteous new speed, and such speed-ups would no doubt
help Python win more converts, too!-).
Alex
>def asc0desc1(a, b):
> return cmp(a[0],b[0]) or cmp(b[1],a[1])
I forgot about this trick, but I think I might stick with
list1.sort(make_sort_f( [(2,-1),(1,-1),(0,1)] ))
in many situations because changing to different sort orders is
trivial, and with the proper sort I usually can continue working with
a much smaller beginning slice of the list, and of course with a
smaller list the speed up is dramatic for all operations.
>[alex@lancelot alex]$ python2.2 -O twoso.py
>1 1 1
>3.74 sortAdByMethod
>4.17 sortAdByDSUPasses
>2.74 sortAdByRichDSU
>[alex@lancelot alex]$ python2.3 -O twoso.py
>True True True
>0.97 sortAdByMethod
>2.75 sortAdByDSUPasses
>1.88 sortAdByRichDSU
>[alex@lancelot alex]$ python2.2 -O twoso.py
>1
>5.07 sortAdByMethod
>2.90 sortAdDSU
>[alex@lancelot alex]$ python2.3 -O twoso.py
>True
>1.39 sortAdByMethod
>1.31 sortAdDSU
Hooray for Python 2.3 and Tim Peters's sort()! I will have to get
used to seeing "True" instead of "1"!
>The complicated approach
>of synthesizing 'decorated strings' whose comparison is the
>reverse than the original's, besides lack of generality (quite
>serious, btw -- it works here, but it will NOT work when two
>strings being compared can have the same prefix but with one
>longer than the other!), is only a little faster than the simple
>minded "pass a comparison function" in 2.2
Yeah, and the strings I often end up working with are file paths,
which would be the worst case scenario for this technique