Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Q: sort's key and cmp parameters

16 views
Skip to first unread message

kj

unread,
Oct 1, 2009, 1:08:33 PM10/1/09
to

Challenge: to come up with a sorting task that cannot be achieved
by passing to the sort method (or sorted function) suitable values
for its key and reverse parameters, but instead *require* giving
a value to its cmp parameter.

For example,

from random import random
scrambled = some_list.sort(cmp=lambda x, y: cmp(random(), 0.5))

can be achieved (to a very good approximation at least) with

scrambled = some_list.sort(key=lambda x: random())

Is there a real-life sorting task that requires (or is far more
efficient with) cmp and can't be easily achieved with key and
reverse?

Many thanks in advance,

G.

P.S. I guess that, if sort is going to produce a non-trivial,
*consistent* order, a function CMP passed as the value of its cmp
parameter must satisfy the following properties:

0. CMP(x, y) must be non-zero for some elements x, y of the list;
1. anti-symmetry: sign(CMP(x, y)) must be equal to -sign(CMP(y, x));
2. transitivity: if sign(CMP(x, y)) equals sign(CMP(y, z)), then
sign(CMP(x, z)) must be equal to sign(CMP(x, y)).

In (1) and (2), sign() stands for the function

-1 if x < 0
sign(x) = 0 if x == 0
1 if x > 0

I suppose that all bets are off if these properties are not satisfied,
though the documentation does not make this entirely clear, as far
as I can tell. (If I'm wrong about this alleged omission, please
point me to the part of the docs that I missed.)

Laszlo Nagy

unread,
Oct 1, 2009, 1:51:14 PM10/1/09
to pytho...@python.org
Is this a homework?

> Challenge: to come up with a sorting task that cannot be achieved
> by passing to the sort method (or sorted function) suitable values
> for its key and reverse parameters, but instead *require* giving
> a value to its cmp parameter.
>

Let me put up this question: how do you define the difference between an
object to be sorted in the list, and the passed "suitable value" that is
used for finding the position for the object in the list. If the only
difference is that they need to be different Python objects, then you
can always use [x] instead of x. This value is perfectly suitable. ;-)

If you could give us somewhat stronger requirements regarding keys and
objects, we may give you a different answer.

Best,

Laszlo


Peter Otten

unread,
Oct 1, 2009, 1:51:02 PM10/1/09
to
kj wrote:

> Challenge: to come up with a sorting task that cannot be achieved
> by passing to the sort method (or sorted function) suitable values
> for its key and reverse parameters, but instead *require* giving
> a value to its cmp parameter.
>
> For example,
>
> from random import random
> scrambled = some_list.sort(cmp=lambda x, y: cmp(random(), 0.5))

The above violates the transitivity requirement as stated below. The result
will have a bias.

> can be achieved (to a very good approximation at least) with
>
> scrambled = some_list.sort(key=lambda x: random())
>
> Is there a real-life sorting task that requires (or is far more
> efficient with) cmp and can't be easily achieved with key and
> reverse?

The core developers don't think there is one. list.sort() no longer supports
the cmp parameter in 3.x.

Laszlo Nagy

unread,
Oct 1, 2009, 2:05:21 PM10/1/09
to pytho...@python.org

>> can be achieved (to a very good approximation at least) with
>>
>> scrambled = some_list.sort(key=lambda x: random())
>>
>> Is there a real-life sorting task that requires (or is far more
>> efficient with) cmp and can't be easily achieved with key and
>> reverse?
>>
>
> The core developers don't think there is one. list.sort() no longer supports
> the cmp parameter in 3.x.
>
LOL :-D

BTW if you really want to sort a list then you can. Using keys or
values, it doesn't matter. The op's question has no practical usage.
REALLY looks like a homework.

L

Carsten Haese

unread,
Oct 1, 2009, 2:26:41 PM10/1/09
to pytho...@python.org
kj wrote:
>
> Challenge: to come up with a sorting task that cannot be achieved
> by passing to the sort method (or sorted function) suitable values
> for its key and reverse parameters, but instead *require* giving
> a value to its cmp parameter.

Such a task can't exist, because any arbitrary comparison function can
be transformed into a key function. The idea behind the transformation
is to construct wrapper objects that can compare themselves to other
wrapper objects by invoking the given comparison function on their
wrapped originals.

Google for "Hettinger cmp2key" if you want to see the code that does
this transformation.

HTH,

--
Carsten Haese
http://informixdb.sourceforge.net

Ethan Furman

unread,
Oct 1, 2009, 2:21:38 PM10/1/09
to pytho...@python.org
Laszlo Nagy wrote:
>
>>> can be achieved (to a very good approximation at least) with
>>>
>>> scrambled = some_list.sort(key=lambda x: random())
>>>
>>> Is there a real-life sorting task that requires (or is far more
>>> efficient with) cmp and can't be easily achieved with key and
>>> reverse?
>>>
>>
>>
>> The core developers don't think there is one. list.sort() no longer
>> supports the cmp parameter in 3.x.
>>
>
> LOL :-D
>
> BTW if you really want to sort a list then you can. Using keys or
> values, it doesn't matter. The op's question has no practical usage.
> REALLY looks like a homework.
>
> L
>

If it is, it's one he's contemplating giving his students. :-D

~Ethan~

Paul Rubin

unread,
Oct 1, 2009, 2:41:22 PM10/1/09
to
kj <no.e...@please.post> writes:
> Is there a real-life sorting task that requires (or is far more
> efficient with) cmp and can't be easily achieved with key and
> reverse?

Yes, think of sorting tree structures where you have to recursively
compare them til you find an unequal pair of nodes. To sort with
key=... you have to wrap a class instance around each tree, with
special comparison methods. Blecch.

Bearophile

unread,
Oct 1, 2009, 3:39:23 PM10/1/09
to
Paul Rubin:

> Yes, think of sorting tree structures where you have to recursively
> compare them til you find an unequal pair of nodes.

That's cute. In what situations do you have to perform such kind of
sort?

Bye,
bearophile

Duncan Booth

unread,
Oct 1, 2009, 3:55:03 PM10/1/09
to
kj <no.e...@please.post> wrote:

> Is there a real-life sorting task that requires (or is far more
> efficient with) cmp and can't be easily achieved with key and
> reverse?
>

There is no sorting task that *requires* cmp. If all else fails you can
define a key class to wrap the original wrapper such that comparing the
keys simply calls the comparison function that you *would* have used.

This has been discussed before and if you google sufficiently you should
find the code that has been posted to do exactly that.

Raymond Hettinger

unread,
Oct 1, 2009, 4:54:35 PM10/1/09
to
On Oct 1, 10:08 am, kj <no.em...@please.post> wrote:
> Challenge: to come up with a sorting task that cannot be achieved
> by passing to the sort method (or sorted function) suitable values
> for its key and reverse parameters, but instead *require* giving
> a value to its cmp parameter.

If you're assuming a consistent sort-order (transitivity, not
evolving over time, etc), then the cmp method and key method
are mathematically equivalent (you could always do a compare
sort first, record the order produced, and assign the position
number as a key function):

# constructive proof of the existence of a key function
# corresponding to a given cmp function
tmplist = sorted(s, cmp=mycmp)
pos = dict(zip(map(id, tmplist), range(len(tmplist))))
result = sorted(s, key=lambda x: pos[id(x)])
assert tmplist == result

Given equivalence, what is really at stake is convenience.

If you're starting point is a cmp function (for instance,
asking a dating service member whether they prefer
mate x to mate y), then having to convert to a key function
can be inconvenient.

If you need to sort by an ascending primary key and a descending
secondary key, you may find it easiest to sort in two passes
(taking advantage of guaranteed sort stability):

sorted(s, key=secondary, reversed=True)
sorted(s, key=primary)

With a cmp function, the above could be achieved in a single pass:

def mycmp(x, y):
p = cmp(primary(a), primary(b))
return p if p else cmp(secondary(b), secondary(a))

sorted(s, cmp=mycmp)

Also, as Paul pointed-out, there are some structures such as tree
comparisons that may be challenging to express directly as a key
function. As Carsten pointed-out, the cmp2key function allows
cmp functions to trivially (though indirectly) be recast as key
functions.

All that being said, for many everyday uses, a key function is
simpler to write and faster to run than a cmp function approach.


Raymond

kj

unread,
Oct 1, 2009, 6:05:24 PM10/1/09
to
In <ec18cfe1-e122-499e...@t11g2000prh.googlegroups.com> Raymond Hettinger <pyt...@rcn.com> writes:

<snip>

Thanks for an extremely helpful reply!

>If you need to sort by an ascending primary key and a descending
>secondary key, you may find it easiest to sort in two passes
>(taking advantage of guaranteed sort stability):

> sorted(s, key=secondary, reversed=3DTrue)
> sorted(s, key=primary)

In the special case where the value returned by secondary is
numeric, I suppose one could do this in one go with

sorted(s, key=lambda x: (primary(x), -secondary(x)))

...but I can't think of a way to generalize this...

kj

kj

unread,
Oct 1, 2009, 6:16:13 PM10/1/09
to


Good point. This example convinces me that it was a bad idea to
get rid of cmp in Python 3, even if situations like this one are
rare. With the cmp parameter as an option, coding this type of
sort was accessible even to those with a rudementary knowledge of
Python. Now one needs to be pretty clever and pretty skilled with
Python to figure this one out... Granted, anyone who needs to
perform such sorts is probably smart enough to handle the required
solution, but not necessarily very proficient with Python. Besides,
why make something that was relatively straightforward before become
an exercise in cleverness? I wonder what was gained by eliminating
the cmp option...

kj

Paul Rubin

unread,
Oct 1, 2009, 7:16:52 PM10/1/09
to
Duncan Booth <duncan...@invalid.invalid> writes:
> > Is there a real-life sorting task that requires (or is far more
> > efficient with) cmp and can't be easily achieved with key and reverse?
> >
> There is no sorting task that *requires* cmp. If all else fails you can
> define a key class to wrap the original wrapper such that comparing the
> keys simply calls the comparison function that you *would* have used.

I would count that as key being far less efficient, though still
giving the same result.

Paul Rubin

unread,
Oct 1, 2009, 7:19:05 PM10/1/09
to
Raymond Hettinger <pyt...@rcn.com> writes:
> If you're assuming a consistent sort-order (transitivity, not
> evolving over time, etc), then the cmp method and key method
> are mathematically equivalent (you could always do a compare
> sort first, record the order produced, and assign the position
> number as a key function)

But that is an efficiency hit.

> If you're starting point is a cmp function (for instance,
> asking a dating service member whether they prefer
> mate x to mate y), then having to convert to a key function
> can be inconvenient.

Well, the issue there is that the comparison function may not be
transitive. Maybe you really want a topological sort for that.

> All that being said, for many everyday uses, a key function is
> simpler to write and faster to run than a cmp function approach.

I still have never understood why cmp was removed. Sure, key is more
convenient a lot (or maybe most) of the time, but it's not always.

alex23

unread,
Oct 1, 2009, 8:00:17 PM10/1/09
to
kj <no.em...@please.post> wrote:
> This example convinces me that it was a bad idea to
> get rid of cmp in Python 3, even if situations like this one are
> rare.

It sounds like the entire point of this exercise was to get other
people to confirm your bias for you.

kj

unread,
Oct 1, 2009, 10:13:29 PM10/1/09
to

The only problem with this hypothesis is that my original bias was
exactly the opposite to the one you quote: when I sent my original
post I thought that cmp was in fact useless (after all I could not
think of a situation that required it or even made it preferable),
and was not even aware that it had been dropped in Python 3. Paul
Rubin's post convinced me otherwise.

BTW, what's with this business of ascribing underhanded motives to
me? Earlier some other clown alleged that that my original post
was homework??? WTF?

kj

Paul Rubin

unread,
Oct 1, 2009, 10:21:12 PM10/1/09
to
Bearophile <bearoph...@lycos.com> writes:
> > Yes, think of sorting tree structures where you have to recursively
> > compare them til you find an unequal pair of nodes.
>
> That's cute. In what situations do you have to perform such kind of
> sort?

It came up in a search engine application I've been involved with.

Duncan Booth

unread,
Oct 2, 2009, 7:35:02 AM10/2/09
to

You'll notice I carefully didn't comment on the efficiency. However,
without testing it I'm not convinced that it would be 'far less' efficient.

Using cmp2key you have an overhead of one class per sort plus one instance
for each element being sorted, and an additional one function call for each
comparison. The only example given so far that would justify needing this
(sorting trees which require recursive comparison) by its very nature would
make both of these additional overheads insignificant as each element
already contains multiple instances and each comparison contains multiple
function calls.

Scott David Daniels

unread,
Oct 2, 2009, 10:38:07 AM10/2/09
to
Paul Rubin wrote:
> I still have never understood why cmp was removed. Sure, key is more
> convenient a lot (or maybe most) of the time, but it's not always.

Not just more convenient. cmp will always be N log N, in that _every_
comparison runs your function, while key is linear, in that it is run
once per element. Most cases are moreeasily done with key, and it is
a good idea to make the most accessible way to a sort be the most
efficient one. In the rare case that you really want each comparison,
the cmp-injection function will do nicely (and can be written as a
recipe.

In short, make the easy path the fast path, and more will use it;
provide two ways, and the first that springs to mind is the one
used.

--Scott David Daniels
Scott....@Acm.Org

Paul Rubin

unread,
Oct 2, 2009, 4:49:34 PM10/2/09
to
Scott David Daniels <Scott....@Acm.Org> writes:
> Most cases are moreeasily done with key, and it is
> a good idea to make the most accessible way to a sort be the most
> efficient one. In the rare case that you really want each comparison,
> the cmp-injection function will do nicely (and can be written as a
> recipe.

I don't think wrapping the sorted objects in an otherwise useless
special purpose class is "nicely", either from a performance or from a
code verbosity point of view. I avoid Java and its useless extra
classes for a reason ;-).

> In short, make the easy path the fast path, and more will use it;
> provide two ways, and the first that springs to mind is the one
> used.

I think we are saying the same thing. Python 2.x provides two ways
and you can use whichever one fits the application better. I have
never understood why Python 3.x finds it necessary to break one of
them. Maybe I can migrate to Haskell by the time Python 2.x becomes
deprecated.

Raymond Hettinger

unread,
Oct 2, 2009, 8:38:14 PM10/2/09
to
[Paul Rubin]

> Yes, think of sorting tree structures where you have to recursively
> compare them til you find an unequal pair of nodes.  

I'm not sure what you mean by this. What are the semantics of
sorting a tree? Can you outline an example of tree that
could be sorted easily with a cmp function but not a key function?

t = [[9,6],[7,1]],[[2,5],[4,3]] # inputs
t.sort(mycmp) # what would the cmp function be?
print t # what would the output be


Raymond

Paul Rubin

unread,
Oct 3, 2009, 12:11:22 AM10/3/09
to
Raymond Hettinger <pyt...@rcn.com> writes:
> I'm not sure what you mean by this. What are the semantics of
> sorting a tree? Can you outline an example of tree that
> could be sorted easily with a cmp function but not a key function?

The idea was that you have a list of trees that you want to sort, and
an ordering relation between trees:

def gt(tree1, tree2): ...

where you recursively descend both trees until you find an unequal
pair of nodes. You're not trying to sort the nodes within a single
tree.

Raymond Hettinger

unread,
Oct 3, 2009, 2:16:25 AM10/3/09
to
[Paul Rubin]

> The idea was that you have a list of trees that you want to sort, and
> an ordering relation between trees:
>
>    def gt(tree1, tree2): ...
>
> where you recursively descend both trees until you find an unequal
> pair of nodes.  You're not trying to sort the nodes within a single
> tree.

Can you give an example of a list of trees and a cmp function
that recursively compares them?


Raymond

Raymond Hettinger

unread,
Oct 3, 2009, 2:25:59 AM10/3/09
to
[Paul Rubin]

> The idea was that you have a list of trees that you want to sort, and
> an ordering relation between trees:
>
>    def gt(tree1, tree2): ...

Are the trees user defined classes? Can the gt() function be added
incorporated as __lt__ method so that you can just run a plain sort:

sort(list_of_trees)

Is the recursive search order something you can turn into a straight
sequence:

sort(list_of_trees, key=flattener)

IOW, if there is an ordering relation between the trees, why can't
it be either part of the tree API or collapsable into a list of
successive nodes to be compared.

From the sound of it, the trees are static during the sort and
would get a nice O(n log n) --> O(n) speed-up if a key function
were allowed to flatten them in a single pass.


Raymond

Paul Rubin

unread,
Oct 3, 2009, 3:22:26 AM10/3/09
to
Raymond Hettinger <pyt...@rcn.com> writes:
> Can you give an example of a list of trees and a cmp function
> that recursively compares them?

Example of list of trees (nested dicts). In practice you could get
such a list from the simplejson module:

list_of_trees = [{'value':1, 'left':{'value':3,'left':None,'right':None},
'right':{'value':7,'left':{'value':5, ...}}},
{'value':19, 'left':{'value':23', ...}},
...
]

Example comparison function:

def compare(tree1, tree2):
c = cmp(tree1['value'], tree2['value'])
if c != 0: return c
c = cmp(tree1['left'], tree2['left'])
if c != 0: return c
return cmp(tree1['right'], tree2['right])

> Are the trees user defined classes?

Not in general. They might be nested tuples, lists, or dictionaries.
Or they could come from a non-extensible library-defined class, like
from cElementTree.

> From the sound of it, the trees are static during the sort and
> would get a nice O(n log n) --> O(n) speed-up if a key function
> were allowed to flatten them in a single pass.

But the key function has to do all those comparisons on the internal
nodes.

Paul Rubin

unread,
Oct 3, 2009, 3:28:36 AM10/3/09
to
Paul Rubin <http://phr...@NOSPAM.invalid> writes:
> Example comparison function:
>
> def compare(tree1, tree2):
> c = cmp(tree1['value'], tree2['value'])
> if c != 0: return c
> c = cmp(tree1['left'], tree2['left'])
> if c != 0: return c
> return cmp(tree1['right'], tree2['right])

Sorry, meant recursive comparison.

def compare(tree1, tree2):
c = cmp(tree1['value'], tree2['value'])
if c != 0: return c

c = compare(tree1['left'], tree2['left'])


if c != 0: return c

return compare(tree1['right'], tree2['right])

Paul Rubin

unread,
Oct 3, 2009, 5:39:36 AM10/3/09
to
Paul Rubin <http://phr...@NOSPAM.invalid> writes:
> c = compare(tree1['left'], tree2['left'])

Of course this recursive call crashes if either branch is None.
Oh well, I'll stop trying to correct it since I'm sure you get the idea.

kj

unread,
Oct 3, 2009, 11:54:44 AM10/3/09
to
In <7xtyyhi...@ruckus.brouhaha.com> Paul Rubin <http://phr...@NOSPAM.invalid> writes:

>Python 2.x provides two ways
>and you can use whichever one fits the application better. I have
>never understood why Python 3.x finds it necessary to break one of
>them. Maybe I can migrate to Haskell by the time Python 2.x becomes
>deprecated.

!!!

Maybe Haskell is much handier than I give it credit for, but it's
hard for me to imagine that it is as convenient as Python 3, even
without the cmp sort option... (And I agree with you that getting
rid of sort's cmp in Python 3 was a bad idea.)

What's going on here? Our lab recently hired a new postdoc who,
to our disbelief, works almost exclusively in OCaml. And I hear
all this talk about switching to Haskell or Scheme. I don't get
it. Despite the elegance of these languages, the libraries are
not there. It seems to me it would take forever to get the simplest
things done in these languages...

Confused.

kj

Paul Rubin

unread,
Oct 3, 2009, 4:53:32 PM10/3/09
to
kj <no.e...@please.post> writes:
> !!!
>
> Maybe Haskell is much handier than I give it credit for, but it's
> hard for me to imagine that it is as convenient as Python 3, even
> without the cmp sort option...

Heh, yeah, I was being a bit snarky/grouchy. Haskell has a very steep
learning curve and will never be as convenient as Python for banging
out some small script. It's worth considering for larger or more
serious programs.

> What's going on here? Our lab recently hired a new postdoc who,
> to our disbelief, works almost exclusively in OCaml. And I hear
> all this talk about switching to Haskell or Scheme. I don't get
> it. Despite the elegance of these languages, the libraries are
> not there. It seems to me it would take forever to get the simplest
> things done in these languages...

Haskell's library is growing very rapidly, more so than Python's I'd
say. Take a look at

http://donsbot.wordpress.com/2009/03/16/visualising-the-haskell-universe/

if you're willing to count Hackage (sort of the equivalent of the
Python cheese shop). The Haskell Platform (counterpart to Python
stdlib) is also very actively expanding.

Ocaml and Scheme both seem to me to be sort of stagnant. Scheme is an
elegant fossil. Some people find Ocaml to be at a sweet spot,
combining Python's convenience and the more important aspects of
Haskell's expressiveness. I haven't used it myself. It seems to me
that Haskell is attracting all the most advanced development
attention.

Raymond Hettinger

unread,
Oct 4, 2009, 10:01:42 PM10/4/09
to
[Paul Rubin]

> Example of list of trees (nested dicts). In practice you could get
> such a list from the simplejson module:
>
> list_of_trees = [{'value':1, 'left':{'value':3,'left':None,'right':None},
> 'right':{'value':7,'left':{'value':5, ...}}},
> {'value':19, 'left':{'value':23', ...}},
> ...
> ]

So, it looks like the relevant comparison values can be stored in
nested lists:

list_of_lists = \
[[1, [3, [], []],
[7, [5, [], []], []]],
[19, [23, [], []],
[]],
]

Here I've used a transformation where a tree is stored as:

[tree['value'], tree['left'], tree['right']]

and the None value indicating an empty tree transformed to:

[]

> > Example comparison function:
>
>    def compare(tree1, tree2):
>       c = cmp(tree1['value'], tree2['value'])
>       if c != 0: return c

>       c = compare(tree1['left'], tree2['left'])
>       if c != 0: return c
>       return compare(tree1['right'], tree2['right])

Hmm, that recursive comparison order looks like how Python
already recursively compares lists. So, perhaps the
lists can be compared directly:

if list_of_lists[0] < list_of_lists[1]:
...


>> Are the trees user defined classes?
>
> Not in general. They might be nested tuples, lists, or dictionaries.
> Or they could come from a non-extensible library-defined class, like
> from cElementTree

Are there any of these trees that cannot be transformed
(by a key function) into an equivalent nested list of lists
and values so that the trees can be directly compared to one another
using Python's normal built-in recursive comparison order for lists?


Raymond

Paul Rubin

unread,
Oct 5, 2009, 6:10:01 AM10/5/09
to
Raymond Hettinger <pyt...@rcn.com> writes:
> So, it looks like the relevant comparison values can be stored in
> nested lists:
>
> list_of_lists = \
> [[1, [3, [], []],
> [7, [5, [], []], []]],
> [19, [23, [], []],
> []],
> ]

Now you've copied all the data out of the original tree, which is even
worse than putting a class wrapper around it. The comparison remains
(asymptotically) as expensive as before, too.

> Are there any of these trees that cannot be transformed
> (by a key function) into an equivalent nested list of lists
> and values so that the trees can be directly compared to one another
> using Python's normal built-in recursive comparison order for lists?

I think so. Say you want to change the numeric comparisons so that
even numbers always sort before odd numbers, ie.
-4 < -2 < 0 < 2 < 4 < ... < -999 < ... -1 < 1 < 3 ...
I don't think there is a way to re-encode the numbers so they
can be compared with direct integer comparison. Even if there is,
I think there are reasonable orderings on the trees themselves that
can't possibly be embedded in the standard python comparison order.
I might be able to construct an example using ordinal diagrams from
proof theory.

Raymond Hettinger

unread,
Oct 5, 2009, 3:48:12 PM10/5/09
to

> > So, it looks like the relevant comparison values can be stored in
> > nested lists:
>
> > list_of_lists = \
> > [[1, [3, [], []],
> > [7, [5, [], []], []]],
> > [19, [23, [], []],
> > []],
> > ]
>
> Now you've copied all the data out of the original tree, which is even
> worse than putting a class wrapper around it. The comparison remains
> (asymptotically) as expensive as before, too.

FWIW, you could also just flatten it to: [(1,3,7,5), (19,23)].

The point is that the tree comparisons you presented have a
predetermined
order of values to compare, so they either be recast a flattened list
of comparison values or the tree itself can be cast in a list of lists
form.
Either way, the O(n log n) step of doing the actual comparisons runs
much
faster than if you coded a recursive cmp function written in pure
python.


> Say you want to change the numeric comparisons so that
> even numbers always sort before odd numbers, ie.
> -4 < -2 < 0 < 2 < 4 < ... < -999 < ... -1 < 1 < 3 ...

This is too easy:

>>> s = [-2, 4, 2, -1, -3, 1, -4, 0, 3]
>>> s.sort()
>>> s.sort(key=lambda x: x%2)
>>> s
[-4, -2, 0, 2, 4, -3, -1, 1, 3]

The use cases you've presented so far aren't convincing,
but I'm sure that if you keep making-up random, weird use cases,
there will be one where a cmp function is a better approach.


> I think there are reasonable orderings on the trees themselves that
> can't possibly be embedded in the standard python comparison order.
> I might be able to construct an example using ordinal diagrams from
> proof theory.

As long as the values of a tree get compared in a predetermined order,
there will always be a flattened list equivalent that works faster
using a key function. If you're going to concoct something isn't
easily transformable to a key function, I think your best bet is to
create a comparison where the trees have an interaction other than
comparing values at identical node positions; otherwise, the tree
shape hasn't made the problem any more difficult than sorting a list
of successive values. The concocted comparison function needs to
exploit some aspect of the tree shape than can't be precomputed
in a key function (perhaps involving some complex interaction between
the two compared items like a tree merge or somesuch).

Instead of trying to create a hard comparison from first principles,
there may already be some research on the subject in the SQL world.
It seems to me that SQL's ORDER BY clauses are essentially just
key functions, so maybe there are known cases where a query cannot
be sorted with just the tools provided by SQL.


Raymond

P.S. I accept than you hate sorting in Py3.x. There's no need to
convince me of that. I was just curious about your one real-world
use case and whether I could find a straight-forward key function
that would get the job done.

Paul Rubin

unread,
Oct 5, 2009, 8:27:19 PM10/5/09
to
Raymond Hettinger <pyt...@rcn.com> writes:
> FWIW, you could also just flatten it to: [(1,3,7,5), (19,23)].
>
> The point is that the tree comparisons you presented have a
> predetermined order of values to compare, so they either be recast a
> flattened list of comparison values or the tree itself can be cast
> in a list of lists form. Either way, the O(n log n) step of doing
> the actual comparisons runs much faster than if you coded a
> recursive cmp function written in pure python.

Possibly so, but asymptotically it's the same, and anyway

> > Say you want to change the numeric comparisons so that
> > even numbers always sort before odd numbers, ie.
> > -4 < -2 < 0 < 2 < 4 < ... < -999 < ... -1 < 1 < 3 ...
>
> This is too easy:
>
> >>> s = [-2, 4, 2, -1, -3, 1, -4, 0, 3]
> >>> s.sort()
> >>> s.sort(key=lambda x: x%2)
> >>> s
> [-4, -2, 0, 2, 4, -3, -1, 1, 3]

I don't think so:

>>> s = [0, 2, -2, 3, 1, -1, 4, -4, -3]
>>> s.sort(key=lambda x:x%2)
>>> s
[0, 2, -2, 4, -4, 3, 1, -1, -3]

s.sort(key=lambda x:(x%2,x)) might work but why are you so opposed
to being able to write one's natural thought pattern if that pattern
happens to be a comparison function? Who wants to concoct these
ad-hoc encodings for every ordering?

> As long as the values of a tree get compared in a predetermined order,
> there will always be a flattened list equivalent that works faster
> using a key function. If you're going to concoct something isn't
> easily transformable to a key function, I think your best bet is to
> create a comparison where the trees have an interaction other than
> comparing values at identical node positions;

How about this: gen_e, gen_pi, gen_sqrt2, etc. all generate
representations of various real numbers as possibly infinite sequences
of digits:

def gen_pi():
yield 3
yield 1
yield 4
# ... compute infinite stream one digit at a time

Comparison is obvious if (hmmm....) you can guarantee that you're
sorting unique elements and that the sort function won't compare an
element with itself (otherwise I guess you could cut off comparison at
a million digits or whatever):

def cmp(gen1, gen2):
for d,e in izip(gen1(), gen2()):
c = cmp(d,e)
if c: return c

I don't see any clean way to handle that with key=, though of course
maybe I'm missing something.

> P.S. I accept than you hate sorting in Py3.x. There's no need to
> convince me of that. I was just curious about your one real-world
> use case and whether I could find a straight-forward key function
> that would get the job done.

I don't hate sorting in py3.x. I understand the concept that py3.x
introduces incompatibilities to make things better. What I don't like
is the approach of breaking stuff gratutiously with no benefit. If
the positional arg for cmp is a kludge, why not switch it to a keyword
arg instead of eliminating it?

I don't subscribe to the cult of the real-world use case. I'm not
smart enough to anticipate every way anyone might ever want to use
code that I write. So I try to identify the obvious "basis vectors"
of sensible functionality, implement those, and make sure that the
entire space of combinations works the way it should, without worrying
about whether anyone will care about any specific combination. There
is no real-world use case for the constant assignment

a = 0x36b20b32c927eaf68bb40d87d251049db98da0c03a0e8c791f04ee486ec2f7ee

which I'm sure nobody has ever seen before (I just made it from
os.urandom). But if Python somehow failed to parse that number (and
only that number), that would be considered an unacceptable bug,
because a straightforward combination of primitives had failed to work
as it should.

Any book on sorting spends most of its time on comparison sorting,
which is the primitive operation. key= is very useful but it is
a derived operation, not the other way around.

Raymond Hettinger

unread,
Oct 5, 2009, 9:48:59 PM10/5/09
to
> > >  Say you want to change the numeric comparisons so that
> > > even numbers always sort before odd numbers, ie.
> > >     -4 < -2 < 0 < 2 < 4 < ... < -999 < ... -1 < 1 < 3 ...
>
> > This is too easy:
>
> >     >>> s = [-2, 4, 2, -1, -3, 1, -4, 0, 3]
> >     >>> s.sort()
> >     >>> s.sort(key=lambda x: x%2)
> >     >>> s
> >     [-4, -2, 0, 2, 4, -3, -1, 1, 3]
>
> I don't think so:
>
>     >>> s = [0, 2, -2, 3, 1, -1, 4, -4, -3]
>     >>> s.sort(key=lambda x:x%2)
>     >>> s
>     [0, 2, -2, 4, -4, 3, 1, -1, -3]

There were two sorts in my post and only one in yours.
That's why you didn't get the same answer.


> s.sort(key=lambda x:(x%2,x)) might work but why are you so opposed
> to being able to write one's natural thought pattern if that pattern
> happens to be a comparison function?  Who wants to concoct these
> ad-hoc encodings for every ordering?

Your (x%2, x) idea seems like a good way to do it in a single pass.
Also, it seems to be a natural translation of the problem statement,
"even numbers always sort before odd numbers", into a primary key
and secondary sort key.


> How about this: gen_e, gen_pi, gen_sqrt2, etc. all generate
> representations of various real numbers as possibly infinite sequences
> of digits:
>
>    def gen_pi():
>       yield 3
>       yield 1
>       yield 4
>       # ... compute infinite stream one digit at a time
>
> Comparison is obvious if (hmmm....) you can guarantee that you're
> sorting unique elements and that the sort function won't compare an
> element with itself (otherwise I guess you could cut off comparison at
> a million digits or whatever):
>
>    def cmp(gen1, gen2):
>      for d,e in izip(gen1(), gen2()):
>         c = cmp(d,e)
>         if c: return c
>
> I don't see any clean way to handle that with key=, though of course
> maybe I'm missing something.  

Right. I would be forced to use a Cmp2Key wrapper for this one.

If sorting lazily evaluated infinite sequences is a common use
case for you, then you're going to love Haskell ;-)


> I don't hate sorting in py3.x.

That's good to hear.


>  I understand the concept that py3.x
> introduces incompatibilities to make things better.  What I don't like
> is the approach of breaking stuff gratutiously with no benefit.  If
> the positional arg for cmp is a kludge, why not switch it to a keyword
> arg instead of eliminating it?

When Guido made the final call, I'm sure he was balancing
a number of goals including language simplification and
one-way-to-do-it. I'm also sure that sorting lazily
evaluated infinite sequences was not on his radar screen :-)


>  key= is very useful but it is
> a derived operation, not the other way around.

As you showed with infinite sequence example, it may well
be the case that cmp is more general and can more readily
handle an exotic case.

Psychologically, the thing that I find to be interesting is
that beginners and intermediate users seem to take to key
functions more readily than old timers. The key function seems
to be an easy thing to teach (perhaps because that's the
way Excel sorts and the way SQL sorts, or perhaps it is because
problem statements seem to arise in the form of "sort by this,
then by that" instead of "here's how two objects should be
compared").

In contrast, some people who have have had deep experience with
cmp functions may tend to first think of cmp solutions and then
have a harder time seeing solutions with key functions. If you
grew-up on C's qsort() like I did, then a key function may not
be the first thing that pops into your head.

One other interesting data point: Python uses key functions
in min(), max(), heapq.nsmallest(), heapq.nlargest, and
itertools.groupby(). Those functions never supported a
cmp function argument, nor has one ever been requested.


Raymond

Paul Rubin

unread,
Oct 5, 2009, 10:34:27 PM10/5/09
to
Raymond Hettinger <pyt...@rcn.com> writes:
> There were two sorts in my post and only one in yours.
> That's why you didn't get the same answer.

Whoops, missed that.

> When Guido made the final call, I'm sure he was balancing
> a number of goals including language simplification and
> one-way-to-do-it. I'm also sure that sorting lazily
> evaluated infinite sequences was not on his radar screen :-)

I can't help wondering if there will be feature requests for a cmp
argument for the rest of eternity until it eventually makes its way
back in.

> Psychologically, the thing that I find to be interesting is
> that beginners and intermediate users seem to take to key
> functions more readily than old timers.

I think key is preferable in at least 90% of the cases. I also think
a general purpose language should be able to handle 100% of the cases,
so if key is all you have, there can be a gap to fill.

> In contrast, some people who have have had deep experience with
> cmp functions may tend to first think of cmp solutions and then
> have a harder time seeing solutions with key functions. If you
> grew-up on C's qsort() like I did, then a key function may not
> be the first thing that pops into your head.

That could be.

> One other interesting data point: Python uses key functions
> in min(), max(), heapq.nsmallest(), heapq.nlargest, and
> itertools.groupby(). Those functions never supported a
> cmp function argument, nor has one ever been requested.

Interesting. Maybe someday...

Steven D'Aprano

unread,
Oct 5, 2009, 11:51:52 PM10/5/09
to
On Mon, 05 Oct 2009 19:34:27 -0700, Paul Rubin wrote:

> Raymond Hettinger <pyt...@rcn.com> writes:

>> When Guido made the final call, I'm sure he was balancing a number of
>> goals including language simplification and one-way-to-do-it. I'm also
>> sure that sorting lazily evaluated infinite sequences was not on his
>> radar screen :-)
>
> I can't help wondering if there will be feature requests for a cmp
> argument for the rest of eternity until it eventually makes its way back
> in.

When I learned that cmp was being dropped, I was horrified. That lasted
for about fifteen minutes of time actually using key :)

Occasionally I come across sorting problems where it isn't obvious how to
write the key function, but I've never come across one where it wasn't
possible at all. I no longer miss cmp.


>> Psychologically, the thing that I find to be interesting is that
>> beginners and intermediate users seem to take to key functions more
>> readily than old timers.
>
> I think key is preferable in at least 90% of the cases. I also think a
> general purpose language should be able to handle 100% of the cases, so
> if key is all you have, there can be a gap to fill.

I don't think it's possible to enumerate 100% of the cases, let alone
support them.

Here's one thing I've sometimes wanted to do: sort multiple lists at
once, according to the contents of one of them, with a single call. You
can get the same result with a key function and sorting each list
separately, but if you've got a lot of large lists, that becomes
expensive.

Python's sort is a memory-sort -- you can't sort a list too big to fit
into memory. Sorting 400GB of data using a single function call is not
something I see as fundamental to a general purpose language (although of
course you should be able to write a disk-sort utility using Python).

Nor does sort() support the case where the value of the comparisons
differs according to where the elements are in the list, e.g. given:

[a, b, c, d, a, b]

the result of comparing a and b at the start of the list is different
from comparing them at the end of the list. I don't think there's an
actual use-case for this feature, and cmp would have just as much trouble
solving this (pretend) problem as key does, but I don't think languages
need to supply this as a fundamental operation. If you need it, write
your own sort function :)

Which of course would be a solution to your problem. The source code for
timsort is available, and stable. You could fork it, re-enable the
support for comparison functions, and put it on the Cheeseshop. If there
is sufficient interest, maybe you'll get cmp support re-added to the
built-in in a version or two.

--
Steven

Bearophile

unread,
Oct 6, 2009, 8:36:47 AM10/6/09
to
Raymond Hettinger:

> Psychologically, the thing that I find to be interesting is
> that beginners and intermediate users seem to take to key
> functions more readily than old timers.  The key function seems
> to be an easy thing to teach (perhaps because that's the
> way Excel sorts and the way SQL sorts, or perhaps it is because
> problem statements seem to arise in the form of "sort by this,
> then by that" instead of "here's how two objects should be
> compared").
>
> In contrast, some people who have have had deep experience with
> cmp functions may tend to first think of cmp solutions and then
> have a harder time seeing solutions with key functions.  If you
> grew-up on C's qsort() like I did, then a key function may not
> be the first thing that pops into your head.

I love the 'key', it makes my code simpler and it's simpler to
understand. I am not opposed to the idea of keeping cmp, that in some
rare cases may be useful or more natural.

The problem is that if you allow to use the cmp, lot of programmers
will use it straight away, not even bothering to know what that
strange 'key' argument may be useful for. And they miss the how much
handy 'key' is.

I am having a very hard type (despite my implementation, explanations,
etc) explaining to D programmers why for a flexible general-purpose
function a key function argument is often better. They in the end have
added something like that in the std lib, but with a weird name like
SchwartzSort that's surely not the first standard sort they look
for... this is bad.

So a funny solution to this situation may be the following: to keep
only 'key' in the sort/sorted for few years, so the community of
Python programmers gets used to 'key' only, and then put 'cmp' back
in, for the special cases ;-)

Bye,
bearophile

Paul Rubin

unread,
Oct 6, 2009, 2:40:46 PM10/6/09
to
Bearophile <bearoph...@lycos.com> writes:
> The problem is that if you allow to use the cmp, lot of programmers
> will use it straight away, not even bothering to know what that
> strange 'key' argument may be useful for. And they miss the how much
> handy 'key' is.

Given how often we hear "consenting adults" as justification for any
number of gaps in Python error checking, the argument above is
singularly unpersuasive.

> I am having a very hard type (despite my implementation, explanations,
> etc) explaining to D programmers why for a flexible general-purpose
> function a key function argument is often better. They in the end have
> added something like that in the std lib, but with a weird name like
> SchwartzSort that's surely not the first standard sort they look
> for... this is bad.

"key" and the DSU (Schwartz) sorting pattern makes lots of sense in
scripting languages like Python that are primarily used on relatively
small data sets. The best reason for writing something in a lower
level language like D is because you want to push the limits of your
availiable machine resources. Therefore, the DSU strategy's extra
memory consumption will be what you want less of the time than it is
in Python.

Raymond Hettinger

unread,
Oct 6, 2009, 3:28:00 PM10/6/09
to
[bearophile]

> I love the 'key', it makes my code simpler and it's simpler to
> understand. I am not opposed to the idea of keeping cmp, that in some
> rare cases may be useful or more natural.
>
> The problem is that if you allow to use the cmp, lot of programmers
> will use it straight away, not even bothering to know what that
> strange 'key' argument may be useful for. And they miss the how much
> handy 'key' is.
...

> So a funny solution to this situation may be the following: to keep
> only 'key' in the sort/sorted for few years, so the community of
> Python programmers gets used to 'key' only, and then put 'cmp' back
> in, for the special cases ;-)

FWIW, sorting is only a small part of the picture. The notion of
three-way cmp functions was removed throughout the language. The
built-in cmp() function and the __cmp__ magic method and the
corresponding C slot were all excised. None of them played nicely
with rich comparisons. It was a case where having two-ways-to-do-it
was complicating everyone's lives. AFAICT, very few people understood
exactly how the two interacted -- the under-the-hood C code for it
was complex, unattractive, and hard to understand.


> I am having a very hard type (despite my implementation,
> explanations, etc) explaining to D programmers why for a
> flexible general-purpose function a key function argument
> is often better.

Last week, I learned a new term, "Code Prion", that referred to
a code technique such as cmp-functions which cause the proteins
of the brain to become contagiously misfolded so that the victim
1) always prefers that technique, 2) easily infects others, and
3) is resistant to sterilization (using other techniques) ;-)

Once Kernighan and Ritchie put C's qsort() into the food supply,
we were doomed.

FWIW, I think the standard library's unittest module is also a
code prion -- there's no other explanation for its thriving
despite competition from the vastly superior syntax of py.test
or nose.


Raymond

Paul Rubin

unread,
Oct 6, 2009, 3:33:42 PM10/6/09
to
Raymond Hettinger <pyt...@rcn.com> writes:
> Once Kernighan and Ritchie put C's qsort() into the food supply,
> we were doomed.

It was already too late. Knuth vol 3 came out in 1973(?) and its
sorting half is mostly about comparison sorting.

> FWIW, I think the standard library's unittest module is also a
> code prion -- there's no other explanation for its thriving
> despite competition from the vastly superior syntax of py.test
> or nose.

unittest is sort of a clone of JUnit if I understand correctly.
py.test and nose are obscure because they're not in the stdlib. I
have used unittest a little bit but I mostly use doctest these days.
I have the impression that py.test and/or nose are better in some ways
but I haven't bothered checking further.

Bearophile

unread,
Oct 6, 2009, 4:09:56 PM10/6/09
to
Paul Rubin:

> bearophile:


>> I am having a very hard type (despite my implementation, explanations,
>> etc) explaining to D programmers why for a flexible general-purpose
>> function a key function argument is often better. They in the end have
>> added something like that in the std lib, but with a weird name like
>> SchwartzSort that's surely not the first standard sort they look
>> for... this is bad.

> "key" and the DSU (Schwartz) sorting pattern makes lots of sense in
> scripting languages like Python that are primarily used on relatively
> small data sets. The best reason for writing something in a lower
> level language like D is because you want to push the limits of your
> availiable machine resources. Therefore, the DSU strategy's extra
> memory consumption will be what you want less of the time than it is
> in Python.

I think you are quite wrong.

In D dynamic arrays are built-in, they are used almost as Python lists
(but they are single-typed, unless you create an array of variants),
among other things they have a built-in sort. Such sort is used for
small situations too. Even in a language like D *very* often what you
need is to sort a small amount of data in a very flexible way. In such
situations what you need isn't to squeeze the last byte or CPU cycle
out of the PC, but to have a handy and short syntax, a very flexible
sorting, and something that's surely not bug-prone. In such situation
having a 'key' argument is *better*. Such sort can be stable. That's
why the main sort/sorted routines in my dlibs accept an optional key
callable, and they are very handy. I can show you examples.

On the other hand once in a while in a D program you may need to sort
a very large amount of data, or you may need something faster (and
unstable, like a refined introsort based on a 2-pivot QS), or even
more special than the things allowed by the built in sort. In such
situation you can import a special sorting function from the std lib
and use it, such sort can have something equivalent to a cmp (but
lower level, it can accept a function template alias, to inline the
comparison operation).

So the idea is: in a multi-level language you very often have to sort
a limited amount of data in complex ways. Because in most programs a
quite large percentage of the code isn't performance-critical. In such
large parts of the code what you want is something very handy, safe
and flexible. So having a key function is positive. In the few spots
where you need high performance, you can import (or even implement
with your hands) and use a specialized sorting routine. D is a general
purpose language, it's not used like C in Python projects to write
performance-critical spots only.

This is also visible in other parts of the language, for example there
are built-in associative arrays. Their syntax is handy, and even if
they aren't high performance (they are sometimes slower than Python
dicts despite being O(1), but they never have a O(n^2) performance
profile, as may happen to Python dicts, so they are safer) you can use
them in a very quick way, even if you have to create a 10-items
associative array. The end result is that in D programs you can find
more hashes than in C++ code, where I think programmers use them only
where simpler solutions (like iterating on an array) aren't good
enough. (So D AAs have to be fast even in situations where you put 8
items inside them, like in Python dicts). This free usage of AAs makes
the code stronger too (because once in a while you may put 1000 items
in such AA, and it will keeps working efficiently, while a linear scan
in a list of 1000 items starts to become not efficient).

Bye,
bearophile

Steven D'Aprano

unread,
Oct 6, 2009, 7:41:14 PM10/6/09
to
On Tue, 06 Oct 2009 12:28:00 -0700, Raymond Hettinger wrote:

> [bearophile]
>> I love the 'key', it makes my code simpler and it's simpler to
>> understand. I am not opposed to the idea of keeping cmp, that in some
>> rare cases may be useful or more natural.
>>
>> The problem is that if you allow to use the cmp, lot of programmers
>> will use it straight away, not even bothering to know what that strange
>> 'key' argument may be useful for. And they miss the how much handy
>> 'key' is.
> ...
>> So a funny solution to this situation may be the following: to keep
>> only 'key' in the sort/sorted for few years, so the community of Python
>> programmers gets used to 'key' only, and then put 'cmp' back in, for
>> the special cases ;-)
>
> FWIW, sorting is only a small part of the picture. The notion of
> three-way cmp functions was removed throughout the language. The
> built-in cmp() function and the __cmp__ magic method and the
> corresponding C slot were all excised. None of them played nicely with
> rich comparisons. It was a case where having two-ways-to-do-it was
> complicating everyone's lives. AFAICT, very few people understood
> exactly how the two interacted -- the under-the-hood C code for it was
> complex, unattractive, and hard to understand.

There's no reason why a hypothetical comparison function for sort() needs
to be a three-way comparison function. All you need to implement sorting
is a function to implement less-than -- I believe that sort() and min()
are already implemented in such a way as to only require the objects
implements __lt__.

Hypothetically, sort(), min() and heapq.nsmallest() could take an
optional less-than comparison function, and max() and heapq.nlargest()
could use a greater-than comparison function.

Of course you can do this now with a cost of O(N) and a helper class:

class SortHelper:
def __init__(self, x):
self.x = x
def __lt__(self, other):
return self.x < other.x + 42 # or whatever...

# Decorate, Sort, Undecorate.
mylist = map(SortHelper, mylist)
mylist.sort()
mylist = [obj.x for obj in mylist]


If your list is huge, you can do the decoration in place:

for i, x in enumerate(mylist):
mylist[i] = SortHelper(x)
mylist.sort()
for i, obj in enumerate(mylist):
mylist[i] = obj.x

but since this is going to especially useful for really big lists, paying
that extra O(N) cost twice will hurt.

--
Steven

Paul Rubin

unread,
Oct 7, 2009, 2:20:13 AM10/7/09
to
Bearophile <bearoph...@lycos.com> writes:
> sorting, and something that's surely not bug-prone. In such situation
> having a 'key' argument is *better*. Such sort can be stable.

Nothing stops comparison sorting from being stable. Since the rest of
your post seems premised on the opposite, I hope that clears things up
for you.

Steven D'Aprano

unread,
Oct 7, 2009, 3:38:58 AM10/7/09
to

I'm sure Paul already knows this, but key-based sorts are comparison
sorts.

There are two basic types of sorts: comparison based, where the routine
has to compare items (usually with the < operator), and non-comparison
sorts, like bucket sort, pigeon-hole sort and radix sort. These sorts
require special knowledge of the items being sorted, and don't need to
compare two items. General purpose sorts like Python's sort() do,
regardless of whether you pass a key function, a three-way comparison
function, or something else.


--
Steven

Bearophile

unread,
Oct 7, 2009, 7:31:14 AM10/7/09
to
Paul Rubin:
> Bearophile:

> > sorting, and something that's surely not bug-prone. In such situation
> > having a 'key' argument is *better*. Such sort can be stable.
>
> Nothing stops comparison sorting from being stable.  Since the rest of
> your post seems premised on the opposite, I hope that clears things up
> for you.

When I have written that post I was partially unfocused, I am sorry.

What I meant is that a general sorting routine, even in D, is better
to be first of all flexible. So I think it's better for the D built-in
sort to be stable, because such extra invariant allows you to use the
sort in more situations.

Bye,
bearophile

Hrvoje Niksic

unread,
Oct 7, 2009, 8:35:34 AM10/7/09
to
Bearophile <bearoph...@lycos.com> writes:

> What I meant is that a general sorting routine, even in D, is better
> to be first of all flexible. So I think it's better for the D built-in
> sort to be stable, because such extra invariant allows you to use the
> sort in more situations.

Note that stable sort has additional memory requirements. In situations
where you don't need stability, but do need memory-efficient in-place
sorting, an unstable sort might well be preferred. This is why
libraries such as C++'s STL offer both.

Bearophile

unread,
Oct 7, 2009, 10:41:58 AM10/7/09
to
Hrvoje Niksic:

> Note that stable sort has additional memory requirements.  In situations
> where you don't need stability, but do need memory-efficient in-place
> sorting, an unstable sort might well be preferred.  This is why
> libraries such as C++'s STL offer both.

There are stable sorts that need no additional memory.

In a largish program written in a general-purpose multi-level
language, like D (this is probably true for C++ too), often 80% of the
code takes a very little time to run. Such code may need to process
few small arrays, or small data sets, or to manage the GUI, etc.

So optimizing those parts of the code for performance (both in memory
used and CPU cycles used) is stupid. What you want in such large parts
of the code is:
- to write code quickly
- to have code that's short, readable, easy to fix, and most important
of all that is the less bug-prone as possible.

So what you need in such large parts of the code is something that's
very flexible and safe, even if it's not top performance (both in
memory and CPU).

This is why for example in D there are built-in associative arrays,
that have a handy syntax, built-in methods, and they are never O(n^2),
even if they are not the most efficient ones where you need max
performance, or where you need minimal memory used, or where you need
to perform unusual operations. Such built-ins are useful to reduce
both code length and bug count, because they are quite safe and easy
to use.

Stable sorts are a little safer, because they don't scramble your data
in certain ways, so they can be used in more situations. This is why
having the Timsort in Python is good.

When you run profile the D code and you see your program uses too much
RAM or wastes too much time in the built-in sort, then you can switch
to using special sorts from the std lib. You can even write your own
hash or sort for special situations (and you don't have to drop down
to use another language for that, you keep using the same, even if you
may want to change your programming stile, and use a more C-like
style, with no dynamic allocations, some bit twidding, pointers, more
structs, 16 bit integers, unsigned integers, unions, compiler
intrinsics, even inlined assembly that uses SSE3, etc). In normal code
you may want to adopt a more Java-like programming style, or
templates, or even using a large library I have written, you may
program it almost as a curious Python-like, with lazyness too.

This is why I think the built-in sort has to be flexible, because you
are supposed to use it most of the times, and most of the times you
don't need top performance. Different parts of a program have to be
optimized for different things, computer performance, or programmer
performance.

Bye,
bearophile

Luis Zarrabeitia

unread,
Oct 7, 2009, 11:30:09 AM10/7/09
to pytho...@python.org
On Tuesday 06 October 2009 02:40:46 pm Paul Rubin wrote:
> > The problem is that if you allow to use the cmp, lot of programmers
> > will use it straight away, not even bothering to know what that
> > strange 'key' argument may be useful for. And they miss the how much
> > handy 'key' is.
>
> Given how often we hear "consenting adults" as justification for any
> number of gaps in Python error checking, the argument above is
> singularly unpersuasive.

Well, as long as you consider them "gaps" in need of a "justification", of
course that argument (and the one about the "gaps" themselves) will of course
seem singularly unpersuasive.

But if you see them as a "feature" (that may sometimes, albeit rarely,
missfire), then you would have no problem with /either/ argument.

--
Luis Zarrabeitia (aka Kyrie)
Fac. de Matemática y Computación, UH.
http://profesores.matcom.uh.cu/~kyrie

Raymond Hettinger

unread,
Oct 7, 2009, 1:15:15 PM10/7/09
to
[Hrvoje Niksic]

> Note that stable sort has additional memory requirements.  In situations
> where you don't need stability, but do need memory-efficient in-place
> sorting, an unstable sort might well be preferred.  This is why
> libraries such as C++'s STL offer both.

FWIW, the "additional memory requirements" are typically a set of
pointers to the objects being sorted, so the memory overhead is
typically very small relative to the size of the objects being sorted.
IOW, this isn't much of a consideration in most Python apps.


Raymond

0 new messages