Guido rethinking removal of cmp from sort method

Steven D'Aprano

unread,

Mar 13, 2011, 12:59:55 PM3/13/11

to

The removal of cmp from the sort method of lists is probably the most
disliked change in Python 3. On the python-dev mailing list at the
moment, Guido is considering whether or not it was a mistake.

If anyone has any use-cases for sorting with a comparison function that
either can't be written using a key function, or that perform really
badly when done so, this would be a good time to speak up.

--
Steven

Dave Abrahams

unread,

Mar 13, 2011, 8:35:16 PM3/13/11

to pytho...@python.org

Steven D'Aprano <steve+comp.lang.python <at> pearwood.info> writes:

> If anyone has any use-cases for sorting with a comparison function that
> either can't be written using a key function, or that perform really
> badly when done so, this would be a good time to speak up.

I think it's probably provable that there are no cases in the first category,
provided you're willing to do something sufficiently contorted. However,
it also seems self-evident to me that many programmers will rightly chafe
at the idea of creating and tearing down a bunch of objects just to
compare things for sorting. Think of the heap churn! Even if it turns out
that Python 3 contains some magic implementation detail that makes it
efficient most of the time, it goes against a natural understanding of the
computation model

2p for y'all.
-Dave

Stefan Behnel

unread,

Mar 14, 2011, 7:38:46 AM3/14/11

to pytho...@python.org

Steven D'Aprano, 13.03.2011 13:59:

As Raymond Hettinger and Daniel Stutzbach pointed out in that thread, the
memory overhead is much lower in Python 3.2 and also depends on the usage.
If memory is a problem, it can still be traded for time by sorting in
multiple (stable) passes with smaller keys. It rarely is a problem in
practice, though, and Guido's initial post seems to suggest that as well.

Stefan

Jean-Michel Pichavant

unread,

Mar 14, 2011, 11:10:27 AM3/14/11

to Steven D'Aprano, pytho...@python.org

You seem concerned by this removal, do you have any use-case ?

JM

Steven D'Aprano

unread,

Mar 14, 2011, 1:15:02 PM3/14/11

to

You seem concerned by my concern. Why do you think I am concerned?

(1) I'm not concerned, but many people are. If you search the archives of
this newsgroup (mailing list), you'll see that I have defended the
removal of cmp from sort, e.g. this post:

http://www.mail-archive.com/python-list%40python.org/msg261728.html

(2) If I had a good use-case for keeping cmp, I wouldn't need to ask
others if they had a good use-case.

As it is, Guido himself has mentioned one such good use for a comparison
function when sorting. Use of a key function trades off memory for time,
while sorting with a comparison function makes the opposite trade off,
using more time for the sake of saving memory.

--
Steven

Jean-Michel Pichavant

unread,

Mar 14, 2011, 1:37:56 PM3/14/11

to Steven D'Aprano, pytho...@python.org

Just to know if currently the use-case count is zero. Nothing evil
hidden behind my question :p

JM

Paul Rubin

unread,

Mar 14, 2011, 7:39:35 PM3/14/11

to

Steven D'Aprano <steve+comp....@pearwood.info> writes:
> If anyone has any use-cases for sorting with a comparison function that
> either can't be written using a key function, or that perform really
> badly when done so, this would be a good time to speak up.

We've had this discussion a couple times before. I remember an example
involving comparing tree structures, that I thought couldn't be done
with key= in a reasonable way, and that this was a reasonable real-world
example. Raymond H then gave a way of encoding it with key= which
looked ok at the time, but I decided sometime later that I think we both
missed an error in that encoding, so I've been wanting to dig it out
again and look more closely.

key= can of course burn a lot more memory because of the DSU pattern
(say you are sorting file handles according to the contents of the file,
so with key= you might have to read N multi-megabyte files where with
cmp you just pairwise-compare them until they differ, which is probably
in the first few bytes).

Finally I concocted an "infinite" example which we agreed is artificial:
you are given a list of generators denoting real numbers, for example
pi generates the infinite sequence 3,1,4,1,5,9... while e generates
2,7,1,8,... You can sort these with cmp but not with key.

Nobody

unread,

Mar 14, 2011, 11:37:06 PM3/14/11

to

On Mon, 14 Mar 2011 12:39:35 -0700, Paul Rubin wrote:

> Finally I concocted an "infinite" example which we agreed is artificial:
> you are given a list of generators denoting real numbers, for example
> pi generates the infinite sequence 3,1,4,1,5,9... while e generates
> 2,7,1,8,... You can sort these with cmp but not with key.

Is there some restriction on the return type of the key= function? If not,
you can define a class with appropriate comparison methods and have key=
return instances of that class.

Steven D'Aprano

unread,

Mar 15, 2011, 2:14:28 AM3/15/11

to

You mean like this?

http://docs.python.org/dev/library/functools.html#functools.cmp_to_key

--
Steven

Paul Rubin

unread,

Mar 15, 2011, 4:59:54 AM3/15/11

to

Nobody <nob...@nowhere.com> writes:
>> 2,7,1,8,... You can sort these with cmp but not with key.
>
> Is there some restriction on the return type of the key= function? If not,
> you can define a class with appropriate comparison methods and have key=
> return instances of that class.

Sorry, yeah, of course you can do it that way, but it's bloody ugly and
you're no longer gaining the supposed benefit of the DSU pattern.

Ian Kelly

unread,

Mar 15, 2011, 6:52:44 AM3/15/11

to pytho...@python.org

On Mon, Mar 14, 2011 at 1:39 PM, Paul Rubin <no.e...@nospam.invalid> wrote:
> Finally I concocted an "infinite" example which we agreed is artificial:
> you are given a list of generators denoting real numbers, for example
> pi generates the infinite sequence 3,1,4,1,5,9... while e generates
> 2,7,1,8,... You can sort these with cmp but not with key.

I would think that you can sort them with key as long as none of the
sequences are equal (which would result in an infinite loop using
either method). Why not this?

_MISSING = object()

@functools.total_ordering
class IteratorKey(object):

def __init__(self, iterable):
self._iterator = iter(iterable)
self._cache = []

def __iter__(self):
# This is not reentrant, but it's
# good enough for this purpose.
for x in self._cache:
yield x
for x in self._iterator:
self._cache.append(x)
yield x

def __lt__(self, other):
for x, y in itertools.izip_longest(self, other, fillvalue=_MISSING):
if x is _MISSING or y is _MISSING:
return x is _MISSING
elif x < y or y < x:
return x < y
return False

def __eq__(self, other):
for x, y in itertools.izip_longest(self, other, fillvalue=_MISSING):
if x is _MISSING or y is _MISSING or x != y:
return False
return True

Paul Rubin

unread,

Mar 15, 2011, 8:00:44 AM3/15/11

to

Ian Kelly <ian.g...@gmail.com> writes:
> I would think that you can sort them with key as long as none of the
> sequences are equal (which would result in an infinite loop using
> either method). Why not this?

Yes you can do something like that, but look how ugly it is compared
with using cmp.

Stefan Behnel

unread,

Mar 15, 2011, 9:01:42 AM3/15/11

to pytho...@python.org

Paul Rubin, 15.03.2011 09:00:

I would argue that it's a rare use case, though. In most cases, memory
doesn't really matter, and the overhead of a decorated sort is acceptable.
In some cases, where space matters, an external sort is a better choice or
even the only thing that will work, so all that's left are the cases where
speed really needs to be traded for space *and* an external sort is not
applicable, and those where things can be expressed more easily using cmp
than using a key. For the latter, there's support in the standard library,
as Steven pointed out. Now, the question that remains is: are the few
remaining cases worth being supported directly, thus complicating the
internal source code of the Python runtime?

Stefan

Paul Rubin

unread,

Mar 15, 2011, 9:08:04 AM3/15/11

to

Stefan Behnel <stef...@behnel.de> writes:
> the question that remains is: are the few remaining cases worth being
> supported directly, thus complicating the internal source code of the
> Python runtime?

The other cases were supported perfectly well already, and the support
was removed. That's just ludicrous in my opinion.

Ian Kelly

unread,

Mar 15, 2011, 5:00:56 PM3/15/11

to pytho...@python.org

Actually, how would you do it with cmp? You would need to tee off the
iterators in some way so that each comparison gets the same data as
the last one, which sounds like a major hassle.

Or were you talking about passing in generator functions rather than
generator objects and reinitializing them for each comparison? If
that's the case then the key version can be a fair amount simpler
since you wouldn't need to bother with caching.

Antoon Pardon

unread,

Mar 23, 2011, 1:53:42 PM3/23/11

to pytho...@python.org

How about a list of tuples where you want them sorted first item in ascending
order en second item in descending order.

--
Antoon Pardon

Stefan Behnel

unread,

Mar 23, 2011, 1:59:09 PM3/23/11

to pytho...@python.org

Antoon Pardon, 23.03.2011 14:53:

> On Sun, Mar 13, 2011 at 12:59:55PM +0000, Steven D'Aprano wrote:

> How about a list of tuples where you want them sorted first item in ascending
> order en second item in descending order.

You can use a stable sort in two steps for that.

Stefan

Antoon Pardon

unread,

Mar 23, 2011, 3:14:26 PM3/23/11

to pytho...@python.org

On Wed, Mar 23, 2011 at 02:59:09PM +0100, Stefan Behnel wrote:
> Antoon Pardon, 23.03.2011 14:53:
> >On Sun, Mar 13, 2011 at 12:59:55PM +0000, Steven D'Aprano wrote:

> >How about a list of tuples where you want them sorted first item in ascending
> >order en second item in descending order.
>
> You can use a stable sort in two steps for that.

Which isn't helpfull if where you decide how they have to be sorted is
not the place where they are actually sorted.

I have a class that is a priority queue. Elements are added at random but
are removed highest priority first. The priority queue can have a key or
a cmp function for deciding which item is the highest priority. It
can also take a list as an initializor, which will then be sorted.

So this list is sorted within the class but how it is sorted is decided
outside the class. So I can't do the sort in multiple steps.

--
Antoon Pardon

Ian Kelly

unread,

Mar 23, 2011, 4:40:11 PM3/23/11

to Antoon Pardon, pytho...@python.org

On Wed, Mar 23, 2011 at 9:14 AM, Antoon Pardon
<Antoon...@rece.vub.ac.be> wrote:
> Which isn't helpfull if where you decide how they have to be sorted is
> not the place where they are actually sorted.
>
> I have a class that is a priority queue. Elements are added at random but
> are removed highest priority first. The priority queue can have a key or
> a cmp function for deciding which item is the highest priority. It
> can also take a list as an initializor, which will then be sorted.
>
> So this list is sorted within the class but how it is sorted is decided
> outside the class. So I can't do the sort in multiple steps.

You can't do this?

for (key, reversed) in self.get_multiple_sort_keys():
self.pq.sort(key=key, reversed=reversed)

Stefan Behnel

unread,

Mar 23, 2011, 4:51:07 PM3/23/11

to pytho...@python.org

Antoon Pardon, 23.03.2011 16:14:

> On Wed, Mar 23, 2011 at 02:59:09PM +0100, Stefan Behnel wrote:
>> Antoon Pardon, 23.03.2011 14:53:
>>> On Sun, Mar 13, 2011 at 12:59:55PM +0000, Steven D'Aprano wrote:

>>> How about a list of tuples where you want them sorted first item in ascending
>>> order en second item in descending order.
>>
>> You can use a stable sort in two steps for that.
>

> Which isn't helpfull if where you decide how they have to be sorted is
> not the place where they are actually sorted.
>
> I have a class that is a priority queue. Elements are added at random but
> are removed highest priority first. The priority queue can have a key or
> a cmp function for deciding which item is the highest priority. It
> can also take a list as an initializor, which will then be sorted.
>
> So this list is sorted within the class but how it is sorted is decided
> outside the class. So I can't do the sort in multiple steps.

That sounds more like a use case for heap sort than for Python's builtin
list sort. See the heapq module.

Stefan

Carl Banks

unread,

Mar 23, 2011, 5:23:55 PM3/23/11

to

How about this one: you have are given an obscure string collating
function implented in a C library you don't have the source to.

Or how about this: I'm sitting at an interactive session and I have a
convenient cmp function but no convenient key, and I care more about
the four minutes it'd take to whip up a clever key function or an
adapter class than the 0.2 seconds I'd save to on sorting time.

Removing cmp from sort was a mistake; it's the most straightforward
and natural way to sort in many cases. Reason enough for me to keep
it.

Carl Banks

Stefan Behnel

unread,

Mar 23, 2011, 5:51:06 PM3/23/11

to pytho...@python.org

Carl Banks, 23.03.2011 18:23:

> On Mar 23, 6:59 am, Stefan Behnel wrote:
>> Antoon Pardon, 23.03.2011 14:53:
>>
>>> On Sun, Mar 13, 2011 at 12:59:55PM +0000, Steven D'Aprano wrote:
>>>> The removal of cmp from the sort method of lists is probably the most
>>>> disliked change in Python 3. On the python-dev mailing list at the
>>>> moment, Guido is considering whether or not it was a mistake.
>>
>>>> If anyone has any use-cases for sorting with a comparison function that
>>>> either can't be written using a key function, or that perform really
>>>> badly when done so, this would be a good time to speak up.
>>
>>> How about a list of tuples where you want them sorted first item in ascending
>>> order en second item in descending order.
>>
>> You can use a stable sort in two steps for that.
>
> How about this one: you have are given an obscure string collating
> function implented in a C library you don't have the source to.
>
> Or how about this: I'm sitting at an interactive session and I have a
> convenient cmp function but no convenient key, and I care more about
> the four minutes it'd take to whip up a clever key function or an
> adapter class than the 0.2 seconds I'd save to on sorting time.

As usual with Python, it's just an import away:

http://docs.python.org/library/functools.html#functools.cmp_to_key

I think this is a rare enough use case to merit an import rather than being
a language feature.

Stefan

Carl Banks

unread,

Mar 23, 2011, 6:47:26 PM3/23/11

to

The original question posted here was, "Is there a use case for cmp?"
There is, and your excuse-making doesn't change the fact. It's the
most natural way to sort sometimes; that's a use case. We already
knew it could be worked around.

It's kind of ridiculous to claim that cmp adds much complexity (it's
maybe ten lines of extra C code), so the only reason not to include it
is that it's much slower than using key. Not including it for that
reason would be akin to the special-casing of sum to prevent strings
from being concatenated, although omitting cmp would not be as drastic
since it's not a special case.

Do we omit something that's useful but potentially slow? I say no.

Carl Banks

Paul Rubin

unread,

Mar 23, 2011, 8:38:53 PM3/23/11

to

Carl Banks <pavlove...@gmail.com> writes:
> It's kind of ridiculous to claim that cmp adds much complexity (it's
> maybe ten lines of extra C code), so the only reason not to include it
> is that it's much slower than using key.

Well, I thought it was also to get rid of 3-way cmp in general, in favor
of rich comparison.

Carl Banks

unread,

Mar 24, 2011, 1:40:44 AM3/24/11

to

On Mar 23, 1:38 pm, Paul Rubin <no.em...@nospam.invalid> wrote:

Supporting both __cmp__ and rich comparison methods of a class does
add a lot of complexity. The cmp argument of sort doesn't.

The cmp argument doesn't depend in any way on an object's __cmp__
method, so getting rid of __cmp__ wasn't any good readon to also get
rid of the cmp argument; their only relationship is that they're
spelled the same. Nor is there any reason why cmp being a useful
argument of sort should indicate that __cmp__ should be retained in
classes.

Carl Banks

Message has been deleted

Paul Rubin

unread,

Mar 24, 2011, 7:40:03 AM3/24/11

to

Dennis Lee Bieber <wlf...@ix.netcom.com> writes:
> The first half of the problem description -- "Elements are added at
> random" seems more suited to an in-place insertion sort method.

This is precisely what a priority queue is for. Insertions take
O(log n) time and there's very little space overhead in heapq's
list-based implementation.

Antoon Pardon

unread,

Mar 24, 2011, 9:23:23 AM3/24/11

to pytho...@python.org

On Wed, Mar 23, 2011 at 10:40:11AM -0600, Ian Kelly wrote:
> On Wed, Mar 23, 2011 at 9:14 AM, Antoon Pardon
> <Antoon...@rece.vub.ac.be> wrote:

> > Which isn't helpfull if where you decide how they have to be sorted is
> > not the place where they are actually sorted.
> >
> > I have a class that is a priority queue. Elements are added at random but
> > are removed highest priority first. The priority queue can have a key or
> > a cmp function for deciding which item is the highest priority. It
> > can also take a list as an initializor, which will then be sorted.
> >
> > So this list is sorted within the class but how it is sorted is decided
> > outside the class. So I can't do the sort in multiple steps.
>

> You can't do this?
>
> for (key, reversed) in self.get_multiple_sort_keys():
> self.pq.sort(key=key, reversed=reversed)

Sure I can do that. I can do lots of things like writing a CMP class
that I will use as a key and where I can implement the logic for
comparing the original objects, which I otherwise would have put in a
cmp function. I thought this wasn't about how one can get by without
the cmp argument. This was about cases where the cmp argument was the
more beautiful or natural way to handle this case.

I think I have provided such a case. If you don't agree then
don't just give examples of what I can do, argue how your solution
would be a less ugly or more natural way to handle this.

Antoon Pardon

unread,

Mar 24, 2011, 9:33:18 AM3/24/11

to pytho...@python.org

On Wed, Mar 23, 2011 at 05:51:07PM +0100, Stefan Behnel wrote:
> >>
> >>You can use a stable sort in two steps for that.
> >

> >Which isn't helpfull if where you decide how they have to be sorted is
> >not the place where they are actually sorted.
> >
> >I have a class that is a priority queue. Elements are added at random but
> >are removed highest priority first. The priority queue can have a key or
> >a cmp function for deciding which item is the highest priority. It
> >can also take a list as an initializor, which will then be sorted.
> >
> >So this list is sorted within the class but how it is sorted is decided
> >outside the class. So I can't do the sort in multiple steps.
>

> That sounds more like a use case for heap sort than for Python's
> builtin list sort. See the heapq module.

No the heapq module is not usefull. The heapq functions don't have a
cmp, or a key argument. So you can't use it with priorities that differ
from the normal order of the items.

For the rest is solving this particular problem beside the point. It
was just an illustration of how, sorting a list can be done in a place
differently from where one decides the order-relation of the items in
the list. That fact makes it less obvious to use multiple steps in the
place where you actually sort.

Mel

unread,

Mar 24, 2011, 2:41:54 PM3/24/11

to

Carl Banks wrote:

I would have thought that the upper limit of cost of supporting cmp= and
key= would be two different internal front-ends to the internal
internal sort.

Mel.

Ian Kelly

unread,

Mar 24, 2011, 3:06:44 PM3/24/11

to Antoon Pardon, pytho...@python.org

On Thu, Mar 24, 2011 at 3:23 AM, Antoon Pardon
<Antoon...@rece.vub.ac.be> wrote:
> Sure I can do that. I can do lots of things like writing a CMP class
> that I will use as a key and where I can implement the logic for
> comparing the original objects, which I otherwise would have put in a
> cmp function. I thought this wasn't about how one can get by without
> the cmp argument. This was about cases where the cmp argument was the
> more beautiful or natural way to handle this case.

That's not what you wrote before. You wrote "I can't do the sort in
multiple steps." I was just responding to what you wrote.

> I think I have provided such a case. If you don't agree then
> don't just give examples of what I can do, argue how your solution
> would be a less ugly or more natural way to handle this.

Well, the alternative is to generate the cmp function from the
externally selected keys dynamically at runtime, is it not? Something
like this:

def get_cmp(keys):
def cmp(a, b):
for (key, reversed) in keys:
if reversed:
result = cmp(key(b), key(a))
else:
result = cmp(key(a), key(b))
if result != 0:
return result
return result
return cmp

I fail to see how that is any more natural than my solution, unless
you have another way of doing it. And it's certainly not going to be
any faster.

Antoon Pardon

unread,

Mar 24, 2011, 4:47:05 PM3/24/11

to pytho...@python.org

On Thu, Mar 24, 2011 at 09:06:44AM -0600, Ian Kelly wrote:
> On Thu, Mar 24, 2011 at 3:23 AM, Antoon Pardon
> <Antoon...@rece.vub.ac.be> wrote:
> > Sure I can do that. I can do lots of things like writing a CMP class
> > that I will use as a key and where I can implement the logic for
> > comparing the original objects, which I otherwise would have put in a
> > cmp function. I thought this wasn't about how one can get by without
> > the cmp argument. This was about cases where the cmp argument was the
> > more beautiful or natural way to handle this case.
>
> That's not what you wrote before. You wrote "I can't do the sort in
> multiple steps." I was just responding to what you wrote.

That is because I tend to assume some intelligence with those I
communicate with, so that I don't need to be pedantly clear and
spell everything out.

However since that seems to be a problem for you I will be more
detailed. The original poster didn't ask for cases in which cmp
was necessary, he asked for cases in which not using cmp was
cumbersome. I gave a case where not using cmp was cumbersome.
a tuple where you want it sorted with first item in descending
order and second item ascending.

You then responded, that you could solve that by sorting in multiple
steps. The problem with this response is that how items are to be
compared can be decided in a different module from where they are
actually sorted. So if we would accept this MO, this would mean
that any module that needed to sort something according to a provided
key, would need to provide for the possibility that the sorting would
have to be done in multiple steps. So, in order to avoid complexity
in the internal python sort, we would increase the complexity of
all library code that would need to sort things in a by the user
provided way and didn't want to barf in such an event.

So when I wrote I couldn't do that, I implicitly meant if you
want a less cumbersome solution than using cmp, because your
solution would make all library code more cumbersome.

> > I think I have provided such a case. If you don't agree then
> > don't just give examples of what I can do, argue how your solution
> > would be a less ugly or more natural way to handle this.
>
> Well, the alternative is to generate the cmp function from the
> externally selected keys dynamically at runtime, is it not? Something
> like this:
>
> def get_cmp(keys):
> def cmp(a, b):
> for (key, reversed) in keys:
> if reversed:
> result = cmp(key(b), key(a))
> else:
> result = cmp(key(a), key(b))
> if result != 0:
> return result
> return result
> return cmp
>
> I fail to see how that is any more natural than my solution, unless
> you have another way of doing it. And it's certainly not going to be
> any faster.

First of all, there is no need for a dynamical generated cmp function in
my provided case. My cmp fumction would simply look like this:

def cmp(tp1, tp2):
return cmp(tp2[0], tp1[0]) or cmp(tp1[1], tp2[1])

Second, there is not only the question of what is more natural, but
there is also the question of how you are spreading the complexity.
Your first solution would mean spreading the complexity to all library
code where the user could provide the order-relation of the items that
would then be sorted within the library. This alternative of your, even
if too complicated than needed would mean that libary code could just
use the sort method and the complicated code is limited to those
specific items that need it.

--
Antoon Pardon

Ian Kelly

unread,

Mar 24, 2011, 5:31:19 PM3/24/11

to Antoon Pardon, pytho...@python.org

On Thu, Mar 24, 2011 at 10:47 AM, Antoon Pardon
<Antoon...@rece.vub.ac.be> wrote:
>> That's not what you wrote before. You wrote "I can't do the sort in
>> multiple steps." I was just responding to what you wrote.
>
> That is because I tend to assume some intelligence with those I
> communicate with, so that I don't need to be pedantly clear and
> spell everything out.
>
> However since that seems to be a problem for you I will be more
> detailed. The original poster didn't ask for cases in which cmp
> was necessary, he asked for cases in which not using cmp was
> cumbersome.

False. He asked for either sort of case:

> If anyone has any use-cases for sorting with a comparison function that
> either can't be written using a key function, or that perform really
> badly when done so, this would be a good time to speak up.

> You then responded, that you could solve that by sorting in multiple
> steps.

No, I did not.

> The problem with this response is that how items are to be
> compared can be decided in a different module from where they are
> actually sorted. So if we would accept this MO, this would mean
> that any module that needed to sort something according to a provided
> key, would need to provide for the possibility that the sorting would
> have to be done in multiple steps. So, in order to avoid complexity
> in the internal python sort, we would increase the complexity of
> all library code that would need to sort things in a by the user
> provided way and didn't want to barf in such an event.

That is a perfectly reasonable argument that you apparently were too
lazy to even mention before. It is completely different from and not
at all implied by the patently absurd statement "So this list is

sorted within the class but how it is sorted is decided outside the

class. So I can't do the sort in multiple steps."

I would not say "The sky is green" when what I mean is "The color of
the sky is the result of the atmosphere scattering the white light
from the sun in the smaller wavelengths, including some green light."
Would you?

I misunderstood your use case, then. When you wrote that "how it is
sorted is decided outside of the class" in the context of sorting
based on multiple keys, I took that to imply that the external code
was deciding the sorting at runtime, e.g. that perhaps your priority
queue was being displayed in a GUI table that can be sorted on
multiple columns simultaneously. But evidently you expect that this
key-based sort order will always be hard-coded?

> Second, there is not only the question of what is more natural, but
> there is also the question of how you are spreading the complexity.
> Your first solution would mean spreading the complexity to all library
> code where the user could provide the order-relation of the items that
> would then be sorted within the library. This alternative of your, even
> if too complicated than needed would mean that libary code could just
> use the sort method and the complicated code is limited to those
> specific items that need it.

On the other hand, if at some point your library really does need to
accept a dynamically generated sort order, then you are pushing the
complexity onto the client code by forcing it to generate a cmp
function like the one I posted above. But perhaps that will never be
the case for your purpose, in which case I take your point.

Steven D'Aprano

unread,

Mar 24, 2011, 11:49:53 PM3/24/11

to

On Thu, 24 Mar 2011 17:47:05 +0100, Antoon Pardon wrote:

> However since that seems to be a problem for you I will be more
> detailed. The original poster didn't ask for cases in which cmp was
> necessary, he asked for cases in which not using cmp was cumbersome.

I'm the original poster, and that's not what I said. I said:

"If anyone has any use-cases for sorting with a comparison function that
either can't be written using a key function, or that perform really
badly when done so, this would be a good time to speak up."

You'll notice that I said nothing about whether writing the code was easy
or cumbersome, and nothing about readability.

> I
> gave a case where not using cmp was cumbersome. a tuple where you want
> it sorted with first item in descending order and second item ascending.

No, I'm sorry, calling sort twice is not cumbersome. In fact, if somebody
gave me code that sorted tuples in that way using a comparison function,
I would immediately rip it out and replace it with two calls to sort: not
only is it usually faster and more efficient, but it's easier to read,
easier to reason about, and easier to write.

from operator import itemgetter
data.sort(key=itemgetter(1))
data.sort(key=itemgetter(0), reverse=True)

A cmp function for this task may have been justified back in the Dark
Ages of Python 2.2, before Python's sort was guaranteed to be stable, but
there's no need for it now.

> You then responded, that you could solve that by sorting in multiple
> steps. The problem with this response is that how items are to be
> compared can be decided in a different module from where they are
> actually sorted.

So what? Instead of this:

# Module A decides how to sort, module B actually does the sort.
# In module A:
B.do_sorting(cmp=some_comparison_function)

it will become this:

# In module A:
func = functools.cmp_to_key(some_comparison_function)
B.do_sorting(key=func)

That's hardly cumbersome.

I'm looking for cases where using cmp_to_key doesn't work at all, or
performs badly, if such cases exist at all.

> First of all, there is no need for a dynamical generated cmp function in
> my provided case. My cmp fumction would simply look like this:
>
> def cmp(tp1, tp2):
> return cmp(tp2[0], tp1[0]) or cmp(tp1[1], tp2[1])

"Simply"?

Perhaps that works, I don't know -- but I do know that I can't tell
whether it works by just looking at it. That makes it complex enough that
it should have a test suite, to ensure it works the way you expect.

But regardless, nobody argues that a comparison function must always be
complicated. Or even that a comparison function is never less complicated
than a key function. That's not the point. The point is to find an
example where a comparison function will work where no key function is
either possible or practical.

--
Steven

Martin v. Loewis

unread,

Mar 25, 2011, 12:37:15 AM3/25/11

to Carl Banks

> The cmp argument doesn't depend in any way on an object's __cmp__
> method, so getting rid of __cmp__ wasn't any good readon to also get
> rid of the cmp argument

So what do you think about the cmp() builtin? Should have stayed,
or was it ok to remove it?

If it should have stayed: how should it's implementation have looked like?

If it was ok to remove it: how are people supposed to fill out the cmp=
argument in cases where they use the cmp() builtin in 2.x?

Regards,
Martin

MRAB

unread,

Mar 25, 2011, 12:39:53 AM3/25/11

to pytho...@python.org

On 24/03/2011 23:49, Steven D'Aprano wrote:
> On Thu, 24 Mar 2011 17:47:05 +0100, Antoon Pardon wrote:
>
>> However since that seems to be a problem for you I will be more
>> detailed. The original poster didn't ask for cases in which cmp was
>> necessary, he asked for cases in which not using cmp was cumbersome.
>
> I'm the original poster, and that's not what I said. I said:
>
> "If anyone has any use-cases for sorting with a comparison function that
> either can't be written using a key function, or that perform really
> badly when done so, this would be a good time to speak up."
>
> You'll notice that I said nothing about whether writing the code was easy
> or cumbersome, and nothing about readability.
>
>
>> I
>> gave a case where not using cmp was cumbersome. a tuple where you want
>> it sorted with first item in descending order and second item ascending.
>
> No, I'm sorry, calling sort twice is not cumbersome. In fact, if somebody
> gave me code that sorted tuples in that way using a comparison function,
> I would immediately rip it out and replace it with two calls to sort: not
> only is it usually faster and more efficient, but it's easier to read,
> easier to reason about, and easier to write.
>
>
> from operator import itemgetter
> data.sort(key=itemgetter(1))
> data.sort(key=itemgetter(0), reverse=True)
>

If there are two, it isn't too bad, but if there are several (some
ascending, some descending), then it gets a little cumbersome.

>
> A cmp function for this task may have been justified back in the Dark
> Ages of Python 2.2, before Python's sort was guaranteed to be stable, but
> there's no need for it now.
>

[snip]
It's one of those things I wouldn't have voted to remove, but now it's
gone, I'm unsure how much I'd like it back! I wouldn't object to it
returning...

Carl Banks

unread,

Mar 25, 2011, 1:32:11 AM3/25/11

to

On Mar 24, 5:37 pm, "Martin v. Loewis" <mar...@v.loewis.de> wrote:
> > The cmp argument doesn't depend in any way on an object's __cmp__
> > method, so getting rid of __cmp__ wasn't any good readon to also get
> > rid of the cmp argument
>
> So what do you think about the cmp() builtin? Should have stayed,
> or was it ok to remove it?

Since it's trivial to implement by hand, there's no point for it to be
a builtin. There wasn't any point before rich comparisons, either.
I'd vote not merely ok to remove, but probably a slight improvement.
It's probably the least justified builtin other than pow.

> If it should have stayed: how should it's implementation have looked like?

Here is how cmp is documented: "The return value is negative if x < y,
zero if x == y and strictly positive if x > y."

So if it were returned as a built-in, the above documentation suggests
the following implementation:

def cmp(x,y):
if x < y: return -1
if x == y: return 0
if x > y: return 1
raise ValueError('arguments to cmp are not well-ordered')

(Another, maybe better, option would be to implement it so as to have
the same expectations as list.sort, which I believe only requires
__eq__ and __gt__.)

> If it was ok to remove it: how are people supposed to fill out the cmp=
> argument in cases where they use the cmp() builtin in 2.x?

Since it's trivial to implement, they can just write their own cmp
function, and as an added bonus they can work around any peculiarities
with an incomplete comparison set.

Carl Banks

Steven D'Aprano

unread,

Mar 25, 2011, 5:46:52 AM3/25/11

to

On Thu, 24 Mar 2011 18:32:11 -0700, Carl Banks wrote:

> It's probably the least justified builtin other than pow.

I don't know about that. Correctly, efficiently and *quickly*
implementing the three-argument version of pow is exactly the sort of
thing that should be in the built-ins, or at least the standard library.

[steve@sylar ~]$ python3 -m timeit -s "n=2" "(n**30000)%123456789"
1000 loops, best of 3: 633 usec per loop
[steve@sylar obfuscate]$ python3 -m timeit -s "n=2" "pow(n, 30000,
123456789)"
100000 loops, best of 3: 6.03 usec per loop

That's two orders of magnitude faster.

(You need the n=2 to defeat Python's compile-time constant folding,
otherwise timing measurements will be misleading.)

--
Steven

Paul Rubin

unread,

Mar 25, 2011, 6:01:05 AM3/25/11

to

Steven D'Aprano <steve+comp....@pearwood.info> writes:
> I don't know about that. Correctly, efficiently and *quickly*
> implementing the three-argument version of pow is exactly the sort of
> thing that should be in the built-ins, or at least the standard library.

There is also math.pow which works slightly differently from built-in
pow, sigh.

Stefan Behnel

unread,

Mar 25, 2011, 6:11:16 AM3/25/11

to pytho...@python.org

Steven D'Aprano, 25.03.2011 06:46:

> On Thu, 24 Mar 2011 18:32:11 -0700, Carl Banks wrote:
>
>> It's probably the least justified builtin other than pow.
>
> I don't know about that. Correctly, efficiently and *quickly*
> implementing the three-argument version of pow is exactly the sort of
> thing that should be in the built-ins, or at least the standard library.

I think that touches it already. We have a "math" module, so why is there a
*builtin* for doing special math? How much code is there really that only
uses pow() and does not import "math"?

Stefan

Antoon Pardon

unread,

Mar 25, 2011, 9:21:35 AM3/25/11

to pytho...@python.org

On Thu, Mar 24, 2011 at 11:49:53PM +0000, Steven D'Aprano wrote:
> On Thu, 24 Mar 2011 17:47:05 +0100, Antoon Pardon wrote:
>
> > However since that seems to be a problem for you I will be more
> > detailed. The original poster didn't ask for cases in which cmp was
> > necessary, he asked for cases in which not using cmp was cumbersome.
>
> I'm the original poster, and that's not what I said. I said:
>
> "If anyone has any use-cases for sorting with a comparison function that
> either can't be written using a key function, or that perform really
> badly when done so, this would be a good time to speak up."
>
> You'll notice that I said nothing about whether writing the code was easy
> or cumbersome, and nothing about readability.

Well fine. I should have realised the question was just a pretense and that
there really never was any intention to consider the reactions, because
the answer is already fixed. Of course a key function can always be written,
it may just need a specific class to implement the specific order. Likewise
there is no reason to expect the order-functions to preform worse when
implemented in a class, rather than in a function.

--
Antoon Pardon

Stefan Behnel

unread,

Mar 25, 2011, 10:05:17 AM3/25/11

to pytho...@python.org

Antoon Pardon, 25.03.2011 10:21:

> On Thu, Mar 24, 2011 at 11:49:53PM +0000, Steven D'Aprano wrote:

>> On Thu, 24 Mar 2011 17:47:05 +0100, Antoon Pardon wrote:
>>
>>> However since that seems to be a problem for you I will be more
>>> detailed. The original poster didn't ask for cases in which cmp was
>>> necessary, he asked for cases in which not using cmp was cumbersome.
>>
>> I'm the original poster, and that's not what I said. I said:
>>
>> "If anyone has any use-cases for sorting with a comparison function that
>> either can't be written using a key function, or that perform really
>> badly when done so, this would be a good time to speak up."
>>
>> You'll notice that I said nothing about whether writing the code was easy
>> or cumbersome, and nothing about readability.
>

> Well fine. I should have realised the question was just a pretense and that
> there really never was any intention to consider the reactions, because
> the answer is already fixed.

I don't think it is. It's just not as simple as "see, I have an artificial
use case here, so you *must* put the feature back in the language".

Stefan

Antoon Pardon

unread,

Mar 25, 2011, 12:56:23 PM3/25/11

to pytho...@python.org

On Fri, Mar 25, 2011 at 11:05:17AM +0100, Stefan Behnel wrote:
> Antoon Pardon, 25.03.2011 10:21:
> >On Thu, Mar 24, 2011 at 11:49:53PM +0000, Steven D'Aprano wrote:

> >>On Thu, 24 Mar 2011 17:47:05 +0100, Antoon Pardon wrote:
> >>
> >>>However since that seems to be a problem for you I will be more
> >>>detailed. The original poster didn't ask for cases in which cmp was
> >>>necessary, he asked for cases in which not using cmp was cumbersome.
> >>
> >>I'm the original poster, and that's not what I said. I said:
> >>
> >>"If anyone has any use-cases for sorting with a comparison function that
> >>either can't be written using a key function, or that perform really
> >>badly when done so, this would be a good time to speak up."
> >>
> >>You'll notice that I said nothing about whether writing the code was easy
> >>or cumbersome, and nothing about readability.
> >

> >Well fine. I should have realised the question was just a pretense and that
> >there really never was any intention to consider the reactions, because
> >the answer is already fixed.
>
> I don't think it is. It's just not as simple as "see, I have an
> artificial use case here, so you *must* put the feature back in the
> language".

Look we are provided with the cmp_to_key function. If something doesn't work
with that function or performs realy bad with that function, it means either
the function has a bug in it or something the function relies on, has a bug
in it. In either case you just fix the bug.

My guess is that cmp_to_key is implemented similarly as my example below.
So we simply have a constant overhead before we call the user provided
cmp function on the items to be sorted. So if this would fail to sort the
items or started to perform really badly (more than can be expected
by the overhead illustrated below) it would indicate an internal python
problem, either in the implementation of cmp_to_key or elsewhere.

Well if python has an internal problem, my expectation is that they will
fix the problem, instead of providing a cmp argument to sort as a work around.
So the outcome was fixed from the beginning.

def cmp_to_key(func):

class cmpkey:
def __init__(self, arg):
self.data = arg
self.cmp = func

def __lt__(self, other):
return self.cmp(self.data, other.data) < 0

def __le__(self, other):
return self.cmp(self.data, other.data) <= 0

...

return cmpkey

Westley Martínez

unread,

Mar 25, 2011, 1:39:05 PM3/25/11

to pytho...@python.org

On Fri, 2011-03-25 at 07:11 +0100, Stefan Behnel wrote:
> Steven D'Aprano, 25.03.2011 06:46:

> > On Thu, 24 Mar 2011 18:32:11 -0700, Carl Banks wrote:
> >
> >> It's probably the least justified builtin other than pow.
> >
> > I don't know about that. Correctly, efficiently and *quickly*
> > implementing the three-argument version of pow is exactly the sort of
> > thing that should be in the built-ins, or at least the standard library.
>

> I think that touches it already. We have a "math" module, so why is there a
> *builtin* for doing special math? How much code is there really that only
> uses pow() and does not import "math"?
>
> Stefan
>

pow() and math.pow() are different.

Stefan Behnel

unread,

Mar 25, 2011, 2:00:42 PM3/25/11

to pytho...@python.org

Westley Martínez, 25.03.2011 14:39:

> On Fri, 2011-03-25 at 07:11 +0100, Stefan Behnel wrote:
>> Steven D'Aprano, 25.03.2011 06:46:

>>> On Thu, 24 Mar 2011 18:32:11 -0700, Carl Banks wrote:
>>>
>>>> It's probably the least justified builtin other than pow.
>>>
>>> I don't know about that. Correctly, efficiently and *quickly*
>>> implementing the three-argument version of pow is exactly the sort of
>>> thing that should be in the built-ins, or at least the standard library.
>>

>> I think that touches it already. We have a "math" module, so why is there a
>> *builtin* for doing special math? How much code is there really that only
>> uses pow() and does not import "math"?
>

> pow() and math.pow() are different.

I don't find that a good excuse for pow() being a builtin. Why not just
rename it to "math.powmod" instead?

Stefan

Steven D'Aprano

unread,

Mar 25, 2011, 10:06:59 PM3/25/11

to

The reason Guido is considering re-introducing cmp is that somebody at
Google approached him with a use-case where a key-based sort did not
work. The use-case was that the user had masses of data, too much data
for the added overhead of Decorate-Sort-Undecorate (which is what key
does), but didn't care if it took a day or two to sort.

So there is at least one use-case for preferring slowly sorting with a
comparison function over key-based sorting. I asked if there any others.
It seems not.

--
Steven

Steven D'Aprano

unread,

Mar 25, 2011, 10:40:03 PM3/25/11

to

On Fri, 25 Mar 2011 13:56:23 +0100, Antoon Pardon wrote:

> Look we are provided with the cmp_to_key function. If something doesn't
> work with that function or performs realy bad with that function, it
> means either the function has a bug in it or something the function
> relies on, has a bug in it. In either case you just fix the bug.

No, it means that the trade-offs made by cmp_to_key, or by sorting with a
key function, do not interact well with what you are trying to do.

An analogy: Python lists are contiguous arrays of pointers. When the
array is full, Python automatically doubles the size of the list. This is
a good trade-off of space (memory) for time, because it means that list
access is O(1), searching is O(N), and appending is O(1) on average. At
worst, you waste 50% of the memory used, which is pretty good.

But if you're programming for a device with 8 K of memory (if such a
thing even exists these days!), this would be a *terrible* strategy.
(Python wouldn't even fit in 8K, but never mind that...) In this case,
memory is more precious than programmer convenience, or even execution
speed. It's not enough to hand-wave and say "Python lists have a bug that
needs fixing", because that's simply not the case. It's that the design
of Python lists doesn't suit the use-case, and if you try it, it will
perform badly or not at all: by design, Python lists assume memory will
be plentiful and time will be short.

That is exactly the same trade-off key-based sorting makes: it uses more
memory to speed up sorting. This is a good strategy for Python, because
Python already requires lots of memory (so no major loss there), and is a
rather slow language (so speed-ups help).

So the question is, being completely general, as there any *real-world*
(not artificial, made-up) uses where sorting with a comparison function
works but a key function doesn't? I'm not limiting you to cases where you
have a shortage of memory but lots of time available. There may be other
design choices where key-based sorting sucks. We already know that some
people just prefer the look and feel of writing and using cmp functions.
Is there anything else?

--
Steven

Carl Banks

unread,

Mar 25, 2011, 11:04:02 PM3/25/11

to

On Mar 25, 3:06 pm, Steven D'Aprano <steve

+comp.lang.pyt...@pearwood.info> wrote:
> The reason Guido is considering re-introducing cmp is that somebody at
> Google approached him with a use-case where a key-based sort did not
> work. The use-case was that the user had masses of data, too much data
> for the added overhead of Decorate-Sort-Undecorate (which is what key
> does), but didn't care if it took a day or two to sort.
>
> So there is at least one use-case for preferring slowly sorting with a
> comparison function over key-based sorting. I asked if there any others.
> It seems not.

1. You asked for a specific kind of use case. Antoon gave you a use
case, you told him that wasn't the kind of use case you were asking
for, then you turn around and say "I guess there are no use
cases" (without the mentioning qualification).

2. I posted two use cases in this thread that fit your criteria, and
you followed up to that subthread so you most likely read them. Here
they are again so you won't overlook them this time:

"You have are given an obscure string collating function implented in
a C library you don't have the source to." (Fits your criterion
"can't be done with key=".)

"I'm sitting at an interactive session and I have a
convenient cmp function but no convenient key, and I care more about
the four minutes it'd take to whip up a clever key function or an
adapter class than the 0.2 seconds I'd save to on sorting

time." (Fits your criterion "performs really badly when done so".)

3. You evidently also overlooked the use-case example posted on Python-
dev that you followed up to.

Call me crazy, but you seem to be overlooking a lot of things in your
zeal to prove your point.

Carl Banks

OKB (not okblacke)

unread,

Mar 26, 2011, 8:14:28 AM3/26/11

to

Steven D'Aprano wrote:

That is one use case for preferring cmp. Another use case is "I
can easily think of how to write the cmp function and not the key
function and I don't care about either speed or memory usage". There's
more to practicality than code execution speed.

--
--OKB (not okblacke)
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is
no path, and leave a trail."
--author unknown

Steven D'Aprano

unread,

Mar 26, 2011, 8:49:42 AM3/26/11

to

On Fri, 25 Mar 2011 16:04:02 -0700, Carl Banks wrote:
> On Mar 25, 3:06 pm, Steven D'Aprano <steve
> +comp.lang.pyt...@pearwood.info> wrote:
> > The reason Guido is considering re-introducing cmp is that somebody at
> > Google approached him with a use-case where a key-based sort did not
> > work. The use-case was that the user had masses of data, too much data
> > for the added overhead of Decorate-Sort-Undecorate (which is what key
> > does), but didn't care if it took a day or two to sort.
> >
> > So there is at least one use-case for preferring slowly sorting with a
> > comparison function over key-based sorting. I asked if there any
> > others. It seems not.

> 1. You asked for a specific kind of use case. Antoon gave you a use
> case, you told him that wasn't the kind of use case you were asking for,
> then you turn around and say "I guess there are no use cases" (without
> the mentioning qualification).

That is astonishing. Did you read my post before going on the attack? I
know that there is a grand tradition of adversarial argument on Usenet,
but I am not your enemy here.

First you wrongly accuse me of stating that there are no use cases,
without qualification, when the very text you quote has me giving a use-
case. Nor did I say there are no use cases -- I said that I didn't think
there were any *others*. You even quote me saying this!

Then you go on to wrongly claim:

> 3. You evidently also overlooked the use-case example posted on Python-
> dev that you followed up to.

No, I did not overlook it. I explicitly described it twice. Once in the
very text you quoted above, and once early in the thread when I wrote:

"Guido himself has mentioned one such good use for a comparison
function when sorting. Use of a key function trades off memory for time,
while sorting with a comparison function makes the opposite trade off,
using more time for the sake of saving memory."

> 2. I posted two use cases in this thread that fit your criteria, and you
> followed up to that subthread so you most likely read them. Here they
> are again so you won't overlook them this time:
>
> "You have are given an obscure string collating function implented in a
> C library you don't have the source to." (Fits your criterion "can't be
> done with key=".)

I must admit I don't understand this use-case. What is a string collating
function, and what does this have to do with sorting? I could try to
guess from the plain English meaning of the words, but if I get it wrong,
that wouldn't be fair to you. Perhaps an example would help.

> "I'm sitting at an interactive session and I have a convenient cmp
> function but no convenient key, and I care more about the four minutes
> it'd take to whip up a clever key function or an adapter class than the
> 0.2 seconds I'd save to on sorting time." (Fits your criterion
> "performs really badly when done so".)

"I can't be bothered to write a key function" is not equivalent to "I
have a key function, but it is too slow to use".

In any case, from Python 3.2 onwards, a key function is never more than
one import and one function call away. (In 3.1, you're stuck writing your
own adapter, but in 3.1 sort doesn't take a cmp function anyway, so
either way you're stuck. The earliest it could be reintroduced, if at
all, is in 3.3.)

> Call me crazy, but you seem to be overlooking a lot of things in your
> zeal to prove your point.

And what point would that be?

Carl, anybody could have asked the question I have asked. Anyone is free
to summarize the best of the arguments -- or all of them -- and post them
to python-dev. If you think I have some ulterior motive in doing this,
please feel free to champion the cmp argument to python-dev instead.

--
Steven

Mark Dickinson

unread,

Mar 26, 2011, 11:39:50 AM3/26/11

to

+1 (modulo bikeshedding about the name). I've long thought that three-
argument pow belongs in the standard library rather than the core.
And then the existence of the ** operator obviates the need for two-
argument pow.

Worse than pow itself is the fact that the __pow__ special method
takes an optional 3rd argument, which apart from unnecessarily
complicating Python's internal, also leads to horrors like Decimal's
(IMO inappropriate) support for 3-argument pow.

--
Mark

Terry Reedy

unread,

Mar 26, 2011, 5:53:17 PM3/26/11

to pytho...@python.org

On 3/26/2011 4:49 AM, Steven D'Aprano wrote:
> On Fri, 25 Mar 2011 16:04:02 -0700, Carl Banks wrote:

>> Call me crazy,

I would call you a bit ungrateful. Steven could have stayed silent
instead of making himself a target by informing interested people on
this list about the pydev post by Guido.

>> but you seem to be overlooking a lot of things in your
>> zeal to prove your point.

The only point I have seen him 'push' it that it will take a good
argument to move Guido (and other developers) from 'possibly' to "I (we)
made a mistake. Let's revert."

Don't shoot the messenger.

> And what point would that be?

That you like Python and want to see it improved even further, as do I ;-).

--
Terry Jan Reedy

Raymond Hettinger

unread,

Mar 27, 2011, 12:55:07 AM3/27/11

to

On Mar 26, 4:39 am, Mark Dickinson <dicki...@gmail.com> wrote:
> On Mar 25, 2:00 pm, Stefan Behnel <stefan...@behnel.de> wrote:
>
>
>
>
>
>
>
>
>
> > Westley Martínez, 25.03.2011 14:39:
>
> > > On Fri, 2011-03-25 at 07:11 +0100, Stefan Behnel wrote:
> > >> Steven D'Aprano, 25.03.2011 06:46:
> > >>> On Thu, 24 Mar 2011 18:32:11 -0700, Carl Banks wrote:
>
> > >>>> It's probably the least justified builtin other than pow.
>
> > >>> I don't know about that. Correctly, efficiently and *quickly*
> > >>> implementing the three-argument version of pow is exactly the sort of
> > >>> thing that should be in the built-ins, or at least the standard library.
>
> > >> I think that touches it already. We have a "math" module, so why is there a
> > >> *builtin* for doing special math? How much code is there really that only
> > >> uses pow() and does not import "math"?
>
> > > pow() and math.pow() are different.
>
> > I don't find that a good excuse for pow() being a builtin. Why not just
> > rename it to "math.powmod" instead?
>
> +1 (modulo bikeshedding about the name). I've long thought that three-
> argument pow belongs in the standard library rather than the core.
> And then the existence of the ** operator obviates the need for two-
> argument pow.

IMO, there is far too much willingness to churn long-standing APIs for
nearly zero benefit. In my decade as a core Python developer, I've
come view deprecations as things that cause unnecessary pain for users
and is partially to blame for so many organizations staying with
older, unmaintained versions of Python.

When it comes to moving pow() to another module or changing the
signature for the __pow__ magic method, there's not much of a win to
be had, but it will affect tons of existing code (including examples
in printed books where pow() is commonly used in examples).

I think we (as a group) need to develop a strong aversion to breaking
existing APIs and instead focus on making existing APIs more robust or
giving them useful extensions.

The Python language is not like a personal library where changing APIs
has limited (and predictable costs).

Raymond

Antoon Pardon

unread,

Mar 28, 2011, 9:06:20 AM3/28/11

to pytho...@python.org

On Fri, Mar 25, 2011 at 10:40:03PM +0000, Steven D'Aprano wrote:
> On Fri, 25 Mar 2011 13:56:23 +0100, Antoon Pardon wrote:
>
> > Look we are provided with the cmp_to_key function. If something doesn't
> > work with that function or performs realy bad with that function, it
> > means either the function has a bug in it or something the function
> > relies on, has a bug in it. In either case you just fix the bug.
>
> No, it means that the trade-offs made by cmp_to_key, or by sorting with a
> key function, do not interact well with what you are trying to do.

So and if you find something like that, what stops the dev team from
saying that python is just not suited to deal with such a problem?
Specifically as your analogy points in that direction as your dillema has
a strategy that is good for python on the one hand and a problem not
suitable for python on the other hand.

> [analgy of python lists, that double in size when full vs
> memory management on a 8K device ]

> So the question is, being completely general, as there any *real-world*
> (not artificial, made-up) uses where sorting with a comparison function
> works but a key function doesn't? I'm not limiting you to cases where you
> have a shortage of memory but lots of time available. There may be other
> design choices where key-based sorting sucks. We already know that some
> people just prefer the look and feel of writing and using cmp functions.
> Is there anything else?

Asking for *real-world* uses is just how the python community smothers
requests. Should someone come with a use case, I expect the following
to happenr: One will go over the case with magnifying glasses in
search for any possible way in which it could be rewritten so as not
to need the cmp anyway. This will probably not be a big problem because
the person with the use case will probably have given a simplified example
to explain the overal picture. However most people responding will
ingnore the overal picture and focus on details that may very well work
on the simplified example but may not in the specific real-world case.
But since going over all details to argue the case will be too time
consuming, in the end the person with the use case will simply fail to
convince.

Forcing people to use a key-function, will produce cases where python
will ask for more memory and take longer to sort, than allowing to provide
a cmp function. This for the simple reason that for some order-functions,
the only way to write a key-function will be to write a specific class,
that will implement the order with the same kind of code that otherwise
would have been put in the cmp function, making the trade off: memory use
vs speed no longer a trade off.

If the existance alone of such cases isn't enough to reverse the decision
about the cmp-argument of the sort-method. My guess is that nothing will.

--
Antoon Pardon

Steven D'Aprano

unread,

Mar 28, 2011, 11:35:09 AM3/28/11

to

On Mon, 28 Mar 2011 11:06:20 +0200, Antoon Pardon wrote:

> On Fri, Mar 25, 2011 at 10:40:03PM +0000, Steven D'Aprano wrote:
>> On Fri, 25 Mar 2011 13:56:23 +0100, Antoon Pardon wrote:
>>
>> > Look we are provided with the cmp_to_key function. If something
>> > doesn't work with that function or performs realy bad with that
>> > function, it means either the function has a bug in it or something
>> > the function relies on, has a bug in it. In either case you just fix
>> > the bug.
>>
>> No, it means that the trade-offs made by cmp_to_key, or by sorting with
>> a key function, do not interact well with what you are trying to do.
>
> So and if you find something like that, what stops the dev team from
> saying that python is just not suited to deal with such a problem?

There are lots of problems that Python is not well-suited for.

It is not well-suited for writing the driver to a graphics card.

It is not well-suited to calculating pi to a trillion decimal places.

It is not well-suited for application development on a device with 512K
of RAM.

It is not well-suited for copying C idioms exactly, or Java, or Perl.

It is not well-suited for doing a lot of bit-twiddling very quickly.

That's why there is more than one programming language: because no one
language can be well-suited for everything.

[...]

> Asking for *real-world* uses is just how the python community smothers
> requests.

99% of requests should be smothered. That is a *good thing*. Insisting on
real-world uses prevents the developers from wasting their valuable time
bloating Python with thousands of badly desired anti-features that are
slow, unreliable, buggy, useless or unpopular.

[...]

> Forcing people to use a key-function, will produce cases where python
> will ask for more memory and take longer to sort, than allowing to
> provide a cmp function.

More memory yes; take longer to sort, almost certainly not (except,
perhaps, for a very narrow window where the addition of a key causes the
application to page out to virtual memory, but a comparison function
wouldn't -- and even then, the key function might still be faster).

But we've already covered the "more memory" angle to death. We already
understand that. This is why I have asked if there are any *other* use-
cases for a comparison function.

Dan Stromberg has suggested that another use-case would be lazy-
comparisons, where you might not be able to generate a single key
cheaply, but you can perform a comparison lazily.

> This for the simple reason that for some
> order-functions, the only way to write a key-function will be to write a
> specific class, that will implement the order with the same kind of code
> that otherwise would have been put in the cmp function,

There is a recipe for that on ActiveState's website, google for
cmp_to_key. That recipe has been added to Python's standard library for
version 3.2. Please stop flogging that dead horse.

> making the trade off: memory use vs speed no longer a trade off.

You want to use a comparison function: you can still do this. Instead of
calling

data.sort(cmp=myfunc)

you call

data.sort(key=functools.cmp_to_key(myfunc))

which is a little longer to type, and uses a little more memory, but
otherwise is no worse than using cmp.

> If the existance alone of such cases isn't enough to reverse the
> decision about the cmp-argument of the sort-method. My guess is that
> nothing will.

Existence of which cases? Made-up imaginary cases that don't actually
happen? Silly cases that can easily be converted to key comparisons, and
run much faster if you do so? Trivial cases that can trivially be
converted to key functions? "I can't be bothered to write a key
function", to paraphrase one of the posters in this thread -- these are
hardly convincing.

So far I have one actual use-case: Guido's example of sorting a huge list
where the amount of time it takes doesn't matter; plus one hypothetical
but plausible use-case: sorting data where the comparison is expensive
but lazy. (And to give Carl Banks the benefit of the doubt, one that I
don't understand, about interfacing with "an obscure C library".)

--
Steven

Antoon Pardon

unread,

Mar 28, 2011, 12:39:04 PM3/28/11

to pytho...@python.org

The question is: Why do you need a use case for this? Should the decision
to remove the cmp-argument, rest only on the fact whether or not current
users are processing such masses of data without considering the possibility
of future users processing such masses of data?

Isn't it obvious that if you force the users into using more memory, that
sooner or later, there will be a user for which this demand of extra memory
will be a problem?

The question is also, whether this really is a use case. Why don't you
advise this person to buy a computer with more memory? One could simply
see this as a problem of too little resources instead of a problem for
python.

IMO, the dev-team can look at this in two different ways. (1) If the
machine has enough resource to contain enormous lists of data, then
it should be possible to sort these data with only little extra memory.
or (2) Sorting will take extra memory proportional to the length of the
list you will try to sort. If your computer doesn't have this extra
memory, your computer just lacks the resource to sort this large
data with python.

It is this design decision that should guide whether or not to have
a cmp-argument. Not the fact whether or not there are now users with
a data set that still fits into a python program on their computer
but for which this data set is too large for python to sort on that
computer.

> So there is at least one use-case for preferring slowly sorting with a
> comparison function over key-based sorting. I asked if there any others.
> It seems not.

How about the possibility of cases where it isn't a choice between slowly
sorting with a comparison function over a faster key-based sorting but
where the choice is over sorting with a comparison function over a slower
and more memory consuming key-based sorting?

I tried to sort lists of 10000 elemens. Each element a tuple two items that
was to be sorted first according to the first item in ascending order, then
according to the second item in descending order. This was done on python 2.6
so I had to write my own cmp_to_key function.

I then sorted these lists of 10000 elements a 1000 times with the following
methods.

1) a cmp-function
2) key = cmp_to_key
3) key = wrapper class around cmp-function
4) key = class specifically written to implement the ordering.

This was done, once with number tuples and and once with string tuples.
The result in seconds

numbers strings

1 63.8 74.3
2 155.3 169.6
3 156.1 170.2
4 96.2 104.4

Now you can of course ignore those numbers because they don't come from a real
world example and thus consider such numbers unimportant until someone in the
future has a case similar enough with the above and wonders why sorting his
data is so slow.

Duncan Booth

unread,

Mar 28, 2011, 3:07:39 PM3/28/11

to

Antoon Pardon <Antoon...@rece.vub.ac.be> wrote:

> I tried to sort lists of 10000 elemens. Each element a tuple two items
> that was to be sorted first according to the first item in ascending
> order, then according to the second item in descending order. This was
> done on python 2.6 so I had to write my own cmp_to_key function.
>
> I then sorted these lists of 10000 elements a 1000 times with the
> following methods.
>
> 1) a cmp-function
> 2) key = cmp_to_key
> 3) key = wrapper class around cmp-function
> 4) key = class specifically written to implement the ordering.
>
> This was done, once with number tuples and and once with string
> tuples. The result in seconds
>
> numbers strings
>
> 1 63.8 74.3
> 2 155.3 169.6
> 3 156.1 170.2
> 4 96.2 104.4
>
> Now you can of course ignore those numbers because they don't come
> from a real world example and thus consider such numbers unimportant
> until someone in the future has a case similar enough with the above
> and wonders why sorting his data is so slow.
>

Out of interest, what happens to your timings if you do the sort just
using simple key functions?

Since you've got two keys you will of course need to make two calls to
'sort'. e.g.

data.sort(key=itemgetter(1), reverse=True)
data.sort(key=itemgetter(0))

--
Duncan Booth http://kupuguy.blogspot.com

Steven D'Aprano

unread,

Mar 28, 2011, 3:43:03 PM3/28/11

to

On Mon, 28 Mar 2011 14:39:04 +0200, Antoon Pardon wrote:

> I tried to sort lists of 10000 elemens. Each element a tuple two items
> that was to be sorted first according to the first item in ascending
> order, then according to the second item in descending order. This was
> done on python 2.6 so I had to write my own cmp_to_key function.
>
> I then sorted these lists of 10000 elements a 1000 times with the
> following methods.

Thank you for spending the time to get some hard data, but I can't
replicate your results since you haven't shown your code. Rather than
attempt to guess what you did and duplicate it, I instead came up with my
own timing measurements. Results are shown here, my code follows below:

[steve@sylar ~]$ python2.7 sort_test.py
Comparison function: 12.1256039143
Key function: 3.51603388786
Double sort: 2.33165812492
cmp_to_key: 28.1129128933

By far the best result comes from two calls to sort. Not far off is the
key function solution. (Admittedly, coming up with a key function for non-
numeric data would be challenging.) The comparison function, and the
cmp_to_key solution, are clearly *much* slower.

I may do another test with larger lists, and leave it running overnight
and see what happens.

[code]

from random import random, shuffle
from operator import itemgetter
from timeit import Timer
from functools import cmp_to_key

def make_data(n):
x = range(n)*2
y = range(n)*2
shuffle(x)
shuffle(y)
data = zip(x, y)
assert len(data) == 2*n
return data

# Sort according to the first item in ascending order, then by second item
# in descending order:

def cmp_func(t1, t2):
return cmp(t1[0], t2[0]) or cmp(t2[1], t1[1])

def key_func(t):
return (t[0], -t[1])

def sorter(d):
d = sorted(d, key=itemgetter(1), reverse=True)
return sorted(d, key=itemgetter(0))

# Check that all sort methods give the same result.
data = make_data(20)
assert sorted(data, cmp=cmp_func) == sorted(data, key=key_func) == \
sorter(data) == sorted(data, key=cmp_to_key(cmp_func))

data = make_data(5000)
assert len(data) == 10000
setup = "from __main__ import cmp_to_key, cmp_func, key_func, sorter,
data"

t1 = Timer("sorted(data, cmp=cmp_func)", setup)
t2 = Timer("sorted(data, key=key_func)", setup)
t3 = Timer("sorter(data)", setup)
t4 = Timer("sorted(data, key=cmp_to_key(cmp_func))", setup)

print "Comparison function:",
print min(t1.repeat(number=100, repeat=7))
print "Key function:",
print min(t2.repeat(number=100, repeat=7))
print "Double sort:",
print min(t3.repeat(number=100, repeat=7))
print "cmp_to_key:",
print min(t4.repeat(number=100, repeat=7))

[end code]

--
Steven

Fons Adriaensen

unread,

Mar 28, 2011, 10:36:36 PM3/28/11

to pytho...@python.org

On Mon, Mar 28, 2011 at 11:06:20AM +0200, Antoon Pardon wrote:

> Asking for *real-world* uses is just how the python community smothers
> requests.

It's quite a common strategy, I've seen it used in many
contexts. Which doesn't make it any more acceptable of
course.

> Should someone come with a use case, I expect the following
> to happenr: One will go over the case with magnifying glasses in
> search for any possible way in which it could be rewritten so as not
> to need the cmp anyway. This will probably not be a big problem because
> the person with the use case will probably have given a simplified example
> to explain the overal picture. However most people responding will
> ingnore the overal picture and focus on details that may very well work
> on the simplified example but may not in the specific real-world case.
> But since going over all details to argue the case will be too time
> consuming, in the end the person with the use case will simply fail to
> convince.

An accurate description of the process.

> Forcing people to use a key-function, will produce cases where python
> will ask for more memory and take longer to sort, than allowing to provide

> a cmp function. This for the simple reason that for some order-functions,

> the only way to write a key-function will be to write a specific class,
> that will implement the order with the same kind of code that otherwise

> would have been put in the cmp function, making the trade off: memory use

> vs speed no longer a trade off.

If cmp_to_key() would do some (probably impossible) magic - factoring
the user's cmp() into a non-trivial key mapping and a simpler and more
efficient compare - then it would make sense. But it doesn't do such
a thing. The 'key' it uses is just a copy, and the compare that will
be used finally is just the user's one. Nothing gained, just overhead
added.

> If the existance alone of such cases isn't enough to reverse the decision
> about the cmp-argument of the sort-method. My guess is that nothing will.

Sadly enough, that is probably a good guess.

Ciao,

--
FA

Raymond Hettinger

unread,

Mar 29, 2011, 1:09:18 AM3/29/11

to

On Mar 28, 8:43 am, Steven D'Aprano <steve

+comp.lang.pyt...@pearwood.info> wrote:
> Thank you for spending the time to get some hard data, but I can't
> replicate your results since you haven't shown your code. Rather than
> attempt to guess what you did and duplicate it, I instead came up with my
> own timing measurements. Results are shown here, my code follows below:
>
> [steve@sylar ~]$ python2.7 sort_test.py
> Comparison function: 12.1256039143
> Key function: 3.51603388786
> Double sort: 2.33165812492
> cmp_to_key: 28.1129128933
>
> By far the best result comes from two calls to sort. Not far off is the
> key function solution. (Admittedly, coming up with a key function for non-
> numeric data would be challenging.) The comparison function, and the
> cmp_to_key solution, are clearly *much* slower.

On Mar 28, 8:43 am, Steven D'Aprano <steve

+comp.lang.pyt...@pearwood.info> wrote:
> Thank you for spending the time to get some hard data, but I can't
> replicate your results since you haven't shown your code. Rather than
> attempt to guess what you did and duplicate it, I instead came up with my
> own timing measurements. Results are shown here, my code follows below:
>
> [steve@sylar ~]$ python2.7 sort_test.py
> Comparison function: 12.1256039143
> Key function: 3.51603388786
> Double sort: 2.33165812492
> cmp_to_key: 28.1129128933
>
> By far the best result comes from two calls to sort. Not far off is the
> key function solution. (Admittedly, coming up with a key function for non-
> numeric data would be challenging.) The comparison function, and the
> cmp_to_key solution, are clearly *much* slower.

There is less here than meets the eye. The cmp-function dispatch in
cmp_to_key() is written in pure Python. Essentially, the timings are
showing the already well-known fact that call forwarding is faster in
C than in pure python.

Each of the O(n log n) comparisons is run through pure Python code
that like this:

def __lt__(self, other):
# mycmp is nested scope variable pointing to
# the user-defined cmp-function
return mycmp(self.obj, other.obj) < 0

I intend to add a C version of cmp_to_key() so that no trips around
the eval-loop are needed. It should be about as efficient as
bound_method dispatch (very fast), leaving the user-supplied cmp-
function as the dominant cost in the handful of cases where the
superfast O(n) key-function approach can't be used for some reason.

The memory overhead should either stay the same or drop slightly
(currently at 14 bytes per element on a 32-bit build and 28 bytes per
element on a 64-bit build).

Also note that running timings on Py2.7 instead of Py3.2 disadvantages
key-functions. In 2.7, key-functions use sortwrapper objects which
were removed in 3.2 (saving memory and reducing dispatch time
considerably). Also, in Py2.7 cmp logic is called even when a key-
function is defined (because key-function logic was wrapped around the
already existing sort with cmp-logic) so you pay the price of both
(i.e. creating a 2-tuple for cmp arguments on every call). In Py3.x
the cmp logic is gone, making the remaining key-function logic even
faster. IOW, key-function logic is way faster and more space
efficient in 3.2.

One of the implications of the last paragraph is that if 2.7 style cmp-
logic were reinstated in 3.3, not only would it clutter the API with
two-ways-to-do-it, it would also disadvantage make the common case
(using key-functions) run much more slowly. In other words, the
performance mavens will have shot themselves (and us) in the foot.

Raymond

Paul Rubin

unread,

Mar 29, 2011, 1:58:25 AM3/29/11

to

Steven D'Aprano <steve+comp....@pearwood.info> writes:

> There are lots of problems that Python is not well-suited for.
> It is not well-suited for writing the driver to a graphics card.
> It is not well-suited to calculating pi to a trillion decimal places.

>...

But sorting moderate amounts of data on complicated criteria is an area
Python IS supposed to be well-suited for.

> 99% of requests should be smothered. That is a *good thing*.

We're not talking about a "request" for a feature to be added. There
was an existing feature that worked perfectly well, that was removed for
some ideological reason that makes no sense to me, with some slight
allusions to technical reasons that I've also never completely
understood.

> More memory yes; take longer to sort, almost certainly not (except,
> perhaps, for a very narrow window where the addition of a key causes

I don't see how using something like cmp_to_key could ever fail to slow
the program down. It's calling the exact same comparison function the
same number of times, but also creating and releasing a pile of class
instances, and doing a lot of runtime lookups to access the comparison
function in each instance on each comparison.

> data.sort(key=functools.cmp_to_key(myfunc))
> which is a little longer to type, and uses a little more memory, but
> otherwise is no worse than using cmp.

I don't believe that for one second. Have you actually timed it? Like,
sort an array of 100000 random (int,int) pairs ascending on the first
int and descending on the second, using a simple comparison function and
using cmp_to_key? That's Anton's example, and by coincidence that
exact situation came up in something I'm hacking on RIGHT NOW. I
have (roughly) a list of transactions in the form
(person's name, timestamp, transaction info)
and I want to print a report sorted alphabetically by person, listing
each person's transactions most-recent-first. Of course I was able
to concoct a solution with a few moments of head-scratching, but
using a cmp argument is much more natural and transparent in my
opinion. DSU is a clever and useful design pattern, but comparison
sorting is what all the sorting textbooks are written about.

Antoon Pardon

unread,

Mar 29, 2011, 7:52:40 AM3/29/11

to pytho...@python.org

On Mon, Mar 28, 2011 at 11:35:09AM +0000, Steven D'Aprano wrote:

> [...]
> > Forcing people to use a key-function, will produce cases where python
> > will ask for more memory and take longer to sort, than allowing to
> > provide a cmp function.
>
> More memory yes; take longer to sort, almost certainly not (except,
> perhaps, for a very narrow window where the addition of a key causes the
> application to page out to virtual memory, but a comparison function
> wouldn't -- and even then, the key function might still be faster).
>
> But we've already covered the "more memory" angle to death. We already
> understand that. This is why I have asked if there are any *other* use-
> cases for a comparison function.
>
> Dan Stromberg has suggested that another use-case would be lazy-
> comparisons, where you might not be able to generate a single key
> cheaply, but you can perform a comparison lazily.
>
>
> > This for the simple reason that for some
> > order-functions, the only way to write a key-function will be to write a
> > specific class, that will implement the order with the same kind of code
> > that otherwise would have been put in the cmp function,
>
> There is a recipe for that on ActiveState's website, google for
> cmp_to_key. That recipe has been added to Python's standard library for
> version 3.2. Please stop flogging that dead horse.

This doesn't make any sense. As far as I can see there is nothing in Dan's
code that would make it more troublesome to use with cmp_to_key than Carl
Bank's examples or that would make it loose more speed than the overhead
I was describing.

Whatever is meant by lazily, the cmp_to_key function will generate a key
function that during the sorting will finally defer to this cmp-function
to do the comparisons. If the cmp is lazy when it does its work as a
cmp-argument, it will be lazy when it does its work when a key is wrapped
around it.

This argument works for whatever property, the cmp-function should posses.
If your cmp argument has property foo when comparing during a sort then
using cmp_to_key will preserve than property when comparing during a sort.
This for the simple reason that with a cmp_to_key function when the
sort comapares two keys, it will after some overhead just call the cmp
function with the undecorated data.

That means that asking for use cases that fall outside the overhead in
memory or speed of the cmp_to_key function and are not about the cumbersome
work in writing your own key-function in order to reduce that overhead,
that asking for such use cases is like asking for a real number that squares
to a negative. We can conclude a priori that there won't be any.

That is why I don't consider a request for such use cases a serious request.

Antoon Pardon

unread,

Mar 29, 2011, 8:46:57 AM3/29/11

to pytho...@python.org

On Mon, Mar 28, 2011 at 03:43:03PM +0000, Steven D'Aprano wrote:
> On Mon, 28 Mar 2011 14:39:04 +0200, Antoon Pardon wrote:
>
> > I tried to sort lists of 10000 elemens. Each element a tuple two items
> > that was to be sorted first according to the first item in ascending
> > order, then according to the second item in descending order. This was
> > done on python 2.6 so I had to write my own cmp_to_key function.
> >
> > I then sorted these lists of 10000 elements a 1000 times with the
> > following methods.
>
> Thank you for spending the time to get some hard data, but I can't
> replicate your results since you haven't shown your code. Rather than
> attempt to guess what you did and duplicate it, I instead came up with my
> own timing measurements. Results are shown here, my code follows below:

But why expect me to come up with such data? Shouldn't the dev team have
come up with such data before they decided to remove the cmp-argument,
instead of afterwards expecting the python users to come up with data
to support the reversing of that decision?

> [steve@sylar ~]$ python2.7 sort_test.py
> Comparison function: 12.1256039143
> Key function: 3.51603388786
> Double sort: 2.33165812492
> cmp_to_key: 28.1129128933
>
>
> By far the best result comes from two calls to sort. Not far off is the
> key function solution. (Admittedly, coming up with a key function for non-
> numeric data would be challenging.) The comparison function, and the
> cmp_to_key solution, are clearly *much* slower.

So, that means that the fastest solutions don't generalize.

The double sort is useless if the actual sorting is done in a different
module/function/method than the module/function/method where the order
is implemented. It is even possible you didn't write the module
where the sorting actually occurs.

Your key-function only works with numeric data, strings already pose
a problem here.

The cmp function can be externally provided, without you having the
source.

So sooner or later there will be a user who just doesn't have a
choice but to use cmp_to_key or something similar. and who will
have to suffer this kind of speed loss.

--
Antoon Pardon

Jean-Michel Pichavant

unread,

Mar 29, 2011, 4:10:38 PM3/29/11

to Steven D'Aprano, pytho...@python.org

That was known from the begining, was it ? Since key sort trades memory
for CPU, you could expect that "masses of data" use cases may fail
because of memory shortage.

JM

MRAB

unread,

Mar 29, 2011, 6:57:21 PM3/29/11

to pytho...@python.org

On 29/03/2011 18:01, Dan Stromberg wrote:

>
> On Tue, Mar 29, 2011 at 1:46 AM, Antoon Pardon
> <Antoon...@rece.vub.ac.be <mailto:Antoon...@rece.vub.ac.be>> wrote:
>
> The double sort is useless if the actual sorting is done in a different
> module/function/method than the module/function/method where the order
> is implemented. It is even possible you didn't write the module
> where the sorting actually occurs.
>
>

> There's a good point in the above.
>
> Usually the order in which a class needs to be sorted, is a detail that
> should be hidden by the class (or auxiliary comparison function).
>
> Doing more than one sort in order to get one key forward and another in
> reverse, seems to require a client of the class to know class detail (or
> auxiliary comparison function detail).
>
> In fact, if something inside the class (or auxiliary comparison
> function) needs to change s.t. you must change how you sort its
> instances, then you might have to change a bunch of single-sorts to
> multiple-sorts in client code. EG if you were sorting a number in
> reverse followed by a string forward, to a number in reverse followed by
> a string forward and another string in reverse.
>
> In principle, you could come up with a "negate string" operation
> though. EG for Latin-1, each ch would become chr(255-ord(ch)). That's
> probably going to be rather expensive, albeit linear. It should at
> least avoid the need for multi-sorting and exposing implementation
> detail. I want to say you could do something similar for Unicode
> strings, but I don't have enough experience with Unicode to be sure yet.
>
You would have to do more than that.

For example, "" < "A", but if you "negate" both strings you get "" <
"\xBE", not "" > "\xBE".

Chris Angelico

unread,

Mar 29, 2011, 7:08:22 PM3/29/11

to pytho...@python.org

On Wed, Mar 30, 2011 at 5:57 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> You would have to do more than that.
>
> For example, "" < "A", but if you "negate" both strings you get "" <
> "\xBE", not "" > "\xBE".

Strings effectively have an implicit character at the end that's less
than any other character. Easy fix: Append a character that's greater
than any other. So "" < "A" becomes "\xFF" > "\xBE\xFF".

Still not going to be particularly efficient.

Chris Angelico

Terry Reedy

unread,

Mar 29, 2011, 7:35:40 PM3/29/11

to pytho...@python.org

For anyone interested, the tracker discussion on removing cmp is at
http://bugs.python.org/issue1771
There may have been more on the old py3k list and pydev list.

One point made there is that removing cmp= made list.sort consistent
with all the other comparision functions,
min/max/nsmallest/nlargest/groupby that only have a key arg. How many
would really want cmp= added everywhere?

A minor problem problem with cmp is that the mapping between return
values and input comparisons is somewhat arbitrary. Does -1 mean a<b or
b<a? (That can be learned and memorized, of course, though I tend to
forget without constant use).

A bigger problem is that it conflicts with key=. What is the result of
l=[1,3,2]
l.sort(cmp=lambda x,y:y-x, key=lambda x: x)
print l
? (for answer, see http://bugs.python.org/issue11712 )

While that can also be learned, I consider conflicting parameters
undesireable and better avoided when reasonably possible. So I see this
thread as a discussion of the meaning of 'reasonably' in this particular
case.

--
Terry Jan Reedy

Ian Kelly

unread,

Mar 29, 2011, 7:48:39 PM3/29/11

to Chris Angelico, pytho...@python.org

Not to mention that it still has bugs:

"" < "\0"
"\xff" < "\xff\xff"

Chris Angelico

unread,

Mar 29, 2011, 8:29:44 PM3/29/11

to pytho...@python.org

On Wed, Mar 30, 2011 at 6:48 AM, Ian Kelly <ian.g...@gmail.com> wrote:
> Not to mention that it still has bugs:
>
> "" < "\0"
> "\xff" < "\xff\xff"

That's because \0 isn't less than any character, nor is \xff greater
than any. However, the latter is true if (for instance) your string is
in UTF-8; it may be possible to use some construct that suits your own
data.

ChrisA

MRAB

unread,

Mar 29, 2011, 11:06:51 PM3/29/11

to pytho...@python.org

On 29/03/2011 21:32, Dan Stromberg wrote:

>
> On Tue, Mar 29, 2011 at 12:48 PM, Ian Kelly <ian.g...@gmail.com
> <mailto:ian.g...@gmail.com>> wrote:
>
> On Tue, Mar 29, 2011 at 1:08 PM, Chris Angelico <ros...@gmail.com
> <mailto:ros...@gmail.com>> wrote:
> > On Wed, Mar 30, 2011 at 5:57 AM, MRAB <pyt...@mrabarnett.plus.com
> <mailto:pyt...@mrabarnett.plus.com>> wrote:
> >> You would have to do more than that.
> >>
> >> For example, "" < "A", but if you "negate" both strings you get "" <
> >> "\xBE", not "" > "\xBE".
> >
> > Strings effectively have an implicit character at the end that's less
> > than any other character. Easy fix: Append a character that's greater
> > than any other. So "" < "A" becomes "\xFF" > "\xBE\xFF".
> >
> > Still not going to be particularly efficient.
>

> Not to mention that it still has bugs:
>
> "" < "\0"
> "\xff" < "\xff\xff"

> --
> http://mail.python.org/mailman/listinfo/python-list
>
>
> It probably could be a list of tuples:
>
> 'abc' -> [ (1, chr(255 - ord('a')), (1, chr(255 - ord('b')), (1, chr(255
> - ord('c')), (0, '') ]
>
> ...where the (0, '') is an "end of string" marker, but it's getting
> still slower, and quite a lot bigger, and getting so reductionistic
> isn't a good thing when one class wraps another wraps another - when
> their comparison methods are interrelated.
>
I think I've found a solution:

class NegStr:
def __init__(self, value):
self._value = value
def __lt__(self, other):
return self._value > other._value

so when sorting a list of tuples:

def make_key(t):
return NegStr(t[0]), t[1]

items = [("one", 1), ("two", 2), ("three", 3), ("four", 4),
("five", 5)]
items.sort(key=make_key)

Ian Kelly

unread,

Mar 29, 2011, 11:36:18 PM3/29/11

to pytho...@python.org

On Tue, Mar 29, 2011 at 5:06 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> I think I've found a solution:
>
> class NegStr:
> def __init__(self, value):
> self._value = value
> def __lt__(self, other):
> return self._value > other._value

IOW:

cmp_to_key(lambda x, y: -cmp(x, y))

This is still just taking a cmp function and dressing it up as a key.

Paul Rubin

unread,

Mar 30, 2011, 12:23:03 AM3/30/11

to

Ian Kelly <ian.g...@gmail.com> writes:
> cmp_to_key(lambda x, y: -cmp(x, y))

cmp_to_key(lambda x, y: cmp(y, x))

Antoon Pardon

unread,

Mar 30, 2011, 9:06:20 AM3/30/11

to pytho...@python.org

On Tue, Mar 29, 2011 at 03:35:40PM -0400, Terry Reedy wrote:
> For anyone interested, the tracker discussion on removing cmp is at
> http://bugs.python.org/issue1771
> There may have been more on the old py3k list and pydev list.
>
> One point made there is that removing cmp= made list.sort consistent
> with all the other comparision functions,
> min/max/nsmallest/nlargest/groupby that only have a key arg. How
> many would really want cmp= added everywhere?

I wouldn't have a problem with it.

I would also like to react to the following.

Guido van Rossum in msg95975 on http://bugs.python.org/issue1771 wrote:
| Also, for all of you asking for cmp back, I hope you realize that
| sorting N values using a custom cmp function makes about N log N calls
| calls to cmp, whereas using a custom key calls the key function only N
| times. This means that even if your cmp function is faster than the
| best key function you can write, the advantage is lost as N increases
| (which is just where you'd like it to matter most :-).

This is a play on semantics. If you need python code to compare
two items, then this code will be called N log N times, independently
of the fact how this code is presented, as a cmp function or as rich
comparison methods. So forcing people to write a key function in cases
where this will only result in the cmp code being translated to __lt__
code, accomplishes nothing.

As far as I can see, key will only produce significant speedups, if
comparing items can then be completly done internally in the python
engine without referencing user python code.

> A minor problem problem with cmp is that the mapping between return
> values and input comparisons is somewhat arbitrary. Does -1 mean a<b
> or b<a? (That can be learned and memorized, of course, though I tend
> to forget without constant use).

My rule of thumb is that a < b is equivallent with cmp(a, b) < 0

> A bigger problem is that it conflicts with key=. What is the result of
> l=[1,3,2]
> l.sort(cmp=lambda x,y:y-x, key=lambda x: x)
> print l
> ? (for answer, see http://bugs.python.org/issue11712 )
>
> While that can also be learned, I consider conflicting parameters
> undesireable and better avoided when reasonably possible. So I see
> this thread as a discussion of the meaning of 'reasonably' in this
> particular case.

But what does this have to do with use cases? Does what is reasonable
depend on the current use cases without regard of possible future use
cases? Is the conflict between key and cmp a lesser problem in the
case of someone having a huge data set to sort on a computer that lacks
the resources to decorate as opposed to currently noone having such
a data set? Are we going to decide which functions/methods get a cmp
argument depening on which use cases we currently have that would need it?

This thread started with a request for use cases. But if you take this
kind of things into consideration, I don't see how use cases can then
make a big difference in the final decision.

Steven D'Aprano

unread,

Mar 31, 2011, 2:13:53 AM3/31/11

to

On Wed, 30 Mar 2011 11:06:20 +0200, Antoon Pardon wrote:

> As far as I can see, key will only produce significant speedups, if
> comparing items can then be completly done internally in the python
> engine without referencing user python code.

Incorrect. You don't even need megabytes of data to see significant
differences. How about a mere 1000 short strings?

[steve@wow-wow ~]$ python2.6
Python 2.6.6 (r266:84292, Dec 21 2010, 18:12:50)
[GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from random import shuffle
>>> data = ['a'*n for n in range(1000)]
>>> shuffle(data)
>>> from timeit import Timer
>>>
>>> t_key = Timer('sorted(data, key=lambda a: len(a))',
... 'from __main__ import data')
>>> t_cmp = Timer('sorted(data, cmp=lambda a,b: cmp(len(a), len(b)))',
... 'from __main__ import data')
>>>
>>> min(t_key.repeat(number=1000, repeat=5))
0.89357517051696777
>>> min(t_cmp.repeat(number=1000, repeat=5))
7.6032949066162109

That's almost ten times slower.

Of course, the right way to do that specific sort is:

>>> t_len = Timer('sorted(data, key=len)', 'from __main__ import data')
>>> min(t_len.repeat(number=1000, repeat=5))
0.64559602737426758

which is even easier and faster. But even comparing a pure Python key
function to the cmp function, it's obvious that cmp is nearly always
slower.

Frankly, trying to argue that cmp is faster, or nearly as fast, is a
losing proposition. In my opinion, the only strategy that has even a
faint glimmer of hope is to find a convincing use-case where speed does
not matter.

Or, an alternative approach would be for one of the cmp-supporters to
take the code for Python's sort routine, and implement your own sort-with-
cmp (in C, of course, a pure Python solution will likely be unusable) and
offer it as a download. For anyone who knows how to do C extensions, this
shouldn't be hard: just grab the code in Python 2.7 and make it a stand-
alone function that can be imported.

If you get lots of community interest in this, that is a good sign that
the solution is useful and practical, and then you can push to have it
included in the standard library or even as a built-in.

And if not, well, at least you will be able to continue using cmp in your
own code.

--
Steven

harrismh777

unread,

Mar 31, 2011, 6:34:17 AM3/31/11

to

Antoon Pardon wrote:
> On Sun, Mar 13, 2011 at 12:59:55PM +0000, Steven D'Aprano wrote:
>> The removal of cmp from the sort method of lists is probably the most
>> disliked change in Python 3. On the python-dev mailing list at the
>> moment, Guido is considering whether or not it was a mistake.

>>
>> If anyone has any use-cases for sorting with a comparison function that
>> either can't be written using a key function, or that perform really
>> badly when done so, this would be a good time to speak up.
>

> How about a list of tuples where you want them sorted first item in ascending
> order en second item in descending order.
>
Greetings,

Not sure here, but thought you folks might want to have another
viewpoint from someone who is maybe not so close to the trees as to be
able to comment effectively on the forest.

Many of you (Guido included) have lost significant sight of a
critical object oriented philosophical pillar (but not all of you, thank
goodness). To cut right to the heart of it--- NEVER change an advertised
interface. Change the implementation to your little hearts desire, but
never alter an advertised interface (code is based on it, books are
based on it, tutorials are based on it, college comp sci courses are
based on it... etc). If cmp parm is utilized and is helpful or perceived
as useful and existing code requires it then for crying out loud leave
it alone. I think your overall worship of Guido as Python King has left
many of you with nobbled thinking.

On the other hand, this debate and constant bickering is a serious
rehash of an ancient discussion without any new points--- irritating.
Take a look at issue 1771 http://bugs.python.org/issue1771
particularly msg #s 95975 and 95982.... but no, read the whole
thing.... it is very clear that Guido wants to restructure the python
language for consistency and elegance (and who can blame him?). He makes
some very good arguments for the justification of removing the cmp
keyword from list.sort() and builtin.sorted()/ but, that is not the
point... he is breaking a fundamental law of object oriented
programming... don't break and advertised interface (particularly if it
is useful and people are actually making use of it!).
This is insane folks.

Python is one of the most elegant OO languages to gain popular
appeal and wide-spread use. This sad broken move from 2.x to 3.x has the
potential of ruining the language for everyone. Many of us dropped JAVA
(compile once debug everywhere) because it is complicated, a pita to
use, slow, and actually doesn't port too well on any platform... and
here we go again with python. How do the developers of python ever
expect folks to be able to make use of the language into the future when
every time we turn around the interface is broken or being debated in
flux? Many of us want to use the new 3.2+ version, but no one is going
to ship it pre-installed (probably for many years) because of this
issue. There is no way to easily migrate, nor backport, and everyone is
going to be forced to develop code (and have to maintain code) on two
distinct branches (for many years to come). I suspect that people are
just going to stick with back versions for a long time.
We need to get a grip on this, people.

Guido does not need a use case; he just needs to restore the
interface that everyone expects. Put up a message to start to use key=
instead, and plan deprecation for something like five years out... this
just seems like such a no-brainer to me. But then, maybe its just that
I'm too inexperienced to know any better. <sigh>

In the mean time... I'm playing around with 3.2 and really liking
it... hoping that the world will actually be using it someday.

Kind regards,
m harris

Steven D'Aprano

unread,

Mar 31, 2011, 7:51:09 AM3/31/11

to

On Thu, 31 Mar 2011 01:34:17 -0500, harrismh777 wrote:

> Many of you (Guido included) have lost significant sight of a
> critical object oriented philosophical pillar (but not all of you, thank
> goodness). To cut right to the heart of it--- NEVER change an advertised
> interface.

Thanks for your comments, M Harris, but I'm afraid you stumble right at
the first step. The difference between implementation and interface is
not specific to object-oriented code -- it is no more, or less, an
"object oriented philosophical pillar" than it is a pillar of functional
programming, procedural programming, or any other programming paradigm
(except perhaps spaghetti coding).

Nor should you say "NEVER". To put your advice in other words:

"Once you have added a feature, no matter how badly thought out, broken,
rarely used, or even actively hated by the entire programming community,
you must continue supporting it forever, adding more and more cruft
around it, just in case some obscure piece of software that hasn't been
maintained for ten years relies on it."

Nobody has forgotten the principle of separating interface from
implementation. This was not an accidental backwards incompatible
change, it was a deliberate decision made in the full knowledge that it
would be a painful move for some, but also in the expectation that *not*
removing cruft becomes even more painful over time.

Cruft has real costs: maintenance costs, development costs, runtime
costs, learning costs, and usability costs. Any language which does not
prune cruft from time to time will ossify and is doing its users a real
disservice. Just not too often.

Nobody is denying that the loss of list.sort(cmp=...) will hurt some
people, or some use-cases. The disagreement is over the question of
whether the pain from its loss outweighs the utility of its removal.

Python is about halfway through the expected migration period from 2.x to
3.x, and so far it has not been worse or more difficult than expected.
Support for Python 3 is about what was expected (probably better than
expected), uptake of Python 3 by users is about what was expected, the
number of nay-sayers, Negative Nellies and trolls claiming that Python 3
will destroy the language is about what was expected, and the number of
credible forks of the language promising to support Python 2 forever is
exactly what was expected: zero.

[...]

> Many of us dropped JAVA
> (compile once debug everywhere) because it is complicated, a pita to
> use, slow, and actually doesn't port too well on any platform...

Perhaps that's because Java needs to drop some cruft.

> Many of us want to use the new 3.2+ version, but no one is going
> to ship it pre-installed (probably for many years) because of this
> issue.

Wrong. There is at least one Linux distro, Arch, which ships with Python3
as the system Python. Arch is pretty bleeding edge, but where they go,
others will follow.

Besides, "pre-installed" loses a lot of importance when installation is
as easy as "sudo apt-get python3".

> There is no way to easily migrate, nor backport, and everyone is
> going to be forced to develop code (and have to maintain code) on two
> distinct branches (for many years to come).

Your FUD is outdated. There are now many major products which support
Python 2 and 3, and some of them out of the same code base. Here is a
small selection:

http://allanmcrae.com/2010/12/python-3-module-support/
http://mail.mems-exchange.org/durusmail/qp/441/
http://nedbatchelder.com/blog/200910/running_the_same_code_on_python_2x_and_3x.html

Note that last link is from 2009.

--
Steven

Antoon Pardon

unread,

Mar 31, 2011, 9:41:36 AM3/31/11

to pytho...@python.org

On Thu, Mar 31, 2011 at 02:13:53AM +0000, Steven D'Aprano wrote:
> On Wed, 30 Mar 2011 11:06:20 +0200, Antoon Pardon wrote:
>
> > As far as I can see, key will only produce significant speedups, if
> > comparing items can then be completly done internally in the python
> > engine without referencing user python code.
>
> Incorrect. You don't even need megabytes of data to see significant
> differences. How about a mere 1000 short strings?
>
>
> [steve@wow-wow ~]$ python2.6
> Python 2.6.6 (r266:84292, Dec 21 2010, 18:12:50)
> [GCC 4.1.2 20070925 (Red Hat 4.1.2-27)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from random import shuffle
> >>> data = ['a'*n for n in range(1000)]
> >>> shuffle(data)
> >>> from timeit import Timer
> >>>
> >>> t_key = Timer('sorted(data, key=lambda a: len(a))',
> ... 'from __main__ import data')
> >>> t_cmp = Timer('sorted(data, cmp=lambda a,b: cmp(len(a), len(b)))',
> ... 'from __main__ import data')
> >>>
> >>> min(t_key.repeat(number=1000, repeat=5))
> 0.89357517051696777
> >>> min(t_cmp.repeat(number=1000, repeat=5))
> 7.6032949066162109
>
> That's almost ten times slower.

But how does it contradict what I wrote above? Maybe I didn't make
myself clear but in your example, the key function produces ints.
With ints the comparison is completely done within the python engine
without the need to defer to user python code (through the rich
comparison functions). So this is the kind of key I described that
would produce a significant speedup.

But once your key produces user objects that need to be compared
through user python code with __lt__ methods, getting a speedup
through the use of key instead of cmp will not be obvious and
in a number of case you will loose speed.

> Of course, the right way to do that specific sort is:
> >>> t_len = Timer('sorted(data, key=len)', 'from __main__ import data')
> >>> min(t_len.repeat(number=1000, repeat=5))
> 0.64559602737426758
>
> which is even easier and faster. But even comparing a pure Python key
> function to the cmp function, it's obvious that cmp is nearly always
> slower.

I don't find that obvious at all.

> Frankly, trying to argue that cmp is faster, or nearly as fast, is a
> losing proposition. In my opinion, the only strategy that has even a
> faint glimmer of hope is to find a convincing use-case where speed does
> not matter.

I don't argue such a general statement. I just argue there are cases
where it is and I supporeted that statement with numbers. The only
way these numbers could be disputed was by focussing on specific details
that didn't generalize.

Now maybe providing a cmp_to_key function written in C, can reduce
the overhead for such cases as Raymond Hettinger suggested. If it
turns out that that works out, then I would consider this thread
usefull even if that will be the only result of this thread.

Something else the dev team can consider, is a Negation class.
This class would wrap itself around a class or object but reverse the ordering.
So that we would get Negation('a') > Negation('b'). That would make it
easy to create sort keys in which partial keys had to be sorted differently
from each other. This would have to be done by the dev team since I
guess that writing such a thing in Python would loose all the speed
of using a key-function.

> Or, an alternative approach would be for one of the cmp-supporters to
> take the code for Python's sort routine, and implement your own sort-with-
> cmp (in C, of course, a pure Python solution will likely be unusable) and
> offer it as a download. For anyone who knows how to do C extensions, this
> shouldn't be hard: just grab the code in Python 2.7 and make it a stand-
> alone function that can be imported.
>
> If you get lots of community interest in this, that is a good sign that
> the solution is useful and practical, and then you can push to have it
> included in the standard library or even as a built-in.
>
> And if not, well, at least you will be able to continue using cmp in your
> own code.

I'll first let Raymond Hettinger rewrite cmp_to_key in C, as I understand he
suggested, and reevaluate after that.

--
Antoon Pardon

Paul Rubin

unread,

Mar 31, 2011, 11:59:29 AM3/31/11

to

Antoon Pardon <Antoon...@rece.vub.ac.be> writes:
> Something else the dev team can consider, is a Negation class....

> This would have to be done by the dev team since I
> guess that writing such a thing in Python would loose all the speed
> of using a key-function.

That is a good idea. SQL has something like it, I think, and Haskell
also has it. I thought about writing it in Python but it would slow
things down a lot. I hadn't thought of writing it in C for some reason.

Terry Reedy

unread,

Mar 31, 2011, 3:26:11 PM3/31/11

to pytho...@python.org

On 3/31/2011 2:34 AM, harrismh777 wrote:

> breaking a fundamental law of object oriented programming... don't break
> and advertised interface (particularly if it is useful and people are
> actually making use of it!).
> This is insane folks.

Each x.y version (starting with 2.3) is feature stable: just bug fixes,
no additions (except as occasionally required to fix a bug), no
removals. 2.x had few feature changes or removals, as it was planned and
announced that all the big ones were being delayed to 3.x. 2.7 will be
polished for years, depending on how long anyone is willing to work on
it. I have been a bit surprised and disappointed that no 2.x fan has yet
come forward (that I know of) to specifically help fix 2.7 bugs.

> Guido does not need a use case; he just needs to restore the interface
> that everyone expects. Put up a message to start to use key= instead,
> and plan deprecation for something like five years out... this just

Python 3 was announced and as a mildly code breaking version at least 5
years before it came out. Old-style classes were clearly doomed when the
new object. The int division change was also decided on about that time.

I agree that it would have been better if the developer group had been
large enough to make the transition more orderly, but ...
as it is, it was not until 2.7 was out that we took a really good look
at library interfaces and found that 'public interface' was not as well
defined as it should be for some modules. That is much clearer for many
in 3.2.

> In the mean time... I'm playing around with 3.2 and really liking it...

Good.

> hoping that the world will actually be using it someday.

I already am and fully expect it to be the dominant version in not too
many years. Some linux distributions have already made it the default.
Others have and will hold back.

--
Terry Jan Reedy

harrismh777

unread,

Apr 1, 2011, 6:13:04 AM4/1/11

to

Steven D'Aprano wrote:
> The difference between implementation and interface is
> not specific to object-oriented code --

When I speak of implementation vs interface I am speaking from a
strictly object oriented philosophy, as pedigree, from Grady Booch, whom
I consider to be the father of Object Oriented Analysis and Design
(Booch, OO A&D with apps, 1994). He makes this remarkable statement:
"The interface of a class is the one place where we assert all of
the assumptions that a client may make about any instances [objects] of
the class; the implementation encapsulates details about which no client
may make assumptions" (Booch, p50, 1994).
The Class interface holds a "special" firmness which fosters the
client relationship of trust and assumption, without which OOA&D is
pointless. The interface of a Class must not change once advertised, and
once in production. This is specific to OOA&D.
Encapsulation of the abstractions must reside within the
implementation about which no one may make assumptions. This is also
where all of the "cruft" (if any) resides. Cruft never resides at the
interface, only in the encapsulation of the Class abstractions, and only
if the Class was poorly designed.

To put my advice in MY other words, here are my two rules about Class
interface based on Booch, but in terms that are mine, and which are way
simpler to put into practice:

Rule 1:
Never change an advertised Class interface.

Corollary 1)a The Class interface never encapsulates cruft.

Corollary 1)b The Class implementation usually encapsulates cruft.

Corollary 1)c Only the Class implementation may be changed.

Rule 2:
If an advertised Class interface must be changed to remove cruft,
please re-read Rule #1, and review corollaries 1)a, 1)b, and 1)c.

----

As an aside, but in response to your straw man argument regarding
obscure "cruft," none of the bickering regarding this cmp issue is about
obscure code. We are discussing one of the primary methods of one of the
primary Class objects are the lovely heart of Python--- the list.sort().
This is NOT obscure. Guido arbitrarily altered the interface to one of
the MOST important Class objects in all of the Python language. This is
not good, and it does violate the spirit of OOA&D.

If the interface changes had been left to type promotion on integer
divide, and the print() method, then folks would probably not be in such
an uproar over 3x. But Guido over reached the bounds of credulity by
altering a Class method that *many* people find useful, desirable,
efficacious and easy... they use it and the love it (right or wrong).
The client(s) are making assumptions about the List Class interface
and they have an OOA&D right to do so... and they are left with their
rear-ends handing out in the breeze because their assumptions are false.
Not good.

Kind regards,

m harris

Chris Angelico

unread,

Apr 1, 2011, 6:30:08 AM4/1/11

to pytho...@python.org

On Fri, Apr 1, 2011 at 5:13 PM, harrismh777 <harri...@charter.net> wrote:
> Corollary 1)a The Class interface never encapsulates cruft.

Provably false. Even a well-designed interface, if it is asked to deal
with a changing implementation and changing requirements, will
eventually either acquire cruft, or be found too rigid to be of use.
The only interface immune to this is the transparent interface
(effectively: "just package up the parameters in a single string and
pass it through"), which isn't really an interface at all, it just
exposes something else. In everything else, cruft is a reality that
must be dealt with.

Chris Angelico

harrismh777

unread,

Apr 1, 2011, 6:44:11 AM4/1/11

to

Terry Reedy wrote:
> Python 3 was announced and as a mildly code breaking version at least 5
> years before it came out.

I appreciate the spirit of your arguments overall, and I do not
necessarily disagree with much of what you are saying. I would like to
challenge you to see this from a little different perspective, if I may.

There are two distinct ways for looking at this "mild code
breakage," and it might be good to think about how we approach changes
in the future based on an understanding of both perspectives, and
consideration for the clients.

In the possible perspective of the Python language developers 3x
changes are mild (in fact, overall, probably even insignificant
percentage-wise). Ok, we removed the cmp comparison keyword from
list.sort(), made the print() method consistent with the rest of the
language, and correctly promoted integer divide 1/2 to float so that the
answer is something greater than zero! Fine. Looks like just a couple
little changes, no big deal, stop your whining.

The perspective of the Class client is something quite different.
They do not look at the overall percentage of Python language definition
that has changed (tiny percentage, right) they look at the overall
percentage of their own projects that have just "broken" and need to be
rewritten to accommodate the disruption in the advertised Class
interface. And to the client-- these tiny changes are magna-bodacious!
(nobbled thinking) You guys have changed integer divide---! the PRINT
print() functionality is diff e r e n t ---! and for crying out loud....
you changed S O R T ( ) !!!

I wonder if folks like google need to do in-place sorts over lists
very often ...?

I wonder how mnay Python scripts call the list.sort() method with
the cmp keyword specified... ? (could it be thousands, or millions?)
All the hoorah you guys are getting, as well all of this incessant
bickering over cmp, is some of the answer to these questions.

When you get ready to change an advertised Class interface in the
future, please consider my interface rules (I gave them to Steven) and
please take into account your client base---the ones who are making
valid assumptions about your Class interfaces. Its easy, and its most
considerate.

king regards,
m harris

Paul Rubin

unread,

Apr 1, 2011, 7:45:24 AM4/1/11

to

Chris Angelico <ros...@gmail.com> writes:
> Provably false. Even a well-designed interface, if it is asked to deal
> with a changing implementation and changing requirements, will
> eventually either acquire cruft, or be found too rigid to be of use.

What happens then is you define a new interface. In Microsoft-speak if
the IWhatever interface needs an incompatible extension like new
parameters, they introduce IWhatever2 which supports the new parameters.
They change the implementation of IWhatever so it becomes a wrapper for
IWhatever2 setting the new parameters to default values, to keep
implementing the old behavior.

Removing cmp certainly isn't the most disruptive change of Python 3,
but it seems like the one with the least benefit.

Terry Reedy

unread,

Apr 1, 2011, 4:59:05 PM4/1/11

to pytho...@python.org

On 4/1/2011 3:45 AM, Paul Rubin wrote:

> Removing cmp certainly isn't the most disruptive change of Python 3,

That was almost certainly the ascii to unicode switch for strings. It is
still not quite complete in 3.2 but should be pretty well ironed out in 3.3.

> but it seems like the one with the least benefit.

Since the set of changes was finite, there *must* be (at least) one with
the lowest benefit/cost ratio. The removal of list.sort(cmp) may well be
that one. Certainly, reasonable people can disagree as to whether the
ratio is above or below 1.0.

If cmp had not been removed, some other change would have been the worst
in this respect, and possibly the subject of a thread like this.

--
Terry Jan Reedy

Terry Reedy

unread,

Apr 1, 2011, 5:06:03 PM4/1/11

to pytho...@python.org

On 4/1/2011 2:13 AM, harrismh777 wrote:

> When I speak of implementation vs interface I am speaking from a
> strictly object oriented philosophy, as pedigree, from Grady Booch, whom
> I consider to be the father of Object Oriented Analysis and Design
> (Booch, OO A&D with apps, 1994).

Python is object based but not object oriented in the Booch sense.

> The Class interface holds a "special" firmness which fosters the client
> relationship of trust and assumption, without which OOA&D is pointless.
> The interface of a Class must not change once advertised, and once in
> production. This is specific to OOA&D.

Right, and Python is not OOA&D based.

> Never change an advertised Class interface.

In Python, class interfaces are no more sacred than module or function
interfaces. If one takes 'no interface change' literally, then Python
would have to be frozen. Even bug fixes change a defacto interface. If
you want frozon, stick with one particular release. There are lots
available.

--
Terry Jan Reedy

Terry Reedy

unread,

Apr 1, 2011, 6:57:48 PM4/1/11

to pytho...@python.org

On 4/1/2011 3:45 AM, Paul Rubin wrote:

> What happens then is you define a new interface.

Like key= versus cmp=

> In Microsoft-speak if
> the IWhatever interface needs an incompatible extension like new
> parameters, they introduce IWhatever2 which supports the new parameters.
> They change the implementation of IWhatever so it becomes a wrapper for
> IWhatever2 setting the new parameters to default values, to keep
> implementing the old behavior.

If cmp had been left (or were reinstated) but its implementation changed
internally to use cmp_to_key to make it a wrapper for key, you would be
happy? 2to3 could probably gain a fixer to change

.sort(cmp=f) # to

import functools import cmp_to_key
.sort(key=functools.cmp_to_key(f))

I know some would not like this because interface change is not their
real concern.

--
Terry Jan Reedy

Terry Reedy

unread,

Apr 1, 2011, 7:13:54 PM4/1/11

to pytho...@python.org

On 4/1/2011 2:44 AM, harrismh777 wrote:
> Terry Reedy wrote:
>> Python 3 was announced and as a mildly code breaking version at least 5
>> years before it came out.
>
> I appreciate the spirit of your arguments overall, and I do not
> necessarily disagree with much of what you are saying. I would like to
> challenge you to see this from a little different perspective, if I may.

I have, but I consider your perspective, as I understand it, unrealistic

> There are two distinct ways for looking at this "mild code breakage,"
> and it might be good to think about how we approach changes in the
> future based on an understanding of both perspectives, and consideration
> for the clients.

Guido especially and the other developers, generally, try to balance
benefit and cost. What some may not know is that we consider benefits
over several years, and benefits to future users as well as current
users. We hope and expect the former to outnumber the latter in not too
many years.

The decision, about the time of 2.2, to put off most non-bugfix
code-breaking changes until 3.0, rather than spread them over the
remainder of 2.x, was in large part based on the expressed wishes of at
least some users. (I myself would have preferred sooner and more spread
out.)

> In the possible perspective of the Python language developers 3x changes
> are mild

Compared to the changes to both the language and implementation Guido
once considered, they are! But the Perl 6 fiasco and an article by Joel
Spolsky advocating evolutionary, not revolutionary, software change
prompted him toward mininal change that would meet objectives. Many
proposed changes were rejected.

> The perspective of the Class client is something quite different.

There is no 'Class' in Python. Module and function clients have the same
perspective. Python classes are just instances of class 'type'. Modules,
instances of class 'module', are sometimes regarded as singleton
classes. Functions are instances of various classes. Methods are
functions until bound as instances of bound-method classes. Functions
are sometimes an alternative to classes, one favored by those of a more
functional rather than strictly OOP bent.

In any case, you seem to include the interface of class attributes (such
as the list.sort function) as part of the class interface. Since
anything can be a class attribute, freezing 'class interfaces' freezes
everything.

> When you get ready to change an advertised Class interface in the
> future, please consider my interface rules

Taken strictly, they pretty much prohibit anything but implementation
changes. If we allow the addition of numbered versions of modules,
classes, and functions, with nothing ever deleted, then we would have
massive bloat, with attendant maintenance and documentation problems.
Some modules might be up to v20 by now. Certainly, the switch from ascii
to unicode as the default text encoding (and I am not sure your rules
would allow that) changed the interface of nearly every module and a
large fraction of classes.

Let me reiterate that PSF and Python are not Microsoft and Office.
Resources are primarily voluntary and limited, but code and even
binaries are available indefinitely, so no one is forced by *us* to
switch any particular Python program to any other version of the language.

If you think your rules are workable, try developing a fork that obeys
them.... but wait, maybe there already is one: 2.7, which will only get
bugfixes and invisible implementation changes for several years. Of
course, implementation changes must be screened carefully because in the
real world, as opposed to the imagined Booch world, it is all to0 easy
to accidentally introduce an interface change for some corner case.

Also, an implementation change that exchanges space for time will be
seen an an interface change by somebody. Do you include space-time
behavior in *your* definition of interface (that should not be changed)?

Indeed, some object to the removal of cmp precisely on this basis, not
on the fairly trivial code rewrite it entails. This was the case with
the Google example that prompted this thread. The Googler may well have
written fresh code, as code breakage was *not* the issue. If the
list.sort change had been purely an implementation change, if cmp
remained but its had been changed to use cmp_to_key internally, there
would have been many of the same objections expressed on this thread anyway.

So why is there a problem with cmp? Because there are people who want
most of the changes that break your rules, but not this particular one.

--
Terry Jan Reedy

Paul Rubin

unread,

Apr 1, 2011, 7:22:59 PM4/1/11

to

Terry Reedy <tjr...@udel.edu> writes:
>> What happens then is you define a new interface.
> Like key= versus cmp=

Well, in an untyped language like Python, adding a feature to an
interface doesn't require defining a new interface unless you change
something incompatibly. key= is very useful but it can be added without
breaking cmp= .

> 2to3 could probably gain a fixer to change
> .sort(cmp=f) # to
>
> import functools import cmp_to_key
> .sort(key=functools.cmp_to_key(f))
>
> I know some would not like this because interface change is not their
> real concern.

Looks like a good idea. There is an efficiency hit from the above in
some situations, but at least it prevents code from breaking, so unless
there's some drawback I'm not currently spotting, it's better than
nothing and I'd endorse adding such a wrapper. 2to3 should show some
kind of diagnostic and maybe put a comment into the output code,
when it does that particular transformation, since most of the
time there's probably a better way to write the key function.

Paul Rubin

unread,

Apr 1, 2011, 7:28:16 PM4/1/11

to

Terry Reedy <tjr...@udel.edu> writes:
>> Never change an advertised Class interface.
>
> In Python, class interfaces are no more sacred than module or function
> interfaces. If one takes 'no interface change' literally, then Python
> would have to be frozen. Even bug fixes change a defacto interface.

Oh come on, a interface is advertised if it's documented in the manual,
which cmp is. There are some undocumented (what you call defacto)
interfaces that you sometimes have to be pragmatically a bit careful
about messing with because people rely on them, but it's almost always
more legitimate to break an undocumented interface than a documented
one.

Terry Reedy

unread,

Apr 1, 2011, 7:37:34 PM4/1/11

to pytho...@python.org

On 4/1/2011 3:45 AM, Paul Rubin wrote:

> What happens then is you define a new interface. In Microsoft-speak if
> the IWhatever interface needs an incompatible extension like new
> parameters, they introduce IWhatever2 which supports the new parameters.
> They change the implementation of IWhatever so it becomes a wrapper for
> IWhatever2 setting the new parameters to default values, to keep
> implementing the old behavior.

Now you have two versions, and eventually many more, to maintain and
document. That takes resources we currently do not have.

Some problems in addition to the benefits of this approach:
1. The users of IWhatever will not gain the benefits of IWhatever2.
2. Upgrading to IWhatever2 requires a change of name as well as off
parameters.
3. If only some users are upgraded, the IWhatever and IWhatever2 users
may become incompatible even though they were before, thus breaking code
without changing the old interface.

Example: Python2 added str2 (= unicode) on top of str. This had all the
problems listed above. Since CPython uses str internally, in particular
for identifiers, CPython users were stuck with the limitations of str.
Rebinding str to mean str2 has many benefits, especially in the future,
in addition to current costs.

--
Terry Jan Reedy

John Bokma

unread,

Apr 1, 2011, 7:42:11 PM4/1/11

to

Terry Reedy <tjr...@udel.edu> writes:

> But the Perl 6 fiasco

Perl 6 a complete failure? Wow, must be coming from a clueless Python
zealot. If Perl 6 is a fiasco, so is Python 3. Both are not considered
production ready, and both can be downloaded and used today:

http://rakudo.org/

Next release is planned for later this month.

Did Perl 6 take a long time? Sure. But confusing it with Python 2 ->
Python 3 is just plainly stupid. It's a complete rewrite of Perl, and
it's much easier to think of it as a complete new language instead of
similar to Perl 4 -> 5 and comparable to Python 2 -> 3.

But if you had any idea what you were talking about, you already knew
that.

--
John Bokma j3b

Blog: http://johnbokma.com/ Facebook: http://www.facebook.com/j.j.j.bokma
Freelance Perl & Python Development: http://castleamber.com/

geremy condra

unread,

Apr 1, 2011, 9:31:09 PM4/1/11

to Steven D'Aprano, pytho...@python.org

On Wed, Mar 30, 2011 at 7:13 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:

<snip>

> Or, an alternative approach would be for one of the cmp-supporters to
> take the code for Python's sort routine, and implement your own sort-with-
> cmp (in C, of course, a pure Python solution will likely be unusable) and
> offer it as a download. For anyone who knows how to do C extensions, this
> shouldn't be hard: just grab the code in Python 2.7 and make it a stand-
> alone function that can be imported.
>
> If you get lots of community interest in this, that is a good sign that
> the solution is useful and practical, and then you can push to have it
> included in the standard library or even as a built-in.
>
> And if not, well, at least you will be able to continue using cmp in your
> own code.

I don't have a horse in this race, but I do wonder how much of Python
could actually survive this test. My first (uneducated) guess is "not
very much"- we would almost certainly lose large pieces of the string
API and other builtins, and I have no doubt at all that a really
significant chunk of the standard library would vanish as well. In
fact, looking at the data I took from PyPI a while back, it's pretty
clear that Python's feature set would look very different overall if
we applied this test to everything.

Geremy Condra

Paul Rubin

unread,

Apr 1, 2011, 9:37:20 PM4/1/11

to

Terry Reedy <tjr...@udel.edu> writes:
>> IWhatever2 setting the new parameters to default values

> Now you have two versions, and eventually many more, to maintain and
> document.

In the case of cmp= there's not two interfaces needed. Python2 does a
perfectly good job supporting cmp and key with one interface. We do
have urllib and urllib2 as separate interfaces (with separate
implementations), and we keep some other legacy interfaces around too,
like the sha and md5 modules (both of which are now subsumed by
hashlib).

> That takes resources we currently do not have.

Well ok, but now you're making excuses for instability, which seems like
a problem given Python's self-marketing as a stable, production-class
system that competes with better-funded efforts like Java.

> Example: Python2 added str2 (= unicode) on top of str. This had all
> the problems listed above. Since CPython uses str internally, in
> particular for identifiers, CPython users were stuck with the
> limitations of str. Rebinding str to mean str2 has many benefits,
> especially in the future, in addition to current costs.

That really is a massive change, but a welcome one, because Python2
programs broke all the time due to missed conversions between str and
unicode. It is the type of thing that a language transition (Python2 to
Python3) is supposed to bring, that goes beyond Interface2 to
Interface3. There's been a lot of experience gained in the decades
since Python's creation and it's fine to do an overhaul after all these
years. The issues are 1) don't break stuff unless there's a substantial
benefit; and 2) don't do these major incompatible releases too often.
There should not be any incompatible Python4 before 2020 or so.

I actually think Python3 actually didn't go far enough in fixing
Python2. I'd have frankly preferred delaying it by a few years, to
allow PyPy to come to maturity and serve as the new main Python
implementation, and have that drive the language change decisions.
Instead we're going to have to give up a lot of possible improvements we
could have gotten from the new implementation.

Steven D'Aprano

unread,

Apr 2, 2011, 12:41:13 AM4/2/11

to

I don't understand what you mean by "this test".

I'm certainly not suggesting that we strip every built-in of all methods
and make everything a third-party C extension. That would be insane.

Nor do I mean that every feature in the standard library should be forced
to prove itself or be removed. The features removed from Python 3 were
deliberately few and conservative, and it was a one-off change (at least
until Python 4000 in the indefinite future). If something is in Python 3
*now*, you can assume that it won't be removed any time soon.

What I'm saying is this: cmp is already removed from sorting, and we
can't change the past. Regardless of whether this was a mistake or not,
the fact is that it is gone, and therefore re-adding it is a new feature
request. Those who want cmp functionality in Python 3 have three broad
choices:

(1) suck it up and give up the fight; the battle is lost, move on;

(2) keep arguing until they either wear down the Python developers or get
kill-filed; never give up, never surrender;

(3) port the feature that they want into a third-party module, so that
they can actually use it in code, and then when they have evidence that
the community needs and/or wants this feature, then try to have it re-
added to the language.

I'm suggesting that #3 is a more practical, useful approach than writing
another hundred thousand words complaining about what a terrible mistake
it was. Having to do:

from sorting import csort

as a prerequisite for using a comparison function is not an onerous
requirement for developers. If fans of functional programming can live
with "from functools import reduce", fans of cmp can live with that.

--
Steven

Steven D'Aprano

unread,

Apr 2, 2011, 12:42:46 AM4/2/11

to

On Fri, 01 Apr 2011 14:37:20 -0700, Paul Rubin wrote:

> I actually think Python3 actually didn't go far enough in fixing
> Python2. I'd have frankly preferred delaying it by a few years, to
> allow PyPy to come to maturity and serve as the new main Python
> implementation, and have that drive the language change decisions.
> Instead we're going to have to give up a lot of possible improvements we
> could have gotten from the new implementation.

There's always Python 4000 :)

--
Steven

Steven D'Aprano

unread,

Apr 2, 2011, 12:58:58 AM4/2/11

to

On Fri, 01 Apr 2011 13:42:11 -0600, John Bokma wrote:

> Terry Reedy <tjr...@udel.edu> writes:
>
>> But the Perl 6 fiasco
>
> Perl 6 a complete failure?

"Fiasco" does not mean "complete failure". It is a debacle, an
embarrassing, serious failure, (and also an Italian wine bottle with a
rounded bottom), but not necessarily complete. It does not imply that a
fiasco cannot, with great effort and/or time, be eventually recovered
from. Netscape Navigator 6 was a fiasco, which directly lead to the
dominance of Internet Explorer in the browser market, but today the heir
of Netscape, Mozilla's Firefox browser, has regained a majority of the
browser market in Europe and is catching up on IE world-wide. Those who
are old enough will remember that Microsoft Word 3.0 was a fiasco, but
there's no doubt that Word has well recovered to virtually own the entire
word processing market.

> Wow, must be coming from a clueless Python zealot.

Thanks for sharing.

> If Perl 6 is a fiasco, so is Python 3. Both are not considered
> production ready, and both can be downloaded and used today:

This is FUD about Python 3. Python 3 is absolutely production ready. The
only reason to avoid Python 3 is if your software relies on a specific
third-party library that does not yet support Python 3.

On the other hand, the PerlFAQ still describes Perl 6 as not ready:

http://faq.perl.org/perlfaq1.html#What_are_Perl_4_Perl
http://faq.perl.org/perlfaq1.html#What_is_Perl_6_

"Perl 6 is the next major version of Perl, although it's not intended to
replace Perl 5. It's still in development in both its syntax and design.
The work started in 2002 and is still ongoing. ..."

"Perl 6 is not scheduled for release yet ..."

Nine years after Perl 6 was started, neither the syntax nor design are
settled.

The initial PEP for Python 3000 development was in 2006; the first final
release of Python 3 was in 2008, but I don't count that because Python
3.0 was fatally flawed and is no longer supported. The first production
ready release of Python 3.1 was 2009: three years from talking to a
production-ready version.

> Did Perl 6 take a long time? Sure. But confusing it with Python 2 ->
> Python 3 is just plainly stupid. It's a complete rewrite of Perl, and
> it's much easier to think of it as a complete new language instead of
> similar to Perl 4 -> 5 and comparable to Python 2 -> 3.

What you have described is not a reason for rejecting the claim that Perl
6 was a fiasco, but the reason for *why* it was a fiasco.

--
Steven

geremy condra

unread,

Apr 2, 2011, 1:22:01 AM4/2/11

to Steven D'Aprano, pytho...@python.org

On Fri, Apr 1, 2011 at 5:41 PM, Steven D'Aprano

I mean testing whether a feature should be in Python based on whether
it can meet some undefined standard of popularity if implemented as a
third-party module or extension.

> I'm certainly not suggesting that we strip every built-in of all methods
> and make everything a third-party C extension. That would be insane.

Granted, but I think the implication is clear: that only those
features which could be successful if implemented and distributed by a
third party should be in Python. My argument is that there are many
features currently in Python that I doubt would pass that test, but
which should probably be in anyway. The conclusion I draw from that is
that this isn't a particularly good way to determine whether something
should be in standard Python.

> Nor do I mean that every feature in the standard library should be forced
> to prove itself or be removed. The features removed from Python 3 were
> deliberately few and conservative, and it was a one-off change (at least
> until Python 4000 in the indefinite future). If something is in Python 3
> *now*, you can assume that it won't be removed any time soon.

I may have been unclear, so let me reiterate: I'm not under the
impression that you're advocating this as a course of action. I'm just
pointing out that the standard for inclusion you're advocating is
probably not a particularly good one, especially in this case, and
engaging in a bit of a thought experiment about what would happen if
other parts of Python were similarly scrutinized.

> What I'm saying is this: cmp is already removed from sorting, and we
> can't change the past. Regardless of whether this was a mistake or not,
> the fact is that it is gone, and therefore re-adding it is a new feature
> request. Those who want cmp functionality in Python 3 have three broad
> choices:

I might quibble over whether re-adding is the same as a new feature
request, but as I said- I don't care about cmp.

> (1) suck it up and give up the fight; the battle is lost, move on;
>
> (2) keep arguing until they either wear down the Python developers or get
> kill-filed; never give up, never surrender;
>
> (3) port the feature that they want into a third-party module, so that
> they can actually use it in code, and then when they have evidence that
> the community needs and/or wants this feature, then try to have it re-
> added to the language.
>
> I'm suggesting that #3 is a more practical, useful approach than writing
> another hundred thousand words complaining about what a terrible mistake
> it was. Having to do:
>
> from sorting import csort
>
> as a prerequisite for using a comparison function is not an onerous
> requirement for developers. If fans of functional programming can live
> with "from functools import reduce", fans of cmp can live with that.

And that's fine, as I said I don't have a horse in this race. My point
is just that I don't think the standard you're using is a good one-
ISTM that if it *had* been applied evenly we would have wound up with
a much less complete (and much less awesome) Python than we have
today. That indicates that there are a reasonable number of real-world
cases where it hasn't and shouldn't apply.

Geremy Condra

Terry Reedy

unread,

Apr 2, 2011, 2:21:33 AM4/2/11

to pytho...@python.org

On 4/1/2011 3:22 PM, Paul Rubin wrote:

>> 2to3 could probably gain a fixer to change
>> .sort(cmp=f) # to
>>
>> import functools import cmp_to_key
>> .sort(key=functools.cmp_to_key(f))
>>
>> I know some would not like this because interface change is not their
>> real concern.
>
> Looks like a good idea. There is an efficiency hit from the above in
> some situations, but at least it prevents code from breaking, so unless
> there's some drawback I'm not currently spotting, it's better than
> nothing and I'd endorse adding such a wrapper. 2to3 should show some
> kind of diagnostic and maybe put a comment into the output code,
> when it does that particular transformation, since most of the
> time there's probably a better way to write the key function.

rewriting cmp_to_key in C is underway

http://bugs.python.org/issue11707

--
Terry Jan Reedy

Paul Rubin

unread,

Apr 2, 2011, 2:29:59 AM4/2/11

to

Steven D'Aprano <steve+comp....@pearwood.info> writes:
> What I'm saying is this: cmp is already removed from sorting, and we
> can't change the past. Regardless of whether this was a mistake or
> not,

No it's not already removed, I just tried it (in Python 2.6, which is
called "Python" for short) and it still works. It's not "removed" from
Python until basically all Python users have migrated and "Python"
essentially always means "Python 3". Until that happens, for Python 2
users, Python 3 is just a fork of Python with some stuff added and some
stuff broken, that might get its act together someday. I see in the
subject of this thread, "Guido rethinking removal of cmp from sort
method" which gives hope that one particular bit of breakage might get
fixed.

> the fact is that it is gone, and therefore re-adding it is a new feature
> request. Those who want cmp functionality in Python 3 have three broad

> choices: ...
> (3) port the feature that they want into a third-party module, ...
> I'm suggesting that #3 is a more practical, useful approach ...
> Having to do:
> from sorting import csort ...

> If fans of functional programming can live
> with "from functools import reduce", fans of cmp can live with that.

If "sorting" is in the stdlib like functools is, then the similarity
makes sense and the suggestion isn't so bad. But you're proposing a 3rd
party module, which is not the same thing at all. "Batteries included"
actually means something, namely that you don't have to write your
critical applications using a library base written with a Wikipedia-like
development model where anybody can ship anything, where you're expected
to examine every module yourself before you can trust it. Stuff in the
stdlib occasionally has bugs or gaps, but it has a generally consistent
quality level, is documented, and has been reviewed and reasonably
sanity checked by a central development group that knows what it's
doing. Stuff in 3rd party libraries has none of the above. There are
too many places for it to go wrong and I've generally found it best to
stick with stdlib modules instead of occasionally superior modules that
have the disadvantage of coming from a third party.

Paul Rubin

unread,

Apr 2, 2011, 2:31:26 AM4/2/11

to

Steven D'Aprano <steve+comp....@pearwood.info> writes:

> There's always Python 4000 :)

Is that on the boards yet?