Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

"Natural" use of cmp= in sort

74 views
Skip to first unread message

Paddy

unread,
Nov 10, 2014, 1:45:15 PM11/10/14
to
Hi, I do agree with Raymond H. about the relative merits of cmp= and key= in sort/sorted, but I decided to also not let natural uses of cmp= pass silently.

In answering this question, http://stackoverflow.com/a/26850434/10562 about ordering subject to inequalities it seemed natural to use the cmp= argument of sort rather than key=.

The question is about merging given inequalities to make 1 inequality such that the inequalities also stays true.


Here is a copy of my code:

Python 2.7.5 (default, May 15 2013, 22:43:36) [MSC v.1500 32 bit (Intel)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> ineq = """f4 > f2 > f3
f4 > f1 > f3
f4 > f2 > f1
f2 > f1 > f3"""
>>> print(ineq)
f4 > f2 > f3
f4 > f1 > f3
f4 > f2 > f1
f2 > f1 > f3
>>> greater_thans, all_f = set(), set()
>>> for line in ineq.split('\n'):
....tokens = line.strip().split()[::2]
....for n, t1 in enumerate(tokens[:-1]):
........for t2 in tokens[n+1:]:
............greater_thans.add((t1, t2))
............all_f.add(t1)
........all_f.add(t2)


>>> sorted(all_f, cmp=lambda t1, t2: 0 if t1==t2 else
...........(1 if (t1, t2) not in greater_thans else -1))
['f4', 'f2', 'f1', 'f3']
>>>

Peter Otten

unread,
Nov 10, 2014, 2:20:29 PM11/10/14
to pytho...@python.org
I'm not sure this works. I tried:

$ cat paddy.py
ineq = """f4 > f2 > f3
f4 > f1 > f3
f4 > f2 > f1
f2 > f1 > f3
f3 > f5
"""

greater_thans = set()
all_f = set()

for line in ineq.split('\n'):
tokens = line.strip().split()[::2]
for n, t1 in enumerate(tokens[:-1]):
for t2 in tokens[n+1:]:
greater_thans.add((t1, t2))
all_f.add(t1)
all_f.add(t2)

print all_f
print greater_thans

print sorted(all_f, cmp=lambda t1, t2: 0 if t1==t2 else
(1 if (t1, t2) not in greater_thans else -1))
$ PYTHONHASHSEED=0 python paddy.py
set(['f1', 'f2', 'f3', 'f4', 'f5'])
set([('f1', 'f3'), ('f2', 'f1'), ('f2', 'f3'), ('f4', 'f3'), ('f4', 'f2'),
('f4', 'f1'), ('f3', 'f5')])
['f4', 'f2', 'f1', 'f3', 'f5']
$ PYTHONHASHSEED=1 python paddy.py
set(['f5', 'f4', 'f3', 'f2', 'f1'])
set([('f1', 'f3'), ('f2', 'f3'), ('f2', 'f1'), ('f4', 'f1'), ('f3', 'f5'),
('f4', 'f3'), ('f4', 'f2')])
['f5', 'f4', 'f2', 'f1', 'f3']


Ian Kelly

unread,
Nov 10, 2014, 2:44:39 PM11/10/14
to Python
On Mon, Nov 10, 2014 at 12:19 PM, Peter Otten <__pet...@web.de> wrote:
> I'm not sure this works. I tried:

Here's a simpler failure case.

>>> ineq = """f2 > f3
... f3 > f1"""

[Previously posted code elided]

>>> greater_thans
set([('f3', 'f1'), ('f2', 'f3')])
>>> sorted(all_f, cmp=lambda t1, t2: 0 if t1==t2 else
... (1 if (t1, t2) not in greater_thans else -1))
['f1', 'f2', 'f3']

Note that the greater_thans set is missing the implication by
transitivity that f2 > f1, so the given cmp function would
inconsistently return -1 for both comparisons cmp('f1', 'f2') and
cmp('f2', 'f1').

Paddy

unread,
Nov 10, 2014, 9:04:44 PM11/10/14
to
On Monday, 10 November 2014 19:44:39 UTC, Ian wrote:
> On Mon, Nov 10, 2014 at 12:19 PM, Peter Otten <xxx@yyy> wrote:
> > I'm not sure this works. I tried:
>
> Here's a simpler failure case.
>
> >>> ineq = """f2 > f3
> ... f3 > f1"""
>
> [Previously posted code elided]
>
> >>> greater_thans
> set([('f3', 'f1'), ('f2', 'f3')])
> >>> sorted(all_f, cmp=lambda t1, t2: 0 if t1==t2 else
> ... (1 if (t1, t2) not in greater_thans else -1))
> ['f1', 'f2', 'f3']
>
> Note that the greater_thans set is missing the implication by
> transitivity that f2 > f1, so the given cmp function would
> inconsistently return -1 for both comparisons cmp('f1', 'f2') and
> cmp('f2', 'f1').

Thanks. I will look into this...

Paddy

unread,
Nov 10, 2014, 10:10:03 PM11/10/14
to
On Monday, 10 November 2014 18:45:15 UTC, Paddy wrote:
> Hi, I do agree with Raymond H. about the relative merits of cmp= and key= in sort/sorted, but I decided to also not let natural uses of cmp= pass silently.
>
> In answering this question, http://stackoverflow.com/a/26850434/10562 about ordering subject to inequalities it seemed natural to use the cmp= argument of sort rather than key=.
>
> The question is about merging given inequalities to make 1 inequality such that the inequalities also stays true.
>
>

Thanks Peter, Ian. I have modified my code to expand transitive relations and ask you to view it on stackoverflow via the original link (as posting code on newsgroups is an ugly hack).

My main reason for the post to c.l.p remains though; it seems like a *natural* use of the cmp= comparator function to sorted rather than using key= .

Ian Kelly

unread,
Nov 11, 2014, 1:37:18 AM11/11/14
to Python
On Mon, Nov 10, 2014 at 8:09 PM, Paddy <padd...@gmail.com> wrote:
> On Monday, 10 November 2014 18:45:15 UTC, Paddy wrote:
>> Hi, I do agree with Raymond H. about the relative merits of cmp= and key= in sort/sorted, but I decided to also not let natural uses of cmp= pass silently.
>>
>> In answering this question, http://stackoverflow.com/a/26850434/10562 about ordering subject to inequalities it seemed natural to use the cmp= argument of sort rather than key=.
>>
>> The question is about merging given inequalities to make 1 inequality such that the inequalities also stays true.
>>
>>
>
> Thanks Peter, Ian. I have modified my code to expand transitive relations and ask you to view it on stackoverflow via the original link (as posting code on newsgroups is an ugly hack).

You still run into trouble though if the given inequalities don't
provide enough information for a total ordering. E.g.:

>>> ' > '.join(extract_relations("""f4 > f1
... f2 > f3"""))
'f1 > f2 > f3 > f4'

By adding some debugging prints, we can see what cmp calls were made
by the sort routine and what the results were:

cmp('f2', 'f1') -> 1
cmp('f3', 'f2') -> 1
cmp('f4', 'f3') -> 1

There is no information about the relative order of f2 and f1, so the
cmp function just returns 1 there.
f2 is known to be greater than f3, so that call correctly returns 1.
There is again no information about the relative order of f4 and f3,
so it again just returns 1. However, this is inconsistent with the
first comparison that placed f1 > f2, because it implies that f1 > f4.

As you can see, giving an inconsistent cmp function to sort produces
bogus results. If you only have a partial ordering of the inputs, you
need to make sure that the cmp function you provide is consistent with
*some* total ordering.

Another issue is that your expand_transitive_relations function is I
think O(n**3 log n), which looks unattractive compared to the O(n**2)
topological sort given in the other answers. Another advantage of the
topological sort is that it will detect if the graph is cyclic (i.e.
the input data itself is inconsistent), rather than just return a
bogus output.

> My main reason for the post to c.l.p remains though; it seems like a *natural* use of the cmp= comparator function to sorted rather than using key= .

There are cases where a cmp function is more natural than a key
function, but for these we have the functools.cmp_to_key adapter.

Paddy

unread,
Nov 11, 2014, 2:45:08 AM11/11/14
to
On Tuesday, 11 November 2014 06:37:18 UTC, Ian wrote:
Thanks Ian. The original author states "...and it is sure that the given inputs will give an output, i.e., the inputs will always be valid.", which could be taken as meaning that all inputs are sufficient, well formed, and contain all relations as their first example does.

In that case, expand_transitive_relations is not even needed. Lets say it isn't for the sake of argument, then we are left with the direct use of cmp= versus a conversion to a key= function.

It seems to me that *in this case* the cmp= function naturally flows from the solution algorithm and that cmp_to_key is less so.

Yes, I knew that there are cases where a cmp function is more natural than key; the idea is to squirrel out a few. We have already made the, (well reasoned in my opinion), decision to go down the key= route in Python 3. I also like to track where my algorithms might originally map to cmp=. (It is not often).

My only other case of this type is here: http://stackoverflow.com/questions/15797120/can-this-cmp-function-be-better-written-as-a-key-for-sorted.

Ian Kelly

unread,
Nov 11, 2014, 4:07:14 AM11/11/14
to Python
On Tue, Nov 11, 2014 at 12:44 AM, Paddy <padd...@gmail.com> wrote:
> Thanks Ian. The original author states "...and it is sure that the given inputs will give an output, i.e., the inputs will always be valid.", which could be taken as meaning that all inputs are sufficient, well formed, and contain all relations as their first example does.

Well, I brought it up because the start of that sentence is "There can
be multiple inequalities as answer but I need any one which is
correct...". The only way there would be more than one correct answer
would be if the inputs were only partially ordered. I take the second
part of the sentence as meaning only that the input can be safely
assumed to be consistent.

> Yes, I knew that there are cases where a cmp function is more natural than key; the idea is to squirrel out a few. We have already made the, (well reasoned in my opinion), decision to go down the key= route in Python 3. I also like to track where my algorithms might originally map to cmp=. (It is not often).

Basically any time you have a comparison that isn't easily expressed
by mapping the values to some bunch of ordered objects. As an example
of what I mean, we can easily sort alphabetically using a key
function:

sorted(items, key=str)

And we can easily sort reverse alphabetically:

sorted(items, key=str, reverse=True)

And we can easily sort alphabetically on multiple criteria:

sorted(items, key=lambda x: (str(x.last_name), str(x.first_name)))

But what if we want to combine those last two requirements? Suppose we
have a table that we want to present sorted primarily in ascending
order on one column, and secondarily in descending order on another.
We can't just add the reverse argument argument to the previous
expression, because that would reverse the sort order for both
columns. But there is no simple key function for strings that will
reverse the sort without using the reverse argument.

There are basically two ways to approach this. One is to implement a
cmp function and pass it to cmp_to_key. The other is to create a class
with comparison methods that result in the desired order, and use the
class as the key; this is really just a variation ot the cmp_to_key
approach.

Paddy

unread,
Nov 11, 2014, 4:21:32 AM11/11/14
to
On Tuesday, 11 November 2014 09:07:14 UTC, Ian wrote:
> On Tue, Nov 11, 2014 at 12:44 AM, Paddy <paddyxxx-at-xmail.com> wrote:
> > Thanks Ian. The original author states "...and it is sure that the given inputs will give an output, i.e., the inputs will always be valid.", which could be taken as meaning that all inputs are sufficient, well formed, and contain all relations as their first example does.
>
> Well, I brought it up because the start of that sentence is "There can
> be multiple inequalities as answer but I need any one which is
> correct...". The only way there would be more than one correct answer
> would be if the inputs were only partially ordered. I take the second
> part of the sentence as meaning only that the input can be safely
> assumed to be consistent.
>
> > Yes, I knew that there are cases where a cmp function is more natural than key; the idea is to squirrel out a few. We have already made the, (well reasoned in my opinion), decision to go down the key= route in Python 3. I also like to track where my algorithms might originally map to cmp=. (It is not often).
>
> Basically any time you have a comparison that isn't easily expressed
> by mapping the values to some bunch of ordered objects.

Yep. I want to track when this comes up for me and others during their normal programming rather than in examples made to highlight the issue.

Ian Kelly

unread,
Nov 11, 2014, 1:07:27 PM11/11/14
to Python
The example that I posted is one that I recall being brought up on
this list in the past, but I don't have a link for you.

Paddy

unread,
Nov 12, 2014, 3:09:22 AM11/12/14
to
On Tuesday, 11 November 2014 18:07:27 UTC, Ian wrote:
> The example that I posted is one that I recall being brought up on
> this list in the past, but I don't have a link for you.

THanks Ian for your help in this.
0 new messages