Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

Proposal: adding rich comparisons to Python

6 views
Skip to first unread message

David Ascher

unread,
Apr 6, 1998, 3:00:00 AM4/6/98
to

Note: the long-term address of this protocol is:

http://starship.skyport.net/~da/rich/proposal.html

That page also has links to the patches, for those who care.

I'm CC'ing the Matrix-SIG so that those folks can be made aware of the
issue, but I feel the discussion should really take place on the main
list, since it's a Python-wide change for the sake of NumPy (and others).

I await your constructive criticism,

--david

-----------------------------------------------------------------

Proposal for adding of a new "rich comparison" protocol to Python

David Ascher (d...@skivs.ski.org)
version 0.1
April 6, 1998

Abstract

A new mechanism allowing comparisons of Python objects to return
values other than -1, 0, or 1 (or raise exceptions) is proposed.
This mechanism is entirely backwards compatible, and can be
controlled at the level of the C PyObject type or of the Python
class definition. There are three cooperating parts to the
proposed mechanism:

* the use of the last slot in the type object structure to store
a pointer to a rich comparison

* the addition of special methods for classes

* the addition of an optional argument to the builtin 'cmp()'
function.

Motivation

The current comparison protocol for Python objects assumes that
any two Python objects can be compared (as of Python 1.5, object
comparisons can raise exceptions), and that the return value for
any comparison should be -1, 0 or 1. -1 indicates that the first
argument to the comparison function is less than the right one, +1
indicating the contrapositive, and 0 indicating that the two
objects are equal. While this mechanism allows the establishment
of a order relationship (e.g. for use by the sort() method of list
objects), it has proven to be limited in the context of Numeric
Python (NumPy).

Specifically, NumPy allows the creation of multidimensional
arrays, which support most of the numeric operators. Thus::

x = array((1,2,3,4))

y = array((2,2,4,4))

are two NumPy arrays. While they can be added elementwise,::

z = x + y # z == array((3,4,7,8))

they cannot be compared in the current framework - the released
version of NumPy compares the *pointers*, which was the only
solution before the recent addition of the ability (in
1.5) to raise exceptions in comparison functions.

Even with the ability to raise exceptions, the current protocol
makes array comparisons useless. To deal with this fact, NumPy
includes several functions which perform the comparisons:
'less()', 'less_equal()', 'greater()', 'greater_equal()',
'equal()', 'not_equal()'. These functions return arrays with the
same shape as their arguments (modulo broadcasting), filled with
0's and 1's depending on whether the comparison is true or not for
each element pair. Thus, for example, using the arrays 'x' and
'y' defined above: 'less(x,y)' would be an array containing the
numbers '(1,0,0,0)'.

The current proposal is to modify the Python object interface to
allow the NumPy package to make it so that x < y returns the same
thing as 'less(x,y)'. The exact return value is up to the NumPy
package -- what this proposal *really asks for* is changing the
Python core so that extension objects have *the ability* to return
something other than -1, 0, 1, should their authors choose to do
so.

Current State of Affairs

The current protocol is, at the C level, that each object type
defines a tp_compare slot, which is a pointer to a function which
takes two 'PyObject*' references and returns -1, 0, or 1. This
function is called by the 'PyObject_Compare()' function defined in
the C API. 'PyObject_Compare()' is also called by the builtin
function 'cmp()' which takes two arguments.

Proposed Mechanism

1 Changes to the C structure for type objects

The last availabel slot in the PyTypeObject, reserved up to now
for future expansion, is used to optionally store a pointer to a
new comparison function, of type 'richcmpfunc' defined by::

typedef PyObject *(*richcmpfunc) Py_PROTO((PyObject *, PyObject *, int));

This function takes three arguments. The first two are the
objects to be compared, and the third is an integer corresponding
to an opcode (one of LT, LE, EQ, NE, GT, GE). If this slot is
left NULL, then rich comparison for that object type is not
supported (except for class instances whose class provide the
special methods described below).

The above opcodes need to be added to the published Python/C API
(probably under the names 'Py_LT', 'Py_LE', etc.)

2 Additions of special methods for classes

Classes wishing to support the rich comparison mechanisms must add
one or more of the following new special methods::

def __less__(self, other):
...
def __less_equal__(self, other):
...
def __greater__(self, other):
...
def __greater__equal(self, other):
...
def __equal__(self, other):
...
def __not_equal__(self, other):
...

Each of these is called when the class instance is the on the
left-hand-side of the corresponding operators ('<', '<=', '>',
'>=', '==', and '!=' or '<>'). The argument 'other' is set to
the object on the right side of the operator. The return value of
these methods is up to the class implementor (after all, that's
the entire point of the proposal).

If the object on the left side of the operator does not define an
appropriate rich comparison operator (either at the C level or
with one of the special methods, then the comparison is reversed,
and the right hand operator is called with the *opposite*
operator, and the two objects are swapped. This assumes that 'a <
b' and 'b >= a' are equivalent, as are 'a > b' and 'b <= a', and that
'==' and '!=' are commutative.

For example, if 'obj1' is an object which supports the rich
comparison protocol and x and y are objects which do not support the
rich comparison protocol, then 'obj1 < x' will call the '__less__'
method of 'obj1' with x as the second argument. 'x < obj1' will
call obj1's '__greater_equal__' method with x as a second
argument, and 'x < y' will just use the existing (non-rich)
comparison mechanism.

The above mechanism is such that classes can get away with not
implementing either '__less__' and '__less_equal' or '__greater__'
and '__greater_equal__'. Further smarts could have been added to
the comparison mechanism, but this limited set of allowed "swaps"
was chosen because it doesn't require the infrastructure to do any
processing (negation) of return values. The choice of six special
methods was made over a single (e.g. '__richcmp__') method to
allow the dispatching on the opcode to be performed at the level
of the C implementation rather than the user-defined method.

3 Addition of an optional argument to the builtin 'cmp()'

The builtin cmp() is still used for simple comparisons. For rich
comparisons, it is called with a third argument, one of "<", "<=",
">", ">=", "==", "!=", "<>" (the last two have the same meaning).
When called with one of these strings as the third argument, cmp()
can return any Python object. Otherwise, it can only return -1, 0
or 1 as before.

Consequence Regarding Chained Comparisons

Problem

Objects for which the comparison returns something other than -1,
0, or 1 may be used in chained comparisons, such as::

x < y > z

This is interpreted by Python as::

if x < y:
if y > z:
return y > z # 1 for plain comparisons
else:
return y > z # 0 for plain comparisons
else:
return x < y # 0 for plain comparisons

Note that this requires testing the truth value of the result of
comparisons, with potential "shortcutting" of the right-side
comparison testings. In other words, the truth-value of the
result of a comparison determines the result of a chained
operation. This is problematic in the case of arrays, since if
'x', 'y' and 'z' are three arrays, then the user expects::

x < y < z

to be an array of 0's and 1's where 1's are in the locations
corresponding to the elements of 'y' which are between the
corresponding elements in 'x' and 'z'. In other words, the
right-hand side must be evaluated regardless of the result of 'x < y',
which is incompatible with the mechanism currently in use by the
parser.

Solution

If comparisons return objects which raise an exception when their
truth condition is being queried, then no user code will yield
surprising results. The use of helper functions (e.g.
'chain_cmp(x, "<", y, ">", z)') is needed to fulfill the required
functionality.

Patches

I have patches against Python 1.5 which implement this proposal
(modulo bugs).
The following files were modified:

* Include/object.h
* Objects/object.c
* Python/bltinmodule.c
* Python/ceval.c

I also have patches which allow the use of this feature in NumPy,
but I'll delay discussion of those until this issue is settled (I
have bigger plans for NumPy =).

Note that these patches are not what I propose for the final version
-- for one there's at least one place where I'm not sure what the
right behavior is, and for another I don't publish the opcodes in the
Python API. That's trivial and left up to Guido to decide on the
names.

Michael Hudson

unread,
Apr 19, 1998, 3:00:00 AM4/19/98
to David Ascher

Now, I am well prepared to admit that I could be missing the point
entirely, but what then would be the meaning of


x = array((1,2,3,4))
y = array((2,2,4,4))

if x<y:
fooey()

Also this seems to make chained comparisions overly complex, IMHO.#
Regards
Michael Hudson
Jesus College
Cambridge
mw...@cam.ac.uk

0 new messages