For the first time I have come across a Python feature that seems
completely wrong. After the introduction of rich comparisons, equality
comparison does not have to return a truth value, and may indeed return
nothing at all and throw an error instead. As a result, code like
if foo == bar:
or
foo in alist
cannot be relied on to work.
This is clearly no accident. According to the documentation all comparison
operators are allowed to return non-booleans, or to throw errors. There is
explicitly no guarantee that x == x is True.
Personally I would like to get these !@#$%&* misfeatures removed, and
constrain the __eq__ function to always return a truth value. That is
clearly not likely to happen. Unless I have misunderstood something, could
somebody explain to me
1) Why was this introduced? I can understand relaxing the restrictions on
'<', '<=' etc. - after all you cannot define an ordering for all types of
object. But surely you can define an equal/unequal classification for all
types of object, if you want to? Is it just the numpy people wanting to
type 'a == b' instead of 'equals(a,b)', or is there a better reason?
2) If I want to write generic code, can I somehow work around the fact
that
if foo == bar:
or
foo in alist
does not work for arbitrary objects?
Yours,
Rasmus
Some details:
CCPN has a table display class that maintains a list of arbitrary objects,
one per line in the table. The table class is completely generic, and
subclassed for individual cases. It contains the code:
if foo in tbllist:
    ...
else:
    ...
    tbllist.append(foo)
    ...
One day the 'if' statement gave this rather obscure error:
"ValueError:
The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()"
A subclass had used objects passed in from some third party code, and as
it turned out foo happened to be a tuple containing a tuple containing a
numpy array.
Some more precise tests gave the following:
# Python 2.5.2 (r252:60911, Jul 31 2008, 17:31:22)
# [GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
# set up
import numpy
a = float('NaN')
b = float('NaN')
ll = [a,b]
c = numpy.zeros((2,3))
d = numpy.zeros((2,3))
mm = [c,d]
# try NaN
print (a == a) # gives False
print (a is a) # gives True
print (a == b) # gives False
print (a is b) # gives False
print (a in ll) # gives True
print (b in ll) # gives True
print (ll.index(a)) # gives 0
print (ll.index(b)) # gives 1
# try numpy array
print (c is c) # gives True
print (c is d) # gives False
print (c in mm) # gives True
print (mm.index(c)) # 0
print (c == c) # gives [[ True True True][ True True True]]
print (c == d) # gives [[ True True True][ True True True]]
print (bool(1 == c)) # raises error - see below
print (d in mm) # raises error - see below
print (mm.index(d)) # raises error - see below
print (c in ll) # raises error - see below
print (ll.index(c)) # raises error - see below
The error was the same in each case:
"ValueError:
The truth value of an array with more than one element is ambiguous.
Use a.any() or a.all()"
---------------------------------------------------------------------------
Dr. Rasmus H. Fogh Email: r.h....@bioc.cam.ac.uk
Dept. of Biochemistry, University of Cambridge,
80 Tennis Court Road, Cambridge CB2 1GA, UK. FAX (01223)766002
You have touched on a real and known issue that accompanies dynamic
typing and the design of Python. *Every* Python function can return any
Python object and may raise any exception either actively, by design, or
passively, by not catching exceptions raised in the functions *it* calls.
> Personally I would like to get these !@#$%&* misfeatures removed,
What you are calling a misfeature is an absence, not a presence that can
be removed.
> and constrain the __eq__ function to always return a truth value.
It is impossible to do that with certainty by any mechanical
creation-time checking. So the implementation of operator.eq would have
to check the return value of the ob.__eq__ function it calls *every
time*. That would slow down the speed of the 99.xx% of cases where the
check is not needed and would still not prevent exceptions. And if the
return value was bad, all operator.eq could do is raise an exception
anyway.
> That is clearly not likely to happen. Unless I have misunderstood something, could
> somebody explain to me.
a. See above.
b. Python programmers are allowed to define 'weird' but possibly
useful-in-context behaviors, such as trying out three-valued logic, or
operating on collections element by element (as with numpy).
> 1) Why was this introduced?
The 6 comparisons were previously done with one __cmp__ function that
was supposed to return -1, 0, or 1 and which worked with negative, 0, or
positive response, but which could return anything or raise an
exception. The compare functions could mask but not prevent weird returns.
> I can understand relaxing the restrictions on
> '<', '<=' etc. - after all you cannot define an ordering for all types of
> object. But surely you can define an equal/unequal classification for all
> types of object, if you want to? Is it just the numpy people wanting to
> type 'a == b' instead of 'equals(a,b)', or is there a better reason?
>
> 2) If I want to write generic code, can I somehow work around the fact
> that
> if foo == bar:
> or
> foo in alist
> does not work for arbitrary objects?
Every Python function is 'generic' unless restrained by type tests.
However, even 'generic' functions can only work as expected with objects
that meet the assumptions embodied in the function. In my Python-based
algorithm book-in-progress, I am stating this explicitly. In particular,
I say that the book only applies to objects for which '==' gives a
boolean result that is reflexive, symmetric, and transitive. This
excludes float('nan'), for instance (as I see you discovered), which
follows the IEEE mandate to act otherwise.
> CCPN has a table display class that maintains a list of arbitrary objects,
> one per line in the table. The table class is completely generic,
but only for the objects that meet the implied assumption. This is true
for *all* Python code. If you want to apply the function to other
objects, you must either adapt the function or adapt or wrap the objects
to give them an interface that does meet the assumptions.
> and subclassed for individual cases. It contains the code:
>
> if foo in tbllist:
>     ...
> else:
>     ...
>     tbllist.append(foo)
>     ...
>
> One day the 'if' statement gave this rather obscure error:
> "ValueError:
> The truth value of an array with more than one element is ambiguous.
> Use a.any() or a.all()"
> A subclass had used objects passed in from some third party code, and as
> it turned out foo happened to be a tuple containing a tuple containing a
> numpy array.
Right. 'in' calls '==' and assumes a boolean return. Assumption
violated, exception raised. Completely normal. The error message even
suggests a solution: wrap the offending objects in an adaptor class that
gives them a normal interface with .all (or perhaps the all() builtin).
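Such an adaptor can be sketched in a few lines (a hypothetical class, not from the thread; the name ArrayAdaptor and the reliance on a numpy-style .all() method are assumptions):

```python
class ArrayAdaptor(object):
    """Hypothetical adaptor: wraps an object whose __eq__ may return an
    array-like result, and reduces that result to a plain bool so that
    'if' tests and list containment behave normally again."""
    def __init__(self, obj):
        self.obj = obj
    def __eq__(self, other):
        if isinstance(other, ArrayAdaptor):
            other = other.obj
        result = (self.obj == other)
        # Element-wise results (numpy-style) expose .all(); reduce them.
        if hasattr(result, 'all'):
            return bool(result.all())
        return bool(result)
    def __ne__(self, other):
        return not self.__eq__(other)
```

Wrapping the table entries, e.g. wrapped = [ArrayAdaptor(x) for x in tbllist], would then let the 'foo in tbllist' test run without the ValueError.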
Terry Jan Reedy
That's not quite true. Rich comparisons explicitly allow non-boolean return
values. Breaking up __cmp__ into multiple __special__ methods was not the sole
purpose of rich comparisons. One of the prime examples at the time was numpy
(well, Numeric at the time). We wanted to use == to be able to return an array
with boolean values where the two operand arrays were equal. E.g.
In [1]: from numpy import *
In [2]: array([1, 2, 3]) == array([4, 2, 3])
Out[2]: array([False, True, True], dtype=bool)
SQLAlchemy uses these operators to build up objects that will be turned into SQL
expressions.
>>> print users.c.id==addresses.c.user_id
users.id = addresses.user_id
Basically, the idea was to turn these operators into full-fledged operators like
+-/*. Returning a non-boolean violates neither the letter, nor the spirit of the
feature.
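A toy sketch of that expression-building style (purely illustrative; Column and Expression are invented names, not SQLAlchemy's real classes):

```python
class Column(object):
    """Toy illustration of an expression-building __eq__."""
    def __init__(self, name):
        self.name = name
    def __eq__(self, other):
        # Instead of a truth value, return an object *describing* the
        # comparison, to be rendered as SQL text later.
        return Expression(self, other)

class Expression(object):
    def __init__(self, left, right):
        self.left, self.right = left, right
    def __str__(self):
        right = (self.right.name if isinstance(self.right, Column)
                 else repr(self.right))
        return "%s = %s" % (self.left.name, right)

print(Column("users.id") == Column("addresses.user_id"))
# prints: users.id = addresses.user_id
```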
Unfortunately, if you do overload __eq__ to build up expressions or whatnot, the
other places where users of __eq__ are implicitly expecting a boolean break.
While I was (and am) a supporter of rich comparisons, I feel Rasmus's pain from
time to time. It would be nice to have an alternate method to express the
boolean "yes, this thing is equal in value to that other thing". Unfortunately,
I haven't figured out a good way to fit it in now without sacrificing rich
comparisons entirely.
>> and constrain the __eq__ function to always return a truth value.
>
> It is impossible to do that with certainty by any mechanical
> creation-time checking. So the implementation of operator.eq would have
> to check the return value of the ob.__eq__ function it calls *every
> time*. That would slow down the speed of the 99.xx% of cases where the
> check is not needed and would still not prevent exceptions. And if the
> return value was bad, all operator.eq could do is raise an exception
> anyway.
Sure, but then it would be a bug to return a non-boolean from __eq__ and
friends. It is not a bug today. I think that's what Rasmus is proposing.
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
I'm not a computer scientist, so my language and perspective on the
topic may be a bit naive, but I'll try to demonstrate my caveman
understanding with an example.
First, here is why the ability to throw an error is a feature:
class Apple(object):
    def __init__(self, appleness):
        self.appleness = appleness
    def __cmp__(self, other):
        assert isinstance(other, Apple), 'must compare apples to apples'
        return cmp(self.appleness, other.appleness)

class Orange(object): pass

Apple(42) == Orange()
Second, consider that any value in python also evaluates to a truth
value in boolean context.
Third, every function returns something. A function's returning nothing
is not a possibility in the python language. None is something but
evaluates to False in boolean context.
> But surely you can define an equal/unequal classification for all
> types of object, if you want to?
This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)?
Even in the realm of pure mathematics, the generality of objects (i.e.
numbers) can not be assumed.
James
--
James Stroud
UCLA-DOE Institute for Genomics and Proteomics
Box 951570
Los Angeles, CA 90095
The best way, IMHO, would have been to use an alternative notation in
numpy and SQLalchemy, and have '==' always return only a truth value - it
could be a non-boolean as long as the bool() function gave the correct
result. Surely the extra convenience of overloading '==' in special cases
was not worth breaking such basic operations as 'bool(x == y)' or
'x in alist'. Again, the problem is only with '==', not with '>', '<='
etc. Of course it is done now, and unlikely to be reversed.
>>> and constrain the __eq__ function to always return a truth value.
>>
>> It is impossible to do that with certainty by any mechanical
>> creation-time checking. So the implementation of operator.eq would
>> have to check the return value of the ob.__eq__ function it calls *every
>> time*. That would slow down the speed of the 99.xx% of cases where the
>> check is not needed and would still not prevent exceptions. And if the
>> return value was bad, all operator.eq could do is raise an exception
>> anyway.
>
>Sure, but then it would be a bug to return a non-boolean from __eq__ and
>friends. It is not a bug today. I think that's what Rasmus is proposing.
Yes, that is the point. If __eq__ functions are *supposed* to return
booleans I can write generic code that will work for well-behaved objects,
and any errors will be somebody else's fault. If __eq__ is free to return
anything, or throw an error, it becomes my responsibility to write generic
code that will work anyway, including with floating point numbers, numpy,
or SQLalchemy. And I cannot see any way to do that (suggestions welcome).
If purportedly general code does not work with numpy, your average numpy
user will not be receptive to the idea that it is all numpy's fault.
Current behaviour is both inconsistent and counterintuitive, as these
examples show.
>>> x = float('NaN')
>>> x == x
False
>>> ll = [x]
>>> x in ll
True
>>> x == ll[0]
False
>>> import numpy
>>> y = numpy.zeros((3,))
>>> y
array([ 0., 0., 0.])
>>> bool(y==y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()
>>> ll1 = [y,1]
>>> y in ll1
True
>>> ll2 = [1,y]
>>> y in ll2
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is
ambiguous. Use a.any() or a.all()
>>>
Can anybody see a way this could be fixed (please)? I may well have to
live with it, but I would really prefer not to.
> class Orange(object): pass
> Apple(42) == Orange()
True, but that does not hold for __eq__, only for __cmp__, and
for __gt__, __le__, etc.
Consider:
class Apple(object):
    def __init__(self, appleness):
        self.appleness = appleness
    def __gt__(self, other):
        assert isinstance(other, Apple), 'must compare apples to apples'
        return (self.appleness > other.appleness)
    def __eq__(self, other):
        if isinstance(other, Apple):
            return (self.appleness == other.appleness)
        else:
            return False
> Second, consider that any value in python also evaluates to a truth
> value in boolean context.
>
> Third, every function returns something. A function's returning nothing
> is not a possibility in the python language. None is something but
> evaluates to False in boolean context.
Indeed. The requirement would be not that return_value was a boolean, but
that bool(return_value) was defined and gave the correct result. I
understand that in some old Numeric/numpy version the numpy array __eq__
function returned a non-empty array, so that
bool(numarray1 == numarray2)
was true for any pair of arguments, which is one way of breaking '=='.
In current numpy, even
bool(numarray1 == 1)
throws an error, which is another way of breaking '=='.
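Both ways of breaking '==' can be reproduced without numpy by a minimal stand-in whose __eq__ is element-wise (ElementwiseSeq is invented for illustration and only mimics the relevant behaviour):

```python
class ElementwiseSeq(object):
    """Minimal stand-in for a numpy-like array: __eq__ compares element
    by element and refuses to collapse to a single bool."""
    def __init__(self, items):
        self.items = list(items)
    def __eq__(self, other):
        if isinstance(other, ElementwiseSeq):
            pairs = zip(self.items, other.items)
        else:
            pairs = [(x, other) for x in self.items]
        return ElementwiseSeq([a == b for a, b in pairs])
    def __bool__(self):
        raise ValueError("truth value of an element-wise result is "
                         "ambiguous; use any() or all()")
    __nonzero__ = __bool__   # Python 2 spelling of __bool__
    def any(self):
        return any(self.items)
    def all(self):
        return all(self.items)

c = ElementwiseSeq([0, 0, 0])
print((c == c).all())   # True: the caller must reduce explicitly
# bool(c == 1) raises ValueError, just like bool(numarray1 == 1)
```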
>> But surely you can define an equal/unequal classification for all
>> types of object, if you want to?
> This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)?
> Even in the realm of pure mathematics, the generality of objects (i.e.
> numbers) can not be assumed.
It sounds like that problem is simpler in computing. sqrt(32) evaluates to
5.6568542494923806 on my computer. A complex number c with non-zero
imaginary part would be unequal to sqrt(32) even if it so happened that
c*c==32.
Yours,
Rasmus
Quoting James Stroud <jst...@mbi.ucla.edu>:
> First, here is why the ability to throw an error is a feature:
>
> class Apple(object):
>     def __init__(self, appleness):
>         self.appleness = appleness
>     def __cmp__(self, other):
>         assert isinstance(other, Apple), 'must compare apples to apples'
>         return cmp(self.appleness, other.appleness)
>
> class Orange(object): pass
>
> Apple(42) == Orange()
I beg to disagree.
The right answer for the question "Am I equal to this chair right here?" is not
"I don't know", nor "I can't compare". The answer is "No, I'm not a chair, thus
I'm not equal to this chair right here". If someone comes to my house, looking
for me, he will not run away because he sees a chair before he sees me. Your
assert doesn't belong inside the method; it should be up to the caller to decide
if the human-chair comparisons make sense or not. I certainly don't want to be
type-checking when looking for an object within a mixed-type collection.
> This reminds me of complex numbers: would 4 + 4i be equal to sqrt(32)?
I assume you meant sqrt(32i).
Well, sqrt is a function, and if its result value is defined as 4+4i, then the
answer is 'yes', otherwise, the answer should be no.
sqrt(4) is *not* -2, and should not be equal to -2. The standard definition of
the square root _function_ for real numbers is to take the non-negative real
root. I haven't heard of a standard square root _function_ for complex numbers
(there is of course, a definition of square root, but it is not a function).
So, if by your definition of sqrt, sqrt(32i) returns a number, there is no
ambiguity. -2 is not sqrt(4). If you need the answer to be 'True', you may be
asking the wrong question.
> Jamed Stroud Wrote:
...
>> Second, consider that any value in python also evaluates to a truth
>> value in boolean context.
But bool(x) can fail too. So not every object in Python can be
interpreted as a truth value.
>> Third, every function returns something.
Unless it doesn't return at all.
>> A function's returning nothing
>> is not a possibility in the python language. None is something but
>> evaluates to False in boolean context.
>
> Indeed. The requirement would be not that return_value was a boolean,
> but that bool(return_value) was defined and gave the correct result.
If __bool__ or __nonzero__ raises an exception, you would like Python to
ignore the exception and return True or False. Which should it be? How do
you know what the correct result should be?
From the Zen of Python:
"In the face of ambiguity, refuse the temptation to guess."
All binary operators are ambiguous when dealing with vector or array
operands. Should the operator operate on the array as a whole, or on each
element? The numpy people have decided that element-wise equality testing
is more useful for them, and this is their prerogative to do so. In fact,
the move to rich comparisons was driven by the needs of numpy.
http://www.python.org/dev/peps/pep-0207/
It is a *VERY* important third-party library, and this was not the first
and probably won't be the last time that their needs will move into
Python the language.
Python encourages such domain-specific behaviour. In fact, that's what
operator-overloading is all about: classes can define what any operator
means for *them*. There's no requirement that the infinity of potential
classes must all define operators in a mutually compatible fashion, not
even for comparison operators.
For example, consider a class implementing one particular version of
three-value logic. It isn't enough for == to only return True or False,
because you also need Maybe:
True == False => returns False
True == True => returns True
True == Maybe => returns Maybe
etc.
Or consider fuzzy logic, where instead of two truth values, you have a
continuum of truth values between 0.0 and 1.0. What should comparing two
such fuzzy values for equality return? A boolean True/False? Another
fuzzy value?
Another one from the Zen:
"Special cases aren't special enough to break the rules."
The rules are that classes can customize their behaviour, that methods
can fail, and that Python should not try to guess what the correct value
should have been in the event of such a failure. Equality is a special
case, but it isn't so special that it needs to be an exception from those
rules.
If you really need a guaranteed-can't-fail[1] equality test, try
something like this untested wrapper class:
class EqualityWrapper(object):
    def __init__(self, obj):
        self.wrapped = obj
    def __eq__(self, other):
        try:
            return bool(self.wrapped == other)
        except Exception:
            return False # or maybe True?
Now wrap all your data:
data = [a list of arbitrary objects]
data = map(EqualityWrapper, data)
process(data)
[1] Not a guarantee.
--
Steven
> http://www.python.org/dev/peps/pep-0207/
> [1] Not a guarantee.
Well, lots to think about.
Just to keep you from shooting at straw men:
I would have liked it to be part of the design contract (a convention, if
you like) that
1) bool(x == y) should return a boolean and never throw an error
2) x == x return True
I do *not* say that bool(x) should never throw an error.
I do *not* say that Python should guess a return value if an __eq__
function throws an error, only that it should have been considered a bug,
or at least bad form, for __eq__ functions to do so.
What might be a sensible behaviour (unlike your proposed wrapper) would be
the following:
def eq(x, y):
    if x is y:
        return True
    else:
        try:
            return bool(x == y)
        except Exception:
            return False
If it is possible to change the language, how about having two
different functions, one for overloading the '==' operator, and another
for testing list and set membership, dictionary key identity, etc.?
For instance like this:
- Add a new function __equals__; x.__equals__(y) could default to
bool(x.__eq__(y))
- Establish by convention that x.__equals__(y) must return a boolean and
may not intentionally throw an error.
- Establish by convention that 'x is y' implies 'x.__equals__(y)',
in the sense that not (x is y and not x.__equals__(y)) must always hold
- Have the Python data structures call __equals__ when they want to
compare objects internally (e.g. for 'x in alist', 'x in adict',
'set(alist)', etc.)
- Provide an equals(x,y) built-in that calls the __equals__ function
- numpy and others who (mis)use '==' for their own purposes could use
def __equals__(self, other): return (self is other)
For the float NaN case it looks like things are already behaving like
this. For numpy objects you would not lose anything, since
'numpyArray in alist' is broken anyway.
I still think it is a bad choice that numpy got to write
array1 == array2
for their purposes, while everybody else has to use
if equals(x, y):
but at least both sides could get the behaviour they want.
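The proposal can be sketched in present-day Python to see how it would behave (the names __equals__, equals and contains come from the proposal or are invented here; none of this is an actual Python protocol):

```python
def equals(x, y):
    """Sketch of the proposed equals() built-in: identity first, then a
    type-defined __equals__ hook, then bool(x == y) as the default."""
    if x is y:
        return True                  # convention: 'x is y' implies equality
    hook = getattr(type(x), '__equals__', None)
    if hook is not None:
        return bool(hook(x, y))
    return bool(x == y)

def contains(seq, value):
    """What 'value in seq' would do under the proposal."""
    return any(equals(value, item) for item in seq)

class ArrayLike(object):
    """A numpy-style class opting out: element-wise __eq__ for its own
    users, identity-based __equals__ for membership tests."""
    def __eq__(self, other):
        return [item == other for item in (1, 2, 3)]   # element-wise
    def __equals__(self, other):
        return self is other
```

With this split, contains() never needs to take the truth value of an element-wise comparison result, so lists holding ArrayLike objects stay usable.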
> If it is possible to change the language, how about having two
> different functions, one for overloading the '==' operator, and another
> for testing list and set membership, dictionary key identity, etc.?
I've often thought that this would have made a lot of sense too,
though
I'd probably choose to spell the well-behaved structural equality "=="
and the flexible numeric equality "eq" (a la Fortran). Hey, we could
have *six* new keywords: eq, ne, le, lt, ge, gt!
See the recent (September?) thread "Comparing float and decimal"
for some of the fun that results from lack of transitivity of
equality.
But I think there's essentially no chance of Python changing to
support this. And even if there were, Python's conflation of
structural equality with numeric equality brings significant
benefits in terms of readability of code, ease of learning,
and general friendliness; it's only really troublesome in
a few corner cases. Is the tradeoff worth it?
So for me, this comes down to a case of 'practicality beats purity'.
Mark
> Current behaviour is both inconsistent and counterintuitive, as these
> examples show.
>
>>>> x = float('NaN')
>>>> x == x
> False
Blame IEEE for that one. Rich comparisons have nothing to do with that one.
Make a concrete proposal for fixing it that does not break backwards compatibility.
No, I definitely didn't mean sqrt(32i). I'm using sqrt() to represent
the mathematical square root, and not an arbitrary function one might
define, by the way.
My point is that 4 + 4i, sqrt(32), and sqrt(-32) all exist in different
spaces. They are not comparable, even when testing for equality in a
pure mathematical sense. When we encounter these values in our programs,
we might like the power to decide the results of these comparisons. In
one context it might make sense to throw an exception, in another, it
might make sense to return False based on the fact that we consider them
different "types", in yet another context, it might make sense to look
at complex plane values as vectors and return their scalar magnitude for
comparison to real numbers. I think this ability to define the results
of comparisons is not a shortcoming of the language but a strength.
Perhaps this should raise an exception? I think the problem is not with
comparisons in general but with the fact that nan is type float:
py> type(float('NaN'))
<type 'float'>
No float can be equal to nan, but nan is a float. How can something be
not a number and a float at the same time? The illogicality of nan's
type creates the possibility for the illogical results of comparisons to
nan including comparing nan to itself.
>>>> ll = [x]
>>>> x in ll
> True
>>>> x == ll[0]
> False
But there is consistency on the basis of identity which is the test for
containment (in):
py> x is x
True
py> x in [x]
True
Identity and equality are two different concepts. Comparing identity to
equality is like comparing apples to oranges ;o)
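That containment test can be made explicit: the language reference defines `x in s` as equivalent to `any(x is e or x == e for e in s)`, so identity is tried before `==` is ever called. A simplified model (my sketch, not the actual C implementation):

```python
def list_contains(seq, value):
    """Simplified model of list containment: an identity check
    short-circuits before __eq__ is evaluated for each item."""
    for item in seq:
        if value is item:        # identity shortcut: no __eq__ call
            return True
        if value == item:        # only reached for non-identical items
            return True
    return False
```

This is why x in [x] is True even though x == x is False for NaN, and why y in ll1 succeeds (the first item *is* y) while y in ll2 must first evaluate y == 1 and dies trying to take the truth value of the resulting array.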
>
>>>> import numpy
>>>> y = numpy.zeros((3,))
>>>> y
> array([ 0., 0., 0.])
>>>> bool(y==y)
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ValueError: The truth value of an array with more than one element is
> ambiguous. Use a.any() or a.all()
But the equality test is not what fails here. It's the cast to bool that
fails, which for numpy works like a unary ufunc. The designers of numpy
thought that this would be a more desirable behavior. The test for
equality likewise is a binary ufunc and the behavior was chosen in numpy
for practical reasons. I don't know if you can overload the == operator
in C, but if you can, you would be able to achieve the same behavior.
>>>> ll1 = [y,1]
>>>> y in ll1
> True
>>>> ll2 = [1,y]
>>>> y in ll2
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> ValueError: The truth value of an array with more than one element is
> ambiguous. Use a.any() or a.all()
I think you could be safe calling this a bug with numpy. But the fact
that someone can create a bug with a language is not a condemnation of
the language. For example, C makes it real easy to crash a program by
overrunning the limits of an array, but no one would suggest to remove
arrays from C.
> Can anybody see a way this could be fixed (please)? I may well have to
> live with it, but I would really prefer not to.
Your only hope is to somehow convince the language designers to remove
the ability to overload == then get them to agree on what you think the
proper behavior should be for comparisons. I think the probability of
that happening is about zero, though, because such a change would run
counter to the dynamic nature of the language.
James
>>> Personally I would like to get these !@#$%&* misfeatures removed,
>>
>> What you are calling a misfeature is an absence, not a presence that
>> can be removed.
>
> That's not quite true.
In what way, pray tell. My statement still looks quite true to me.
> Rich comparisons explicitly allow non-boolean return values.
They do so by not doing anything to the return value of the underlying
method. As I said, the OP is complaining about an absence of a check.
Moreover, the absence is intentional as I explained in the part snipped
and as you further explained.
>> And if the return value was bad, all operator.eq could do is raise and
>> exception anyway.
>
> Sure, but then it would be a bug to return a non-boolean from __eq__ and
> friends. It is not a bug today. I think that's what Rasmus is proposing.
Right, the addition of a check that is absent today.
tjr
Scratch that. Not thinking and typing at same time.
>
> Can anybody see a way this could be fixed (please)? I may well have to
> live with it, but I would really prefer not to.
I made a suggestion in my first response, which perhaps you missed.
tjr
> Rasmus Fogh wrote:
>
>> Current behaviour is both inconsistent and counterintuitive, as these
>> examples show.
>>
>>>>> x = float('NaN')
>>>>> x == x
>> False
>
> Blame IEEE for that one. Rich comparisons have nothing to do with that
> one.
There is nothing to blame them for. This is the correct behaviour. NaNs
should *not* compare equal to themselves, that's mathematically
incoherent.
--
Steven
I initially thought that looked like a bug to me. But, this is
apparently standard behavior required for "NaN". I'm only using
Wikipedia as a reference here, but about 80% of the way down, under
"standard operations":
http://en.wikipedia.org/wiki/IEEE_754-1985
"Comparison operations. NaN is treated specially in that NaN=NaN always
returns false."
Presumably since floating point calculations return "NaN" for some
operations, and one "NaN" is usually not equal to another, this is the
required behavior. So not a Python issue (though understandably a bit
confusing).
The array issue seems to be with one 3rd party library, and one can
choose to use or not use their library, to ask them to change it, or
even to decide to override their == operator, if one doesn't like the
way it is designed.
Sorry, I should explain why.
Given:
x = log(-5) # a NaN
y = log(-2) # the same NaN
x == y # Some people want this to be true for NaNs.
Then:
# Compare x and y directly.
log(-5) == log(-2)
# If x == y then exp(x) == exp(y) for all x, y.
exp(log(-5)) == exp(log(-2))
-5 == -2
and now the entire foundations of mathematics collapses into a steaming
pile of rubble.
--
Steven
> Rasmus Fogh wrote:
>> Current behaviour is both inconsistent and counterintuitive, as these
>> examples show.
>>
>>>>> x = float('NaN')
>>>>> x == x
>> False
>
> Perhaps this should raise an exception?
Why on earth would you want checking equality on NaN to raise an
exception??? What benefit does it give?
> I think the problem is not with
> comparisons in general but with the fact that nan is type float:
>
> py> type(float('NaN'))
> <type 'float'>
>
> No float can be equal to nan, but nan is a float. How can something be
> not a number and a float at the same time?
Because floats are not real numbers. They are *almost* numbers, they
often (but not always) behave like numbers, but they're actually not
numbers.
The difference is subtle enough that it is easy to forget that floats are
not numbers, but it's easy enough to find examples proving it:
Some perfectly good numbers don't exist as floats:
>>> 2**-10000 == 0.0
True
Try as you might, you can't get the number 0.1 *exactly* as a float:
>>> 0.1
0.10000000000000001
For any numbers x and y not equal to zero, x+y != x. But that fails for
floats:
>>> 1001.0 + 1e99 == 1e99
True
The above is because the smaller term is completely absorbed by
rounding. But even avoiding such extreme magnitudes doesn't solve the
problem. With a little effort, you can also find examples of
"ordinary sized" floats where (x+y)-y != x.
>>> 0.9+0.1-0.9 == 0.1
False
>>>>> import numpy
>>>>> y = numpy.zeros((3,))
>>>>> y
>> array([ 0., 0., 0.])
>>>>> bool(y==y)
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> ValueError: The truth value of an array with more than one element is
>> ambiguous. Use a.any() or a.all()
>
> But the equality test is not what fails here. It's the cast to bool that
> fails
And it is right to do so, because it is ambiguous and the library
designers rightly avoided the temptation of guessing what result is
needed.
>>>>> ll1 = [y,1]
>>>>> y in ll1
>> True
>>>>> ll2 = [1,y]
>>>>> y in ll2
>> Traceback (most recent call last):
>> File "<stdin>", line 1, in <module>
>> ValueError: The truth value of an array with more than one element is
>> ambiguous. Use a.any() or a.all()
>
> I think you could be safe calling this a bug with numpy.
Only in the sense that there are special cases where the array elements
are all true, or all false, and numpy *could* safely return a bool. But
special cases are not special enough to break the rules. Better for the
numpy caller to write this:
a.all() # or any()
instead of:
try:
    bool(a)
except ValueError:
    a.all()
as they would need to do if numpy sometimes returned a bool and sometimes
raised an exception.
--
Steven
I didn't mean to suggest that it was incorrect, just that that particular
surprising behavior is not related to rich comparisons. Even if the OP gets an
__equals__() or some such, NaN will still not compare equal to NaN.
And why doesn't this happen with the current behavior if x = y =
log(-5)? According to the same proof, -5 != -5.
George
There is an explicit policy that __eq__() methods can return non-bools for
various purposes. I consider that policy to a "presence that can be removed".
There is no check because that policy exists, not the other way around.
Anyways, this is really a semantic digression, and not particularly important.
Peace?
I'm missing how a.all() solves the problem Rasmus describes, namely that
the order of a python *list* affects the results of containment tests by
numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different
results in his example. It still seems like a bug in numpy to me, even
if too much other stuff is broken if you fix it (in which case it
apparently becomes an "issue").
James
It's an issue, if anything, not a bug. There is no consistent implementation of
bool(some_array) that works in all cases. numpy's predecessor Numeric used to
implement this as returning True if at least one element was non-zero. This
works well for bool(x!=y) (which is equivalent to (x!=y).any()) but does not
work well for bool(x==y) (which should be (x==y).all()), but many people got
confused and thought that bool(x==y) worked. When we made numpy, we decided to
explicitly not allow bool(some_array) so that people will not write buggy code
like this again.
The deficiency is in the feature of rich comparisons, not numpy's implementation
of it. __eq__() is allowed to return non-booleans; however, there are some parts
of Python's implementation like list.__contains__() that still expect the return
value of __eq__() to be meaningfully cast to a boolean.
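The mismatch Robert describes can be reproduced without numpy. Vec and BoolVec below are invented stand-ins (not numpy classes): == is elementwise, and the result refuses to collapse to a single bool, so list membership succeeds or fails depending on whether the identity check fires first:

```python
class BoolVec:
    """Elementwise comparison result; refuses to collapse to one bool."""
    def __init__(self, flags):
        self.flags = list(flags)
    def __bool__(self):  # __nonzero__ in Python 2
        raise ValueError("The truth value of an array with more than one "
                         "element is ambiguous. Use a.any() or a.all()")
    def any(self):
        return any(self.flags)
    def all(self):
        return all(self.flags)

class Vec:
    """Toy numpy-like vector: == is elementwise and returns a BoolVec."""
    def __init__(self, data):
        self.data = list(data)
    def __eq__(self, other):
        if isinstance(other, Vec):
            return BoolVec(a == b for a, b in zip(self.data, other.data))
        return BoolVec(a == other for a in self.data)

v = Vec([0, 0, 0])
assert (v == v).all()        # the explicit spelling always works
try:
    bool(v == v)             # collapsing to one bool raises instead
except ValueError:
    pass
assert v in [v, 1]           # found at once via the identity check
try:
    v in [1, v]              # first evaluates bool(v == 1) -> raises
except ValueError:
    pass
```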
You have explained
py> ll2 = [1, y]
py> y in ll2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: The truth value of an array with more than one element is...
but not
py> ll1 = [y,1]
py> y in ll1
True
It's this discrepancy that seems like a bug, not that a ValueError is
raised in the former case, which is perfectly reasonable to me.
All I can imagine is that something like the following lives in the
bowels of the python code for list:
def __contains__(self, other):
    foundit = False
    for i, v in enumerate(self):
        if i == 0:
            # evaluates to bool numpy array
            foundit = one_kind_of_test(v, other)
        else:
            # raises exception for numpy array
            foundit = another_kind_of_test(v, other)
        if foundit:
            break
    return foundit
I'm trying to imagine some other way to get the results mentioned but I
honestly can't. It's beyond me why someone would do such a thing, but
perhaps it's an optimization of some sort.
James
> Just to keep you from shooting at straw men:
>
> I would have liked it to be part of the design contract (a convention,
> if you like) that
> 1) bool(x == y) should return a boolean and never throw an error
Can't be done without making bool a "magic function". If x==y raises an
exception, bool() won't even be called. The only way around that would be
for the Python compiler to recognise bool(x==y) and perform special magic.
What if you did this?
trueorfalse = bool # I don't like George Boole
trueorfalse( [x][0].__class__.__getattr__('__dict__')['__eq__'](y) )
Should that have special magic performed too? Just how much work must the
compiler put in to special-casing bool?
> 2) x == x return True
Which goes against the IEEE 754 floating-point standard.
http://grouper.ieee.org/groups/754/
Python used to optimize x==x and always return True. This was removed
because it caused problems.
> I do *not* say that bool(x) should never throw an error. I do *not* say
> that Python should guess a return value if an __eq__ function throws an
> error,
But to get what you want, the above is implied.
I suppose, just barely, that you could avoid making bool() magic and just
make if magic. When the compiler sees "if expr": it could swallow all
exceptions inside expr and force it to evaluate to True or False. (How?
By guessing? Randomly?) This would cause many problems, but it could be
done, and much easier than ensuring that bool(x) always succeeds.
> only that it should have been considered a bug, or at least bad
> form, for __eq__ functions to do so.
It's certainly *unusual* for comparisons to return non-bools, but it's
not bad form.
> What might be a sensible behaviour (unlike your proposed wrapper)
What do you dislike about my wrapper class? Perhaps it is fixable.
> would be the following:
>
> def eq(x, y):
>     if x is y:
>         return True
I've already mentioned NaNs. Sentinel values also sometimes need to
compare not equal with themselves. Forcing them to compare equal will
cause breakage.
>     else:
>         try:
>             return (x == y)
>         except Exception:
>             return False
Why False? Why not True? If an error occurs inside __eq__, how do you
know that the correct result was False?
class Broken(object):
    def __eq__(self, other):
        return Treu # oops, raises NameError
--
Steven
Nothing to do with numpy. list.__contains__() checks for identity with "is"
before it goes to __eq__().
...but only for the first element of the list:
py> import numpy
py> y = numpy.array([1,2,3])
py> y
array([1, 2, 3])
py> y in [1, y]
------------------------------------------------------------
Traceback (most recent call last):
File "<ipython console>", line 1, in <module>
<type 'exceptions.ValueError'>: The truth value of an array with more
than one element is ambiguous. Use a.any() or a.all()
py> y is [1, y][1]
True
I think it skips straight to __eq__ if the element is not the first in
the list. That no one acknowledges this makes me feel like a conspiracy
is afoot.
No, it doesn't skip straight to __eq__(). "y is 1" returns False, so (y==1) is
checked. When y is a numpy array, this returns an array of bools.
list.__contains__() tries to convert this array to a bool and
ndarray.__nonzero__() raises the exception.
list.__contains__() checks "is" then __eq__() for each element before moving on
to the next element. It does not try "is" for all elements, then try __eq__()
for all elements.
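That per-element "is, then ==" order can be sketched as a simplified pure-Python model (list_contains is a made-up name, not the real C implementation):

```python
def list_contains(seq, needle):
    """Simplified model of list.__contains__: for EACH element, identity
    is tried first, then ==; identity is never tried for the whole list
    up front."""
    for item in seq:
        if needle is item:
            return True
        if needle == item:   # bool() of this may raise for array-like results
            return True
    return False

nan = float('nan')
assert list_contains([1, nan], nan)               # found by identity
assert not list_contains([1, float('nan')], nan)  # a different NaN object:
                                                  # 'is' and '==' both fail
```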
> That no one acknowledges this makes me feel like a conspiracy
> is afoot.
I don't know what you think I'm not acknowledging.
Ok. Thanks for the explanation.
> > That no one acknowledges this makes me feel like a conspiracy
> > is afoot.
>
> I don't know what you think I'm not acknowledging.
Sorry. That was a failed attempt at humor.
James
>>>> Rasmus Fogh wrote:
>>>>>>>> ll1 = [y,1]
>>>>>>>> y in ll1
>>>>> True
>>>>>>>> ll2 = [1,y]
>>>>>>>> y in ll2
>>>>> Traceback (most recent call last):
>>>>> File "<stdin>", line 1, in <module>
>>>>> ValueError: The truth value of an array with more than one element is
>>>>> ambiguous. Use a.any() or a.all()
>>>> I think you could be safe calling this a bug with numpy.
>>> Only in the sense that there are special cases where the array
>>> elements are all true, or all false, and numpy *could* safely return a
>>> bool. But special cases are not special enough to break the rules.
>>> Better for the numpy caller to write this:
>>> a.all() # or any()
>>> instead of:
>>> try:
>>>     bool(a)
>>> except ValueError:
>>>     a.all()
>>> as they would need to do if numpy sometimes returned a bool and
>>> sometimes raised an exception.
>> I'm missing how a.all() solves the problem Rasmus describes, namely that
>> the order of a python *list* affects the results of containment tests by
>> numpy.array. E.g. "y in ll1" and "y in ll2" evaluate to different
>> results in his example. It still seems like a bug in numpy to me, even
>> if too much other stuff is broken if you fix it (in which case it
>> apparently becomes an "issue").
> It's an issue, if anything, not a bug. There is no consistent
> implementation of bool(some_array) that works in all cases. numpy's
> predecessor Numeric used to implement this as returning True if at least
> one element was non-zero. This works well for bool(x!=y) (which is
> equivalent to (x!=y).any()) but does not work well for bool(x==y) (which
> should be (x==y).all()), but many people got confused and thought that
> bool(x==y) worked. When we made numpy, we decided to explicitly not allow
> bool(some_array) so that people will not write buggy code like this
> again.
You are so right, Robert:
> The deficiency is in the feature of rich comparisons, not numpy's
> implementation of it. __eq__() is allowed to return non-booleans;
> however, there are some parts of Python's implementation like
> list.__contains__() that still expect the return value of __eq__() to be
> meaningfully cast to a boolean.
One might argue whether this is a deficiency in rich comparisons or rather a
bug in list, set and dict. Certainly numpy is following the rules. In fact
numpy should be applauded for throwing an error rather than returning a
misleading value.
For my personal problem I could indeed wrap all objects in a wrapper with
whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit
much, though, just to get code like this to work as intended:
alist.append(x)
print ('x is present: ', x in alist)
So, I would much prefer a language change. I am not competent to even
propose one properly, but I'll try.
First, to clear the air:
Rich comparisons, the ability to overload '==', and the constraints (or
lack of them) on __eq__ must stay unchanged. There are reasons for their
current behaviour - ieee754 is particularly convincing - and anyway they
are not going to change. No point in trying.
There remains the problem that __eq__ is used inside python
'collections' (list, set, dict etc.), and that the kind of overloading
used (quite legitimately) in numpy etc. breaks the collection behaviour.
It seems that proper behaviour of the collections requires an equality
test that satisfies:
1) x equal x
2) x equal y => y equal x
3) x equal y and y equal z => x equal z
4) (x equal y) is a boolean
5) (x equal y) is defined (and will not throw an error) for all x,y
6) x unequal y == not(x equal y) (by definition)
Note to TJR: 5) does not mean that Python should magically shield me from
errors. All I am asking is that programmers design their equal() function
to avoid raising errors, and that errors raised from equal() clearly
count as bugs.
I cannot imagine getting the collections to work in a simple and intuitive
manner without an equality test that satisfies 1)-6). Maybe somebody else
can. Instead I would propose adding an __equal__ special method for the
purpose.
It looks like the current collections use the following, at least in part:
def oldCollectionTest(x,y):
    if x is y:
        return True
    else:
        return (x == y)
I would propose adding a new __equal__ method that satisfies 2) - 6)
above.
We could then define
def newCollectionTest(x,y):
    if x is y:
        # this takes care of satisfying 1)
        return True
    elif hasattr(x, '__equal__'):
        return x.__equal__(y)
    elif hasattr(y, '__equal__'):
        return y.__equal__(x)
    else:
        return False
The implementations for list, set and dict would then behave according to
newCollectionTest. We would also want an equal() built-in with the same
behaviour.
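A toy class opting in to the proposed hook might look like this (Point, __equal__ and this newCollectionTest are all hypothetical; a real special method would also be looked up on the type, not the instance):

```python
class Point:
    """Value-semantics object opting in via the proposed __equal__ hook."""
    def __init__(self, x, y):
        self.x, self.y = x, y
    def __equal__(self, other):
        return (isinstance(other, Point)
                and (self.x, self.y) == (other.x, other.y))

def newCollectionTest(x, y):
    if x is y:
        return True                      # reflexivity, condition 1)
    if hasattr(x, '__equal__'):
        return x.__equal__(y)
    if hasattr(y, '__equal__'):
        return y.__equal__(x)
    return False                         # default: identity semantics

assert newCollectionTest(Point(1, 2), Point(1, 2))      # value semantics
assert not newCollectionTest(Point(1, 2), Point(3, 4))
assert not newCollectionTest(object(), object())        # identity only
```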
In plain words, the default behaviour would be identity semantics. Objects
that wanted value semantics could implement an __equal__ function with the
correct behaviour. Wherever possible __equal__ would be the same as
__eq__. This function may deviate from 'proper' behaviour in some cases.
All I claim for it is that it makes collections work as intended, and that
it is clear and explicit, and reasonably intuitive.
Backwards compatibility should not be a big problem. The only behaviour
change would be that list, set, and dict would now behave the way it was
always assumed they should - and the way the documentation says they
should. On the minus side there would be the difference between
'__equal__' and '__eq__' to confuse people. On the plus side the behaviour
of objects inside collections would now be explicitly defined, and __eq__
and __equal__ would be so similar that most people could ignore the
distinction.
Some examples:
# NaN:
# For floats, __equal__ would be the same as __eq__. For NaN this gives
>>> x = float('NaN')
>>> y = float('NaN')
>>> x == x
False
>>> equal(x,x)
True
>>> equal(x,y)
False
# It may be problematical mathematically, but computationally it makes
# perfect sense that looking in a given storage location will give you the
# same value every time, even if the actual value happens to be undefined.
# The behaviour is simple to describe, and indeed NaN does behave this way
# in collections at the moment. All we are doing is documenting it clearly.
# numpy
Numpy would have no __equal__ function, so we would have pure identity
semantics - 'equal(x,y)' would be the same as 'x is y'.
# ordinary numbers.
Any Python object with value semantics would need an __equal__ function
with the correct behaviour.
Mark Dickinson pointed out the thread "Comparing float and decimal", which
shows that comparisons between float and decimal numbers do not currently
satisfy 3). It would not be attractive to have __equal__ and __eq__ behave
differently for ordinary numbers, so if the relevant __eq__ can not be
fixed that is a problem for my proposal.
At this point I shall try to retire gracefully. Regrettably I am not
competent to discuss if this can be done, how it can be done, and how
much work is required.
Mathematically, NaNs shouldn't be comparable at all. They should
raise an exception when compared. In fact, they should raise an
exception when *created*. But that's not what we want. What we want
is a dummy value that silently plods through our calculations. For a
dummy value it seems a lot more sense to pick an arbitrary yet
consistent sort order (I suggest just above -Inf), rather than quietly
screwing up the sort.
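The "quietly screwing up the sort" point can be seen directly with stdlib floats: every comparison against a NaN is False, so the sort can detect no inversions and hand the list back visibly unsorted (a small demonstration, relying only on CPython's sorted()):

```python
nan = float('nan')
out = sorted([3.0, nan, 1.0, 2.0])
# Drop the NaN (it is the only value for which v != v)
finite = [v for v in out if v == v]
assert finite != sorted(finite)   # the non-NaN values came back out of order
```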
Regarding the mythical IEEE 754, although it's extremely rare to find
quotations, I have one on just this subject. And it does NOT say "x
== NaN gives false". It says it gives *unordered*. It is C and
probably most other languages that turn that into false (as they want
a dummy value, not an error.)
> There is an explicit policy that __eq__() methods can return non-bools
> for various purposes. I consider that policy to be a "presence that can
> be removed". There is no check because that policy exists, not the other
> way around.
OK, presence in manual versus presence in code.
>
> Anyways, this is really a semantic digression, and not particularly
> important. Peace?
Yes
Well, there are explicitly two kinds of NaNs: signalling NaNs and quiet NaNs, to
accommodate both requirements. Additionally, there is significant flexibility in
trapping the signals.
> Regarding the mythical IEEE 754, although it's extremely rare to find
> quotations, I have one on just this subject. And it does NOT say "x
> == NaN gives false". It says it gives *unordered*. It is C and
> probably most other languages that turn that into false (as they want
> a dummy value, not an error.)
>
> http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thread/ead0392e646b7cc0/a5bc354cd46f2c49?lnk=st&q=why+does+NaN+not+equal+itself%3F&rnum=3&hl=en&pli=1
Table 4 on page 9 of the standard is pretty clear on the subject. When the two
operands are unordered, the operator == returns False. The standard defines how
to do comparisons notionally; two operands can be "greater than", "less than",
"equal" or "unordered". It then goes on to map these notional concepts to
programming language boolean predicates.
Right, but most of that's lower level. By the time it reaches Python
we only care about quiet NaNs.
> > Regarding the mythical IEEE 754, although it's extremely rare to find
> > quotations, I have one on just this subject. And it does NOT say "x
> > == NaN gives false". It says it gives *unordered*. It is C and
> > probably most other languages that turn that into false (as they want
> > a dummy value, not an error.)
>
> >http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr...
>
> Table 4 on page 9 of the standard is pretty clear on the subject. When the two
> operands are unordered, the operator == returns False. The standard defines how
> to do comparisons notionally; two operands can be "greater than", "less than",
> "equal" or "unordered". It then goes on to map these notional concepts to
> programming language boolean predicates.
Ahh, interesting. Still though, does it give an explanation for such
behaviour, or use cases? There must be some situation where blindly
returning false is enough benefit to trump screwing up sorting.
No, signaling NaNs raise the exception that you are asking for. You're right
that if you get a Python float object that is a NaN, it is probably going to be
quiet, but signaling NaNs can affect Python in the way that you want.
>>> Regarding the mythical IEEE 754, although it's extremely rare to find
>>> quotations, I have one on just this subject. And it does NOT say "x
>>> == NaN gives false". It says it gives *unordered*. It is C and
>>> probably most other languages that turn that into false (as they want
>>> a dummy value, not an error.)
>>> http://groups.google.ca/group/sci.math.num-analysis/browse_thread/thr...
>> Table 4 on page 9 of the standard is pretty clear on the subject. When the two
>> operands are unordered, the operator == returns False. The standard defines how
>> to do comparisons notionally; two operands can be "greater than", "less than",
>> "equal" or "unordered". It then goes on to map these notional concepts to
>> programming language boolean predicates.
>
> Ahh, interesting. Still though, does it give an explanation for such
> behaviour, or use cases? There must be some situation where blindly
> returning false is enough benefit to trump screwing up sorting.
Well, the standard was written in the days of Fortran. You didn't really have
generic sorting routines. You *could* implement whatever ordering you wanted
because you *had* to implement the ordering yourself. You didn't have to use a
limited boolean predicate.
Basically, the boolean predicates have to return either True or False. Neither
one is really satisfactory, but that's the constraint you're under.
"We've always done it that way" is NOT a use case! Certainly, it's a
factor, but it seems quite weak compared to the sort use case.
I suppose what I'm hoping for is a small example program (one or a
few functions) that needs the "always false" behaviour of NaN.
I didn't say it was. I was explaining that sorting was probably *not* a use case
for the boolean predicates at the time of writing of the standard. In fact, it
suggests implementing a Compare() function that returns "greater than", "less
than", "equal" or "unordered" in addition to the boolean predicates. That Python
eventually chose to use a generic boolean predicate as the basis of its sorting
routine many years after the IEEE-754 standard is another matter entirely.
In any case, the standard itself is quite short, and does not spend much time
justifying itself in any detail.
> I suppose what I'm hoping for is a small example program (one or a
> few functions) that needs the "always false" behaviour of NaN.
Steven D'Aprano gave one earlier in the thread. Additionally, (x!=x) is a simple
test for NaNs if an IsNaN(x) function is not available. Really, though, the
result falls out from the way that IEEE-754 constructed the logic of the
system. It is not defined that (NaN==NaN) should return False, per se. Rather,
all of the boolean predicates are defined in terms of that Compare(x,y)
function. If that function returns "unordered", then (x==y) is False. It doesn't
matter if one or both are NaNs; in either case, the result is "unordered".
> For my personal problem I could indeed wrap all objects in a wrapper with
> whatever 'correct' behaviour I want (thanks, TJR). It does seem a bit
I was not suggesting that you wrap *everything*, merely an adaptor for
numpy arrays in whatever subclass and source it is that feeds them to
your code. It is fairly unusual, I think, to find numpy arrays 'in the
wild', outside the constrained context of numerical code where the
programmer uses them intentionally and hopefully understands their
peculiarities.
> much, though, just to get code like this to work as intended:
> alist.append(x)
> print ('x is present: ', x in alist)
Even if rich comparisons worked as you propose, the above would *still* not
necessarily work. Collection classes can define a __contains__ that
overrides the default and that can do anything, though True/False is
recommended.
As best I can think of at the moment, the only things you can absolutely
depend on are that builtin id(ob) will return an int, that 'ob1 is ob2'
(based on id()) will be True or False, and that builtin type(ob) will be
a class (at least in 3.0, not sure of 2.x). The names can be rebound
but you can control that within the module you write.
This is what I meant when I said that 'generic' nearly always needs to
be qualified to something like 'generic for objects that meet the
interface requirements'. Every function has that precondition as part
of its implied contract. Your code has an interface requirement that 'x
in y' not raise an exception. An x,y pair that does raise is outside its
contract.
Terry Jan Reedy
>> much, though, just to get code like this to work as intended:
>> alist.append(x)
>> print ('x is present: ', x in alist)
>
> Even if rich comparisons as you propose, the above would *still* not
> necessarily work. Collection classes can define a __contains__ that
> overrides the default and that can do anything, though True/False is
> recommended.
No, it's actually required.
In [4]: class A(object):
   ...:     def __contains__(self, other):
   ...:         return 'foo'
   ...:
In [7]: a = A()
In [8]: 1 in a
Out[8]: True
Okay, so it will coerce to True/False for you, but unlike rich comparisons, the
return value must be interpretable as a boolean.
> As best I can think of at the moment, the only things you can absolutely
> depend on is that builtin id(ob) will return an int, that 'ob1 is ob2'
> (based in id()) will be True or False, and that builtin type(ob) will be
> a class (at least in 3.0, not sure of 2.x). The names can be rebound
> but you can control that within the module you write.
>
I wonder whether there could be some syntactic sugar which would wrap
try...except... around an expression, eg "except(foo(), False)", which
would return False if foo() raised an exception, otherwise return the
result of foo().
However, "Nan is SomeOtherNan" does not return True.
> so if there was a version of
> __contains__ which used "is" then "Nan in results" would return True.
> Perhaps "Nan is in results"? Or would that be too confusing, ie "in" vs
> "is in"?
list.__contains__() already checks with "is" before it tries "==".
In [65]: from numpy import nan, inf
In [66]: other_nan = inf/inf
In [67]: nan in [nan]
Out[67]: True
In [68]: nan in [other_nan]
Out[68]: False
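The same behaviour can be reproduced with plain stdlib floats, no numpy needed:

```python
nan = float('nan')
other_nan = float('nan')       # a distinct NaN object
assert nan != nan              # IEEE 754: NaN never compares equal
assert nan in [nan]            # ...yet found, via the identity shortcut
assert other_nan not in [nan]  # a different NaN object is never found
```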
I interpret that to mean IEEE 754's semantics are for different
circumstances and are inapplicable to Python.
> In any case, the standard itself is quite short, and does not spend much time
> justifying itself in any detail.
A pity, as it is often invoked to explain language design.
> > I suppose what I'm hoping for is a small example program (one or a
> > few functions) that needs the "always false" behaviour of NaN.
>
> Steven D'Aprano gave one earlier in the thread.
I see examples of behaviour, but no use cases.
> Additionally, (x!=x) is a simple
> test for NaNs if an IsNaN(x) function is not available.
That's a trick to work around the lack of IsNaN(x). Again, not a use
case.
> Really, though, the
> result falls out from the way that IEEE-754 constructed the logic of the
> system. It is not defined that (NaN==NaN) should return False, per se. Rather,
> all of the boolean predicates are defined in terms of that Compare(x,y)
> function. If that function returns "unordered", then (x==y) is False. It doesn't
> matter if one or both are NaNs; in either case, the result is "unordered".
And if I arbitrarily dictate that NaN is a single value which is
orderable, sorting just above -Infinity, then all the behaviour makes
a lot more sense AND I fix sort.
So you see the predicament I'm in. On the one hand we have a problem
and an obvious solution. On the other hand we've got historical
behaviour which everybody insists *must* remain, reasons unknown. It
reeks of the Parable of the Monkeys.
I think I should head over to one of the math groups and see if they
can find a reason for it.
Interesting. I did not expect that from "Should return true if item is
in self, false otherwise.", but maybe the lowercase true/false is an
(undocumented?) abbreviation for 'object with Boolean value True/False'.
Of course, if the return value is not so interpretable, or if
__contains__ raises an exception, there is no coercion and the OP's code
will not work.
A different summary of my main point in this thread: Dynamic binding and
special method hooks make somewhat generic code possible, but the same
special method hooks make absolutely generic code nearly impossible.
tjr
> For my personal problem I could indeed wrap all objects in a wrapper
> with whatever 'correct' behaviour I want (thanks, TJR). It does seem a
> bit much, though, just to get code like this to work as intended:
> alist.append(x)
> print ('x is present: ', x in alist)
>
> So, I would much prefer a language change. I am not competent to even
> propose one properly, but I'll try.
You think changing the language is easier than applying a wrapper to your
own data??? Oh my, that's too funny for words.
--
Steven
> On Dec 7, 4:20 pm, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
>> On Sun, 07 Dec 2008 15:32:53 -0600, Robert Kern wrote:
>> > Rasmus Fogh wrote:
>>
>> >> Current behaviour is both inconsistent and counterintuitive, as
>> >> these examples show.
>>
>> >>>>> x = float('NaN')
>> >>>>> x == x
>> >> False
>>
>> > Blame IEEE for that one. Rich comparisons have nothing to do with
>> > that one.
>>
>> There is nothing to blame them for. This is the correct behaviour. NaNs
>> should *not* compare equal to themselves, that's mathematically
>> incoherent.
>
> Mathematically, NaNs shouldn't be comparable at all. They should raise
> an exception when compared. In fact, they should raise an exception
> when *created*. But that's not what we want. What we want is a dummy
> value that silently plods through our calculations. For a dummy value
> it seems a lot more sense to pick an arbitrary yet consistent sort order
> (I suggest just above -Inf), rather than quietly screwing up the sort.
>
> Regarding the mythical IEEE 754,
It's hardly mythical.
http://ieeexplore.ieee.org/ISOL/standardstoc.jsp?punumber=4610933
> although it's extremely rare to find
> quotations, I have one on just this subject. And it does NOT say "x ==
> NaN gives false". It says it gives *unordered*.
Unordered means that none of the following is true:
x > NaN
x < NaN
x == NaN
It doesn't mean that comparing a NaN with something else is an error.
--
Steven
> On Dec 7, 6:37 pm, Steven D'Aprano <st...@REMOVE-THIS-
> cybersource.com.au> wrote:
...
>> Given:
>>
>> x = log(-5) # a NaN
>> y = log(-2) # the same NaN
>> x == y # Some people want this to be true for NaNs.
>>
>> Then:
>>
>> # Compare x and y directly.
>> log(-5) == log(-2)
>> # If x == y then exp(x) == exp(y) for all x, y.
>> exp(log(-5)) == exp(log(-2))
>> -5 == -2
>>
>> and now the entire foundations of mathematics collapses into a steaming
>> pile of rubble.
>
> And why doesn't this happen with the current behavior if x = y = log
> (-5) ? According to the same proof, -5 != -5.
You're right, I was a little sloppy in my "proof". There are additional
subtleties going on.
--
Steven
I consider it to be mythical because most knowledge of it is
indirect. Few who use floating point have the documents available to
them. Requiring purchase/membership is the cause of this.
> > although it's extremely rare to find
> > quotations, I have one on just this subject. And it does NOT say "x ==
> > NaN gives false". It says it gives *unordered*.
>
> Unordered means that none of the following is true:
>
> x > NaN
> x < NaN
> x == NaN
>
> It doesn't mean that comparing a NaN with something else is an error.
Robert Kern already clarified that. My confusion was due to relying
on second-hand knowledge.
Any individual case of the problem can be hacked somehow - I have already
fixed this one.
My point is that python would be a better language if well-written classes
that followed normal python conventions could be relied on to work
correctly with list, and that it is worth trying to bring this about.
Lists are a central structure of the language after all. Of course you can
disagree, or think the work required would be disproportionate, but surely
there is nothing unreasonable about my point?
> So, I would much prefer a language change. I am not competent to even
> propose one properly, but I'll try.
I don't see any technical problems in what you propose: as
far as I can see it's entirely feasible. However:
> should. On the minus side there would be the difference between
> '__equal__' and '__eq__' to confuse people.
I think this is exactly what makes the idea a non-starter. There
are already enough questions on the lists about when to use 'is'
and when to use '==', without adding an 'equals' function into
the mix. It would add significant extra complexity to the core
language, for questionable (IMO) gain.
There are certainly other languages for which this distinction
would make sense; I just don't think it's appropriate
for Python, with its emphasis on practicality and
simplicity.
Mark
> Dr. Rasmus H. Fogh Email: r.h.f...@bioc.cam.ac.uk
snip
>> What might be a sensible behaviour (unlike your proposed wrapper)
Sorry
1) I was rude,
2) I thanked TJR for your wrapper class proposal in a later mail. It is
yours.
> What do you dislike about my wrapper class? Perhaps it is fixable.
I think it is a basic requirement for functioning lists that you get
>>> alist = [1,x]
>>> x in alist
True
>>> alist.remove(x)
>>> alist
[1] # unless of course x == 1, in which case the list is [x].
Your wrapper would not provide this behaviour. It is necessary to do
if x is y:
    return True
be it in the eq() function, or in the list implementation. Note that this
is the current python behaviour for nan in lists, whatever the mathematics
say.
>> would be the following:
>> def eq(x, y):
>>     if x is y:
>>         return True
> I've already mentioned NaNs. Sentinel values also sometimes need to
> compare not equal with themselves. Forcing them to compare equal will
> cause breakage.
The list.__contains__ method already checks 'x is y' before it checks 'x
== y'. I'd say that a list where my example above does not work is broken
already, but of course I do not want to break further code. Could you give
an example of this use of sentinel values?
>>     else:
>>         try:
>>             return (x == y)
>>         except Exception:
>>             return False
> Why False? Why not True? If an error occurs inside __eq__, how do you
> know that the correct result was False?
> class Broken(object):
>     def __eq__(self, other):
>         return Treu # oops, raises NameError
In managing collections the purpose of eq would be to divide objects into
a small set that are all equal to each other, and a larger set that are
all unequal to all members of the first set. That requires default to
False. If you default to True then eq(aNumpyArray, x) would return True
for all x.
If an error occurs inside __eq__ it could be 1) because __eq__ is badly
written, or 2) because the type of y was not considered by the
implementers of x or is in some deep way incompatible with x. 1) I cannot
help, and for 2) I am simply saying that value semantics require an __eq__
that returns a truth value. In the absence of that I want identity
semantics.
Rasmus
>> So, I would much prefer a language change. I am not competent to even
>> propose one properly, but I'll try.
> I don't see any technical problems in what you propose: as
> far as I can see it's entirely feasible. However:
>> should. On the minus side there would be the difference between
>> '__equal__' and '__eq__' to confuse people.
> I think this is exactly what makes the idea a non-starter. There
> are already enough questions on the lists about when to use 'is'
> and when to use '==', without adding an 'equals' function into
> the mix. It would add significant extra complexity to the core
> language, for questionable (IMO) gain.
So:
It is perfectly acceptable behaviour to have __eq__ return a value that
cannot be cast to a boolean, but it still does break the python list. The
fixes proposed so far all get the thumbs down, for various good reasons.
How about:
- Define a new built-in Exception
BoolNotDefinedError(ValueError)
- Have list.__contains__ (etc.) use the following comparison internally:
def newCollectionTest(x, y):
    if x is y:
        return True
    else:
        try:
            return bool(x == y)
        except BoolNotDefinedError:
            return False
- Recommend that numpy.array.__nonzero__ and similar cases
raise BoolNotDefinedError instead of ValueError
Objects that choose to raise BoolNotDefinedError will now work in lists,
with identity semantics.
Objects that do not raise BoolNotDefinedError have no change in behaviour.
It remains to be seen how hard it is to implement, and how much it slows
down list.__contains__.
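The proposal can be sketched end to end. BoolNotDefinedError does not exist in Python; it is the new built-in exception being proposed above, and ArrayLike is a hypothetical stand-in for numpy.ndarray.

```python
class BoolNotDefinedError(ValueError):
    pass

class ArrayLike(object):
    """Stand-in for a multi-element numpy array."""
    def __init__(self, data):
        self.data = list(data)
    def __eq__(self, other):
        # Elementwise comparison, numpy-style: returns another array.
        return ArrayLike(a == other for a in self.data)
    def __bool__(self):  # __nonzero__ in Python 2
        raise BoolNotDefinedError("truth value of an array is ambiguous")

def new_collection_test(x, y):
    """The comparison proposed for list.__contains__ and friends."""
    if x is y:
        return True
    try:
        return bool(x == y)
    except BoolNotDefinedError:
        return False
```

With this in place an ArrayLike is found in a collection by identity, and its presence no longer poisons membership tests for other elements.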
Rasmus
---------------------------------------------------------------------------
Dr. Rasmus H. Fogh Email: r.h....@bioc.cam.ac.uk
In the rare case that you want to test for identity in a list, you can
easily write your own function to do it upfront:
def idcontains(seq, obj):
    for i in seq:
        if i is obj:
            return True
    return False
> On the minus side there would be the difference between
> '__equal__' and '__eq__' to confuse people.
This is a very big minus. It would be far better to spell __equal__ in
such a way as to make it clear why it wasn't the same as __eq__, otherwise
you end up with the confusion that the Perl "==" and "eq" operators
regularly cause.
--
Rhodri James *-* Wildebeeste Herder to the Masses
Maybe. But there is more to it than just 'in'. If you do:
>>> c = numpy.zeros((2,))
>>> ll = [1, c, 3.]
then the following all throw errors:
3 in ll, 3 not in ll, ll.index(3), ll.count(3), ll.remove(3)
c in ll, c not in ll, ll.index(c), ll.count(c), ll.remove(c)
Note how the presence of c in the list makes it behave wrong for 3 as
well.
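The breakage can be reproduced without numpy installed, using a minimal stand-in class (FakeArray is hypothetical, but real multi-element numpy arrays behave the same way here):

```python
class FakeArray(object):
    """== is elementwise (returns another array) and the result has
    no single truth value, just like a multi-element numpy array."""
    def __eq__(self, other):
        return FakeArray()
    def __bool__(self):  # __nonzero__ in Python 2
        raise ValueError("The truth value of an array with more than "
                         "one element is ambiguous.")

c = FakeArray()
ll = [1, c, 3.0]
```

Both '3 in ll' and 'c in ll' now raise ValueError: the membership scan must compare the needle against c on the way, and coercing that elementwise result to a boolean fails.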
> It's far more
> common to use a dict or set for containment tests, due to O(1)
> performance rather than O(n). I doubt the numpy array supports
> hashing, so an error for misuse is all you should expect.
Indeed it does not. So there is not much to be gained from modifying
equality comparison with sets/dicts.
> In the rare case that you want to test for identity in a list, you can
> easily write your own function to do it upfront:
> def idcontains(seq, obj):
>     for i in seq:
>         if i is obj:
>             return True
>     return False
Again, you can code around any particular case (though wrappers look like
a more robust solution). Still, why not get rid of this wart, if we can
find a way?
>> On the minus side there would be the difference between
>> '__equal__' and '__eq__' to confuse people.
> This is a very big minus. It would be far better to spell __equal__ in
> such a way as to make it clear why it wasn't the same as __eq__,
> otherwise
> you end up with the confusion that the Perl "==" and "eq" operators
> regularly cause.
You are probably right, unfortunately. That proposal is unlikely to fly.
Do you think my latest proposal, raising BoolNotDefinedError, has better
chances?
I think I lost the first messages on this thread, but... Wouldn't it be easier
to just fix numpy? I see no need to have == return anything but a boolean, at
least in Numpy's case. The syntax 'a == b' is an equality test, not a detailed
summary of why they may be different, and (a==b).all() makes little sense to
read unless you know beforehand that a and b are numpy arrays. When I'm
comparing normal objects, I do not expect (nor should I) the == operator to
return an attribute-by-attribute summary of what was equal and what wasn't.
Why is numpy's == overloaded in such a counter intuitive way? I realize that an
elementwise comparison makes a lot of sense, but it could have been done instead
with a.compare_with(b) (or even better, a.compare_with(b, epsilon=e)). No
unexpected breakage, and you have the chance of specifying when you consider two
elements to be equal - very useful.
Even the transition itself could be done without breaking much code... Make the
== op return an object that wraps the array of bools (instead of the array
itself), give it the any() and all() methods, and make __nonzero__/__bool__
equivalent to all().
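The transition idea in the last paragraph might look like the following sketch (BoolArray is a hypothetical name, not an actual numpy class):

```python
class BoolArray(object):
    """Wraps the array of booleans instead of returning it raw from ==.
    The wrapper keeps any()/all(), and its truth value is all(), so
    'if a == b:' and 'a in ll' work without further changes."""
    def __init__(self, flags):
        self._flags = list(flags)
    def any(self):
        return any(self._flags)
    def all(self):
        return all(self._flags)
    def __bool__(self):  # __nonzero__ in Python 2
        return self.all()
```

bool(BoolArray([True, False])) is False, while .any() still reports the partial match.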
--
Luis Zarrabeitia
Facultad de Matemática y Computación, UH
http://profesores.matcom.uh.cu/~kyrie
Rich comparisons were added to Python at the request of the
Numeric (now numpy) developers and they have been part of Python
and Numeric for many many years.
I don't think it's likely they'll change things back to the days
of Python 1.5.2 ;-)
> Even the transition itself could be done without breaking much code... Make the
> == op return an object that wraps the array of bools (instead of the array
> itself), give it the any() and all() methods, and make __nonzero__/__bool__
> equivalent to all().
That would cause a lot of confusion on its own, since such an
object wouldn't behave in the same way as say a regular Python
list (bool([0]) == True).
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Dec 10 2008)
All of these are O(n). Use a set or dict. What is your use case
anyway?
> > It's far more
> > common to use a dict or set for containment tests, due to O(1)
> > performance rather than O(n). I doubt the numpy array supports
> > hashing, so an error for misuse is all you should expect.
>
> Indeed it does not. So there is not much to be gained from modifying
> equality comparison with sets/dicts.
>
> > In the rare case that you want to test for identity in a list, you can
> > easily write your own function to do it upfront:
> > def idcontains(seq, obj):
> >     for i in seq:
> >         if i is obj:
> >             return True
> >     return False
>
> Again, you can code around any particular case (though wrappers look like
> a more robust solution). Still, why not get rid of this wart, if we can
> find a way?
The wart is a feature. I agree that it's confusing, but the cost of
adding a special case to work around it is far in excess of the
original problem.
Now if you phrased it as a hypothetical discussion for the purpose of
learning about language design, that'd be another matter.
So do not put numpy arrays into lists without wrapping them. They were
designed and semi-optimized, by a separate community, for a specific
purpose -- numerical computation -- and not for 'playing nice' with
other Python objects.
It is a design feature of Python that people can implement specialized
objects with specialized behaviors for specialized purposes.
Please define "rich comparisons" for me. It seems that I do not understand the
term - I was thinking it meant the ability to override the comparison
operators, and especially, the ability to override them independently.
Even in statically typed languages, when you override the equality
operator/function you can choose not to return a valid answer (raise an
exception). And it would break all the cases mentioned above (element in
list, etc). But that isn't the right thing to do. The language doesn't/can't
prohibit you from breaking the equality test, but that shouldn't be
considered a feature. (a==b).all() makes no sense.
> > Even the transition itself could be done without breaking much code...
> > Make the == op return an object that wraps the array of bools (instead of
> > the array itself), give it the any() and all() methods, and make
> > __nonzero__/__bool__ equivalent to all().
>
> That would cause a lot of confusion on its own, since such an
> object wouldn't behave in the same way as say a regular Python
> list (bool([0]) == True).
I'm certain that something could be worked out. A quick paragraph that took me
just a few minutes to type shouldn't be construed as a PEP that will solve
all the problems :D.
--
Luis Zarrabeitia (aka Kyrie)
Fac. de Matemática y Computación, UH.
http://profesores.matcom.uh.cu/~kyrie
That's one of the features rich comparisons added. Another is
the ability to return arbitrary objects instead of just booleans
or integers:
http://www.python.org/dev/peps/pep-0207/
David was a Numeric developer at the time (among other things).
> Even in statically typed languages, when you override the equality
> operator/function you can choose not to return a valid answer (raise an
> exception). And it would break all the cases mentioned above (element in
> list, etc). But that isn't the right thing to do. The language doesn't/can't
> prohibit you from breaking the equality test, but that shouldn't be
> considered a feature. (a==b).all() makes no sense.
Perhaps not in your application, but it does make sense in other
numeric applications, e.g. ones that work on vectors or matrixes.
I'd suggest you simply wrap the comparison in a function and then
have that apply the necessary conversion to a boolean.
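That wrapper function might be sketched like so (eq_as_bool is a hypothetical name; it reduces array-like results through their all() method, the way numpy's own error message suggests):

```python
def eq_as_bool(x, y):
    """Apply == and coerce the result to a single truth value,
    reducing elementwise results (anything with an all() method)."""
    result = (x == y)
    reduce_all = getattr(result, "all", None)
    if reduce_all is not None:
        return bool(reduce_all())
    return bool(result)
```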
>>> Even the transition itself could be done without breaking much code...
>>> Make the == op return an object that wraps the array of bools (instead of
>>> the array itself), give it the any() and all() methods, and make
>>> __nonzero__/__bool__ equivalent to all().
>> That would cause a lot of confusion on its own, since such an
>> object wouldn't behave in the same way as say a regular Python
>> list (bool([0]) == True).
>
> I'm certain that something could be worked out. A quick paragraph that took me
> just a few minutes to type shouldn't be construed as a PEP that will solve
> all the problems :D.
As always: the Devil is in the details :-)
I do numeric work... I'm finishing my MSc in applied math and I'm programming
mostly with Python. And I'd rather have a.compare_with(b), or
a.elementwise_compare(b), or whatever name, rather than (a==b).all(). In
fact, I'd very much like to have an a.compare_with(b, epsilon=e).all() (to
account for rounding errors), and with python2.5, all(a.compare_with(b)).
Yes, I could create an element_compare(a,b) function. But I still can't use
a==b and have a meaningful result. Ok, I can (and do) ignore that, it's just
one library, I'll keep track of the types before asking for equality (already
an ugly thing to do in Python), but the a==b behaviour breaks lists (a in
ll, ll.index(a)) even for elements not in numpy. Should I also ignore
lists?
The concept of equality between two arrays is very well defined, as it is also
very well defined the element-by-element comparison. There is a need to test
for both - then the way to test for equality should be the equality test.
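The a.compare_with(b, epsilon=e) idea above can be sketched as a plain function over sequences (the name and signature are hypothetical, not an existing numpy method):

```python
def elementwise_close(a, b, epsilon=1e-9):
    """Elementwise comparison with a tolerance for rounding error;
    returns a list of booleans so the caller can use all() or any()."""
    if len(a) != len(b):
        raise ValueError("length mismatch: %d != %d" % (len(a), len(b)))
    return [abs(x - y) <= epsilon for x, y in zip(a, b)]
```

all(elementwise_close(a, b, epsilon=1e-6)) then plays the role of the proposed a.compare_with(b, epsilon=e).all().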
> > I'm certain that something could be worked out. A quick paragraph that
> > took me just a few minutes to type shouldn't be construed as a PEP that
> > will solve all the problems :D.
>
> As always: the Devil is in the details :-)
Of course...
list.__contains__, tuple.__contains__, the 'if' keyword...
How do you suggest fixing the list.__contains__ implementation?
Should I wrap all my "if"s with this?:
if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
    res = compare_numpy(a, b)
elif isinstance(a, some_otherclass) or isinstance(b, some_otherclass):
    res = compare_someotherclass(a, b)
...
else:
    res = (a == b)
if res:
    # do whatever
> I do numeric work... I'm finishing my MSc in applied math and I'm
> programming mostly with Python. And I'd rather have a.compare_with(b), or
> a.elementwise_compare(b), or whatever name, rather than (a==b).all().
Unluckily for you, the Numeric/Numpy people wanted something else. They
asked first, there's a lot more of them, and their project is very
important to Python's continued success.
> In
> fact, I'd very much like to have an a.compare_with(b, epsilon=e).all()
> (to account for rounding errors), and with python2.5,
> all(a.compare_with(b)).
>
> Yes, I could create an element_compare(a,b) function.
Absolutely.
> But I still can't use a==b and have a meaningful result.
That's right. *ANY* operation in Python can fail, given arbitrary data,
with the possible exception of the id() function and the "is" and "is
not" operators. You have to deal with it.
> Ok, I can (and do) ignore that,
> it's just one library, I'll keep track of the types before asking for
> equality (already an ugly thing to do in python), but the a==b behaviour
> breaks lists (a in ll, ll.index(a)) even for elements not in
> numpy. Should I also ignore lists?
That depends on what sort of contract your code is giving. Does it
promise to work with any imaginable data whatsoever, no matter how badly
broken or poorly designed or incompatible with what you're trying to do?
If so, then I suggest your contract is broken, not the behaviour of list.
You can't make trustworthy promises to deal with arbitrary data types
that you don't control, that can fail in arbitrary ways. Here's something
for you to consider:
class Boobytrap:
    def __eq__(self, other):
        if other == 1:
            return True
        elif other == 2:
            while True:
                pass
        return False
>>> alist = [0, Boobytrap(), 2, 3]
>>> 1 in alist
True
>>> 3 in alist
True
>>> 5 in alist
False
>>> 2 in alist
What do you expect should happen?
> The concept of equality between two arrays is very well defined, as it
> is also very well defined the element-by-element comparison. There is a
> need to test for both - then the way to test for equality should be the
> equality test.
The Numpy people disagree with you. It was from their request that Python
was changed to allow __eq__ to return arbitrary objects.
--
Steven
> On Sunday 07 December 2008 09:21:18 pm Robert Kern wrote:
>> The deficiency is in the feature of rich comparisons, not numpy's
>> implementation of it. __eq__() is allowed to return non-booleans;
>> however, there are some parts of Python's implementation like
>> list.__contains__() that still expect the return value of __eq__() to
>> be meaningfully cast to a boolean.
>
> list.__contains__, tuple.__contains__, the 'if' keyword...
>
> How do you suggest fixing the list.__contains__ implementation?
I suggest you don't, because I don't think it's broken. I think it's
working as designed. It doesn't succeed with arbitrary data types which
may be broken, buggy or incompatible with __contains__'s design, but
that's okay, it's not supposed to.
> Should I wrap all my "if"s with this?:
>
> if isinstance(a, numpy.ndarray) or isinstance(b, numpy.ndarray):
>     res = compare_numpy(a, b)
> elif isinstance(a, some_otherclass) or isinstance(b, some_otherclass):
>     res = compare_someotherclass(a, b)
> ...
> else:
>     res = (a == b)
> if res:
>     # do whatever
No, inlining that code everywhere you have an if would be stupid. What
you should do is write a single function equals(x, y) that does precisely
what you want it to do, in whatever way you want, and then call it:
if equals(a, b):
Or, put your data inside a wrapper. If you read back over my earlier
posts in this thread, I suggested a lightweight wrapper class you could
use. You could make it even more useful by using delegation to make the
wrapped class behave *exactly* like the original, except for __eq__.
You don't even need to wrap every single item:
def wrap_or_not(obj):
    if type(obj) in list_of_bad_types_i_know_about:
        return EqualityWrapper(obj)
    return obj

data = [1, 2, 3, BadData, 4]
data = map(wrap_or_not, data)
It isn't really that hard to deal with these things, once you give up the
illusion that your code should automatically work with arbitrarily wacky
data types that you don't control.
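The earlier posts with the wrapper are not shown in this excerpt, but such an EqualityWrapper might look like the following sketch (a guess at its shape, not the author's actual code):

```python
class EqualityWrapper(object):
    """Delegates attribute access to the wrapped object, but forces
    __eq__ to yield a plain truth value: identity first, then ==,
    and False when == cannot produce a boolean."""
    def __init__(self, obj):
        self._obj = obj
    def __getattr__(self, name):
        # Called only for attributes not found on the wrapper itself.
        return getattr(self._obj, name)
    def __eq__(self, other):
        if isinstance(other, EqualityWrapper):
            other = other._obj
        if self._obj is other:
            return True
        try:
            return bool(self._obj == other)
        except Exception:
            return False
```

A wrapped bad object can then sit in an ordinary list without breaking membership tests for itself or its neighbours.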
--
Steven
You should perhaps reconsider your use of lists. Lists with elements
of different types can be tricky at times, so perhaps you either need
a different data type which doesn't scan all elements or a separate
search function that knows about your type setup.
The fact that comparisons can raise exceptions is not new to Python,
so this problem can pop up in other areas as well, esp. when using
3rd party extensions.
Regarding the other issues like new methods you should really talk
to the numpy developers, since they are the ones who could fix this.
> The concept of equality between two arrays is very well defined, as it is also
> very well defined the element-by-element comparison. There is a need to test
> for both - then the way to test for equality should be the equality test.
>
>>> I'm certain that something could be worked out. A quick paragraph that
>>> took me just a few minutes to type shouldn't be construed as a PEP that
>>> will solve all the problems :D.
>> As always: the Devil is in the details :-)
>
> Of course...
--
Marc-Andre Lemburg
eGenix.com
Professional Python Services directly from the Source (#1, Dec 11 2008)
> There is nothing to blame them for. This is the correct behaviour. NaNs
> should *not* compare equal to themselves, that's mathematically
> incoherent.
Indeed. The problem is a paucity of equality predicates. This is
hardly surprising: Common Lisp has four general-purpose equality
predicates (EQ, EQL, EQUAL and EQUALP), and many more type-specific ones
(=, STRING=, STRING-EQUAL (yes, I know...), CHAR=, ...), and still
doesn't really have enough. For example, EQUAL compares strings
case-sensitively, but other arrays are compared by address; EQUALP will
recurse into arbitrary arrays, but compares strings
case-insensitively...
For the purposes of this discussion, however, it has enough to be able
to distinguish between
* numerical comparisons, which (as you explain later) should /not/
claim that two NaNs are equal, and
* object comparisons, which clearly must declare an object equal to
itself.
For example, I had the following edifying conversation with SBCL.
CL-USER> ;; Return NaNs rather than signalling errors.
(sb-int:set-floating-point-modes :traps nil)
; No value
CL-USER> (defconstant nan (/ 0.0 0.0))
NAN
CL-USER> (loop for func in '(eql equal equalp =)
collect (list func (funcall func nan nan)))
((EQL T) (EQUAL T) (EQUALP T) (= NIL))
CL-USER>
That is, a NaN is EQL, EQUAL and EQUALP to itself, but not = to itself.
(Due to the vagaries of EQ, a NaN might or might not be EQ to itself or
other NaNs.)
Python has a much more limited selection of equality predicates -- in
fact, just == and is. The is operator is Python's equivalent of Lisp's
EQ predicate: it compares objects by address. I can have a similar chat
with Python.
In [12]: nan = float('nan')
In [13]: nan is nan
Out[13]: True
In [14]: nan == nan
Out[14]: False
In [16]: nan is float('nan')
Out[16]: False
Python numbers are the same as themselves reliably, unlike in Lisp. But
there's no sensible way of asking whether something is `basically the
same as' nan, like Lisp's EQL or EQUAL. I agree that the primary
equality predicate for numbers must be the numerical comparison, and
NaNs can't (sensibly) be numerically equal to themselves.
Address comparisons are great when you're dealing with singletons, or
when you carefully intern your objects. In other cases, you're left
with ==. This puts a great deal of responsibility on the programmer of
an == method to weigh carefully the potentially conflicting demands of
compatibility (many other libraries just expect == to be an equality
operator returning a straightforward truth value, and given that there
isn't a separate dedicated equality operator, this isn't unreasonable),
and doing something more domain-specifically useful.
It's worth pointing out that numpy isn't unique in having == not return
a straightforward truth value. The SAGE computer algebra system (and
sympy, I believe) implement the == operator on algebraic formulae so as
to construct equations. For example, the following is syntactically and
semantically Python, with fancy libraries.
sage: var('x') # x is now a variable
x
sage: solve(x**2 + 2*x - 4 == 1)
[x == -sqrt(6) - 1, x == sqrt(6) - 1]
(SAGE has some syntactic tweaks, such as ^ meaning the same as **, but I
didn't use them.)
I think this is an excellent use of the == operator -- but it does have
some potential to interfere with other libraries which make assumptions
about how == behaves. The SAGE developers have been clever here,
though:
sage: 2*x + 1 == (2 + 4*x)/2
2*x + 1 == (4*x + 2)/2
sage: bool(2*x + 1 == (2 + 4*x)/2)
True
sage: bool(2*x + 1 == (2 + 4*x)/3)
False
I think Python manages surprisingly well with its limited equality
predicates. But the keyword there is `surprisingly' -- and it may not
continue this trick forever.
-- [mdw]
> I've already mentioned NaNs. Sentinel values also sometimes need to
> compare not equal with themselves. Forcing them to compare equal will
> cause breakage.
There's a conflict between such domain-specific considerations (NaNs,
strange sentinels, SAGE's equations), and relatively natural assumptions
about an == operator, such as it being an equivalence relation.
I don't know how to resolve this conflict without introducing a new
function which is (or at least strongly encourages developers to arrange
for it to be) an equivalence relation.
-- [mdw]
> Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>
>> I've already mentioned NaNs. Sentinel values also sometimes need to
>> compare not equal with themselves. Forcing them to compare equal will
>> cause breakage.
>
> There's a conflict between such domain-specific considerations (NaNs,
> strange sentinels, SAGE's equations), and relatively natural assumptions
> about an == operator, such as it being an equivalence relation.
Such assumptions only hold under particular domains though. You can't
assume equality is an equivalence relation once you start thinking about
arbitrary domains.
> I don't know how to resolve this conflict without introducing a new
> function which is (or at least strongly encourages developers to arrange
> for it to be) an equivalence relation.
But there cannot be any such function which is a domain-independent
equivalence relation, not if we're talking about arbitrarily wacky
domains. Even something as straight-forward as "is" can't be an
equivalence relation under a domain where identity isn't well-defined.
--
Steven
> Such assumptions only hold under particular domains though. You can't
> assume equality is an equivalence relation once you start thinking
> about arbitrary domains.
From a formal mathematical point of view, equality /is/ an equivalence
relation. If you have a relation on some domain, and it's not an
equivalence relation, then it ain't the equality relation, and that's
flat.
> But there cannot be any such function which is a domain-independent
> equivalence relation, not if we're talking about arbitrarily wacky
> domains.
That looks like a claim which requires a proof to me. But it could also
do with a definition of `domain', so I'll settle for one of those first.
If we're dealing with sets (i.e., `domain's form a subclass of `sets')
then the claim is clearly false, and equality (determined by comparison
of elements) is indeed a domain-independent equivalence relation.
> Even something as straight-forward as "is" can't be an equivalence
> relation under a domain where identity isn't well-defined.
You've completely lost me here. The Python `is' operator is (the
characteristic function of) an equivalence relation on Python values:
that's its definition. You could describe an extension of the `is'
relation to a larger set of items, such that it fails to be an
equivalence relation on that set, but you'd be (rightly) criticized for
failing to preserve one of its two defining properties. (The other is
that `is' makes distinctions between values which are at least as fine
as any other method, and this property should also be extended.)
Let me have another go.
All Python objects are instances of `object' or of some more specific
class. The `==' operator on `object' is (the characteristic function
of) an equivalence relation. In fact, it's the same as `is' -- but
`==' can be overridden by subclasses, and subclasses are permitted --
according to the interface definition -- to coarsen the relation. In
fact, they're permitted to make it not be an equivalence relation at all.
I claim that this is a problem. I /agree/ that domain-specific
predicates are useful, and can be sufficiently useful that they deserve
the `==' name -- as well as floats and numpy, I've provided SAGE and
sympy as examples myself. But I also believe that there are good
reasons to want an `equivalence' operator (I'll write it as `=~', though
I don't propose this as Python syntax -- see below) with the following
properties:
* `=~' is the characteristic function[1] of an equivalence relation,
i.e., for all values x, y, z: x =~ y in (True, False); (x =~ x) ==
True; if x =~ y then y =~ x; and if x =~ y and y =~ z then x =~ z
* Moreover, `=~' is a coarsening of `is', i.e. for all values x, y: if
x is y then x =~ y.
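The two properties above can be stated as executable checks over a finite sample (check_equivalence is a hypothetical harness; any candidate two-argument predicate can be passed in for `=~'):

```python
def check_equivalence(eqv, samples):
    """Verify the laws above for a candidate predicate: reflexivity,
    symmetry, transitivity, and the coarsening of 'is'."""
    for x in samples:
        assert eqv(x, x) is True              # reflexive
        for y in samples:
            assert eqv(x, y) is eqv(y, x)     # symmetric
            if x is y:
                assert eqv(x, y) is True      # coarsens 'is'
            for z in samples:
                if eqv(x, y) and eqv(y, z):
                    assert eqv(x, z)          # transitive
    return True
```

check_equivalence(lambda x, y: bool(x == y), [0, 1, 1.0, 'a']) passes, but putting float('nan') in the sample set fails the reflexivity check, which is exactly the NaN point made earlier.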
A valuable property might be that x =~ y if x and y are
indistinguishable without using `is'. That would mean immediately that
'xyz' =~ 'xy' + 'z' (regardless of interning, because strings are
immutable). But for tuples this would imply elementwise comparison,
which may be expensive -- and, in the case of tuples manufactured by C
extensions, nontrivial because manufactured tuples need not be acyclic.
On the other hand, `==' is already recursive on tuples.
We can envisage a collection of different relations, according to which
distinguishing methods we're willing to disallow. For example, for
numerical types, there are actually a number of interesting relations,
according to whether you think the answers to the following questions
are true or false.
* Is 1 =~ 1/1? (Here, 1 is an integer, and 1/1 is a rational number;
both are the multiplicative identities of their respective rings.
I'd suggest that it doesn't seem very useful to say `no' here, but
there might be reasons why one would want type(x) is type(y) if
x =~ y.)
* Is 1 =~ 1.0? (This is trickier. Numerically the values are equal;
but the former is exact and the latter inexact, and this is a good
reason to want a separation.)
Essentially, these are asking whether `type' is a legitimate
distinguisher, and I think that the answer, unhelpful as it may be, is
`sometimes'.
A third useful distinguishing technique is mutation. Given two
singleton lists whose respective elements compare equivalent, I can
mutate one of them to decide whether the other is in fact the same. Is
this something which `=~' should distinguish? Again, the answer is
probably `sometimes'.
To summarize: we're left with at least three different characteristics
which an equivalence predicate might have:
* efficient (e.g., bounded recursion depth, works on circular values);
* neglects irrelevant (to whom?) differences of type; and
* neglects differences due to mutability.
A predicate used to compare set elements or hash-table keys should
probably /respect/ mutability. (Associating hashing with this
predicate, rather than `==', would coherently allow mutable objects such
as lists to be used as dictionary keys, though they'd be compared by
address. I don't actually know how useful this would be, but suspect
that it wouldn't.)
Oh, before I go, let me make this very clear: I am /not/ proposing a
language change. I think the right way to address these problems is
using existing mechanisms such as generic functions with multimethods.
Syntax can come later if it seems sufficiently important.
[1] I'll settle for it being a partial function, i.e., attempting to
evaluate x =~ y might raise exceptions, e.g., if x is in some
invalid state, or perhaps if one or both of x or y is circular,
though it would be good to minimize such cases.
-- [mdw]
> Steven D'Aprano <ste...@REMOVE.THIS.cybersource.com.au> wrote:
>
>> Such assumptions only hold under particular domains though. You can't
>> assume equality is an equivalence relation once you start thinking
>> about arbitrary domains.
>
> From a formal mathematical point of view, equality /is/ an equivalence
> relation. If you have a relation on some domain, and it's not an
> equivalence relation, then it ain't the equality relation, and that's
> flat.
Okay, fair enough. In the formal mathematical sense, equality is always
an equivalence relation. So there are certain domains which don't have
equality, e.g. floating point, since nan != nan. Also Python objects,
since x.__eq__(y) is not necessarily the same as y.__eq__(x).
>> But there cannot be any such function which is a domain-independent
>> equivalence relation, not if we're talking about arbitrarily wacky
>> domains.
>
> That looks like a claim which requires a proof to me. But it could also
> do with a definition of `domain', so I'll settle for one of those first.
I'm talking about domain in the sense of "a particular problem domain".
That is, the model, data and operations used to solve a problem. I don't
know that I can be more formal than that.
To prove my claim, all you need is two domains with a mutually
incompatible definition of equality. That's not so difficult, surely? How
about equality of integers, versus equality of integers modulo some N?
> If we're dealing with sets (i.e., `domain's form a subclass of `sets')
> then the claim is clearly false, and equality (determined by comparison
> of elements) is indeed a domain-independent equivalence relation.
It isn't domain-independent in my sense, because you have specified one
specific domain, namely set equality.
>> Even something as straight-forward as "is" can't be an equivalence
>> relation under a domain where identity isn't well-defined.
>
> You've completely lost me here. The Python `is' operator is (the
> characteristic function of) an equivalence relation on Python values:
> that's its definition.
Yes, that's because identity is well-defined in Python. I'm saying that
if identity isn't well-defined, then neither is the 'is' operator, and
therefore it isn't an equivalence relation. That shouldn't be
controversial.
> All Python objects are instances of `object' or of some more specific
> class. The `==' operator on `object' is (the characteristic function
> of) an equivalence relation. In fact, it's the same as `is' -- but
> `==' can be overridden by subclasses, and subclasses are permitted --
> according to the interface definition -- to coarsen the relation. In
> fact, they're permitted to make it not be an equivalence relation at all.
>
> I claim that this is a problem.
It *can* be a problem, if you insist on using == on arbitrary types while
still expecting it to be an equivalence relation.
If you drop the requirement that it remain an e-r, then you can apply ==
to arbitrary types. And if you limit yourself to non-arbitrary types,
then you can safely use (say) any strings you like, and == will remain an
e-r.
> I /agree/ that domain-specific
> predicates are useful, and can be sufficiently useful that they deserve
> the `==' name -- as well as floats and numpy, I've provided SAGE and
> sympy as examples myself. But I also believe that there are good
> reasons to want an `equivalence' operator (I'll write it as `=~', though
> I don't propose this as Python syntax -- see below) with the following
> properties:
>
> * `=~' is the characteristic function[1] of an equivalence relation,
> i.e., for all values x, y, z: x =~ y in (True, False); (x =~ x) ==
> True; if x =~ y then y =~ x; and if x =~ y and y =~ z then x =~ z
>
> * Moreover, `=~' is a coarsening of `is', i.e. for all values x, y: if
> x is y then x =~ y.
Ah, but you can't have such a generic e-r that applies across all problem
domains. Consider:
Let's denote regular, case-sensitive strings using "abc", and special,
case-insensitive strings using i"abc". So for regular strings, equality
is an e-r; for case-insensitive strings, equality is also an e-r (I
trust that the truth of this is obvious). But if you try to use equality
on *both* regular and case-insensitive strings, it fails to be an e-r:
i"abc" =~ "ABC" returns True if you use the case-insensitive definition
of equality, but returns False if you use the case-sensitive definition.
There is no single definition of equality that is *simultaneously* case-
sensitive and case-insensitive.
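The failure can be made concrete with a hypothetical `IStr` class (not a real Python type) whose `__eq__` ignores case:

```python
class IStr(str):
    """Hypothetical case-insensitive string: == ignores case."""
    def __eq__(self, other):
        if isinstance(other, str):
            return self.lower() == other.lower()
        return NotImplemented
    def __hash__(self):
        return hash(self.lower())

a = "ABC"          # plain, case-sensitive string
b = IStr("abc")    # case-insensitive string
c = "abc"          # plain string again

print(a == b)   # True (IStr is a str subclass, so its __eq__ is tried first)
print(b == c)   # True
print(a == c)   # False -- transitivity fails across the two domains
```

Each relation is a perfectly good e-r on its own domain; it is only the mixed comparisons that break transitivity.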
> A valuable property might be that x =~ y if x and y are
> indistinguishable without using `is'.
That's a little strong, because it implies that equality must look at
*everything* about a particular object, not just whatever bits of data
are relevant for the problem domain.
For example, consider storing data in a dict.
>>> D1 = {-1: 0, -2: 0}
>>> D2 = {-2: 0}
>>> D2[-1] = 0
>>> D1 == D2
True
We certainly want D1 and D2 to be equal. But their history is different,
and that makes their internal details different, which has detectable
consequences:
>>> D1
{-2: 0, -1: 0}
>>> D2
{-1: 0, -2: 0}
The same happens with trees. Given a tree structure defined as:
(payload, left-subtree, right-subtree)
do you want the following two trees to be equal?
('b', ('a', None, None), ('c', None, None))
('a', None, ('b', None, ('c', None, None)))
Unless I've made a silly mistake, not only are the payloads of the two
trees equal, but so are their in-order traversals. Only the specific
order the nodes are stored in differs, and that may not be important
for the specific problem you are trying to solve.
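As a quick check (a sketch, using the tuple representation above), an in-order traversal shows both trees yield the same payload sequence even though tuple equality says they differ:

```python
def inorder(tree):
    """Yield payloads of a (payload, left, right) tuple-tree in order."""
    if tree is None:
        return
    payload, left, right = tree
    yield from inorder(left)
    yield payload
    yield from inorder(right)

t1 = ('b', ('a', None, None), ('c', None, None))
t2 = ('a', None, ('b', None, ('c', None, None)))

print(list(inorder(t1)))  # ['a', 'b', 'c']
print(list(inorder(t2)))  # ['a', 'b', 'c']
print(t1 == t2)           # False: tuple == compares structure, not content
```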
There may be problem domains where the order of elements in a list (or
tree structure) *is* important, and other problem domains where order is
irrelevant. One single relation can't cover all such conflicting
requirements.
--
Steven
> To prove my claim, all you need is two domains with a mutually
> incompatible definition of equality. That's not so difficult, surely? How
> about equality of integers, versus equality of integers modulo some N?
No, that's not an example. The integers modulo N form a ring Z/NZ of
residue classes. Such residue classes are distinct from the integers --
e.g., an integer 3 (say) is not the same as the set 3 + NZ = { ..., 3 - 2N,
3 - N, 3, 3 + N, 3 + 2N, ... } -- but there is a homomorphism from Z
to Z/NZ under which 3 + NZ is the image of 3.
If we decide to define the == operator such that 3 == 3 + NZ and 3 + N
== 3 + NZ then == is not an equivalence relation (in particular,
transitivity fails). But that's just an artifact of the definition. If
we distinguish 3 from 3 + NZ then everything is fine. 3 + NZ == (3 + N)
+ NZ correctly, but 3 != 3 + N, and all is well.
Here, at least, the problem is not that == as an equivalence relation
fails in some particular domain -- because in both Z and Z/NZ it can be
a perfectly fine equivalence relation -- but that it can potentially
fail on the boundaries between domains. Easy answer: don't mess it up
at the boundaries.
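The failure at the boundary can be sketched with a hypothetical `Residue` class whose `__eq__` is deliberately over-generous and also matches plain ints:

```python
class Residue:
    """Hypothetical residue class modulo n, with an __eq__ that
    (wrongly) blurs the boundary by also comparing against ints."""
    def __init__(self, value, n):
        self.n = n
        self.value = value % n
    def __eq__(self, other):
        if isinstance(other, Residue):
            return self.n == other.n and self.value == other.value
        if isinstance(other, int):
            return other % self.n == self.value  # the boundary-blurring case
        return NotImplemented

N = 5
r = Residue(3, N)
print(3 == r)      # True  (int defers, Residue.__eq__ answers)
print(r == 3 + N)  # True
print(3 == 3 + N)  # False -- transitivity fails at the boundary
```

Dropping the `isinstance(other, int)` branch restores transitivity, exactly as the text suggests: keep Z and Z/NZ distinct.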
Proposition. Let U, U' be disjoint sets, and let E, E' be equivalence
relations on U, U' respectively. Define E^ on U union U' as E^ = E
union E', i.e.,
E^(x, y) iff (x in U and y in U and E(x, y)) or
(x in U' and y in U' and E'(x, y))
Then E^ is an equivalence relation.
Proof. Reflexivity and symmetry are trivial; transitivity follows from
disjointness of U and U'.
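The proposition can be sketched in Python, taking membership tests and the two relations as parameters (all names here are hypothetical):

```python
def union_equivalence(in_u, e, in_u2, e2):
    """Combine equivalence relations e on U and e2 on U' -- with U and
    U' disjoint -- into one relation E^ on the union of U and U'."""
    def e_hat(x, y):
        if in_u(x) and in_u(y):
            return e(x, y)
        if in_u2(x) and in_u2(y):
            return e2(x, y)
        return False  # across the boundary: never equivalent
    return e_hat

# U = ints with equality mod 5; U' = strings, compared case-insensitively.
eq = union_equivalence(
    lambda x: isinstance(x, int), lambda x, y: x % 5 == y % 5,
    lambda x: isinstance(x, str), lambda x, y: x.lower() == y.lower(),
)
print(eq(3, 8))          # True
print(eq("ABC", "abc"))  # True
print(eq(3, "3"))        # False: across the boundary
```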
> It *can* be a problem, if you insist on using == on arbitrary types
> while still expecting it to be an equivalence relation.
Unfortunately, from the surrounding discussion, it seems that container
types particularly want to be able to contain arbitrary objects, and the
failure of == to be an equivalence relation makes this fail. The problem
is that objects with wacky == operators are still more or less quacking
like the more usual kinds of ducks; but they turn out to taste very
different.
> Let's denote regular, case-sensitive strings using "abc", and special,
> case-insensitive strings using i"abc". So for regular strings, equality
> is an e-r; for case-insensitive strings, equality is also an e-r (I
> trust that the truth of this is obvious). But if you try to use equality
> on *both* regular and case-insensitive strings, it fails to be an e-r:
>
> i"abc" =~ "ABC" returns True if you use the case-insensitive definition
> of equality, but returns False if you use the case-sensitive definition.
> There is no single definition of equality that is *simultaneously* case-
> sensitive and case-insensitive.
A case-sensitive string is /not the same/ as a case-insensitive string.
One's a duck, the other's a goose. I'd claim here that i"abc" =~ "ABC"
must be False, because i"abc" =~ "abc" must be False also! To define it
otherwise leads to the incoherence you describe. But the above
proposition provides an easy answer.
> > A valuable property might be that x =~ y if x and y are
> > indistinguishable without using `is'.
>
> That's a little strong, because it implies that equality must look at
> *everything* about a particular object, not just whatever bits of data
> are relevant for the problem domain.
Yes. That's one of the reasons that =~ isn't the same as ==.
I've been thinking on my feet in this thread, so I haven't thought
everything through. And as I mention below, there are /many/ useful
equality predicates on values. As I didn't mention (but hope is
obvious), having a massively-parametrized equality predicate is daft,
and providing enough to suit every possible application equally so. But we
might be able to do well enough with just one or two -- or maybe by just
leaving things as they are.
> For example, consider storing data in a dict.
>
> >>> D1 = {-1: 0, -2: 0}
> >>> D2 = {-2: 0}
> >>> D2[-1] = 0
> >>> D1 == D2
> True
>
>
> We certainly want D1 and D2 to be equal.
Do we? If we're using my `indistinguishable without using ``is'''
criterion from above, then D1 and D2 are certainly different! To detect
the difference, mutate one and see if the other changes:
def distinct_dictionaries_p(D1, D2):
    """
    Decide whether D1 and D2 are distinct dictionaries or not.
    Not threadsafe.
    """
    magic = []
    more_magic = [magic]
    old = D1.get('mumble', more_magic)
    D1['mumble'] = magic
    # If D2 sees the mutation, D1 and D2 are the same dictionary;
    # the predicate returns True when they are distinct.
    result = D2.get('mumble', more_magic) is not magic
    if old is more_magic:
        del D1['mumble']
    else:
        D1['mumble'] = old
    return result
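A minimal self-contained demonstration of the same idea, using the D1/D2 pair from above: although they compare equal with `==`, mutating one does not affect the other, so they are distinct objects:

```python
D1 = {-1: 0, -2: 0}
D2 = {-2: 0}
D2[-1] = 0
assert D1 == D2          # equal as mappings

sentinel = object()      # a fresh value nothing else can hold
D1['mumble'] = sentinel  # mutate D1 only
changed = D2.get('mumble') is sentinel
del D1['mumble']         # restore D1

print(changed)   # False: D2 did not change, so D1 and D2 are distinct
```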
But that criterion was a suggestion -- a way of defining a coherent
equivalence relation on the whole of the Python value space which is
coarser than `is' and maybe more useful. My primary purpose in
proposing it was to stimulate discussion: what /do/ we want from
equality predicates? We already have `is', which is too fine-grained to
be widely useful: it distinguishes between different instances of the
number 500000, for example, and I can't for the life of me see why
that's a useful behaviour. (The `is' operator is a fine thing, and I
wouldn't want it any other way: it trades away some useful semantics for
the sake of speed, and that was the /right/ decision.)
My criterion succeeds in distinguishing 1 from 1.0 (they have different
types), which may be considered good. It doesn't distinguish a quiet
NaN from another quiet NaN: that's definitely good. (It'd be bogus for
a numeric equality operator, but we've already got one of those, so we
don't need to define another.) But you're probably right: it's still
too fine-grained for some purposes.
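Both behaviours are easy to observe; the `type_equiv` function below is only a sketch of such a type-aware criterion, not a proposal:

```python
nan = float('nan')
print(1 == 1.0)    # True: == crosses the int/float type boundary
print(nan == nan)  # False: == is not even reflexive for NaN

def type_equiv(x, y):
    """Sketch of a type-aware relation: same type, and identical or equal."""
    return type(x) is type(y) and (x is y or x == y)

print(type_equiv(1, 1.0))    # False: the types differ
print(type_equiv(nan, nan))  # True, via identity, despite nan != nan
```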
> But their history is different, and that makes their internal details
> different, which has detectable consequences:
>
> >>> D1
> {-2: 0, -1: 0}
> >>> D2
> {-1: 0, -2: 0}
So in this case, `str' also works as a distinguisher. Fine.
> There may be problem domains where the order of elements in a list (or
> tree structure) *is* important, and other problem domains where order is
> irrelevant. One single relation can't cover all such conflicting
> requirements.
Absolutely. This is why Common Lisp provides four(!) out of the box and
it still isn't enough. Python, which provides one (`is') and a half
(`==' when it's behaving), is actually coping remarkably well,
considering. But this /is/ causing problems, and so thinking about
solutions seems reasonable.
I'm not trying to change the language. I don't have a pet feature I
want added. I do think the discussion is interesting and worthwhile,
though.
-- [mdw]
> A case-sensitive string is /not the same/ as a case-insensitive string.
> One's a duck, the other's a goose. I'd claim here that i"abc" =~ "ABC"
> must be False, because i"abc" =~ "abc" must be false also! To define it
> otherwise leads to the incoherence you describe.
It's only incoherent if you need equality to be an equivalence relation.
If you don't, it is perfectly reasonable to declare that i"abc" equals
"abc".
--
Steven
> It's only incoherent if you need equality to be an equivalence relation.
> If you don't, it is perfectly reasonable to declare that i"abc" equals
> "abc".
Right! And if you didn't want an equivalence relation, then `==' will
suit you fine. The problem is that some applications seem to /want/ an
equivalence relation, and one that's more useful (i.e., less
discriminating) than `is'.
-- [mdw]