[Python-ideas] checking for identity before comparing built-in objects


Max Moroz

Oct 4, 2012, 7:48:03 AM
to python...@python.org
It seems that built-in classes do not short-circuit `__eq__` method
when the objects are identical, at least in CPython:

f = frozenset(range(200000000))
f1 = f
f1 == f # this operation will take about 1 sec on my machine

Is there any disadvantage to checking whether the equality was called
with the same object, and if it was, return `True` right away? I
noticed this when trying to memoize a function that has large
frozenset arguments. While hashing of a large argument is very fast
after it's done once (hash value is presumably cached), the equality
comparison is always slow even against itself. So when the same large
argument is provided over and over, memoization is slow.

Of course, there's a workaround: subclass frozenset, and redefine
__eq__ to check id() first. And arguably, for this particular use
case, I should redefine both __hash__ and __eq__, to make them
look exclusively at id(), since it's not worth wasting memoizer time
trying to compare two non-identical large arguments that are highly
unlikely to compare equal anyway. So if there's any reason for the
current implementation, I don't have a strong argument against it.
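
A minimal sketch of the workaround described above, assuming a hypothetical subclass name `IdentityFrozenSet` (not code from the original post). It only adds the identity check in front of the inherited comparison; the "id() only" variant is not shown.

```python
class IdentityFrozenSet(frozenset):
    """Hypothetical frozenset subclass that checks identity before comparing."""

    def __eq__(self, other):
        # Short-circuit: an object is treated as equal to itself.
        if self is other:
            return True
        return frozenset.__eq__(self, other)

    def __ne__(self, other):
        if self is other:
            return False
        return frozenset.__ne__(self, other)

    # Defining __eq__ sets __hash__ to None, so restore the inherited hash.
    __hash__ = frozenset.__hash__


f = IdentityFrozenSet(range(1000))
g = IdentityFrozenSet(range(1000))

same = (f == f)    # answered by the identity check
equal = (f == g)   # falls back to the element-wise comparison
```

Because `__hash__` is inherited unchanged, instances can still be used as dict keys alongside ordinary frozensets.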
_______________________________________________
Python-ideas mailing list
Python...@python.org
http://mail.python.org/mailman/listinfo/python-ideas

Steven D'Aprano

Oct 4, 2012, 9:53:50 AM
to python...@python.org
On 04/10/12 21:48, Max Moroz wrote:
> It seems that built-in classes do not short-circuit `__eq__` method
> when the objects are identical, at least in CPython:
>
> f = frozenset(range(200000000))
> f1 = f
> f1 == f # this operation will take about 1 sec on my machine

You shouldn't over-generalize. Some built-ins do short-circuit __eq__
when the objects are identical. I believe that strings and ints both
do. Other types might not.


> Is there any disadvantage to checking whether the equality was called
> with the same object, and if it was, return `True` right away?

That would break floats and Decimals, both of which support NANs.

The decision whether or not to optimize __eq__ should be left up to the
type. Some types, for example, might decide to optimize x == x even if
x contains a NAN or other objects that break reflexivity of equality.
Other types might prefer not to.

(Please do not start an argument about NANs and reflexivity. That's
been argued to death, and there are very good reasons for the IEEE 754
standard to define NANs the way they do.)

Since frozensets containing NANs are rare (I presume), I think it is
reasonable to optimize frozenset equality. But I do not think it is
reasonable for Python to mandate identity checking before __eq__.



> I noticed this when trying to memoize a function that has large
> frozenset arguments. While hashing of a large argument is very fast
> after it's done once (hash value is presumably cached), the equality
> comparison is always slow even against itself. So when the same large
> argument is provided over and over, memoization is slow.

I'm not sure what you are doing here, because dicts (at least in Python
3.2) already short-circuit equality:

py> NAN = float('nan')
py> NAN == NAN
False
py> d = {NAN: 42}
py> d[NAN]
42

Actually, that behaviour goes back to at least 2.4, so I'm not sure how
you are doing memoization and not seeing the same optimization.
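
The identity shortcut in dict lookups can be observed with a key type that counts its `__eq__` calls. This is only a sketch: `CountingKey` is a hypothetical class, and the counts reflect CPython behaviour rather than a language guarantee.

```python
class CountingKey:
    """Hypothetical key type that records how often __eq__ is called."""

    calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.calls += 1
        return isinstance(other, CountingKey) and self.value == other.value


key = CountingKey("big argument")
cache = {key: "cached result"}

CountingKey.calls = 0
hit_same = cache[key]                         # same object as the stored key
calls_same_object = CountingKey.calls

CountingKey.calls = 0
hit_equal = cache[CountingKey("big argument")]  # equal but distinct object
calls_equal_object = CountingKey.calls
```

On CPython the first lookup should need no `__eq__` call at all, while the second has to fall back to `__eq__`.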



--
Steven

Mathias Panzenböck

Oct 4, 2012, 10:02:29 AM
to python...@python.org
On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
> py> NAN == NAN
> False

Why isn't this True anyway? Is there a PEP that explains this (IMHO odd) behavior?

Mike Graham

Oct 4, 2012, 10:07:36 AM
to Mathias Panzenböck, Python-Ideas
On Thu, Oct 4, 2012 at 10:02 AM, Mathias Panzenböck
<grosser.me...@gmx.net> wrote:
> On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
>>
>> py> NAN == NAN
>> False
>
>
> Why isn't this True anyway? Is there a PEP that explains this (IMHO odd)
> behavior?

IEEE 754 specifies this.

Mike

MRAB

Oct 4, 2012, 10:19:44 AM
to python-ideas
On 2012-10-04 15:07, Mike Graham wrote:
> On Thu, Oct 4, 2012 at 10:02 AM, Mathias Panzenböck
> <grosser.me...@gmx.net> wrote:
>> On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
>>>
>>> py> NAN == NAN
>>> False
>>
>>
>> Why isn't this True anyway? Is there a PEP that explains this (IMHO odd)
>> behavior?
>
> IEEE 754 specifies this.
>
Think of it this way:

Calculation A returns NaN for some reason

Calculation B also returns NaN for some reason

Have they really returned the same result? Just because they're both
NaN doesn't mean that they're the _same_ NaN...
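
As a small illustration of the point above, here are two unrelated calculations that both end in NaN. The variable names are just for this sketch.

```python
import math

inf = float("inf")
a = inf - inf    # NaN from an undefined subtraction
b = inf * 0.0    # NaN from an undefined multiplication

both_nan = math.isnan(a) and math.isnan(b)
equal = (a == b)       # the two NaNs do not compare equal
reflexive = (a == a)   # and neither does a NaN with itself
```

`math.isnan()` is the reliable way to ask "is this a NaN?", since `==` cannot answer it.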

Chris Angelico

Oct 4, 2012, 10:30:50 AM
to python-ideas
On Fri, Oct 5, 2012 at 12:19 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 2012-10-04 15:07, Mike Graham wrote:
>>
>> On Thu, Oct 4, 2012 at 10:02 AM, Mathias Panzenböck
>> <grosser.me...@gmx.net> wrote:
>>>
>>> On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
>>>>
>>>>
>>>> py> NAN == NAN
>>>> False
>>>
>>>
>>>
>>> Why isn't this True anyway? Is there a PEP that explains this (IMHO odd)
>>> behavior?
>>
>>
>> IEEE 754 specifies this.
>>
> Think of it this way:
>
> Calculation A returns NaN for some reason
>
> Calculation B also returns NaN for some reason
>
> Have they really returned the same result? Just because they're both
> NaN doesn't mean that they're the _same_ NaN...

The only other viable option would be to declare that (NaN==NaN) is
NaN - kinda like SQL's NULL and its weird semantics. And that would be
*highly* confusing in many situations.

ChrisA

Victor Stinner

Oct 4, 2012, 11:08:40 AM
to Steven D'Aprano, python...@python.org
2012/10/4 Steven D'Aprano <st...@pearwood.info>:
> On 04/10/12 21:48, Max Moroz wrote:
>>
>> It seems that built-in classes do not short-circuit `__eq__` method
>> when the objects are identical, at least in CPython:
>>
>> f = frozenset(range(200000000))
>> f1 = f
>> f1 == f # this operation will take about 1 sec on my machine
>
>
> You shouldn't over-generalize. Some built-ins do short-circuit __eq__
> when the objects are identical. I believe that strings and ints both
> do. Other types might not.

This optimization is not implemented for Unicode strings.

PyObject_RichCompareBool() implements this optimization which leads to
incorrect results:

nan = float("nan")
mytuple = (nan,)
assert mytuple != mytuple # fails

I think that the optimization should be implemented for Unicode
strings, but disabled in PyObject_RichCompareBool().
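
A short sketch of how the identity shortcut shows up in containers, as observed on CPython. The comments describe expected behaviour there, not something the language necessarily promises for every implementation.

```python
nan = float("nan")

elementwise = (nan == nan)        # plain float comparison
same_object_tuple = ((nan,) == (nan,))   # both tuples hold the same object
membership = nan in [nan]         # membership test also checks identity first
distinct_objects = ((float("nan"),) == (float("nan"),))  # two separate NaN objects
```

The tuple comparison and the membership test succeed only because the very same float object is involved; with two separately created NaNs the element comparison falls through to `==` and fails.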

@Max Moroz: Can you please open an issue on bugs.python.org?

Victor

Steven D'Aprano

Oct 4, 2012, 11:53:36 AM
to python...@python.org
On 05/10/12 01:08, Victor Stinner wrote:
> 2012/10/4 Steven D'Aprano<st...@pearwood.info>:
>> On 04/10/12 21:48, Max Moroz wrote:
>>>
>>> It seems that built-in classes do not short-circuit `__eq__` method
>>> when the objects are identical, at least in CPython:
>>>
>>> f = frozenset(range(200000000))
>>> f1 = f
>>> f1 == f # this operation will take about 1 sec on my machine
>>
>>
>> You shouldn't over-generalize. Some built-ins do short-circuit __eq__
>> when the objects are identical. I believe that strings and ints both
>> do. Other types might not.
>
> This optimization is not implemented for Unicode strings.

That does not match my experience. In Python 3.2, I generate a large
unicode string, and an equal but not identical copy:

s = "aЖcdef"*100000
t = "a" + s[1:]
assert s is not t and s == t


Using timeit, s == s is about 10000 times faster than s == t.
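
The measurement could be reproduced roughly as follows. This is a sketch, assuming Python 3.5 or later for the `globals` argument of `timeit.timeit`; the repeat count of 1000 is arbitrary and the actual ratio will vary by machine and version.

```python
import timeit

s = "aЖcdef" * 100000
t = "a" + s[1:]   # equal contents, but a different object
assert s is not t and s == t

# Time the identity case and the equal-but-distinct case separately.
same_obj = timeit.timeit("s == s", globals={"s": s}, number=1000)
diff_obj = timeit.timeit("s == t", globals={"s": s, "t": t}, number=1000)

print(f"s == s: {same_obj:.6f} s, s == t: {diff_obj:.6f} s")
```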

--
Steven

MRAB

Oct 4, 2012, 12:05:43 PM
to python-ideas
On 2012-10-04 16:53, Steven D'Aprano wrote:
> On 05/10/12 01:08, Victor Stinner wrote:
>> 2012/10/4 Steven D'Aprano<st...@pearwood.info>:
>>> On 04/10/12 21:48, Max Moroz wrote:
>>>>
>>>> It seems that built-in classes do not short-circuit `__eq__` method
>>>> when the objects are identical, at least in CPython:
>>>>
>>>> f = frozenset(range(200000000))
>>>> f1 = f
>>>> f1 == f # this operation will take about 1 sec on my machine
>>>
>>>
>>> You shouldn't over-generalize. Some built-ins do short-circuit __eq__
>>> when the objects are identical. I believe that strings and ints both
>>> do. Other types might not.
>>
>> This optimization is not implemented for Unicode strings.
>
> That does not match my experience. In Python 3.2, I generate a large
> unicode string, and an equal but not identical copy:
>
> s = "aЖcdef"*100000
> t = "a" + s[1:]
> assert s is not t and s == t
>
>
> Using timeit, s == s is about 10000 times faster than s == t.
>
In Python 3.3 I get a similar result.

Oscar Benjamin

Oct 4, 2012, 12:48:59 PM
to python-ideas
On 4 October 2012 17:05, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 2012-10-04 16:53, Steven D'Aprano wrote:
>>
>> On 05/10/12 01:08, Victor Stinner wrote:
>>>
>>> 2012/10/4 Steven D'Aprano<st...@pearwood.info>:
>>>>
>>>> On 04/10/12 21:48, Max Moroz wrote:
>>>>>
>>>>>
>>>>> It seems that built-in classes do not short-circuit `__eq__` method
>>>>> when the objects are identical, at least in CPython:
>>>>>
>>>>> f = frozenset(range(200000000))
>>>>> f1 = f
>>>>> f1 == f # this operation will take about 1 sec on my machine
>>>>
>>>>
>>>>
>>>> You shouldn't over-generalize. Some built-ins do short-circuit __eq__
>>>> when the objects are identical. I believe that strings and ints both
>>>> do. Other types might not.
>>>
>>>
>>> This optimization is not implemented for Unicode strings.
>>
>>
>> That does not match my experience. In Python 3.2, I generate a large
>> unicode string, and an equal but not identical copy:
>>
>> s = "aЖcdef"*100000
>> t = "a" + s[1:]
>> assert s is not t and s == t
>>
>>
>> Using timeit, s == s is about 10000 times faster than s == t.
>>
> In Python 3.3 I get a similar result.

This was discussed not long ago in a different thread. Here is the line:
http://hg.python.org/cpython/file/bd8afb90ebf2/Objects/unicodeobject.c#l10508

As I understood it that line is the reason that comparisons for
interned strings are faster.


Oscar

Mathias Panzenböck

Oct 4, 2012, 12:51:23 PM
to python...@python.org
On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
> On 04/10/12 21:48, Max Moroz wrote:
>> It seems that built-in classes do not short-circuit `__eq__` method
>> when the objects are identical, at least in CPython:
>>
>> f = frozenset(range(200000000))
>> f1 = f
>> f1 == f # this operation will take about 1 sec on my machine
>
> You shouldn't over-generalize. Some built-ins do short-circuit __eq__
> when the objects are identical. I believe that strings and ints both
> do. Other types might not.
>
>
>> Is there any disadvantage to checking whether the equality was called
>> with the same object, and if it was, return `True` right away?
>
> That would break floats and Decimals, both of which support NANs.
>
> The decision whether or not to optimize __eq__ should be left up to the
> type. Some types, for example, might decide to optimize x == x even if
> x contains a NAN or other objects that break reflexivity of equality.
> Other types might prefer not to.
>
> (Please do not start an argument about NANs and reflexivity. That's
> been argued to death, and there are very good reasons for the IEEE 754
> standard to define NANs the way they do.)
>
> Since frozensets containing NANs are rare (I presume), I think it is
> reasonable to optimize frozenset equality. But I do not think it is
> reasonable for Python to mandate identity checking before __eq__.
>

But it seems like set and frozenset behave like this anyway (using "is" to compare its items):

>>> frozenset([float("nan")]) == frozenset([float("nan")])
False

>>> s = frozenset([float("nan")])
>>> s == s
True

>>> NaN = float("nan")
>>> NaN == NaN
False
>>> frozenset([NaN]) == frozenset([NaN])
True

So the "is" optimization should not change its semantics.

(I tested this in Python 2.7.3 and 3.2.3)

>
>
>> I noticed this when trying to memoize a function that has large
>> frozenset arguments. While hashing of a large argument is very fast
>> after it's done once (hash value is presumably cached), the equality
>> comparison is always slow even against itself. So when the same large
>> argument is provided over and over, memoization is slow.
>
> I'm not sure what you are doing here, because dicts (at least in Python
> 3.2) already short-circuit equality:
>
> py> NAN = float('nan')
> py> NAN == NAN
> False
> py> d = {NAN: 42}
> py> d[NAN]
> 42
>
> Actually, that behaviour goes back to at least 2.4, so I'm not sure how
> you are doing memoization and not seeing the same optimization.
>
>
>

Max Moroz

Oct 4, 2012, 1:49:45 PM
to Steven D'Aprano, python...@python.org
On Thu, Oct 4, 2012 at 6:53 AM, Steven D'Aprano <st...@pearwood.info> wrote:
> I'm not sure what you are doing here, because dicts (at least in Python
> 3.2) already short-circuit equality:
>
> py> NAN = float('nan')
> py> NAN == NAN
> False
> py> d = {NAN: 42}
> py> d[NAN]
> 42
>
> Actually, that behaviour goes back to at least 2.4, so I'm not sure how
> you are doing memoization and not seeing the same optimization.

It was my mistake... I do see this optimization now that I know where
to look for it. Thanks for clarifying this.

Max Moroz

Oct 4, 2012, 1:50:50 PM
to python-ideas
On Thu, Oct 4, 2012 at 7:19 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Think of it this way:
>
> Calculation A returns NaN for some reason
>
> Calculation B also returns NaN for some reason
>
> Have they really returned the same result? Just because they're both
> NaN doesn't mean that they're the _same_ NaN...

Someone who performs two calculations with float numbers should never
compare their results for equality. It's really a bug to rely on that
comparison:

# this is a bug
# since the result of this comparison for regular numbers is unpredictable
# so does it really matter how this behaves when NaNs are compared?
if a/b == c/d:
    # ...

On the other hand, comparing a number to another number, when neither of
the two numbers is involved in a calculation, is perfectly fine:

# this is not a bug
# too bad that it won't work as expected
# when input1 == input2 == 'nan'
a = float(input1)
b = float(input2)
if a == b:
    # ...

So it seems to me your argument is this: "let's break the expectations
of developers who are writing valid code, in order to partially meet
the expectations of developers who are writing buggy code". If so, I disagree.
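
For code that wants the behaviour argued for here, one option is an explicit helper rather than `==`. The following is only a sketch: `same_value` is a hypothetical name, the tolerance is an arbitrary choice, and treating two NaNs as matching is this helper's own policy, not anything IEEE 754 or Python prescribes. It assumes Python 3.5+ for `math.isclose`.

```python
import math


def same_value(a, b, rel_tol=1e-9):
    """Hypothetical helper: tolerant comparison that treats two NaNs as matching."""
    if math.isnan(a) and math.isnan(b):
        return True
    # math.isclose returns False whenever exactly one argument is NaN.
    return math.isclose(a, b, rel_tol=rel_tol)


exact = (0.1 + 0.2 == 0.3)               # computed results rarely match exactly
tolerant = same_value(0.1 + 0.2, 0.3)    # tolerance absorbs the rounding error
nan_inputs = same_value(float("nan"), float("nan"))
```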

Antoine Pitrou

Oct 4, 2012, 7:00:10 PM
to python...@python.org
On Thu, 4 Oct 2012 17:08:40 +0200
Victor Stinner <victor....@gmail.com>
wrote:
> PyObject_RichCompareBool() implements this optimization which leads to
> incorrect results:
>
> nan = float("nan")
> mytuple = (nan,)
> assert mytuple != mytuple # fails
>
> I think that the optimization should be implemented for Unicode
> strings, but disabled in PyObject_RichCompareBool().

I think we should wait for someone to complain before disabling it.
It's a useful optimization.

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net

Steven D'Aprano

Oct 5, 2012, 12:52:55 AM
to python...@python.org, python...@python.org
On Fri, Oct 05, 2012 at 01:00:10AM +0200, Antoine Pitrou wrote:
> On Thu, 4 Oct 2012 17:08:40 +0200
> Victor Stinner <victor....@gmail.com>
> wrote:
> > PyObject_RichCompareBool() implements this optimization which leads to
> > incorrect results:
> >
> > nan = float("nan")
> > mytuple = (nan,)
> > assert mytuple != mytuple # fails
> >
> > I think that the optimization should be implemented for Unicode
> > strings, but disabled in PyObject_RichCompareBool().
>
> I think we should wait for someone to complain before disabling it.
> It's a useful optimization.

+1

I will go to the wall to defend correct IEEE 754 semantics for NANs, but
I also support containers that optimise away those semantics by default.

I think it's too early to talk about disabling it without even the
report of a bug caused by it.



--
Steven

Sven Marnach

Oct 7, 2012, 7:43:25 PM
to python...@python.org
On Thu, Oct 04, 2012 at 05:08:40PM +0200, Victor Stinner wrote:
> I think that the optimization should be implemented for Unicode
> strings, but disabled in PyObject_RichCompareBool().

Actually, this change to PyObject_RichCompareBool() has been made
before, but was reverted after the discussion in

http://bugs.python.org/issue4296

Cheers,
Sven

Alexander Belopolsky

Oct 7, 2012, 8:35:14 PM
to Steven D'Aprano, python...@python.org
On Thu, Oct 4, 2012 at 9:53 AM, Steven D'Aprano <st...@pearwood.info> wrote:
> (Please do not start an argument about NANs and reflexivity. That's
> been argued to death, and there are very good reasons for the IEEE 754
> standard to define NANs the way they do.)

Why not? This is python-ideas, isn't it? I've been hearing that the IEEE
754 committee had some "very good reasons" to violate reflexivity of
equality comparison with NaNs since I first learned about NaNs some 20
years ago. From time to time, I've also heard claims that there are
some important numeric algorithms that depend on this behavior.
However, I've never been able to dig out the actual rationale that
convinced the committee that voted for IEEE 754 or any very good
reasons to preserve this behavior in Python.

I am not suggesting any language changes, but I think it will be
useful to explain why float('nan') != float('nan') somewhere in the
docs. A reference to IEEE 754 does not help much. Java implements
IEEE 754 to some extent, but preserves reflexivity of object equality.

Chris Angelico

Oct 7, 2012, 8:42:28 PM
to python-ideas
On Mon, Oct 8, 2012 at 11:35 AM, Alexander Belopolsky
<alexander....@gmail.com> wrote:
> I am not suggesting any language changes, but I think it will be
> useful to explain why float('nan') != float('nan') somewhere in the
> docs. A reference to IEEE 754 does not help much. Java implements
> IEEE 754 to some extent, but preserves reflexivity of object equality.

NaN isn't a single value, but a whole category of values.
Conceptually, it's an uncountably infinite set (I think that's the
technical term) of invalid results; in implementation, NaN has the
highest possible exponent and any non-zero mantissa.
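
The bit-level description can be checked with the `struct` module. A sketch, assuming the platform float is an IEEE 754 64-bit double; `double_fields` is a hypothetical helper name.

```python
import struct


def double_fields(x):
    """Split a Python float into (sign, exponent, mantissa) bit fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF       # 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)     # 52 mantissa bits
    return sign, exponent, mantissa


_, nan_exp, nan_man = double_fields(float("nan"))
_, inf_exp, inf_man = double_fields(float("inf"))
```

Both NaN and infinity use the all-ones exponent; only the mantissa tells them apart. For 64-bit doubles that leaves 2**53 - 2 distinct NaN bit patterns, a large but finite number.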

So then the question becomes: Should *all* NaNs be equal, or only ones
with the same bit pattern? Aside from signalling vs non-signalling
NaNs, I don't think there's any difference between one and another, so
they should probably all compare equal. And once you go there, a huge
can o'worms is opened involving floating point equality.

It's much MUCH easier and simpler to defer to somebody else's standard
and just say "NaNs behave according to IEEE 754, blame them if you
don't like it". There would possibly be value in guaranteeing
reflexivity, but it would increase confusion somewhere else.

ChrisA

Mike Graham

Oct 7, 2012, 8:43:35 PM
to Alexander Belopolsky, python...@python.org
On Sun, Oct 7, 2012 at 8:35 PM, Alexander Belopolsky
<alexander....@gmail.com> wrote:
> Java implements IEEE 754 to some extent, but preserves reflexivity of object equality.

I don't actually know Java, but if I run

class HelloNaN {
    public static void main(String[] args) {
        double nan1 = 0.0 / 0.0;
        double nan2 = 0.0 / 0.0;
        System.out.println(nan1 == nan2);
    }
}

I get the output "false".

Mike

Alexander Belopolsky

Oct 7, 2012, 8:47:36 PM
to mikeg...@gmail.com, python...@python.org
Try this with Double instead of double. Note that I said "*object*
equality". In Java, lowercase double is not an object type.

Alexander Belopolsky

Oct 7, 2012, 8:50:01 PM
to Chris Angelico, python-ideas
On Sun, Oct 7, 2012 at 8:42 PM, Chris Angelico <ros...@gmail.com> wrote:
> It's much MUCH easier and simpler to defer to somebody else's standard
> and just say "NaNs behave according to IEEE 754, blame them if you
> don't like it". There would possibly be value in guaranteeing
> reflexivity, but it would increase confusion somewhere else.

I agree, but a good thing about standards is that there are plenty to
choose from. We can as easily refer to Java as a standard.

Guido van Rossum

Oct 7, 2012, 8:54:10 PM
to Alexander Belopolsky, python-ideas
On Sun, Oct 7, 2012 at 5:50 PM, Alexander Belopolsky
<alexander....@gmail.com> wrote:
> On Sun, Oct 7, 2012 at 8:42 PM, Chris Angelico <ros...@gmail.com> wrote:
>> It's much MUCH easier and simpler to defer to somebody else's standard
>> and just say "NaNs behave according to IEEE 754, blame them if you
>> don't like it". There would possibly be value in guaranteeing
>> reflexivity, but it would increase confusion somewhere else.
>
> I agree, but a good thing about standards is that there are plenty to
> choose from. We can as easily refer to Java as a standard.

Very funny.

Seriously, we can't change our position on this topic now without
making a lot of people seriously unhappy. IEEE 754 it is.

--
--Guido van Rossum (python.org/~guido)

Alexander Belopolsky

Oct 7, 2012, 9:09:08 PM
to Guido van Rossum, python-ideas
On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum <gu...@python.org> wrote:
> Seriously, we can't change our position on this topic now without
> making a lot of people seriously unhappy. IEEE 754 it is.

I did not suggest a change. I wrote: "I am not suggesting any
language changes, but I think it will be
useful to explain why float('nan') != float('nan') somewhere in the
docs." If there is a concise explanation for the choice of IEEE 754
vs. Java, I think we should write it down and put an end to this
debate.

Guido van Rossum

Oct 7, 2012, 9:51:51 PM
to Alexander Belopolsky, python-ideas
On Sun, Oct 7, 2012 at 6:09 PM, Alexander Belopolsky
<alexander....@gmail.com> wrote:
> On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum <gu...@python.org> wrote:
>> Seriously, we can't change our position on this topic now without
>> making a lot of people seriously unhappy. IEEE 754 it is.
>
> I did not suggest a change. I wrote: "I am not suggesting any
> language changes, but I think it will be
> useful to explain why float('nan') != float('nan') somewhere in the
> docs." If there is a concise explanation for the choice of IEEE 754
> vs. Java, I think we should write it down and put an end to this
> debate.

Referencing Java here is absurd and I still consider this suggestion
as a troll. Python is not in any way based on Java.

On the other hand referencing IEEE 754 makes all the sense in the
world, since every other aspect of Python float is based on IEEE 754
double whenever the underlying platform implements this standard --
and all modern CPUs do. I don't think there's anything else we need to
say.

--
--Guido van Rossum (python.org/~guido)

Alexander Belopolsky

Oct 7, 2012, 10:33:37 PM
to Guido van Rossum, python-ideas
On Sun, Oct 7, 2012 at 9:51 PM, Guido van Rossum <gu...@python.org> wrote:
> Referencing Java here is absurd and I still consider this suggestion
> as a troll. Python is not in any way based on Java.

I did not suggest that. Sorry if it came out this way. I am well
aware that Python and Java were invented independently and have
different roots. (IIRC, Java was born from Oak and Python from ABC
and Oak and ABC were both developed in the 1980s.) IEEE 754 precedes
both languages and one team decided that equality reflexivity for
hashable objects was more important than IEEE 754 compliance while the
other decided otherwise.

Many Python features (mostly library) are motivated by C. In the 90s,
"because C does it this way" was a good explanation for a language
feature. Doing things differently from the "C way", on the other hand
would deserve an explanation. These days, C is rarely the first language
that a student learns. Hopefully Python will take this place in the not
so distant future, but many students graduated in the late 90s - early
2000s knowing nothing but Java. As a result, these days it is a
valid question to ask about a language feature: "Why does Python do X
differently from Java?" Hopefully in most cases the answer is
"because Python does it better."

In case of nan != nan, I would really like to know a modern reason why
Python's way is better. Better compliance with a 20-year old standard
does not really qualify.

Ned Batchelder

Oct 7, 2012, 10:35:17 PM
to Guido van Rossum, python-ideas
On 10/7/2012 9:51 PM, Guido van Rossum wrote:
> On Sun, Oct 7, 2012 at 6:09 PM, Alexander Belopolsky
> <alexander....@gmail.com> wrote:
>> On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum <gu...@python.org> wrote:
>>> Seriously, we can't change our position on this topic now without
>>> making a lot of people seriously unhappy. IEEE 754 it is.
>> I did not suggest a change. I wrote: "I am not suggesting any
>> language changes, but I think it will be
>> useful to explain why float('nan') != float('nan') somewhere in the
>> docs." If there is a concise explanation for the choice of IEEE 754
>> vs. Java, I think we should write it down and put an end to this
>> debate.
> Referencing Java here is absurd and I still consider this suggestion
> as a troll. Python is not in any way based on Java.
>
> On the other hand referencing IEEE 754 makes all the sense in the
> world, since every other aspect of Python float is based on IEEE 754
> double whenever the underlying platform implements this standard --
> and all modern CPUs do. I don't think there's anything else we need to
> say.
>
I don't understand the reluctance to address a common conceptual
speed-bump in the docs. After all, the tutorial has an entire chapter
(http://docs.python.org/tutorial/floatingpoint.html) that explains how
floats work, even though they work exactly as IEEE 754 says they should.

A sentence in section 5.4 (Numeric Types) would help. Something like,
"In accordance with the IEEE 754 standard, NaN's are not equal to any
value, even another NaN. This is because NaN doesn't represent a
particular number, it represents an unknown result, and there is no way
to know if one unknown result is equal to another unknown result."

--Ned.

Alexander Belopolsky

Oct 7, 2012, 10:48:53 PM
to Guido van Rossum, python-ideas
On Sun, Oct 7, 2012 at 10:33 PM, Alexander Belopolsky
<alexander....@gmail.com> wrote:
> In case of nan != nan, I would really like to know a modern reason why
> Python's way is better.

To this end, a link to Kahan's "How Java’s Floating-Point Hurts
Everyone Everywhere" <http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf>
may be appropriate.

Rob Cliffe

Oct 7, 2012, 11:09:06 PM
to python...@python.org
I understand that the undefined result of a computation is not the same
as the undefined result of another computation.
(E.g. one might represent positive infinity, another might represent
underflow or loss of accuracy.)
But I can't help feeling (strongly) that the result of a computation
should be equal to itself.
In other words, after
x = float('nan')
y = float('nan')
I would expect
x != y
but
x == x

After all, how much sense does this make (I got this in a quick test
with Python 2.7.3):
>>> x=float('nan')
>>> x is x
True # Well I guess you'd sorta expect this
>>> x==x
False # You what?
>>> D = {1:x, 2:x}
>>> D[1]==D[2]
False # I see, both NANs - hmph!
>>> [x]==[x]
True # Oh yeh, it doesn't always work that way then?

Making equality non-reflexive feels utterly wrong to me, partly no doubt
because of my mathematical background, partly because of the difficulty
in implementing container objects and algorithms and God knows what else
when you have to remember that some of the objects they may deal with
may not be equal to themselves. In particular the difference between my
last two examples ( D[1]!=D[2] but [x]==[x] ) looks impossible to
justify except by saying that for historical reasons the designers of
lists and the designers of dictionaries made different - but entirely
reasonable - assumptions about the equality relation, and (perhaps)
whether identity implies equality (how do you explain to a Python
learner that it doesn't (pathological code examples aside) ???).
Couldn't each NAN when generated contain something that identified it
uniquely, so that different NANs would always compare as not equal, but
any given NAN would compare equal to itself?
Rob Cliffe

Alexander Belopolsky

Oct 7, 2012, 11:46:43 PM
to Rob Cliffe, python...@python.org
On Sun, Oct 7, 2012 at 11:09 PM, Rob Cliffe <rob.c...@btinternet.com> wrote:
> Couldn't each NAN when generated contain something that identified it
> uniquely, so that different NANs would always compare as not equal, but any
> given NAN would compare equal to itself?

If we take this route and try to distinguish NaNs with different
payload, I am sure you will want to distinguish between -0.0 and 0.0
as well. The latter would violate transitivity in -0.0 == 0 == 0.0.

The only sensible thing to do with NaNs is either to treat them all
equal (the Eiffel way) or to stick to IEEE default.

I don't think NaN behavior in Python is a result of a deliberate
decision to implement IEEE 754. If that were the case, why does 0.0/0.0
not produce NaN? Similarly, Python's math library does not produce
infinities where an IEEE 754 compliant library should:

>>> math.log(0.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: math domain error

Some other operations behave inconsistently:

>>> 2 * 10.**308
inf

but
>>> 10.**309
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: (34, 'Result too large')

I think non-reflexivity of nan in Python is an accidental feature.
Python's float type was not designed with NaN in mind and until
recently, it was relatively difficult to create a nan in pure python.

It is also not true that IEEE 754 requires that nan == nan is false.
IEEE 754 does not define operator '==' (nor does it define boolean
false). Instead, IEEE defines a comparison operation that can have
one of four results: >, <, =, or unordered. The standard does require
that NaN compares unordered with anything including itself, but it
does not follow that a language that defines an == operator with
boolean results must define it so that nan == nan is false.
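
The four-way comparison described above can be sketched in Python. `ieee_compare` is a hypothetical function written for illustration; it is built on Python's own float operators, not on any IEEE 754 API.

```python
import math


def ieee_compare(a, b):
    """Hypothetical four-way comparison: 'less', 'equal', 'greater' or 'unordered'."""
    # Any comparison involving a NaN is unordered, including NaN with itself.
    if math.isnan(a) or math.isnan(b):
        return "unordered"
    if a < b:
        return "less"
    if a > b:
        return "greater"
    return "equal"


nan = float("nan")
results = [
    ieee_compare(1.0, 2.0),
    ieee_compare(2.0, 1.0),
    ieee_compare(-0.0, 0.0),
    ieee_compare(nan, nan),
]
```

How a language maps "unordered" onto a boolean `==` is then a separate design decision, which is the point being made here.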

Antoine Pitrou

Oct 8, 2012, 2:26:28 AM
to python...@python.org
On Sun, 07 Oct 2012 22:35:17 -0400
Ned Batchelder <n...@nedbatchelder.com>
wrote:
> I don't understand the reluctance to address a common conceptual
> speed-bump in the docs. After all, the tutorial has an entire chapter
> (http://docs.python.org/tutorial/floatingpoint.html) that explains how
> floats work, even though they work exactly as IEEE 754 says they should.
>
> A sentence in section 5.4 (Numeric Types) would help. Something like,
> "In accordance with the IEEE 754 standard, NaN's are not equal to any
> value, even another NaN. This is because NaN doesn't represent a
> particular number, it represents an unknown result, and there is no way
> to know if one unknown result is equal to another unknown result."

+1

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net


Guido van Rossum

Oct 8, 2012, 12:19:31 PM
to Alexander Belopolsky, python-ideas
On Sun, Oct 7, 2012 at 7:33 PM, Alexander Belopolsky
<alexander....@gmail.com> wrote:
> On Sun, Oct 7, 2012 at 9:51 PM, Guido van Rossum <gu...@python.org> wrote:
>> Referencing Java here is absurd and I still consider this suggestion
>> as a troll. Python is not in any way based on Java.
>
> I did not suggest that. Sorry if it came out this way. I am well
> aware that Python and Java were invented independently and have
> different roots. (IIRC, Java was born from Oak and Python from ABC
> and Oak and ABC were both developed in the 1980s.) IEEE 754 precedes
> both languages and one team decided that equality reflexivity for
> hashable objects was more important than IEEE 754 compliance while the
> other decided otherwise.
>
> Many Python features (mostly library) are motivated by C. In the 90s,
> "because C does it this way" was a good explanation for a language
> feature. Doing things differently from the "C way", on the other hand
> would deserve an explanation. These days, C is rarely the first language
> that a student learns. Hopefully Python will take this place in the not
> so distant future, but many students graduated in late 90s - early
> 2000s knowing nothing but Java. As a result, these days it is a
> valid question to ask about a language feature: "Why does Python do X
> differently from Java?" Hopefully in most cases the answer is
> "because Python does it better."

Explaining the differences between Python and Java is a job for
educators, not for the language reference.

I agree that documenting APIs as "this behaves just like C" does not
have the same appeal -- but that turn of phrase was mostly used for
system calls anyway, and for those I think that a slightly modified
redirection (to the OS man pages) is still completely appropriate.

> In case of nan != nan, I would really like to know a modern reason why
> Python's way is better. Better compliance with a 20-year old standard
> does not really qualify.

I am not aware of an update to the standard. Being 20 years old does
not make it outdated.

Again, there are plenty of reasons (you have to ask the numpy folks),
but I don't think it is the job of the Python reference manual to give
its motivations. It just needs to explain how things work, and if that
can be done best by deferring to an existing standard that's fine.

Of course a tutorial should probably mention this behavior, but a
tutorial does not have the task of giving you the reason for every
language feature either -- most readers of the tutorial don't have the
context yet to understand those reasons, many don't care, and whether
they like it or not, it's not going to change.

You keep getting very close to suggesting to make changes, despite
your insistence that you just want to know the reason. But assuming
you really just are asking in an obnoxious way for the reason, I
recommend that you ask the people who wrote the IEEE 754 standard. I'm
sure their explanation (which I recall having read once but can't
reproduce here) makes sense for Python too.

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 8, 2012, 12:25:16 PM
to Ned Batchelder, python-ideas
On Sun, Oct 7, 2012 at 7:35 PM, Ned Batchelder <n...@nedbatchelder.com> wrote:
> I don't understand the reluctance to address a common conceptual speed-bump
> in the docs. After all, the tutorial has an entire chapter
> (http://docs.python.org/tutorial/floatingpoint.html) that explains how
> floats work, even though they work exactly as IEEE 754 says they should.

I'm sorry. I didn't intend to refuse to document the behavior. I was
mostly reacting to things I thought I read between the lines -- the
suggestion that there is no reason for the NaN behavior except silly
compatibility with an old standard that nobody cares about. From this
it is only a small step to reading (again between the lines) the
suggestion to change the behavior.

> A sentence in section 5.4 (Numeric Types) would help. Something like, "In
> accordance with the IEEE 754 standard, NaN's are not equal to any value,
> even another NaN. This is because NaN doesn't represent a particular
> number, it represents an unknown result, and there is no way to know if one
> unknown result is equal to another unknown result."

That sounds like a great addition to the docs, except for the nit that
I don't like writing the plural of NaN as "NaN's" -- I prefer "NaNs"
myself. Also, the words here can still cause confusion. The exact
behavior is that every one of the 6 comparison operators (==, !=, <,
<=, >, >=) returns False when either argument (or both) is a NaN. I
think your suggested words could lead someone to believe that they
mean that x != NaN or NaN != NaN would return True.

Anyway, once we can agree to words I agree that we should update that section.
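
For anyone who wants to check proposed wording against an interpreter, here is a minimal sketch. It assumes CPython 3 and uses only the standard `operator` module; the names `ops` and `results` are just for illustration. One thing worth knowing before running it: in CPython, `!=` is the one operator of the six that evaluates to True when an operand is NaN, while the other five evaluate to False.

```python
import operator

nan = float("nan")

# The six rich comparison operators, keyed by their usual spelling.
ops = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}

# For each operator, record the result for (nan, nan) and for (nan, 1.0).
results = {name: (op(nan, nan), op(nan, 1.0)) for name, op in ops.items()}

for name, pair in results.items():
    print(name, pair)
```

Any doc sentence that says "always False, regardless of the comparison" would need to carve out `!=` to match what this prints.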

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 8, 2012, 12:29:42 PM
to Rob Cliffe, python...@python.org
On Sun, Oct 7, 2012 at 8:09 PM, Rob Cliffe <rob.c...@btinternet.com> wrote:
> I understand that the undefined result of a computation is not the same as
> the undefined result of another computation.
> (E.g. one might represent positive infinity, another might represent
> underflow or loss of accuracy.)
> But I can't help feeling (strongly) that the result of a computation should
> be equal to itself.
> In other words, after
> x = float('nan')
> y = float('nan')
> I would expect
> x != y
> but
> x == x

That's too bad. It sounds like this mailing list really wouldn't have
enough space in its margins to convince you otherwise. And yet you are
wrong.

> After all, how much sense does this make (I got this in a quick test with
> Python 2.7.3):
> >>> x=float('nan')
> >>> x is x
> True # Well I guess you'd sorta expect this
> >>> x==x
> False # You what?
> >>> D = {1:x, 2:x}
> >>> D[1]==D[2]
> False # I see, both NANs - hmph!
> >>> [x]==[x]
> True # Oh yeh, it doesn't always work that way then?
>
> Making equality non-reflexive feels utterly wrong to me, partly no doubt
> because of my mathematical background,

Do you have any background at all in *numerical* mathematics?

> partly because of the difficulty in
> implementing container objects and algorithms and God knows what else when
> you have to remember that some of the objects they may deal with may not be
> equal to themselves. In particular the difference between my last two
> examples ( D[1]!=D[2] but [x]==[x] ) looks impossible to justify except by
> saying that for historical reasons the designers of lists and the designers
> of dictionaries made different - but entirely reasonable - assumptions about
> the equality relation, and (perhaps) whether identity implies equality (how
> do you explain to a Python learner that it doesn't (pathological code
> examples aside) ???).
> Couldn't each NAN when generated contain something that identified it
> uniquely, so that different NANs would always compare as not equal, but any
> given NAN would compare equal to itself?

It's not about equality. If you ask whether two NaNs are *unequal* the
answer is *also* False.

I admit that a tutorial section describing the behavior would be good.
But I am less than ever convinced that it's possible to explain the
*reason* for the behavior in a tutorial.

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 8, 2012, 12:47:48 PM
to Alexander Belopolsky, python...@python.org
On Sun, Oct 7, 2012 at 8:46 PM, Alexander Belopolsky
<alexander....@gmail.com> wrote:
> On Sun, Oct 7, 2012 at 11:09 PM, Rob Cliffe <rob.c...@btinternet.com> wrote:
>> Couldn't each NAN when generated contain something that identified it
>> uniquely, so that different NANs would always compare as not equal, but any
>> given NAN would compare equal to itself?
>
> If we take this route and try to distinguish NaNs with different
> payload, I am sure you will want to distinguish between -0.0 and 0.0
> as well. The latter would violate transitivity in -0.0 == 0 == 0.0.
>
> The only sensible thing to do with NaNs is either to treat them all
> equal (the Eiffel way) or to stick to IEEE default.
>
> I don't think NaN behavior in Python is a result of a deliberate
> decision to implement IEEE 754.

Oh, it was. It was very deliberate. Like in many other areas of
Python, I refused to invent new rules when there was existing behavior
elsewhere that I could borrow and with which I had no reason to
quibble. (And in the case of floating point behavior, there really is
no alternate authority to choose from besides IEEE 754. Languages that
disagree with it do not make an authority.)

Even if I *did* have reasons to quibble with the NaN behavior (there
were no NaNs on the mainframe where I learned programming, so they
were as new and weird to me as they are to today's novices), Tim
Peters, who has implemented numerical libraries for Fortran compilers
in a past life and is an absolute authority on floating points,
convinced me to follow IEEE 754 as closely as I could.

> If that was the case, why 0.0/0.0 does not produce NaN?

Easy. It was an earlier behavior, from the days where IEEE 754
hardware did not yet rule the world, and Python didn't have much of an
opinion on float behavior at all -- it just did whatever the platform
did. Infinities and NaNs were not on my radar (I hadn't met Tim yet
:-). However division by zero (which is not just a float but also an
int behavior) was something that I just had to address, so I made the
runtime check for it and raise an exception. When we became more
formal about this, we considered changing this but decided that the
ZeroDivisionError was more user-friendly than silently propagating
NaNs everywhere, given the typical use of Python. (I suppose we could
make it optional, and IIRC that's what Decimal does -- but for floats
we don't have a well-developed numerical context concept yet.)
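
A small sketch of the contrast between float and Decimal mentioned here, assuming CPython 3 and the standard `decimal` module. The variable names are illustrative only. Float division by zero always raises, whereas Decimal lets the context decide whether the IEEE-style default result is returned or an exception is raised.

```python
from decimal import Decimal, DivisionByZero, InvalidOperation, localcontext

# float: division by zero raises, whatever the hardware would have returned.
try:
    0.0 / 0.0
    float_result = "no exception"
except ZeroDivisionError as exc:
    float_result = type(exc).__name__

# Decimal: the same operations are governed by the traps in the context.
with localcontext() as ctx:
    ctx.traps[InvalidOperation] = False   # 0/0 now returns NaN
    ctx.traps[DivisionByZero] = False     # 1/0 now returns Infinity
    quiet_nan = Decimal(0) / Decimal(0)
    quiet_inf = Decimal(1) / Decimal(0)

print(float_result, quiet_nan, quiet_inf)
```

With the default context both Decimal divisions would raise instead, which is the "optional" behavior floats currently have no equivalent for.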

> Similarly, Python math library does not produce
> infinities where IEEE 754 compliant library should:
>
> >>> math.log(0.0)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: math domain error

Again, this mostly comes from backward compatibility with the math
module's origins (and it is as old as Python itself, again predating
its use of IEEE 754). AFAIK Tim went over the math library very
carefully and cleaned up what he could, so he probably thought about
this as well. Also, IIUC the IEEE library prescribes exceptions as
well as return values; e.g. "man 3 log" on my OSX computer says that
log(0) returns -inf as well as raises a divide-by-zero exception. So I
think this is probably compliant with the standard -- one can decide
to ignore the exceptions in certain contexts and honor them in others.
(Probably even the 1/0 behavior can be defended this way.)

> Some other operations behave inconsistently:
>
> >>> 2 * 10.**308
> inf
>
> but
> >>> 10.**309
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> OverflowError: (34, 'Result too large')

Probably the same. IEEE 754 may be more complex than you think!
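
The cases under discussion can be collected in one place with a short sketch, assuming CPython 3. The helper `outcome` is a hypothetical name introduced here; it simply reports either the value or the name of the exception raised.

```python
import math

def outcome(thunk):
    """Return the value of thunk(), or the name of the exception it raises."""
    try:
        return thunk()
    except (ValueError, OverflowError, ZeroDivisionError) as exc:
        return type(exc).__name__

cases = {
    "2 * 10.0**308": outcome(lambda: 2 * 10.0**308),   # multiplication overflows quietly
    "10.0**309": outcome(lambda: 10.0**309),           # pow reports the overflow
    "math.log(0.0)": outcome(lambda: math.log(0.0)),   # domain error instead of -inf
    "1.0 / 0.0": outcome(lambda: 1.0 / 0.0),           # exception instead of inf
}

for expr, result in cases.items():
    print(expr, "->", result)
```

In each case Python picks one of the two things IEEE 754 describes (a default result or a signaled exception), but not the same one every time.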

> I think non-reflexivity of nan in Python is an accidental feature.

It is not.

> Python's float type was not designed with NaN in mind and until
> recently, it was relatively difficult to create a nan in pure python.

And when we did add NaN and Inf we thought about the issues carefully.

> It is also not true that IEEE 754 requires that nan == nan is false.
> IEEE 754 does not define operator '==' (nor does it define boolean
> false). Instead, IEEE defines a comparison operation that can have
> one of four results: >, <, =, or unordered. The standard does require
> that NaN compares unordered with anything including itself, but it
> does not follow that a language that defines an == operator with
> boolean results must define it so that nan == nan is false.

Are you proposing changes again? Because it sure sounds like you are
unhappy with the status quo and will not take an explanation, however
authoritative it is.

Given a language with the 6 comparisons like Python (and most do),
they have to be mapped to the IEEE comparison *somehow*, and I believe
we chose one of the most logical translations imaginable (given that
nobody likes == and != raising exceptions).
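
One way to picture that translation is the sketch below. It is a model of the behavior observable in CPython 3, not a quotation of its implementation, and the names `ieee_relation`, `OPERATOR_TABLE` and `python_compare` are hypothetical. Each pair of floats falls into exactly one of the four IEEE relations, and each operator is true for a fixed subset of them.

```python
import math

def ieee_relation(x, y):
    """Classify two floats into one of the four IEEE 754 relations."""
    if math.isnan(x) or math.isnan(y):
        return "unordered"
    if x < y:
        return "less"
    if x > y:
        return "greater"
    return "equal"

# The relations for which each Python operator returns True.
OPERATOR_TABLE = {
    "==": {"equal"},
    "!=": {"less", "greater", "unordered"},
    "<": {"less"},
    "<=": {"less", "equal"},
    ">": {"greater"},
    ">=": {"greater", "equal"},
}

def python_compare(op, x, y):
    """Evaluate op on (x, y) by looking up the IEEE relation in the table."""
    return ieee_relation(x, y) in OPERATOR_TABLE[op]

nan = float("nan")
print(python_compare("==", nan, nan), python_compare("!=", nan, nan))
```

In this model `!=` is simply the negation of `==`, so "unordered" lands on the true side of `!=` and on the false side of the other five operators.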

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 8, 2012, 2:39:03 PM
to Rob Cliffe, python...@python.org
On Mon, Oct 8, 2012 at 10:36 AM, Guido van Rossum <gu...@python.org> wrote:
>
>>> It's not about equality. If you ask whether two NaNs are *unequal* the
>>> answer is *also* False.
>>
>> Does this mean that the following behaviour of lists is a bug?
>> >>> x=float('NAN')
>> >>> [x]==[x], [x]<=[x], [x]>=[x]
>> (True, True, True)
>
> No. That's a special case in the comparisons for sequences.

[Now that I'm back at a real keyboard I can elaborate...]

This applies to all container comparisons: without the rule that if
two contained items reference the same object they are to be
considered equal without calling their __eq__, containers couldn't
take the shortcut that a container is always equal to itself (i.e. c1
is c2 => c1 == c2). Without this shortcut, container comparisons would
be much more expensive: any time a large container was compared to
itself, it would be forced to recursively compare all the contained
items. You might say that it has to do this anyway when comparing to a
container that is not itself, but if the answer is "unequal" the
comparison can stop as soon as two unequal items are found, whereas if
the answer is "equal" you end up comparing all items. For two
different containers there is no possible shortcut, but comparing a
container to itself is quite common and really does deserve the
shortcut. We discussed this in the past and always came to the same
conclusion: despite the rules for NaN, the shortcut for containers is
required. A similar shortcut exists for 'x in [x]' BTW.
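
A quick demonstration of the shortcut, assuming CPython 3. The variable names are illustrative. The results depend on whether the containers hold the same NaN object or two distinct ones.

```python
x = float("nan")
y = float("nan")   # a second, distinct NaN object

# Same object on both sides: identity short-circuits the element comparison.
same_object = ([x] == [x], x in [x], (x,) == (x,), {1: x} == {1: x})

# Distinct NaN objects: identity fails, so == is consulted and says False.
distinct_objects = ([x] == [y], x in [y])

# The bare floats themselves never compare equal.
bare = (x == x, x == y)

print(same_object, distinct_objects, bare)
```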

Guido van Rossum

Oct 8, 2012, 3:46:43 PM
to Rob Cliffe, Python-Ideas
On Mon, Oct 8, 2012 at 12:03 PM, Rob Cliffe <rob.c...@btinternet.com> wrote:
>
> On 08/10/2012 19:39, Guido van Rossum wrote:
>>
>>>> Does this mean that the following behaviour of lists is a bug?
>>>> >>> x=float('NAN')
>>>> >>> [x]==[x], [x]<=[x], [x]>=[x]
>>>> (True, True, True)
>>>
>>> No. That's a special case in the comparisons for sequences.
>>
>> [Now that I'm back at a real keyboard I can elaborate...]
>>
>> This applies to all container comparisons: without the rule that if
>> two contained items reference the same object they are to be
>> considered equal without calling their __eq__, containers couldn't
>> take the shortcut that a container is always equal to itself (i.e. c1
>> is c2 => c1 == c2). Without this shortcut, container comparisons would
>> be much more expensive: any time a large container was compared to
>> itself, it would be forced to recursively compare all the contained
>> items. You might say that it has to do this anyway when comparing to a
>> container that is not itself, but if the answer is "unequal" the
>> comparison can stop as soon as two unequal items are found, whereas if
>> the answer is "equal" you end up comparing all items. For two
>> different containers there is no possible shortcut, but comparing a
>> container to itself is quite common and really does deserve the
>> shortcut. We discussed this in the past and always came to the same
>> conclusion: despite the rules for NaN, the shortcut for containers is
>> required. A similar shortcut exists for 'x in [x]' BTW.
>>
> Thank you for elaborating, I was going to ask what the justification for the
> special case was.
> You have explained why
>
> >>> x=float('NAN'); A=[x]; A==A
> True
>
> but not as far as I can see why
>
> >>> x=float('NAN'); A=[x]; B=[x]; A==B, [x]==[x]
> (True, True)
>
> where neither of the results is comparing a container to itself.

It's so that when the container is iterating over pairs of elements it
can check for item identity (a simple pointer comparison) first, which
makes a pretty big difference in speed.
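
In pure Python the per-item check can be approximated as follows. This is a sketch of the idea only, not CPython's C code, and `items_equal` and `list_eq` are hypothetical names.

```python
def items_equal(a, b):
    """Identity first (cheap pointer check), then fall back to ==."""
    return a is b or a == b

def list_eq(left, right):
    """Element-wise list equality using the identity-first item check."""
    if len(left) != len(right):
        return False
    return all(items_equal(a, b) for a, b in zip(left, right))

x = float("nan")
A = [x]
B = [x]
print(list_eq(A, B), list_eq([x], [float("nan")]))
```

Because A and B hold the same NaN object, the identity check succeeds before `==` is ever called, which is why A == B is True even though A is not B.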

Ned Batchelder

Oct 8, 2012, 4:39:52 PM
to Guido van Rossum, python-ideas
How about:

"In accordance with the IEEE 754 standard, when NaNs are compared to any value, even another NaN, the result is always False, regardless of the comparison. This is because NaN represents an unknown result. There is no way to know the relationship between an unknown result and any other result, especially another unknown one. Even comparing a NaN to itself always produces False."

--Ned.

Guido van Rossum

Oct 8, 2012, 4:47:53 PM
to Ned Batchelder, python-ideas
Sounds good. (But now maybe we also need to come clean with the
exceptions for NaNs compared as part of container comparisons?)

--
--Guido van Rossum (python.org/~guido)

Terry Reedy

Oct 8, 2012, 4:51:14 PM
to python...@python.org
On 10/8/2012 12:19 PM, Guido van Rossum wrote:

> I am not aware of an update to the standard. Being 20 years old does
> not make it outdated.

Similarly, being hundreds or thousands of years old does not make the
equality standard, which includes reflexivity of equality, outdated. The
IEEE standard violated that older standard.
http://bugs.python.org/issue4296
illustrates some of the problems that come with that violation. But
given the compromise made to maintain sane behavior of Python's
collection classes, I see little reason to change nan in isolation.

I wonder if it would be helpful to make a NaN subclass of floats with
its own arithmetic and comparison methods. This would clearly mark a nan
as Not a Normal float. Since subclasses rule (at least some) binary
operations*, this might also simplify normal float code. But perhaps
this was considered and rejected before adding math.isnan in 2.6. (And
ditto for infinities.)

* in that class_ob op subclass_ob is delegated to subclass.__op__, but I
am not sure if this applies only to arithmetic, comparisons, or both.
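
The footnote's question can be checked with a small experiment, assuming CPython 3. `Tagged` is a hypothetical subclass written for this purpose; its methods return strings so it is visible which one ran.

```python
class Tagged(float):
    """Hypothetical float subclass that reports which method was called."""

    def __radd__(self, other):
        return "Tagged.__radd__"

    def __eq__(self, other):
        return "Tagged.__eq__"

    # Overriding __eq__ disables hashing unless __hash__ is restored.
    __hash__ = float.__hash__

# Plain float on the left, subclass instance on the right.
arith = 1.0 + Tagged(2.0)
compare = 1.0 == Tagged(1.0)

print(arith, compare)
```

In both cases the subclass method runs before float's own, so the delegation applies to arithmetic and to comparisons alike when the right operand's type is a subclass of the left operand's type.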

--
Terry Jan Reedy

Terry Reedy

Oct 8, 2012, 5:17:56 PM
to python...@python.org
On 10/8/2012 12:47 PM, Guido van Rossum wrote:

> this as well. Also, IIUC the IEEE library prescribes exceptions as
> well as return values; e.g. "man 3 log" on my OSX computer says that
> log(0) returns -inf as well as raises a divide-by-zero exception. So I
> think this is probably compliant with the standard -- one can decide
> to ignore the exceptions in certain contexts and honor them in others.
> (Probably even the 1/0 behavior can be defended this way.)

I agree. In C, as I remember, a function can both (passively) 'raise an
exception' by setting errno *and* return a value. This requires the
programmer to check for an exception, and forgetting to do so is a
common bug. In Python, raising an exception actively aborts returning a
value, so you had to choose one of the two behaviors.

>> Some other operations behave inconsistently:
>>
>> >>> 2 * 10.**308
>> inf
>>
>> but
>> >>> 10.**309
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>> OverflowError: (34, 'Result too large')
>
> Probably the same. IEEE 754 may be more complex than you think!

Or this might be an accidental inconsistency, in that float
multiplication was changed to return inf but pow was not. But I would be
reluctant to fiddle with such details now.

Alexander, while I might have chosen to make nan == nan True, I consider
it a near tossup with no happy resolution and would not change it now.
Guido's explanation is pretty clear: he went with the IEEE standard as
interpreted for Python by Tim Peters.

--
Terry Jan Reedy

Greg Ewing

Oct 8, 2012, 8:02:11 PM
to python...@python.org
Guido van Rossum wrote:

> It's not about equality. If you ask whether two NaNs are *unequal* the
> answer is *also* False.

That's the weirdest part about this whole business, I think.
Unless you're really keeping your wits about you, it's easy
to forget that the assumption (x == y) == False implies
(x != y) == True doesn't necessarily hold.

This is actually a very important assumption when it comes
to reasoning about programs -- even more important than
reflexivity, etc, I believe. Consider

    if x == y:
        dosomething()
    else:
        dosomethingelse()

where x and y are known to be floats. It's easy to see that
the following is equivalent:

    if not x == y:
        dosomethingelse()
    else:
        dosomething()

but it's not quite so easy to spot that the following is
*not* equivalent:

    if x != y:
        dosomethingelse()
    else:
        dosomething()

This trap is made all the easier to fall into because float
comparison is *mostly* well-behaved, except for a small subset
of the possible values. Most other nonstandard comparison behaviours
in Python apply to whole types. E.g. we refuse to compare complex
numbers for ordering, even if their values happen to be real,
so if you try that you get an early exception. But the weirdness
with NaNs only shows up in corner cases that may escape testing.
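
A concrete corner case of this kind, as a sketch assuming CPython 3, uses the ordering operators. `branch_a` and `branch_b` are hypothetical names. The two functions look like the same test with the branches swapped, and they agree for every ordinary pair of floats, but not when a NaN is involved.

```python
nan = float("nan")

def branch_a(x, y):
    return "first" if x < y else "second"

def branch_b(x, y):
    # Looks equivalent to branch_a, since "not x < y" usually means "x >= y".
    return "second" if x >= y else "first"

ordinary = [(1.0, 2.0), (2.0, 1.0), (1.0, 1.0)]
agree_on_ordinary = all(branch_a(x, y) == branch_b(x, y) for x, y in ordinary)
with_nan = (branch_a(nan, 1.0), branch_b(nan, 1.0))

print(agree_on_ordinary, with_nan)
```

A test suite that never feeds in a NaN would not notice that the two functions differ.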

Now, there *is* a third possibility -- we could raise an exception
if a comparison involving NaNs is attempted. This would be a
more faithful way of adhering to the IEEE 754 specification that
NaNs are "unordered". More importantly, it would make the second code
transformation above valid in all cases.

So the question that really needs to be answered, I think, is
not "Why is NaN == NaN false?", but "Why doesn't NaN == anything
raise an exception, when it would make so much more sense to
do so?"

--
Greg

Guido van Rossum

Oct 8, 2012, 8:11:33 PM
to Greg Ewing, python...@python.org
Because == raising an exception is really unpleasant. We had this in
Python 2 for unicode/str comparisons and it was very awkward.

Nobody arguing against the status quo seems to care at all about
numerical algorithms though. I propose that you go find some numerical
mathematicians and ask them.

--
--Guido van Rossum (python.org/~guido)

Oscar Benjamin

Oct 8, 2012, 8:32:39 PM
to Guido van Rossum, python...@python.org
The main purpose of quiet NaNs is to propagate through computation
ruining everything they touch. In a programming language like C that
lacks exceptions this is important as it allows you to avoid checking
all the time for invalid values, whilst still being able to know if
the end result of your computation was ever affected by an invalid
numerical operation. The reasons for NaNs to compare unequal are no
doubt related to this purpose.
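
A minimal sketch of that propagation, assuming CPython 3. The data and variable names are made up for illustration. One invalid operation early on (inf - inf) taints every later result, and a single check at the end is enough to detect it.

```python
import math

inf = float("inf")
samples = [1.0, 2.0, inf - inf, 4.0]   # inf - inf is an invalid operation -> NaN

# No checks along the way: the NaN flows through every step.
total = sum(samples)
mean = total / len(samples)
scaled = [s * mean for s in samples]

# One check at the end reveals that something went wrong upstream.
poisoned = (
    math.isnan(total)
    and math.isnan(mean)
    and all(math.isnan(v) for v in scaled)
)
print(poisoned)
```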

It is of course arguable whether the same reasoning applies to a
language like Python that has a very good system of exceptions but I
agree with Guido that raising an exception on == would be unfortunate.
How many people would forget that they needed to catch those
exceptions? How awkward could your code be if you did remember to
catch all those exceptions? In an exception handling language it's
important to know that there are some operations that you can trust.


Oscar

Stephen J. Turnbull

Oct 8, 2012, 8:37:52 PM
to Guido van Rossum, python-ideas
Guido van Rossum writes:

> Sounds good. (But now maybe we also need to come clean with the
> exceptions for NaNs compared as part of container comparisons?)

For a second I thought you meant IEEE 754 Exceptions. Whew! How
about:

"""
For reasons of efficiency, Python allows comparisons of containers to
shortcut element comparisons. These shortcuts mean that it is
possible that comparison of two containers may return True, even if
they contain NaNs. For details, see the language reference[1].
"""

Longer than I think it deserves, but maybe somebody has a better idea?

Footnotes:
[1] Sorry about that, but details don't really belong in a *Python*
tutorial. Maybe this should be "see the implementation notes"?

Stephen J. Turnbull

Oct 8, 2012, 8:50:52 PM
to Terry Reedy, python...@python.org
Terry Reedy writes:

> I wonder if it would be helpful to make a NaN subclass of floats with
> its own arithmetic and comparison methods.

It can't be helpful, unless you go a lot further. Specifically, you'd
need to require containers to check every element for NaN-ness. That
doesn't seem very practical.

In any case, the presentation by Kahan (cited earlier by Alexander
himself) demolishes the idea that any sort of attempt to implement
DWIM for floats in a programming language can succeed at the present
state of the art. The best we can get is DWGM ("do what Guido means",
even if what Guido means is "ask the Timbot"<wink/>).

Kahan pretty explicitly endorses this approach, by the way. At least
in the context of choosing default policy for IEEE 754 Exceptions.

Guido van Rossum

Oct 8, 2012, 9:07:48 PM
to Oscar Benjamin, python...@python.org
If we want to do *anything* I think we should first introduce a
floating point context similar to the Decimal context. Then we can
talk.

--
--Guido van Rossum (python.org/~guido)

Alexander Belopolsky

Oct 8, 2012, 9:31:40 PM
to Terry Reedy, python...@python.org
On Mon, Oct 8, 2012 at 5:17 PM, Terry Reedy <tjr...@udel.edu> wrote:
> Alexander, while I might have chosen to make nan == nan True, I consider it
> a near tossup with no happy resolution and would not change it now.

While I did suggest to change nan == nan result two years ago,
<http://mail.python.org/pipermail/python-ideas/2010-March/006945.html>,
I am not suggesting it now. Here I am merely trying to understand to
what extent Python's float is implementing IEEE 754 and why in some
cases Python's behavior deviates from the standard while in the case
of nan == nan, IEEE 754 is taken as a gospel.

> Guido's
> explanation is pretty clear: he went with the IEEE standard as interpreted
> for Python by Tim Peters.

It would be helpful if that interpretation was clearly written
somewhere. Without a written document this interpretation seems
apocryphal to me.

Earlier in this thread, Guido wrote: "I am not aware of an update to
the standard." To the best of my knowledge IEEE Std 754 was last
updated in 2008. I don't think the differences between 1985 and 2008
revisions matter much for this discussion, but since I am going to
refer to chapter and verse, I will start by citing the document that I
will use:

IEEE Std 754(TM)-2008
(Revision of IEEE Std 754-1985)
IEEE Standard for Floating-Point Arithmetic
Approved 12 June 2008
IEEE-SA Standards Board

(AFAICT, the main difference between 754-2008 and 754-1985 is that the
former includes decimal floats added in 854-1987.)

Now, let me put my language lawyer hat on and compare Python floating
point implementations to IEEE 754-2008 standard. Here are the
relevant clauses:

3. Floating-point formats
4. Attributes and rounding
5. Operations
6. Infinity, NaNs, and sign bit
7. Default exception handling
8. Alternate exception handling attributes
9. Recommended operations
10. Expression evaluation
11. Reproducible floating-point results

Clause 3 (Floating-point formats) defines five formats: 3 binary and 2
decimal. Python supports a superset of decimal formats and a single
binary format. Section 3.1.2 (Conformance) contains the following
provision: "A programming environment conforms to this standard, in a
particular radix, by implementing one or more of the basic formats of
that radix as both a supported arithmetic format and a supported
interchange format." I would say Python is conforming to Clause 3.

Clause 4 (Attributes and rounding) is supported only by Decimal
through contexts: "For attribute specification, the implementation
shall provide language-defined means, such as compiler directives, to
specify a constant value for the attribute parameter for all standard
operations in a block; the scope of the attribute value is the block
with which it is associated." I believe Decimal is mostly conforming,
but float is not conforming at all.

Clause 5 requires "[a]ll conforming implementations of this standard
shall provide the operations listed in this clause for all supported
arithmetic formats, except as stated below." In other words, a
language standard that claims conformance with IEEE 754 must provide
all operations unless the standard states otherwise. Let's try to map
IEEE 754 required operations to Python float operations.

5.3.1 General operations

sourceFormat roundToIntegralTiesToEven(source)
sourceFormat roundToIntegralTiesToAway(source)
sourceFormat roundToIntegralTowardZero(source)
sourceFormat roundToIntegralTowardPositive(source)
sourceFormat roundToIntegralTowardNegative(source)
sourceFormat roundToIntegralExact(source)

Python only provides float.__trunc__ which implements
roundToIntegralTowardZero. (The builtin round() belongs to a
different category because it changes format from double to int.)

sourceFormat nextUp(source)
sourceFormat nextDown(source)

I don't think these are available for Python floats.

sourceFormat remainder(source, source) - float.__mod__

Not fully conforming. For example, the standard requires
remainder(-2.0, 1.0) to return -0.0, but in Python 3.3:

>>> -2.0 % 1.0
0.0

On the other hand,

>>> math.fmod(-2.0, 1.0)
-0.0
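
The sign difference is easier to see with `math.copysign`, since -0.0 and 0.0 compare equal. The sketch below assumes Python 3.7 or later, because it also uses `math.remainder`, which was added after this thread. `sign_and_value` is a hypothetical helper.

```python
import math

def sign_and_value(v):
    """Return (sign, value) so that -0.0 and 0.0 can be told apart."""
    return (math.copysign(1.0, v), v)

percent = sign_and_value(-2.0 % 1.0)                # zero takes the sign of the divisor
fmod = sign_and_value(math.fmod(-2.0, 1.0))         # zero takes the sign of the dividend
ieee = sign_and_value(math.remainder(-2.0, 1.0))    # IEEE 754-style remainder (3.7+)

# fmod truncates the quotient, while remainder rounds it to nearest.
fmod_vs_remainder = (math.fmod(5.0, 3.0), math.remainder(5.0, 3.0))

print(percent, fmod, ieee, fmod_vs_remainder)
```

Note that `math.fmod` matches the standard's result for this particular input but is not the same operation in general, as the last pair of values shows.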

sourceFormat minNum(source, source)
sourceFormat maxNum(source, source)
sourceFormat minNumMag(source, source)
sourceFormat maxNumMag(source, source)

I don't think these are available for Python floats.

5.3.3 logBFormat operations

I don't think these are available for Python floats.

5.4.1 Arithmetic operations

formatOf-addition(source1, source2) - float.__add__
formatOf-subtraction(source1, source2) - float.__sub__
formatOf-multiplication(source1, source2) - float.__mul__
formatOf-division(source1, source2) - float.__truediv__
formatOf-squareRoot(source1) - math.sqrt
formatOf-fusedMultiplyAdd(source1, source2, source3) - missing
formatOf-convertFromInt(int) - float.__new__

With exception of fusedMultiplyAdd, Python float is conforming.

intFormatOf-convertToIntegerTiesToEven(source)
intFormatOf-convertToIntegerTowardZero(source)
intFormatOf-convertToIntegerTowardPositive(source)
intFormatOf-convertToIntegerTowardNegative(source)
intFormatOf-convertToIntegerTiesToAway(source)
intFormatOf-convertToIntegerExactTiesToEven(source)
intFormatOf-convertToIntegerExactTowardZero(source)
intFormatOf-convertToIntegerExactTowardPositive(source)
intFormatOf-convertToIntegerExactTowardNegative(source)
intFormatOf-convertToIntegerExactTiesToAway(source)

Python has a single builtin round().

5.5.1 Sign bit operations

sourceFormat copy(source) - float.__pos__
sourceFormat negate(source) - float.__neg__
sourceFormat abs(source) - float.__abs__
sourceFormat copySign(source, source) - math.copysign

Python float is conforming.

Now we are getting close to the issue at hand:
"""
5.6.1 Comparisons
Implementations shall provide the following comparison operations, for
all supported floating-point operands of the same radix in arithmetic
formats:

boolean compareQuietEqual(source1, source2)
boolean compareQuietNotEqual(source1, source2)
boolean compareSignalingEqual(source1, source2)
boolean compareSignalingGreater(source1, source2)
boolean compareSignalingGreaterEqual(source1, source2)
boolean compareSignalingLess(source1, source2)
boolean compareSignalingLessEqual(source1, source2)
boolean compareSignalingNotEqual(source1, source2)
boolean compareSignalingNotGreater(source1, source2)
boolean compareSignalingLessUnordered(source1, source2)
boolean compareSignalingNotLess(source1, source2)
boolean compareSignalingGreaterUnordered(source1, source2)
boolean compareQuietGreater(source1, source2)
boolean compareQuietGreaterEqual(source1, source2)
boolean compareQuietLess(source1, source2)
boolean compareQuietLessEqual(source1, source2)
boolean compareQuietUnordered(source1, source2)
boolean compareQuietNotGreater(source1, source2)
boolean compareQuietLessUnordered(source1, source2)
boolean compareQuietNotLess(source1, source2)
boolean compareQuietGreaterUnordered(source1, source2)
boolean compareQuietOrdered(source1, source2).
"""

Signaling comparisons are missing. Ordered/Unordered comparisons are
missing. Note that the standard does not require any particular
spelling for operations. "In this standard, operations are written as
named functions; in a specific programming environment they might be
represented by operators, or by families of format-specific functions,
or by operations or functions whose names might differ from those in
this standard." (Sec. 5.1) It would be perfectly conforming for
python to spell compareSignalingEqual() as '==' and
compareQuietEqual() as math.eq() or even
ieee754_2008.compareQuietEqual(). The choice that Python made was not
dictated by the standard. (As I have shown above, Python's %
operation does not implement a conforming IEEE 754 residual(), but
math.fmod() seems to fill the gap.)
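
For reference, a small sketch of how the remainder spellings differ in Python. It assumes a Python new enough to have math.remainder, which was added in 3.7 (after this thread) and is documented as the IEEE 754-style remainder; math.fmod truncates the quotient instead.

```python
import math

# Python's % uses a floored quotient, so the result takes the sign
# of the divisor.
floored = -1.0 % 3.0                 # 2.0

# math.fmod follows C's fmod: the quotient is truncated toward zero,
# so the result takes the sign of the dividend.
truncated = math.fmod(-1.0, 3.0)     # -1.0

# math.remainder (Python 3.7+) rounds the quotient to the nearest
# integer: 5/3 rounds to 2, and 5 - 2*3 is -1.
nearest = math.remainder(5.0, 3.0)   # -1.0
```

Which of these counts as "the" remainder depends on how the quotient is rounded, which is exactly the kind of detail the standard pins down and an operator spelling does not.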

This post is already too long, so I'll leave Clauses 6-11 for another
time. "IEEE 754 may be more complex than you think!" (GvR, earlier in
this thread.) I hope I already made the case that Python's float does
not conform to IEEE 754 and that IEEE 754 does not require an
operation spelled "==" or "float.__eq__" to return False when
comparing two NaNs. The standard requires support for 22 comparison
operations, but Python's float supports around six. On top of that,
Python has an operation that has no analogue in IEEE 754 - the "is"
comparison. This is why the IEEE 754 standard does not help in answering
the main question in this thread: should (x is y) imply (x == y)? We
need to formulate a rationale for breaking this implication without a
reference to IEEE 754 or Tim's interpretation thereof.

Language-lawyerly yours,

Alexander Belopolsky

Alexander Belopolsky

Oct 8, 2012, 9:37:47 PM
to Guido van Rossum, python...@python.org
On Mon, Oct 8, 2012 at 9:07 PM, Guido van Rossum <gu...@python.org> wrote:
> If we want to do *anything* I think we should first introduce a
> floating point context similar to the Decimal context. Then we can
> talk.

+float('inf')

Guido van Rossum

Oct 8, 2012, 10:09:53 PM
to Alexander Belopolsky, python...@python.org, Terry Reedy
On Mon, Oct 8, 2012 at 6:31 PM, Alexander Belopolsky
<alexander....@gmail.com> wrote:
> IEEE 754 standard does not help in answering
> the main question in this thread: should (x is y) imply (x == y)? We
> need to formulate a rationale for breaking this implication without a
> reference to IEEE 754 or Tim's interpretation thereof.

Such a rationale exists in my mind. Since floats are immutable, an
implementation may or may not intern certain float values (just as
certain string and int values are interned but others are not).
Therefore, the fact that "x is y" says nothing about whether the
computations that produced x and y had anything to do with each other.
This is not true for mutable objects: if I have two lists, computed
separately, and find they are the same object, the computations that
produced them must have communicated somehow, or the same list was
passed in to each computation. So, since two computations might
return the same object without having followed the same computational
path, in another implementation the exact same computation might not
return the same object, and so the == comparison should produce the
same value in either case -- in particular, if x and y are both NaN,
all 6 comparisons on them should return False (given that in general
comparing two NaNs returns False regardless of the operator used).

The reason for invoking IEEE 754 here is that without it, Python might
well have grown a language-wide rule stating that an object should
*always* compare equal to itself, as there would have been no
significant counterexamples. (As it is, such a rule only exists for
containers, and technically even there it is optional -- it is just
not required for containers to invoke == for contained items that
reference the same object.)
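
A minimal illustration of the container exception described above, as observed in CPython (other implementations are allowed to behave the same way, but this sketch only claims CPython):

```python
nan = float("nan")

# float.__eq__ follows IEEE 754: a NaN is not equal to itself.
self_equal = (nan == nan)                                    # False

# List comparison may skip == for items that are the same object,
# so two lists holding the *same* NaN object compare equal.
same_object_lists = ([nan] == [nan])                         # True

# Two distinct NaN objects get no identity shortcut, so == is consulted.
distinct_object_lists = ([float("nan")] == [float("nan")])   # False
```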

--
--Guido van Rossum (python.org/~guido)

Case Van Horsen

Oct 8, 2012, 11:38:13 PM
to Alexander Belopolsky, python...@python.org
On Mon, Oct 8, 2012 at 6:37 PM, Alexander Belopolsky
<alexander....@gmail.com> wrote:
> On Mon, Oct 8, 2012 at 9:07 PM, Guido van Rossum <gu...@python.org> wrote:
>> If we want to do *anything* I think we should first introduce a
>> floating point context similar to the Decimal context. Then we can
>> talk.
>
> +float('inf')

I implemented a floating point context manager for gmpy2 and the MPFR
floating point library. By default, it enables a non-stop mode where
infinities and NaN are returned but you can also raise exceptions. You
can experiment with gmpy2: http://code.google.com/p/gmpy/

Some examples

>>> import gmpy2
>>> gmpy2.get_context()
context(precision=53, real_prec=Default, imag_prec=Default,
round=RoundToNearest, real_round=Default, imag_round=Default,
emax=1073741823, emin=-1073741823,
subnormalize=False,
trap_underflow=False, underflow=False,
trap_overflow=False, overflow=False,
trap_inexact=False, inexact=False,
trap_invalid=False, invalid=False,
trap_erange=False, erange=False,
trap_divzero=False, divzero=False,
trap_expbound=False,
allow_complex=False)
>>> gmpy2.log(0)
mpfr('-inf')
>>> gmpy2.get_context()
context(precision=53, real_prec=Default, imag_prec=Default,
round=RoundToNearest, real_round=Default, imag_round=Default,
emax=1073741823, emin=-1073741823,
subnormalize=False,
trap_underflow=False, underflow=False,
trap_overflow=False, overflow=False,
trap_inexact=False, inexact=False,
trap_invalid=False, invalid=False,
trap_erange=False, erange=False,
trap_divzero=False, divzero=True,
trap_expbound=False,
allow_complex=False)
>>> gmpy2.get_context().clear_flags()
>>> gmpy2.get_context().trap_divzero=True
>>> gmpy2.log(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
gmpy2.DivisionByZeroError: 'mpfr' division by zero in log()
>>> gmpy2.set_context(gmpy2.context())
>>> gmpy2.nan()==gmpy2.nan()
False
>>> gmpy2.get_context()
context(precision=53, real_prec=Default, imag_prec=Default,
round=RoundToNearest, real_round=Default, imag_round=Default,
emax=1073741823, emin=-1073741823,
subnormalize=False,
trap_underflow=False, underflow=False,
trap_overflow=False, overflow=False,
trap_inexact=False, inexact=False,
trap_invalid=False, invalid=False,
trap_erange=False, erange=True,
trap_divzero=False, divzero=False,
trap_expbound=False,
allow_complex=False)
>>> gmpy2.get_context().trap_erange=True
>>> gmpy2.nan()==gmpy2.nan()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
gmpy2.RangeError: comparison with NaN
>>>

Standard disclaimers:

* I'm the maintainer of gmpy2.

* Please use SVN or beta2 (when it is released) to avoid a couple of
embarrassing bugs. :(

Steven D'Aprano

Oct 9, 2012, 12:16:16 AM
to python...@python.org
On Sun, Oct 07, 2012 at 10:35:17PM -0400, Ned Batchelder wrote:

> A sentence in section 5.4 (Numeric Types) would help. Something like,
> "In accordance with the IEEE 754 standard, NaN's are not equal to any
> value, even another NaN. This is because NaN doesn't represent a
> particular number, it represents an unknown result, and there is no way
> to know if one unknown result is equal to another unknown result."

NANs don't quite mean "unknown result". If they did they would probably
be called "MISSING" or "UNKNOWN" or "NA" (Not Available).

NANs represent a calculation result which is Not A Number. Hence the
name :-) Since we're talking about the mathematical domain here, a
numeric calculation that doesn't return a numeric result could be said
to have no result at all: there is no real-valued x for which x**2 ==
-1, hence sqrt(-1) can return a NAN.

It certainly doesn't mean "well, there is an answer, but I don't know
what it is". It means "I know that there is no answer".

Since neither sqrt(-1) nor sqrt(-2) exist in the reals, we cannot say
that they are equal. If we did, we could prove anything:

sqrt(-1) = sqrt(-2)

Square both sides:

-1 = -2


I was not on the IEEE committee, so I can't speak for them, but my guess
is that they reasoned that since there are an infinite number of "no
result" not-a-number calculations, but only a finite number of NAN bit
patterns available to be used for them, it isn't even safe to presume
that two NANs with the same bit pattern are equal since they may have
come from completely different calculations.

Of course this was before object identity was a relevant factor. As I've
stated before, I think that having collections choose to optimize away
equality tests using object identity is fine. If I need a tuple that
honours NAN semantics, I can subclass tuple to get one. I shouldn't
expect the default tuple behaviour to carry that cost.
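
A rough sketch of what such a subclass could look like. The name StrictTuple is hypothetical and this is only one possible design: it always consults == on the elements and never falls back on identity.

```python
class StrictTuple(tuple):
    """Hypothetical tuple that honours NaN semantics by never using identity."""

    def __eq__(self, other):
        if not isinstance(other, tuple):
            return NotImplemented
        # Compare element by element with ==, with no "is" shortcut.
        return len(self) == len(other) and all(
            a == b for a, b in zip(self, other)
        )

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __contains__(self, item):
        # Membership by equality only.
        return any(item == element for element in self)

    # Defining __eq__ clears the inherited __hash__, so restore it.
    __hash__ = tuple.__hash__


nan = float("nan")
strict = StrictTuple([nan])
plain = (nan,)
```

With this, strict == strict is False and nan in strict is False, whereas plain == plain and nan in plain are both True. The price is a full scan on every comparison, which is the cost the default tuple avoids.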

By the way, NANs are awesome and don't get anywhere near enough respect.
Here's a great idea from the D language:

http://www.drdobbs.com/cpp/nans-just-dont-get-no-respect/240005723



--
Steven

Steven D'Aprano

Oct 9, 2012, 12:32:36 AM
to python...@python.org
On Mon, Oct 08, 2012 at 04:39:52PM -0400, Ned Batchelder wrote:

> How about:
>
> "In accordance with the IEEE 754 standard, when NaNs are compared to any
> value, even another NaN, the result is always False, regardless of the
> comparison. This is because NaN represents an unknown result. There is no
> way to know the relationship between an unknown result and any other
> result, especially another unknown one. Even comparing a NaN to itself
> always produces False."

Two issues:

1) It is not the case that NaN <comp> NaN is always false.

2) "invalid result" is more appropriate than "unknown result".



--
Steven

Steven D'Aprano

Oct 9, 2012, 12:26:35 AM
to python...@python.org
On Mon, Oct 08, 2012 at 09:29:42AM -0700, Guido van Rossum wrote:

> It's not about equality. If you ask whether two NaNs are *unequal* the
> answer is *also* False.

Not so. I think you are conflating NAN equality/inequality with ordering
comparisons. Using Python 3.3:

py> nan = float('nan')
py> nan > 0
False
py> nan < 0
False
py> nan == 0
False
py> nan != 0
True

but:

py> nan == nan
False
py> nan != nan
True


--
Steven

Alexander Belopolsky

Oct 9, 2012, 1:32:12 AM
to Steven D'Aprano, python...@python.org
On Tue, Oct 9, 2012 at 12:16 AM, Steven D'Aprano <st...@pearwood.info> wrote:
> NANs don't quite mean "unknown result". If they did they would probably
> be called "MISSING" or "UNKNOWN" or "NA" (Not Available).
>
> NANs represent a calculation result which is Not A Number. Hence the
> name :-)

This is quite true, but in Python "Not A Number" is spelled None. In
many aspects, None is like signaling NaN - any numerical operation on
it results in a type error, but None == None is True.

..
> Since neither sqrt(-1) nor sqrt(-2) exist in the reals, we cannot say
> that they are equal. If we did, we could prove anything:
>
> sqrt(-1) = sqrt(-2)
>
> Square both sides:
>
> -1 = -2

This is a typical mathematical fallacy where a progression of
seemingly equivalent equations contains an invalid operation. See
http://en.wikipedia.org/wiki/Mathematical_fallacy#All_numbers_equal_all_other_numbers

This is not an argument to make nan == nan false. The IEEE 754
argument goes as follows: in the domain of 2**64 bit patterns most
patterns represent real numbers, some represent infinities and some do
not represent either infinities or numbers. Boolean comparison
operations are defined on the entire domain, but <, =, or > outcomes
are not exclusive if NaNs are present. The fourth outcome is
"unordered." In other words for any two patterns x and y one and only
one of the following is true: x < y or x = y or x > y or x and y are
unordered. If x is NaN, it compares as unordered to any other pattern
including itself. This explains why compareQuietEqual(x, x) is false
when x is NaN. In this case, x is unordered with itself, unordered is
different from equal, so compareQuietEqual(x, x) cannot be true. It
cannot raise an exception either because it has to be quiet. Thus the
only correct result is to return false.
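
The four-way relation can be sketched in Python as follows. The function names ieee_relation and compare_quiet_equal are made up for this illustration and are not part of any library:

```python
import math


def ieee_relation(x: float, y: float) -> str:
    """Return which one of the four mutually exclusive relations holds."""
    if math.isnan(x) or math.isnan(y):
        return "unordered"
    if x < y:
        return "less"
    if x > y:
        return "greater"
    return "equal"


def compare_quiet_equal(x: float, y: float) -> bool:
    """True only when the relation is 'equal'; never raises."""
    return ieee_relation(x, y) == "equal"


nan = float("nan")
nan_vs_itself = ieee_relation(nan, nan)          # "unordered"
nan_equal_itself = compare_quiet_equal(nan, nan)  # False
```

Since "unordered" is a distinct outcome from "equal", a quiet equality predicate has nothing to return for (nan, nan) except False.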

The problem that we have in Python is that float.__eq__ is used for
too many different things and compareQuietEqual is not always
appropriate. Here is a partial list:

1. x == y
2. x in [y]
3. {y:1}[x]
4. x in {y}
5. [y].index(x)

In Python 3, we already took a step away from using the same notion of
equality in all these cases. Thus in #2, we use x is y or x == y
instead of plain x == y. But that leads to some strange results:

>>> x = float('nan')
>>> x in [x]
True
>>> float('nan') in [float('nan')]
False

An alternative would be to define x in l as any(isnan(x) and isnan(y)
or x == y for y in l) when x and all elements of l are floats. Again,
I am not making a change proposal - just mentioning a possibility.
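
Spelled out as a function, that alternative definition might look like this. The name float_in is hypothetical, and the sketch assumes x and every element of items are floats:

```python
from math import isnan


def float_in(x, items):
    """Membership test in which any NaN matches any other NaN."""
    return any((isnan(x) and isnan(y)) or x == y for y in items)


same_object = float("nan")
consistent_a = float_in(same_object, [same_object])     # True
consistent_b = float_in(float("nan"), [float("nan")])   # True
```

Unlike the built-in "in", the answer no longer depends on whether the two NaNs happen to be the same object.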

Alexander Belopolsky

Oct 9, 2012, 2:14:10 AM
to Guido van Rossum, python...@python.org, Terry Reedy
On Mon, Oct 8, 2012 at 10:09 PM, Guido van Rossum <gu...@python.org> wrote:
> Such a rationale exists in my mind. Since floats are immutable, an
> implementation may or may not intern certain float values (just as
> certain string and int values are interned but others are not).

This is an interesting argument, but I don't quite understand it. Are
you suggesting that some valid Python implementation may intern NaNs?
Wouldn't that require that all NaNs are equal?

> Therefore, the fact that "x is y" says nothing about whether the
> computations that produced x and y had anything to do with each other.

True.

> This is not true for mutable objects: if I have two lists, computed
> separately, and find they are the same object, the computations that
> produced them must have communicated somehow, or the same list was
> passed in to each computation.

True.

> So, since two computations might
> return the same object without having followed the same computational
> path, in another implementation the exact same computation might not
> return the same object, and so the == comparison should produce the
> same value in either case

True, but this logic does not dictate what these values should be.

> -- in particular, if x and y are both NaN,
> all 6 comparisons on them should return False (given that in general
> comparing two NaNs returns False regardless of the operator used).

Except for operator compareQuietUnordered() which is missing in
Python. Note that IEEE 754 also defines totalOrder() operation
which is more or less lexicographical ordering of bit patterns. A
hypothetical language could map its 6 comparisons to totalOrder() and
still claim IEEE 754 conformity as long as it implements the other 22
comparison predicates somehow.
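
To give a feel for what a bit-pattern ordering looks like, here is a sketch of a sort key for 64-bit floats. It is an approximation written for this illustration (the name total_order_key is made up), not a verified implementation of the standard's totalOrder():

```python
import struct


def total_order_key(x: float) -> int:
    """Map a float to an integer whose ordering follows the bit pattern.

    Negative values have all bits flipped so that larger magnitudes sort
    lower; non-negative values get the sign bit set so that they sort
    above every negative value.
    """
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    if bits >> 63:                      # sign bit set
        return bits ^ 0xFFFFFFFFFFFFFFFF
    return bits | (1 << 63)


values = [1.0, 0.0, -0.0, float("inf"), float("-inf"), -2.5]
ordered = sorted(values, key=total_order_key)
# ordered is [-inf, -2.5, -0.0, 0.0, 1.0, inf]
```

Under this key -0.0 sorts strictly below 0.0, and a NaN with a clear sign bit sorts above +inf, so every pair of bit patterns is ordered and nothing is "unordered".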

> The reason for invoking IEEE 754 here is that without it, Python might
> well have grown a language-wide rule stating that an object should
> *always* compare equal to itself, as there would have been no
> significant counterexamples.

Why would it be a bad thing? Isn't this rule what Bertrand Meyer
calls one of the pillars of civilization?

It looks like you give a circular argument. Python cannot have a rule
that x is y implies x == y because that would preclude implementing
float.__eq__ as IEEE 754 equality comparison and we implement
float.__eq__ as IEEE 754 equality comparison in order to provide a
significant counterexample to x is y implies x == y rule. I am not
sure how interning comes into play here, so I must have missed
something.

Mark Dickinson

Oct 9, 2012, 2:43:57 AM
to Ned Batchelder, python-ideas
On Mon, Oct 8, 2012 at 9:39 PM, Ned Batchelder <n...@nedbatchelder.com> wrote:
> How about:
>
> "In accordance with the IEEE 754 standard, when NaNs are compared to any
> value, even another NaN, the result is always False, regardless of the
> comparison. This is because NaN represents an unknown result. There is no
> way to know the relationship between an unknown result and any other result,
> especially another unknown one. Even comparing a NaN to itself always
> produces False."

Looks fine, but I'd suggest leaving out the philosophy ('there is no
way to know ...') and sticking to the statement that Python follows
the IEEE 754 standard in this respect. The justification isn't
particularly convincing and (IMO) only serves to invite arguments.

--
Mark

Guido van Rossum

Oct 9, 2012, 2:44:12 AM
to Steven D'Aprano, python...@python.org
This smells like a bug in the != operator, it seems to fall back to not == which it didn't used to. More later.....

Mark Dickinson

Oct 9, 2012, 2:49:30 AM
to python...@python.org
On Tue, Oct 9, 2012 at 7:44 AM, Guido van Rossum <gu...@python.org> wrote:
> This smells like a bug in the != operator, it seems to fall back to not ==
> which it didn't used to. More later.....

I'm fairly sure it's deliberate, and has been this way in Python for a
long time. IEEE 754 also has x != x when x is a NaN (at least, for
those IEEE 754 functions that return a boolean rather than signaling
an invalid exception), and it's a well documented property of NaNs
across languages.

--
Mark

Guido van Rossum

Oct 9, 2012, 2:58:55 AM
to Mark Dickinson, python...@python.org
On Mon, Oct 8, 2012 at 11:49 PM, Mark Dickinson <dick...@gmail.com> wrote:
> On Tue, Oct 9, 2012 at 7:44 AM, Guido van Rossum <gu...@python.org> wrote:
>> This smells like a bug in the != operator, it seems to fall back to not ==
>> which it didn't used to. More later.....
>
> I'm fairly sure it's deliberate, and has been this way in Python for a
> long time. IEEE 754 also has x != x when x is a NaN (at least, for
> those IEEE 754 functions that return a boolean rather than signaling
> an invalid exception), and it's a well documented property of NaNs
> across languages.

Yeah, sorry, I misremembered. :-) This does mean we need to update the
text Ned is proposing.

--
--Guido van Rossum (python.org/~guido)

Steven D'Aprano

Oct 9, 2012, 3:05:49 AM
to python...@python.org
On Mon, Oct 08, 2012 at 11:44:12PM -0700, Guido van Rossum wrote:
> This smells like a bug in the != operator, it seems to fall back to not ==
> which it didn't used to. More later.....

I'm pretty sure the behaviour is correct. When I get home this evening,
I will check my copy of the Standard Apple Numerics manual (one of the
first IEEE 754 compliant systems). In the meantime, I quote from

"What Every Computer Scientist Should Know About Floating-Point
Arithmetic"

"Since comparing a NaN to a number with <, ≤, >, ≥, or = (but not ≠)
always returns false..."

(Admittedly it doesn't specifically state the case of comparing a NAN
with a NAN.)

http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Guido van Rossum

Oct 9, 2012, 3:13:08 AM
to Alexander Belopolsky, python...@python.org, Terry Reedy
On Mon, Oct 8, 2012 at 11:14 PM, Alexander Belopolsky
<alexander....@gmail.com> wrote:
> On Mon, Oct 8, 2012 at 10:09 PM, Guido van Rossum <gu...@python.org> wrote:
>> Such a rationale exists in my mind. Since floats are immutable, an
>> implementation may or may not intern certain float values (just as
>> certain string and int values are interned but others are not).
>
> This is an interesting argument, but I don't quite understand it. Are
> you suggesting that some valid Python implementation may intern NaNs?
> Wouldn't that require that all NaNs are equal?

Sorry, it seems I got this part slightly wrong. Forget interning. The
argument goes the other way: If you *do* compute x and y exactly the
same way, and if they don't return the same object, and if they both
return NaN, the rules for comparing NaN apply, and the values must
compare unequal. So if you compute them exactly the same way but
somehow you do return the same object, that shouldn't suddenly make
them compare equal.

>> Therefore, the fact that "x is y" says nothing about whether the
>> computations that produced x and y had anything to do with each other.
>
> True.
>
>> This is not true for mutable objects: if I have two lists, computed
>> separately, and find they are the same object, the computations that
>> produced them must have communicated somehow, or the same list was
>> passed in to each computation.
>
> True.
>
>> So, since two computations might
>> return the same object without having followed the same computational
>> path, in another implementation the exact same computation might not
>> return the same object, and so the == comparison should produce the
>> same value in either case
>
> True, but this logic does not dictate what these values should be.
>
>> -- in particular, if x and y are both NaN,
>> all 6 comparisons on them should return False (given that in general
>> comparing two NaNs returns False regardless of the operator used).
>
> Except for operator compareQuietUnordered() which is missing in
> Python. Note that IEEE 754 also defines totalOrder() operation
> which is more or less lexicographical ordering of bit patterns. A
> hypothetical language could map its 6 comparisons to totalOrder() and
> still claim IEEE 754 conformity as long as it implements the other 22
> comparison predicates somehow.

Yes, but that's not the choice Python made, so it's irrelevant.
(Unless you now *do* want to change the language, despite stating
several times that you were just asking for explanations. :-)

>> The reason for invoking IEEE 754 here is that without it, Python might
>> well have grown a language-wide rule stating that an object should
>> *always* compare equal to itself, as there would have been no
>> significant counterexamples.
>
> Why would it be a bad thing? Isn't this rule what Bertrand Meyer
> calls one of the pillars of civilization?

I spent a week with Bertrand recently. He is prone to exaggeration. :-)

> It looks like you give a circular argument. Python cannot have a rule
> that x is y implies x == y because that would preclude implementing
> float.__eq__ as IEEE 754 equality comparison and we implement
> float.__eq__ as IEEE 754 equality comparison in order to provide a
> significant counterexample to x is y implies x == y rule. I am not
> sure how interning comes into play here, so I must have missed
> something.

No, that's not what I meant -- maybe my turn of phrase "invoking IEEE"
was confusing. The first part is what I meant: "Python cannot have a
rule that x is y implies x == y because that would preclude
implementing float.__eq__ as IEEE 754 equality comparison." The second
half should be: "And we have already (independently from all this)
decided that we want to implement float.__eq__ as IEEE 754 equality
comparison." I'm sure a logician could rearrange the words a bit and
make it look more logical.

--
--Guido van Rossum (python.org/~guido)

Guido van Rossum

Oct 9, 2012, 3:13:38 AM
to Steven D'Aprano, python...@python.org
Already retracted. :-(

--
--Guido van Rossum (python.org/~guido)

Greg Ewing

Oct 9, 2012, 4:19:54 AM
to python...@python.org
Oscar Benjamin wrote:
> The main purpose of quiet NaNs is to propagate through computation
> ruining everything they touch.

But they stop doing that as soon as they hit an if statement.
It seems to me that the behaviour chosen for NaN comparison
could just as easily make things go wrong as make them go
right. E.g.

    while not (error < epsilon):
        find_a_better_approximation()

If error ever ends up being NaN, this will go into an
infinite loop.

--
Greg

Nick Coghlan

Oct 9, 2012, 4:22:47 AM
to Guido van Rossum, python...@python.org, Terry Reedy
On Tue, Oct 9, 2012 at 12:43 PM, Guido van Rossum <gu...@python.org> wrote:
> No, that's not what I meant -- maybe my turn of phrase "invoking IEEE"
> was confusing. The first part is what I meant: "Python cannot have a
> rule that x is y implies x == y because that would preclude
> implementing float.__eq__ as IEEE 754 equality comparison." The second
> half should be: "And we have already (independently from all this)
> decided that we want to implement float.__eq__ as IEEE 754 equality
> comparison." I'm sure a logician could rearrange the words a bit and
> make it look more logical.

I'll have a go. It's a lot longer, though :)

When designing their floating point support, language designers must
choose between two mutually exclusive options:
1. IEEE754 compliant floating point comparison where NaN != NaN, *even
if* they're the same object
2. The invariant that "x is y" implies "x == y"

The idea behind following the IEEE754 model is that mathematics is a
*value based system*. There is only really one NaN, just as there is
only one 4 (or 5, or any other specific value). The idea of a number
having an identity distinct from its value simply doesn't exist. Thus,
when modelling mathematics in an object system, it makes sense to say
that *object identity is irrelevant, and only value matters*.

This is the approach Python has chosen: for *numeric* operations,
including comparisons, object identity is irrelevant to the maximum
extent that is practical. Thus "x = float('nan'); assert x != x" holds
for *exactly the same reason* that "x = 10e50; y = 10e50; assert x ==
y" holds.

However, when it comes to containers, being able to assume that "x is
y" implies "x == y" has an immense practical benefit in terms of being
able to implement a large number of non-trivial optimisations. Thus
the Python language definition explicitly allows containers to make
that assumption, *even though it is known not to be universally true*.

This hybrid model means that even though "'x is y' implies 'x == y'"
is not true in the general case, it may still be *assumed to be true*
regardless by container implementations. In particular, the containers
defined in the standard library reference are *required* to make this
assumption.

This does mean that certain invariants about containers don't hold in
the presence of NaN values. This is mostly a theoretical concern, but,
in those cases where it *does* matter, then the appropriate solution
is to implement a custom container type that handles NaN values
correctly.
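
For concreteness, here are a few of the invariants in question, as observed in CPython with the built-in list and dict:

```python
nan = float("nan")
other_nan = float("nan")
items = [nan]

# "x in c" normally agrees with "any(x == y for y in c)", but not here.
by_membership = nan in items                    # True (identity shortcut)
by_equality = any(nan == y for y in items)      # False

# A different NaN object is not found, since it is neither identical
# nor equal to the stored one.
other_found = other_nan in items                # False

# index() and count() take the same identity shortcut.
position = items.index(nan)                     # 0
occurrences = items.count(nan)                  # 1

# Dict lookup works for the same object, but not for another NaN.
lookup = {nan: "found"}
same_key = lookup[nan]                          # "found"
other_key_present = other_nan in lookup         # False
```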

It's perhaps worth including a section explaining this somewhere in
the language reference. It's not an accident that Python behaves the
way it does, but it's certainly a rationale that can help implementors
correctly interpret the rest of the language spec.

Cheers,
Nick.

--
Nick Coghlan | ncog...@gmail.com | Brisbane, Australia

Greg Ewing

Oct 9, 2012, 4:35:21 AM
to python...@python.org
Alexander Belopolsky wrote:
> "For attribute specification, the implementation
> shall provide language-defined means, such as compiler directives, to
> specify a constant value for the attribute parameter for all standard
> operations in a block; the scope of the attribute value is the block
> with which it is associated." I believe Decimal is mostly conforming,

That depends on whether "scope" is meant lexically or
dynamically. Decimal contexts are scoped dynamically.
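
A short sketch of what dynamic scoping means for Decimal: the helper function below (third is a made-up name) is defined outside the with block, yet it picks up the precision that is in effect at the time it is called.

```python
from decimal import Decimal, localcontext


def third():
    """Divide using whatever context is current when the call happens."""
    return Decimal(1) / Decimal(3)


with localcontext() as ctx:
    ctx.prec = 5
    inside = third()      # Decimal('0.33333')

outside = third()         # uses the precision of the enclosing context
```

A lexically scoped attribute would only affect operations written textually inside the block, so third() would be unaffected in both calls.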

--
Greg

Chris Angelico

Oct 9, 2012, 4:44:45 AM
to python-ideas
On Tue, Oct 9, 2012 at 7:19 PM, Greg Ewing <greg....@canterbury.ac.nz> wrote:
> But they stop doing that as soon as they hit an if statement.
> It seems to me that the behaviour chosen for NaN comparison
> could just as easily make things go wrong as make them go
> right. E.g.
>
>     while not (error < epsilon):
>         find_a_better_approximation()
>
> If error ever ends up being NaN, this will go into an
> infinite loop.

But if you know that that's a possibility, you simply code your
condition the other way:

    while error > epsilon:
        find_a_better_approximation()

Which will then immediately terminate the loop if error bonks to NaN.
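
The difference between the two conditions, plus one way to make the NaN case explicit rather than silent. The refine function and its step argument are hypothetical, used only for this sketch:

```python
import math

nan = float("nan")
epsilon = 1e-9

# With a NaN error, the negated test keeps looping and the direct test stops.
keeps_looping = not (nan < epsilon)    # True
stops_looping = nan > epsilon          # False


def refine(step, error, epsilon=1e-9, max_iterations=100):
    """Apply step() until error <= epsilon, failing loudly on NaN."""
    for _ in range(max_iterations):
        if math.isnan(error):
            raise ArithmeticError("error estimate became NaN")
        if error <= epsilon:
            return error
        error = step(error)
    raise ArithmeticError("did not converge")


final_error = refine(lambda e: e / 2, 1.0)
```

Terminating quietly on NaN avoids the infinite loop but can also hide the failure, so an explicit isnan check plus an iteration cap covers both concerns.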

ChrisA

Oscar Benjamin

Oct 9, 2012, 4:52:01 AM
to Greg Ewing, python...@python.org

On Oct 9, 2012 9:20 AM, "Greg Ewing" <greg....@canterbury.ac.nz> wrote:
>
> Oscar Benjamin wrote:
>>
>> The main purpose of quiet NaNs is to propagate through computation
>> ruining everything they touch.
>
>
> But they stop doing that as soon as they hit an if statement.
> It seems to me that the behaviour chosen for NaN comparison
> could just as easily make things go wrong as make them go
> right. E.g.
>
>    while not (error < epsilon):
>       find_a_better_approximation()
>
> If error ever ends up being NaN, this will go into an
> infinite loop.

I should expect that an experienced numericist would be aware of the possibility of a NaN and make a trivial modification of your loop to take advantage of the simple fact that any comparison with NaN returns false. It is only because you have artificially placed a not in the while clause that it doesn't work. I would have tested for error>eps without even thinking about NaNs.

Oscar

Steven D'Aprano

Oct 9, 2012, 8:54:52 AM
to python...@python.org
On 09/10/12 11:32, Oscar Benjamin wrote:

> The main purpose of quiet NaNs is to propagate through computation
> ruining everything they touch. In a programming language like C that
> lacks exceptions this is important as it allows you to avoid checking
> all the time for invalid values, whilst still being able to know if
> the end result of your computation was ever affected by an invalid
> numerical operation.

Correct, but I'd like to point out that NaNs are a bit more
sophisticated than just "numeric contagion".

1) NaNs carry payload, so you can actually identify what sort of
calculation failed. E.g. NaN-27 might mean "logarithm of a negative
number", while NaN-95 might be "inverse trig function domain error".
Any calculation involving a single NaN is supposed to propagate the
same payload, so at the end of the calculation you can see that you
tried to take the log of a negative number and debug accordingly.
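
Python has no direct API for NaN payloads, but they can be poked at through the bit pattern. The following sketch assumes IEEE 754 binary64 floats; the helper names are made up, and the payload value 27 simply reuses the illustrative number above. Whether a payload survives arithmetic depends on the hardware, so the sketch only builds a NaN and reads it back.

```python
import math
import struct

QUIET_NAN_BITS = 0x7FF8000000000000   # exponent all ones, quiet bit set
PAYLOAD_MASK = 0x0007FFFFFFFFFFFF     # the low 51 bits of the significand


def make_nan(payload: int) -> float:
    """Build a quiet NaN carrying the given payload."""
    bits = QUIET_NAN_BITS | (payload & PAYLOAD_MASK)
    return struct.unpack(">d", struct.pack(">Q", bits))[0]


def nan_payload(x: float) -> int:
    """Extract the payload bits from a NaN."""
    if not math.isnan(x):
        raise ValueError("not a NaN")
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits & PAYLOAD_MASK


log_domain_error = make_nan(27)
recovered = nan_payload(log_domain_error)
```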


2) On rare occasions, NaNs can validly disappear from a calculation,
leaving you with a non-NaN answer. The rule is, if you can replace
the NaN with *any* other value, and still get the same result, then
the NaN is irrelevant and can be consumed. William Kahan gives an
example:

For example, 0*NaN must be NaN because 0*∞ is an INVALID
operation (NaN). On the other hand, for hypot(x, y) :=
√(x*x + y*y) we find that hypot(∞, y) = +∞ for all real y,
finite or not, and deduce that hypot(∞, NaN) = +∞ too;
naive implementations of hypot may do differently.

Page 7 of http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
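
Kahan's examples can be checked directly. The results below are what CPython's float and math module give:

```python
import math

inf = float("inf")
nan = float("nan")

# 0 * NaN must stay NaN, because 0 * inf is itself invalid.
zero_times_nan = 0.0 * nan            # nan
zero_times_inf = 0.0 * inf            # nan

# hypot(inf, y) is +inf for every y, so the NaN is irrelevant here.
hypot_result = math.hypot(inf, nan)   # inf

# The same rule explains why x ** 0 is 1.0 even when x is NaN.
nan_to_zero = nan ** 0.0              # 1.0
```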

--
Steven

Ethan Furman

Oct 9, 2012, 11:54:03 AM
to python...@python.org
Steven D'Aprano wrote:
> 1) It is not the case that NaN <comp> NaN is always false.

Huh -- well, apparently NaN != NaN --> True.

However, borrowing Steven's earlier example, and modifying slightly:

sqrt(-1) != sqrt(-1)

Shouldn't this be False?

Or, to look at it another way, surely somewhere out in the Real World
(tm) it is the case that two NaNs are indeed equal.

~Ethan~

Steven D'Aprano

Oct 9, 2012, 12:11:42 PM
to python...@python.org
On 10/10/12 02:54, Ethan Furman wrote:

> Or, to look at it another way, surely somewhere out in the Real
> World (tm) it is the case that two NaNs are indeed equal.

By definition, no.


--
Steven

Joshua Landau

Oct 9, 2012, 6:13:57 PM
to python...@python.org
Just a curiosity here (as I can guess at plausible reasons myself, so there probably are some official stances).

Is there a reason NaNs are not instances of NaN class? Then x == x would be True (as they want), but [this NaN] == [that NaN] would be False, as expected.

I guess that raises the question about why x == x but sqrt(-1) != sqrt(-1), but it seems a lot less of a big deal than all of the exceptions with container equalities.

Thanks,

Joshua

Steven D'Aprano

Oct 9, 2012, 9:14:26 PM
to python...@python.org
On 10/10/12 09:13, Joshua Landau wrote:
> Just a curiosity here (as I can guess at plausible reasons myself, so there
> probably are some official stances).
>
> Is there a reason NaNs are not instances of NaN class?

Because that would complicate Python's use of floats for absolutely no benefit.
Instead of float operations always returning a float, they would have to return
a float or a NAN. To check for a valid floating point instance, instead of
saying:

isinstance(x, float)

you would have to say:

isinstance(x, (float, NAN))

And what about infinities, denorm numbers, and negative zero? Do they get
dedicated classes too?

And what is the point of this added complexity? Nothing.

You *still* have the rule that "x == x for all x, except for NANs". The
only difference is that "NANs" now means "instances of NAN class" rather than
"NAN floats" (and Decimals). Working with IEEE 754 floats is now far more of
a nuisance because some valid floating point values aren't floats but have a
different class, but nothing meaningful is different.


> Then x == x would be True (as they want), but [this NaN] == [that NaN]
> would be False, as expected.

Making NANs their own class wouldn't give you that. If we wanted that
behaviour, we could have it without introducing a NAN class: just change the
list __eq__ method to scan the list for a NAN using math.isnan before checking
whether the lists were identical.

But that would defeat the purpose of the identity check (an optimization to
avoid scanning the list)! Replacing math.isnan with isinstance doesn't change
that.
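
A minimal sketch of the identity shortcut being discussed, as observed in CPython (the variable name is illustrative only). Container comparison treats two elements as equal when they are the same object, and only otherwise falls back to ==:

```python
import math

x = float("nan")

# The float comparison itself follows IEEE 754: a NaN never equals itself.
print(x == x)                            # False

# Same object in both lists: the identity shortcut reports the lists equal.
print([x] == [x])                        # True

# Two distinct NaN objects: identity fails, == fails, so the lists differ.
print([float("nan")] == [float("nan")])  # False

# Detecting a NaN reliably still needs an explicit test.
print(math.isnan(x))                     # True
```

Scanning every element with math.isnan before comparing would remove the surprise, at the price of giving up the shortcut for every list.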


> I guess that raises the question about why x == x but sqrt(-1) != sqrt(-1),

That question has already been raised, and answered, repeatedly in this thread.


> but it seems a lot less of a big deal than all of the exceptions with
> container equalities.

Container equalities are not a big deal. I'm not sure what problem you think
you are solving.

Mike Graham

Oct 9, 2012, 9:25:55 PM
to Steven D'Aprano, Python-Ideas
I'm sometimes surprised at the creativity and passion behind solutions
to this issue.

I've been a Python user for some years now, including time dealing
with stuff like numpy where you're fairly likely to run into NaNs.
I've been an active member of several support communities where I can
confidently say I have encountered tens of thousands of Python
questions. Not once can I recall having, or seeing anyone else have, an
actual problem due to the way Python handles NaN. As far as I can tell,
it works _perfectly_.

I appreciate the aesthetic concerns, but I really wish someone would
explain to me what's actually broken and in need of fixing.

Mike

alex23

Oct 9, 2012, 10:23:23 PM
to python...@python.org
On Oct 9, 5:14 pm, Guido van Rossum <gu...@python.org> wrote:
> I spent a week with Bertrand recently.

Any chance you might blog about this? :)

Stephen J. Turnbull

Oct 10, 2012, 2:06:10 AM
to Ethan Furman, python...@python.org
Ethan Furman writes:

> Or, to look at it another way, surely somewhere out in the Real World
> (tm) it is the case that two NaNs are indeed equal.

Sure, but according to Kahan's Uncertainty principle, you'll never be
able to detect it.

Really-there's-no-alternative-to-backward-compatibility-or-IEEE754-ly y'rs

Robert Kern

Oct 10, 2012, 9:23:38 AM
to python...@python.org
On 10/10/12 2:25 AM, Mike Graham wrote:

> I'm sometimes surprised at the creativity and passion behind solutions
> to this issue.
>
> I've been a Python user for some years now, including time dealing
> with stuff like numpy where you're fairly likely to run into NaNs.
> I've been an active member of several support communities where I can
> confidently say I have encountered tens of thousands of Python
> questions. Not once can I recall having, or seeing anyone else have, an
> actual problem due to the way Python handles NaN. As far as I can tell,
> it works _perfectly_.
>
> I appreciate the aesthetic concerns, but I really wish someone would
> explain to me what's actually broken and in need of fixing.

While I also don't think that anything needs to be fixed, I must say that in my
years of monitoring tens of thousands of Python questions, there have been a few
legitimate problems with the NaN behavior. It does come up from time to time.

The most frequent problem is checking if a list contains a NaN. The obvious
thing to do for many users:

nan in list_of_floats

This is a reasonable prediction based on what one normally does for most objects
in Python, but this is quite wrong. But because list.__contains__() checks for
identity first, it can look like it works when people test it out:

>>> nan = float('nan')
>>> nan in [1.0, 2.0, nan]
True

Then they write their code doing the wrong thing thinking that they tested their
approach.

I classify this as a wart: it breaks reasonable predictions from users, requires
more exceptions-based knowledge about NaNs to use correctly, and can trap users
who do try to experiment to determine the behavior. But I think that the cost of
acquiring and retaining such knowledge is not so onerous as to justify the cost
of any of the attempts to fix the wart.
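
A short sketch of the trap described above, and of a membership test that does not depend on object identity. The helper name contains_nan is hypothetical, not part of any library:

```python
import math

nan = float("nan")

def contains_nan(values):
    """Return True if any element of values is a NaN (hypothetical helper)."""
    return any(math.isnan(v) for v in values)

same_object = [1.0, 2.0, nan]             # holds the very object bound to `nan`
other_object = [1.0, 2.0, float("nan")]   # holds a different NaN object

print(nan in same_object)          # True, but only via the identity check
print(nan in other_object)         # False: a different NaN never compares equal
print(contains_nan(other_object))  # True
```

In real code the NaN usually comes out of a computation, so the second case is the one that matters.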

The other NaN wart (unrelated to this thread) is that sorting a list of floats
containing a NaN will usually leave the list unsorted because "inequality
comparisons with a NaN always return False" breaks the assumptions of timsort
and other sorting algorithms. You should remember this, as you once demonstrated
the problem:

http://mail.python.org/pipermail/python-ideas/2011-April/010063.html

This is a real problem, so much so that numpy works around it by making our
sorts always place NaNs at the end of the array. Unfortunately, lists do not
have the luxury of cheaply knowing the type of all of the objects in the list,
so this is not an option for them.
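
For plain lists, one possible workaround is a key function that keeps NaNs out of the ordering comparisons altogether. This is only a sketch: sort_nans_last is a hypothetical helper, it is not how numpy implements its sorts, and it assumes every element is a float:

```python
import math

def sort_nans_last(values):
    """Sort floats ascending, placing any NaNs at the end (hypothetical helper)."""
    # The first key element separates NaNs (True) from ordinary floats (False).
    # NaNs get a constant second element so they are never compared by value.
    return sorted(
        values,
        key=lambda v: (math.isnan(v), 0.0 if math.isnan(v) else v),
    )

data = [3.0, float("nan"), 1.0, 2.0, float("nan")]
result = sort_nans_last(data)
print(result[:3])  # [1.0, 2.0, 3.0]
```

The key function calls math.isnan on every element, which is the per-element check that a type-homogeneous array gets cheaply and a general list does not.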

Real problems, but nothing that motivates a change, in my opinion.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Joshua Landau

Oct 10, 2012, 5:33:45 PM
to Steven D'Aprano, python...@python.org
On 10 October 2012 02:14, Steven D'Aprano <st...@pearwood.info> wrote:
> On 10/10/12 09:13, Joshua Landau wrote:
>> Just a curiosity here (as I can guess of plausible reasons myself, so there
>> probably are some official stances).
>>
>> Is there a reason NaNs are not instances of NaN class?
>
> Because that would complicate Python's using floats for absolutely no benefit.
> Instead of float operations always returning a float, they would have to return
> a float or a NAN. To check for a valid floating point instance, instead of
> saying:
>
> isinstance(x, float)
>
> you would have to say:
>
> isinstance(x, (float, NAN))
 
Not the way I'm proposing it.

>>> class NAN(float):
...     def __new__(cls):
...             return float.__new__(cls, "nan")
...     def __eq__(self, other):
...             return other is self
... 
>>> isinstance(NAN(), float)
True
>>> NAN() is NAN()
False
>>> NAN() == NAN()
False
>>> x = NAN()
>>> x is x
True
>>> x == x
True
>>> x
nan

 
> And what about infinities, denorm numbers, and negative zero? Do they get
> dedicated classes too?

Infinities? No, although they might well if the infinities were different (set of reals vs set of ints, for example).
Denorms? No, that's a completely different thing.
-0.0? No, that's a completely different thing.

I was asking: because instances of a class map onto a behavior that matches *almost exactly* what *both* parties want, why was it not used? This is not the case with anything other than that.

> And what is the point of this added complexity? Nothing.

Simplicity. It's simpler.

> You *still* have the rule that "x == x for all x, except for NANs".

False. I was proposing that x == x but NAN() != NAN().

> The only difference is that "NANs" now means "instances of NAN class" rather than
> "NAN floats" (and Decimals).

False, if you subclass float.

> Working with IEEE 754 floats is now far more of
> a nuisance because some valid floating point values aren't floats but have a
> different class, but nothing meaningful is different.

>> Then x == x would be True (as they want), but [this NaN] == [that NaN]
>> would be False, as expected.
>
> Making NANs their own class wouldn't give you that. If we wanted that
> behaviour, we could have it without introducing a NAN class: just change the
> list __eq__ method to scan the list for a NAN using math.isnan before checking
> whether the lists were identical.

False.

>>> x == x
True
>>> [NAN()] == [NAN()]
False

as per my previous "implementation".
 
> But that would defeat the purpose of the identity check (an optimization to
> avoid scanning the list)! Replacing math.isnan with isinstance doesn't change
> that.

>> I guess that raises the question about why x == x but sqrt(-1) != sqrt(-1),
>
> That question has already been raised, and answered, repeatedly in this thread.

False. x != x, so that has not been "answered". This was an example problem with my own suggested implementation.

>> but it seems a lot less of a big deal than all of the exceptions with
>> container equalities.
>
> Container equalities are not a big deal. I'm not sure what problem you think
> you are solving.

 Why would you assume that? I mentioned it from honest curiosity, and all I got back was an attack. Please, I want to be civil but you need to act less angrily.

[Has not been spell-checked, as I don't really have time </lie>]

Thank you for your time, even though I disagree,

Joshua Landau

Joshua Landau

Oct 10, 2012, 5:38:32 PM
to Steven D'Aprano, python...@python.org
On 10 October 2012 22:33, Joshua Landau <joshua.l...@gmail.com> wrote:
> Why would you assume that? I mentioned it from honest curiosity, and all I got back was an attack. Please, I want to be civil but you need to act less angrily.

After reconsidering, I regret these sentences.

Yes, I do still believe your response was overly angry, but I did get a thought-out response and you did try to address my concerns. In the interest of benevolence, may I retract my statement?

Joshua Landau

Oct 10, 2012, 6:05:43 PM
to Steven D'Aprano, python...@python.org
I don't normally triple-post, but here it goes.

After re-re-reading this thread, it turns out one (1) post and two (2) answers to that post have covered a topic very similar to the one I have raised. All of the others, to my understanding, do not dwell on the fact that float("nan") is not float("nan"). The mentioned post was not quite the same as mine, but it still had two replies.

I will respond to them here. My response, again, is curiosity about why, not a suggestion to change anything. I agree that there is probably no real concern with the current state: I have never had a concern, and the concern caused by a change would dwarf any possible benefits.


Response 1:
This implies that you want to differentiate between -0.0 and +0.0. That is bad.

My response:
Why would I want to do that?

Response 2:
"There is not space on this thread to convince you otherwise." [paraphrased]

My response:
That comment was not directed at me and thus has little relevance to my own post.


Hopefully you now understand why I felt the need to ask the question after so much has already been said on the topic.

Finally, Mike Graham says (probably referring to me):
"I'm sometimes surprised at the creativity and passion behind solutions to this issue."

My response:
It was an immediate thought, not one dwelled upon. The fact it was not answered in the thread prompted my curiosity. It is honestly nothing more.

Steven D'Aprano

Oct 10, 2012, 9:20:39 PM
to python...@python.org
On 11/10/12 09:05, Joshua Landau wrote:

> After re-re-reading this thread, it turns out one *(1)* post and two
> *(2)* answers
> to that post have covered a topic very similar to the one I have raised.
> All of the others, to my understanding, do not dwell over the fact
> that *float("nan") is not float("nan")* .

That's no different from any other float.

py> float('nan') is float('nan')
False
py> float('1.5') is float('1.5')
False

Floats are not interned or cached, although of course interning is
implementation dependent and this is subject to change without notice.

For that matter, it's true of *nearly all builtins* in Python. The
exceptions being bool(obj) which returns one of two fixed instances,
and int() and str(), where *some* but not all instances are cached.


> Response 1:
> This implies that you want to differentiate between -0.0 and +0.0. That is
> bad.
>
> My response:
> Why would I want to do that?

If you are doing numeric work, you *should* differentiate between -0.0
and 0.0. That's why the IEEE 754 standard mandates a -0.0.

Both -0.0 and 0.0 compare equal, but they can be distinguished (although
doing so is tricky in Python). The reason for distinguishing them is to
distinguish between underflow to zero from positive or negative values.
E.g. log(x) should return -infinity if x underflows from a positive value,
and a NaN if x underflows from a negative.

Alexander Belopolsky

Oct 11, 2012, 12:07:33 AM
to Steven D'Aprano, python...@python.org
On Wed, Oct 10, 2012 at 9:20 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> Both -0.0 and 0.0 compare equal, but they can be distinguished (although
> doing so is tricky in Python).

Not really:

>>> math.copysign(1.0,-0.0)
-1.0
>>> math.copysign(1.0,0.0)
1.0

Mark Dickinson

Oct 12, 2012, 1:26:18 PM
to Steven D'Aprano, python...@python.org
On Thu, Oct 11, 2012 at 2:20 AM, Steven D'Aprano <st...@pearwood.info> wrote:
> E.g. log(x) should return -infinity if x underflows from a positive value,
> and a NaN if x underflows from a negative.

IEEE 754 disagrees. :-) Both log(-0.0) and log(0.0) are required to
return -infinity (and/or signal the divideByZero exception).

And as for sqrt(-0.0) returning -0.0... Grr. I've never understood
the motivation for that one, especially as it disagrees with the usual
recommendations for complex square root (where the real part of the
result *always* has its sign bit cleared).

Mark

Joshua Landau

Oct 12, 2012, 2:42:38 PM
to Steven D'Aprano, python...@python.org
On 11 October 2012 02:20, Steven D'Aprano <st...@pearwood.info> wrote:
> On 11/10/12 09:05, Joshua Landau wrote:
>
>> After re-re-reading this thread, it turns out one *(1)* post and two
>> *(2)* answers
>> to that post have covered a topic very similar to the one I have raised.
>> All of the others, to my understanding, do not dwell over the fact
>> that *float("nan") is not float("nan")* .
>
> That's no different from any other float.
>
> py> float('nan') is float('nan')
> False
> py> float('1.5') is float('1.5')
> False
>
> Floats are not interned or cached, although of course interning is
> implementation dependent and this is subject to change without notice.
>
> For that matter, it's true of *nearly all builtins* in Python. The
> exceptions being bool(obj) which returns one of two fixed instances,
> and int() and str(), where *some* but not all instances are cached.
 
>>> float(1.5) is float(1.5)
True
>>> float("1.5") is float("1.5")
False

Confusing re-use of identity strikes again. Would anyone care to explain what causes this? I understand float(1.5) is likely to return the float it was given, but that's as far as I can reason.

What I was saying, though, is that all other posts assumed equality between two different NaNs should be the same as identity between a NaN and itself. This is what I'm really asking about, I guess.
 
>> Response 1:
>> This implies that you want to differentiate between -0.0 and +0.0. That is
>> bad.
>>
>> My response:
>> Why would I want to do that?
>
> If you are doing numeric work, you *should* differentiate between -0.0
> and 0.0. That's why the IEEE 754 standard mandates a -0.0.
>
> Both -0.0 and 0.0 compare equal, but they can be distinguished (although
> doing so is tricky in Python). The reason for distinguishing them is to
> distinguish between underflow to zero from positive or negative values.
> E.g. log(x) should return -infinity if x underflows from a positive value,
> and a NaN if x underflows from a negative.

Interesting. 

Can you give me a more explicit example? When would you not *want* f(-0.0) to always return the result of f(0.0)? [aka, for -0.0 to warp into 0.0 on creation]

MRAB

Oct 12, 2012, 3:19:48 PM
to python-ideas
On 2012-10-12 19:42, Joshua Landau wrote:
> On 11 October 2012 02:20, Steven D'Aprano <st...@pearwood.info
> <mailto:st...@pearwood.info>> wrote:
>
> On 11/10/12 09:05, Joshua Landau wrote:
>
> After re-re-reading this thread, it turns out one *(1)* post and two
> *(2)* answers
>
> to that post have covered a topic very similar to the one I have
> raised.
> All of the others, to my understanding, do not dwell over the fact
> that *float("nan") is not float("nan")* .
>
>
> That's no different from any other float.
>
> py> float('nan') is float('nan')
> False
> py> float('1.5') is float('1.5')
> False
>
> Floats are not interned or cached, although of course interning is
> implementation dependent and this is subject to change without notice.
>
> For that matter, it's true of *nearly all builtins* in Python. The
> exceptions being bool(obj) which returns one of two fixed instances,
> and int() and str(), where *some* but not all instances are cached.
>
> >>> float(1.5) is float(1.5)
> True

It re-uses an immutable literal:

>>> 1.5 is 1.5
True
>>> "1.5" is "1.5"
True

and 'float' returns its argument if it's already a float:

>>> float(1.5) is 1.5
True

Therefore:

>>> float(1.5) is float(1.5)
True

But apart from that, when a new object is created, it doesn't check
whether it's identical to another, except in certain cases such as ints
in a limited range:

>>> float("1.5") is float("1.5")
False
>>> float("1.5") is 1.5
False
>>> int("1") is 1
True

And it's an implementation-specific behaviour.

Mark Dickinson

Oct 12, 2012, 3:22:37 PM
to Joshua Landau, python...@python.org
On Fri, Oct 12, 2012 at 7:42 PM, Joshua Landau
<joshua.l...@gmail.com> wrote:
> Can you give me a more explicit example? When would you not *want* f(-0.0)
> to always return the result of f(0.0)? [aka, for -0.0 to warp into 0.0 on
> creation]

A few examples:

(1) In the absence of exceptions, 1 / 0.0 is +inf, while 1 / -0.0 is
-inf. So e.g. the function exp(-exp(1/x)) has different values at
-0.0 and 0.0:

>>> from numpy import float64, exp
>>> exp(-exp(1/float64(0.0)))
0.0
>>> exp(-exp(1/float64(-0.0)))
1.0

(2) For the atan2 function, we have e.g.,

>>> from math import atan2
>>> atan2(0.0, -1.0)
3.141592653589793
>>> atan2(-0.0, -1.0)
-3.141592653589793

This gives atan2 a couple of nice invariants: the sign of the result
always matches the sign of the first argument, and atan2(-y, x) ==
-atan2(y, x) for any (non-nan) x and y.
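
Both invariants can be spot-checked with a small sketch. The sample values are arbitrary, and sign is a hypothetical helper that reads the sign bit so that -0.0 counts as negative:

```python
import math
from itertools import product

def sign(v):
    """Return 1.0 or -1.0 according to the sign bit, so -0.0 counts as negative."""
    return math.copysign(1.0, v)

samples = [-2.0, -0.0, 0.0, 1.5]  # arbitrary non-NaN test values
violations = []
for y, x in product(samples, repeat=2):
    angle = math.atan2(y, x)
    # Invariant 1: the result carries the sign of the first argument.
    if sign(angle) != sign(y):
        violations.append(("sign", y, x))
    # Invariant 2: atan2 is odd in its first argument, zeros included.
    mirrored = math.atan2(-y, x)
    if mirrored != -angle or sign(mirrored) != sign(-angle):
        violations.append(("odd", y, x))

print(violations)
```

If -0.0 were folded into 0.0 on creation, the pairs with y == -0.0 could no longer be told apart from those with y == 0.0, and the first invariant would lose its meaning at zero.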

(3) Similarly, for complex math functions (which aren't covered by
IEEE 754, but are standardised in various other languages), it's
sometimes convenient to be able to depend on invariants like e.g.
asin(z.conj()) == asin(z).conj(). Those are only possible if -0.0 and
0.0 are distinguished; the effect is most visible if you pick values
lying on a branch cut.

>>> from cmath import asin
>>> z = complex(2.0, 0.0)
>>> asin(z).conjugate()
(1.5707963267948966-1.3169578969248166j)
>>> asin(z.conjugate())
(1.5707963267948966-1.3169578969248166j)

You can't take that too far, though: e.g., it would be nice if
complex multiplication had the property that (z * w).conjugate() was
always the same as z.conjugate() * w.conjugate(), but it's impossible
to keep both that invariant and the commutativity of multiplication.
(E.g., consider the result of complex(1, 1) * complex(1, -1).)
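
The suggested example can be worked through with a short sketch using Python's built-in complex type. The names lhs and rhs are illustrative; the sign of each zero is read with math.copysign because == cannot distinguish -0.0 from 0.0:

```python
import math

z = complex(1, 1)
w = complex(1, -1)

lhs = (z * w).conjugate()            # conjugate of the product
rhs = z.conjugate() * w.conjugate()  # product of the conjugates

print(lhs == rhs)  # True: -0.0 == 0.0, so == hides the difference
print(math.copysign(1.0, lhs.imag), math.copysign(1.0, rhs.imag))  # -1.0 1.0
```

Here z.conjugate() is w and w.conjugate() is z, so rhs is just w * z. Commutativity forces it to equal z * w, whose imaginary part is -1 + 1 = +0.0, while conjugating that product flips the zero to -0.0.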

--
Mark

Tim Peters

Oct 12, 2012, 3:42:34 PM
to python...@python.org
[Mark Dickinson]
> ...
> And as for sqrt(-0.0) returning -0.0... Grr. I've never understood
> the motivation for that one, especially as it disagrees with the usual
> recommendations for complex square root (where the real part of the
> result *always* has its sign bit cleared).

The only rationale I've seen for this is in Kahan's obscure paper
"Branch Cuts for Complex Elementary Functions or Much Ado About
Nothing's Sign Bit". Hard to find. Here's a mostly readable scan:

http://port70.net/~nsz/articles/float/kahan_branch_cuts_complex_elementary_functions_1987.pdf

In part it's to preserve various identities, such as that
sqrt(conjugate(z)) is the same as conjugate(sqrt(z)). When z is +0,
that becomes

sqrt(conjugate(+0)) same_as conjugate(sqrt(+0))

which is

sqrt(-0) same_as conjugate(+0)

which is

sqrt(-0) same_as -0

Convinced? LOL. There are others in the paper ;-)

Mark Dickinson

Oct 12, 2012, 4:46:00 PM
to Tim Peters, python...@python.org
On Fri, Oct 12, 2012 at 8:42 PM, Tim Peters <tim.p...@gmail.com> wrote:
> In part it's to preserve various identities, such as that
> sqrt(conjugate(z)) is the same as conjugate(sqrt(z)). When z is +0,
> that becomes
>
> sqrt(conjugate(+0)) same_as conjugate(sqrt(+0))
>
> which is
>
> sqrt(-0) same_as conjugate(+0)
>
> which is
>
> sqrt(-0) same as -0
>
> Convinced?

Not really. :-) In fact, it's exactly that paper that made me think
sqrt(-0.0) -> -0.0 is suspect.

The way I read it, the argument from the paper implies that
cmath.sqrt(complex(0.0, -0.0)) should be complex(0.0, -0.0), which I
have no problem with---it makes things nice and neat: quadrants 1 and
2 in the complex plane map to quadrant 1, and quadrants 3 and 4 to
quadrant 4, with the signs of the zeros making it clear what
'quadrant' means in all (non-nan) cases. But I don't see how to get
from there to math.sqrt(-0.0) being -0.0.

It's exactly the mismatch between the real and complex math that makes
no sense to me: math.sqrt(-0.0) should resemble
cmath.sqrt(complex(-0.0, +/-0.0)). But the latter, quite reasonably,
is complex(0.0, +/-0.0) (at least according to both Kahan and C99
Annex G), while the former is specified to be -0.0 in IEEE 754.
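
The mismatch is easy to observe from Python. This is a small sketch assuming CPython's math and cmath modules; sign is a hypothetical helper that reads the sign bit:

```python
import cmath
import math

def sign(v):
    """Return 1.0 or -1.0 according to the sign bit, so -0.0 counts as negative."""
    return math.copysign(1.0, v)

real_root = math.sqrt(-0.0)
complex_root = cmath.sqrt(complex(-0.0, 0.0))

print(sign(real_root))          # -1.0: math.sqrt(-0.0) is -0.0
print(sign(complex_root.real))  # 1.0: the real part has its sign bit cleared
print(sign(complex_root.imag))  # 1.0: the imaginary zero keeps the input's sign
```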

--
Mark

Joshua Landau

Oct 12, 2012, 5:16:13 PM
to python-ideas
Thank you all for being so thorough. I think I'm sated for tonight. ^^

With all due respect,

Joshua Landau