Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.

float("nan") in set or as key

Viewed 101 times

MRAB

May 28, 2011, 19:41:16
To pytho...@python.org
Here's a curiosity. float("nan") can occur multiple times in a set or as
a key in a dict:

>>> {float("nan"), float("nan")}
{nan, nan}

except that sometimes it can't:

>>> nan = float("nan")
>>> {nan, nan}
{nan}

Erik Max Francis

May 28, 2011, 20:16:50

It's fundamentally because NaN is not equal to itself, by design.
Dictionaries and sets rely on equality to test for uniqueness of keys or
elements.

>>> nan = float("nan")
>>> nan == nan
False

In short, don't do that.

--
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM/Y!M/Skype erikmaxfrancis
There was never a good war or a bad peace.
-- Benjamin Franklin, 1706-1790

Steven D'Aprano

May 28, 2011, 20:26:07
On Sun, 29 May 2011 00:41:16 +0100, MRAB wrote:

> Here's a curiosity. float("nan") can occur multiple times in a set or as
> a key in a dict:
>
> >>> {float("nan"), float("nan")}
> {nan, nan}

That's an implementation detail. Python is free to reuse the same object
when you create an immutable object twice on the same line, but in this
case doesn't. (I don't actually know if it ever does, but it could.)

And since NAN != NAN always, you can get two NANs in the one set, since
they're unequal.


>
> except that sometimes it can't:
>
> >>> nan = float("nan")
> >>> {nan, nan}
> {nan}

But in this case, you try to put the same NAN in the set twice. Since
sets optimize element testing by checking for identity before equality,
the NAN only goes in once.
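That identity-before-equality shortcut is easy to see directly. A minimal sketch (behaviour as observed in CPython's sets; not guaranteed by the language reference):

```python
import math

nan = float("nan")

# Same object: the containment machinery checks "is" before "==",
# so a single NaN object can only occur once in a set.
assert len({nan, nan}) == 1
assert nan in {nan}  # found by identity, despite nan != nan

# Distinct NaN objects: identity fails and equality is False,
# so both go in as separate elements.
a, b = float("nan"), float("nan")
assert a is not b
assert len({a, b}) == 2
assert math.isnan(a) and math.isnan(b)
```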


--
Steven

Albert Hopkins

May 28, 2011, 20:28:46
To pytho...@python.org
On Sun, 2011-05-29 at 00:41 +0100, MRAB wrote:
> Here's a curiosity. float("nan") can occur multiple times in a set or as
> a key in a dict:
>
> >>> {float("nan"), float("nan")}
> {nan, nan}
>
These two nans are not equal (they are two different nans)

> except that sometimes it can't:
>
> >>> nan = float("nan")
> >>> {nan, nan}
> {nan}

This is the same nan, so it is equal to itself.

Two "nan"s are not equal in the manner that 1.0 and 1.0 are equal:

>>> 1.0 == 1.0
True
>>> float("nan") == float("nan")
False


I can't cite this in a spec, but it makes sense (to me) that two things
which are nan are not necessarily the same nan.

Chris Angelico

May 28, 2011, 20:32:43
To pytho...@python.org
On Sun, May 29, 2011 at 10:28 AM, Albert Hopkins <mar...@letterboxes.org> wrote:
> This is the same nan, so it is equal to itself.
>

Actually, they're not. But it's possible the dictionary uses an 'is'
check to save computation, and if one thing 'is' another, it is
assumed to equal it. That's true of most well-behaved objects, but nan
is not well-behaved :)

Chris Angelico

Erik Max Francis

May 28, 2011, 20:44:56

It's part of the IEEE standard.

Gregory Ewing

May 28, 2011, 21:04:32

NaNs are weird. They're not equal to themselves:

Python 2.7 (r27:82500, Oct 15 2010, 21:14:33)
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.


>>> nan = float("nan")
>>> nan == nan
False

This confuses the daylights out of Python's dict lookup machinery,
which assumes that two references to the same object can't possibly
compare unequal, so it doesn't bother calling __eq__ on them.

--
Greg

Grant Edwards

May 28, 2011, 22:25:55
On 2011-05-29, Albert Hopkins <mar...@letterboxes.org> wrote:
> On Sun, 2011-05-29 at 00:41 +0100, MRAB wrote:
>> Here's a curiosity. float("nan") can occur multiple times in a set or as
>> a key in a dict:
>>
>> >>> {float("nan"), float("nan")}
>> {nan, nan}
>>
> These two nans are not equal (they are two different nans)
>
>> except that sometimes it can't:
>>
>> >>> nan = float("nan")
>> >>> {nan, nan}
>> {nan}
>
> This is the same nan, so it is equal to itself.

No, it's not.

>>> x = float("nan")
>>> y = x
>>> x is y
True
>>> x == y
False

> I can't cite this in a spec, but it makes sense (to me) that two things
> which are nan are not necessarily the same nan.

Even if they _are_ the same nan, it's still not equal to itself.

--
Grant


John Nagle

May 29, 2011, 02:12:54

Right.

The correct answer to "nan == nan" is to raise an exception, because
you have asked a question for which the answer is neither True nor False.

The correct semantics for IEEE floating point look something like
this:

    1/0          INF
    INF + 1      INF
    INF - INF    NaN
    INF == INF   unordered
    NaN == NaN   unordered

INF and NaN both have comparison semantics which return
"unordered". The FPU sets a bit for this, which most language
implementations ignore. But you can turn on floating point
exception traps, and on x86 machines, they're exact - the
exception will occur exactly at the instruction which
triggered the error. In superscalar CPUs, a sizable part of
the CPU handles the unwinding necessary to do that. x86 does
it, because it's carefully emulating non-superscalar machines.
Most RISC machines don't bother.

Python should raise an exception on unordered comparisons.
Given that the language handles integer overflow by going to
arbitrary-precision integers, checking the FPU status bits is
cheap.

The advantage of raising an exception is that the logical operations
still work. For example,

not (a == b)
a != b

will always return the same results if exceptions are raised for
unordered comparison results. Also, exactly one of

a == b
a < b
a > b

is always true - something sorts tend to assume.

If you get an unordered comparison exception, your program
almost certainly was getting wrong answers.

(I used to do dynamics simulation engines, where this mattered.)

John Nagle

Wolfgang Rohdewald

May 29, 2011, 04:27:14
To pytho...@python.org
On Sonntag 29 Mai 2011, Tim Delaney wrote:
> There's a second part the mystery - sets and dictionaries (and
> I think lists) assume that identify implies equality (hence
> the second result). This was recently discussed on
> python-dev, and the decision was to leave things as-is.

On Sonntag 29 Mai 2011, Grant Edwards wrote:
> Even if they are the same nan, it's still not equal to itself.

if I understand this thread correctly, they are not equal to
themselves as specified by IEEE, but Python treats them as equal in
sets and dictionaries for performance reasons

--
Wolfgang

Steven D'Aprano

May 29, 2011, 04:51:33

*Exactly* correct.

NAN != NAN even if they are the same NAN, by design. This makes NANs ill-
behaved, but usefully so. Most (all?) Python built-ins assume that any
object X is equal to itself, so they behave strangely with NANs.


--
Steven

Steven D'Aprano

May 29, 2011, 06:29:28
On Sat, 28 May 2011 23:12:54 -0700, John Nagle wrote:

> The correct answer to "nan == nan" is to raise an exception, because
> you have asked a question for which the answer is neither True nor False.

Wrong.

The correct answer to "nan == nan" is False, they are not equal. Just as
None != "none", and 42 != [42], or a teacup is not equal to a box of
hammers.

Asking whether NAN < 0 could arguably either return "unordered" (raise an
exception) or return False ("no, NAN is not less than zero; neither is it
greater than zero"). The PowerPC Macintoshes back in the 1990s supported
both behaviours. But that's different to equality tests.


> The correct semantics for IEEE floating point look something like
> this:
>
>     1/0          INF
>     INF + 1      INF
>     INF - INF    NaN
>     INF == INF   unordered

Wrong. Equality is not an order comparison.

--
Steven

Grant Edwards

May 29, 2011, 10:41:13
On 2011-05-29, Wolfgang Rohdewald <wolf...@rohdewald.de> wrote:
> On Sonntag 29 Mai 2011, Tim Delaney wrote:
>> There's a second part the mystery - sets and dictionaries (and
>> I think lists) assume that identity implies equality (hence
>> the second result). This was recently discussed on
>> python-dev, and the decision was to leave things as-is.
>
> On Sonntag 29 Mai 2011, Grant Edwards wrote:
>> Even if they are the same nan, it's still not equal to itself.
>
> if I understand this thread correctly, they are not equal to themselves
> as specified by IEEE

And Python follows that convention.

> but Python treats them equal in sets and dictionaries for performance
> reasons

It treats them as identical (not sure if that's the right word). The
implementation is checking for ( A is B or A == B ). Presumably, the
assumption being that all objects are equal to themselves. That
assumption is not true for NaN objects, so the buggy behavior is
observed.

--
Grant

MRAB

May 29, 2011, 13:44:08
To pytho...@python.org
Would there be any advantage to making NaN a singleton? I'm thinking
that it could make checking for it cheaper in the implementation of
sets and dicts. Or making NaN unhashable?

Chris Angelico

May 29, 2011, 13:50:15
To pytho...@python.org
On Mon, May 30, 2011 at 3:44 AM, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Would there be any advantage to making NaN a singleton? I'm thinking
> that it could make checking for it cheaper in the implementation of
> sets and dicts. Or making NaN unhashable?

Doesn't matter. It still wouldn't be equal to itself, even though it
'is' itself, which will greatly confuse anything that optimizes that
away. Numbers are well-behaved; NaN is not a number; NaN is not
well-behaved. It makes sense... in a way.

Chris Angelico

Christian Heimes

May 29, 2011, 14:05:07
To pytho...@python.org
Am 29.05.2011 19:44, schrieb MRAB:
> Would there be any advantage to making NaN a singleton? I'm thinking
> that it could make checking for it cheaper in the implementation of
> sets and dicts. Or making NaN unhashable?

It can't be a singleton, because IEEE 754 specifies millions of millions
of different NaN values. There are positive and negative NaNs, quiet
NaNs and signaling NaNs. 50 of 52 mantissa bits can vary freely, one bit
makes the difference between signaling and quiet NaNs and at least one
bit must be non-zero.
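That variety of bit patterns is easy to demonstrate; a sketch assuming the usual IEEE 754 binary64 layout (the hex constants below are illustrative patterns, not anything from the thread):

```python
import math
import struct

def bits_to_float(bits):
    # Reinterpret a 64-bit pattern as a double.
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

# Exponent field all ones plus any non-zero significand is a NaN,
# whatever the sign bit or remaining significand bits hold.
assert math.isnan(bits_to_float(0x7FF8000000000000))  # a quiet NaN
assert math.isnan(bits_to_float(0x7FF0000000000001))  # signaling-pattern NaN
assert math.isnan(bits_to_float(0xFFF8000000000ABC))  # negative, with payload

# An all-zero significand is infinity, not NaN.
assert math.isinf(bits_to_float(0x7FF0000000000000))
```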

Christian

Steven D'Aprano

May 29, 2011, 14:27:08
On Sun, 29 May 2011 18:44:08 +0100, MRAB wrote:

> Would there be any advantage to making NaN a singleton?

Absolutely not. That would be a step backwards.

NANs can carry payload (a code indicating what sort of NAN it represents
-- log(-1) and 1/INF are not the same). So although Python currently has
no easy way to access that payload (you can do it with the struct
module), it does exist and for serious work you would want to be able to
set and get it.
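The struct-module route Steven mentions looks roughly like this; the helper names are made up for illustration, and the layout assumed is IEEE 754 binary64:

```python
import math
import struct

def nan_payload(x):
    # View the double's 64 bits and keep the 52-bit significand field.
    bits, = struct.unpack("<Q", struct.pack("<d", x))
    return bits & ((1 << 52) - 1)

def nan_with_payload(payload):
    # Build a quiet NaN carrying a caller-chosen payload in the low
    # 51 bits (the top significand bit marks the NaN as quiet).
    bits = 0x7FF8000000000000 | (payload & ((1 << 51) - 1))
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

tagged = nan_with_payload(42)
assert math.isnan(tagged)
assert nan_payload(tagged) & ((1 << 51) - 1) == 42
```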


> I'm thinking
> that it could make checking for it cheaper in the implementation of sets
> and dicts.

I don't see how it would be cheaper, but even if it were, talk about a
micro-optimization! I'd really *love* to see the code where the time it
takes to insert a NAN in a set was the bottleneck!

> Or making NaN unhashable?

I could live with that, although I don't think it is necessary. What
actual problem are you hoping to solve here?

--
Steven

Steven D'Aprano

May 29, 2011, 14:46:54
On Sun, 29 May 2011 20:05:07 +0200, Christian Heimes wrote:

> Am 29.05.2011 19:44, schrieb MRAB:
>> Would there be any advantage to making NaN a singleton? I'm thinking
>> that it could make checking for it cheaper in the implementation of
>> sets and dicts. Or making NaN unhashable?
>
> It can't be a singleton, because IEEE 754 specifies millions of millions
> of different NaN values.

A million-millioneton then? *wink*


> There are positive and negative NaNs,

I've never quite understood that. NANs are unordered, and therefore
cannot be said to be larger than zero (positive) or less than zero
(negative). So even if a NAN has the sign bit set, surely the right way
to think about that is to treat the sign bit as part of the payload?

It seems to me that talking about signed NANs is inaccurate and adds
confusion. NANs cause enough confusion as it is, without adding to it...

(I would expect the copysign function to honour the sign bit, so I
suppose in that sense one might describe NANs as signed.)

--
Steven

Nobody

May 29, 2011, 17:19:49
On Sun, 29 May 2011 10:29:28 +0000, Steven D'Aprano wrote:

>> The correct answer to "nan == nan" is to raise an exception, because
>> you have asked a question for which the answer is neither True nor False.
>
> Wrong.

That's overstating it. There's a good argument to be made for raising an
exception. Bear in mind that an exception is not necessarily an error,
just an "exceptional" condition.

> The correct answer to "nan == nan" is False, they are not equal.

There is no correct answer to "nan == nan". Defining it to be false is
just the "least wrong" answer. Arguably, "nan != nan" should also be
false, but that would violate the invariant "(x != y) == !(x == y)".

Steven D'Aprano

May 29, 2011, 19:31:19
On Sun, 29 May 2011 22:19:49 +0100, Nobody wrote:

> On Sun, 29 May 2011 10:29:28 +0000, Steven D'Aprano wrote:
>
>>> The correct answer to "nan == nan" is to raise an exception,
>>> because
>>> you have asked a question for which the answer is neither True nor
>>> False.
>>
>> Wrong.
>
> That's overstating it. There's a good argument to be made for raising an
> exception.

If so, I've never heard it, and I cannot imagine what such a good
argument would be. Please give it.

(I can think of *bad* arguments, like "NANs confuse me and I don't
understand the reason for their existence, therefore I'll give them
behaviours that make no sense and aren't useful". But you did state there
is a *good* argument.)

> Bear in mind that an exception is not necessarily an error,
> just an "exceptional" condition.

True, but what's your point? Testing two floats for equality is not an
exceptional condition.


>> The correct answer to "nan == nan" is False, they are not equal.
>
> There is no correct answer to "nan == nan".

Why on earth not?


> Defining it to be false is just the "least wrong" answer.

So you say, but I think you are incorrect.


> Arguably, "nan != nan" should also be false,
> but that would violate the invariant "(x != y) == !(x == y)".

I cannot imagine what that argument would be. Please explain.

--
Steven

Chris Torek

May 29, 2011, 20:02:02
Incidentally, note:

$ python
...


>>> nan = float("nan")
>>> nan
nan
>>> nan is nan
True
>>> nan == nan
False

In article <4de1e3e7$0$2195$742e...@news.sonic.net>


John Nagle <na...@animats.com> wrote:
> The correct answer to "nan == nan" is to raise an exception, because
>you have asked a question for which the answer is neither True nor False.

Well, in some sense, the "correct answer" depends on which question
you *meant* to ask. :-) Seriously, some (many?) instruction sets
have two kinds of comparison instructions: one that raises an
exception here, and one that does not.

> The correct semantics for IEEE floating point look something like
>this:
>
>     1/0          INF
>     INF + 1      INF
>     INF - INF    NaN
>     INF == INF   unordered
>     NaN == NaN   unordered
>
>INF and NaN both have comparison semantics which return
>"unordered". The FPU sets a bit for this, which most language
>implementations ignore.

Again, this depends on the implementation.

This is similar to (e.g.) the fact that on the MIPS, there are two
different integer add instructions ("addi" and "addiu"): one
raises an overflow exception, the other performs C "unsigned"
style arithmetic (where, e.g., 0xffffffff + 1 = 0, in 32 bits).

>Python should raise an exception on unordered comparisons.
>Given that the language handles integer overflow by going to
>arbitrary-precision integers, checking the FPU status bits is
>cheap.

I could go for that myself. But then you also need a "don't raise
exception but give me an equality test result" operator (for various
special-case purposes at least) too. Of course a simple "classify
this float as one of normal, subnormal, zero, infinity, or NaN"
operator would suffice here (along with the usual "extract sign"
and "differentiate between quiet and signalling NaN" operations).
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: gmail (figure it out) http://web.torek.net/torek/index.html

Carl Banks

May 29, 2011, 20:55:22
On Sunday, May 29, 2011 4:31:19 PM UTC-7, Steven D'Aprano wrote:
> On Sun, 29 May 2011 22:19:49 +0100, Nobody wrote:
>
> > On Sun, 29 May 2011 10:29:28 +0000, Steven D'Aprano wrote:
> >
> >>> The correct answer to "nan == nan" is to raise an exception,
> >>> because
> >>> you have asked a question for which the answer is neither True nor
> >>> False.
> >>
> >> Wrong.
> >
> > That's overstating it. There's a good argument to be made for raising an
> > exception.
>
> If so, I've never heard it, and I cannot imagine what such a good
> argument would be. Please give it.

Floating point arithmetic evolved more or less in languages like Fortran, where things like exceptions were unheard of, and defining NaN != NaN was a trick chosen for testing against NaN for lack of a better way.

If exceptions had commonly existed in that environment there's no chance they would have chosen that behavior; comparison against NaN (or any operation with NaN) would have signaled a floating point exception. That is the correct way to handle exceptional conditions.

The only reason to keep NaN's current behavior is to adhere to IEEE, but given that Python has trailblazed a path of correcting arcane mathematical behavior, I definitely see an argument that Python should do the same for NaN, and if it were done Python would be a better language.


Carl Banks

Carl Banks

May 29, 2011, 21:07:50
On Sunday, May 29, 2011 7:41:13 AM UTC-7, Grant Edwards wrote:
> It treats them as identical (not sure if that's the right word). The
> implementation is checking for ( A is B or A == B ). Presumably, the
> assumption being that all objects are equal to themselves. That
> assumption is not true for NaN objects, so the buggy behavior is
> observed.

Python makes this assumption in lots of common situations (apparently in an implementation-defined manner):

>>> nan = float("nan")
>>> nan == nan
False
>>> [nan] == [nan]
True

Therefore, I'd recommend never to rely on NaN != NaN except in casual throwaway code. It's too easy to forget that it will stop working when you throw an item into a list or tuple. There's a function, math.isnan(), that should be the One Obvious Way to test for NaN. NaN should also never be used as a dictionary key or in a set (of course).

If it weren't for compatibility with IEEE, there would be no sane argument that defining an object that is not equal to itself isn't a bug. But because there's a lot of code out there that depends on NaN != NaN, Python has to tolerate it.


Carl Banks

Chris Angelico

May 29, 2011, 21:14:58
To pytho...@python.org
On Mon, May 30, 2011 at 10:55 AM, Carl Banks <pavlove...@gmail.com> wrote:
> If exceptions had commonly existed in that environment there's no chance they would have chosen that behavior; comparison against NaN (or any operation with NaN) would have signaled a floating point exception.  That is the correct way to handle exceptional conditions.
>
> The only reason to keep NaN's current behavior is to adhere to IEEE, but given that Python has trailblazed a path of correcting arcane mathematical behavior, I definitely see an argument that Python should do the same for NaN, and if it were done Python would be a better language.

If you're going to change behaviour, why have a floating point value
called "nan" at all? Other than being a title for one's grandmother,
what meaning does that string have, and why should it be able to be
cast as floating point?

Lifting from http://en.wikipedia.org/wiki/NaN a list of things that
can return a NaN (I've removed non-ASCII characters from this
snippet):
* Operations with a NaN as at least one operand.
(you need to bootstrap that somehow, so we can ignore this - it just
means that nan+1 = nan)

* The divisions 0/0 and infinity/infinity
* The multiplications 0*infinity and infinity*0
* The additions +inf + (-inf), (-inf) + +inf and equivalent subtractions
* The standard pow function and the integer exponent pown function
define 0**0, 1**inf, and inf**0 as 1.
* The powr function defines all three indeterminate forms as invalid
operations and so returns NaN.
* The square root of a negative number.
* The logarithm of a negative number.
* The inverse sine or cosine of a number that is less than -1 or
greater than +1.

Rather than having comparisons with NaN trigger exceptions, wouldn't
it be much cleaner to have all these operations trigger exceptions?
And, I would guess that they probably already do.

NaN has an additional use in that it can be used like a "null
pointer"; a floating-point variable can store 1.0, or 0.000000000005,
or "no there's no value that I'm storing in this variable". Since a
Python variable can contain None instead of a float, this use is
unnecessary too.

So, apart from float("nan"), are there actually any places where real
production code has to handle NaN? I was unable to get a nan by any of
the above methods, except for operations involving inf; for instance,
float("inf")-float("inf") == nan. All the others raised an exception
rather than return nan.
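A quick check of that observation with plain CPython floats (not the decimal module) bears it out:

```python
import math

inf = float("inf")

# Arithmetic on infinities quietly yields NaN:
assert math.isnan(inf - inf)
assert math.isnan(inf * 0.0)
assert math.isnan(inf / inf)

# ...while the equivalent finite-operand cases raise instead:
try:
    0.0 / 0.0
except ZeroDivisionError:
    pass
else:
    raise AssertionError("expected ZeroDivisionError")

try:
    math.sqrt(-1.0)
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError")
```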

Chris Angelico

Carl Banks

May 29, 2011, 22:17:01
On Sunday, May 29, 2011 6:14:58 PM UTC-7, Chris Angelico wrote:
> On Mon, May 30, 2011 at 10:55 AM, Carl Banks
> wrote:
> > If exceptions had commonly existed in that environment there's no chance they would have chosen that behavior; comparison against NaN (or any operation with NaN) would have signaled a floating point exception.  That is the correct way to handle exceptional conditions.
> >
> > The only reason to keep NaN's current behavior is to adhere to IEEE,
> > but given that Python has trailblazed a path of correcting arcane
> > mathematical behavior, I definitely see an argument that Python
> > should do the same for NaN, and if it were done Python would be a
> > better language.
>
> If you're going to change behaviour, why have a floating point value
> called "nan" at all?

If I were designing a new floating-point standard for hardware, I would consider getting rid of NaN. However, with the floating point standard that exists, that almost all floating point hardware mostly conforms to, there are certain bit patterns that mean NaN.

Python could refuse to construct float() objects out of NaN (I doubt it would even be a major performance penalty), but there are reasons why you wouldn't, the main one being to interface with other code that does use NaN. It's better, then, to recognize the NaN bit patterns and do something reasonable when trying to operate on it.


Carl Banks

Chris Angelico

May 29, 2011, 22:53:59
To pytho...@python.org
On Mon, May 30, 2011 at 12:17 PM, Carl Banks <pavlove...@gmail.com> wrote:
> If I were designing a new floating-point standard for hardware, I would consider getting rid of NaN.  However, with the floating point standard that exists, that almost all floating point hardware mostly conforms to, there are certain bit patterns that mean NaN.
>
> Python could refuse to construct float() objects out of NaN (I doubt it would even be a major performance penalty), but there are reasons why you wouldn't, the main one being to interface with other code that does use NaN.  It's better, then, to recognize the NaN bit patterns and do something reasonable when trying to operate on it.

Okay, here's a question. The Python 'float' value - is it meant to be
"a Python representation of an IEEE double-precision floating point
value", or "a Python representation of a real number"? For the most
part, Python's data types are defined by their abstract concepts - a
list isn't defined as a linked list of pointers, it's defined as an
ordered collection of objects. Python 3 removes the distinction
between 'int' and 'long', where 'int' is <2**32 and 'long' isn't, so
now a Py3 integer is... any integer.

The sys.float_info struct exposes details of floating point
representation. In theory, a Python implementation that uses bignum
floats could quite happily set all those values to extremes and work
with enormous precision (or could use a REXX-style "numeric digits
100" command to change the internal rounding, and automatically update
sys.float_info). And in that case, there would be no NaN value.

If Python is interfacing with some other code that uses NaN, that code
won't be using Python 'float' objects - it'll be using IEEE binary
format, probably. So all it would need to entail is a special return
value from an IEEE Binary to Python Float converter function (maybe
have it return None), and NaN is no longer a part of Python.

The real question is: Would NaN's removal be beneficial? And if so,
would it be worth the effort?

Chris Angelico

rusi

May 29, 2011, 23:08:35
On May 30, 7:53 am, Chris Angelico <ros...@gmail.com> wrote:

NaN in floating point is like NULL in databases.
It may be worthwhile to have a look at what choices SQL has made:
http://en.wikipedia.org/wiki/Null_%28SQL%29

[Deleted post]

Steven D'Aprano

May 29, 2011, 23:59:49
On Sun, 29 May 2011 17:55:22 -0700, Carl Banks wrote:

> Floating point arithmetic evolved more or less on languages like Fortran
> where things like exceptions were unheard of,

I'm afraid that you are completely mistaken.

Fortran IV had support for floating point traps, which are "things like
exceptions". That's as far back as 1966. I'd be shocked if earlier
Fortrans didn't also have support for traps.

http://www.bitsavers.org/pdf/ibm/7040/C28-6806-1_7040ftnMathSubrs.pdf


The IEEE standard specifies that you should be able to control whether a
calculation traps or returns a NAN. That's how Decimal does it, that's
how Apple's (sadly long abandoned) SANE did it, and floats should do the
same thing.

Serious numeric languages like Fortran have supported traps long before
exceptions were invented.

> and defining NaN != NaN
> was a bad trick they chose for testing against NaN for lack of a better
> way.

That's also completely wrong. The correct way to test for a NAN is with
the IEEE-mandated function isnan(). The NAN != NAN trick is exactly that,
a trick, used by programmers when their language or compiler doesn't
support isnan().

Without support for isinf(), identifying an INF is just as hard as
identifying an NAN, and yet their behaviour under equality is the
complete opposite:

>>> inf = float('inf')
>>> inf == inf
True


> If exceptions had commonly existed in that environment there's no chance
> they would have chosen that behavior;

They did exist, and they did choose that behaviour.


> comparison against NaN (or any
> operation with NaN) would have signaled a floating point exception.
> That is the correct way to handle exceptional conditions.
>
> The only reason to keep NaN's current behavior is to adhere to IEEE,

And because the IEEE behaviour is far more useful than the misguided
reliance on exceptions for things which are not exceptional.

Before spreading any more misinformation about IEEE 754 and NANs, please
learn something about it:

http://grouper.ieee.org/groups/754/
http://www.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps

I particularly draw your attention to the FAQ about NANs:

http://grouper.ieee.org/groups/754/faq.html#exceptions


[quote]
The 754 model encourages robust programs. It is intended not only for
numerical analysts but also for spreadsheet users, database systems, or
even coffee pots. The propagation rules for NaNs and infinities allow
inconsequential exceptions to vanish. Similarly, gradual underflow
maintains error properties over a precision's range.

When exceptional situations need attention, they can be examined
immediately via traps or at a convenient time via status flags. Traps can
be used to stop a program, but unrecoverable situations are extremely
rare. Simply stopping a program is not an option for embedded systems or
network agents. More often, traps log diagnostic information or
substitute valid results.

Flags offer both predictable control flow and speed. Their use requires
the programmer be aware of exceptional conditions, but flag stickiness
allows programmers to delay handling exceptional conditions until
necessary.
[end quote]

--
Steven

Steven D'Aprano

May 30, 2011, 00:15:11
On Mon, 30 May 2011 11:14:58 +1000, Chris Angelico wrote:

> So, apart from float("nan"), are there actually any places where real
> production code has to handle NaN? I was unable to get a nan by any of
> the above methods, except for operations involving inf; for instance,
> float("inf")-float("inf") == nan. All the others raised an exception
> rather than return nan.

That's Python's poor design, due to reliance on C floating point
libraries that have half-hearted support for IEEE-754, and the
obstruction of people who don't understand the usefulness of NANs. They
shouldn't raise unless the caller specifies that he wants exceptions. The
default behaviour should be the most useful one, namely quiet
(propagating) NANs, rather than halting the calculation because of
something which may or may not be an error and may or may not be
recoverable.

Even Apple's Hypertalk supported them better in the late 1980s than
Python does now, and that was a language aimed at non-programmers!

The Decimal module is a good example of what floats should do. All flags
are supported, so you can choose whether you want exceptions or NANs. I
don't like Decimal's default settings, but at least they can be changed.
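For example, decimal lets you choose between the trap and the quiet sticky flag per signal (behaviour of CPython's default context):

```python
from decimal import Decimal, InvalidOperation, localcontext

nan = Decimal("NaN")

# Equality with a quiet NaN is quietly False, as with floats.
assert not (nan == nan)
assert nan != nan

# An ordered comparison signals InvalidOperation; the default
# context traps that signal, so this raises.
trapped = False
try:
    nan < Decimal(1)
except InvalidOperation:
    trapped = True
assert trapped

# With the trap disabled, the same comparison returns False and
# merely sets a sticky flag on the context.
with localcontext() as ctx:
    ctx.traps[InvalidOperation] = False
    ctx.clear_flags()
    assert not (nan < Decimal(1))
    assert ctx.flags[InvalidOperation]
```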

--
Steven

Steven D'Aprano

May 30, 2011, 00:22:25
On Mon, 30 May 2011 12:53:59 +1000, Chris Angelico wrote:

> Okay, here's a question. The Python 'float' value - is it meant to be "a
> Python representation of an IEEE double-precision floating point value",

Yes.

> or "a Python representation of a real number"?

No.

Floats are not real numbers. Many fundamental properties of the reals are
violated by floats, and I'm not talking about "weird" things like NANs
and INFs, but ordinary numbers:

>>> a, b = 0.1, 0.7
>>> a + b - b == a
False

> For the most part,
> Python's data types are defined by their abstract concepts - a list
> isn't defined as a linked list of pointers,

Nor is it implemented as a linked list of pointers.


> The sys.float_info struct exposes details of floating point
> representation. In theory, a Python implementation that uses bignum
> floats could quite happily set all those values to extremes and work
> with enormous precision (or could use a REXX-style "numeric digits 100"
> command to change the internal rounding, and automatically update
> sys.float_info). And in that case, there would be no NaN value.

NANs aren't for overflow, that's what INFs are for. Even if you had
infinite precision floats and could get rid of INFs, you would still need
NANs.


> The real question is: Would NaN's removal be beneficial?

No, it would be another step backwards to the bad old days before the
IEEE standard.


--
Steven

John Nagle

May 30, 2011, 00:25:04
On 5/29/2011 9:15 PM, Steven D'Aprano wrote:
> On Mon, 30 May 2011 11:14:58 +1000, Chris Angelico wrote:
>
>> So, apart from float("nan"), are there actually any places where real
>> production code has to handle NaN?

Yes. I used to write dynamic simulation engines. There were
situations that produced floating point overflow, leading to NaN
values. This wasn't an error; it just meant that the timestep
had to be reduced to handle some heavy object near the moment of
first collision.

Note that the difference between two INF values is a NaN.

It's important that ordered comparisons involving NaN and INF
raise exceptions so that you don't lose an invalid value. If
you're running with non-signaling NaNs, the idea is supposed to
be that, at the end of the computation, you check all your results
for INF and NaN values, to make sure you didn't overflow somewhere
during the computation. If, within the computation, there are
branches based on ordered comparisons, and those don't raise an
exception when the comparison result is unordered, you can reach
the end of the computation with valid-looking but wrong values.
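The end-of-run check Nagle describes is straightforward with math.isnan/math.isinf (available since Python 2.6); the "results" below are invented toy data purely for illustration:

```python
import math

def audit(results):
    """Indices of results that overflowed (inf) or went indeterminate (NaN)."""
    return [i for i, x in enumerate(results)
            if math.isinf(x) or math.isnan(x)]

# Toy stand-ins for the end of a simulation run: index 1 is a NaN
# (inf - inf), index 3 overflowed to inf (1e308 * 10).
results = [1.5, float("inf") - float("inf"), 2.75, 1e308 * 10]
bad = audit(results)
print(bad)
```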

John Nagle

Chris Torek

Unread,
May 30, 2011, 00:29:19
To
In article <4de31635$0$29990$c3e8da3$5496...@news.astraweb.com>,

Steven D'Aprano <steve+comp....@pearwood.info> wrote:
>That's also completely wrong. The correct way to test for a NAN is with
>the IEEE-mandated function isnan(). The NAN != NAN trick is exactly that,
>a trick, used by programmers when their language or compiler doesn't
>support isnan().

Perhaps it would be reasonable to be able to do:

x.isnan()

when x is a float.

>Without support for isinf(), identifying an INF is just as hard as
>identifying an NAN, and yet their behaviour under equality is the
>complete opposite:
>
>>>> inf = float('inf')
>>>> inf == inf
>True

Fortunately:

def isnan(x):
    return x != x

_inf = float("inf")
def isinf(x):
    return x == _inf
del _inf

both do the trick here.

I would like to have both modes (non-exception-ing and exception-ing)
of IEEE-style float available in Python, and am not too picky about
how they would be implemented or which one would be the default.
Python could also paper over the brokenness of various actual
implementations (where signalling vs quiet NaNs, and so on, do not
quite work right in all cases), with some performance penalty on
non-conformant hardware.
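The two modes Torek asks for already exist side by side in the decimal module, where each condition can either trap (raise) or quietly return a special value, per context. A sketch (InvalidOperation is trapped by default, so the last line restores the default):

```python
from decimal import Decimal, getcontext, InvalidOperation

ctx = getcontext()

# Quiet mode: an invalid operation returns a quiet NaN instead of raising.
ctx.traps[InvalidOperation] = False
quiet = Decimal(0) / Decimal(0)
print(quiet.is_nan())               # 0/0 came back as NaN

# Signalling mode: the very same operation now raises an exception.
ctx.traps[InvalidOperation] = True
try:
    Decimal(0) / Decimal(0)
    raised = False
except InvalidOperation:
    raised = True
print(raised)
```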

Raymond Hettinger

Unread,
May 30, 2011, 00:49:14
To
On May 28, 4:41 pm, MRAB <pyt...@mrabarnett.plus.com> wrote:
> Here's a curiosity. float("nan") can occur multiple times in a set or as
> a key in a dict:

Which is by design.

NaNs intentionally have multiple possible instances (some
implementations even include distinct payload values).

Sets and dicts intentionally recognize an instance as being equal to
itself (identity-implies-equality); otherwise, you could put a NaN in
a set/dict but not be able to retrieve it. Basic invariants would
fail -- such as: assert all(elem in container for elem in container).
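Both halves of that design are easy to see in a few lines (CPython; the identity shortcut in containers is what makes the first half work):

```python
nan = float("nan")
s = {nan}

# Identity-implies-equality: the same object is found again, so the
# container invariant Raymond mentions holds...
print(nan in s)
print(all(elem in s for elem in s))

# ...but a freshly made NaN is a different object, and nan != nan:
print(float("nan") in s)
```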

The interesting thing is that people experimenting with exotic objects
(things with random hash functions, things with unusual equality or
ordering relations, etc) are "surprised" when those objects display
their exotic behaviors.

To me, the "NaN curiosities" are among the least interesting. It's
more fun to break sort algorithms with sets (which override the
ordering relations with subset/superset relations) or with an object
that mutates a list during the sort. Now, that is curious :-)
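The sets-break-sort trick is easy to reproduce: < on sets means "proper subset", which is only a partial order, while sorted() assumes a total one. A sketch (the unchanged result is CPython behavior; the final check is implementation-independent):

```python
data = [{1, 2}, {3}, {1}]
result = sorted(data)   # sorted() compares with <, i.e. proper subset

# On CPython no adjacent pair here compares "less", so the input looks
# already ascending and comes back unchanged -- yet {1} is a proper
# subset of {1, 2} and lands *after* it.
out_of_order = any(result[j] < result[i]
                   for i in range(len(result))
                   for j in range(i + 1, len(result)))
print(out_of_order)
```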

Also, Dr Mertz wrote a Charming Python article full of these
curiosities:
http://gnosis.cx/publish/programming/charming_python_b25.txt

IMO, equality and ordering are somewhat fundamental concepts. If a
class is written that twists those concepts around a bit, then it
should be no surprise if curious behavior emerges. Heck, I would
venture to guess that something as simple as assuming the speed of
light is constant might yield twin paradoxes and other
curiosities ;-)

Raymond

Chris Torek

Unread,
May 30, 2011, 01:53:33
To
In article <irv6e...@news1.newsguy.com> I wrote, in part:

> _inf = float("inf")
> def isinf(x):
> return x == _inf
> del _inf

Oops, take out the del, or otherwise fix the obvious problem,
e.g., perhaps:

def isinf(x):
    return x == isinf._inf
isinf._inf = float("inf")

(Of course, if something like this were adopted properly, it would
all be in the base "float" type anyway.)

Steven D'Aprano

Unread,
May 30, 2011, 02:13:31
To
On Mon, 30 May 2011 04:29:19 +0000, Chris Torek wrote:

> In article <4de31635$0$29990$c3e8da3$5496...@news.astraweb.com>, Steven
> D'Aprano <steve+comp....@pearwood.info> wrote:
>>That's also completely wrong. The correct way to test for a NAN is with
>>the IEEE-mandated function isnan(). The NAN != NAN trick is exactly
>>that, a trick, used by programmers when their language or compiler
>>doesn't support isnan().
>
> Perhaps it would be reasonable to be able to do:
>
> x.isnan()
>
> when x is a float.

Better than a float method is a function which takes any number as
argument:


>>> import math, fractions, decimal
>>> math.isnan(fractions.Fraction(2, 3))
False
>>> math.isnan(decimal.Decimal('nan'))
True


You can even handle complex NANs with the cmath module:

>>> import cmath
>>> cmath.isnan(complex(1, float('nan')))
True


--
Steven

Steven D'Aprano

Unread,
May 30, 2011, 02:14:53
To
On Mon, 30 May 2011 04:15:11 +0000, Steven D'Aprano wrote:

> On Mon, 30 May 2011 11:14:58 +1000, Chris Angelico wrote:
>
>> So, apart from float("nan"), are there actually any places where real
>> production code has to handle NaN? I was unable to get a nan by any of
>> the above methods, except for operations involving inf; for instance,
>> float("inf")-float("inf") == nan. All the others raised an exception
>> rather than return nan.
>
> That's Python's poor design, due to reliance on C floating point
> libraries that have half-hearted support for IEEE-754, and the
> obstruction of people who don't understand the usefulness of NANs.

That last comment of mine is a bit harsh, and I'd like to withdraw it as
unnecessarily confrontational.

--
Steven

Chris Torek

Unread,
May 30, 2011, 15:58:35
To
In article <4de3358b$0$29990$c3e8da3$5496...@news.astraweb.com>

Steven D'Aprano <steve+comp....@pearwood.info> wrote:
>Better than a float method is a function which takes any number as
>argument:
>
>>>> import math, fractions, decimal
>>>> math.isnan(fractions.Fraction(2, 3))
>False
>>>> math.isnan(decimal.Decimal('nan'))
>True

Ah, apparently someone's been using Larry Wall's time machine. :-)

I should have looked at documentation. In my case, though:

$ python
Python 2.5.1 (r251:54863, Dec 16 2010, 14:12:43)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import math
>>> math.isnan
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'isnan'

>You can even handle complex NANs with the cmath module:
>
>>>> import cmath
>>>> cmath.isnan(complex(1, float('nan')))
>True

Would it be appropriate to have isnan() methods for Fraction,
Decimal, and complex, so that you do not need to worry about whether
to use math.isnan() vs cmath.isnan()? (I almost never work with
complex numbers so am not sure if the "or" behavior -- cmath.isinf
and cmath.isnan return true if either the real or the imaginary part is
infinity or NaN, respectively -- is appropriate in algorithms that
might be working on any of these types of numbers.)

It might also be appropriate to have trivial always-False isinf and
isnan methods for integers.

Steven D'Aprano

Unread,
May 30, 2011, 19:22:01
To
On Mon, 30 May 2011 19:58:35 +0000, Chris Torek wrote:

> In article <4de3358b$0$29990$c3e8da3$5496...@news.astraweb.com> Steven
> D'Aprano <steve+comp....@pearwood.info> wrote:
>>Better than a float method is a function which takes any number as
>>argument:
>>
>>>>> import math, fractions, decimal
>>>>> math.isnan(fractions.Fraction(2, 3))
>>False
>>>>> math.isnan(decimal.Decimal('nan'))
>>True
>
> Ah, apparently someone's been using Larry Wall's time machine. :-)

He has one too? Just like Guido van Rossum.


> I should have looked at documentation. In my case, though:
>
> $ python
> Python 2.5.1 (r251:54863, Dec 16 2010, 14:12:43) [GCC 4.0.1 (Apple
> Inc. build 5465)] on darwin Type "help", "copyright", "credits" or
> "license" for more information.


Python 2.5 is two major releases behind! I feel your pain, though,
because 2.5 is the system python on my desktop as well. (And 2.4 is the
system python on my server, ouch!)

You should consider installing 2.7 and/or 3.2 in parallel with the system
python.


> Would it be appropriate to have isnan() methods for Fraction, Decimal,
> and complex, so that you do not need to worry about whether to use
> math.isnan() vs cmath.isnan()?

Probably not. From a purely object-oriented, Java-esque viewpoint, yes,
all number types should support isnan and isinf methods, but Python uses
a more mixed style, and a function that accepts multiple types is
appropriate.

Unless you're using complex numbers, you don't need to care about complex
numbers. *wink* Hence for "normal" numeric use, stick to the math module.
If you do need complex numbers, cmath.isnan works perfectly fine with non-
complex arguments:

>>> cmath.isnan(42)
False
>>> cmath.isnan(float('nan'))
True


--
Steven

Carl Banks

Unread,
May 31, 2011, 22:45:01
To
On Sunday, May 29, 2011 8:59:49 PM UTC-7, Steven D'Aprano wrote:
> On Sun, 29 May 2011 17:55:22 -0700, Carl Banks wrote:
>
> > Floating point arithmetic evolved more or less on languages like Fortran
> > where things like exceptions were unheard of,
>
> I'm afraid that you are completely mistaken.
>
> Fortran IV had support for floating point traps, which are "things like
> exceptions". That's as far back as 1966. I'd be shocked if earlier
> Fortrans didn't also have support for traps.
>
> http://www.bitsavers.org/pdf/ibm/7040/C28-6806-1_7040ftnMathSubrs.pdf

Fine, it wasn't "unheard of". I'm pretty sure the existence of a few high end compiler/hardware combinations that supported traps doesn't invalidate my basic point. NaN was needed because few systems had a separate path to deal with exceptional situations like producing or operating on something that isn't a number. When they did exist few programmers used them. If floating-point were standardized today it might not even have NaN (and definitely wouldn't support the ridiculous NaN != NaN), because all modern systems can be expected to support exceptions, and modern programmers can be expected to use them.


> The IEEE standard specifies that you should be able to control whether a
> calculation traps or returns a NAN. That's how Decimal does it, that's
> how Apple's (sadly long abandoned) SANE did it, and floats should do the
> same thing.

If your aim is to support every last clause of IEEE for better or worse, then yes that's what Python should do. If your aim is to make Python the best language it can be, then Python should reject IEEE's obsolete notions, and throw exceptions when operating on NaN.


Carl Banks

Carl Banks

Unread,
May 31, 2011, 22:59:15
To
On Sunday, May 29, 2011 7:53:59 PM UTC-7, Chris Angelico wrote:
> Okay, here's a question. The Python 'float' value - is it meant to be
> "a Python representation of an IEEE double-precision floating point
> value", or "a Python representation of a real number"?

The former. Unlike the case with integers, there is no way that I know of to represent an abstract real number on a digital computer.

Python also includes several IEEE-defined operations in its library (math.isnan, math.frexp).


Carl Banks

Chris Angelico

Unread,
May 31, 2011, 23:05:43
To pytho...@python.org
On Wed, Jun 1, 2011 at 12:59 PM, Carl Banks <pavlove...@gmail.com> wrote:
> On Sunday, May 29, 2011 7:53:59 PM UTC-7, Chris Angelico wrote:
>> Okay, here's a question. The Python 'float' value - is it meant to be
>> "a Python representation of an IEEE double-precision floating point
>> value", or "a Python representation of a real number"?
>
> The former.  Unlike the case with integers, there is no way that I know of to represent an abstract real number on a digital computer.

This seems peculiar. Normally Python seeks to define its data types in
the abstract and then leave the concrete up to the various
implementations - note, for instance, how Python 3 has dispensed with
'int' vs 'long' and just made a single 'int' type that can hold any
integer. Does this mean that an implementation of Python on hardware
that has some other type of floating point must simulate IEEE
double-precision in all its nuances?

I'm glad I don't often need floating point numbers. They can be so annoying!

Chris Angelico

Carl Banks

Unread,
May 31, 2011, 23:30:30
To pytho...@python.org
On Tuesday, May 31, 2011 8:05:43 PM UTC-7, Chris Angelico wrote:
> On Wed, Jun 1, 2011 at 12:59 PM, Carl Banks
> wrote:
> > On Sunday, May 29, 2011 7:53:59 PM UTC-7, Chris Angelico wrote:
> >> Okay, here's a question. The Python 'float' value - is it meant to be
> >> "a Python representation of an IEEE double-precision floating point
> >> value", or "a Python representation of a real number"?
> >
> > The former.  Unlike the case with integers, there is no way that I know of to represent an abstract real number on a digital computer.
>
> This seems peculiar. Normally Python seeks to define its data types in
> the abstract and then leave the concrete up to the various
> implementations - note, for instance, how Python 3 has dispensed with
> 'int' vs 'long' and just made a single 'int' type that can hold any
> integer. Does this mean that an implementation of Python on hardware
> that has some other type of floating point must simulate IEEE
> double-precision in all its nuances?

I think you misunderstood what I was saying.

It's not *possible* to represent a real number abstractly in any digital computer. Python couldn't have an "abstract real number" type even if it wanted to.

(Math aside: Real numbers are not countable, meaning they cannot be put into one-to-one correspondence with integers. A digital computer can only represent countable things exactly, for obvious reasons; therefore, to model non-countable things like real numbers, one must use a countable approximation like floating-point.)
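The aside can even be made concrete: IEEE doubles are not merely countable but finite, and the standard struct module lets you count them. For positive finite doubles, consecutive values have consecutive 64-bit patterns, so the number of doubles in [1.0, 2.0) is just a subtraction of bit patterns:

```python
import struct

def bits(x):
    """The raw 64-bit pattern of an IEEE double, as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

# Exactly 2**52 doubles stand in for the uncountably many reals in [1, 2).
print(bits(2.0) - bits(1.0) == 2 ** 52)
```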

You might be able to get away with saying float() merely represents an "abstract floating-point number with provisions for nan and inf", but pretty much everyone uses IEEE format, so what's the point? And no it doesn't mean Python has to support every nuance (and it doesn't).


Carl Banks

Message deleted

rusi

Unread,
May 31, 2011, 23:33:19
To

Why can Python not have an fpu object (class?) where one can go and
turn on/off the button that makes NaN signalling?

In short, can't we have the cake and eat it too?

Roy Smith

Unread,
May 31, 2011, 23:43:09
To
In article
Carl Banks <pavlove...@gmail.com> wrote:

> pretty much everyone uses IEEE format

Is there *any* hardware in use today which supports floating point using
a format other than IEEE?

Chris Angelico

Unread,
May 31, 2011, 23:57:57
To pytho...@python.org
On Wed, Jun 1, 2011 at 1:30 PM, Carl Banks <pavlove...@gmail.com> wrote:
> I think you misunderstood what I was saying.
>
> It's not *possible* to represent a real number abstractly in any digital computer.  Python couldn't have an "abstract real number" type even it wanted to.

True, but why should the "non-integer number" type be floating point
rather than (say) rational? Actually, IEEE floating point could mostly
be implemented in a two-int rationals system (where the 'int' is
arbitrary precision, so it'd be Python 2's 'long' rather than its
'int'); in a sense, the mantissa is the numerator, and the scale
defines the denominator (which will always be a power of 2). Yes,
there are very good reasons for going with the current system. But are
those reasons part of the details of implementation, or are they part
of the definition of the data type?
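The mantissa/power-of-two view described here is in fact exposed directly by the type: float.as_integer_ratio() (available since 2.6) returns the exact rational a float stores, and its denominator is always a power of two:

```python
# The exact rational that the float 0.1 actually stores:
num, den = (0.1).as_integer_ratio()

print(den & (den - 1) == 0)    # the denominator is a power of two
print(num / den == 0.1)        # numerator/denominator reproduces the float
```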

> (Math aside: Real numbers are not countable, meaning they cannot be put into one-to-one correspondence with integers.  A digital computer can only represent countable things exactly, for obvious reasons; therefore, to model non-countable things like real numbers, one must use a countable approximation like floating-point.)

Right. Obviously a true 'real number' representation can't be done.
But there are multiple plausible approximations thereof (the best
being rationals).

Not asking for Python to be changed, just wondering why it's defined
by what looks like an implementation detail. It's like defining that a
'character' is an 8-bit number using the ASCII system, which then
becomes problematic with Unicode. (Ohai, C, didn't notice you standing
there.)

Chris Angelico

Ben Finney

Unread,
June 1, 2011, 01:18:40
To
Chris Angelico <ros...@gmail.com> writes:

> Right. Obviously a true 'real number' representation can't be done.
> But there are multiple plausible approximations thereof (the best
> being rationals).

Sure. But most of those are not what is most commonly meant by ‘float’
type.

> Not asking for Python to be changed, just wondering why it's defined
> by what looks like an implementation detail.

Because, in the case of the ‘float’ type, the agreed-upon meaning of
that type – in Python as in just about every other language that is
well-specified – is “an IEEE float as per the IEEE 754 spec”.

A foolish consistency to the spec would be a hobgoblin for little minds.
But, given that a ‘float’ type which deviated from that spec would just
be inviting all sorts of other confusion, it's not a foolish
consistency.

> It's like defining that a 'character' is an 8-bit number using the
> ASCII system, which then becomes problematic with Unicode.

Right. That's why in Python 3 the Unicode text type is simply ‘str’,
the IEEE float type is called ‘float’, and the byte string type is
called ‘bytes’.

It's also why the ‘str’ type in Python 2 was painful enough to need
changing: it didn't clearly stick to a specification, but tried to
straddle the worlds between one specification (a text type) and an
incompatible other specification (a bytes sequence type).

Where there is a clearly-defined widely-agreed specification for a type,
it's a good idea to stick to that specification when claiming to
implement that functionality in a type.

--
\ “The man who is denied the opportunity of taking decisions of |
`\ importance begins to regard as important the decisions he is |
_o__) allowed to take.” —C. Northcote Parkinson |
Ben Finney

Carl Banks

Unread,
June 1, 2011, 02:09:36
To comp.lan...@googlegroups.com, pytho...@python.org
On Tuesday, May 31, 2011 8:57:57 PM UTC-7, Chris Angelico wrote:
> On Wed, Jun 1, 2011 at 1:30 PM, Carl Banks
> wrote:
> > I think you misunderstood what I was saying.
> >
> > It's not *possible* to represent a real number abstractly in any digital computer.  Python couldn't have an "abstract real number" type even it wanted to.
>
> True, but why should the "non-integer number" type be floating point
> rather than (say) rational?

Python has several non-integer number types in the standard library. The one we are talking about is called float. If the type we were talking about had instead been called real, then your question might make some sense. But the fact that it's called float really does imply that that underlying representation is floating point.


> Actually, IEEE floating point could mostly
> be implemented in a two-int rationals system (where the 'int' is
> arbitrary precision, so it'd be Python 2's 'long' rather than its
> 'int'); in a sense, the mantissa is the numerator, and the scale
> defines the denominator (which will always be a power of 2). Yes,
> there are very good reasons for going with the current system. But are
> those reasons part of the details of implementation, or are they part
> of the definition of the data type?

Once again, Python float is an IEEE double-precision floating point number. This is part of the language; it is not an implementation detail. As I mentioned elsewhere, the Python library establishes this as part of the language because it includes several functions that operate on IEEE numbers.

And, by the way, the types you're comparing it to aren't as abstract as you say they are. Python's int type is required to behave like a two's-complement binary representation and support bitwise operations.
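That int claim is easy to check at the prompt: for bitwise purposes, Python ints behave like infinitely sign-extended two's-complement bit strings:

```python
# -1 behaves like an unbounded string of 1-bits:
print(-1 & 0xFF)     # 255: the low eight bits of ...1111 are all set
print(-1 >> 10)      # -1: right-shifting keeps shifting in sign bits
print(~0)            # -1: complement of ...0000 is ...1111
```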


> > (Math aside: Real numbers are not countable, meaning they
> > cannot be put into one-to-one correspondence with integers.
> >  A digital computer can only represent countable things
> > exactly, for obvious reasons; therefore, to model
> > non-countable things like real numbers, one must use a
> > countable approximation like floating-point.)
>
> Right. Obviously a true 'real number' representation can't be done.
> But there are multiple plausible approximations thereof (the best
> being rationals).

That's a different question. I don't care to discuss it, except to say that your default real-number type would have to be called something other than float, if it were not a floating point.


> Not asking for Python to be changed, just wondering why it's defined
> by what looks like an implementation detail. It's like defining that a
> 'character' is an 8-bit number using the ASCII system, which then
> becomes problematic with Unicode.

It really isn't. Unlike with characters (which are trivially extensible to larger character sets, just add more bytes), different real number approximations differ in details too important to be left to the implementation.

For instance, say you are using an implementation that uses floating point, and you define a function that uses Newton's method to find a square root:

def square_root(N,x=None):
    if x is None:
        x = N/2
    for i in range(100):
        x = (x + N/x)/2
    return x

It works pretty well on your floating-point implementation. Now try running it on an implementation that uses fractions by default....

(Seriously, try running this function with N as a Fraction.)

So I'm going to opine that the representation does not seem like an implementation detail.
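A quick way to see why the Fraction run never finishes, without waiting for it: cap the iteration count and round the iterate back down each step. This bounded variant (the iteration limit and limit_denominator cutoff are illustrative choices, not part of the original function) stays fast and still converges:

```python
from fractions import Fraction

def bounded_sqrt(N, iterations=6):
    """Newton's method, rounding the iterate back down each step."""
    x = N / 2
    for _ in range(iterations):
        x = (x + N / x) / 2
        # Without this cap, the denominator's digit count roughly
        # doubles every iteration and the arithmetic grinds to a halt.
        x = x.limit_denominator(10 ** 30)
    return x

r = bounded_sqrt(Fraction(5, 2))
err = abs(r * r - Fraction(5, 2))
print(err < Fraction(1, 10 ** 20))   # still an excellent approximation
```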


Carl Banks

Message deleted

Jerry Hill

Unread,
June 1, 2011, 09:44:36
To pytho...@python.org
> On Wed, Jun 1, 2011 at 1:30 PM, Carl Banks <pavlove...@gmail.com> wrote:
> True, but why should the "non-integer number" type be floating point
> rather than (say) rational?

You seem to be implying that python only provides a single non-integer
numeric type. That's not true. Python ships with a bunch of
different numeric types, including a rational type. Off the top of my
head, we have:

IEEE floating point numbers
(http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex)
Rationals (http://docs.python.org/library/fractions.html)
Base-10 fixed and floating point numbers
(http://docs.python.org/library/decimal.html)
Complex numbers
(http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex
plus http://docs.python.org/library/cmath.html)
Integers (both ints and longs, which are pretty well unified by now)

Floats have far and away the best performance in most common
situations, so they end up being the default, but if you want to use
something different, it's usually not hard to do.

--
Jerry

Grant Edwards

Unread,
June 1, 2011, 10:03:14
To
On 2011-06-01, Chris Angelico <ros...@gmail.com> wrote:
> On Wed, Jun 1, 2011 at 12:59 PM, Carl Banks <pavlove...@gmail.com> wrote:
>> On Sunday, May 29, 2011 7:53:59 PM UTC-7, Chris Angelico wrote:
>>> Okay, here's a question. The Python 'float' value - is it meant to be
>>> "a Python representation of an IEEE double-precision floating point
>>> value", or "a Python representation of a real number"?
>>
>> The former. Unlike the case with integers, there is no way that I know of to represent an abstract real number on a digital computer.

>
> This seems peculiar. Normally Python seeks to define its data types
> in the abstract and then leave the concrete up to the various
> implementations - note,

But, "real numbers" and "IEEE float" are so different that I don't
think that it would be a wise decision for people to pretend they're
working with real numbers when in fact they are working with IEEE
floats.

> for instance, how Python 3 has dispensed with 'int' vs 'long' and
> just made a single 'int' type that can hold any integer.

Those concepts are much closer than "real numbers" and "IEEE floats".

> Does this mean that an implementation of Python on hardware that has
> some other type of floating point must simulate IEEE double-precision
> in all its nuances?

I certainly hope so. I depend on things like propagation of
non-signalling NaNs, the behavior of infinities, etc.

> I'm glad I don't often need floating point numbers. They can be so
> annoying!

They can be -- especially if one pretends one is working with real
numbers instead of fixed-length binary floating point numbers. Like
any tool, floating point has to be used properly. Screwdrivers make
very annoying hammers.

--
Grant Edwards grant.b.edwards Yow! How's it going in
at those MODULAR LOVE UNITS??
gmail.com

Grant Edwards

Unread,
June 1, 2011, 10:04:48
To

Well, there are probably still some VAXes around in odd corners...

--
Grant Edwards grant.b.edwards Yow! Thank god!! ... It's
at HENNY YOUNGMAN!!
gmail.com

Chris Angelico

Unread,
June 1, 2011, 12:12:33
To pytho...@python.org
On Wed, Jun 1, 2011 at 11:44 PM, Jerry Hill <malac...@gmail.com> wrote:
>> On Wed, Jun 1, 2011 at 1:30 PM, Carl Banks <pavlove...@gmail.com> wrote:
>> True, but why should the "non-integer number" type be floating point
>> rather than (say) rational?

Careful with the attributions, Carl was quoting me when he posted that :)

> You seem to be implying that python only provides a single non-integer
> numeric type.  That's not true.  Python ships with a bunch of
> different numeric types, including a rational type.  Off the top of my
> head, we have:
>
> IEEE floating point numbers
> (http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex)
> Rationals (http://docs.python.org/library/fractions.html)
> Base-10 fixed and floating point numbers
> (http://docs.python.org/library/decimal.html)
> Complex numbers
> (http://docs.python.org/library/stdtypes.html#numeric-types-int-float-long-complex
> plus http://docs.python.org/library/cmath.html)
> Integers (both ints and longs, which are pretty well unified by now)

I know Python does support all of the above. Leave off int/long and
complex, which are obviously not trying to store real numbers
(although I guess you could conceivably make 'complex' the vehicle for
reals too), there's three: float, fraction, decimal. Of them, one is a
built-in type and the other two are imported modules. Hence my
question about why this one and not that one should be the "default"
that people will naturally turn to as soon as they need non-integers.
(Or, phrasing it another way: Only one of them is the type that "3.2"
in your source code will be represented as.)

> Floats have far and away the best performance in most common
> situations, so they end up being the default, but if you want to use
> something different, it's usually not hard to do.

And that, right there, is the answer.

ChrisA

OKB (not okblacke)

Unread,
June 1, 2011, 13:17:54
To
Carl Banks wrote:

> On Tuesday, May 31, 2011 8:57:57 PM UTC-7, Chris Angelico wrote:
>> On Wed, Jun 1, 2011 at 1:30 PM, Carl Banks wrote:
>> > I think you misunderstood what I was saying.
>> >
>> > It's not *possible* to represent a real number abstractly in any
>> > digital computer. Python couldn't have an "abstract real number"
>> > type even if it wanted to.
>>

>> True, but why should the "non-integer number" type be floating
>> point rather than (say) rational?
>

> Python has several non-integer number types in the standard
> library. The one we are talking about is called float. If the
> type we were talking about had instead been called real, then your
> question might make some sense. But the fact that it's called
> float really does imply that that underlying representation is
> floating point.

That's true, but that's sort of putting the cart before the horse.
In response to that, one can just ask: why is this type called "float"?
Why is it that when I type 1.37 or sqrt(2) in my program, the resulting
object is a "float" rather than some other numeric type? I'm aware
that there are answers to this having to do with standardization and
efficiency. But I do sometimes wish that the "default" type for non-
integers (as created through Python expressions) was something more like
"rationals with a denominator no bigger than N".

--
--OKB (not okblacke)
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is
no path, and leave a trail."
--author unknown

Ethan Furman

Unread,
June 1, 2011, 14:10:33
To pytho...@python.org
Carl Banks wrote:
> For instance, say you are using an implementation that uses
> floating point, and you define a function that uses Newton's
> method to find a square root:
>
> def square_root(N,x=None):
>     if x is None:
>         x = N/2
>     for i in range(100):
>         x = (x + N/x)/2
>     return x
>
> It works pretty well on your floating-point implementation.
> Now try running it on an implementation that uses fractions
> by default....
>
> (Seriously, try running this function with N as a Fraction.)

Okay, will this thing ever stop? It's been running for 90 minutes now.
Is it just incredibly slow?

Any enlightenment appreciated!

~Ethan~

Chris Torek

Unread,
June 1, 2011, 14:29:21
To
>Carl Banks wrote:
>> For instance, say you are using an implementation that uses
> > floating point, and you define a function that uses Newton's
> > method to find a square root:
>>
>> def square_root(N,x=None):
>>     if x is None:
>>         x = N/2
>>     for i in range(100):
>>         x = (x + N/x)/2
>>     return x
>>
>> It works pretty well on your floating-point implementation.
> > Now try running it on an implementation that uses fractions
> > by default....
>>
>> (Seriously, try running this function with N as a Fraction.)

In article <mailman.2376.1306950...@python.org>


Ethan Furman <et...@stoneleaf.us> wrote:
>Okay, will this thing ever stop? It's been running for 90 minutes now.
> Is it just incredibly slow?

The numerator and denominator get very big, very fast.

Try adding a bit of tracing:

for i in range(100):
    x = (x + N/x) / 2
    print 'refinement %d: %s' % (i + 1, x)

and lo:

>>> square_root(fractions.Fraction(5,2))
refinement 1: 13/8
refinement 2: 329/208
refinement 3: 216401/136864
refinement 4: 93658779041/59235012928
refinement 5: 17543933782901678712641/11095757974628660884096
refinement 6: 615579225157677613558476890352854841917537921/389326486355976942712506162834130868382115072
refinement 7: 757875564891453502666431245010274191070178420221753088072252795554063820074969259096915201/479322593608746863553102599134385944371903608931825380820104910630730251583028097491290624
refinement 8: 1148750743719079498041767029550032831122597958315559446437317334336105389279028846671983328007126798344663678217310478873245910031311232679502892062001786881913873645733507260643841/726533762792931259056428876869998002853417255598937481942581984634876784602422528475337271599486688624425675701640856472886826490140251395415648899156864835350466583887285148750848

In the worst case, the number of digits in numerator and denominator
could double on each pass, so if you start with 1 digit in each,
you end with 2**100 in each. (You will run out of memory first
unless you have a machine with more than 64 bits of address space. :-) )
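The doubling is easy to measure directly: track the digit counts instead of printing the fractions themselves (same starting point as the trace above, N = Fraction(5, 2)):

```python
from fractions import Fraction

N = Fraction(5, 2)
x = N / 2
digits = []
for _ in range(6):
    x = (x + N / x) / 2
    # Record how many decimal digits the denominator now has.
    digits.append(len(str(x.denominator)))
print(digits)   # each entry roughly double the one before
```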

Carl Banks

Unread,
June 1, 2011, 16:25:56
To
On Wednesday, June 1, 2011 10:17:54 AM UTC-7, OKB (not okblacke) wrote:
> Carl Banks wrote:
>
> > On Tuesday, May 31, 2011 8:57:57 PM UTC-7, Chris Angelico wrote:
> >> On Wed, Jun 1, 2011 at 1:30 PM, Carl Banks wrote:
> > Python has several non-integer number types in the standard
> > library. The one we are talking about is called float. If the
> > type we were talking about had instead been called real, then your
> > question might make some sense. But the fact that it's called
> > float really does imply that that underlying representation is
> > floating point.
>
> That's true, but that's sort of putting the cart before the horse.

Not really. The (original) question Chris Angelico was asking was, "Is it an implementation detail that Python's non-integer type is represented as an IEEE floating-point?" Which the above is the appropriate answer to.

> In response to that, one can just ask: why is this type called "float"?

Which is a different question; not the question I was answering, and not one I care to discuss.

Carl Banks

Nobody

Unread,
Jun 1, 2011, 16:41:06
To
On Sun, 29 May 2011 23:31:19 +0000, Steven D'Aprano wrote:

>> That's overstating it. There's a good argument to be made for raising an
>> exception.
>
> If so, I've never heard it, and I cannot imagine what such a good
> argument would be. Please give it.

Exceptions allow you to write more natural code by ignoring the awkward
cases. E.g. writing "x * y + z" rather than first determining whether
"x * y" is even defined then using a conditional.

>> Bear in mind that an exception is not necessarily an error,
>> just an "exceptional" condition.
>
> True, but what's your point? Testing two floats for equality is not an
> exceptional condition.

NaN itself is an exceptional condition which arises when a result is
undefined or not representable. When an operation normally returns a
number but a specific case cannot do so, it returns not-a-number.

The usual semantics for NaNs are practically identical to those for
exceptions. If any intermediate result in a floating-point expression is
NaN, the overall result is NaN. Similarly, if any intermediate calculation
throws an exception, the calculation as a whole throws an exception.

If x is NaN, then "x + y" is NaN, "x * y" is NaN, pretty much anything
involving x is NaN. By this reasoning both "x == y" and "x != y" should
also be NaN. But only the floating-point types have a NaN value, while
bool doesn't. However, all types have exceptions.
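The asymmetry described here is easy to see at the prompt: arithmetic on a NaN yields NaN, but the comparison operators collapse to a plain bool:

```python
import math

nan = float("nan")
print(nan + 1.0)    # nan: arithmetic propagates the sentinel
print(nan * 0.0)    # nan: even multiplication by zero
print(nan == nan)   # False: equality returns a bool, not a NaN
print(nan != nan)   # True
assert math.isnan(nan * 0.0)
```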

>>> The correct answer to "nan == nan" is False, they are not equal.
>>
>> There is no correct answer to "nan == nan".
>
> Why on earth not?

Why should there be a correct answer? What does NaN actually mean?

Apart from anything else, defining "NaN == NaN" as False means that
"x == x" is False if x is NaN, which violates one of the fundamental
axioms of an equivalence relation (and, in every other regard, "==" is
normally intended to be an equivalence relation).

The creation of NaN was a pragmatic decision on how to handle exceptional
conditions in hardware. It is not holy writ, and there's no fundamental
reason why a high-level language should export the hardware's behaviour
verbatim.

>> Arguably, "nan != nan" should also be false,
>> but that would violate the invariant "(x != y) == !(x == y)".
>
> I cannot imagine what that argument would be. Please explain.

A result of NaN means that the result of the calculation is undefined, so
the value is "unknown". If x is unknown and y is unknown, then whether x
is equal to y is itself unknown, and whether x differs from y is also
unknown.

Carl Banks

Unread,
Jun 1, 2011, 16:41:15
To

Fraction needs to find the LCD of the denominators when adding; but LCD calculation becomes very expensive as the denominators get large (which they will since you're dividing by an intermediate result in a loop). I suspect the time needed grows exponentially (at least) with the value of the denominators.

The LCD calculation should slow the calculation down to an astronomical crawl well before you encounter memory issues.

This is why representation simply cannot be left as an implementation detail; rationals and floating-points behave too differently.


Carl Banks

Grant Edwards

Unread,
Jun 1, 2011, 17:01:23
To
On 2011-05-29, Nobody <nob...@nowhere.com> wrote:
> On Sun, 29 May 2011 10:29:28 +0000, Steven D'Aprano wrote:
>
>>> The correct answer to "nan == nan" is to raise an exception, because
>>> you have asked a question for which the answer is nether True nor
>>> False.
>>
>> Wrong.

>
> That's overstating it. There's a good argument to be made for raising
> an exception. Bear in mind that an exception is not necessarily an

> error, just an "exceptional" condition.
>
>> The correct answer to "nan == nan" is False, they are not equal.
>
> There is no correct answer to "nan == nan".

For those of us who have to deal with the real world (that means
complying with IEEE-754), there _is_ a correct answer. IIRC, the IEEE
standard requires nan == nan is false, and nan != nan is true.

That said, I don't remember what the other comparisons are supposed to
do...
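For the record, a quick check (CPython follows IEEE-754 here) shows that every ordered comparison involving a NaN is false; only != is true:

```python
nan = float("nan")
for label, result in [("==", nan == nan), ("!=", nan != nan),
                      ("<",  nan < 1.0),  ("<=", nan <= 1.0),
                      (">",  nan > 1.0),  (">=", nan >= 1.0)]:
    print(label, result)
# only the != line prints True
```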

> Defining it to be false is just the "least wrong" answer.


>
> Arguably, "nan != nan" should also be false, but that would violate
> the invariant "(x != y) == !(x == y)".

And it would violate the IEEE standard. IEEE-754 has its warts, but
we're far better off than we were with dozens of incompatible,
undocumented, vendor-specific schemes (most of which had more warts
than IEEE-754).

--
Grant Edwards grant.b.edwards Yow! I'm dressing up in
at an ill-fitting IVY-LEAGUE
gmail.com SUIT!! Too late...

Steven D'Aprano

Unread,
Jun 1, 2011, 20:53:26
To
On Tue, 31 May 2011 19:45:01 -0700, Carl Banks wrote:

> On Sunday, May 29, 2011 8:59:49 PM UTC-7, Steven D'Aprano wrote:
>> On Sun, 29 May 2011 17:55:22 -0700, Carl Banks wrote:
>>
>> > Floating point arithmetic evolved more or less on languages like
>> > Fortran where things like exceptions were unheard of,
>>
>> I'm afraid that you are completely mistaken.
>>
>> Fortran IV had support for floating point traps, which are "things like
>> exceptions". That's as far back as 1966. I'd be shocked if earlier
>> Fortrans didn't also have support for traps.
>>
>> http://www.bitsavers.org/pdf/ibm/7040/C28-6806-1_7040ftnMathSubrs.pdf
>
> Fine, it wasn't "unheard of". I'm pretty sure the existence of a few
> high end compiler/hardware combinations that supported traps doesn't
> invalidate my basic point.

On the contrary, it blows it out of the water and stomps its corpse into
a stain on the ground. NANs weren't invented as an alternative for
exceptions, but because exceptions are usually the WRONG THING in serious
numeric work.

Note the "usually". For those times where you do want to interrupt a
calculation just because of an invalid operation, the standard allows you
to set a trap and raise an exception.

There's plenty of information available about how and why IEEE-754 was
created. Go do some research and stop making up rubbish based on what you
assume must have been their motives. Start with William Kahan, who has
written extensively about it. If you can find a version of the Apple
Numerics Manual 2nd Edition, it has an extremely entertaining forward by
Professor Kahan about the mess that was floating point before IEEE-754.


> If your aim is to support every last clause of IEEE for better or
> worse, then yes that's what Python should do. If your aim is to make
> Python the best language it can be, then Python should reject IEEE's
> obsolete notions, and throw exceptions when operating on NaN.

Python's usefulness for good numeric work is seriously hurt by the fact
that it tries so hard to never generate a NAN, and rarely an INF, and
instead keeps raising annoying exceptions that have to be caught (at
great expense of performance) and turned into something useful.

You'll note that, out of the box, numpy generates NANs:

>>> import numpy
>>> x = numpy.array([float(x) for x in range(5)])
>>> x/x
Warning: invalid value encountered in divide
array([ nan, 1., 1., 1., 1.])


The IEEE standard supports both use-cases: those who want exceptions to
bail out early, and those who want NANs so the calculation can continue.
This is a good thing. Failing to support the standard is a bad thing.
Despite your opinion, it is anything but obsolete.
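numpy exposes both use-cases directly through errstate; a sketch, assuming numpy is installed:

```python
import numpy as np

x = np.array([0.0, 1.0])

with np.errstate(invalid="ignore"):   # NaN style: quiet sentinel propagates
    print(x / x)                      # [nan  1.]

with np.errstate(invalid="raise"):    # trap style: bail out immediately
    try:
        x / x
    except FloatingPointError as exc:
        print("trapped:", exc)
```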


--
Steven

Steven D'Aprano

Unread,
Jun 1, 2011, 21:10:11
To
On Wed, 01 Jun 2011 14:03:14 +0000, Grant Edwards wrote:

> On 2011-06-01, Chris Angelico <ros...@gmail.com> wrote:
>> On Wed, Jun 1, 2011 at 12:59 PM, Carl Banks <pavlove...@gmail.com>
>> wrote:
>>> On Sunday, May 29, 2011 7:53:59 PM UTC-7, Chris Angelico wrote:
>>>> Okay, here's a question. The Python 'float' value - is it meant to be
>>>> "a Python representation of an IEEE double-precision floating point
>>>> value", or "a Python representation of a real number"?
>>>
>>> The former. Unlike the case with integers, there is no way that I
>>> know of to represent an abstract real number on a digital computer.
>>
>> This seems peculiar. Normally Python seeks to define its data types in
>> the abstract and then leave the concrete up to the various
>> implementations - note,
>
> But, "real numbers" and "IEEE float" are so different that I don't think
> that it would be a wise decision for people to pretend they're working
> with real numbers when in fact they are working with IEEE floats.

People pretend that *all the time*.

Much of the opposition to NANs, for example, is that it violates
properties of the reals. But so do ordinary floats! People just pretend
otherwise.

For reals, a + b - a = b, always without exception. For floats, not so
much.

For reals, a*(b + c) = a*b + a*c, always without exception. For floats,
not so much.

For reals, 1/(1/x) = x, except for 0, always. For floats, not so much.
For IEEE floats with proper support for INF, 0 is one of the cases which
does work!

These sorts of violations are far more likely to bite you than the NAN
boogey, that x != x when x is a NAN. But people go into paroxysms of
concern over the violation that they will probably never see, and ignore
the dozens that they trip over day after day.
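The first of those violations is trivial to demonstrate, and Python even diverges from IEEE-754 on division by zero: rather than returning INF for 1.0/0.0, it raises:

```python
a, b = 1e16, 1.0
print(a + b - a)    # 0.0, not 1.0: b is lost in the rounding of a + b

try:
    print(1.0 / 0.0)            # IEEE-754 would return inf here...
except ZeroDivisionError:
    print("ZeroDivisionError")  # ...but Python raises instead
```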

Compiler optimizations are some of the worst and most egregious
violations of the rule Floats Are Not Reals. Large numbers of numeric
algorithms are simply broken due to invalid optimizations written by C
programmers who think that because they have a high school understanding
of real-value math they therefore understand floats.


--
Steven

Steven D'Aprano

Unread,
Jun 2, 2011, 01:19:00
To

True. Any rational implementation that has any hope of remaining fast has
to limit the denominator to not exceed some fixed value.

Which makes it roughly equivalent to a float, only done in software with
little hardware support.

(In case it's not obvious: floats are equivalent to implicit rationals
with a scaling factor and denominator equal to some power of two.)
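Both halves of that parenthetical can be checked from the prompt: every finite float really is a rational with a power-of-two denominator, and Fraction.limit_denominator is exactly the "bounded denominator" compromise:

```python
from fractions import Fraction

# A float is an implicit rational with a power-of-two denominator:
n, d = (0.1).as_integer_ratio()
print(n, d)                  # d is 2**55
print(d & (d - 1) == 0)      # True: d is a power of two

# Capping the denominator keeps rationals fast (and approximate):
print(Fraction(0.1).limit_denominator(1000))   # Fraction(1, 10)
```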

--
Steven

John Nagle

Unread,
Jun 2, 2011, 03:15:24
To
On 5/31/2011 7:45 PM, Carl Banks wrote:
> Fine, it wasn't "unheard of". I'm pretty sure the existence of a few
> high end compiler/hardware combinations that supported traps doesn't
> invalidate my basic point. NaN was needed because few systems had a
> separate path to deal with exceptional situations like producing or
> operating on something that isn't a number. When they did exist few
> programmers used them. If floating-point were standardized today it
> might not even have NaN (and definitely wouldn't support the
> ridiculous NaN != NaN), because all modern systems can be expected to
> support exceptions, and modern programmers can be expected to use
> them.

Actually, it's the older architectures that support exact floating
point exceptions. x86 does, even in the superscalar and 64 bit variants.
But almost none of the RISC machines have exact floating point
exceptions. With all the parallelism inside FPUs today, it takes an
elaborate "retirement unit" to back out everything that happened
after the error and leave the CPU in a clean state. Most RISC
machines don't bother. ARM machines have both IEEE mode and
"run-fast mode"; the latter doesn't support FPU exceptions
but does support NaNs.

Many game machines and GPUs don't have full IEEE floating point.
Some don't have exceptions. Others don't have full INF/NaN semantics.

John Nagle

Steven D'Aprano

Unread,
Jun 2, 2011, 05:54:30
To
On Wed, 01 Jun 2011 21:41:06 +0100, Nobody wrote:

> On Sun, 29 May 2011 23:31:19 +0000, Steven D'Aprano wrote:
>
>>> That's overstating it. There's a good argument to be made for raising
>>> an exception.
>>
>> If so, I've never heard it, and I cannot imagine what such a good
>> argument would be. Please give it.
>
> Exceptions allow you to write more natural code by ignoring the awkward
> cases. E.g. writing "x * y + z" rather than first determining whether "x
> * y" is even defined then using a conditional.

You've quoted me out of context. I wasn't asking for justification for
exceptions in general. There's no doubt that they're useful. We were
specifically talking about NAN == NAN raising an exception rather than
returning False.


>>> Bear in mind that an exception is not necessarily an error, just an
>>> "exceptional" condition.
>>
>> True, but what's your point? Testing two floats for equality is not an
>> exceptional condition.
>
> NaN itself is an exceptional condition which arises when a result is
> undefined or not representable. When an operation normally returns a
> number but a specific case cannot do so, it returns not-a-number.

I'm not sure what "not representable" is supposed to mean, but if by
"undefined" you mean "invalid", then correct.


> The usual semantics for NaNs are practically identical to those for
> exceptions. If any intermediate result in a floating-point expression is
> NaN, the overall result is NaN.

Not necessarily. William Kahan gives an example where passing a NAN to
hypot can justifiably return INF instead of NAN. While it's certainly
true that *mostly* any intermediate NAN results in a NAN, that's not a
guarantee or requirement of the standard. A function is allowed to
convert NANs back to non-NANs, if it is appropriate for that function.

Another example is the Kronecker delta:

def kronecker(x, y):
    if x == y: return 1
    return 0

This will correctly consume NAN arguments. If either x or y is a NAN, it
will return 0.

(As an aside, this demonstrates that having NAN != any NAN, including
itself, is useful, as kronecker(x, x) will return 0 if x is a NAN.)


> Similarly, if any intermediate
> calculation throws an exception, the calculation as a whole throws an
> exception.

This is certainly true... the exception cannot look into the future and
see that it isn't needed because a later calculation cancels it out.

Exceptions, or hardware traps, stop the calculation. NANs allow the
calculation to proceed. Both behaviours are useful, and the standard
allows for both.


> If x is NaN, then "x + y" is NaN, "x * y" is NaN, pretty much anything
> involving x is NaN. By this reasoning both "x == y" and "x != y" should
> also be NaN.

NAN is a sentinel for an invalid operation. NAN + NAN returns a NAN
because it is an invalid operation, not because NANs are magical goop
that spoil everything they touch.

For example, print(NAN) does not return a NAN or raise an exception, nor
is there any need for it to. Slightly more esoteric: the signbit and
copysign functions both accept NANs without necessarily returning NANs.

Equality comparison is another such function. There's no need for
NAN == NAN to fail, because the equality operation is perfectly well
defined for NANs.
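Python's math module shows a couple of such NaN-consuming operations that return perfectly ordinary values:

```python
import math

nan = float("nan")
print(str(nan))                   # 'nan': formatting a NaN is a valid operation
print(math.copysign(1.0, nan))    # reads the sign bit; returns 1.0 or -1.0
print(math.isnan(nan))            # True: a predicate that consumes a NaN
```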


> But only the floating-point types have a NaN value, while
> bool doesn't. However, all types have exceptions.

What relevance does bool have?

>>>> The correct answer to "nan == nan" is False, they are not equal.
>>>
>>> There is no correct answer to "nan == nan".
>>
>> Why on earth not?
>
> Why should there be a correct answer? What does NaN actually mean?

NAN means "this is a sentinel marking that an invalid calculation was
attempted". For the purposes of numeric calculation, it is often useful
to allow those sentinels to propagate through your calculation rather
than to halt the program, perhaps because you hope to find that the
invalid marker ends up not being needed and can be ignored, or because
you can't afford to halt the program.

Does INVALID == INVALID? There's no reason to think that the question
itself is an invalid operation. If you can cope with the question "Is an
apple equal to a puppy dog?" without shouting "CANNOT COMPUTE!!!" and
running down the street, there's no reason to treat NAN == NAN as
anything worse.

So what should NAN == NAN equal? Consider the answer to the apple and
puppy dog comparison. Chances are that anyone asked that will give you a
strange look and say "Of course not, you idiot". (In my experience, and
believe it or not I have actually tried this, some people will ask you to
define equality. But they're a distinct minority.)

If you consider "equal to" to mean "the same as", then the answer is
clear and obvious: apples do not equal puppies, and any INVALID sentinel
is not equal to any other INVALID. (Remember, NAN is not a value itself,
it's a sentinel representing the fact that you don't have a valid number.)

So NAN == NAN should return False, just like the standard states, and
NAN != NAN should return True. "No, of course not, they're not equal."


> Apart from anything else, defining "NaN == NaN" as False means that "x
> == x" is False if x is NaN, which violates one of the fundamental axioms
> of an equivalence relation (and, in every other regard, "==" is normally
> intended to be an equivalence relation).

Yes, that's a consequence of NAN behaviour. I can live with that.


> The creation of NaN was a pragmatic decision on how to handle
> exceptional conditions in hardware. It is not holy writ, and there's no
> fundamental reason why a high-level language should export the
> hardware's behaviour verbatim.

There is a good, solid reason: it's a *useful* standard that *works*,
proven in practice, invented by people who have forgotten more about
floating point than you or I will ever learn, and we dismiss their
conclusions at our peril.

A less good reason: its a standard. Better to stick to a not-very-good
standard than to have the Wild West, where everyone chooses their own
behaviour. You have NAN == NAN raise ValueError, Fred has it return True,
George has it return False, Susan has it return a NAN, Michelle makes it
raise MathError, somebody else returns Maybe ...

But IEEE-754 is not just a "not-very-good" standard. It is an extremely
good standard.

>>> Arguably, "nan != nan" should also be false, but that would violate
>>> the invariant "(x != y) == !(x == y)".
>>
>> I cannot imagine what that argument would be. Please explain.
>
> A result of NaN means that the result of the calculation is undefined,
> so the value is "unknown".

Incorrect. NANs are not "unknowns", or missing values.


--
Steven

Grant Edwards

Unread,
Jun 2, 2011, 09:05:55
To
On 2011-06-02, Steven D'Aprano <steve+comp....@pearwood.info> wrote:

> But IEEE-754 is not just a "not-very-good" standard. It is an
> extremely good standard.

I get the distinct impression that the people arguing that IEEE-754 is
somehow "wrong" about the value of 'NaN == NaN' are the people who
don't actually use floating point. Those of us that do use floating
point and depend on the predictable behavior of NaNs seem to be happy
enough with the standard.

Two of my perennial complaints about Python's handling of NaNs and
Infs:

1) They weren't handled by pickle et al.

2) The string representations produced by repr() and accepted by
float() weren't standardized across platforms.

I think the latter has finally been fixed, hasn't it?

--
Grant Edwards grant.b.edwards Yow! Remember, in 2039,
at MOUSSE & PASTA will
gmail.com be available ONLY by
prescription!!

Robert Kern

Unread,
Jun 2, 2011, 13:04:00
To pytho...@python.org
On 6/2/11 8:05 AM, Grant Edwards wrote:

> Two of my perennial complaints about Python's handling of NaNs and
> Infs:
>
> 1) They weren't handled by pickle et al.
>
> 2) The string representations produced by repr() and accepted by
> float() weren't standardized across platforms.
>
> I think the latter has finally been fixed, hasn't it?

And the former!

Python 2.7.1 |EPD 7.0-2 (32-bit)| (r271:86832, Dec 3 2010, 15:41:32)
[GCC 4.0.1 (Apple Inc. build 5488)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> inf = 1e300*1e300
>>> nan = inf / inf
>>> import cPickle
>>> cPickle.loads(cPickle.dumps(nan))
nan
>>> cPickle.loads(cPickle.dumps(inf))
inf

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco

Nobody

Unread,
Jun 2, 2011, 16:47:02
To
On Thu, 02 Jun 2011 09:54:30 +0000, Steven D'Aprano wrote:

>> Exceptions allow you to write more natural code by ignoring the awkward
>> cases. E.g. writing "x * y + z" rather than first determining whether "x
>> * y" is even defined then using a conditional.
>
> You've quoted me out of context. I wasn't asking for justification for
> exceptions in general. There's no doubt that they're useful. We were
> specifically talking about NAN == NAN raising an exception rather than
> returning False.

It's arguable that NaN itself simply shouldn't exist in Python; if the FPU
ever generates a NaN, Python should raise an exception at that point.

But given that NaNs propagate in almost the same manner as exceptions,
you could "optimise" this by treating a NaN as a special-case
implementation of exceptions, and turn it into a real exception at the
point where you can no longer use a NaN (e.g. when using a comparison
operator).

This would produce the same end result as raising an exception
immediately, but would reduce the number of isnan() tests.
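A rough sketch of that scheme (names hypothetical, not an existing Python feature): a float subclass that lets NaN flow through arithmetic but raises at the first comparison:

```python
import math

class CheckedFloat(float):
    """Hypothetical sketch: NaN propagates silently through arithmetic,
    but becomes a real exception the moment it is compared."""

    def __eq__(self, other):
        if math.isnan(self) or (isinstance(other, float) and math.isnan(other)):
            raise ArithmeticError("comparison involving NaN")
        return float.__eq__(self, other)

    def __ne__(self, other):
        return not self.__eq__(other)

x = CheckedFloat(float("nan"))
y = CheckedFloat(x + 1.0)     # arithmetic is fine: y is still a NaN
try:
    y == 0.0
except ArithmeticError as exc:
    print(exc)                # comparison involving NaN
```

A complete version would also have to override the arithmetic operators so results stay CheckedFloat, and the ordered comparisons; this only illustrates the idea.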

>> NaN itself is an exceptional condition which arises when a result is
>> undefined or not representable. When an operation normally returns a
>> number but a specific case cannot do so, it returns not-a-number.
>
> I'm not sure what "not representable" is supposed to mean,

Consider sqrt(-1). This is defined (as "i" aka "j"), but not representable
as a floating-point "real". Making root/log/trig/etc functions return
complex numbers when necessary would probably be inappropriate for a language
such as Python.
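Python's standard library already draws this line: math raises for the unrepresentable case, while cmath returns the complex answer:

```python
import cmath
import math

try:
    math.sqrt(-1.0)
except ValueError as exc:
    print("math.sqrt(-1.0) ->", exc)   # math domain error, not NaN

print(cmath.sqrt(-1.0))                # 1j: defined, but not a float
```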

> but if by "undefined" you mean "invalid", then correct.

I mean undefined, in the sense that 0/0 is undefined (I note that Python
actually raises an exception for "0.0/0.0").

>> The usual semantics for NaNs are practically identical to those for
>> exceptions. If any intermediate result in a floating-point expression is
>> NaN, the overall result is NaN.
>
> Not necessarily. William Kahan gives an example where passing a NAN to
> hypot can justifiably return INF instead of NAN.

Hmm. Is that still true if the NaN signifies "not representable" (e.g.
known but complex) rather than undefined (e.g. unknown value but known to
be real)?

> While it's certainly
> true that *mostly* any intermediate NAN results in a NAN, that's not a
> guarantee or requirement of the standard. A function is allowed to
> convert NANs back to non-NANs, if it is appropriate for that function.
>
> Another example is the Kronecker delta:
>
> def kronecker(x, y):
>     if x == y: return 1
>     return 0
>
> This will correctly consume NAN arguments. If either x or y is a NAN, it
> will return 0. (As an aside, this demonstrates that having NAN != any
> NAN, including itself, is useful, as kronecker(x, x) will return 0 if x
> is a NAN.)

How is this useful? On the contrary, I'd suggest that the fact that
kronecker(x, x) can return 0 is an argument against the "NaN != NaN" axiom.

A case where the semantics of exceptions differ from those of NaN is:

def cond(t, x, y):
    if t:
        return x
    else:
        return y

as cond(True, x, nan()) will return x, while cond(True, x, raise()) will
raise an exception.

But this is a specific instance of a more general problem with strict
languages, i.e. strict functions violate referential transparency.

This is why even strict languages (i.e. almost everything except for a
handful of functional languages which value mathematical purity, e.g.
Haskell) have non-strict conditionals. If you remove the conditional from
the function and write it in-line, then:

if True:
    return x
else:
    raise()

behaves like NaN.
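The strictness difference is easy to demonstrate: function arguments are evaluated eagerly, so an exception in the untaken branch still fires, while a NaN there is harmless:

```python
def cond(t, x, y):
    # Ordinary strict function: x and y are evaluated before the call.
    return x if t else y

def boom():
    raise ValueError("invalid intermediate result")

nan = float("nan")
print(cond(True, 1.0, nan))     # 1.0: the NaN in the dead branch is ignored
try:
    cond(True, 1.0, boom())     # boom() runs before cond is even entered
except ValueError:
    print("exception from the untaken branch")

print(1.0 if True else boom())  # inline conditional is non-strict: prints 1.0
```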

Also, note that the "convenience" of NaN (e.g. not propagating from the
untaken branch of a conditional) is only available for floating-point
types. If it's such a good idea, why don't we have it for other types?

> Equality comparison is another such function. There's no need for
> NAN == NAN to fail, because the equality operation is perfectly well
> defined for NANs.

The definition is entirely arbitrary. You could just as easily define that
(NaN == NaN) is True. You could just as easily define that "1 + NaN" is 27.

Actually, "NaN == NaN" makes more sense than "NaN != NaN", as the former
upholds the equivalence axioms and is consistent with the normal behaviour
of "is" (i.e. "x is y" => "x == y", even if the converse isn't necessarily
true).

If you're going to argue that "NaN == NaN" should be False on the basis
that the values are sentinels for unrepresentable values (which may be
*different* unrepresentable values), it follows that "NaN != NaN" should
also be False for the same reason.

>> But only the floating-point types have a NaN value, while
>> bool doesn't. However, all types have exceptions.
>
> What relevance does bool have?

The result of comparisons is a bool.

>> Why should there be a correct answer? What does NaN actually mean?
>
> NAN means "this is a sentinel marking that an invalid calculation was
> attempted". For the purposes of numeric calculation, it is often useful
> to allow those sentinels to propagate through your calculation rather
> than to halt the program, perhaps because you hope to find that the
> invalid marker ends up not being needed and can be ignored, or because
> you can't afford to halt the program.
>
> Does INVALID == INVALID?

Either True or INVALID. You can make a reasonable argument for either.
Making a reasonable argument that it should be False is much harder.

> If you can cope with the question "Is an apple equal to a puppy dog?"

It depends upon your definition of equality, but it's not a particularly
hard question. And completely irrelevant here.

> So what should NAN == NAN equal? Consider the answer to the apple and
> puppy dog comparison. Chances are that anyone asked that will give you a
> strange look and say "Of course not, you idiot". (In my experience, and
> believe it or not I have actually tried this, some people will ask you to
> define equality. But they're a distinct minority.)
>
> If you consider "equal to" to mean "the same as", then the answer is
> clear and obvious: apples do not equal puppies,

This is "equality" as opposed to "equivalence", i.e. x and y are equal if
and only if f(x) and f(y) are equal for all f.

> and any INVALID sentinel is not equal to any other INVALID.

This does not follow. Unless you explicity define the sentinel to be
unequal to itself, the strict equality definition holds, as NaN tends to
be a specific bit pattern (multiple bit patterns are interpreted as NaN,
but operations which result in a NaN will use a specific pattern, possibly
modulo the sign bit).
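The bit-pattern claim can be checked with struct: a double is a NaN exactly when its exponent bits are all ones and its mantissa is non-zero, and CPython's float("nan") uses one specific such pattern:

```python
import struct

nan = float("nan")
bits = struct.unpack("<Q", struct.pack("<d", nan))[0]
print(hex(bits))   # typically 0x7ff8000000000000, the standard quiet NaN

# NaN encoding: exponent field all ones, mantissa non-zero.
assert (bits >> 52) & 0x7FF == 0x7FF
assert bits & ((1 << 52) - 1) != 0
```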

If you want to argue that "NaN == NaN" should be False, then do so. Simply
asserting that it should be False won't suffice (nor will citing the IEEE
FP standard *unless* you're arguing that "because the standard says so" is
the only reason required).

> (Remember, NAN is not a value itself, it's a sentinel representing the
> fact that you don't have a valid number.)

I'm aware of that.

> So NAN == NAN should return False,

Why?

> just like the standard states, and NAN != NAN should return True.

Why?

In both cases, the more obvious result should be some kind of sentinel
indicating that we don't have a valid boolean. Why should this sentinel
propagate through arithmetic operations but not through logical operations?

>> Apart from anything else, defining "NaN == NaN" as False means that "x
>> == x" is False if x is NaN, which violates one of the fundamental axioms
>> of an equivalence relation (and, in every other regard, "==" is normally
>> intended to be an equivalence relation).
>
> Yes, that's a consequence of NAN behaviour.

Another consequence:

>>> x = float("nan")
>>> x is x
True
>>> x == x
False

Ordinarily, you would consider this behaviour a bug in the class' __eq__
method.

> I can live with that.

I can *live* with it (not that I have much choice), but that doesn't mean
that it's correct or even anything short of downright stupid.

>> The creation of NaN was a pragmatic decision on how to handle
>> exceptional conditions in hardware. It is not holy writ, and there's no
>> fundamental reason why a high-level language should export the
>> hardware's behaviour verbatim.
>
> There is a good, solid reason: it's a *useful* standard

Debatable.

> that *works*,

Debatable.

> proven in practice,

If anything, it has proven to be a major nuisance. It takes a lot of
effort to create (or even specify) code which does the right thing in the
presence of NaNs.

Turning NaNs into exceptions at their source wouldn't make it
significantly harder to write correct code (there are a handful of cases
where the existing behaviour produces the right answer almost by accident,
far more where it doesn't), and would mean that "simple" code (where NaN
hasn't been explicitly considered) raises an exception rather than
silently producing a wrong answer.

> invented by people who have forgotten more about
> floating point than you or I will ever learn, and we dismiss their
> conclusions at our peril.

I'm not aware that they made any conclusions about Python. I don't
consider any conclusions about the most appropriate behaviour for hardware
(which may have no choice beyond exactly /which/ bit pattern to put into a
register) to automatically determine what is the most appropriate
behaviour for a high-level language.

> A less good reason: its a standard. Better to stick to a not-very-good
> standard than to have the Wild West, where everyone chooses their own
> behaviour. You have NAN == NAN raise ValueError, Fred has it return True,
> George has it return False, Susan has it return a NAN, Michelle makes it
> raise MathError, somebody else returns Maybe ...

This isn't an issue if you have the language deal with it.

>> A result of NaN means that the result of the calculation is undefined,
>> so the value is "unknown".
>
> Incorrect. NANs are not "unknowns", or missing values.

You're contradicting yourself here.

Gregory Ewing

Unread,
Jun 2, 2011, 19:17:17
To
Steven D'Aprano wrote:

> def kronecker(x, y):
>     if x == y: return 1
>     return 0
>
> This will correctly consume NAN arguments. If either x or y is a NAN, it
> will return 0.

I'm far from convinced that this result is "correct". For one
thing, the Kronecker delta is defined on integers, not reals,
so expecting it to deal with NaNs at all is nonsensical.
For another, this function as written is numerically suspect,
since it relies on comparing floats for exact equality.

But the most serious problem is, given that

> NAN is a sentinel for an invalid operation. NAN + NAN returns a NAN
> because it is an invalid operation,

if kronecker(NaN, x) or kronecker(x, Nan) returns anything
other than NaN or some other sentinel value, then you've
*lost* the information that an invalid operation occurred
somewhere earlier in the computation.

You can't get a valid result from data produced by an
invalid computation. Garbage in, garbage out.

> not because NANs are magical goop that spoil everything they touch.

But that's exactly how they *have* to behave if they truly
indicate an invalid operation.

SQL has been mentioned in relation to all this. It's worth
noting that the result of comparing something to NULL in
SQL is *not* true or false -- it's NULL!

--
Greg

Steven D'Aprano

Unread,
Jun 3, 2011, 00:23:10
To
On Fri, 03 Jun 2011 11:17:17 +1200, Gregory Ewing wrote:

> Steven D'Aprano wrote:
>
>> def kronecker(x, y):
>>     if x == y: return 1
>>     return 0
>>
>> This will correctly consume NAN arguments. If either x or y is a NAN,
>> it will return 0.
>
> I'm far from convinced that this result is "correct". For one thing, the
> Kronecker delta is defined on integers, not reals, so expecting it to
> deal with NaNs at all is nonsensical.

Fair point. Call it an extension of the Kronecker Delta to the reals then.


> For another, this function as
> written is numerically suspect, since it relies on comparing floats for
> exact equality.

Well, it is a throw away function demonstrating a principle, not battle-
hardened production code.

But it's hard to say exactly what alternative there is, if you're going
to accept floats. Should you compare them using an absolute error? If so,
you're going to run into trouble if your floats get large. It is very
amusing when people feel all virtuous for avoiding equality and then
inadvertently do something like this:

y = 2.1e12
if abs(x - y) <= 1e-9:
    # x is equal to y, within exact tolerance
    ...

Apart from being slower and harder to read, how is this different from
the simpler, more readable x == y?

What about a relative error? Then you'll get into trouble when the floats
are very small. And how much error should you accept? What's good for
your application may not be good for mine.

Even if you define your equality function to accept some limited error
measured in Units in Last Place (ULP), "equal to within 2 ULP" (or any
other fixed tolerance) is no better, or safer, than exact equality, and
very likely worse.

In practice, either the function needs some sort of "how to decide
equality" parameter, so the caller can decide what counts as equal in
their application, or you use exact floating point equality and leave it
up to the caller to make sure the arguments are correctly rounded so that
values which should compare equal do compare equal.
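[An aside for modern readers: Python 3.5+ ships math.isclose(), which is
exactly the "how to decide equality" parameter discussed above, with a
relative tolerance by default. A sketch of the large-magnitude pitfall:]

```python
import math

x = 2.1e12
y = x + 1.0  # a float very close to x at this magnitude

# A fixed absolute tolerance ignores magnitude and misfires here...
print(abs(x - y) <= 1e-9)                # False
# ...while a relative tolerance scales with the operands.
print(math.isclose(x, y, rel_tol=1e-9))  # True
```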


> But the most serious problem is, given that
>
>> NAN is a sentinel for an invalid operation. NAN + NAN returns a NAN
>> because it is an invalid operation,
>
> if kronecker(NaN, x) or kronecker(x, Nan) returns anything other than
> NaN or some other sentinel value, then you've *lost* the information
> that an invalid operation occurred somewhere earlier in the computation.

If that's the most serious problem, then I'm laughing, because of course
I haven't lost anything.

x = result_of_some_computation(a, b, c) # may return NAN
y = kronecker(x, 42)

How have I lost anything? I still have the result of the computation in
x. If I throw that value away, it is because I no longer need it. If I do
need it, it is right there, where it always was.

You seem to have fallen for the myth that NANs, once they appear, may
never disappear. This is a common misapprehension, e.g.:

"NaN is like a trap door that once you have fallen in you cannot
come back out. Otherwise, the possibility exists that a calculation
will have gone off course undetectably."

http://www.rhinocerus.net/forum/lang-fortran/94839-fortran-ieee-754-maxval-inf-nan-2.html#post530923

Certainly if you, the function writer, has any reasonable doubt about the
validity of a NAN input, you should return a NAN. But that doesn't mean
that NANs are "trap doors". It is fine for them to disappear *if they
don't matter* to the final result of the calculation. I quote:

"The key result of these rules is that once you get a NaN during
a computation, the NaN has a STRONG TENDENCY [emphasis added] to
propagate itself throughout the rest of the computation..."

http://www.savrola.com/resources/NaN.html

Another couple of good examples:

- from William Kahan, and the C99 standard: hypot(INF, x) is always INF
regardless of the value of x, hence hypot(INF, NAN) returns INF.

- since pow(x, 0) is always 1 regardless of the value of x, pow(NAN, 0)
is also 1.
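[Both examples are easy to confirm with CPython's math module, which
follows the C99 special-case rules for these functions:]

```python
import math

inf, nan = float("inf"), float("nan")
# hypot(INF, x) is INF for any x, so the NaN is irrelevant and vanishes:
print(math.hypot(inf, nan))  # inf
# pow(x, 0) is 1 for any x, including NaN:
print(math.pow(nan, 0.0))    # 1.0
```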

In the case of the real-valued Kronecker delta, I argue that the NAN
doesn't matter, and it is reasonable to allow it to disappear.

Another standard example where NANs get thrown away is the max and min
functions. The latest revision of IEEE-754 (2008) allows for max and min
to ignore NANs.
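[Python's built-in max() does not implement the 2008 NaN-ignoring rule;
it keeps the current candidate unless a later item compares greater, so
with NaN's always-false comparisons the result depends on argument
order:]

```python
nan = float("nan")
# 1.0 > nan is False, so the initial nan survives:
print(max(nan, 1.0))  # nan
# nan > 1.0 is also False, so here the 1.0 survives:
print(max(1.0, nan))  # 1.0
```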


> You can't get a valid result from data produced by an invalid
> computation. Garbage in, garbage out.

Of course you can. Here's a trivial example:

def f(x):
    return 1

It doesn't matter what value x takes, the result of f(x) should be 1.
What advantage is there in having f(NAN) return NAN?


>> not because NANs are magical goop that spoil everything they touch.
>
> But that's exactly how they *have* to behave if they truly indicate an
> invalid operation.
>
> SQL has been mentioned in relation to all this. It's worth noting that
> the result of comparing something to NULL in SQL is *not* true or false
> -- it's NULL!

I'm sure they have their reasons for that. Whether they are good reasons
or not, I don't know. I do know that the 1999 SQL standard defined *four*
results for boolean comparisons, true/false/unknown/null, but allowed
implementations to treat unknown and null as the same.


--
Steven

Chris Angelico

Unread,
Jun 3, 2011, 00:35:52
To: pytho...@python.org
On Fri, Jun 3, 2011 at 2:23 PM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
>> You can't get a valid result from data produced by an invalid
>> computation. Garbage in, garbage out.
>
> Of course you can. Here's a trivial example:
>
> def f(x):
>    return 1
>

If your incoming x is garbage, your outgoing 1 is also garbage. Later
on, you can use 'isgarbage(x)' to find out whether anything went
wrong. You can also use 'isinsane(self)', which is defined as follows:

class Programmer:
    def isinsane(self):
        return True if float("nan")==float("nan") else True

Chris Angelico

Steven D'Aprano

Unread,
Jun 3, 2011, 01:59:17
To:
On Fri, 03 Jun 2011 14:35:52 +1000, Chris Angelico wrote:

> On Fri, Jun 3, 2011 at 2:23 PM, Steven D'Aprano
> <steve+comp....@pearwood.info> wrote:
>>> You can't get a valid result from data produced by an invalid
>>> computation. Garbage in, garbage out.
>>
>> Of course you can. Here's a trivial example:
>>
>> def f(x):
>>    return 1
>>
>>
> If your incoming x is garbage, your outgoing 1 is also garbage.

If there were non-garbage input where f(x) would return something other
than 1, then you might argue that "well, we can't be sure what value
f(x) should return, so we better return a NAN". But there is no such
input.

NANs are a tool, not poison. They indicate an invalid calculation. Not
all calculations are critical. What do you do when you reach an invalid
calculation and you can't afford to just give up and halt the program
with an error? You try to fix it with another calculation!

If you're in the fortunate situation that you can say "this bad input
does not matter", then *that input does not matter*. Regardless of
whether your input is a NAN, or you've just caught an exception, you have
the opportunity to decide what the appropriate response is.

You might not be able to fix the situation, in which case it is
appropriate to return a NAN to signal to the next function that you don't
have a valid result. But sometimes one bad value is not the end of the
world. Perhaps you try again with a smaller step size, or you skip this
iteration of the calculation, or you throw away the current value and
start again from a different starting point, or do whatever is needed to
get the result you want.

In the case of my toy function, whatever is needed is... nothing at all.
Just return 1, the same as you would for any other input, because the
input literally does not matter for the output.


--
Steven

Grant Edwards

Unread,
Jun 3, 2011, 10:52:39
To:
On 2011-06-02, Nobody <nob...@nowhere.com> wrote:
> On Thu, 02 Jun 2011 09:54:30 +0000, Steven D'Aprano wrote:
>
>>> Exceptions allow you to write more natural code by ignoring the
>>> awkward cases. E.g. writing "x * y + z" rather than first determining
>>> whether "x * y" is even defined then using a conditional.
>>
>> You've quoted me out of context. I wasn't asking for justification
>> for exceptions in general. There's no doubt that they're useful. We
>> were specifically talking about NAN == NAN raising an exception
>> rather than returning False.
>
> It's arguable that NaN itself simply shouldn't exist in Python; if
> the FPU ever generates a NaN, Python should raise an exception at
> that point.

Sorry, I just don't "get" that argument. I depend on compliance with
IEEE-754, and I find the current NaN behavior very useful, and
labor-saving.

> But given that NaNs propagate in almost the same manner as
> exceptions, you could "optimise" this by treating a NaN as a
> special-case implementation of exceptions, and turn it into a real
> exception at the point where you can no longer use a NaN (e.g. when
> using a comparison operator).
>
> This would produce the same end result as raising an exception
> immediately, but would reduce the number of isnan() tests.

I've never found the number of isnan() checks in my code to be an
issue -- there just aren't that many of them, and when they are there,
it provides an easy to read and easy to maintain way to handle things.

> I mean undefined, in the sense that 0/0 is undefined

But 0.0/0.0 _is_ defined. It's NaN. ;)

> (I note that Python actually raises an exception for "0.0/0.0").

IMHO, that's a bug. IEEE-754 states explicitly that 0.0/0.0 is NaN.
Python claims it implements IEEE-754. Python got it wrong.
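[The inconsistency is easy to demonstrate: the same invalid-operation
class of error raises in one spelling and returns a quiet NaN in
another:]

```python
import math

# Python raises for float zero-divided-by-zero...
try:
    0.0 / 0.0
except ZeroDivisionError:
    print("ZeroDivisionError raised")

# ...yet other IEEE-754 invalid operations quietly return NaN:
print(math.isnan(float("inf") - float("inf")))  # True
```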

> Also, note that the "convenience" of NaN (e.g. not propagating from
> the untaken branch of a conditional) is only available for
> floating-point types. If it's such a good idea, why don't we have it
> for other types?

> The definition is entirely arbitrary.

I don't agree, but even if it was entirely arbitrary, that doesn't make
the decision meaningless. IEEE-754 says it's True, and standards
compliance is valuable. Each country's decision to drive on the
right/left side of the road is entirely arbitrary, but once decided
there's a huge benefit to everybody following the rule.

> You could just as easily define that (NaN == NaN) is True. You could
> just as easily define that "1 + NaN" is 27.

I don't think that would be "just as easy" to use.

> Actually, "NaN == NaN" makes more sense than "NaN != NaN", as the
> former upholds the equivalence axioms

You seem to be talking about reals. We're talking about floats.

> If you're going to argue that "NaN == NaN" should be False on the
> basis that the values are sentinels for unrepresentable values (which
> may be *different* unrepresentable values), it follows that "NaN !=
> NaN" should also be False for the same reason.

Mostly I just want Python to follow the IEEE-754 standard [which I
happen to find to be very well thought out and almost always behaves
in a practical, useful manner].

> If you want to argue that "NaN == NaN" should be False, then do so.
> Simply asserting that it should be False won't suffice (nor will
> citing the IEEE FP standard *unless* you're arguing that "because the
> standard says so" is the only reason required).

For those of us who have to accomplish real work and interface with
real devices "because the standard says so" is actually a darned good
reason. Years of experience has also shown to me that it's a very
practical decision.

> If anything, it has proven to be a major nuisance. It takes a lot of
> effort to create (or even specify) code which does the right thing in
> the presence of NaNs.

That's not been my experience. NaNs save a _huge_ amount of effort
compared to having to pass value+status info around throughout complex
calculations.
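[A small sketch of that point -- safe_log is a hypothetical helper name,
not something from the thread. Letting NaN stand in for a status flag
keeps the intermediate code oblivious to the failure, which still
surfaces at the end:]

```python
import math

def safe_log(v):
    # Return NaN for invalid input instead of a (value, status) pair
    # that every caller would have to unpack and check.
    return math.log(v) if v > 0 else float("nan")

total = sum(safe_log(v) for v in [1.0, -2.0, 3.0])
print(math.isnan(total))  # True -- the one bad input shows up at the end
```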

> I'm not aware that they made any conclusions about Python.

They made some very informed (and IMO valid) conclusions about
scientific computing using binary floating point arithmetic. Those
conclusions apply largely to Python.

--
Grant

Chris Torek

Unread,
Jun 3, 2011, 13:52:24
To:
>On 2011-06-02, Nobody <nob...@nowhere.com> wrote:
>> (I note that Python actually raises an exception for "0.0/0.0").

In article <isasfm$inl$1...@reader1.panix.com>


Grant Edwards <inv...@invalid.invalid> wrote:
>IMHO, that's a bug. IEEE-754 states explicitly that 0.0/0.0 is NaN.
>Python claims it implements IEEE-754. Python got it wrong.

Indeed -- or at least, inconsistent. (Again I would not mind at
all if Python had "raise exception on NaN-result" mode *as well
as* "quietly make NaN", perhaps using signalling vs quiet NaN to
tell them apart in most cases, plus some sort of floating-point
context control, for instance.)

>> Also, note that the "convenience" of NaN (e.g. not propagating from
>> the untaken branch of a conditional) is only available for
>> floating-point types. If it's such a good idea, why don't we have it
>> for other types?

Mostly because for integers it's "too late" and there is no standard
for it. For others, well:

>>> import decimal
>>> decimal.Decimal('nan')
Decimal("NaN")
>>> _ + 1
Decimal("NaN")
>>> decimal.setcontext(decimal.ExtendedContext)
>>> print decimal.Decimal(1) / 0
Infinity
>>> [etc]

(Note that you have to set the decimal context to one that does
not produce a zero-divide exception, such as the pre-loaded
decimal.ExtendedContext. On my one Python 2.7 system -- all the
rest are earlier versions, with 2.5 the highest I can count on,
and that only by upgrading it on the really old work systems --
I note that fractions.Fraction(0,0) raises a ZeroDivisionError,
and there is no fractions.ExtendedContext or similar.)
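[For reference, the same decimal session spelled as a script on a modern
Python 3 -- an editorial addition, not part of the original post:]

```python
from decimal import Decimal, ExtendedContext, setcontext

# ExtendedContext has all traps disabled, so invalid operations and
# division by zero flow through as NaN/Infinity instead of raising.
setcontext(ExtendedContext)
print(Decimal("NaN") + 1)  # NaN
print(Decimal(1) / 0)      # Infinity
```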

>> The definition is entirely arbitrary.
>
>I don't agree, but even if it was entirely arbitrary, that doesn't make
>the decision meaningless. IEEE-754 says it's True, and standards
>compliance is valuable. Each country's decision to drive on the
>right/left side of the road is entirely arbitrary, but once decided
>there's a huge benefit to everybody following the rule.

This analogy perhaps works better than expected. Whenever I swap
between Oz or NZ and the US-of-A, I have a brief mental clash that,
if I am not careful, could result in various bad things. :-)

Carl Banks

Unread,
Jun 3, 2011, 16:27:00
To:
On Wednesday, June 1, 2011 5:53:26 PM UTC-7, Steven D'Aprano wrote:
> On Tue, 31 May 2011 19:45:01 -0700, Carl Banks wrote:
>
> > On Sunday, May 29, 2011 8:59:49 PM UTC-7, Steven D'Aprano wrote:
> >> On Sun, 29 May 2011 17:55:22 -0700, Carl Banks wrote:
> >>
> >> > Floating point arithmetic evolved more or less on languages like
> >> > Fortran where things like exceptions were unheard of,
> >>
> >> I'm afraid that you are completely mistaken.
> >>
> >> Fortran IV had support for floating point traps, which are "things like
> >> exceptions". That's as far back as 1966. I'd be shocked if earlier
> >> Fortrans didn't also have support for traps.
> >>
> >> http://www.bitsavers.org/pdf/ibm/7040/C28-6806-1_7040ftnMathSubrs.pdf
> >
> > Fine, it wasn't "unheard of". I'm pretty sure the existence of a few
> > high end compiler/hardware combinations that supported traps doesn't
> > invalidate my basic point.
>
> On the contrary, it blows it out of the water and stomps its corpse into
> a stain on the ground.

Really? I am claiming that, even if everyone and their mother thought exceptions were the best thing ever, NaN would have been added to IEEE anyway because most hardware didn't support exceptions. Therefore the fact that NaN is in IEEE is not any evidence that NaN is a good idea.

You are saying that the existence of one early system that supported exceptions is not merely an argument against that claim, but blows it out of the water? Your logic sucks then.

You want to go off arguing that there were good reasons aside from backwards compatibility they added NaN, be my guest. Just don't go around saying, "Its in IEEE there 4 its a good idear LOL". Lots of standards have all kinds of bad ideas in them for the sake of backwards compatibility, and when someone goes around claiming that something is a good idea simply because some standard includes it, it is the first sign that they're clueless about what standardization actually is.


> NANs weren't invented as an alternative for
> exceptions, but because exceptions are usually the WRONG THING in serious
> numeric work.
>
> Note the "usually". For those times where you do want to interrupt a
> calculation just because of an invalid operation, the standard allows you
> to set a trap and raise an exception.

I don't want to get into an argument over best practices in serious numerical programming, so let's just agree with this point for argument's sake.

Here's the problem: Python is not for serious numerical programming. Yeah, it's a really good language for calling other languages to do numerical programming, but it's not good for doing serious numerical programming itself. Anyone with some theoretical problem where NaN is a good idea should already be using modules or separate programs written in C or Fortran.

Casual and lightweight numerical work (which Python is good at) is not a wholly separate problem domain where the typical rules ("Errors should never pass silently") should be swept aside.


[snip]


> You'll note that, out of the box, numpy generates NANs:
>
> >>> import numpy
> >>> x = numpy.array([float(x) for x in range(5)])
> >>> x/x
> Warning: invalid value encountered in divide
> array([ nan, 1., 1., 1., 1.])

Steven, seriously I don't know what's going through your head. I'm saying strict adherence to IEEE is not the best idea, and you cite the fact that a library tries to strictly adhere to IEEE as evidence that strictly adhering to IEEE is a good idea. Beg the question much?


> The IEEE standard supports both use-cases: those who want exceptions to
> bail out early, and those who want NANs so the calculation can continue.
> This is a good thing. Failing to support the standard is a bad thing.
> Despite your opinion, it is anything but obsolete.

There are all kinds of good reasons to go against standards. "Failing to support the standard is a bad thing" are the words of a fool. A wise person considers the cost of breaking the standard versus the benefit got.

It's clear that IEEE's NaN handling is woefully out of place in the philosophy of Python, which tries to be newbie friendly and robust to errors; and Python has no real business trying to perform serious numerical work where (ostensibly) NaNs might find a use. Therefore, the cost of breaking standard is small, but the benefit significant, so Python would be very wise to break with IEEE in the handling of NaNs.


Carl Banks

Chris Angelico

Unread,
Jun 3, 2011, 16:35:12
To: pytho...@python.org
On Sat, Jun 4, 2011 at 6:27 AM, Carl Banks <pavlove...@gmail.com> wrote:
> Really?  I am claiming that, even if everyone and their mother thought exceptions were the best thing ever, NaN would have been added to IEEE anyway because most hardware didn't support exceptions.  Therefore the fact that NaN is in IEEE is not any evidence that NaN is a good idea.

Uhh, noob question here. I'm way out of my depth with hardware floating point.

Isn't a signaling nan basically the same as an exception? Which would
imply that the hardware did support exceptions (if it did indeed
support IEEE floating point, which specifies signalling nan)?

Chris Angelico

Nobody

Unread,
Jun 3, 2011, 19:29:10
To:
On Fri, 03 Jun 2011 14:52:39 +0000, Grant Edwards wrote:

>> It's arguable that NaN itself simply shouldn't exist in Python; if
>> the FPU ever generates a NaN, Python should raise an exception at
>> that point.
>
> Sorry, I just don't "get" that argument. I depend on compliance with
> IEEE-754, and I find the current NaN behavior very useful, and
> labor-saving.

If you're "fluent" in IEEE-754, then you won't find its behaviour
unexpected. OTOH, if you approach the issue without preconceptions,
you're likely to notice that you effectively have one exception mechanism
for floating-point and another for everything else.

>> But given that NaNs propagate in almost the same manner as
>> exceptions, you could "optimise" this by treating a NaN as a
>> special-case implementation of exceptions, and turn it into a real
>> exception at the point where you can no longer use a NaN (e.g. when
>> using a comparison operator).
>>
>> This would produce the same end result as raising an exception
>> immediately, but would reduce the number of isnan() tests.
>
> I've never found the number of isnan() checks in my code to be an
> issue -- there just arent that many of them, and when they are there,
> it provides an easy to read and easy to maintain way to handle things.

I think that you misunderstood. What I was saying here was that, if you
wanted exception-on-NaN behaviour from Python, the interpreter wouldn't
need to call isnan() on every value received from the FPU, but rely upon
NaN-propagation and only call it at places where a NaN might disappear
(e.g. comparisons).
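[A sketch of that scheme -- checked_eq is a hypothetical name, not a
real Python API. NaN propagates through the arithmetic untested, and is
only turned into a real exception at the comparison:]

```python
import math

def checked_eq(x, y):
    # Turn a NaN that reaches a comparison into an exception, relying
    # on NaN propagation instead of isnan() after every operation.
    if math.isnan(x) or math.isnan(y):
        raise FloatingPointError("NaN reached a comparison")
    return x == y

result = float("inf") - float("inf")  # silently becomes NaN
try:
    checked_eq(result, 0.0)
except FloatingPointError as exc:
    print(exc)
```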

>> I mean undefined, in the sense that 0/0 is undefined
>
> But 0.0/0.0 _is_ defined. It's NaN. ;)

Mathematically, it's undefined.

>> (I note that Python actually raises an exception for "0.0/0.0").
>
> IMHO, that's a bug. IEEE-754 states explicitly that 0.0/0.0 is NaN.
> Python claims it implements IEEE-754. Python got it wrong.

But then IEEE-754 considers integers and floats to be completely different
beasts, while Python makes some effort to maintain a unified "numeric"
interface. If you really want IEEE-754 to-the-letter, that's undesirable,
although I'd question the choice of Python in such situations.

>> The definition is entirely arbitrary.
>
> I don't agree, but even if it was entirely arbitrary, that doesn't make
> the decision meaningless. IEEE-754 says it's True, and standards
> compliance is valuable.

True, but so are other things. People with a background in mathematics (as
opposed to arithmetic and numerical methods) would probably consider
following the equivalence axioms to be valuable. Someone more used to
Python than IEEE-754 might consider following the "x is y => x == y" axiom
to be valuable.
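[Concretely, NaN is the one float value for which that axiom fails:]

```python
nan = float("nan")
print(nan is nan)  # True  -- same object
print(nan == nan)  # False -- identity no longer implies equality
```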

As for IEEE-754 saying that it's True: they only really had two
choices: either it's True or it's False. NaNs provide "exceptions"
even if the hardware or the language lacks them, but that falls down once
you leave the scope of floating-point. It wouldn't have been within
IEEE-754's ambit to declare that comparing NaNs should return NaB
(Not A Boolean).

>> Actually, "NaN == NaN" makes more sense than "NaN != NaN", as the
>> former upholds the equivalence axioms
>
> You seem to be talking about reals. We're talking about floats.

Floats are supposed to approximate reals. They're also a Python
data type, and should make some effort to fit in with the rest of
the language.

>> If anything, it has proven to be a major nuisance. It takes a lot of
>> effort to create (or even specify) code which does the right thing in
>> the presence of NaNs.
>
> That's not been my experience. NaNs save a _huge_ amount of effort
> compared to having to pass value+status info around throughout complex
> calculations.

That's what exceptions are for. NaNs probably save a huge amount of effort
in languages which lack exceptions, but that isn't applicable to Python.
In Python, they result in floats not "fitting in".

Let's remember that the thread started with an oddity relating to using
floats as dictionary keys, which mostly works but fails for NaN because of
the (highly unusual) property that "x == x" is False for NaNs.
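[The original oddity, for reference. Dicts check identity before
equality, so the same NaN object can be looked up again, but two
distinct NaN objects count as different keys:]

```python
nan1, nan2 = float("nan"), float("nan")
d = {nan1: "first", nan2: "second"}
print(len(d))    # 2 -- the two NaN objects are unequal, so both are kept
print(d[nan1])   # first -- lookup succeeds via the identity shortcut
```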

Why did the Python developers choose this behaviour? It's quite likely
that they didn't choose it, but just overlooked the fact that NaN
creates this corner case: code that works for every other primitive
type, and even for every other float value, breaks for NaN.

In any case, I should probably re-iterate at this point that I'm not
actually arguing *for* exception-on-NaN or NaN==NaN or similar, just
pointing out that IEEE-754 is not the One True Approach and that other
approaches are not necessarily heresy and may have some merit. To go back
to the point where I entered this thread:

Chris Angelico

Unread,
Jun 3, 2011, 19:51:06
To: pytho...@python.org
On Sat, Jun 4, 2011 at 9:29 AM, Nobody <nob...@nowhere.com> wrote:
> Floats are supposed to approximate reals. They're also a Python
> data type, and should make some effort to fit in with the rest of
> the language.
>

That's what I thought a week ago. But that's not really true. Floats
are supposed to hold non-integral values, but the data type is "IEEE
754 floating point", not "real number". There's several ways to store
real numbers, and not one of them is (a) perfectly accurate, or (b)
plausibly fast to calculate. Using rationals (fractions) with infinite
range leads to exponential performance costs, and still doesn't
properly handle irrationals like pi. And if you cap the denominator to
a power of 2 and cap the length of the mantissa, err I mean numerator,
then you have IEEE 754 floating point. Python offers you a way to
store and manipulate floating point numbers, not real numbers.
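[A small illustration of the growth cost of exact rationals, using the
stdlib fractions module -- an editorial addition, not from the thread:]

```python
from fractions import Fraction

# Summing 1/k exactly: no rounding ever occurs, but the denominator
# grows with every term, unlike a fixed-size float.
s = sum(Fraction(1, k) for k in range(1, 11))
print(s)         # 7381/2520
print(float(s))  # the float result is only a nearby approximation
```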

Chris Angelico

Gregory Ewing

Unread,
Jun 3, 2011, 20:14:03
To:
Steven D'Aprano wrote:
> Fair point. Call it an extension of the Kronecker Delta to the reals then.

That's called the Dirac delta function, and it's a bit different --
instead of a value of 1, it has an infinitely high spike of zero
width at the origin, whose integral is 1. (Which means it's not
strictly a function, because it's impossible for a true function
on the reals to have those properties.)

You don't normally use it on its own; usually it turns up as part
of an integral. I find it difficult to imagine a numerical algorithm
that relies on directly evaluating it. Such an algorithm would be
numerically unreliable. You just wouldn't do it that way; you'd
find some other way to calculate the integral that avoids evaluating
the delta.

> y = 2.1e12
> if abs(x - y) <= 1e-9:
>     # x is equal to y, within exact tolerance
>     ...

If you expect your numbers to be on the order of 1e12, then 1e-9
is obviously not a sensible choice of tolerance. You don't just
pull tolerances out of thin air, you justify them based on
knowledge of the problem at hand.

> In practice, either the function needs some sort of "how to decide
> equality" parameter,

If it's general purpose library code, then yes, that's exactly
what it needs.

> or you use exact floating point equality and leave it
> up to the caller to make sure the arguments are correctly rounded

Not really a good idea. Trying to deal with this kind of thing
by rounding is fraught with difficulties and pitfalls. It can
only work when you're not really using floats as approximations
of reals, but as some set of discrete values, in which case
it's probably safer to use appropriately-scaled integers.

> - from William Kahan, and the C99 standard: hypot(INF, x) is always INF
> regardless of the value of x, hence hypot(INF, NAN) returns INF.
>
> - since pow(x, 0) is always 1 regardless of the value of x, pow(NAN, 0)
> is also 1.

These are different from your kronecker(), because the result
*never* depends on the value of x, whether it's NaN or not.
But kronecker() clearly does depend on the value of x sometimes.

The reasoning appears to be based on the idea that NaN means
"some value, we just don't know what it is". Accepting that
interpretation, the argument doesn't apply to kronecker().
You can't say that the NaN in kronecker(NaN, 42) doesn't
matter, because if you don't know what value it represents,
you can't be sure that it *isn't* meant to be 42.

> Another standard example where NANs get thrown away is the max and min
> functions. The latest revision of IEEE-754 (2008) allows for max and min
> to ignore NANs.

Do they provide a justification for that? I'm having trouble
seeing how it makes sense.

--
Greg

Steven D'Aprano

Unread,
Jun 3, 2011, 22:21:31
To:
On Sat, 04 Jun 2011 12:14:03 +1200, Gregory Ewing wrote:

> Steven D'Aprano wrote:
>> Fair point. Call it an extension of the Kronecker Delta to the reals
>> then.
>
> That's called the Dirac delta function, and it's a bit different

Yes, I'm familiar with the Dirac delta. As you say, it's not really
relevant to the question on hand.

In any case, my faux Kronecker was just a throw away example. If you
don't like it, throw it away! The specific example doesn't matter, since
the principle applies: functions may throw away NANs if they are not
relevant to the calculation. The presence of a NAN is not intended to be
irreversible, merely *usually* irreversible.


[...]


>> y = 2.1e12
>> if abs(x - y) <= 1e-9:
>>     # x is equal to y, within exact tolerance
>>     ...
>
> If you expect your numbers to be on the order of 1e12, then 1e-9 is
> obviously not a sensible choice of tolerance. You don't just pull
> tolerances out of thin air, you justify them based on knowledge of the
> problem at hand.

Exactly. But that's precisely what people do! Hence my comment (which you
snipped) about people feeling virtuous because they avoid "testing floats
for equality", but then they go and do an operation like the above.

I'm sure you realise this, but for anyone reading who doesn't understand
why the above is silly, there are no floats less than 1e-9 from y above.
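[Python 3.9+ can show that spacing directly via math.ulp(), which did
not exist when this was written:]

```python
import math

y = 2.1e12
# The gap between adjacent floats near 2.1e12 is 2**-12, vastly larger
# than the 1e-9 tolerance, so the test reduces to exact equality.
print(math.ulp(y))          # 0.000244140625
print(math.ulp(y) > 1e-9)   # True
```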

--
Steven

Steven D'Aprano

Unread,
Jun 4, 2011, 00:54:35
To:
On Fri, 03 Jun 2011 13:27:00 -0700, Carl Banks wrote:

> On Wednesday, June 1, 2011 5:53:26 PM UTC-7, Steven D'Aprano wrote:

[...]


>> On the contrary, it blows it out of the water and stomps its corpse
>> into a stain on the ground.
>
> Really? I am claiming that, even if everyone and their mother thought
> exceptions were the best thing ever, NaN would have been added to IEEE
> anyway because most hardware didn't support exceptions.

You can claim that the Atlantic Ocean is made of strawberry yoghurt too,
if you like, but that doesn't make it true.

The standard was written by people who made and used hardware that *did*
support exceptions (hardware traps). They wrote code in languages that
supported traps (mostly Fortran). The IEEE-754 standard mandates
exceptions (not in the sense of Python exceptions, but still exceptions),
and recommends various exception handling mechanisms, including try/catch.

NANs weren't invented because the standard writers didn't have a way of
performing exceptions. You are simply *completely wrong* on that claim.
There are plenty of documents about the IEEE-754 standard, including
draft copies of it, and interviews with some of the participants. Go do
some reading before spreading more misapprehensions.

> You are saying that the existence of one early system that supported
> exceptions not merely argument against that claim, but blows it out of
> the water? Your logic sucks then.

Not one. ALL OF THEM. All of the manufacturers who were involved in the
IEEE-754 standard had traps: Intel, Cray, DEC, CDC, and Apple.
There may have been CPUs at the time that didn't have traps, but they
weren't used for numeric work and they didn't matter. Traps were a
standard mechanism used in numeric work.


> You want to go off arguing that there were good reasons aside from
> backwards compatibility they added NaN, be my guest. Just don't go
> around saying, "Its in IEEE there 4 its a good idear LOL". Lots of
> standards have all kinds of bad ideas in them for the sake of backwards
> compatibility, and when someone goes around claiming that something is a
> good idea simply because some standard includes it, it is the first sign
> that they're clueless about what standarization actually is.

No, I don't think that supporting NANs is useful merely because it is a
standard. I've *repeatedly* said that NANs are useful as an alternative
to exceptions, so don't misrepresent what I say.


[...]


> Here's the problem: Python is not for serious numerical programming.

I disagree. So do the numpy and scipy communities, and sage, and
matplotlib. So do the Python developers: Python now has a fully IEEE-754
compliant Decimal implementation. (What I want is floats to be equally
compliant. I don't care if they default to raising exceptions.)

Despite its weaknesses, Python is a good alternative to things like
Mathematica and Matlab (which of course have weaknesses of their own),
and it's not just me comparing them:

http://vnoel.wordpress.com/2008/05/03/bye-matlab-hello-python-thanks-sage/
http://www.larssono.com/musings/matmatpy/index.html
http://blog.revolutionanalytics.com/2009/07/mathematica-vs-matlab-vs-python.html


> Yeah, it's a really good language for calling other languages to do
> numerical programming, but it's not good for doing serious numerical
> programming itself. Anyone with some theoretical problem where NaN is a
> good idea should already be using modules or separate programs written
> in C or Fortran.

And since Python is intended to be the glue between these modules, how
are you supposed to get data containing NANs between these modules unless
Python supports NANs?

I shouldn't have to fear running a snippet of Python code in case it
chokes on a NAN. That cripples Python's usefulness as a glue language for
numeric work.
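A minimal sketch of such glue code (the function name is my own, not from the thread): pure Python receiving data that may contain NANs from a numeric module, and handling them with math.isnan() rather than choking:

```python
import math

def summarize(values):
    # Average the finite entries and count the NANs, instead of
    # failing when a NAN appears in the input.
    finite = [v for v in values if not math.isnan(v)]
    nan_count = len(values) - len(finite)
    mean = sum(finite) / len(finite) if finite else float("nan")
    return mean, nan_count

data = [1.0, 2.0, float("nan"), 3.0]  # e.g. results from a C/Fortran routine
print(summarize(data))  # (2.0, 1)
```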


> Casual and lightweight numerical work (which Python is good at) is not a
> wholly separate problem domain where the typical rules ("Errors should
> never pass silently") should be swept aside.

NANs are not necessarily errors, they're hardly silent, and if you don't
want NANs, the standard mandates that there be a way to turn them off.

> [snip]
>> You'll note that, out of the box, numpy generates NANs:
>>
>> >>> import numpy
>> >>> x = numpy.array([float(x) for x in range(5)])
>> >>> x/x
>> Warning: invalid value encountered in divide
>> array([ nan,   1.,   1.,   1.,   1.])
>
> Steven, seriously I don't know what's going through your head. I'm
> saying strict adherence to IEEE is not the best idea, and you cite the
> fact that a library tries to strictly adhere to IEEE as evidence that
> strictly adhering to IEEE is a good idea. Beg the question much?

And I'm demonstrating that the people who do serious numeric work stick
to the standard as much as possible. They do this because the standard is
proven to be useful, otherwise they would abandon it, or start a new
standard.


[...]


> It's clear that IEEE's NaN handling is woefully out of place in the
> philosophy of Python, which tries to be newbie friendly and robust to
> errors;

NANs are newbie friendly, and robust to errors.

You can't get more newbie friendly than Apple's Hypertalk, sadly
abandoned. Among Mac users in the late 80s and 90s, Hypertalk, and its
front end Hypercard, was like software Lego and BASIC rolled into one.
And it supported NANs from day one.


--
Steven

Ethan Furman

Unread,
Jun 4, 2011, 02:04:38
To pytho...@python.org
Steven D'Aprano wrote:
> NANs are not necessarily errors, they're hardly silent, and if you don't
> want NANs, the standard mandates that there be a way to turn them off.

So how does one turn them off in standard Python?

~Ethan~

rusi

Unread,
Jun 4, 2011, 03:52:17
To
On Jun 4, 4:29 am, Nobody <nob...@nowhere.com> wrote:
> On Fri, 03 Jun 2011 14:52:39 +0000, Grant Edwards wrote:
> >> It's arguable that NaN itself simply shouldn't exist in Python; if
> >> the FPU ever generates a NaN, Python should raise an exception at
> >> that point.

> If you're "fluent" in IEEE-754, then you won't find its behaviour
> unexpected. OTOH, if you approach the issue without preconceptions,
> you're likely to notice that you effectively have one exception mechanism
> for floating-point and another for everything else.

Three actually: None, nan and exceptions
Furthermore, in boolean contexts nan behaves like True whereas None
behaves like False.
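A quick interactive check of that difference (my own sketch, not from the original post):

```python
nan = float("nan")

print(bool(nan))   # True  -- NAN is a truthy float
print(bool(None))  # False
print(nan == nan)  # False -- NAN never compares equal, even to itself
print(nan is nan)  # True  -- but it is still the same object
```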

Steven D'Aprano

Unread,
Jun 4, 2011, 05:35:39
To

Turn them off? You have to find a way to turn them on first! What makes
you think that Python supports IEEE-754 for floats?

By default, Decimal raises exceptions for division by zero.

>>> import decimal
>>> 1/decimal.Decimal(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.1/decimal.py", line 1359, in __rtruediv__
    return other.__truediv__(self, context=context)
  File "/usr/local/lib/python3.1/decimal.py", line 1292, in __truediv__
    return context._raise_error(DivisionByZero, 'x / 0', sign)
  File "/usr/local/lib/python3.1/decimal.py", line 3812, in _raise_error
    raise error(explanation)
decimal.DivisionByZero: x / 0


To get INF or NAN semantics is easy for decimal:

>>> decimal.setcontext(decimal.ExtendedContext)
>>> 1/decimal.Decimal(0)
Decimal('Infinity')


but impossible for float. The best you can do is subclass float, or
surround each calculation in a try...except, which defeats the point of
them.

In general, Python goes to great trouble and expense to avoid generating
any float INFs or NANs -- and when it does generate them, it's normally
at the whim of the C maths library and therefore non-portable.

>>> math.sqrt(-1.0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: math domain error

>>> decimal.Decimal(-1).sqrt()
Decimal('NaN')


And sometimes inconsistently so:

>>> math.fsum([1, 2, float('inf'), float('nan')])
nan
>>> math.fsum([1, 2, float('inf'), float('-inf')])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: -inf + inf in fsum

--
Steven

Ben Finney

Unread,
Jun 4, 2011, 06:20:19
To
Steven D'Aprano <steve+comp....@pearwood.info> writes:

> What makes you think that Python supports IEEE-754 for floats?

That would be an easy impression to get from this long rambling thread.
The argument that Python's ‘float’ type is not meant to be anything
*but* an IEEE 754 floating point type has been made several times.

What would you say Python's ‘float’ type is intended to be, if not an
IEEE 754 floating point type?

--
\ “Most people, I think, don't even know what a rootkit is, so |
`\ why should they care about it?” —Thomas Hesse, Sony BMG, 2006 |
_o__) |
Ben Finney

Nobody

Unread,
Jun 4, 2011, 15:29:45
To
On Sat, 04 Jun 2011 00:52:17 -0700, rusi wrote:

>> If you're "fluent" in IEEE-754, then you won't find its behaviour
>> unexpected. OTOH, if you approach the issue without preconceptions,
>> you're likely to notice that you effectively have one exception mechanism
>> for floating-point and another for everything else.
>
> Three actually: None, nan and exceptions

None isn't really an exception; at least, it shouldn't be used like that.
Exceptions are for conditions which are in some sense "exceptional". Cases
like dict.get() returning None when the key isn't found are meant for
the situation where the key not existing is unexceptional. If you "expect"
the key to exist, you'd use dict[key] instead (and get an exception if it
doesn't).
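A minimal illustration of the two lookup styles (my own example):

```python
d = {"spam": 1}

# Absence is unexceptional: a sentinel (or explicit default) is returned.
print(d.get("eggs"))     # None
print(d.get("eggs", 0))  # 0

# Absence is exceptional: we expected the key to be there.
try:
    d["eggs"]
except KeyError as e:
    print("KeyError:", e)  # KeyError: 'eggs'
```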

Ethan Furman

Unread,
Jun 4, 2011, 17:28:24
To pytho...@python.org
Steven D'Aprano wrote:
> On Fri, 03 Jun 2011 23:04:38 -0700, Ethan Furman wrote:
>
>> Steven D'Aprano wrote:
>>> NANs are not necessarily errors, they're hardly silent, and if you
>>> don't want NANs, the standard mandates that there be a way to turn them
>>> off.
>> So how does one turn them off in standard Python?
>
> Turn them off? You have to find a way to turn them on first! What makes
> you think that Python supports IEEE-754 for floats?

So if Python doesn't support IEEE-754 for floats, why the big deal about
NaNs? Does it have to do with how the NumPy, SciPy, Sage, etc.,
libraries interface with Python?

~Ethan~

Robert Kern

Unread,
Jun 4, 2011, 17:49:40
To pytho...@python.org
On 6/4/11 4:28 PM, Ethan Furman wrote:
> Steven D'Aprano wrote:
>> On Fri, 03 Jun 2011 23:04:38 -0700, Ethan Furman wrote:
>>
>>> Steven D'Aprano wrote:
>>>> NANs are not necessarily errors, they're hardly silent, and if you
>>>> don't want NANs, the standard mandates that there be a way to turn them
>>>> off.
>>> So how does one turn them off in standard Python?
>>
>> Turn them off? You have to find a way to turn them on first! What makes you
>> think that Python supports IEEE-754 for floats?
>
> So if Python doesn't support IEEE-754 for floats, why the big deal about NaNs?

Steven is being a little hyperbolic. Python does not fully conform to all of the
details of the IEEE-754 specification, though it does conform to most of them.
In particular, it raises an exception when you divide by 0.0 when the IEEE-754
specification states that you ought to issue the "divide by zero" or "invalid"
signal depending on the numerator (and which may be trapped by the user, but not
by default) and will return either an inf or a NaN value if not trapped. Thus,
the canonical example of a NaN-returning operation in fully-conforming IEEE-754
arithmetic, 0.0/0.0, raises an exception in Python. You can generate a NaN by
other means, namely dividing inf/inf.

One other deviation is the one which you were asking about. The standard does
say that the "invalid" signal should be issued in most circumstances that
generate a NaN and that the user should be able to trap that signal. Python
explicitly disables that mechanism. It used to provide an optional module,
fpectl, for providing a signal handler for those. However, creating a handler
for such a low-level signal in a high-level language like Python is inherently
unsafe, so it is not really supported any more.

The decimal module mostly gets it right. It translates the signals into Python
exceptions that can be disabled in a particular context.
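For example (my own sketch of the mechanism described above): by default decimal traps InvalidOperation, but the trap can be disabled in a local context, after which the same operation quietly returns a NaN:

```python
import decimal

# Default context: the "invalid" signal is trapped and raised.
try:
    decimal.Decimal(0) / decimal.Decimal(0)
except decimal.InvalidOperation:
    print("trapped: InvalidOperation")

# Disable the trap and the same operation returns a quiet NaN instead.
with decimal.localcontext() as ctx:
    ctx.traps[decimal.InvalidOperation] = False
    print(decimal.Decimal(0) / decimal.Decimal(0))  # NaN
```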

Steven D'Aprano

Unread,
Jun 4, 2011, 22:03:03
To
On Sat, 04 Jun 2011 16:49:40 -0500, Robert Kern wrote:

> Steven is being a little hyperbolic. Python does not fully conform to
> all of the details of the IEEE-754 specification, though it does conform
> to most of them.

I'm not sure that "most" is correct, but that depends on how you count
the details. Let's just say it has partial support and let's not attempt
to quantify it.

(Which is a big step up from how things were even just a few years ago,
when there wasn't even a consistent way to create special values like INF
and NAN. Many thanks to those who did that work, whoever you are!)


> In particular, it raises an exception when you divide
> by 0.0 when the IEEE-754 specification states that you ought to issue
> the "divide by zero" or "invalid" signal depending on the numerator (and
> which may be trapped by the user, but not by default) and will return
> either an inf or a NaN value if not trapped. Thus, the canonical example
> of a NaN-returning operation in fully-conforming IEEE-754 arithmetic,
> 0.0/0.0, raises an exception in Python. You can generate a NaN by other
> means, namely dividing inf/inf.

But it's inconsistent and ad hoc. The guiding philosophy of Python
floating point maths appears to be:

(1) Python will always generate an exception on any failed operation, and
never a NAN or INF (I believe I've even seen Guido explicitly state this
as a design principle);

(2) arithmetic expressions and maths functions will usually, but not
always, honour NANs and INFs if you provide then as input.

I see this thread being driven by people who have failed to notice that
(1) already applies, and so pure Python will never give them a NAN they
didn't explicitly create themselves, but want to remove (2) as well.

Personally I think Python would be a better language if it *always*
returned NANs and INFs for failed float operations, but I recognise that
I'm in a minority and that many people will prefer exceptions. Even
though I think Guido is wrong to believe that exceptions are more newbie
friendly than NANs (my Hypercard experience tells me differently), I
accept that opinions differ and I'm happy for exceptions to be the
default behaviour.

But it makes me rather annoyed when people who know nothing about
IEEE-754 special values, their uses and justification, come along and
insist that the only right answer is to throw away what little support
for them we have.


> One other deviation is the one which you were asking about. The standard
> does say that the "invalid" signal should be issued in most
> circumstances that generate a NaN and that the user should be able to
> trap that signal. Python explicitly disables that mechanism. It used to
> provide an optional module, fpectl, for providing a signal handler for
> those. However, creating a handler for such a low-level signal in a
> high-level language like Python is inherently unsafe, so it is not
> really supported any more.

More unsafe than ctypes?

In any case, I believe that in Python, catching an exception is more or
less the moral equivalent to trapping a low-level signal.


> The decimal module mostly gets it right. It translates the signals into
> Python exceptions that can be disabled in a particular context.

All I want for Christmas is for floats to offer the same level of
IEEE-754 support as decimal, only faster. And a pony.

--
Steven

Steven D'Aprano

Unread,
Jun 5, 2011, 03:21:10
To
On Sat, 04 Jun 2011 00:29:10 +0100, Nobody wrote:

> If you're "fluent" in IEEE-754, then you won't find its behaviour
> unexpected. OTOH, if you are approach the issue without preconceptions,
> you're likely to notice that you effectively have one exception
> mechanism for floating-point and another for everything else.

Returning a sentinel meaning "an exceptional event occurred" is hardly
unusual, even in Python. str.find() does so, as do re.search() and
re.match().

In any case, nobody says that NANs should replace exceptions for floats,
least of all the standard.


[...]
> As for IEEE-754 saying that it's [NAN == NAN] True: they only really
> had two choices: either it's True or it's False.

Incorrect. They could have specified that it was an error, like dividing
by zero, but they didn't. Instead, the standard specifies that there are
four mutually exclusive relationships possible:

greater than
less than
equal
unordered

and that comparisons should either return a code identifying the
relationship, or a True/False value. The standard allows for order
comparisons less_than(x, y) etc. in both signalling and quiet forms.

See section 7.11 of
http://www.validlab.com/754R/drafts/archive/2006-10-04.pdf

(the most recent draft of the 2008 standard I can find without paying for
the official standard).
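In Python terms, the four-way classification might be sketched like this (my own illustration, not part of the standard):

```python
import math

def fp_compare(x, y):
    # Classify x and y into the four mutually exclusive IEEE-754 relations.
    if math.isnan(x) or math.isnan(y):
        return "unordered"
    if x < y:
        return "less than"
    if x > y:
        return "greater than"
    return "equal"

print(fp_compare(1.0, 2.0))           # less than
print(fp_compare(float("nan"), 1.0))  # unordered
```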


> NaNs provide "exceptions" even if the
> hardware or the language lacks them, but that falls down once you leave
> the scope of floating-point. It wouldn't have been within IEEE-754's
> ambit to declare that comparing NaNs should return NaB (Not A Boolean).

Of course it would have been. That's effectively what the standard
actually does. Not "Not A Bool" per se, but comparisons can return
"Unordered", or they can signal.

--
Steven

Erik Max Francis

Unread,
Jun 5, 2011, 03:27:34
To
Gregory Ewing wrote:
> Steven D'Aprano wrote:
>> Fair point. Call it an extension of the Kronecker Delta to the reals
>> then.
>
> That's called the Dirac delta function, and it's a bit different --
> instead of a value of 1, it has an infinitely high spike of zero
> width at the origin, whose integral is 1. (Which means it's not
> strictly a function, because it's impossible for a true function
> on the reals to have those properties.)
>
> You don't normally use it on its own; usually it turns up as part
> of an integral. I find it difficult to imagine a numerical algorithm
> that relies on directly evaluating it. Such an algorithm would be
> numerically unreliable. You just wouldn't do it that way; you'd
> find some other way to calculate the integral that avoids evaluating
> the delta.

True, but that's the Dirac delta, which as you (and later he) said, is
quite a different thing, not simply a Kronecker delta extended to the
reals. Kronecker deltas are used all the time over the reals; for
instance, in tensor calculus. Just because the return values are either
0 or 1 doesn't mean that their use is incompatible over reals (as
integers are subsets of reals).
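For what it's worth, a naive Kronecker delta over floats (my own sketch) is exactly the kind of function that NAN's reflexive inequality affects:

```python
def kronecker(x, y):
    # Kronecker delta extended to floats: 1.0 when the arguments are equal.
    return 1.0 if x == y else 0.0

print(kronecker(2.0, 2.0))  # 1.0
print(kronecker(2.0, 3.0))  # 0.0

# Since NAN != NAN, the delta of a NAN with itself is 0.0:
nan = float("nan")
print(kronecker(nan, nan))  # 0.0
```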

--
Erik Max Francis && m...@alcyone.com && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM/Y!M/Skype erikmaxfrancis
It is a rough road that leads to the heights of greatness.
-- Seneca, 4 BC-65 AD

Nobody

Unread,
Jun 5, 2011, 14:15:02
To
On Sun, 05 Jun 2011 07:21:10 +0000, Steven D'Aprano wrote:

> Returning a sentinel meaning "an exceptional event occurred" is hardly
> unusual, even in Python. str.find() does so, as do re.search() and
> re.match().

These are not "exceptional" conditions; if they were, an exception would
be used.

E.g. dict supports both d.get(key) and d[key] for lookups. The former
returns a sentinel, the latter raises an exception. The latter makes sense
if you "expect" the key to be present, the former if you don't.

>> As for IEEE-754 saying that it's [NAN == NAN] True: they only really
>> had two choices: either it's True or it's False.
>
> Incorrect. They could have specified that it was an error, like dividing
> by zero, but they didn't.

Specifying an error doesn't remove the requirement to also specify a
result. E.g. dividing a finite value by zero produces a result of
infinity. In languages which lack exceptions, errors only matter if the
code bothers to check for them (if such checks are even possible; C89
lacks <fenv.h>).

Robert Kern

Unread,
Jun 5, 2011, 15:44:23
To pytho...@python.org
On 6/4/11 9:03 PM, Steven D'Aprano wrote:
> On Sat, 04 Jun 2011 16:49:40 -0500, Robert Kern wrote:
>
>> Steven is being a little hyperbolic. Python does not fully conform to
>> all of the details of the IEEE-754 specification, though it does conform
>> to most of them.
>
> I'm not sure that "most" is correct, but that depends on how you count
> the details. Let's just say it has partial support and let's not attempt
> to quantify it.

Fair enough. When I said "most", I was really counting in terms of operations
that are actually performed, i.e. successful ones. If I have two regular
floating point numbers x and y and add them together, the result is going to be
what the IEEE-754 standard specifies almost all of the time. Almost every flop
you actually do in Python will give you the IEEE-754 answer.

Of course, where Python tends to diverge is in the failure modes and error
conditions. And also of course, the standard has more rules for all of those
special cases, so saying that it conforms to "most of the rules" is not quite
right, either.

> (Which is a big step up from how things were even just a few years ago,
> when there wasn't even a consistent way to create special values like INF
> and NAN. Many thanks to those who did that work, whoever you are!)
>
>
>> In particular, it raises an exception when you divide
>> by 0.0 when the IEEE-754 specification states that you ought to issue
>> the "divide by zero" or "invalid" signal depending on the numerator (and
>> which may be trapped by the user, but not by default) and will return
>> either an inf or a NaN value if not trapped. Thus, the canonical example
>> of a NaN-returning operation in fully-conforming IEEE-754 arithmetic,
>> 0.0/0.0, raises an exception in Python. You can generate a NaN by other
>> means, namely dividing inf/inf.
>
> But it's inconsistent and ad hoc. The guiding philosophy of Python
> floating point maths appears to be:
>
> (1) Python will always generate an exception on any failed operation, and
> never a NAN or INF (I believe I've even seen Guido explicitly state this
> as a design principle);

Well, if so, then it doesn't do it very well:

[~]
|2> inf = 1e300 * 1e300

[~]
|3> nan = inf / inf

[~]
|4> inf
inf

[~]
|5> nan
nan

> More unsafe than ctypes?

More difficult to implement safely and more tempting to do unsafe things. If I
remember correctly, I don't think you are supposed to allocate new memory inside
such a signal handler. Of course, that's almost impossible to do in pure Python
code.

> In any case, I believe that in Python, catching an exception is more or
> less the moral equivalent to trapping a low-level signal.

I agree.

>> The decimal module mostly gets it right. It translates the signals into
>> Python exceptions that can be disabled in a particular context.
>
> All I want for Christmas is for floats to offer the same level of
> IEEE-754 support as decimal, only faster. And a pony.

Hear-hear!

Chris Torek

Unread,
Jun 5, 2011, 18:54:40
To
In article <mailman.2438.1307133...@python.org>
Chris Angelico <ros...@gmail.com> wrote:
>Uhh, noob question here. I'm way out of my depth with hardware
>floating point.
>
>Isn't a signaling nan basically the same as an exception?

Not exactly, but one could think of them as "very similar".

Elsethread, someone brought up the key distinction, which is
that in hardware that implements IEEE arithmetic, you have two
possibilities at pretty much all times:

- op(args) causes an exception (and therefore does not deliver
a result), or
- op(args) delivers a result that may indicate "exception-like
lack of result".

In both cases, a set of "accrued exceptions" flags accumulates the
new exception, and a set of "most recent exceptions" flags tells
you about the current exception. A set of "exception enable"
flags -- which has all the same elements as "current" and
"accrued" -- tells the hardware which "exceptional results"
should trap.

A number is "NaN" if it has all-1-bits for its exponent and at
least one nonzero bit in its mantissa. (All-1s exponent, all-0s
mantissa represents Infinity, of the sign specified by the sign
bit.) For IEEE double precision floating point, there are 52
mantissa bits, so there are (2^52-1) different NaN bit patterns.
One of those 52 bits is the "please signal on use" bit.
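Those bit patterns are easy to inspect from Python with the struct module (a sketch of mine; the exact NaN payload is platform-dependent, though the values shown are typical for IEEE-754 doubles):

```python
import struct

def bits(x):
    # Return the 64-bit IEEE-754 pattern of a Python float as hex.
    return hex(struct.unpack("<Q", struct.pack("<d", x))[0])

print(bits(float("inf")))  # 0x7ff0000000000000 : all-1s exponent, zero mantissa
print(bits(float("nan")))  # typically 0x7ff8000000000000 : nonzero mantissa
```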

A signalling NaN traps at (more or less -- details vary depending
on FPU architecture) load time. However, there must necessarily
(for OS and thread-library level context switching) be a method
of saving the FPU state without causing an exception when loading
a NaN bit pattern, even if the NaN has the "signal" bit set.

>Which would imply that the hardware did support exceptions (if it
>did indeed support IEEE floating point, which specifies signalling nan)?

The actual hardware implementations (of which there are many) handle
the niggling details differently. Some CPUs do not implement
Infinity and NaN in hardware at all, delivering a trap to the OS
on every use of an Inf-or-NaN bit pattern. The OS then has to
emulate what the hardware specification says (if anything), and
make it look as though the hardware did the job. Sometimes denorms
are also done in software.

Some implementations handle everything directly in hardware, and
some of those get it wrong. :-) Often the OS has to fix up some
special case -- for instance, the hardware might trap on every NaN
and make software decide whether the bit pattern was a signalling
NaN, and if so, whether user code should receive an exception.

As I think John Nagle pointed out earlier, sometimes the hardware
does "support" exceptions, but rather loosely, where the hardware
delivers a morass of internal state and a vague indication that
one or more exceptions happened "somewhere near address <A>",
leaving a huge pile of work for software.

In Python, the decimal module gets everything either right or
close-to-right per the (draft? final? I have not kept up with
decimal FP standards) standard. Internal Python floating point,
not quite so much.

Chris Angelico

Unread,
Jun 5, 2011, 19:13:25
To pytho...@python.org
On Mon, Jun 6, 2011 at 8:54 AM, Chris Torek <nos...@torek.net> wrote:
> A signalling NaN traps at (more or less -- details vary depending
> on FPU architecture) load time.

Load. By this you mean the operation of taking a bit-pattern in RAM
and putting it into a register? So, you can calculate 0/0, get a
signalling NaN, and then save that into a memory variable, all without
it trapping; and then it traps when you next perform an operation on
that number?

Apologies, this is getting quite off-topic and away from Python.

Chris Angelico

Steven D'Aprano

Unread,
Jun 5, 2011, 20:55:18
To
On Sun, 05 Jun 2011 19:15:02 +0100, Nobody wrote:

> On Sun, 05 Jun 2011 07:21:10 +0000, Steven D'Aprano wrote:
>
>> Returning a sentinel meaning "an exceptional event occurred" is hardly
>> unusual, even in Python. str.find() does so, as do re.search() and
>> re.match().
>
> These are not "exceptional" conditions; if they were, an exception would
> be used.


Exceptional does not mean rare or unexpected. Searching for a substring
returns the offset of that substring. If it is not found, that's the
exceptional case.

str.index raises an exception, and str.find returns a sentinel:

>>> "spam".index("z")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: substring not found
>>> "spam".find("z")
-1

>>> As for IEEE-754 saying that it's [NAN == NAN] True: they only really
>>> had two choices: either it's True or it's False.
>>
>> Incorrect. They could have specified that it was an error, like
>> dividing by zero, but they didn't.
>
> Specifying an error doesn't remove the requirement to also specify a
> result.

Untrue, but irrelevant. (Standards often allow implementation-dependent
behaviour.) The standard *could* have said that NAN == NAN be an error,
but *didn't*, so what it should or shouldn't have done if it were an
error is irrelevant, because it's not an error.

And thus we come back full circle. Hundreds of words, and I'm still no
closer to understanding why you think that "NAN == NAN" should be an
error.

--
Steven

Steven D'Aprano

Unread,
Jun 5, 2011, 21:21:43
To
On Mon, 06 Jun 2011 09:13:25 +1000, Chris Angelico wrote:

> On Mon, Jun 6, 2011 at 8:54 AM, Chris Torek <nos...@torek.net> wrote:
>> A signalling NaN traps at (more or less -- details vary depending on
>> FPU architecture) load time.
>
> Load. By this you mean the operation of taking a bit-pattern in RAM and
> putting it into a register? So, you can calculate 0/0, get a signalling
> NaN, and then save that into a memory variable, all without it trapping;
> and then it traps when you next perform an operation on that number?

The intended behaviour is that operations on "quiet NANs" should return NANs,
but operations on "signalling NANs" should cause a trap, which can either
be ignored, and converted into a quiet NAN, or treated as an exception.

E.g. in Decimal:


>>> import decimal
>>> qnan = decimal.Decimal('nan') # quiet NAN
>>> snan = decimal.Decimal('snan') # signalling NAN
>>> 1 + qnan
Decimal('NaN')
>>> 1 + snan
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.1/decimal.py", line 1108, in __add__
    ans = self._check_nans(other, context)
  File "/usr/local/lib/python3.1/decimal.py", line 746, in _check_nans
    self)
  File "/usr/local/lib/python3.1/decimal.py", line 3812, in _raise_error
    raise error(explanation)
decimal.InvalidOperation: sNaN

> Apologies, this is getting quite off-topic and away from Python.

Not at all. I think this is a big myth, that the IEEE-754 standard is
irrelevant for high-level programming languages. It's not.

The state of the art of floating point is in a poor state. Not anywhere
near as poor as the bad old days before there was *any* standardization
at all, things were terrible back then, but ignoring the hard-earned
lessons of those who lived through the days before the standard is a
mistake. IEEE-754 is not just for hardware, particularly since now the
vast majority of machines run hardware which almost completely conforms
to IEEE-754. The bottleneck now is not hardware, but languages that don't
treat floating point maths correctly.

http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf


(The article is seven years old now, but as far as I know, the criticisms
still apply.)


--
Steven

Chris Torek

Unread,
Jun 5, 2011, 21:56:51
To
>> On Mon, Jun 6, 2011 at 8:54 AM, Chris Torek <nos...@torek.net> wrote:
>>> A signalling NaN traps at (more or less -- details vary depending on
>>> FPU architecture) load time.

>On Mon, 06 Jun 2011 09:13:25 +1000, Chris Angelico wrote:
>> Load. By this you mean the operation of taking a bit-pattern in RAM and
>> putting it into a register? So, you can calculate 0/0, get a signalling
>> NaN, and then save that into a memory variable, all without it trapping;
>> and then it traps when you next perform an operation on that number?

I mean, if you think of the FPU as working (in principle) with
either just one or two registers and a load/store architecture, or
a tiny little FPU-stack (the latter is in fact the case for Intel
FPUs), with no optimization, you get a trap when you attempted to
load-up the sNaN value in order to do some operation on it. For
instance, if x is an sNaN, "y = x + 1" turns into "load x; load
1.0; add; store y" and the trap occurs when you do "load x".

In article <4dec2ba6$0$29996$c3e8da3$5496...@news.astraweb.com>,
Steven D'Aprano <steve+comp....@pearwood.info> wrote:
>The intended behaviour is that operations on "quiet NANs" should return NANs,
>but operations on "signalling NANs" should cause a trap, which can either
>be ignored, and converted into a quiet NAN, or treated as an exception.
>
>E.g. in Decimal:
>
>>>> import decimal
>>>> qnan = decimal.Decimal('nan') # quiet NAN
>>>> snan = decimal.Decimal('snan') # signalling NAN
>>>> 1 + qnan
>Decimal('NaN')
>>>> 1 + snan
>Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "/usr/local/lib/python3.1/decimal.py", line 1108, in __add__
> ans = self._check_nans(other, context)
> File "/usr/local/lib/python3.1/decimal.py", line 746, in _check_nans
> self)
> File "/usr/local/lib/python3.1/decimal.py", line 3812, in _raise_error
> raise error(explanation)
>decimal.InvalidOperation: sNaN

Moreover:

>>> cx = decimal.getcontext()
>>> cx
Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999999, Emax=999999999, capitals=1, flags=[], traps=[DivisionByZero, Overflow, InvalidOperation])
>>> cx.traps[decimal.InvalidOperation] = False
>>> snan
Decimal("sNaN")
>>> 1 + snan
Decimal("NaN")

so as you can see, by ignoring the InvalidOperation exception, we
had our sNaN converted to a (regular, non-signal-ing, "quiet") NaN,
and 1 + NaN is still NaN.

(I admit that my mental model using "loads" can mislead a bit since:

>>> cx.traps[decimal.InvalidOperation] = True # restore trapping
>>> also_snan = snan
>>>

A simple copy operation is not a "load" in this particular sense,
and on most real hardware, one just uses an ordinary 64-bit integer
memory-copying operation to copy FP bit patterns from one place to
another.)

There is some good information on wikipedia:

http://en.wikipedia.org/wiki/NaN

(Until I read this, I was not aware that IEEE now recommends that
the quiet-vs-signal bit be 1-for-quiet 0-for-signal. I prefer the
other way around since you can then set memory to all-1-bits if it
contains floating point numbers, and get exceptions if you refer
to a value before seting it.)

Chris Angelico

Unread,
Jun 6, 2011, 00:11:03
To pytho...@python.org
On Mon, Jun 6, 2011 at 11:21 AM, Steven D'Aprano
<steve+comp....@pearwood.info> wrote:
> The intended behaviour is that operations on "quiet NANs" should return NANs,
> but operations on "signalling NANs" should cause a trap, which can either
> be ignored, and converted into a quiet NAN, or treated as an exception.
>
> E.g. in Decimal: [snip]

So does this mean that:

a = 0.0/0.0
b = a + 1

(with signalling NANs) should trap on the second line but not the
first? That's the first "operation on a nan".

> http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf
> (The article is seven years old now, but as far as I know, the criticisms
> still apply.)

Thanks, that's my travel-home literature for tonight! :) I read the
other two articles you sent me (asynchronously), and they're most
interesting. I'm definitely still inclined to avoid any sort of
floating point work if at all possible, but hey, this gives me more
topics to bore people with at parties! (Wait. I never get invited to
parties any more. I think my work on that front is complete.)

Chris Angelico
