That does not match my experience. In Python 3.2, I generate a large
unicode string, and an equal but not identical copy:
s = "aЖcdef"*100000
t = "a" + s[1:]
assert s is not t and s == t
Using timeit, s == s is about 10000 times faster than s == t.
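A minimal sketch of the same measurement on a modern CPython (it assumes Python 3.5+ for timeit's globals argument). The exact ratio depends on the machine and interpreter version. The point is only that comparing an object with itself can short-circuit, while an equal copy forces a character-by-character comparison.

```python
import timeit

# Build a large string containing a non-ASCII character, as in the post.
s = "aЖcdef" * 100000
# Slicing and concatenating produces a new, equal (but not identical) string.
t = "a" + s[1:]
assert s is not t and s == t

# Time comparing an object with itself versus with an equal copy.
same = timeit.timeit("s == s", globals={"s": s}, number=1000)
equal = timeit.timeit("s == t", globals={"s": s, "t": t}, number=1000)
print(f"s == s: {same:.6f}s  s == t: {equal:.6f}s  ratio ~{equal / same:.0f}x")
```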
--
Steven
This was discussed not long ago in a different thread. Here is the line:
http://hg.python.org/cpython/file/bd8afb90ebf2/Objects/unicodeobject.c#l10508
As I understood it, that line is the reason that comparisons for
interned strings are faster.
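Oscar's point can be illustrated with sys.intern. This sketch assumes the linked line is the identity shortcut in CPython's str comparison: interning maps equal strings to one canonical object, so a later == between them can return on identity alone. Interning is a CPython implementation detail, and the strings below are arbitrary examples.

```python
import sys

# Build two equal strings at runtime so they are separate objects.
a = "".join(["inter", "ned"])
b = "".join(["inter", "ned"])
assert a == b and a is not b

# sys.intern returns one canonical object per distinct string value, so
# equal interned strings are identical and == can short-circuit on identity.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)  # prints: True
```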
Oscar
I'm pretty sure the behaviour is correct. When I get home this evening,
I will check my copy of the Standard Apple Numerics manual (one of the
first IEEE 754 compliant systems). In the meantime, I quote from
"What Every Computer Scientist Should Know About Floating-Point
Arithmetic"
"Since comparing a NaN to a number with <, ≤, >, ≥, or = (but not ≠)
always returns false..."
(Admittedly it doesn't specifically state the case of comparing a NAN
with a NAN.)
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
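For reference, Python floats follow the rule quoted from Goldberg, including the NaN-with-NaN case the quote leaves implicit. A small sketch:

```python
nan = float("nan")

# Every ordered comparison, and ==, involving a NaN is False...
results = {
    "nan < 1.0": nan < 1.0,
    "nan <= 1.0": nan <= 1.0,
    "nan > 1.0": nan > 1.0,
    "nan >= 1.0": nan >= 1.0,
    "nan == 1.0": nan == 1.0,
    "nan == nan": nan == nan,
}
# ...while != is True, even when comparing a NaN with itself.
results["nan != nan"] = nan != nan

for expr, value in results.items():
    print(f"{expr:12} -> {value}")
```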
On Oct 9, 2012 9:20 AM, "Greg Ewing" <greg....@canterbury.ac.nz> wrote:
>
> Oscar Benjamin wrote:
>>
>> The main purpose of quiet NaNs is to propagate through computation
>> ruining everything they touch.
>
>
> But they stop doing that as soon as they hit an if statement.
> It seems to me that the behaviour chosen for NaN comparison
> could just as easily make things go wrong as make them go
> right. E.g.
>
> while not (error < epsilon):
>     find_a_better_approximation()
>
> If error ever ends up being NaN, this will go into an
> infinite loop.
I would expect an experienced numericist to be aware of the possibility of a NaN and to make a trivial modification to your loop, taking advantage of the simple fact that any comparison with NaN (other than !=) returns false. It only fails because you have artificially placed a "not" in the while clause. I would have tested for error > eps without even thinking about NaNs.
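The difference between the two loop conditions can be sketched with a hypothetical refine() step that breaks down and returns NaN. The iteration cap is an addition so the demonstration always halts.

```python
import math

EPSILON = 1e-9

def refine(error):
    # Hypothetical refinement step that breaks down and yields NaN.
    return math.nan

def iterations(keep_going, error=1.0, limit=100):
    """Count loop iterations, capped at `limit` so the demo always halts."""
    count = 0
    while keep_going(error) and count < limit:
        error = refine(error)
        count += 1
    return count

# Greg's form: "while not (error < epsilon)" keeps going once error is NaN.
greg = iterations(lambda e: not (e < EPSILON))
# Oscar's form: "while error > epsilon" stops, because NaN > EPSILON is False.
oscar = iterations(lambda e: e > EPSILON)
print(greg, oscar)  # prints: 100 1
```

The two forms also differ when error equals epsilon exactly: the first keeps looping, the second stops.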
Oscar
> The main purpose of quiet NaNs is to propagate through computation
> ruining everything they touch. In a programming language like C that
> lacks exceptions this is important as it allows you to avoid checking
> all the time for invalid values, whilst still being able to know if
> the end result of your computation was ever affected by an invalid
> numerical operation.
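The "check once at the end" pattern described here can be sketched in Python. Python raises exceptions for some invalid operations (for example, 0.0/0.0 raises ZeroDivisionError), so this hypothetical pipeline injects inf - inf, which quietly yields NaN and then propagates through ordinary arithmetic.

```python
import math

def pipeline(values):
    """Hypothetical numeric pipeline with no intermediate validity checks."""
    total = 0.0
    for v in values:
        total += v * 2.0 - 1.0
    return math.sqrt(total * total) / len(values)

ok = pipeline([1.0, 2.0, 3.0])
# inf - inf is an invalid operation; in Python it quietly produces a NaN.
bad = pipeline([1.0, math.inf - math.inf, 3.0])

# A single check at the end reveals whether anything invalid happened.
print(math.isnan(ok), math.isnan(bad))  # prints: False True
```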
Correct, but I'd like to point out that NaNs are a bit more
sophisticated than just "numeric contagion".
1) NaNs carry payload, so you can actually identify what sort of
calculation failed. E.g. NaN-27 might mean "logarithm of a negative
number", while NaN-95 might be "inverse trig function domain error".
Any calculation involving a single NaN is supposed to propagate the
same payload, so at the end of the calculation you can see that you
tried to take the log of a negative number and debug accordingly.
2) On rare occasions, NaNs can validly disappear from a calculation,
leaving you with a non-NaN answer. The rule is, if you can replace
the NaN with *any* other value, and still get the same result, then
the NaN is irrelevant and can be consumed. William Kahan gives an
example:
For example, 0*NaN must be NaN because 0*∞ is an INVALID
operation (NaN). On the other hand, for hypot(x, y) :=
√(x*x + y*y) we find that hypot(∞, y) = +∞ for all real y,
finite or not, and deduce that hypot(∞, NaN) = +∞ too;
naive implementations of hypot may do differently.
Page 7 of http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
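Both points can be poked at from Python, with caveats. Python has no API for NaN payloads, so this sketch uses the struct module to read and write the raw bits of an IEEE 754 double. The code 27 is just the hypothetical label from the example above, and whether arithmetic preserves a payload is up to the hardware and C compiler, not Python. Kahan's two cases are shown with 0.0 * math.nan and math.hypot.

```python
import math
import struct

QUIET_NAN_BITS = 0x7FF8000000000000  # all-ones exponent plus the quiet bit
PAYLOAD_MASK = 0x0007FFFFFFFFFFFF    # the 51 mantissa bits below the quiet bit

def make_nan(payload):
    """Build a quiet NaN whose low mantissa bits hold `payload`."""
    bits = QUIET_NAN_BITS | (payload & PAYLOAD_MASK)
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def nan_payload(x):
    """Return the low mantissa bits of the float `x`."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits & PAYLOAD_MASK

# Hypothetical code 27 standing for "logarithm of a negative number".
log_error = make_nan(27)
print(math.isnan(log_error), nan_payload(log_error))  # prints: True 27

# Python does not promise that arithmetic preserves the payload; the value
# printed here depends on the hardware and C compiler.
print(nan_payload(log_error * 2.0 + 1.0))

# Kahan's examples: 0*NaN stays NaN because 0*inf is invalid, while
# hypot(inf, y) is inf for every y, so hypot(inf, NaN) can be inf too.
print(0.0 * math.nan)                   # prints: nan
print(math.hypot(math.inf, math.nan))   # prints: inf
```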
--
Steven
On 10/10/12 09:13, Joshua Landau wrote:
> Just a curiosity here (as I can guess of plausible reasons myself, so there
> probably are some official stances).
> Is there a reason NaNs are not instances of NaN class?
Because that would complicate Python's using floats for absolutely no benefit.
Instead of float operations always returning a float, they would have to return
a float or a NAN. To check for a valid floating point instance, instead of
saying:
isinstance(x, float)
you would have to say:
isinstance(x, (float, NAN))
>>> class NAN(float):
...     def __new__(cls):
...         return float.__new__(cls, "nan")
...     def __eq__(self, other):
...         return other is self
...
>>> isinstance(NAN(), float)
True
>>> NAN() is NAN()
False
>>> NAN() == NAN()
False
>>> x = NAN()
>>> x is x
True
>>> x == x
True
>>> x
nan
And what about infinities, denorm numbers, and negative zero? Do they get
dedicated classes too?
And what is the point of this added complexity? Nothing.
You *still* have the rule that "x == x for all x, except for NANs".
The only difference is that "NANs" now means "instances of NAN class" rather than
"NAN floats" (and Decimals).
Working with IEEE 754 floats is now far more of
a nuisance because some valid floating point values aren't floats but have a
different class, but nothing meaningful is different.
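A quick sketch of the status quo Steven describes: infinities, subnormals, negative zero and NaN are all plain float instances today, and the "x == x except for NANs" rule applies to Decimal NaNs as well as float NaNs. The particular values listed are just examples.

```python
import math
from decimal import Decimal

# Infinities, the smallest subnormal, negative zero and NaN are all floats.
specials = [math.inf, -math.inf, 5e-324, -0.0, math.nan]
assert all(type(v) is float for v in specials)

# The reflexivity exception applies to float NaNs and Decimal NaNs alike.
fnan = float("nan")
dnan = Decimal("NaN")
print(fnan == fnan, dnan == dnan)  # prints: False False
# Every non-NaN special value is equal to itself.
print(all(v == v for v in specials if not math.isnan(v)))  # prints: True
```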
> Then x == x would be True (as they want), but [this NaN] == [that NaN]
> would be False, as expected.
Making NANs their own class wouldn't give you that. If we wanted that
behaviour, we could have it without introducing a NAN class: just change the
list __eq__ method to scan the list for a NAN using math.isnan before checking
whether the lists were identical.
>>> x == x
True
>>> [NAN()] == [NAN()]
False
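The trade-off can be made concrete. Today, list equality tries identity before == on each element, so the same float NaN object compares equal to itself inside a list. The function nan_scanning_list_eq below is a hypothetical sketch of the alternative Steven describes, not a proposed implementation: it pays for a linear NaN scan on every call, even when the two lists are the same object.

```python
import math

x = float("nan")

# Today: element comparison in lists tries identity first, so the same NaN
# object compares equal to itself inside a container...
print(x == x, [x] == [x])                  # prints: False True
# ...but two distinct NaN objects do not.
print([float("nan")] == [float("nan")])    # prints: False

def has_float_nan(items):
    return any(isinstance(v, float) and math.isnan(v) for v in items)

def nan_scanning_list_eq(a, b):
    """Hypothetical list equality: scan for NaNs before trusting identity."""
    # If `a` holds a NaN, no element-wise comparison can succeed. This scan
    # runs even when a is b: the work the identity shortcut exists to avoid.
    if has_float_nan(a):
        return False
    return a is b or a == b

print(nan_scanning_list_eq([x], [x]))              # prints: False
print(nan_scanning_list_eq([1.0, 2.0], [1.0, 2.0]))  # prints: True
```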
But that would defeat the purpose of the identity check (an optimization to
avoid scanning the list)! Replacing math.isnan with isinstance doesn't change
that.
> I guess that raises the question about why x == x but sqrt(-1) != sqrt(-1),
That question has already been raised, and answered, repeatedly in this thread.
> but it seems a lot less of a big deal than all of the exceptions with
> container equalities.
Container equalities are not a big deal. I'm not sure what problem you think
you are solving.
Why would you assume that? I mentioned it from honest curiosity, and all I got back was an attack. Please, I want to be civil but you need to act less angrily.
On 11/10/12 09:05, Joshua Landau wrote:
> After re-re-reading this thread, it turns out one *(1)* post and two
> *(2)* answers to that post have covered a topic very similar to the one I
> have raised. All of the others, to my understanding, do not dwell over the
> fact that *float("nan") is not float("nan")*.
That's no different from any other float.
py> float('nan') is float('nan')
False
py> float('1.5') is float('1.5')
False
Floats are not interned or cached, although of course interning is
implementation dependent and this is subject to change without notice.
For that matter, it's true of *nearly all builtins* in Python. The
exceptions are bool(obj), which returns one of two fixed instances,
and int() and str(), where *some* but not all instances are cached.
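Since fresh floats are never guaranteed to be the same object, identity is not a test for NaN. A short sketch of the distinction, using math.isnan (or the x != x idiom) for the actual check:

```python
import math

a = float("nan")
b = float("nan")
# Each call creates a fresh float object, NaN or not.
print(a is b, float("1.5") is float("1.5"))  # prints: False False

# So identity is not a NaN test; use math.isnan (or the x != x idiom).
print(math.isnan(a), math.isnan(b), a != a)  # prints: True True True

# bool() always returns one of two singletons.
print(bool(0) is False, bool([1]) is True)   # prints: True True
```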
> Response 1:
> This implies that you want to differentiate between -0.0 and +0.0. That is
> bad.
> My response:
> Why would I want to do that?
If you are doing numeric work, you *should* differentiate between -0.0
and 0.0. That's why the IEEE 754 standard mandates a -0.0.
Both -0.0 and 0.0 compare equal, but they can be distinguished (although
doing so is tricky in Python). The reason for distinguishing them is to
tell whether a value underflowed to zero from the positive or the negative
side. E.g. log(x) should return -infinity if x underflows from a positive
value, and a NaN if x underflows from a negative.
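One way to do the "tricky" distinguishing in Python is a sketch with math.copysign, which recovers the sign bit. math.atan2 is a function whose result depends on the sign of a zero argument. Python's math.log raises ValueError for both zeros, so it is not used here. The underflowing products are just example values.

```python
import math

neg_zero = -1e-300 * 1e-300   # underflows to zero from the negative side
pos_zero = 1e-300 * 1e-300    # underflows to zero from the positive side

# The two zeros compare equal...
print(neg_zero == pos_zero)   # prints: True
# ...but the sign bit survives and can be recovered.
print(math.copysign(1.0, neg_zero), math.copysign(1.0, pos_zero))  # prints: -1.0 1.0
print(repr(neg_zero), repr(pos_zero))  # prints: -0.0 0.0
# atan2 on the branch cut depends on the sign of the zero.
print(math.atan2(neg_zero, -1.0), math.atan2(pos_zero, -1.0))
```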