Float precision and float equality

Anton81

Dec 5, 2009, 10:37:17 AM12/5/09
to
I'd like to do calculations with floats and at some point equality of
two numbers will be checked.
What is the best way to make sure that equality of floats will be
detected, where I assume that mismatches beyond a certain point are
due to truncation errors?

Mark Dickinson

Dec 5, 2009, 11:46:55 AM12/5/09
to

Well, it depends a lot on the details of the application, but
a good general scheme is to allow both a fixed relative error
and an absolute error, and to assert that your two values are
'nearly equal' if they're within *either* the relative error *or*
the absolute error. Something like, for example:

def almostEqual(expected, actual, rel_err=1e-7, abs_err=1e-20):
    absolute_error = abs(actual - expected)
    return absolute_error <= max(abs_err, rel_err * abs(expected))

Then choose the values of rel_err and abs_err to suit your
application.
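
A minimal sketch of how the two tolerances interact, using the function above with its default values (the sample inputs are made up for illustration):

```python
def almostEqual(expected, actual, rel_err=1e-7, abs_err=1e-20):
    absolute_error = abs(actual - expected)
    return absolute_error <= max(abs_err, rel_err * abs(expected))

# For ordinary magnitudes the relative error is the one that matters.
print(almostEqual(1e10, 1e10 + 1.0))   # True: 1.0 <= 1e-7 * 1e10
print(almostEqual(1.0, 1.001))         # False: 1e-3 > 1e-7

# The absolute error only comes into play when expected is at or near zero.
print(almostEqual(0.0, 1e-21))         # True: 1e-21 <= 1e-20
print(almostEqual(0.0, 1e-12))         # False: 1e-12 > 1e-20
```

Later Python versions (3.5 and up) ship math.isclose, which follows the same idea but scales the relative tolerance by the larger of the two inputs rather than by expected alone.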

What sort of calculations are you doing?
--
Mark

Raymond Hettinger

Dec 5, 2009, 3:25:48 PM12/5/09
to

Less short answer: use Decimal with a high precision and then round()
or quantize.
Long answer: the amount of error depends on the calculation and the
scale of the inputs; some calculations potentially propagate tiny
errors to the point where they become large enough to overpower the
signal in your data (e.g. the Lorenz equations or some other chaotic
sequence).
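
A rough sketch of the Decimal-then-quantize idea, assuming a working precision of 50 digits and a comparison grid of six decimal places (both values, and the helper name snap, are picked only for illustration):

```python
from decimal import Decimal, getcontext

getcontext().prec = 50          # assumed "high" working precision
GRID = Decimal("0.000001")      # assumed grid: six decimal places

def snap(value):
    """Quantize a Decimal to the comparison grid (hypothetical helper)."""
    return value.quantize(GRID)

float_total = sum(0.1 for _ in range(10))
decimal_total = sum(Decimal("0.1") for _ in range(10))

print(float_total == 1.0)                        # False: binary rounding error
print(snap(decimal_total) == snap(Decimal(1)))   # True

# Values that straddle a grid boundary still compare unequal after snapping.
a = snap(Decimal("0.1234565"))
b = snap(Decimal("0.1234565000000000001"))
print(a == b)                                    # False
```

The last comparison shows the limit of the approach: snapping to a grid cannot make two arbitrarily close values equal if a rounding boundary lies between them.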

Raymond

Mark Dickinson

Dec 5, 2009, 3:56:38 PM12/5/09
to
On Dec 5, 8:25 pm, Raymond Hettinger <pyt...@rcn.com> wrote:
> On Dec 5, 7:37 am, Anton81 <gerenu...@googlemail.com> wrote:
>
> > I'd like to do calculations with floats and at some point equality of
> > two numbers will be checked.
> > What is the best way to make sure that equality of floats will be
> > detected, where I assume that mismatches beyond a certain point are
> > due to truncation errors?
>

Can you explain how this would work? I'm imagining a test
something like:

if round(x, 6) == round(y, 6): ...

but that still would end up missing some cases where x and y
are equal to within 1ulp, which presumably isn't what's wanted:

>>> x, y = 0.1234565, 0.123456500000000004
>>> round(x, 6) == round(y, 6)
False

--
Mark

sturlamolden

Dec 5, 2009, 7:31:05 PM12/5/09
to

isequal = lambda x,y : abs(x-y) < eps

where eps is the truncation error.

Tim Roberts

Dec 6, 2009, 2:42:28 AM12/6/09
to
Raymond Hettinger <pyt...@rcn.com> wrote:
>
> if not round(x - y, 6): ...

That's a dangerous suggestion. It only works if x and y happen to be
roughly in the range of integers.

For example, here x and y are within roundoff error of each other, but
round doesn't know it:
>>> x=1e32
>>> y=x+1e16
>>> x-y
-18014398509481984.0
>>> round(x-y,6)
-18014398509481984.0

It fails in the other direction when the numbers are small:
>>> x=.0000000123
>>> y=.0000000234
>>> x-y
-1.1100000000000002e-008
>>> round(x-y,6)
0.0

Mark's solution is the generically correct one, which takes into account
the rough range of the values.
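
As a sketch, the two sessions above can be replayed against a relative-plus-absolute test. The helper nearly_equal and its default tolerances are assumptions for illustration; it scales by the larger input instead of an "expected" value:

```python
def nearly_equal(x, y, rel_err=1e-7, abs_err=1e-20):
    """Relative test with an absolute floor (hypothetical helper)."""
    return abs(x - y) <= max(abs_err, rel_err * max(abs(x), abs(y)))

big_x = 1e32
big_y = big_x + 1e16            # rounds to the next double after 1e32
small_x = 0.0000000123
small_y = 0.0000000234

# round() says the big pair differs and the small pair is equal.
print(not round(big_x - big_y, 6))       # False
print(not round(small_x - small_y, 6))   # True

# The scaled test gives the answers one would expect instead.
print(nearly_equal(big_x, big_y))        # True
print(nearly_equal(small_x, small_y))    # False
```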
--
Tim Roberts, ti...@probo.com
Providenza & Boekelheide, Inc.

dbd

Dec 6, 2009, 2:23:34 PM12/6/09
to
On Dec 6, 1:12 am, Raymond Hettinger <pyt...@rcn.com> wrote:

> On Dec 5, 11:42 pm, Tim Roberts <t...@probo.com> wrote:
>
> > Raymond Hettinger <pyt...@rcn.com> wrote:
>
> > >   if not round(x - y, 6): ...
>
> > That's a dangerous suggestion.  It only works if x and y happen to be
> > roughly in the range of integers.
>
> Right.  Using abs(x-y) < eps is the way to go.
>
> Raymond

This only works when abs(x) and abs(y) are larger than eps, but not
too much larger.

Mark's suggestion is longer, but it works. The downside is it requires
you to think about the scale and accuracy of your application.

Dale B. Dalrymple

Anton81

Dec 6, 2009, 2:34:21 PM12/6/09
to
I do some linear algebra and whenever the prefactor of a vector turns
out to be zero, I want to remove it.

I'd like to keep the system comfortable. So basically I should write a
new class for numbers that has its own __eq__ operator?
Is there an existing module for that?

r0g

Dec 6, 2009, 3:52:05 PM12/6/09
to
dbd wrote:
> On Dec 6, 1:12 am, Raymond Hettinger <pyt...@rcn.com> wrote:
>> On Dec 5, 11:42 pm, Tim Roberts <t...@probo.com> wrote:
>>
>>> Raymond Hettinger <pyt...@rcn.com> wrote:
>>>> if not round(x - y, 6): ...
>>> That's a dangerous suggestion. It only works if x and y happen to be
>>> roughly in the range of integers.
>>
>> Right. Using abs(x-y) < eps is the way to go.
>>
>> Raymond
>
> This only works when abs(x) and abs(y) are larger than eps, but not
> too much larger.

Okay, I'm confused now... I thought them being larger was entirely the
point. At what point can they become too large? Isn't eps entirely
arbitrary anyway?

>
> Mark's suggestion is longer, but it works. The downside is it requires
> you to think about the scale and accuracy of your application.
>

Shouldn't one be doing that in any case??

Roger.

sturlamolden

Dec 6, 2009, 4:48:24 PM12/6/09
to
On 6 Des, 21:52, r0g <aioe....@technicalbloke.com> wrote:

> > > Right.  Using abs(x-y) < eps is the way to go.
> > >
> > > Raymond
>
> > This only works when abs(x) and abs(y) are larger than eps, but not
> > too much larger.
>
> Okay, I'm confused now... I thought them being larger was entirely the
> point.

Yes. dbd got it wrong. If both are smaller than eps, the absolute
difference is smaller than eps, so they are considered equal.

Dave Angel

Dec 6, 2009, 5:21:46 PM12/6/09
to Anton81, pytho...@python.org

You have to define your own "comfortable." But if it's zero you're
checking for, then I most certainly wouldn't try to hide it inside a
"number class." Most such formulas go ballistic when you get near zero.

The definition of "close enough" is very context dependent, and
shouldn't be hidden at too low a level. But your mileage may vary.

For example, in your case, you might want to check that the prefactor is
much smaller than the average (of the abs values) of the vector
elements. Enough orders of magnitude smaller, and you call it equal to
zero.
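
One way this might look in code. The function name, the default of 12 orders of magnitude, and the sample data are all assumptions for illustration:

```python
def is_negligible(prefactor, vector, orders_of_magnitude=12):
    """True if prefactor is tiny compared with the typical element size."""
    if not vector:
        return prefactor == 0
    # Average of the absolute values of the vector elements.
    mean_abs = sum(abs(v) for v in vector) / len(vector)
    return abs(prefactor) <= mean_abs * 10.0 ** (-orders_of_magnitude)

# (prefactor, vector) pairs; the second prefactor is rounding noise.
terms = [(2.5, [1.0, -3.0, 2.0]), (4e-17, [1.0, -3.0, 2.0])]
kept = [(c, v) for c, v in terms if not is_negligible(c, v)]
print(len(kept))   # 1
```

Keeping the threshold as an explicit argument leaves the definition of "close enough" visible at the call site rather than buried in a number class.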

DaveA

Carl Banks

Dec 6, 2009, 5:54:37 PM12/6/09
to

I highly recommend against it; among other things it invalidates the
transitive property of equality:

"If a == b and b == c, then a == c."

It will also make the number non-hashable, and have several other
negative consequences. Plus, it's not something that's never
foolproof. What numbers are close enough to be considered "equal"
depends on the calculations.

(I remember once struggling in a homework assignment over seemingly
large discrepancies in a calculation I was doing, until I realized
that the actual numbers were on the scale of 10**11, and the
difference was around 10**1, so it really didn't matter.)
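
A small sketch of both problems, with a hypothetical tolerance of 0.1 and a hypothetical Fuzzy wrapper class:

```python
def close(a, b, eps=0.1):
    return abs(a - b) < eps

# Transitivity breaks: a is close to b, b is close to c, but a is not close to c.
a, b, c = 0.0, 0.0625, 0.125
print(close(a, b), close(b, c), close(a, c))   # True True False

class Fuzzy:
    """Number wrapper with a tolerance-based __eq__ (illustration only)."""
    def __init__(self, value):
        self.value = value
    def __eq__(self, other):
        return close(self.value, other.value)

# Defining __eq__ without __hash__ makes instances unhashable in Python 3,
# so they cannot be used as dict keys or set members.
print(Fuzzy.__hash__ is None)   # True
```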

Carl Banks

TheSeeker

Dec 6, 2009, 7:12:07 PM12/6/09
to
On Dec 6, 4:54 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> On Dec 6, 11:34 am, Anton81 <gerenu...@googlemail.com> wrote:
>
> > I do some linear algebra and whenever the prefactor of a vector turns
> > out to be zero, I want to remove it.
>
> > I'd like to keep the system comfortable. So basically I should write a
> > new class for numbers that has its own __eq__ operator?
> > Is there an existing module for that?
>
> I highly recommend against it; among other things it invalidates the
> transitive property of equality:
>
> "If a == b and b == c, then a == c."
>
> It will also make the number non-hashable, and have several other
> negative consequences. What numbers are close enough to be considered
> "equal" depends on the calculations.
>
> (I remember once struggling in a homework assignment over seemingly
> large discrepancies in a calculation I was doing, until I realized
> that the actual numbers were on the scale of 10**11, and the
> difference was around 10**1, so it really didn't matter.)
>
> Carl Banks

Maybe it's the gin, but
"Plus, it's not something that's never foolproof."

+1 QOTW

Cheers,
TheSeeker

David Cournapeau

Dec 6, 2009, 7:16:22 PM12/6/09
to Mark Dickinson, pytho...@python.org
On Sun, Dec 6, 2009 at 1:46 AM, Mark Dickinson <dick...@gmail.com> wrote:
> On Dec 5, 3:37 pm, Anton81 <gerenu...@googlemail.com> wrote:
>> I'd like to do calculations with floats and at some point equality of
>> two numbers will be checked.
>> What is the best way to make sure that equality of floats will be
>> detected, where I assume that mismatches beyond a certain point are
>> due to truncation errors?
>
> Well, it depends a lot on the details of the application, but
> a good general scheme is to allow both a fixed relative error
> and an absolute error, and to assert that your two values are
> 'nearly equal' if they're within *either* the relative error *or*
> the absolute error.  Something like, for example:
>
> def almostEqual(expected, actual, rel_err=1e-7, abs_err = 1e-20):
>    absolute_error = abs(actual-expected)
>    return absolute_error <= max(abs_err, rel_err * abs(expected))

If you can depend on IEEE 754 semantics, one relatively robust method
is to use the number of representable floats between two numbers. The
main advantage compared to the proposed methods is that it somewhat
automatically takes into account the amplitude of input numbers:

abs(x - y) <= N * spacing(max(abs(x), abs(y)))

Where spacing(a) is the smallest number such that a + spacing(a) != a.
Whether x and y are small or big, the same value of N can be used, and
it tells you how close two numbers are in terms of internal
representation.

Upcoming numpy 1.4.0 has an implementation for spacing - implementing
your own for double is not difficult, though.
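
A sketch of the comparison using only the standard library. It assumes Python 3.9 or later, where math.ulp(a) returns the gap between a and the next larger double; this plays the role of spacing() here, although its definition differs slightly from the one given above. The function name and the default N of 4 are illustrative:

```python
import math

def within_spacing(x, y, n=4):
    """True if x and y differ by at most n float spacings at their magnitude."""
    return abs(x - y) <= n * math.ulp(max(abs(x), abs(y)))

# The same n works at very different magnitudes.
print(within_spacing(0.1 + 0.2, 0.3))       # True: one spacing apart
print(within_spacing(1e32, 1e32 + 1e16))    # True: adjacent doubles
print(within_spacing(1.0, 1.0 + 1e-9))      # False: millions of spacings apart
```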

cheers,

David

Dave Angel

Dec 6, 2009, 10:15:30 PM12/6/09
to Carl Banks, pytho...@python.org

Carl Banks wrote:
> On Dec 6, 11:34 am, Anton81 <gerenu...@googlemail.com> wrote:
>
>> I do some linear algebra and whenever the prefactor of a vector turns
>> out to be zero, I want to remove it.
>>
>> I'd like to keep the system comfortable. So basically I should write a
>> new class for numbers that has its own __eq__ operator?
>> Is there an existing module for that?
>>
>
> I highly recommend against it; among other things it invalidates the
> transitive property of equality:
>
> "If a == b and b == c, then a == c."
>
> It will also make the number non-hashable, and have several other
> negative consequences. Plus, it's not something that's never
> foolproof. What numbers are close enough to be considered "equal"
> depends on the calculations.
>
> (I remember once struggling in a homework assignment over seemingly
> large discrepancies in a calculation I was doing, until I realized
> that the actual numbers were on the scale of 10**11, and the
> difference was around 10**1, so it really didn't matter.)
>
>
>
> Carl Banks
>
>

A few decades ago I implemented the math package (microcode) under the
machine language for a proprietary processor (this is when a processor
took 5 boards of circuitry to implement). I started with floating point
add and subtract, and continued all the way through the trig, log, and
even random functions. Anyway, a customer called asking whether a
particular problem he had was caused by his logic, or by errors in our
math. He was calculating the difference in height between an
always-level table and a perfectly flat table (between an arc of a great
circle around the earth, and a flat table that doesn't follow the
curvature.) In a couple of hundred feet of table, the difference was
measured in millionths of an inch, as I recall. Anyway it turned out
his calculation was effectively subtracting
(8000 miles plus a little bit) - (8000 miles)
and if he calculated it three different ways, he got three different
results, one was off in about the 3rd place, while the other was only
half the value. I was able to show him another way (through geometrical
transformations) to solve the problem that got the exact answer, or at
least to more digits than he could possibly measure. I think I recall
that the new solution also cancelled out the need for trig. Sometimes
the math package shouldn't hide the problem, but give it to you straight.
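
The effect is easy to reproduce in double precision. The sketch below is not the customer's actual calculation; it assumes a simplified geometry (a flat table tangent to a sphere) and borrows the 8000-mile figure as the large length, in inches. The function names are made up:

```python
import math

def drop_naive(big, distance):
    """(big plus a little bit) - big, computed directly."""
    return math.sqrt(big * big + distance * distance) - big

def drop_stable(big, distance):
    """Same quantity, rearranged algebraically so no subtraction is needed."""
    root = math.sqrt(big * big + distance * distance)
    return distance * distance / (root + big)

BIG = 8000 * 5280 * 12.0    # 8000 miles expressed in inches

# One inch from the tangent point: the naive form loses everything.
print(drop_naive(BIG, 1.0))     # 0.0
print(drop_stable(BIG, 1.0))    # about 1 / (2 * BIG), a tiny positive number

# A hundred feet away: both give a result, but only the rearranged
# form keeps close to full double precision.
print(drop_naive(BIG, 1200.0))
print(drop_stable(BIG, 1200.0))
```

The rearrangement multiplies and divides by the conjugate, sqrt(big**2 + distance**2) + big, which turns the difference of two nearly equal numbers into a quotient of well-scaled ones.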

DaveA

dbd

Dec 7, 2009, 12:43:28 AM12/7/09
to

Small x,y failure case:
eps and even eps squared are representable as floats. If you have
samples of a sine wave with peak amplitude of one half eps, the
"abs(x-y) < eps" test would report all values on the sine wave as
equal to zero. This would not be correct.
Large x,y failure case:
If you have two calculation paths that symbolically should produce the
same value of size one over eps, valid floating point implementations
may differ by an lsb or more. A single lsb error would be 1, much
greater than the test allows as 'nearly equal' for floating point
comparison.

1.0 + eps is the smallest value greater than 1.0, distinguishable from
1.0. Long chains of floating point calculations that would
symbolically be expected to produce a value of 1.0 may be expected to
produce errors of an eps or more due to the inexactness of floating
point representation. These errors should be allowed in floating point
equality comparison. The value of the minimum representable error will
scale as the floating point number varies. A constant comparison value
is not appropriate.
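
Both cases can be checked directly, taking eps to be the machine epsilon of IEEE 754 doubles (an assumption about which eps is meant; the helper name is made up):

```python
import sys

eps = sys.float_info.epsilon    # 2**-52 for IEEE 754 doubles

def abs_equal(x, y):
    return abs(x - y) < eps

# Small case: x is twice y, yet the fixed-eps test calls them equal.
small_x, small_y = 0.5 * eps, 0.25 * eps
print(abs_equal(small_x, small_y))   # True

# Large case: neighbouring doubles near 1/eps are a whole 1.0 apart,
# so a one-lsb difference is reported as unequal.
large_x = 1.0 / eps                  # 2**52
large_y = large_x + 1.0              # the next representable double
print(abs_equal(large_x, large_y))   # False
```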

Mark was right, DaveA's discussion explains a strategy to use.

Dale B. Dalrymple

Steven D'Aprano

Dec 7, 2009, 3:27:55 AM12/7/09
to
On Sun, 06 Dec 2009 14:54:37 -0800, Carl Banks wrote:

> (I remember once struggling in a homework assignment over seemingly
> large discrepancies in a calculation I was doing, until I realized that
> the actual numbers were on the scale of 10**11, and the difference was
> around 10**1, so it really didn't matter.)

Well that depends on the accuracy of the calculations, surely? If the
calculations were accurate to one part in 10**20, then an error around
10**1 is about ten billion times larger than acceptable.

*wink*

--
Steven

sturlamolden

Dec 7, 2009, 7:28:07 AM12/7/09
to
On 7 Des, 06:43, dbd <d...@ieee.org> wrote:

> If you have
> samples of a sine wave with peak amplitude of one half eps, the
> "abs(x-y) < eps" test would report all values on the sine wave as
> equal to zero. This would not be correct.

You don't understand this at all do you?

If you have a sine wave with an amplitude less than the truncation
error, it will always be approximately equal to zero.

Numerical maths is about approximations, not symbolic equalities.

> 1.0 + eps is the smallest value greater than 1.0, distinguishable from
> 1.0.

Which is the reason 0.5*eps*sin(x) is never distinguishable from 0.

> A constant comparison value is not appropriate.

That requires domain-specific knowledge. Sometimes we look at a
number of significant digits; sometimes we look at a fixed number of
decimals; sometimes we look at abs(y/x). But there will always be a
truncation error of some sort, and differences smaller than that are
never significant.

Mark Dickinson

Dec 7, 2009, 8:23:21 AM12/7/09
to
On Dec 6, 7:34 pm, Anton81 <gerenu...@googlemail.com> wrote:
> I do some linear algebra and whenever the prefactor of a vector turns
> out to be zero, I want to remove it.

Hmm. Comparing against zero is something of a special case. So you'd
almost certainly be doing an 'if abs(x) < tol: ...' check, but the
question is what value to use for tol, and that (again) depends on
what you're doing. Perhaps 'tol' could be something like 'eps *
scale', where 'eps' is an indication of the size of relative error
you're prepared to admit (eps = 1e-12 might be reasonable; to allow
for rounding errors, it should be something comfortably larger than
the machine epsilon sys.float_info.epsilon, which is likely to be
around 2e-16 for a typical machine), and 'scale' is something closely
related to the scale of your problem: in your example, perhaps scale
could be the largest of all the prefactors you have, or some sort of
average of all the prefactors. There's really no one-size-fits-all

> I'd like to keep the system comfortable. So basically I should write a
> new class for numbers that has its own __eq__ operator?

That's probably not a good idea, for the reasons that Carl Banks

Mark

Mark Dickinson

Dec 7, 2009, 8:30:10 AM12/7/09
to
On Dec 7, 12:16 am, David Cournapeau <courn...@gmail.com> wrote:
> If you can depend on IEEE 754 semantics, one relatively robust method
> is to use the number of representable floats between two numbers. The
> main advantage compared to the proposed methods is that it somewhat
> automatically takes into account the amplitude of input numbers:

FWIW, there's a function that can be used for this in Lib/test/
test_math.py in Python svn; it's used to check that math.gamma isn't
out by more than 20 ulps (for a selection of test values).

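import struct

def to_ulps(x):
    """Convert a non-NaN float x to an integer, in such a way that
    abs(to_ulps(x) - to_ulps(y)) gives the difference in ulps between
    two floats.

    The results from this function will only make sense on platforms
    where C doubles are represented in IEEE 754 binary64 format.

    """
    # Reinterpret the 8 bytes of the double as a signed 64-bit integer.
    n = struct.unpack('<q', struct.pack('<d', x))[0]
    # Negative floats are stored sign-magnitude; remap them so that the
    # integer ordering matches the float ordering.
    if n < 0:
        n = ~(n+2**63)
    return n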
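
A short usage sketch, repeating the function so the snippet runs on its own; the wrapper ulps_apart is a made-up name:

```python
import struct

def to_ulps(x):
    n = struct.unpack('<q', struct.pack('<d', x))[0]
    if n < 0:
        n = ~(n + 2**63)
    return n

def ulps_apart(x, y):
    """Number of ulps between two non-NaN floats (hypothetical helper)."""
    return abs(to_ulps(x) - to_ulps(y))

print(ulps_apart(0.1 + 0.2, 0.3))       # 1
print(ulps_apart(1.0, 1.0 + 2**-52))    # 1
print(ulps_apart(1.0, 1.0))             # 0
```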

--
Mark

dbd

Dec 7, 2009, 1:53:31 PM12/7/09
to
On Dec 7, 4:28 am, sturlamolden <sturlamol...@yahoo.no> wrote:
> ...

>
> You don't understand this at all do you?
>
> If you have a sine wave with an amplitude less than the truncation
> error, it will always be approximately equal to zero.
>
> Numerical maths is about approximations, not symbolic equalities.
>
> > 1.0 + eps is the smallest value greater than 1.0, distinguishable from
> > 1.0.
>
> Which is the reason 0.5*eps*sin(x) is never distinguishable from 0.
> ...

A calculated value of 0.5*eps*sin(x) has a truncation error on the
order of eps squared. 0.5*eps and 0.495*eps are readily distinguished
(well, at least for values of eps << 0.01 :).

At least one of us doesn't understand floating point.

Dale B. Dalrymple

Carl Banks

Dec 7, 2009, 3:58:50 PM12/7/09
to

You're talking about machine epsilon? I think everyone else here is
talking about a number that is small relative to the expected smallest
scale of the calculation.

Carl Banks

dbd

Dec 10, 2009, 1:46:17 PM12/10/09
to
On Dec 7, 12:58 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> On Dec 7, 10:53 am, dbd <d...@ieee.org> wrote:
> > ...

>
> You're talking about machine epsilon?  I think everyone else here is
> talking about a number that is small relative to the expected smallest
> scale of the calculation.
>
> Carl Banks

When you implement an algorithm supporting floats (per the OP's post),
the expected scale of calculation is the range of floating point
numbers. For floating point numbers the intrinsic truncation error is
proportional to the value represented over the normalized range of the
floating point representation. At absolute values smaller than the
normalized range, the truncation has a fixed value. These are not
necessarily 'machine' characteristics but the characteristics of the
floating point format implemented.

A useful description of floating point issues can be found:

http://dlc.sun.com/pdf/800-7895/800-7895.pdf

Dale B. Dalrymple

Carl Banks

Dec 10, 2009, 5:23:06 PM12/10/09
to
On Dec 10, 10:46 am, dbd <d...@ieee.org> wrote:
> On Dec 7, 12:58 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
>
> > On Dec 7, 10:53 am, dbd <d...@ieee.org> wrote:
> > > ...
>
> > You're talking about machine epsilon?  I think everyone else here is
> > talking about a number that is small relative to the expected smallest
> > scale of the calculation.
>
> > Carl Banks
>
> When you implement an algorithm supporting floats (per the OP's post),
> the expected scale of calculation is the range of floating point
> numbers. For floating point numbers the intrinsic truncation error is
> proportional to the value represented over the normalized range of the
> floating point representation. At absolute values smaller than the
> normalized range, the truncation has a fixed value. These are not
> necessarily 'machine' characteristics but the characteristics of the
> floating point format implemented.

I know, and it's irrelevant, because no one, I don't think, is talking
tomfoolery with the floating point's least significant bits.

> A useful description of floating point issues can be found:

[snip]

I'm not reading it because I believe I grasp the situation just fine.
But you are welcome to convince me otherwise. Here's how:

Say I have two numbers, a and b. They are expected to be in the range
(-1000,1000). As far as I'm concerned, if they differ by less than
0.1, they might as well be equal. Therefore my test for "equality"
is:

abs(a-b) < 0.1

Can you give me a case where this test fails?

If a and b are too far out of their expected range, all bets are off,
but feel free to consider arbitrary values of a and b for extra
credit.

Carl Banks

Raymond Hettinger

Dec 10, 2009, 8:23:10 PM12/10/09
to
[Carl Banks]

> > You're talking about machine epsilon?  I think everyone else here is
> > talking about a number that is small relative to the expected smallest
> > scale of the calculation.

That was also my reading of the OP's question.

The suggestion to use round() was along the
lines of performing a quantize or snap-to-grid
operation after each step in the calculation.
That approach parallels the recommendation for how
to use the decimal module for fixed point calculations:
http://docs.python.org/library/decimal.html#decimal-faq

Raymond

dbd

Dec 11, 2009, 3:37:09 AM12/11/09
to
On Dec 10, 2:23 pm, Carl Banks <pavlovevide...@gmail.com> wrote:

> ...

> > A useful description of floating point issues can be found:
>
> [snip]
>
> I'm not reading it because I believe I grasp the situation just fine.

> ...

>
> Say I have two numbers, a and b.  They are expected to be in the range
> (-1000,1000).  As far as I'm concerned, if they differ by less than
> 0.1, they might as well be equal.

> ...
> Carl Banks

I don't expect Carl to read. I posted the reference for the OP whose
only range specification was "calculations with floats" and "equality
of floats" and who expressed concern about "truncation errors". Those
who can't find "floats" in the original post will find nothing of
interest in the reference.

Dale B. Dalrymple