I'd like to do calculations with floats, and at some point the equality of two numbers will be checked. What is the best way to make sure that equality of floats will be detected, where I assume that mismatches beyond a certain point are due to truncation errors?
On Dec 5, 3:37 pm, Anton81 <gerenu...@googlemail.com> wrote:
> I'd like to do calculations with floats and at some point equality of
> two numbers will be checked.
> What is the best way to make sure that equality of floats will be
> detected, where I assume that mismatches beyond a certain point are
> due to truncation errors?
Well, it depends a lot on the details of the application, but a good general scheme is to allow both a fixed relative error and an absolute error, and to assert that your two values are 'nearly equal' if they're within *either* the relative error *or* the absolute error. Something like, for example:
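A minimal sketch of such a test follows. The function name and the default tolerances are hypothetical, picked only for illustration, and should be tuned to the application:

```python
def nearly_equal(x, y, rel_tol=1e-9, abs_tol=1e-12):
    """Return True if x and y agree to within rel_tol OR abs_tol.

    rel_tol and abs_tol are illustrative defaults, not recommended values.
    """
    diff = abs(x - y)
    # Accept if either the absolute or the relative criterion is met.
    return diff <= abs_tol or diff <= rel_tol * max(abs(x), abs(y))


print(nearly_equal(1e32, 1e32 + 1e16))   # huge values, one rounding step apart
print(nearly_equal(1.23e-8, 2.34e-8))    # small values that really do differ
print(nearly_equal(0.1 + 0.2, 0.3))      # classic rounding mismatch
print(nearly_equal(1e-15, 0.0))          # near zero, only abs_tol can match
```

The absolute tolerance matters near zero, where a purely relative test can never succeed. Python 3.5 and later also provide math.isclose(a, b, rel_tol=..., abs_tol=...), which combines the two tolerances in a similar way.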
On Dec 5, 7:37 am, Anton81 <gerenu...@googlemail.com> wrote:
> I'd like to do calculations with floats and at some point equality of
> two numbers will be checked.
> What is the best way to make sure that equality of floats will be
> detected, where I assume that mismatches beyond a certain point are
> due to truncation errors?
Short answer: use round(). Less short answer: use Decimal with a high precision and then round() or quantize. Long answer: the amount of error depends on the calculation and the scale of the inputs; some calculations potentially propagate tiny errors to the point where they become large enough to overpower the signal in your data (e.g. the Lorenz equations or some other chaotic sequence).
On Dec 5, 8:25 pm, Raymond Hettinger <pyt...@rcn.com> wrote:
> On Dec 5, 7:37 am, Anton81 <gerenu...@googlemail.com> wrote:
> > I'd like to do calculations with floats and at some point equality of
> > two numbers will be checked.
> > What is the best way to make sure that equality of floats will be
> > detected, where I assume that mismatches beyond a certain point are
> > due to truncation errors?
> Short answer: use round().
Can you explain how this would work? I'm imagining a test something like:
if round(x, 6) == round(y, 6): ...
but that still would end up missing some cases where x and y are equal to within 1ulp, which presumably isn't what's wanted:
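A minimal sketch of such a case, assuming Python 3.9+ for math.nextafter (the names mid, below, and above are only for illustration). The decimal value 5e-07 is the boundary between rounding to 0.0 and to 1e-06 at six places, so the floats on either side of it round differently even though they are neighbours:

```python
import math

mid = 5e-07                          # float closest to the rounding boundary
below = math.nextafter(mid, 0.0)     # the float immediately below mid
above = math.nextafter(mid, 1.0)     # the float immediately above mid

# below and above are only 2 ulps apart, yet they land on different grid points.
print(round(below, 6), round(above, 6))
print(round(below, 6) == round(above, 6))
```

Since mid itself must round to one side or the other, it disagrees with one of its immediate neighbours, which is the 1-ulp case.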
On 5 Dec, 16:37, Anton81 <gerenu...@googlemail.com> wrote:
> I'd like to do calculations with floats and at some point equality of
> two numbers will be checked.
> What is the best way to make sure that equality of floats will be
> detected, where I assume that mismatches beyond a certain point are
> due to truncation errors?
That's a dangerous suggestion. It only works if x and y happen to be roughly in the range of integers.
For example, here x and y are within roundoff error of each other, but round doesn't know it:
>>> x=1e32
>>> y=x+1e16
>>> x-y
-18014398509481984.0
>>> round(x-y,6)
-18014398509481984.0
It fails in the other direction when the numbers are small:
>>> x=.0000000123
>>> y=.0000000234
>>> x-y
-1.1100000000000002e-008
>>> round(x-y,6)
0.0
Mark's solution is the generically correct one, which takes into account the rough range of the values.
--
Tim Roberts, t...@probo.com
Providenza & Boekelheide, Inc.
I do some linear algebra and whenever the prefactor of a vector turns out to be zero, I want to remove it.
I'd like to keep the system comfortable. So basically I should write a new class for numbers that has its own __eq__ operator? Is there an existing module for that?
dbd wrote:
> On Dec 6, 1:12 am, Raymond Hettinger <pyt...@rcn.com> wrote:
>> On Dec 5, 11:42 pm, Tim Roberts <t...@probo.com> wrote:
>>> Raymond Hettinger <pyt...@rcn.com> wrote:
>>>> if not round(x - y, 6): ...
>>> That's a dangerous suggestion. It only works if x and y happen to be
>>> roughly in the range of integers.
> .>
> .> Right. Using abs(x-y) < eps is the way to go.
> .>
> .> Raymond
> This only works when abs(x) and abs(y) are larger than eps, but not
> too much larger.
Okay, I'm confused now... I thought them being larger was entirely the point. At what point can they become too large? Isn't eps entirely arbitrary anyway?
> Mark's suggestion is longer, but it works. The downside is it requires
> you to think about the scale and accuracy of your application.
Anton81 wrote:
> I do some linear algebra and whenever the prefactor of a vector turns
> out to be zero, I want to remove it.
> I'd like to keep the system comfortable. So basically I should write a
> new class for numbers that has its own __eq__ operator?
> Is there an existing module for that?
You have to define your own "comfortable." But if it's zero you're checking for, then I most certainly wouldn't try to hide it inside a "number class." Most such formulas go ballistic when you get near zero.
The definition of "close enough" is very context dependent, and shouldn't be hidden at too low a level. But your mileage may vary.
For example, in your case, you might want to check that the prefactor is much smaller than the average (of the abs values) of the vector elements. Enough orders of magnitude smaller, and you call it equal to zero.
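A rough sketch of that check, under the assumption that each term is a (prefactor, vector) pair with a non-empty vector; prune_terms and rel_threshold are hypothetical names, and the threshold value is only a placeholder for "enough orders of magnitude":

```python
def prune_terms(terms, rel_threshold=1e-12):
    """Drop (prefactor, vector) pairs whose prefactor is negligible
    compared with the mean absolute value of the vector's elements."""
    kept = []
    for prefactor, vector in terms:
        # Scale of this term: average magnitude of the vector elements.
        scale = sum(abs(v) for v in vector) / len(vector)
        if abs(prefactor) > rel_threshold * scale:
            kept.append((prefactor, vector))
    return kept


terms = [
    (2.0, [1.0, -3.0]),        # clearly non-zero
    (4e-17, [1.0, -3.0]),      # tiny next to elements of size ~1
    (4e-17, [1e-16, 2e-16]),   # same prefactor, but comparable to its vector
]
print(prune_terms(terms))
```

The same prefactor is dropped in one term and kept in another, which is the point: the decision lives with the calculation, not inside the number type.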
On Dec 6, 11:34 am, Anton81 <gerenu...@googlemail.com> wrote:
> I do some linear algebra and whenever the prefactor of a vector turns
> out to be zero, I want to remove it.
> I'd like to keep the system comfortable. So basically I should write a
> new class for numbers that has its own __eq__ operator?
> Is there an existing module for that?
I highly recommend against it; among other things it invalidates the transitive property of equality:
"If a == b and b == c, then a == c."
It will also make the number non-hashable, and have several other negative consequences. Plus, it's not something that's never foolproof. What numbers are close enough to be considered "equal" depends on the calculations.
(I remember once struggling in a homework assignment over seemingly large discrepancies in a calculation I was doing, until I realized that the actual numbers were on the scale of 10**11, and the difference was around 10**1, so it really didn't matter.)
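To make the transitivity problem concrete, here is a small sketch with a hypothetical Fuzzy wrapper class (not an existing module) whose == means "within tol":

```python
class Fuzzy:
    """Hypothetical float wrapper whose == means 'within tol'."""

    def __init__(self, value, tol=0.1):
        self.value = value
        self.tol = tol

    def __eq__(self, other):
        return abs(self.value - other.value) < self.tol


a, b, c = Fuzzy(0.0), Fuzzy(0.06), Fuzzy(0.12)
# a == b and b == c both hold, but a == c does not.
print(a == b, b == c, a == c)

# In Python 3, defining __eq__ without __hash__ makes instances unhashable.
try:
    hash(a)
except TypeError as exc:
    print("unhashable:", exc)
```

Any hash you added by hand would have to give equal hashes to all "equal" values, which the broken transitivity makes impossible to do consistently.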
> On Dec 6, 11:34 am, Anton81 <gerenu...@googlemail.com> wrote:
> > I do some linear algebra and whenever the prefactor of a vector turns
> > out to be zero, I want to remove it.
> > I'd like to keep the system comfortable. So basically I should write a
> > new class for numbers that has its own __eq__ operator?
> > Is there an existing module for that?
> I highly recommend against it; among other things it invalidates the
> transitive property of equality:
> "If a == b and b == c, then a == c."
> It will also make the number non-hashable, and have several other
> negative consequences. What numbers are close enough to be considered "equal"
> depends on the calculations.
> (I remember once struggling in a homework assignment over seemingly
> large discrepancies in a calculation I was doing, until I realized
> that the actual numbers were on the scale of 10**11, and the
> difference was around 10**1, so it really didn't matter.)
> Carl Banks
Maybe it's the gin, but "Plus, it's not something that's never foolproof."
On Sun, Dec 6, 2009 at 1:46 AM, Mark Dickinson <dicki...@gmail.com> wrote:
> On Dec 5, 3:37 pm, Anton81 <gerenu...@googlemail.com> wrote:
>> I'd like to do calculations with floats and at some point equality of
>> two numbers will be checked.
>> What is the best way to make sure that equality of floats will be
>> detected, where I assume that mismatches beyond a certain point are
>> due to truncation errors?
> Well, it depends a lot on the details of the application, but
> a good general scheme is to allow both a fixed relative error
> and an absolute error, and to assert that your two values are
> 'nearly equal' if they're within *either* the relative error *or*
> the absolute error. Something like, for example:
If you can depend on IEEE 754 semantics, one relatively robust method is to use the number of representable floats between two numbers. The main advantage compared to the proposed methods is that it somewhat automatically takes into account the amplitude of input numbers:
abs(x - y) <= N * spacing(max(abs(x), abs(y)))
Where spacing(a) is the smallest number such that a + spacing(a) != a. Whether x and y are small or big, the same value of N can be used, and it tells you how close two numbers are in terms of internal representation.
Upcoming numpy 1.4.0 has an implementation for spacing. Implementing your own for double is not difficult, though.
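As a sketch of the comparison above without numpy: this assumes Python 3.9+, where math.ulp(a) gives the gap from a positive float a to the next larger one and is used here as a stand-in for spacing. The function name and the choice n=4 are hypothetical:

```python
import math


def within_n_spacings(x, y, n=4):
    """True if x and y differ by at most n float spacings at their magnitude."""
    # math.ulp stands in for spacing(); the same n works at any magnitude.
    return abs(x - y) <= n * math.ulp(max(abs(x), abs(y)))


print(within_n_spacings(0.1 + 0.2, 0.3))       # one spacing apart
print(within_n_spacings(1e32, 1e32 + 1e16))    # one spacing apart, huge scale
print(within_n_spacings(1.23e-8, 2.34e-8))     # genuinely different values
```

Note that this is a purely relative test, so it is very strict near zero; a value that should be zero but came out as 1e-17 will not compare as close to 0.0.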
Carl Banks wrote:
> On Dec 6, 11:34 am, Anton81 <gerenu...@googlemail.com> wrote:
>> I do some linear algebra and whenever the prefactor of a vector turns
>> out to be zero, I want to remove it.
>> I'd like to keep the system comfortable. So basically I should write a
>> new class for numbers that has its own __eq__ operator?
>> Is there an existing module for that?
> I highly recommend against it; among other things it invalidates the
> transitive property of equality:
> "If a == b and b == c, then a == c."
> It will also make the number non-hashable, and have several other
> negative consequences. Plus, it's not something that's never
> foolproof. What numbers are close enough to be considered "equal"
> depends on the calculations.
> (I remember once struggling in a homework assignment over seemingly
> large discrepancies in a calculation I was doing, until I realized
> that the actual numbers were on the scale of 10**11, and the
> difference was around 10**1, so it really didn't matter.)
> Carl Banks
A few decades ago I implemented the math package (microcode) under the machine language for a proprietary processor (this is when a processor took 5 boards of circuitry to implement). I started with floating point add and subtract, and continued all the way through the trig, log, and even random functions.

Anyway, a customer called asking whether a particular problem he had was caused by his logic, or by errors in our math. He was calculating the difference in height between an always-level table and a perfectly flat table (between an arc of a great circle around the earth, and a flat table that doesn't follow the curvature.) In a couple of hundred feet of table, the difference was measured in millionths of an inch, as I recall.

Anyway it turned out his calculation was effectively subtracting (8000 miles plus a little bit) - (8000 miles) and if he calculated it three different ways, he got three different results, one was off in about the 3rd place, while the other was only half the value.

I was able to show him another way (through geometrical transformations) to solve the problem that got the exact answer, or at least to more digits than he could possibly measure. I think I recall that the new solution also cancelled out the need for trig.

Sometimes the math package shouldn't hide the problem, but give it to you straight.
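For a feel of that kind of cancellation, here is a sketch in modern double precision. It is not the customer's actual calculation: the 4000-mile radius, the 200-foot table, and the sagitta formula are all assumptions made for illustration.

```python
import math

R = 4000 * 5280 * 12.0   # assumed Earth radius: 4000 miles, in inches
half = 100 * 12.0        # half of an assumed 200-foot table, in inches

# Direct form: subtracts two nearly equal numbers of size ~2.5e8,
# so most of the significant digits cancel.
naive = R - math.sqrt(R * R - half * half)

# Algebraically identical form, rewritten so that the final step is a
# division instead of a subtraction of nearly equal values.
stable = half * half / (R + math.sqrt(R * R - half * half))

print(naive)
print(stable)
print(abs(naive - stable) / stable)   # relative disagreement between the two
```

With 64-bit floats the direct form is only good to a handful of digits here, while the rearranged form keeps nearly full precision. On a machine with a shorter mantissa the direct form would be far worse.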
On Dec 6, 1:48 pm, sturlamolden <sturlamol...@yahoo.no> wrote:
> On 6 Dec, 21:52, r0g <aioe....@technicalbloke.com> wrote:
> > > .> Right. Using abs(x-y) < eps is the way to go.
> > > .>
> > > .> Raymond
> > > This only works when abs(x) and abs(y) are larger than eps, but not
> > > too much larger.
> > Okay, I'm confused now... I thought them being larger was entirely the
> > point.
> Yes. dbd got it wrong. If both are smaller than eps, the absolute
> difference is smaller than eps, so they are considered equal.
Small x,y failure case: eps and even eps squared are representable as floats. If you have samples of a sine wave with peak amplitude of one half eps, the "abs(x - y) < eps" test would report all values on the sine wave as equal to zero. This would not be correct.

Large x,y failure case: If you have two calculation paths that symbolically should produce the same value of size one over eps, valid floating point implementations may differ by an lsb or more. A single lsb error would be 1, much greater than the test allows as 'nearly equal' for floating point comparison.
1.0 + eps is the smallest value greater than 1.0, distinguishable from 1.0. Long chains of floating point calculations that would symbolically be expected to produce a value of 1.0 may be expected to produce errors of an eps or more due to the inexactness of floating point representation. These errors should be allowed in floating point equality comparison. The value of the minimum representable error will scale as the floating point number varies. A constant comparison value is not appropriate.
Mark was right, DaveA's discussion explains a strategy to use.
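Both failure cases can be reproduced with a short sketch, taking eps to be the machine epsilon sys.float_info.epsilon and assuming Python 3.9+ for math.nextafter (abs_equal is a hypothetical helper name):

```python
import math
import sys

eps = sys.float_info.epsilon   # 2**-52, about 2.2e-16


def abs_equal(x, y):
    """The fixed-threshold test under discussion."""
    return abs(x - y) < eps


# Small values: samples of a sine wave with peak amplitude eps/2.
samples = [0.5 * eps * math.sin(k) for k in range(1, 6)]
print(all(abs_equal(s, 0.0) for s in samples))   # every sample "equals" zero
print(len(set(samples)))                         # yet they are distinct floats

# Large values: neighbouring floats near 1/eps are a whole 1.0 apart.
x = 1.0 / eps
y = math.nextafter(x, math.inf)
print(y - x, abs_equal(x, y))
```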
On Sun, 06 Dec 2009 14:54:37 -0800, Carl Banks wrote:
> (I remember once struggling in a homework assignment over seemingly
> large discrepancies in a calculation I was doing, until I realized that
> the actual numbers were on the scale of 10**11, and the difference was
> around 10**1, so it really didn't matter.)
Well, that depends on the accuracy of the calculations, surely? If the calculations were accurate to one part in 10**20, then on numbers of size 10**11 the acceptable error is about 10**-9, and an error around 10**1 is about ten billion times larger than acceptable.
> If you have
> samples of a sine wave with peak amplitude of one half eps, the
> "abs(x - y) < eps" test would report all values on the sine wave as equal to
> zero. This would not be correct.
You don't understand this at all do you?
If you have a sine wave with an amplitude less than the truncation error, it will always be approximately equal to zero.
Numerical maths is about approximations, not symbolic equalities.
> 1.0 + eps is the smallest value greater than 1.0, distinguishable from
> 1.0.
Which is the reason 0.5*eps*sin(x) is never distinguishable from 0.
> A constant comparison value is not appropriate.
That requires domain-specific knowledge. Sometimes we look at a significant number of digits; sometimes we look at a fixed number of decimals; sometimes we look at abs(y/x). But there will always be a truncation error of some sort, and differences less than that are never significant.
On Dec 6, 7:34 pm, Anton81 <gerenu...@googlemail.com> wrote:
> I do some linear algebra and whenever the prefactor of a vector turns
> out to be zero, I want to remove it.
Hmm. Comparing against zero is something of a special case. So you'd almost certainly be doing an 'if abs(x) < tol: ...' check, but the question is what value to use for tol, and that (again) depends on what you're doing. Perhaps 'tol' could be something like 'eps * scale', where 'eps' is an indication of the size of relative error you're prepared to admit (eps = 1e-12 might be reasonable; to allow for rounding errors, it should be something comfortably larger than the machine epsilon sys.float_info.epsilon, which is likely to be around 2e-16 for a typical machine), and 'scale' is something closely related to the scale of your problem: in your example, perhaps scale could be the largest of all the prefactors you have, or some sort of average of all the prefactors. There's really no one-size-fits-all easy answer here.
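One possible reading of that suggestion, as a sketch only: scale is taken to be the largest prefactor magnitude, eps defaults to the 1e-12 mentioned above, and the function name is hypothetical.

```python
def drop_negligible(prefactors, eps=1e-12):
    """Return the prefactors that are not negligible next to the largest one."""
    # scale: the largest magnitude among the prefactors (0.0 for an empty list).
    scale = max((abs(p) for p in prefactors), default=0.0)
    tol = eps * scale
    # A prefactor counts as zero when abs(p) < tol.
    return [p for p in prefactors if not abs(p) < tol]


print(drop_negligible([3.0, -1e-15, 0.5, 2e-13]))
```

If every prefactor is exactly 0.0 then tol is 0.0 and nothing is dropped, so that case would need its own handling in real code.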
> I'd like to keep the system comfortable. So basically I should write a
> new class for numbers that has its own __eq__ operator?
That's probably not a good idea, for the reasons that Carl Banks already enumerated.
On Dec 7, 12:16 am, David Cournapeau <courn...@gmail.com> wrote:
> If you can depend on IEEE 754 semantics, one relatively robust method
> is to use the number of representable floats between two numbers. The
> main advantage compared to the proposed methods is that it somewhat
> automatically takes into account the amplitude of input numbers:
FWIW, there's a function that can be used for this in Lib/test/test_math.py in Python svn; it's used to check that math.gamma isn't out by more than 20 ulps (for a selection of test values).
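import struct

def to_ulps(x):
    """Convert a non-NaN float x to an integer, in such a way that
    adjacent floats are converted to adjacent integers.  Then
    abs(to_ulps(x) - to_ulps(y)) gives the difference in ulps between
    two floats.

    The results from this function will only make sense on platforms
    where C doubles are represented in IEEE 754 binary64 format.

    """
    # Reinterpret the 8 bytes of the double as a signed 64-bit integer.
    n = struct.unpack('<q', struct.pack('<d', x))[0]
    # Negative floats sort in reverse as raw integers; flip them so the
    # mapping is monotonic (-0.0 maps to -1, 0.0 maps to 0).
    if n < 0:
        n = ~(n + 2**63)
    return n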
On Dec 7, 4:28 am, sturlamolden <sturlamol...@yahoo.no> wrote:
> ...
> You don't understand this at all do you?
> If you have a sine wave with an amplitude less than the truncation
> error, it will always be approximately equal to zero.
> Numerical maths is about approximations, not symbolic equalities.
> > 1.0 + eps is the smallest value greater than 1.0, distinguishable from
> > 1.0.
> Which is the reason 0.5*eps*sin(x) is never distinguishable from 0.
> ...
A calculated value of 0.5*eps*sin(x) has a truncation error on the order of eps squared. 0.5*eps and 0.495*eps are readily distinguished (well, at least for values of eps << 0.01 :).
At least one of us doesn't understand floating point.
> On Dec 7, 4:28 am, sturlamolden <sturlamol...@yahoo.no> wrote:
> > ...
> > You don't understand this at all do you?
> > If you have a sine wave with an amplitude less than the truncation
> > error, it will always be approximately equal to zero.
> > Numerical maths is about approximations, not symbolic equalities.
> > > 1.0 + eps is the smallest value greater than 1.0, distinguishable from
> > > 1.0.
> > Which is the reason 0.5*eps*sin(x) is never distinguishable from 0.
> > ...
> A calculated value of 0.5*eps*sin(x) has a truncation error on the
> order of eps squared. 0.5*eps and 0.495*eps are readily distinguished
> (well, at least for values of eps << 0.01 :).
> At least one of us doesn't understand floating point.
You're talking about machine epsilon? I think everyone else here is talking about a number that is small relative to the expected smallest scale of the calculation.
On Dec 7, 12:58 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> On Dec 7, 10:53 am, dbd <d...@ieee.org> wrote:
> > ...
> You're talking about machine epsilon? I think everyone else here is
> talking about a number that is small relative to the expected smallest
> scale of the calculation.
> Carl Banks
When you implement an algorithm supporting floats (per the OP's post), the expected scale of calculation is the range of floating point numbers. For floating point numbers the intrinsic truncation error is proportional to the value represented over the normalized range of the floating point representation. At absolute values smaller than the normalized range, the truncation has a fixed value. These are not necessarily 'machine' characteristics but the characteristics of the floating point format implemented.
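These two regimes can be seen directly, as a sketch assuming IEEE 754 doubles and Python 3.9+ for math.ulp (the sample values are arbitrary):

```python
import math
import sys

# Normalized range: the gap between adjacent floats grows with the value,
# so gap / value stays between eps/2 and eps.
for x in (1.0, 1024.0, 1e300):
    print(x, math.ulp(x), math.ulp(x) / x)

# Below the normalized range (subnormals): the gap is a fixed value.
tiny = sys.float_info.min   # smallest positive normalized double
for x in (tiny / 4, tiny / 1024):
    print(x, math.ulp(x))
```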
A useful description of floating point issues can be found:
> On Dec 7, 12:58 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> > On Dec 7, 10:53 am, dbd <d...@ieee.org> wrote:
> > > ...
> > You're talking about machine epsilon? I think everyone else here is
> > talking about a number that is small relative to the expected smallest
> > scale of the calculation.
> > Carl Banks
> When you implement an algorithm supporting floats (per the OP's post),
> the expected scale of calculation is the range of floating point
> numbers. For floating point numbers the intrinsic truncation error is
> proportional to the value represented over the normalized range of the
> floating point representation. At absolute values smaller than the
> normalized range, the truncation has a fixed value. These are not
> necessarily 'machine' characteristics but the characteristics of the
> floating point format implemented.
I know, and it's irrelevant, because no one, I don't think, is talking about magnitude-specific truncation value either, nor about any other tomfoolery with the floating point's least significant bits.
> A useful description of floating point issues can be found:
[snip]
I'm not reading it because I believe I grasp the situation just fine. But you are welcome to convince me otherwise. Here's how:
Say I have two numbers, a and b. They are expected to be in the range (-1000,1000). As far as I'm concerned, if they differ by less than 0.1, they might as well be equal. Therefore my test for "equality" is:
abs(a-b) < 0.1
Can you give me a case where this test fails?
If a and b are too far out of their expected range, all bets are off, but feel free to consider arbitrary values of a and b for extra credit.
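For scale, a quick sketch of how far float rounding sits below that tolerance in the stated range, assuming IEEE 754 doubles and Python 3.9+ for math.ulp (TOL and same are illustrative names):

```python
import math

TOL = 0.1

# Largest gap between adjacent doubles with magnitude below 1000.
worst_spacing = math.ulp(1000.0)
print(worst_spacing)          # about 1.1e-13
print(TOL / worst_spacing)    # the tolerance is ~1e12 spacings wide


def same(a, b):
    """Domain-specific equality: values within TOL are treated as equal."""
    return abs(a - b) < TOL


print(same(999.95, 999.9), same(1.0, 1.2))
```

Whether accumulated error in a particular calculation stays below TOL is a separate question that this sketch does not answer.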
> > You're talking about machine epsilon? I think everyone else here is
> > talking about a number that is small relative to the expected smallest
> > scale of the calculation.
That was also my reading of the OP's question.
The suggestion to use round() was along the lines of performing a quantize or snap-to-grid operation after each step in the calculation. That approach parallels the recommendation for how to use the decimal module for fixed point calculations: http://docs.python.org/library/decimal.html#decimal-faq
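A sketch of what snap-to-grid after each step might look like, with a 0.01 grid for Decimal and six places for floats; the grid sizes, the rounding mode, and the sample numbers are assumptions for illustration:

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def snap(value):
    """Quantize a Decimal to the 0.01 grid after a calculation step."""
    return value.quantize(CENT, rounding=ROUND_HALF_UP)


price = Decimal("19.99")
with_tax = snap(price * Decimal("1.0825"))   # snap after the multiplication
total = snap(with_tax * 3)                   # and again after the next step
print(with_tax, total, total == Decimal("64.92"))

# The float analogue: round both sides to the same grid before comparing.
print(round(0.1 + 0.2, 6) == round(0.3, 6))
```

As noted earlier in the thread, values that straddle a grid boundary can still round apart, so the grid has to be chosen coarse enough relative to the expected error.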