Strange result from round

Fernando Rodríguez

Apr 5, 2002, 4:50:18 AM

I just noticed this:
CL-USER 60 > (round 5.1)
5
0.09999999999999964

Is this normal? O:-)

-----------------------
Fernando Rodriguez

Wolfhard Buß

Apr 5, 2002, 5:21:07 AM

Fernando Rodríguez <fr...@wanadoo.es> writes:

> I just noticed this:
> CL-USER 60 > (round 5.1)
> 5
> 0.09999999999999964
>
> Is this normal? O:-)

Floating-point arithmetic isn't exact. Conversion between number systems
with various bases isn't exact either. So yes, this is normal.
You can always use rationals.

(round 51/10) => 5, 1/10
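Here is a minimal sketch of the contrast: the same ROUND call on the rational 51/10 and on the double 5.1d0. It assumes DOUBLE-FLOAT is an IEEE 754 double. The `d0` suffix makes the result independent of `*read-default-float-format*`.

```lisp
;; ROUND on a rational: quotient and remainder are both exact rationals.
(multiple-value-bind (quotient remainder) (round 51/10)
  (format t "~&rational: ~S ~S~%" quotient remainder))

;; ROUND on a double-float: the remainder comes back as a double, and it
;; is not = to 1/10 (= compares a float and a rational exactly).
(multiple-value-bind (quotient remainder) (round 5.1d0)
  (format t "~&float:    ~S ~S ~S~%" quotient remainder (= remainder 1/10)))
```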

--
"The automobile has no future. I'm putting my money on the horse." Wilhelm II. (1859-1941)

Nils Goesche

Apr 5, 2002, 5:40:32 AM

In article <fpsqaug5jqvkbkt3t...@4ax.com>, Fernando Rodríguez wrote:
>
> I just noticed this:
> CL-USER 60 > (round 5.1)
> 5
> 0.09999999999999964
>
> Is this normal? O:-)

http://cch.loria.fr/documentation/IEEE754/ACM/goldberg.pdf

should answer this and many related questions.

Regards,
--
Nils Goesche
"Don't ask for whom the <CTRL-G> tolls."

PGP key ID 0x42B32FC9

Kent M Pitman

Apr 5, 2002, 11:24:46 AM

wb...@gmx.net (Wolfhard Buß) writes:

> Fernando Rodríguez <fr...@wanadoo.es> writes:
>
> > I just noticed this:
> > CL-USER 60 > (round 5.1)
> > 5
> > 0.09999999999999964
> >
> > Is this normal? O:-)
>
> Floating-point arithmetic isn't exact. Conversion between number systems
> with various bases isn't exact either. So yes, this is normal.
> You can always use rationals.
>
> (round 51/10) => 5, 1/10

I'm not a floating point expert here, and it's easy to get this kind
of thing wrong, but I _believe_ that the case Fernando cites is neither
a case of floating point arithmetic not being exact (though it's true,
it's often not exact), nor a case of conversion being inexact (which
I'm sure you knew). In addition to the two problems you cite,
floating point _notation_ does not match floating point internal
_representation_ exactly either, so I think the real underlying
problem in this case is that 5.1 is getting represented as a binary
fraction which is already an approximation even before you start.
ROUND in this case is separating the 5 from the .1 but I don't think
there is a loss of information in that particular operation; I think
it's "exact" insofar as the inputs were exact, because there is no
increase in magnitude that would force a loss of precision. (I could
be wrong on this, so someone who's more familiar with the internals
please correct me.) My point here is that if you think 5.1 is an exact
number just because it prints nicely, you already have a misconception
about what a floating point number is _even before_ you start doing math.
(In a sense, I think the reason it looks more exact than it is
is that the binary to decimal conversion done by the printer
conspires in some cases to hide the representational trick that's going on.)
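You can check Kent's point directly, because RATIONAL returns the exact value a float holds. This sketch again assumes IEEE doubles. How `x` prints depends on the implementation's printer, so the comments show only the exact rational results.

```lisp
(let* ((x 5.1d0)
       (stored (rational x)))             ; the exact value held by the float
  ;; The printer picks a short decimal that reads back as X...
  (format t "~&printed: ~S~%" x)
  ;; ...but the stored value is a fraction with a power-of-two denominator.
  (format t "~&stored:  ~S~%" stored)     ; 2871044762448691/562949953421312
  ;; It is slightly below 51/10 before any arithmetic has happened.
  (format t "~&error:   ~S~%" (- stored 51/10)))  ; -1/2814749767106560
```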

Wolfhard Buß

Apr 5, 2002, 1:17:58 PM

You're right. The conversion I mentioned is not the issue in this case.
The conversion between internal and external representation and vice
versa is.

Erik Naggum

Apr 5, 2002, 2:39:45 PM

* Kent M Pitman

| ROUND in this case is separating the 5 from the .1 but I don't think
| there is a loss of information in that particular operation; I think
| it's "exact" insofar as the inputs were exact, because there is no
| increase in magnitude that would force a loss of precision.

Well, actually, there is. If 5.1 and 0.1 both use n bits of precision,
(- 5.1 5.0) actually ends up using n-3 bits of precision for the 0.1
return value, effectively replacing the three least significant bits with
zeros. Since 0.1 has a bit pattern of repeating groups of 1100, losing
the three least significant bits must lead to a value different from 0.1.
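This sketch looks at the low-order bits Erik describes. TRAILING-ZERO-BITS is a hypothetical helper, not a standard function. The code decodes the double result of (- 5.1d0 5d0) with INTEGER-DECODE-FLOAT and counts the zero bits at the low end of the integer significand. It assumes IEEE doubles.

```lisp
(defun trailing-zero-bits (n)
  "Number of trailing zero bits in the positive integer N (hypothetical helper)."
  ;; (logand n (- n)) isolates the lowest set bit, 2^k;
  ;; INTEGER-LENGTH of 2^k is k+1.
  (1- (integer-length (logand n (- n)))))

(multiple-value-bind (signif expon) (integer-decode-float (- 5.1d0 5d0))
  (format t "~&significand ~D, exponent ~D, ~D trailing zero bits~%"
          signif expon (trailing-zero-bits signif)))
```

With IEEE doubles the significand ends in a run of zero bits. Whether those bits count as "lost precision" is what the following replies go on to debate.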

///
--
In a fight against something, the fight has value, victory has none.
In a fight for something, the fight is a loss, victory merely relief.

Post with compassion: http://home.chello.no/~xyzzy/kitten.jpg

Kent M Pitman

Apr 5, 2002, 3:35:01 PM

Erik Naggum <er...@naggum.net> writes:

> * Kent M Pitman
> | ROUND in this case is separating the 5 from the .1 but I don't think
> | there is a loss of information in that particular operation; I think
> | it's "exact" insofar as the inputs were exact, because there is no
> | increase in magnitude that would force a loss of precision.
>
> Well, actually, there is. If 5.1 and 0.1 both use n bits of precision,
> (- 5.1 5.0) actually ends up using n-3 bits of precision for the 0.1
> return value, effectively replacing the three least significant bits with
> zeros. Since 0.1 has a bit pattern of repeating groups of 1100, losing
> the three least significant bits must lead to a value different from 0.1.

Yes, that's so. But I thought I was already taking this into account.

I guess it's really a philosophical question, not a technical one.

For example, there must be some decimally expressible rational whose
exact representation is the same as the exact representation of 5.1.
If that (numerically equal) number were rounded, the missing binary
digits when you do the "shift" are correctly supplied as zero [well,
if you're rounding down; I'm more confident that what I'm saying is
true of truncate than round]. Consequently, the operation of
truncation is [to the limits of the original representation] exact.
What is in question is the "well" of numbers that have as their stable
point the particular binary digits. That is, my understanding is that
the number line says

 ....................  <- actual real number line

 \   /\   /\   /\   /
  \ /  \ /  \ /  \ /
   v    v    v    v
   .    .    .    .    <- binary number line (probably not as evenly
                          spaced as shown, not that it matters other
                          than to just make it painful in practice)

The "wells" I am talking about are the v's above, that take a range of
rationals (or reals) and map them into a particular set of binary numbers.
When unpacked to decimal, there is one particular representative decimal
(among the reals) that "names" the binary point, but the operation is not
on that decimal, it's on the binary. [Incidentally, my understanding of the
problem of printing decimals is that you have to be careful to choose a
decimal that will re-read in the same well, so you don't get drift by
read/print.]
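The read/print requirement can be tested mechanically. In this sketch, READS-BACK-SAME-P is a hypothetical name. It prints a float with PRIN1 under standard I/O syntax, reads the text back, and compares the two. It assumes an implementation whose float printer is round-trip safe, which is the behavior described above.

```lisp
(defun reads-back-same-p (x)
  "True if X, printed with PRIN1 and read back, gives a float EQL to X
(hypothetical helper)."
  (with-standard-io-syntax
    ;; Under standard syntax the default float format is SINGLE-FLOAT,
    ;; so a double prints with a d0 exponent marker and reads back as a double.
    (eql x (read-from-string (prin1-to-string x)))))

;; Usage: every one of these should land back in the same "well".
(every #'reads-back-same-p (list 5.1d0 0.1d0 (- 5.1d0 5d0) (/ 1d0 3d0)))
```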

But at the binary level, the operations are exact if you do not lose
digits in the computations. The problem is that when you do the
operation in binary and then map back to the representative decimal
you don't always get the result of doing the operation directly on the
decimal number "symbolically". The real question becomes, though,
whether the operation is in error (that is, whether the binary exists
to support the decimal) or whether the mapping is in error, since in
principle there are other input numbers that could have been typed.

This is very similar to the question of whether \Foo and foo and FOO
and |FOO| are the same. Of course they are, and as a direct consequence,
the system can't tell on behalf of which symbol a string-concatenation
operation is done. It would be false, though, to say that the operation
of concatenating the string designator foo and the string designator bar
being "FOOBAR" and not "foobar" is an error. It's true that "foobar" is
the string designated by the concatenation of the string "foo" and
the string "bar", but that again illustrates that some of these operations
are information-losing and we have to live with them.
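Here is Kent's symbol analogy in code. The sketch assumes the standard readtable, whose readtable case is :UPCASE. All four spellings read as one symbol, so nothing downstream can tell which spelling was typed.

```lisp
(with-standard-io-syntax
  (let ((symbols (mapcar #'read-from-string
                         ;; "\\Foo" in a Lisp string literal is the text \Foo
                         '("foo" "FOO" "\\Foo" "|FOO|"))))
    ;; All four readings are EQ to one another...
    (print (every (lambda (s) (eq s (first symbols))) symbols))
    ;; ...so concatenating their names can only ever give "FOOBAR".
    (print (concatenate 'string (string (first symbols)) (string 'bar)))))
```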

Anyone wondering why I say I don't know much about floats will perhaps
now see better what I'm getting at--these are issues that you can acquire
a great deal of _trivia_ about, but developing good _intuitions_ about them
is harder because the operations for moving around in the space are not
uniquely reversible, and so the action of shifting representation for
convenience of expression (usually the earmark of intelligent behavior, IMO)
is thwarted by the fact that there are few shifts of representation (i.e.,
informally, few "intelligent ways to rethink things") that are not
information losing. That fact, in the aggregate, is the basis for why
intuitions break down... intuitions assume you can gloss details that
you can't gloss. I wish I had taken some group theory. I'm sure there
is a simpler way to say some of this in group theory. Anyone still in school
who has the chance to take some should do so... I have always regretted
not having it at my disposal terminologically.

Geoff Summerhayes

Apr 5, 2002, 4:33:01 PM

"Erik Naggum" <er...@naggum.net> wrote in message
news:32270244...@naggum.net...

> * Kent M Pitman
> | ROUND in this case is separating the 5 from the .1 but I don't think
> | there is a loss of information in that particular operation; I think
> | it's "exact" insofar as the inputs were exact, because there is no
> | increase in magnitude that would force a loss of precision.
>
> Well, actually, there is. If 5.1 and 0.1 both use n bits of precision,
> (- 5.1 5.0) actually ends up using n-3 bits of precision for the 0.1
> return value, effectively replacing the three least significant bits with
> zeros. Since 0.1 has a bit pattern of repeating groups of 1100, losing
> the three least significant bits must lead to a value different from 0.1.
>

I threw this together today after not finding anything
to print out a float in a base other than decimal.
It's more for exploration of floating point in
different bases than anything else. I was aiming to
write something that produced something more along
the lines of -1.110110*10^1010 without really ramping
up the error in the representation, perhaps over the
weekend.

(defun print-float-base (float &optional (base 10) (stream *standard-output*))
  (let ((*print-base* base)
        (*print-radix* t))
    (multiple-value-bind (signif expon sign)
        (integer-decode-float float)
      (format stream "~&#.(* ~A (float ~A) (expt ~A ~A))~%"
              sign signif (float-radix float) expon))))

Example:

(print-float-base 5.1 2)
#.(* #b1 (float #b10100011001100110011001100110011001100110011001100110)
(expt #b10 #b-110010))

(print-float-base 0.1 2)
#.(* #b1 (float #b11001100110011001100110011001100110011001100110011010)
(expt #b10 #b-111000))

(print-float-base (- 5.1 5.0) 2)
#.(* #b1 (float #b11001100110011001100110011001100110011001100110000000)
(expt #b10 #b-111000))

(- 5.1 5.0 0.1)
-3.608224830031759E-16

-----------
Geoff

Erik Naggum

Apr 5, 2002, 6:08:36 PM

* "Geoff Summerhayes"

| I threw this together today after not finding anything to print out a
| float in a base other than decimal. It's more for exploration of
| floating point in different bases than anything else. I was aiming to
| write something that produced something more along the lines of
| -1.110110*10^1010 without really ramping up the error in the
| representation, perhaps over the weekend.
|
| (defun print-float-base(float &optional (base 10) (stream *standard-output*))
|   (let ((*print-base* base)
|         (*print-radix* t))
|     (multiple-value-bind (signif expon sign)
|         (integer-decode-float float)
|       (format stream "~&#.(* ~A (float ~A) (expt ~A ~A))~%"
|               sign signif (float-radix float) expon))))

scale-float is a more efficient way of changing the exponent. Also
remember to use a (unit) float of the same type as the argument float for
the float call, or you end up just playing with single-floats. Quoted
from the standard on the integer-decode-float page:

  (multiple-value-bind (signif expon sign)
      (integer-decode-float f)
    (scale-float (float signif f) expon)) == (abs f)

I tend to use the function rational to decode floating point numbers.
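This sketch puts Erik's two suggestions side by side. RECONSTRUCT-FLOAT is a hypothetical helper name. Passing the original float to FLOAT as the prototype keeps the result in the argument's format. RATIONAL does the whole decode in one step. The sketch assumes SINGLE-FLOAT is narrower than DOUBLE-FLOAT, as with IEEE formats.

```lisp
(defun reconstruct-float (f)
  "Rebuild F from the values of INTEGER-DECODE-FLOAT (hypothetical helper)."
  (multiple-value-bind (signif expon sign) (integer-decode-float f)
    ;; (float signif f) uses F as the prototype, so a double stays a double;
    ;; SCALE-FLOAT multiplies by (float-radix f) raised to EXPON.
    (* sign (scale-float (float signif f) expon))))

(let ((f 5.1d0))
  (multiple-value-bind (signif expon sign) (integer-decode-float f)
    (list
     ;; With the prototype: the original double comes back exactly.
     (= f (reconstruct-float f))
     ;; Without it, FLOAT returns a SINGLE-FLOAT, so (assuming singles are
     ;; narrower than doubles) low-order bits are rounded away.
     (= f (* sign (scale-float (float signif) expon)))
     ;; RATIONAL gives the same exact value as sign * signif * radix^expon.
     (= (rational f) (* sign signif (expt (float-radix f) expon))))))
```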

Joe Marshall

Apr 6, 2002, 12:31:53 AM

Remember that all floats are really rational numbers in disguise.
The denominator must be a power of two, and the numerator must
be in the range 4503599627370496 to 9007199254740991
(assuming double precision normalized).

So the closest float to the decimal number 5.1 is

5742089524897382 / 1125899906842624

This is actually slightly less than 51/10, but we don't have
the luxury of picking the denominator in a floating point
number, and out of these numerators, the middle one is the
closest.

5742089524897381
5742089524897382
5742089524897383

(To see this, multiply 1125899906842624 by 51 and divide by 10: the
result is 5742089524897382.4.)
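You can reproduce Joe's numbers with INTEGER-DECODE-FLOAT. In this sketch, NEIGHBOR-ERRORS is a hypothetical helper, and X is assumed to be a positive, normalized IEEE double. It measures how far each of the three candidate numerators lies from 51/10, after scaling by the float's power-of-two denominator.

```lisp
(defun neighbor-errors (x target)
  "Errors |m*2^e - TARGET| for m = s-1, s, s+1, where s and e come from
INTEGER-DECODE-FLOAT of the positive float X (hypothetical helper)."
  (multiple-value-bind (signif expon) (integer-decode-float x)
    (let ((unit (expt (float-radix x) expon)))   ; exact rational 2^e
      (mapcar (lambda (m) (abs (- (* m unit) target)))
              (list (1- signif) signif (1+ signif))))))

(multiple-value-bind (signif expon) (integer-decode-float 5.1d0)
  ;; A normalized double significand lies in [2^52, 2^53).
  (assert (<= (expt 2 52) signif (1- (expt 2 53))))
  (format t "~&~D / ~D~%" signif (expt 2 (- expon))))

;; In units of 2^-50 the three errors are 7/5, 2/5 and 3/5,
;; so the middle numerator is the closest.
(neighbor-errors 5.1d0 51/10)
```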

Now we take the floor. We divide 5742089524897382 by
1125899906842624, getting 5 with a remainder of

112589990684262 / 1125899906842624

The denominator is still a power of two, but the numerator
is well below the minimum of 4503599627370496. This won't
do, but it is easily corrected. We multiply the numerator
and denominator by 2 until numerator *is* in the right range.

112589990684262 * 64 = 7205759403792768
1125899906842624 * 64 = 72057594037927936

And our remainder is therefore

7205759403792768 / 72057594037927936
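You can sketch Joe's two steps with integer arithmetic: take the floor, then double the numerator and denominator until the numerator is back in range. RENORMALIZE is a hypothetical name. The 2^52 bound assumes 53-bit IEEE double significands.

```lisp
(defun renormalize (num den)
  "Double NUM and DEN until NUM is at least 2^52 (hypothetical helper).
Returns the new NUM, the new DEN, and how many doublings were needed."
  (assert (plusp num))          ; a zero numerator would never reach 2^52
  (let ((shifts 0))
    (loop while (< num (expt 2 52))
          do (setf num (* 2 num)
                   den (* 2 den))
             (incf shifts))
    (values num den shifts)))

;; Floor step: 5742089524897382 / 2^50 is 5 with remainder 112589990684262 / 2^50.
(multiple-value-bind (whole rem) (floor 5742089524897382 (expt 2 50))
  (multiple-value-bind (num den shifts) (renormalize rem (expt 2 50))
    ;; Expect 5, 7205759403792768, 72057594037927936 (= 2^56), 6.
    (format t "~&~D + ~D/~D after ~D doublings~%" whole num den shifts)
    ;; The same value the float subtraction produces.
    (= (/ num den) (rational (- 5.1d0 5d0)))))
```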

Now we want to print this fraction as a decimal equivalent.
There is a trick here. The decimal expansion may not be of
finite length, so we'll have to be prepared to truncate it.
But we also want it to be the case that when the decimal
expansion is read back in, the same bit pattern will be
constructed by the reader.

1 / 72057594037927936 is about .0000000000000000139,
so we'll have to print out about 17 digits (unless we're
lucky and the last few are zero). As it turns out,
the number 0.09999999999999964 is the shortest decimal
number that, when read in, becomes

7205759403792768 / 72057594037927936
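Here is a quick check of the printing claim. It assumes IEEE doubles and a correctly rounding reader. The printed digits read back to exactly the remainder. Dropping the last digit lands on a different double, about three spacings away.

```lisp
(let ((remainder (/ 7205759403792768 (expt 2 56))))   ; exact rational value
  (list
   ;; Spacing between adjacent doubles near 0.1: 2^-56, about 1.39e-17.
   (scale-float 1d0 -56)
   ;; The printed digits read back to exactly REMAINDER...
   (= (rational (read-from-string "0.09999999999999964d0")) remainder)
   ;; ...but one digit fewer does not.
   (= (rational (read-from-string "0.0999999999999996d0")) remainder)))
```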

> Fernando Rodríguez <fr...@wanadoo.es> writes:
>
> Floating-point arithmetic isn't exact.

In this case, the arithmetic was exact. What was inexact
was the use of 5742089524897382 / 1125899906842624 as a
substitute for 51/10.

"Kent M Pitman" <pit...@world.std.com> wrote in message
news:sfwn0wi...@shell01.TheWorld.com...

> I don't think there is a loss of information in that particular
> operation; I think it's "exact" insofar as the inputs were
> exact, because there is no increase in magnitude that would
> force a loss of precision.

Kent is correct: there was no information lost by the round
operation or the printing of the result.

Joe Marshall

Apr 6, 2002, 1:28:24 AM

"Erik Naggum" <er...@naggum.net> wrote in message
news:32270244...@naggum.net...

> * Kent M Pitman
> | ROUND in this case is separating the 5 from the .1 but I don't think
> | there is a loss of information in that particular operation; I think
> | it's "exact" insofar as the inputs were exact, because there is no
> | increase in magnitude that would force a loss of precision.
>
> Well, actually, there is. If 5.1 and 0.1 both use n bits of precision,
> (- 5.1 5.0) actually ends up using n-3 bits of precision for the 0.1
> return value, effectively replacing the three least significant bits with
> zeros. Since 0.1 has a bit pattern of repeating groups of 1100, losing
> the three least significant bits must lead to a value different from 0.1.

There isn't any precision lost in the subtraction.

Both 5.1 and 5.0, when represented in floating point, start with
a bit pattern of 101000.....
When you subtract them, the six most significant bits are zero.
The float is renormalized by multiplying the numerator and
denominator by 2 (i.e. shifting the mantissa left and decrementing the
exponent) until the most significant bit is a one.

Where you lose precision is in the construction of 5.1 as a
floating point number. 5.1 is represented internally as
5742089524897382 / 1125899906842624

and 5.0 is represented as
5629499534213120 / 1125899906842624

subtracting yields
112589990684262 / 1125899906842624

multiplying the numerator and denominator by 64 gives this
7205759403792768 / 72057594037927936

Now the decimal .1 is represented internally as
7205759403792794 / 72057594037927936

which is a bit different from the result of the subtraction,
but the real problem is that the floating point representation
for 5.1 is slightly smaller than 51/10, so the answer comes
out slightly smaller than 1/10.

The `cancellation' of the most significant six digits in
the subtraction does not introduce any more error than had
already been introduced by using an approximation to 5.1.
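You can check that the subtraction itself is exact by comparing the float result with exact rational arithmetic on the same inputs. This sketch assumes IEEE doubles. The significands in the comments are the numerators quoted above.

```lisp
(let ((diff (- 5.1d0 5d0)))
  (list
   ;; Exact: the float difference equals the rational difference of the inputs.
   (= (rational diff) (- (rational 5.1d0) 5))
   ;; Both share exponent -56; significands 7205759403792794 and 7205759403792768.
   (- (integer-decode-float 0.1d0) (integer-decode-float diff))   ; 26
   ;; So (- 5.1 5.0 0.1) is exactly -26 * 2^-56, about -3.6e-16.
   (= (rational (- diff 0.1d0)) (/ -26 (expt 2 56)))))
```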

Harald Hanche-Olsen

Apr 6, 2002, 7:41:51 AM

+ Kent M Pitman <pit...@world.std.com>:

| That fact, in the aggregate, is the basis for why intuitions break
| down... intuitions assume you can gloss details that you can't
| gloss.

Isn't this one of the things education is about: Building a better
intuition? (I think I first heard this stated by a physicist.) Not
that your improved intuition can ever replace the hard work of
figuring out the details, but every new insight obtained should
improve your intuition so that even stuff that was previously counter-
intuitive is now obvious.

| I wish I had taken some group theory. I'm sure there
| is a simpler way to say some of this in group theory.

Group theory is useful for lots of things, but I can't see much of an
application for it in reasoning about floating point arithmetic. In
fact, one of the reasons this is so difficult is precisely that the
floating point numbers do not form a group, much less a ring or field,
as we would have liked them to do. In short, there is just no way to
understand floating point arithmetic without getting down in the
nitty-gritty and getting your hands dirty. There is (AFAIK) no simple
and elegant mathematical theory in which you can express these things.
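Two small illustrations show why the floats fail the group axioms under +. They assume IEEE doubles with round-to-nearest. First, addition isn't associative. Second, a nonzero number can act like an additive identity.

```lisp
(list
 ;; Associativity fails: each + rounds its exact sum back into F.
 (= (+ (+ 0.1d0 0.2d0) 0.3d0)
    (+ 0.1d0 (+ 0.2d0 0.3d0)))          ; NIL
 ;; 1d-17 is less than half the spacing of doubles near 1,
 ;; so adding it to 1d0 changes nothing.
 (= (+ 1d0 1d-17) 1d0))                 ; T
```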

The best I can come up with, terminologywise, to your diagram is this:
The set F of all floating point numbers of a certain kind is a finite
set of dyadic rational numbers (that is, rational numbers whose
denominators are powers of two). Along with F, we assume given a
mapping p: I->F, where I is an interval of real numbers, roughly (but
by no means exactly) extending from most-negative-whatever-float to
most-positive-whatever-float (but at least, I contains all of F).
Moreover, this mapping is a projection onto F in the sense that p(f)=f
whenever f belongs to F. But the mapping is many-to-one. The inverse
image of any f in F is an interval which I will write here simply as
[f] (for those familiar with TeX, p^{-1}[f] would be a more
conventional notation). All these intervals create a partition of the
interval I of approximately representable real numbers. An arithmetic
operation, such as +, is then in principle carried out within F as
follows: Replace the exact sum a+b by p(a+b), if a+b is within I.
(If not, we are facing an overflow, so we adjoin an artificial entity
Inf to assign to the "sum".)

Any rounding rules, such as truncate or round-to-even, are built into
the function p.

In the given example, 5.1 does not belong to F but 5 does. So one
really ends up subtracting 5 from p(5.1). That the result is
different from p(0.1) is just a special case of the general
observation that p(a)-p(b) is in general not equal to p(a-b), and
similarly with the other operations. However, when you rightly
observe that there is no loss of precision in this special case, what
you are really saying is that p(5.1)-5 already belongs to F, so in
this case the final step of projecting the result back into F is not
necessary.
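The projection p can be modeled for rational inputs by calling FLOAT with a double prototype. In this sketch, PROJECT is a hypothetical stand-in for p. It assumes FLOAT rounds a rational to the nearest double, as IEEE implementations do.

```lisp
(defun project (r)
  "Model of the projection p: map the rational R to the nearest double
(hypothetical helper)."
  (float r 1d0))

(let ((p51 (project 51/10)))
  (list
   ;; p(5.1) - 5 is already a member of F: the float result equals the
   ;; exact rational difference, so no final projection is needed.
   (= (- p51 5d0) (- (rational p51) 5))          ; T
   ;; But p(a) - p(b) differs from p(a - b).
   (= (- p51 (project 5)) (project 1/10))))     ; NIL
```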

So why wasn't floating point arithmetic designed so that the
(obviously very desirable) property p(x@y)=p(x)@p(y) holds for all the
four arithmetical operations (@ in {+ - * /})? The unfortunate answer
is that this is mathematically impossible to achieve.

I don't know if this way of seeing things really helps - to some of
the more mathematically oriented abstract thinkers it may, while to
others it's merely stating the obvious in a form of gobble-de-gook.
To the latter, my apologies. (You should have stopped reading before
you got to this point, though.)

--
* Harald Hanche-Olsen <URL:http://www.math.ntnu.no/~hanche/>
- Yes it works in practice - but does it work in theory?

Geoff Summerhayes

Apr 8, 2002, 2:40:53 PM

"Erik Naggum" <er...@naggum.net> wrote in message

news:32270369...@naggum.net...

>
> scale-float is a more efficient way of changing the exponent. Also
> remember to use a (unit) float of the same type as the argument float for
> the float call, or you end up just playing with single-floats. Quoted
> from the standard on the integer-decode-float page:
>
> (multiple-value-bind (signif expon sign)
> (integer-decode-float f)
> (scale-float (float signif f) expon)) == (abs f)
>
> I tend to use the function rational to decode floating point numbers.
>

I avoided using scale-float because I wanted to get around the default
base eventually. Here's the revised function, first working draft, I'm
afraid it's not very pretty:

------------------------------

(defun print-float-base (float &optional (base 10) (stream *standard-output*))
  "Print an approximation of a floating-point number
in the specified base to a stream. The representation
is not readable and has the format -M.MMMMM..M*BASE^EXPONENT
All components are printed in the base."
  (let ((fraction (rational float))
        (power 0)
        (*print-base* base)
        ;; estimate length of significant digits in output
        (digits (ceiling (log (expt (float-radix float)
                                    (float-digits float))
                              base))))
    ;; normalize float: scale down until the integer part is one digit
    ;; (ABS so that negative numbers are normalized as well)
    (do ((integer (truncate fraction)
                  (truncate fraction)))
        ((< (abs integer) base))
      (incf power)
      (setf fraction (/ fraction base)))
    ;; then scale up until the integer part is nonzero
    (do ((integer (truncate fraction)
                  (truncate fraction)))
        ((or (zerop float) (not (zerop integer))))
      (decf power)
      (setf fraction (* fraction base)))
    ;; print mantissa
    (multiple-value-bind (integer remainder)
        (truncate fraction)
      (let ((*print-radix* (not (= 10 base))))
        (format stream "~A." integer))
      (if (not (zerop remainder))
          (progn
            (setf remainder (abs remainder))
            (do ((x 0 (1+ x)))
                ((or (zerop remainder) (> x digits)))
              (multiple-value-setq (integer remainder)
                (truncate (* remainder base)))
              (let ((*print-radix* nil))
                (format stream "~A" integer))))
          (format stream "0"))
      ;; print base and exponent
      ;; the printed representation of base is always
      ;; `10' in that base
      (let ((*print-radix* nil))
        (format stream "*10^~A" power)))))

-----------------------

CL-USER 132 > (print-float-base 5.1)
5.09999999999999964*10^0
NIL

CL-USER 133 > (print-float-base 5.1 2)
#b1.010001100110011001100110011001100110011001100110011*10^10
NIL

CL-USER 134 > (print-float-base 5.1 3)
#3r1.20022002200220022002200220022002110*10^1
NIL

CL-USER 135 > (print-float-base 5.1 16)
#x5.1999999999998*10^0
NIL

---------

Geoff

William D Clinger

Apr 8, 2002, 6:46:29 PM

> http://cch.loria.fr/documentation/IEEE754/ACM/goldberg.pdf
>
> should answer this and many related questions.

For more up-to-date information on floating point i/o, see

http://www.acm.org/pubs/citations/proceedings/pldi/93542/p92-clinger/
http://www.acm.org/pubs/citations/proceedings/pldi/93542/p112-steele/

Will