
How to display a "double" in all its precision???


CS Imam

Aug 6, 2006, 10:59:05 PM
Hello,

Here is a code fragment that is very simple... but I can't get it to
work!

public static void main(String[] args)
{
    for (int i = 1; i <= 30; i++)
    {
        double x = Math.pow(2, i);
        x = 1 + 1 / x;
        System.out.printf("For i = %d: %.40f%n", i, x);
        System.out.println(
                Long.toBinaryString(Double.doubleToLongBits(x)));
        System.out.println();
    }
}

All this code is supposed to do is print out the fractions 1+1/2,
1+1/4, 1+1/8, etc. When one prints out the raw bits (see
doubleToLongBits), the code is clearly working.

But on the regular printf("For i...etc"), at i=17 and above, the
numbers get frozen at 16 digits displayed after the decimal point (the
precision). But it's not really the precision, because the bits ARE
changing correctly. What gives???

Help!

- not a stunningly gorgeous woman who would marry you if you solve this
problem

EJP

Aug 6, 2006, 11:24:53 PM
CS Imam wrote:

> But on the regular printf("For i...etc"), at i=17 and above, the
> numbers get frozen at 16 digits displayed after the decimal point (the
> precision). But it's not really the precision, because the bits ARE
> changing correctly. What gives???

There *is no more* precision. A double has 53 bits of binary precision
which is about 16 decimal digits. The 40 decimal digits you're expecting
would take 133 bits.

Maybe you're getting confused between 40 bits of binary precision and 40
decimal digits?

CS Imam

Aug 6, 2006, 11:28:47 PM
Thank you for your reply, but here is the output to clarify what I am
seeing. As you will see, the binary representation shows that the
double is more than capable of representing the numbers in question
(1/2^i where i goes from 1 to 30). As you said, the double gives 52
bits of precision. However, when displaying the number in decimal, Java
appears to be unable to display it correctly - why does it properly
display in base 2, but not in base 10?

I hope I was clearer this time - please pardon me if not! But thanks
for your help...

For i = 1: 1.5000000000000000000000000000000000000000
11111111111000000000000000000000000000000000000000000000000000

For i = 2: 1.2500000000000000000000000000000000000000
11111111110100000000000000000000000000000000000000000000000000

For i = 3: 1.1250000000000000000000000000000000000000
11111111110010000000000000000000000000000000000000000000000000

For i = 4: 1.0625000000000000000000000000000000000000
11111111110001000000000000000000000000000000000000000000000000

For i = 5: 1.0312500000000000000000000000000000000000
11111111110000100000000000000000000000000000000000000000000000

For i = 6: 1.0156250000000000000000000000000000000000
11111111110000010000000000000000000000000000000000000000000000

For i = 7: 1.0078125000000000000000000000000000000000
11111111110000001000000000000000000000000000000000000000000000

For i = 8: 1.0039062500000000000000000000000000000000
11111111110000000100000000000000000000000000000000000000000000

For i = 9: 1.0019531250000000000000000000000000000000
11111111110000000010000000000000000000000000000000000000000000

For i = 10: 1.0009765625000000000000000000000000000000
11111111110000000001000000000000000000000000000000000000000000

For i = 11: 1.0004882812500000000000000000000000000000
11111111110000000000100000000000000000000000000000000000000000

For i = 12: 1.0002441406250000000000000000000000000000
11111111110000000000010000000000000000000000000000000000000000

For i = 13: 1.0001220703125000000000000000000000000000
11111111110000000000001000000000000000000000000000000000000000

For i = 14: 1.0000610351562500000000000000000000000000
11111111110000000000000100000000000000000000000000000000000000

For i = 15: 1.0000305175781250000000000000000000000000
11111111110000000000000010000000000000000000000000000000000000

For i = 16: 1.0000152587890625000000000000000000000000
11111111110000000000000001000000000000000000000000000000000000

For i = 17: 1.0000076293945312000000000000000000000000
11111111110000000000000000100000000000000000000000000000000000

For i = 18: 1.0000038146972656000000000000000000000000
11111111110000000000000000010000000000000000000000000000000000

For i = 19: 1.0000019073486328000000000000000000000000
11111111110000000000000000001000000000000000000000000000000000

For i = 20: 1.0000009536743164000000000000000000000000
11111111110000000000000000000100000000000000000000000000000000

For i = 21: 1.0000004768371582000000000000000000000000
11111111110000000000000000000010000000000000000000000000000000

For i = 22: 1.0000002384185790000000000000000000000000
11111111110000000000000000000001000000000000000000000000000000

For i = 23: 1.0000001192092896000000000000000000000000
11111111110000000000000000000000100000000000000000000000000000

For i = 24: 1.0000000596046448000000000000000000000000
11111111110000000000000000000000010000000000000000000000000000

For i = 25: 1.0000000298023224000000000000000000000000
11111111110000000000000000000000001000000000000000000000000000

For i = 26: 1.0000000149011612000000000000000000000000
11111111110000000000000000000000000100000000000000000000000000

For i = 27: 1.0000000074505806000000000000000000000000
11111111110000000000000000000000000010000000000000000000000000

For i = 28: 1.0000000037252903000000000000000000000000
11111111110000000000000000000000000001000000000000000000000000

For i = 29: 1.0000000018626451000000000000000000000000
11111111110000000000000000000000000000100000000000000000000000

For i = 30: 1.0000000009313226000000000000000000000000
11111111110000000000000000000000000000010000000000000000000000

-----------------

Patricia Shanahan

Aug 7, 2006, 12:22:19 AM
CS Imam wrote:
> Hello,
>
> Here is a code fragment that is very simple... but I can't get it to
> work!
>
> public static void main(String[] args)
> {
> for (int i = 1; i <= 30 ; i++)
> {
> double x = Math.pow(2, i);
> x = 1 + 1 / x;
> System.out.printf("For i = %d: %.40f%n", i, x);
> System.out.println(
> Long.toBinaryString(Double.doubleToLongBits(x)) );
> System.out.println();
> }
> }
>
> All this code is supposed to do is print out the fractions 1+1/2,
> 1+1/4, 1+1/8, etc. When one prints out the raw bits (see
> doubleToLongBits), the code is clearly working.
>
> But on the regular printf("For i...etc"), at i=17 and above, the
> numbers get frozen at 16 digits displayed after the decimal point (the
> precision). But it's not really the precision, because the bits ARE
> changing correctly. What gives???


Double.toString, used implicitly in conversion of x to a String,
produces the shortest string that, when converted back to double, will
produce the original number.

BigDecimal is the easiest way I know to get all the digits:

System.out.println(new BigDecimal(x));
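For example, a minimal self-contained sketch (the class name and the choice of i = 30 are just for illustration):

```java
import java.math.BigDecimal;

public class ExactPrint {
    public static void main(String[] args) {
        // 1 + 1/2^30 is exactly representable as a double
        double x = 1 + 1 / Math.pow(2, 30);
        // The BigDecimal(double) constructor takes the double's exact
        // binary value, so every decimal digit is printed, unrounded
        System.out.println(new BigDecimal(x));
        // prints 1.000000000931322574615478515625
        // Double.toString prints only the shortest round-trip form
        System.out.println(x);
    }
}
```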

Patricia

EJP

Aug 7, 2006, 12:41:11 AM
CS Imam wrote:

No, 16 decimal places really *is* the precision. The *binary* bits
change beyond 16 because the radix is binary not decimal. The 16th
decimal digit expresses a lot more precision than the 16th binary digit.
You need to understand that. In fact I don't understand what you are
expecting to see. The last decimal number printed is 1.0000000009313226
and the last binary number printed converts precisely back to that.
Nothing is being lost. You can't get 40 decimal digits out of 53 bits,
you can only get 16.

Patricia Shanahan

Aug 7, 2006, 1:06:28 AM

For any number that can be expressed as a terminating binary fraction,
including any number that is representable in Java double, there is a
unique decimal fraction that is EXACTLY equal to it, not just a
close-enough approximation. I believe that is the answer the OP is
looking for.

(In another message, I suggested getting it via BigDecimal).

Patricia

CS Imam

Aug 7, 2006, 3:16:49 AM
Thanks again for replying, you and Patricia both. The BigDecimal DID
work, and that is great to solve the problem at hand.

But I'm really interested in understanding my... misunderstanding I
suppose.

I am not looking for 40 places of decimal precision; I only used
"%.40f" as an overkill to see "all the numbers". I *am* aware that
doubles are supposed to give only 15 digits of decimal precision
approximately. However, what I find puzzling is that in binary, we are
supposed to get 52 (not 53 as far as I know) bits of precision. So here
is my misunderstanding: I see the bits changing in binary. And yet when
they are converted into decimal through the "prints" (and Patricia
pointed out that the problem is really in "toString"), the decimal
equivalent is NOT precise. And this does not make sense to me. If the
underlying raw number in binary IS precise, then the converted decimal
number should be precise as well, right?

When you wrote:

> The last decimal number printed is 1.0000000009313226
> and the last binary number printed converts precisely back to that.
> Nothing is being lost. You can't get 40 decimal digits out of 53 bits,
> you can only get 16.

Actually as far as I know, something IS being lost. The last binary
number, if you convert it to decimal, should be:

1.000000000931322574615478515625

In binary, the underlying bits are as follows:

11111111110000000000000000000000000000010000000000000000000000

So again, if the underlying bits are precisely expressing some number
within the 52 bits of accuracy, why does converting it to a decimal
representation fail?

I really apologize if I am not seeing something that you are
explaining!

thanks, and sorry again.

Chris Uppal

Aug 7, 2006, 5:14:25 AM
CS Imam wrote:

> However, what I find puzzling is that in binary, we are
> supposed to get 52 (not 53 as far as I know) bits of precision. So here
> is my misunderstanding: I see the bits changing in binary. And yet when
> they are converted into decimal through the "prints" (and Patricia
> pointed out that the problem is really in "toString"), the decimal
> equivalent is NOT precise. And this does not make sense to me. If the
> underlying raw number in binary IS precise, then the converted decimal
> number should be precise as well, right?

I think what you may be missing is that there is a /range/ of precise decimal
numbers which would all have the same representation as a double. So, although
any given double converts exactly into precisely one arbitrary-precision
decimal number, that number is not the only one which the double value may be
"trying" to represent.

The string representation has to /choose/ one value from the infinite set of
arbitrary-precision decimal numbers which the double value might be intended
to represent. One option would be to chose the unique element which was
exactly equal to the double, but that's not the only possible design. In fact
(and defensibly, IMO, although it would be nice to have a choice) the element
which is chosen is the one with the fewest digits -- otherwise, for instance,
0.1D would print out as:
0.1000000000000000055511151231257827021181583404541015625
which is certainly precise, but is probably not what most programmers (or
users) would wish to see.
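To see the two choices side by side (a small sketch; the class name is arbitrary):

```java
import java.math.BigDecimal;

public class ShortestVsExact {
    public static void main(String[] args) {
        double d = 0.1;
        // The shortest decimal that rounds back to the same double
        System.out.println(d);                  // 0.1
        // The unique decimal exactly equal to the stored double
        System.out.println(new BigDecimal(d));
        // Both strings denote decimal numbers that convert to the same double
        System.out.println(Double.parseDouble(
            "0.1000000000000000055511151231257827021181583404541015625") == d);
    }
}
```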

-- chris

hiwa

Aug 7, 2006, 6:01:01 AM
This is the funniest post of this summer.
The OP reminds me of some type of elderly people who stubbornly believe
in the impossible.
Thanks for a big laugh.

Seriously:
Learn and study IEEE 754 bit-array-representation of 64 bit FPN.

Patricia Shanahan

Aug 7, 2006, 10:02:10 AM
hiwa wrote:
> This is the funniest post of this summer.
> The OP reminds me of some type of elderly people who stubbonly believe
> in an impossible.

Section 3.2.2 Double of ANSI/IEEE Std 754-1985 begins "A 64-bit double
format number X is divided as shown in Fig. 2. The value v of X is
inferred from its constituent fields thus:" followed by a series of cases.

The relevant case is "(3) If 0 < e < 2047, then ..." followed by a
formula for the value of a finite, normalized, non-zero double number.
Using "^" for exponentiation, and "*" for multiplication, and "." for
the centered dot, it is equivalent to:

(-1)^s * 2^(e-1023) * (1 "." f), where s is the sign bit, e is the
exponent, and f is the fraction.

I interpreted the base article as requesting a print out, in decimal, of
that value. My BigDecimal suggestion, which does exactly that, worked
for the OP, so that seems to be the correct interpretation.

Why do you consider this to be impossible? Or do you disagree with my
interpretation?

> Thanks for a big laugh.

Although I find IEEE 754 floating point arithmetic both interesting and
useful, I'm completely missing its humor. I know explanations can
sometimes kill a joke, but perhaps you could explain what is so funny?

> Seriously:
> Learn and study IEEE 754 bit-array-representation of 64 bit FPN.
>

I've read the standard from cover to cover a couple of times, and reread
individual sections far more often. I've looked at many, many doubles as
bit patterns. To me, the OP's request seemed quite reasonable. Indeed,
it is something I've needed for myself when working on understanding
some of the subtleties of rounding, so I already had a solution. What am
I missing?

Patricia

Patricia Shanahan

Aug 7, 2006, 10:09:28 AM
CS Imam wrote:
...

> I am not looking for 40 places of decimal precision; I only used
> "%.40f" as an overkill to see "all the numbers". I *am* aware that
> doubles are supposed to give only 15 digits of decimal precision
> approximately. However, what I find puzzling is that in binary, we are
> supposed to get 52 (not 53 as far as I know) bits of precision.

Chris has already responded to your main point.

I just want to clarify the reason for 53 bits of precision, rather than 52.

It is a consequence of floating point normalization. Even in non-binary
systems, there are advantages to avoiding, wherever possible, leading
zero digits in the mantissa. A "normalized" floating point format is one
in which the most significant digit of the mantissa is never zero.

For binary, there is an additional advantage. We know the leading digit
of the mantissa of a normalized float is a binary digit, and is not
zero. There is no point spending a bit in a dense format on a binary
digit that must be one, so it does not appear. Normalization buys us an
extra bit of precision.

Suppose you have a normalized double with 52 bit fraction f. The full
mantissa is 1.f, a 1 before the binary point, followed by the 52 bit
fraction after the binary point.
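Pulling the fields apart makes this concrete (a sketch; the shifts and masks follow the IEEE 754 double layout described above):

```java
public class BitFields {
    public static void main(String[] args) {
        double x = 1.5;  // 1.1 in binary; only the ".1" part is stored
        long bits = Double.doubleToLongBits(x);
        long s = bits >>> 63;               // 1-bit sign
        long e = (bits >>> 52) & 0x7FF;     // 11-bit biased exponent
        long f = bits & 0xFFFFFFFFFFFFFL;   // 52 stored fraction bits
        // Reassemble the value with the implicit leading 1: mantissa is 1.f
        double value = Math.pow(-1, s) * Math.pow(2, e - 1023)
                     * (1 + f / Math.pow(2, 52));
        System.out.println("s=" + s + " e=" + e
                + " f=" + Long.toBinaryString(f));
        System.out.println(value == x);     // true
    }
}
```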

Patricia


jmcgill

Aug 7, 2006, 12:50:47 PM
EJP wrote:
> There *is no more* precision. A double has 53 bits of binary precision
> which is about 16 decimal digits.

I've looked for a formula before to get this kind of constraint. That
is, to answer the question, "how many decimal digits of precision is
N bits of binary precision for a given floating point model?"

Maybe this is a question for a discrete math forum.

CS Imam

Aug 7, 2006, 1:13:30 PM
Patricia,

Just wanted to thank you for your information. You turned out to be
right on the money. Those posters who are insisting that there is no
loss of precision, and that it is impossible to do better, and that
this is a funny post... you need to go back and read.

Based on what Patricia wrote, I did some searching on the net, and
found that this is indeed a difficult problem: what to print in decimal
format given a binary number. And it also turns out that it was
addressed in a classic paper by Guy Steele and Jon White. Here is a
link to the paper:

"How to Print Floating Point Numbers Accurately"

http://portal.acm.org/ft_gateway.cfm?id=989431&type=pdf&coll=portal&dl=ACM&CFID=15151515&CFTOKEN=6184618

You may read section 2 "Properties of Radix Conversion" to understand
the issues involved. If you don't have time, then read Patricia's
original answer to me. That was the succinct and correct answer, not
repeatedly insisting that nothing was being lost. Indeed, LOTS is being
lost... but on purpose it turns out.

- my hat is off to Ms. Shanahan

Patricia Shanahan

Aug 7, 2006, 1:56:46 PM

The superficial, rough answer is that N bits have 2^N possible values
(using "^" for exponentiation). M decimal digits have 10^M possible values.

If 10^M = 2^N then M = log10(2^N) = N*log10(2)

So the rough answer is to multiply N by log10(2), about 0.301.

53*log10(2) is about 15.954, almost 16.

Once you get into actual arithmetic, everything gets more complicated.
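The rough formula as a sketch (class name arbitrary; 24 and 53 are the float and double mantissa widths including the implicit bit):

```java
public class DecimalDigits {
    public static void main(String[] args) {
        // Rough decimal-digit equivalent of N bits: N * log10(2)
        for (int n : new int[] {24, 53}) {
            System.out.printf("%d bits ~ %.3f decimal digits%n",
                    n, n * Math.log10(2));
        }
    }
}
```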

Patricia

jmcgill

Aug 7, 2006, 3:24:12 PM
Patricia Shanahan wrote:

> Once you get into actual arithmetic, everything gets more complicated.

Is it at least correct to claim that 53 bits of binary precision
guarantees no less than 15 decimal digits? Even this, no doubt,
depends on the exponent.

Patricia Shanahan

Aug 7, 2006, 3:35:00 PM

I'm not sure I see how the exponent affects it that much, as long as you
are thinking of significant digits. Of course, if you are thinking of
digits after the decimal point, even a 20 digit decimal floating point
system does not guarantee 15 decimal digits.

Patricia

jmcgill

Aug 7, 2006, 3:44:30 PM
Patricia Shanahan wrote:
> jmcgill wrote:
>> Patricia Shanahan wrote:
>>
>>> Once you get into actual arithmetic, everything gets more complicated.
>>
>> Is it at least correct to claim that 53 bits of binary precision
>> guarantees no less than 15 decimal digits? Even this, no doubt,
>> depends on the exponent.
>
> I'm not sure I see how the exponent affects it that much, as long as you
> are thinking of significant digits.


I suspected that larger exponents lead to more granularity in the ranges
or something like that.

Also, when I posted to the thread I somehow thought I was posting on a C
group, not java. I realize java programmers do not generally deal with
the bitwise evaluation of data.

I'm wondering all this because in other disciplines, "error bounds" is
always such an early, and often repeated, focus. Yet I had never seen
nor been asked for error bounds in IEEE numeric representations.

Patricia Shanahan

Aug 7, 2006, 4:16:39 PM
jmcgill wrote:
> Patricia Shanahan wrote:
>> jmcgill wrote:
>>> Patricia Shanahan wrote:
>>>
>>>> Once you get into actual arithmetic, everything gets more complicated.
>>>
>>> Is it at least correct to claim that 53 bits of binary precision
>>> guarantees no less than 15 decimal digits? Even this, no doubt,
>>> depends on the exponent.
>>
>> I'm not sure I see how the exponent affects it that much, as long as you
>> are thinking of significant digits.
>
>
> I suspected that larger exponents lead to more granularity in the ranges
> or something like that.

I think you need to distinguish more between absolute and relative effects.

For example, the absolute difference x-y between two consecutive
representable numbers, x and y, is a strictly increasing function of the
exponent. The relative difference (x-y)/x varies within a given value of
the exponent, but does not increase with exponent.

> Also, when I posted to the thread I somehow thought I was posting on a C
> group, not java. I realize java programmers do not generally deal with
> the bitwise evaluation of data.

I'm neither a Java programmer nor a C programmer. I'm a programmer who
happens be using Java right now. I may ignore some details when working
in high level languages, but that does not mean I understand them any
less than when I'm working in assembly language.

> I'm wondering all this because in other disciplines, "error bounds" is
> always such an early, and often repeated, focus. Yet I had never seen
> nor been asked for error bounds in IEEE numeric representations.

There is a required accuracy for all the basic operations, specified in
the IEEE 754 standard, although some implementations confuse matters by
keeping intermediate results with more accuracy. That tends to reduce
the need for discussion.

A good library specification should discuss error bounds for those
functions whose implementation is allowed some slack. See, for example,
the Java API documentation for sin:

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Math.html#sin(double)

"The computed result must be within 1 ulp of the exact result."

("ulp" is short for "unit in the last place", a difference of one in the least
significant bit of the fraction. See the top of the referenced page for
a more detailed explanation.)
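Java exposes this directly as Math.ulp; a small sketch:

```java
public class UlpDemo {
    public static void main(String[] args) {
        // Math.ulp(x): the gap between x and the next representable double
        System.out.println(Math.ulp(1.0) == Math.pow(2, -52));  // true
        // The gap doubles each time the binary exponent increases by one
        System.out.println(Math.ulp(2.0) == Math.pow(2, -51));  // true
        System.out.println(Math.ulp(1024.0));
    }
}
```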

Beyond the basics, the subject rapidly gets very complex, see
"numerical analysis" in any good technical bookstore or library.
Floating point application accuracy is a difficult, but intensely
studied, subject.

Patricia

jmcgill

Aug 8, 2006, 2:14:52 PM

If you're starting from scratch, start by truly understanding unsigned
and signed char, then unsigned and signed two's complement, then single
precision floating point, and then, with a full comprehension of that,
the full spec won't be that hard to understand.

Are there CS programs out there that don't include a computer
organization class where this stuff gets drilled into your brain?

The_Sage

Aug 8, 2006, 10:36:48 PM
>Reply to article by: "CS Imam" <csi...@gmail.com>
>Date written: 7 Aug 2006 10:13:30 -0700
>MsgID:<1154970810.6...@n13g2000cwa.googlegroups.com>

>Just wanted to thank you for your information. You turned out to be
>right on the money. Those posters who are insisting that there is no
>loss of precision, and that it is impossible to do better, and that
>this is a funny post... you need to go back and read.

Isn't it obvious that your "lack of precision" is due to rounding off and
quantizing errors -- as would be expected?

Not all floating point decimal numbers can be exactly represented in
digit-restricted binary, therefore some truncation or rounding will occur during
conversion. This reduces precision.

Remember the math lecture that if all your calculations are done with, say,
10 digits, your final answer will not be accurate to 10 digits due to rounding
errors? If you want 10-digit accuracy, you need more than 10 digits to work
with. The more operations you perform on a digit-restricted number, the less
accurate it becomes due to rounding off. This reduces precision, and it is the
reason for the existence of "guard digits", ie -- 10-digit calculators use
13-digit calculations internally in order to maintain 10-digit accuracy.

Most application programmers do not list the internal precision of their math
routines -- assuming that they are even aware of such math issues. Usually, if
you are really serious about your math, you will test the math routines for
accuracy by performing repetitive loops.

See http://support.microsoft.com/default.aspx?scid=kb;EN-US;q42980
See http://docs.sun.com/source/806-3568/ncg_goldberg.html

The Sage

=============================================================
http://members.cox.net/the.sage/index.htm

"All those painted screens erected by man to shut out reality
-- history, religion, duty, social position --
all were illusions, mere opium fantasies"
John Fowles, The French Lieutenant's Woman
=============================================================

Patricia Shanahan

Aug 9, 2006, 1:48:57 AM
The_Sage wrote:
>> Reply to article by: "CS Imam" <csi...@gmail.com>
>> Date written: 7 Aug 2006 10:13:30 -0700
>> MsgID:<1154970810.6...@n13g2000cwa.googlegroups.com>
>
>> Just wanted to thank you for your information. You turned out to be
>> right on the money. Those posters who are insisting that there is no
>> loss of precision, and that it is impossible to do better, and that
>> this is a funny post... you need to go back and read.
>
> Isn't it obvious that your "lack of precision" is due to rounding off and
> quantizing errors -- as would be expected?

No, in this case it isn't at all obvious. Take another look at the
program in the base message of the thread.

The value being printed, x, is calculated as 1+1/Math.pow(2,i) where i
ranges from one to 30.

For i in the range one through thirty, each of Math.pow(2,i),
1/Math.pow(2,i) and 1+1/Math.pow(2,i) has a mathematical result that is
exactly representable as a double, and that is required to be the result
according to the Math.pow documentation and the JLS descriptions of
divide and add.

The OP knew that the calculations were exact, and that the final double
held the expected result.

The issue was entirely one of output formatting, and using BigDecimal it
is possible to get the decimal representation of the exact result.
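One can check the exactness directly (a sketch; BigDecimal's exact divide needs no rounding mode here because 1/2^i always terminates in decimal):

```java
import java.math.BigDecimal;

public class ExactnessCheck {
    public static void main(String[] args) {
        BigDecimal one = BigDecimal.ONE;
        for (int i = 1; i <= 30; i++) {
            double x = 1 + 1 / Math.pow(2, i);
            // Exact decimal arithmetic for 1 + 1/2^i
            BigDecimal exact = one.add(
                    one.divide(new BigDecimal(2).pow(i)));
            // The double holds exactly the same value
            System.out.println(new BigDecimal(x).compareTo(exact) == 0);
        }
    }
}
```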

Patricia

blm...@myrealbox.com

Aug 9, 2006, 3:16:22 AM
In article <Qv4Cg.733$0F5.344@fed1read04>,

It's probably presented somewhere in most CS programs, but drilled
into the students' brains -- hm, I'm going to guess that not so
many of them do that. The ACM's most recent set of curriculum
guidelines (http://acm.org/education/curric_vols/cc2001.pdf) call
for spending about a week's worth of lecture time on bit-level
representations of various kinds of data, including integers and
floating point. You can only get across so much in a week.

And if you consider the general population of people trying to write
code, and not just those who are products of a formal CS program
somewhere .... If most programmers understood how floating point
works, would there be so many questions along the lines of "how
come when I divide 1.0 by 10 I don't get exactly one tenth?" ?

Not a good state of affairs, I agree.

--
B. L. Massingill
ObDisclaimer: I don't speak for my employers; they return the favor.

Chris Uppal

Aug 9, 2006, 5:41:13 AM
jmcgill wrote:

> Are there CS programs out there that don't include a computer
> organization class where this stuff gets drilled into your brain?

I would imagine there are lots.

And I think that's defensible: a course could validly limit its coverage of
floating-point to "don't use floating point (unless you know what you are
doing)". With an optional course component which covered not only
floating-point representation issues, but also issues of numerical stability
and the like. Few programmers would need the optional component, I would
think -- it would be of interest primarily to scientists and masochists.

-- chris


Patricia Shanahan

Aug 9, 2006, 11:03:08 AM

I think some of the confusion in this thread may be a result of this
strategy. Programmers seem to know that floating point rounding error exists,
without being able to recognize exact calculations, or maybe even
without realizing that some floating point calculations do have exact
results.

Patricia

Chris Smith

Aug 9, 2006, 3:23:26 PM
CS Imam <csi...@gmail.com> wrote:
> That was the succinct and correct answer, not
> repeatedly insisting that nothing was being lost. Indeed, LOTS is being
> lost... but on purpose it turns out.

I'll point out that while you and Patricia are right, the other
responses you got aren't as dumb as you seem to think. Specifically, no
information at all was lost in that display under the following two
assumptions:

(a) you understand a floating point value as representing a range of
possible mathematical values, as Chris Uppal pointed out; AND

(b) you know the original precision of the binary floating point number.

Under those assumptions, which are quite reasonable for most uses, you
got back a correct answer with no loss of information versus the
original. However, if you don't assume (a), then the answer is
incorrect; and if you don't assume (b), then information was lost.

Hope that clarifies,

--
Chris Smith - Lead Software Developer / Technical Trainer
MindIQ Corporation

Patricia Shanahan

Aug 9, 2006, 8:01:49 PM
Chris Smith wrote:
> CS Imam <csi...@gmail.com> wrote:
>> That was the succinct and correct answer, not
>> repeatedly insisting that nothing was being lost. Indeed, LOTS is being
>> lost... but on purpose it turns out.
>
> I'll point out that while you and Patricia are right, the other
> responses you got aren't as dumb as you seem to think. Specifically, no
> information at all was lost in that display under the following two
> assumptions:
>
> (a) you understand a floating point value as representing a range of
> possible mathematical values, as Chris Uppal pointed out; AND

There are three problems with regarding a floating point number as
representing a range of possible mathematical values rather than as
corresponding to a unique real:

1. It conflicts with both the JLS and ANSI/IEEE Std 754-1985. Each gives
a formula for calculating the real number value of a floating point
number, based on the values of its bit fields. The formulas differ, but
give the same results.

2. It would make describing floating point operations much harder. Every
statement of the form "In the remaining cases, where neither an
infinity, nor a zero, nor NaN is involved, and the operands have the
same sign or have different magnitudes, the exact mathematical sum is
computed." would need to be replaced by a more complicated discussion in
terms of the ranges of the two floating point numbers.

3. There are different rounding ranges for different purposes. An add is
allowed at most half an ulp of rounding error, and must round halfway
cases to even. Math.sin is allowed one ulp of
rounding error. Which range does a double x represent? Only the add
results that would round to it? Or does x's range include sine(y) if
Math.sin(y)==x?

I find it simpler to go with the specs, and think of each floating point
number as having a unique value, surrounded by a range of real numbers that
would be rounded to it under the arithmetic rounding rules, and broader
ranges that could be rounded to it under some of the more relaxed
function evaluation rules.

>
> (b) you know the original precision of the binary floating point number.
>
> Under those assumptions, which are quite reasonable for most uses, you
> got back a correct answer with no loss of information versus the
> original. However, if you don't assume (a), then the answer is
> incorrect; and if you don't assume (b), then information was lost.
>
> Hope that clarifies,
>

Certainly I find the normal Java Double.toString result very practical
for most, but not all, purposes. Printing the shortest decimal number
that Double.valueOf(String) would round to the double is a reasonable
default.

Patricia

Chris Smith

Aug 10, 2006, 5:23:57 PM
Patricia,

I believe that "conflicts" is too strong a word for the relationship
between a mental model of floating point numbers as ranges, and the JLS
and IEEE specs. A floating point value can have both a range of numbers
that it best represents, and also an exact mathematical value. It is
more useful to use the exact mathematical value for some purposes, and
the range for others.

I do suspect, though, that there is too much emphasis here on the exact
mathematical value of a floating point number. For most purposes, this
exact value is somewhat arbitrary from the perspective of the
programmer; it may or may not be precisely specified by the operations
(the "within one ulp" operations cause it to become unspecified), and
even when it is specified, it is still often not particularly relevant
to the intended operation. For most purposes, the most meaningful thing
that can be said about the exact value of the floating point number is
that it approximates the correct answer to some degree of accuracy that
depends on context. The same can be said of any other number that
rounds to that floating point value, and there's not necessarily any
good reason to choose one over another except that it happens to be
representable.

The ranges of values that are best represented by a given float are not
accuracy ranges and have nothing to do with the degree of accuracy of
the approximation, so the error in certain calculations is not relevant.
An operation can lack accuracy all it wants, and since floating point
numbers have no concept of accuracy, this would have to be tracked
elsewhere, in separate variables. All it means is that there's
generally no reason to believe that 0.100000001490116119384765625 is
really a better answer than 0.1 to that question. They are both within
the range of numbers that would be represented by a given float.

Chris Uppal

Aug 10, 2006, 5:32:19 AM
Patricia Shanahan wrote:

> There are three problems with regarding a floating point number as
> representing a range of possible mathematical values rather than as
> corresponding to a unique real:

I don't think Chris Smith was suggesting quite that, and I certainly wasn't. I
wasn't saying that each double represents a range of arbitrary-precision
decimal numbers (or reals if you prefer), but that there is a range of such and
you don't (in general) know /which/ of them it represents.

The toString() operation doesn't know, and that's the point. So it has to
choose one of them more-or-less arbitrarily. The choice of the shortest one is
defensible on practical grounds, but I don't see any deeper justification (the
comment in the Steele/Wite paper that the OP linked to the effect that it has
"less information" strikes me as a nice bit of hand-waving, but no more than
that ;-)

-- chris

Chris Uppal

Aug 10, 2006, 5:33:00 AM
Patricia Shanahan wrote:

[me:]


> > And I think that's defensible: a course could validly limit its
> > coverage of floating-point to "don't use floating point (unless you
> > know what you are doing)". [...]


>
> I think some of the confusion in this thread may be a result of this
> strategy.

Quite possible. If so, then I don't think the problem is the strategy, but that its
lessons haven't been learned. People are not avoiding floating-point as
eagerly as they should be ;-)


> Programmers seem to know that floating point rounding error exists,
> without being able to recognize exact calculations, or maybe even
> without realizing that some floating point calculations do have exact
> results.

That certainly applied to me for most of my career to date.

-- chris
