float and double precision

carso...@gmail.com

Mar 26, 2009, 12:33:23 PM

So, I hear that the float and double intrinsic types don't have very
good precision, due to the way in which they store the data. They are
meant to be stored as 1.xxxxx, right? Not 1000.xxx or 0.xxx. Anyway,
if something like

float x=1000;
float y=1000.43;
std::cout<< y-x<< std::endl;

will result in 0.429xxxx or whatever,
why can a calculator do it?
Is there some other type that can be used to store precise numbers of
any size?

Victor Bazarov

Mar 26, 2009, 1:33:46 PM

carso...@gmail.com wrote:
> so, I hear that the float and double intrinsic types don't have very
> good precision, due to the way which they store the data--they are
> meant to be stored as 1.xxxxx right?

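Actually, it's usually described as 0.1xxxxx in binary. IOW, the mantissa
value is always in the range [0.5, 1). (IEEE 754 writes the same values
as 1.xxxxx, in the range [1, 2); the two conventions differ only in the
exponent.)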

> Not 1000.xxx or 0.xxx, anyways--
> if something like,
>
> float x=1000;
> float y=1000.43;
> std::cout<< y-x<< std::endl;
>
> will result in 0.429xxxx or whatever,

It could.

> why can calculator do it?

Because it probably uses more digits of precision than 'float'...

> Is there some other type that can be used to store precise numbers of
> any size?

Yes, look on the web for "arbitrary precision floating point library".
Or you could use rationals (if your algorithm allows that). Or go for
some kind of mathematical formula for the number. You're still going to
be SOL with numbers like Pi or e (which aren't from a formula, really).

The built-in FP types are limited; there are only three. In addition to
the two you've named, there is 'long double', which is allowed to be
implemented as 'double'. Sucks, don' it?

V
--
Please remove capital 'A's when replying by e-mail
I do not respond to top-posted replies, please don't ask

osmium

Mar 26, 2009, 2:16:05 PM

<carso...@gmail.com> wrote:

If calculators expressed numbers in binary form, they would have the general
problem you allude to. I think they use a 4-bit group to represent a single
decimal digit, and the exponent is similarly represented as a separate but
associated two-decimal-digit datum. IOW, it is not what a computer person
would recognize as a floating point form. It is closer to what is called
binary coded decimal, but it is not that either. You can get a better
idea of what I am talking about by looking up BCD, on Wiki say. Using 4 bits
to represent only 10 digits is wasteful, but so be it.

Juha Nieminen

Mar 27, 2009, 2:58:29 AM

Victor Bazarov wrote:
>> Is there some other type that can be used to store precise numbers of
>> any size?
>
> Yes, look on the web for "arbitrary precision floating point library".

Not that it will be of any help with regard to rounding errors that
result from calculations/conversions. A value like 0.1 cannot be
represented exactly with base-2 floating point numbers even if you
use a gigabyte of RAM to store it (though you will get pretty darn
close).

Carson Myers

Mar 27, 2009, 3:18:23 AM

Would it be practical to use two int values, do you think?
Like in a class, with one for the fractional part and one for the
whole-number part? That way I suppose it would be possible to avoid
rounding errors, since the fractional part wouldn't really be treated
as a fractional part but rather as a regular integer. You'd just have
to handle the math and the behavior of the <1 portion yourself, which
would (I can imagine) be slow.

However, I don't really understand how 0.1 could not be represented...
I've read that it's a computer science problem; I don't fully
understand it but can vaguely grasp the concept (I haven't read very
much). But my compiler will still output 0.1 for float and double. If I
compiled "float x=0.1; std::cout<<x<<std::endl;" on another compiler
or system, might it show something else? Unbelievable...

Tim Love

Mar 27, 2009, 3:34:09 AM

>However I don't really understand how 0.1 could not be represented...
>I've read about how it's a computer science problem, don't fully
>understand but can vaguely grasp the concept (haven't read very much)--

http://www.eason.com/library/math/floatingmath.pdf is often mentioned
(What Every Computer Scientist Should Know About Floating-Point Arithmetic)
though http://www.mathworks.com/support/tech-notes/1100/1108.html might be
an easier read.
It's worth knowing about even if you're not a Computer Scientist. Spreadsheets
exhibit the same trouble: you can't assume that 117/9 and 11.7/.9 are equal, or
that adding a, b, and c (in that order) will give you the same answer as adding
c, b, and a (in that order). Tough life, which is why programmers are paid so
much.

James Kanze

Mar 27, 2009, 5:27:56 AM

On Mar 26, 6:33 pm, Victor Bazarov <v.Abaza...@comAcast.net> wrote:

> carsonmy...@gmail.com wrote:
> > so, I hear that the float and double intrinsic types don't
> > have very good precision, due to the way which they store
> > the data--they are meant to be stored as 1.xxxxx right?

> Actually, it's 0.1xxxxx in binary, usually. IOW, the mantissa
> value is always in the range [0.5, 1).

> > Not 1000.xxx or 0.xxx, anyways--

> > if something like,

> > float x=1000;
> > float y=1000.43;
> > std::cout<< y-x<< std::endl;

> > will result in 0.429xxxx or whatever,

> It could.

> > why can calculator do it?

It can't, in general. Try something like 1.0/3.0.

> Because it probably uses more digits of precision than
> 'float'...

Or because it uses decimal arithmetic, which is not only
slower (usually), but has the disadvantage of variable
precision.

If you're doing bookkeeping, or working in some other context
where the rounding rules are determined by a legal specification
based on decimal arithmetic, then you need a decimal class which
does decimal arithmetic. Typically, however, such applications
aren't "number crunchers", so you can afford the extra runtime.

> > Is there some other type that can be used to store precise
> > numbers of any size?

> Yes, look on the web for "arbitrary precision floating point
> library".

I'd be interested in seeing one capable of storing the exact
value of pi, or even the exact value of sqrt(2.0). Some numbers
require infinite precision in any base.

In practice, even simple division is a problem. You can only
store 1/n precisely in a finite number of bits if the base being
used is n or a multiple of n. In order to guarantee exactness,
you'd have to use some sort of rational representation.

Note that 10 is a multiple of 2, so with enough bits, you can
store any decimal representation. But this just begs the
question: numbers don't always come from literals or input
strings; they are also the result of expressions like a/b or
sqrt(c).

> Or you could use rationals (if your algorithm allows that).
> Or go for some kind of mathematical formula for the number.
> You're still going to be SOL with numbers like Pi or e (which
> aren't from a formula, really).

They can be expressed as the results of an equation.

> The built-in FP types are limited, there are only three. In
> addition to the two you've named there is the 'long double',
> which is allowed to be implemented as 'double'. Sucks, don'
> it?

Although neither made it into the final draft, there were
proposals on the table for decimal arithmetic and a rational
class. For that matter, I think the decimal arithmetic is being
adopted in the form of a technical report or something like
that.

--
James Kanze (GABI Software) email:james...@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

James Kanze

Mar 27, 2009, 5:40:04 AM

On Mar 26, 7:16 pm, "osmium" <r124c4u...@comcast.net> wrote:

> <carsonmy...@gmail.com> wrote:
> > so, I hear that the float and double intrinsic types don't
> > have very good precision, due to the way which they store
> > the data--they are meant to be stored as 1.xxxxx right? Not
> > 1000.xxx or 0.xxx, anyways-- if something like,

> > float x=1000;
> > float y=1000.43;
> > std::cout<< y-x<< std::endl;

> > will result in 0.429xxxx or whatever, why can calculator do
> > it? Is there some other type that can be used to store
> > precise numbers of any size?

> If calculators expressed numbers in binary form, they would
> have the general problem you allude to. I think they use a
> 4-bit group to represent a single decimal digit, and the
> exponent is similarly, represented as a separate, but
> associated, 2 decimal digit datum. IOW, it is not what a
> computer person would recognize as a floating point form.

It sounds like classical floating point to me. I don't know of
any modern machines which use decimal (although IBM mainframes
use base 16, and Unisys mainframes base 8), but they've
certainly existed in the past (e.g. IBM 1401). The C++ standard
references the C standard for this---see §5.2.4.2.2 in the C
standard.

> It is closer to what is called binary coded decimal but it is
> not that either.

It corresponds exactly to BCD: the number is broken down into
four bit blocks, each of which may take on a value of 0 to 9.

> But you can get a better idea of what I am talking about
> looking up BCD, on Wiki say. Using 4 bits to represent only
> 10 digits is wasteful, so be it.

On most machines, it is also slower. More importantly, it means
that the actual precision varies somewhat according to the
stored value; i.e. it has the same problems as IBM's base 16
format. (I'm not competent enough in numerical processing to
judge myself, but I know that some specialists complain loudly
about this.)

Martin Eisenberg

Mar 27, 2009, 7:16:13 AM

James Kanze wrote:
>> carso...@gmail.com wrote:

>> > if something like,
>> >
>> > float x=1000;
>> > float y=1000.43;
>> > std::cout<< y-x<< std::endl;
>> >
>> > will result in 0.429xxxx or whatever,

>> > why can calculator do it?

Actually they fudge it for your viewing pleasure by displaying
results rounded to fewer digits than they carry internally. To be
fair, there is a solid reason for that arrangement -- it stands a
reasonable chance of keeping rounding errors, which inevitably accumulate
over a chain of operations, out of the numbers that engineers end up
copying to their notebooks.

> In practice, even simple division is a problem. You can only
> store 1/n precisely in a finite number of bits if the base being
> used is n or a multiple of n. In order to guarantee exactness,
> you'd have to use some sort of rational representation.
>
> Note that 10 is a multiple of 2, so with enough bits, you can
> store any decimal representation.

You've used your own previous statement in the wrong direction.
Consider that 0.1 (dec) is periodic in binary.

Martin

--
Why is 6 afraid of 7?
Because 7 8 9.

Victor Bazarov

Mar 27, 2009, 9:17:20 AM

Carson Myers wrote:
> would it be practical to use two int values, do you think?

That's what the algorithms based on rational numbers do. There is no way
to precisely represent Pi or e or the square root of 2 with those, however.

> Like in a class--one for the fractional part, and one for the whole-
> number part-
> that way I suppose it would be possible to avoid rounding errors since
> the fractional part wouldn't really be treated as a fractional part,
> but rather as a regular integer--you'd just have to worry about
> handling the math and the behavior of the <1 portion of it yourself,
> which would (I can imagine) be slow.
>
> However I don't really understand how 0.1 could not be represented...

Try calculating the binary representation of it. It's a good exercise.

> [..]