
The machine epsilon


jacob navia
Jun 29, 2007, 4:08:40 AM

Hi.

Continuing with my tutorial, here is an entry I have added recently. I
hope it is not controversial. If you see any errors/ambiguities/etc please
just answer in this thread.

Thanks in advance for your help.
----------------------------------------------------------------------
The machine epsilon

The machine epsilon is the smallest number that changes the result of an
addition operation at the point where the representation of the numbers
is the densest. In IEEE754 representation this number has an exponent
value of the bias, and a fraction of 1. If you add a number smaller than
this to 1.0, the result will be 1.0. For the different representations
we have in the standard header <float.h>:

#define FLT_EPSILON 1.19209290e-07F // float
#define DBL_EPSILON 2.2204460492503131e-16 // double
#define LDBL_EPSILON 1.084202172485504434007452e-19L //long double
// qfloat epsilon truncated so that it fits in this page...
#define QFLT_EPSILON 1.09003771904865842969737513593110651 ... E-106

This defines are part of the C99 ANSI standard. For the standard types
(float, double and long double) this defines should always exist in
other compilers.

Here is a program that will find out the machine epsilon for a given
floating point representation.

#include <stdio.h>

int main(void)
{
    double float_radix = 2.0;
    double inverse_radix = 1.0 / float_radix;
    double machine_precision = 1.0;
    double temp = 1.0 + machine_precision;

    while (temp != 1.0) {
        machine_precision *= inverse_radix;
        temp = 1.0 + machine_precision;
        printf("%.17g\n", machine_precision);
    }
    return 0;
}

Richard Heathfield
Jun 29, 2007, 4:39:24 AM

jacob navia said:

<snip>

> For the different
> representations we have in the standard header <float.h>:
>
> #define FLT_EPSILON 1.19209290e-07F // float
> #define DBL_EPSILON 2.2204460492503131e-16 // double
> #define LDBL_EPSILON 1.084202172485504434007452e-19L //long double
> // qfloat epsilon truncated so that it fits in this page...
> #define QFLT_EPSILON 1.09003771904865842969737513593110651 ... E-106

Conforming implementations must not define QFLT_EPSILON in <float.h>

> This defines are part of the C99 ANSI standard.

Well, three of them are.

Your text suffers from your usual confusion between lcc-win32 and C.

<snip>

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999

jacob navia
Jun 29, 2007, 5:08:19 AM

Richard Heathfield wrote:
> jacob navia said:
>
> <snip>
>
>> For the different
>> representations we have in the standard header <float.h>:
>>
>> #define FLT_EPSILON 1.19209290e-07F // float
>> #define DBL_EPSILON 2.2204460492503131e-16 // double
>> #define LDBL_EPSILON 1.084202172485504434007452e-19L //long double
>> // qfloat epsilon truncated so that it fits in this page...
>> #define QFLT_EPSILON 1.09003771904865842969737513593110651 ... E-106
>
> Conforming implementations must not define QFLT_EPSILON in <float.h>

The C standard paragraph J.5.6: Common extensions:
J.5.6 Other arithmetic types
Additional arithmetic types, such as _ _int128 or double double, and
their appropriate conversions are defined (6.2.5, 6.3.1). Additional
floating types may have more range or precision than long double, may be
used for evaluating expressions of other floating types, and may be
used to define float_t or double_t.

>
>> This defines are part of the C99 ANSI standard.
>
> Well, three of them are.
>

You cut the next sentence!

"For the standard types (float, double and long double) this defines
should always exist in other compilers. "

This is a good example of BIAS when quoting.

Richard Tobin
Jun 29, 2007, 5:09:54 AM

In article <4684cc3c$0$27366$ba4a...@news.orange.fr>,
jacob navia <ja...@jacob.remcomp.fr> wrote:

>>> This defines are part of the C99 ANSI standard.

Presumably you meant "these", not "this". And it would be less jargonish
to say "definitions" rather than "defines".

>> Well, three of them are.
>>
>
>You cut the next sentence!
>
>"For the standard types (float, double and long double) this defines
>should always exist in other compilers. "
>
>This is a good example of BIAS when quoting.

I don't think so. There's no need to say something that's wrong here,
even if you clarify it in the next sentence.

-- Richard

--
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.

jacob navia
Jun 29, 2007, 5:28:38 AM

Richard Tobin wrote:
> In article <4684cc3c$0$27366$ba4a...@news.orange.fr>,
> jacob navia <ja...@jacob.remcomp.fr> wrote:
>
>>>> This defines are part of the C99 ANSI standard.
>
> Presumably you meant "these", not "this". And it would be less jargonish
> to say "definitions" rather than "defines".
>
>>> Well, three of them are.
>>>
>> You cut the next sentence!
>>
>> "For the standard types (float, double and long double) this defines
>> should always exist in other compilers. "
>>
>> This is a good example of BIAS when quoting.
>
> I don't think so. There's no need to say something that's wrong here,
> even if you clarify it in the next sentence.
>
> -- Richard
>

OK. Now it is:

These definitions (except the qfloat part) are part of the C99 ANSI
standard. For the standard types (float, double and long double) they
should always exist in other compilers. The type qfloat is an extension
of lcc-win32.

Richard Heathfield
Jun 29, 2007, 5:33:29 AM

jacob navia said:

> Richard Heathfield wrote:
>> jacob navia said:
>>
>> <snip>
>>
>>> For the different
>>> representations we have in the standard header <float.h>:
>>>
>>> #define FLT_EPSILON 1.19209290e-07F // float
>>> #define DBL_EPSILON 2.2204460492503131e-16 // double
>>> #define LDBL_EPSILON 1.084202172485504434007452e-19L //long double
>>> // qfloat epsilon truncated so that it fits in this page...
>>> #define QFLT_EPSILON 1.09003771904865842969737513593110651 ... E-106
>>
>> Conforming implementations must not define QFLT_EPSILON in <float.h>
>
> The C standard paragraph J.5.6: Common extensions:
> J.5.6 Other arithmetic types
> Additional arithmetic types, such as _ _int128 or double double, and
> their appropriate conversions are defined (6.2.5, 6.3.1). Additional
> floating types may have more range or precision than long double, may
> be
> used for evaluating expressions of other floating types, and may be
> used to define float_t or double_t.

What has that to do with what I said? I didn't say that implementations
can't provide extra types. I said a conforming implementation must not
define QFLT_EPSILON in <float.h> - that is not the same as saying that
implementations cannot provide extra types.

>>> This defines are part of the C99 ANSI standard.
>>
>> Well, three of them are.
>>
>
> You cut the next sentence!

Yes. It was not relevant to my point, which is that QFLT_EPSILON is not
part of the C Standard.


> "For the standard types (float, double and long double) this defines
> should always exist in other compilers. "
>
> This is a good example of BIAS when quoting.

No, it isn't. Your statement incorrectly claimed that QFLT_EPSILON was
part of the C99 Standard. The fact that it was followed by another
statement which didn't reiterate the claim does not make your original
claim correct.

Richard Heathfield
Jun 29, 2007, 5:40:38 AM

jacob navia said:

<snip>



> OK. Now it is:
>
> These definitions (except the qfloat part) are part of the C99 ANSI
> standard. For the standard types (float, double and long double) they
> should always exist in other compilers. The type qfloat is an
> extension of lcc-win32.

Why mention lcc-win32 extensions in a C tutorial?

And if it's an lcc-win32 tutorial, why ask for comment in a C newsgroup?
Is there not an lcc-win32 newsgroup where you can discuss lcc-win32
extensions?

Richard Tobin
Jun 29, 2007, 5:44:45 AM

In article <Teidna5iB9yQThnb...@bt.com>,
Richard Heathfield <r...@see.sig.invalid> wrote:

>Why mention lcc-win32 extensions in a C tutorial?
>
>And if it's an lcc-win32 tutorial, why ask for comment in a C newsgroup?
>Is there not an lcc-win32 newsgroup where you can discuss lcc-win32
>extensions?

You're not being reasonable. He's writing a C tutorial intended for
lcc-win32 users. Since most of it is standard C, I see no reason why
he shouldn't post about it here.

Richard Heathfield
Jun 29, 2007, 6:40:58 AM

Richard Tobin said:

<snip>



> You're not being reasonable.

Shurely shome mishtake?

> He's writing a C tutorial

The internal evidence of his articles suggests otherwise.

<snip>

Eric Sosman
Jun 29, 2007, 7:31:10 AM

jacob navia wrote:
> Hi.
>
> Continuing with my tutorial, here is an entry I have added recently. I
> hope it is not controversial. If you see any errors/ambiguities/etc please
> just answer in this thread.
>
> Thanks in advance for your help.
> ----------------------------------------------------------------------
> The machine epsilon
>
> The machine epsilon is the smallest number that changes the result of an
> addition operation at the point where the representation of the numbers
> is the densest.

"The densest" grates, first because you don't define what
"dense" means, and second because under the only reasonable
definition I can think of the statement is false. (Distinct
FP values are "denser" at smaller magnitudes, and FP numbers
support magnitudes considerably smaller than unity.)

> In IEEE754 representation this number has an exponent
> value of the bias, and a fraction of 1.

If by "the bias" you mean the traditional offset in the
encoding of the exponent, then I think this statement is wrong.
An FP epsilon has to do with the precision of the fraction,
not with the span of possible exponents.

> If you add a number smaller than
> this to 1.0, the result will be 1.0. For the different representations
> we have in the standard header <float.h>:

Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type). Why not just
start with this "practitioner's definition" instead of with the
ambiguities and misstatements? In addition to the virtue of
being a correct definition, it also captures the notion of the
"graininess" of FP numbers, and hence the reason why someone
would be interested in the value of epsilon.

--
Eric Sosman
eso...@acm-dot-org.invalid

Richard Tobin
Jun 29, 2007, 7:47:59 AM

In article <V6mdnVPRuLoCcBnb...@comcast.com>,
Eric Sosman <eso...@acm-dot-org.invalid> wrote:


>> In IEEE754 representation this number has an exponent
>> value of the bias, and a fraction of 1.

> If by "the bias" you mean the traditional offset in the
>encoding of the exponent, then I think this statement is wrong.
>An FP epsilon has to do with the precision of the fraction,
>not with the span of possible exponents.

The epsilon is such that if it were de-normalised to have the same
exponent as 1.0, only the lowest bit of the fraction would be set. To
normalise it, you need to shift that lowest bit up into the hidden
bit, and adjust the exponent accordingly. For 1.0, the exponent is
equal to the bias. So the epsilon has an exponent equal to the bias
minus the number of fraction bits, and the fraction part is zero
(because of the hidden bit).

Eric Sosman
Jun 29, 2007, 8:35:34 AM

Richard Tobin wrote:
> In article <V6mdnVPRuLoCcBnb...@comcast.com>,
> Eric Sosman <eso...@acm-dot-org.invalid> wrote:
>
>
>>> In IEEE754 representation this number has an exponent
>>> value of the bias, and a fraction of 1.
>
>> If by "the bias" you mean the traditional offset in the
>> encoding of the exponent, then I think this statement is wrong.
>> An FP epsilon has to do with the precision of the fraction,
>> not with the span of possible exponents.
>
> The epsilon is such that if it were de-normalised to have the same
> exponent as 1.0, only the lowest bit of the fraction would be set. To
> normalise it, you need to shift that lowest bit up into the hidden
> bit, and adjust the exponent accordingly. For 1.0, the exponent is
> equal to the bias. So the epsilon has an exponent equal to the bias
> minus the number of fraction bits, and the fraction part is zero
> (because of the hidden bit).

Right: Not equal to the bias, as I said. (And in any case
it's wrong to speak of the exponent *value* being equal to the
bias or to the bias minus delta: the exponent is 1-delta and
the bias crops up in the *encoding* of that value.)

--
Eric Sosman
eso...@acm-dot-org.invalid

Richard Tobin
Jun 29, 2007, 8:53:33 AM

In article <p8ydnd7w3dsxYRnb...@comcast.com>,

Surely you mean 0-delta?

It does seem to be common to call the encoded value the exponent; I
suppose the context usually makes it clear. But I agree that it's
inaccurate - I should have said the "exponent bits" or the "encoded
exponent" or something like that.

Army1987
Jun 29, 2007, 9:06:26 AM

"jacob navia" <ja...@jacob.remcomp.fr> wrote in message news:4684cc3c$0$27366$ba4a...@news.orange.fr...

> Richard Heathfield wrote:
>> jacob navia said:
>>
>> <snip>
>>
>>> For the different
>>> representations we have in the standard header <float.h>:
>>>
>>> #define FLT_EPSILON 1.19209290e-07F // float
>>> #define DBL_EPSILON 2.2204460492503131e-16 // double
>>> #define LDBL_EPSILON 1.084202172485504434007452e-19L //long double
>>> // qfloat epsilon truncated so that it fits in this page...
>>> #define QFLT_EPSILON 1.09003771904865842969737513593110651 ... E-106
>>
>> Conforming implementations must not define QFLT_EPSILON in <float.h>
>
> The C standard paragraph J.5.6: Common extensions:
> J.5.6 Other arithmetic types
> Additional arithmetic types, such as _ _int128 or double double, and
> their appropriate conversions are defined (6.2.5, 6.3.1). Additional
> floating types may have more range or precision than long double, may be
> used for evaluating expressions of other floating types, and may be
> used to define float_t or double_t.
Where does it allow you to use an identifier which doesn't begin
with an underscore and the standard never reserves?
What happens if I write a program with the lines
#include <float.h>
int main(int QFLT_EPSILON, char *argv[])
(yes, I'm allowed to do that) and try to compile it with your
compiler?


jacob navia
Jun 29, 2007, 9:56:10 AM

Richard Tobin wrote:
> In article <V6mdnVPRuLoCcBnb...@comcast.com>,
> Eric Sosman <eso...@acm-dot-org.invalid> wrote:
>
>
>>> In IEEE754 representation this number has an exponent
>>> value of the bias, and a fraction of 1.
>
>> If by "the bias" you mean the traditional offset in the
>> encoding of the exponent, then I think this statement is wrong.
>> An FP epsilon has to do with the precision of the fraction,
>> not with the span of possible exponents.
>
> The epsilon is such that if it were de-normalised to have the same
> exponent as 1.0, only the lowest bit of the fraction would be set. To
> normalise it, you need to shift that lowest bit up into the hidden
> bit, and adjust the exponent accordingly. For 1.0, the exponent is
> equal to the bias. So the epsilon has an exponent equal to the bias
> minus the number of fraction bits, and the fraction part is zero
> (because of the hidden bit).
>
> -- Richard

That is what I am trying to say. The epsilon is
sign: 0
exponent: zero, i.e. the bias.
mantissa: 0000000000 0000000000 0000000000 0000000000 0000000000 01
<--10-->
Using printf:
printf("%a",DBL_EPSILON); 0x1.0000000000000p-052

jacob navia
Jun 29, 2007, 10:08:46 AM

jacob navia wrote:
> mantissa: 0000000000 0000000000 0000000000 0000000000 0000000000 01

This is wrong, excuse me. The mantissa is normalized, and the bits are:
100000000 etc

The value is 2^-52--> 2.2204460492503130808472633361816e-16

Richard Tobin
Jun 29, 2007, 10:07:46 AM

In article <46850fb7$0$27382$ba4a...@news.orange.fr>,
jacob navia <ja...@jacob.remcomp.fr> wrote:

>That is what I am trying to say. The epsilon is
>sign: 0
>exponent: zero, i.e. the bias.
>mantissa: 0000000000 0000000000 0000000000 0000000000 0000000000 01

If you put those bits into a double, you don't get the epsilon (try
it). You are writing it as a de-normalised representation, but if you
put those bits in a double they would be interpreted as a normalized
value, equal to 1+epsilon.

The IEEE representation of DBL_EPSILON is

sign bit: 0
exponent bits: 01111001011 (representing -52)
mantissa bits: 000.... (representing 1.0, because of the hidden bit)

Eric Sosman
Jun 29, 2007, 10:29:14 AM

Eric Sosman wrote On 06/29/07 07:31,:
> jacob navia wrote:
>> [...]

>>If you add a number smaller than
>>this to 1.0, the result will be 1.0. For the different representations
>>we have in the standard header <float.h>:
>
>
> Finally, we get to an understandable definition: x is the
> FP epsilon if 1+x is the smallest representable number greater
> than 1 (when evaluated in the appropriate type). [...]

Now that I think of it, the two descriptions are
not the same. Mine is correct, as far as I know, but
Jacob's is subtly wrong. (Hint: Rounding modes.)

--
Eric....@sun.com

Eric Sosman
Jun 29, 2007, 10:39:02 AM

Richard Tobin wrote On 06/29/07 08:53,:

> In article <p8ydnd7w3dsxYRnb...@comcast.com>,
> Eric Sosman <eso...@acm-dot-org.invalid> wrote:
>
>>(And in any case
>>it's wrong to speak of the exponent *value* being equal to the
>>bias or to the bias minus delta: the exponent is 1-delta and
>>the bias crops up in the *encoding* of that value.)
>
>
> Surely you mean 0-delta?

I don't think so. The fraction of a normalized, non-
zero, finite IEEE number has a value 0.5 <= f < 1, so
unity is represented as two to the first times one-half:
2^1 * .100...000(2). The unbiased exponent value in the
representation of unity is therefore one, not zero.

Of course, omitting any definition of "delta" leaves
quite a bit of wiggle room. ;-)

--
Eric....@sun.com

jacob navia
Jun 29, 2007, 10:47:09 AM

Eric Sosman wrote:
> Richard Tobin wrote On 06/29/07 08:53,:
>> In article <p8ydnd7w3dsxYRnb...@comcast.com>,
>> Eric Sosman <eso...@acm-dot-org.invalid> wrote:
>>
>>> (And in any case
>>> it's wrong to speak of the exponent *value* being equal to the
>>> bias or to the bias minus delta: the exponent is 1-delta and
>>> the bias crops up in the *encoding* of that value.)
>>
>> Surely you mean 0-delta?
>
> I don't think so. The fraction of a normalized, non-
> zero, finite IEEE number has a value 0.5 <= f < 1, so
> unity is represented as two to the first times one-half:
> 2^1 * .100...000(2). The unbiased exponent value in the
> representation of unity is therefore one, not zero.
>

I think you forget the implicit bit Eric.

jacob navia
Jun 29, 2007, 10:56:05 AM

The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is
representable in the given floating point type, b^(1-p)

Since we have 53 bits in the mantissa we have
2^(1-53)--> 2.2204460492503131E-16
as shown by my program!


BY THE WAY I added an exercise:

Exercise
--------

Explain why the last number printed is NOT DBL_EPSILON
but the number before.

Richard Tobin
Jun 29, 2007, 10:46:20 AM

In article <1183127942.886274@news1nwk>,
Eric Sosman <Eric....@Sun.COM> wrote:

> I don't think so. The fraction of a normalized, non-
>zero, finite IEEE number has a value 0.5 <= f < 1, so
>unity is represented as two to the first times one-half:
>2^1 * .100...000(2). The unbiased exponent value in the
>representation of unity is therefore one, not zero.

I was interpreting the hidden bit as representing 1 so that with the
fraction bits the mantissa is 1.f, i.e. in the range [1,2), and
interpreting an exponent field with bits 0111... as representing 0.
This seems to be the way it's usually described in the web pages I
just searched. But it is equivalent to say that the mantissa is 0.1f
and the exponent is one greater.

jacob navia
Jun 29, 2007, 11:12:52 AM

Richard Tobin wrote:
> In article <46850fb7$0$27382$ba4a...@news.orange.fr>,
> jacob navia <ja...@jacob.remcomp.fr> wrote:
>
>> That is what I am trying to say. The epsilon is
>> sign: 0
>> exponent: zero, i.e. the bias.
>> mantissa: 0000000000 0000000000 0000000000 0000000000 0000000000 01
>
> If you put those bits into a double, you don't get the epsilon (try
> it). You are writing it as a de-normalised representation, but if you
> put those bits in a double they would be interpreted as a normalized
> value, equal to 1+epsilon.
>
> The IEEE representation of DBL_EPSILON is
>
> sign bit: 0
> exponent bits: 01111001011 (representing -52)
> mantissa bits: 000.... (representing 1.0, because of the hidden bit)
>
> -- Richard
#include <stdio.h>
#include <float.h>

int main(void)
{
    double d = DBL_EPSILON;
    int *p = (int *)&d;

    printf("%a\n", DBL_EPSILON);
    printf("0x%x 0x%x\n", p[0], p[1]);
}

The output is
0x1.0000000000000p-052
0x0 0x3cb00000
3cb --> 971.
971 - 1023 --> -52
The rest is zero, since there is a hidden bit.

The effective representation of DBL_EPSILON
is then:
sign:0
exponent: 971 (-52 since 971-1023 is -52)
Bias: 1023
mantissa: zero (since the leading 1 bit is hidden)

jacob navia
Jun 29, 2007, 11:25:10 AM

You will have to add the -ansic flag to your compilation flags

Eric Sosman
Jun 29, 2007, 11:38:47 AM

jacob navia wrote On 06/29/07 10:56,:

> Eric Sosman wrote:
>
>>Eric Sosman wrote On 06/29/07 07:31,:
>>
>>>jacob navia wrote:
>>>
>>>>[...]
>>>>If you add a number smaller than
>>>>this to 1.0, the result will be 1.0. For the different representations
>>>>we have in the standard header <float.h>:
>>>
>>> Finally, we get to an understandable definition: x is the
>>>FP epsilon if 1+x is the smallest representable number greater
>>>than 1 (when evaluated in the appropriate type). [...]
>>
>> Now that I think of it, the two descriptions are
>>not the same. Mine is correct, as far as I know, but
>>Jacob's is subtly wrong. (Hint: Rounding modes.)
>>
>
>
> The standard says:
> 5.2.4.2.2:
> DBL_EPSILON
> the difference between 1 and the least value greater than 1 that is
> representable in the given floating point type, b^(1-p)

Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true. For example, on
the machine in front of me at the moment,

1.0 + DBL_EPSILON * 3 / 4

is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.

--
Eric....@sun.com

CBFalconer
Jun 29, 2007, 12:40:20 PM

jacob navia wrote:
> Eric Sosman wrote:
>
... snip ...
>>
>> I don't think so. The fraction of a normalized, non-zero, finite

>> IEEE number has a value 0.5 <= f < 1, so unity is represented as
>> two to the first times one-half: 2^1 * .100...000(2). The
>> unbiased exponent value in the representation of unity is
>> therefore one, not zero.
>
> I think you forget the implicit bit Eric.

In some systems. Not necessarily C. Do try to stay on topic.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>
<http://www.aaxnet.com/editor/edit043.html>
cbfalconer at maineline dot net


jacob navia
Jun 29, 2007, 1:43:58 PM

This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1

jacob navia
Jun 29, 2007, 1:44:39 PM

CBFalconer wrote:
> jacob navia wrote:
>> Eric Sosman wrote:
>>
> ... snip ...
>>> I don't think so. The fraction of a normalized, non-zero, finite
>>> IEEE number has a value 0.5 <= f < 1, so unity is represented as
>>> two to the first times one-half: 2^1 * .100...000(2). The
>>> unbiased exponent value in the representation of unity is
>>> therefore one, not zero.
>> I think you forget the implicit bit Eric.
>
> In some systems. Not necessarily C. Do try to stay on topic.
>
The C standard assumes IEEE 754 representation Chuck.

Walter Roberson
Jun 29, 2007, 1:55:01 PM

In article <46854549$0$25949$ba4a...@news.orange.fr>,
jacob navia <ja...@jacob.remcomp.fr> wrote:

>The C standard assumes IEEE 754 representation Chuck.

C89 doesn't.

--
There are some ideas so wrong that only a very intelligent person
could believe in them. -- George Orwell

Army1987
Jun 29, 2007, 2:01:49 PM

"jacob navia" <ja...@jacob.remcomp.fr> wrote in message news:46852496$0$27369$ba4a...@news.orange.fr...
Thanks. (Not that I'm ever going to do that, but if I can
disable it, it is no worse (at least in this respect) than gcc
not allowing me to name a variable random if I don't use -ansi,
because of the POSIX function named that way.)

Army1987
Jun 29, 2007, 2:07:54 PM

"jacob navia" <ja...@jacob.remcomp.fr> wrote in message news:46854549$0$25949$ba4a...@news.orange.fr...

It doesn't. It says that an implementation only can define
__STDC_IEC_559__ when it comforms to that standard. C doesn't even
require FLT_RADIX to be a power of 2.

(Hey, n1124.pdf has a blank between the two underscores...)


Army1987
Jun 29, 2007, 2:10:40 PM

"jacob navia" <ja...@jacob.remcomp.fr> wrote in message news:46854520$0$25949$ba4a...@news.orange.fr...

Or simply 1.0 + DBL_EPSILON * 3 / 4 gets rounded up to
1.0 + DBL_EPSILON.
In C99 I would use nextafter(1.0, 2.0) - 1.0.


jacob navia
Jun 29, 2007, 2:25:24 PM

That's true!

I forgot about the rounding issue completely. The good "test" would be
DBL_EPSILON*1/4

CBFalconer
Jun 29, 2007, 2:12:02 PM

jacob navia wrote:
> CBFalconer wrote:
>> jacob navia wrote:
>>
... snip ...

>>
>>> I think you forget the implicit bit Eric.
>>
>> In some systems. Not necessarily C. Do try to stay on topic.
>
> The C standard assumes IEEE 754 representation Chuck.

No it doesn't. It allows it.

Joe Wright
Jun 29, 2007, 2:51:10 PM

I'm not sure whose point I'm taking but I offer the following:
Some time ago I wrote a utility, one of whose facilities is
to display binary representations of Floating Point numbers.
The following is the representation of DBL_EPSILON.

00111100 10110000 00000000 00000000 00000000 00000000 00000000 00000000
Exp = 971 (-51)
111 11001101
Man = .10000 00000000 00000000 00000000 00000000 00000000 00000000
2.2204460492503131e-16

Now this is DBL_EPSILON + 1.0

00111111 11110000 00000000 00000000 00000000 00000000 00000000 00000001
Exp = 1023 (1)
000 00000001
Man = .10000 00000000 00000000 00000000 00000000 00000000 00000001
1.0000000000000002e+00

--
Joe Wright
"Everything should be made as simple as possible, but not simpler."
--- Albert Einstein ---

Army1987
Jun 29, 2007, 2:51:18 PM

"jacob navia" <ja...@jacob.remcomp.fr> wrote in message news:46854ed7$0$27395$ba4a...@news.orange.fr...

NO! The rounding mode needn't be to nearest!

#include <stdio.h>

int main(void)
{
    volatile double oneplus = 2, epsilon = 1;

    while (1 + epsilon/2 > 1) {
        epsilon /= 2;
        oneplus = 1 + epsilon;
    }
    epsilon = oneplus - 1;
    printf("DBL_EPSILON is %g\n", epsilon);
    return 0;
}

And I'm not very sure whether it'd work if FLT_RADIX were not a power of 2.


Keith Thompson
Jun 29, 2007, 3:20:03 PM

It most certainly does not, and it never has.

--
Keith Thompson (The_Other_Keith) ks...@mib.org <http://www.ghoti.net/~kst>
San Diego Supercomputer Center <*> <http://users.sdsc.edu/~kst>
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"

jacob navia
Jun 29, 2007, 3:23:43 PM

Keith Thompson wrote:
> jacob navia <ja...@jacob.remcomp.fr> writes:
>> CBFalconer wrote:
>>> jacob navia wrote:
>>>> Eric Sosman wrote:
>>> ... snip ...
>>>>> I don't think so. The fraction of a normalized, non-zero, finite
>>>>> IEEE number has a value 0.5 <= f < 1, so unity is represented as
>>>>> two to the first times one-half: 2^1 * .100...000(2). The
>>>>> unbiased exponent value in the representation of unity is
>>>>> therefore one, not zero.
>>>> I think you forget the implicit bit Eric.
>>> In some systems. Not necessarily C. Do try to stay on topic.
>>>
>> The C standard assumes IEEE 754 representation Chuck.
>
> It most certainly does not, and it never has.
>

In my copy of the standard there is a lengthy
Annex F (normative) IEC 60559 floating-point arithmetic

This annex specifies C language support for the IEC 60559
floating-point standard. The IEC 60559 floating-point standard is
specifically Binary floating-point arithmetic for microprocessor
systems, second edition (IEC 60559:1989), previously designated
IEC 559:1989 and as IEEE Standard for Binary Floating-Point
Arithmetic (ANSI/IEEE 754-1985). IEEE Standard for Radix-Independent
Floating-Point Arithmetic (ANSI/IEEE 854-1987) generalizes the binary
standard to remove dependencies on radix and word length. IEC 60559
generally refers to the floating-point standard, as in IEC 60559
operation, IEC 60559 format, etc. An implementation that defines
__STDC_IEC_559__ shall conform to the specifications in this annex.
Where a binding between the C language and IEC 60559 is indicated,
the IEC 60559-specified behavior is adopted by reference, unless
stated otherwise.

So, obviously in some systems no standard floating
point will be used, but that should be extremely rare.

Keith Thompson
Jun 29, 2007, 3:45:29 PM

jacob navia <ja...@jacob.remcomp.fr> writes:
> Army1987 wrote:
>> "jacob navia" <ja...@jacob.remcomp.fr> ha scritto nel messaggio
>> news:46854520$0$25949$ba4a...@news.orange.fr...
>>> Eric Sosman wrote:
>>>> jacob navia wrote On 06/29/07 10:56,:
[...]

>>>>> The standard says:
>>>>> 5.2.4.2.2:
>>>>> DBL_EPSILON
>>>>> the difference between 1 and the least value greater than 1 that
>>>>> is representable in the given floating point type, b^(1-p)
>>>> Yes, but what you said in your tutorial was "If you add
>>>> a number smaller than this [epsilon] to 1.0, the result will
>>>> be 1.0," and that is not necessarily true. For example, on
>>>> the machine in front of me at the moment,
>>>>
>>>> 1.0 + DBL_EPSILON * 3 / 4
>>>>
>>>> is greater than one, even though `DBL_EPSILON * 3 / 4' is
>>>> 25% smaller than DBL_EPSILON.
>>>>
>>> This is because your machine makes calculations in extended
>>> precision since DBL_EPSILON must by definition be the smallest
>>> quantity where 1+DBL_EPSILON != 1
>> Or simplily 1.0 + DBL_EPSILON * 3 / 4 gets rounded up to
>> 1.0 + DBL_EPSILON.
>> In C99 I would use nextafter(1.0, 2.0) - 1.0.
>
> That's true!
>
> I forgot about the rounding issue completely. The good "test" would be
> DBL_EPSILON*1/4

And what if inexact results are always rounded up? I don't believe
anything in the C standard disallows that.

Eric Sosman
Jun 29, 2007, 3:46:42 PM

jacob navia wrote On 06/29/07 13:43,:

Wrong again, Jacob. Twice.

First, the machine I'm using does not even have an
extended precision, much less use one. Its double is
a plain vanilla IEEE 64-bit double, and that's that.
What it does have, though, in common with other IEEE
implementations, is the ability to deliver a correctly
rounded answer. 1+DBL_EPSILON*3/4 is (mathematically)
closer to 1+DBL_EPSILON than it is to 1, so the sum
rounds up rather than down. (Did you not read the hint
I wrote earlier, and that you have now quoted twice?
"Not," I guess.)

Second, you have again mis-stated what the Standard
says (even after quoting the Standard's own text). The
Standard does *not* say that epsilon is the smallest
value which when added to 1 yields a sum greater than 1;
it says that epsilon is the difference between 1 and the
next larger representable number. If you think the two
statements are equivalent, you have absolutely no call
to be writing a tutorial on floating-point arithmetic!

For IEEE double, there are about four and a half
thousand million million distinct values strictly less
than DBL_EPSILON which when added to 1 will produce the
sum 1+DBL_EPSILON in "round to nearest" mode, which is
the mode in effect when a C program starts.

--
Eric....@sun.com

Army1987

Jun 29, 2007, 3:49:18 PM
to

"jacob navia" <ja...@jacob.remcomp.fr> ha scritto nel messaggio news:46855c82$0$27407$ba4a...@news.orange.fr...

> Keith Thompson wrote:
>> jacob navia <ja...@jacob.remcomp.fr> writes:
>>> CBFalconer wrote:
>>>> jacob navia wrote:
>>>>> Eric Sosman wrote:
>>>> ... snip ...
>>>>>> I don't think so. The fraction of a normalized, non-zero, finite
>>>>>> IEEE number has a value 0.5 <= f < 1, so unity is represented as
>>>>>> two to the first times one-half: 2^1 * .100...000(2). The
>>>>>> unbiased exponent value in the representation of unity is
>>>>>> therefore one, not zero.
>>>>> I think you forget the implicit bit Eric.
>>>> In some systems. Not necessarily C. Do try to stay on topic.
>>>>
>>> The C standard assumes IEEE 754 representation Chuck.
>>
>> It most certainly does not, and it never has.
>>
>
> In my copy of the standard there is a lengthy
> Annex F (normative) IEC 60559 floating-point arithmetic
>
> This annex specifies C language support for the IEC 60559 floating-point standard. The
> IEC 60559 floating-point standard is specifically Binary floating-point arithmetic for
> microprocessor systems, second edition (IEC 60559:1989), previously designated
> IEC 559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic
> (ANSI/IEEE 754-1985). IEEE Standard for Radix-Independent Floating-Point
> Arithmetic (ANSI/IEEE 854-1987) generalizes the binary standard to remove

> dependencies on radix and word length. IEC 60559 generally refers to the floating-point
> standard, as in IEC 60559 operation, IEC 60559 format, etc. An implementation that
> defines __STDC_IEC_559__ shall conform to the specifications in this annex. Where
> a binding between the C language and IEC 60559 is indicated, the IEC 60559-specified
> behavior is adopted by reference, unless stated otherwise.
>
> So, obviously in some systems no standard floating
> point will be used, but that should be extremely rare.
Read the first words of the last sentence you copy-and-pasted.
The standard explicitly allows FLT_RADIX to be even a power of 10.
Read 5.2.4.2.2 throughout.


jacob navia

Jun 29, 2007, 5:07:05 PM
to

Great Eric, unnormalized numbers exist.

What's your point?

Jacob is always wrong.

Granted.

I am always wrong. Happy?

Walter Roberson

Jun 29, 2007, 5:23:00 PM
to
In article <468574bc$0$27414$ba4a...@news.orange.fr>,
jacob navia <ja...@jacob.remcomp.fr> wrote:
>Eric Sosman wrote:

>> For IEEE double, there are about four and a half
>> thousand million million distinct values strictly less
>> than DBL_EPSILON which when added to 1 will produce the
>> sum 1+DBL_EPSILON in "round to nearest" mode, which is
>> the mode in effect when a C program starts.

>Great Eric, unnormalized numbers exist.

>What's your point?

Eric isn't talking about unnormalized numbers.

Consider DBL_EPSILON . It is noticeably less than 1/2, so in IEEE
754 format it will be normalized as some negative exponent
followed by a hidden 1 followed by a 52-bit binary fraction.
Take that 52-bit binary fraction as an integer and subtract 1 from
it, and construct a double with the same exponent as DBL_EPSILON
but the reduced fraction. Call the result, say, NEARLY_DBL_EPSILON .
Now, take 1 + NEARLY_DBL_EPSILON . What is the result?
Algebraically, it isn't quite 1 + DBL_EPSILON, but floating-point
arithmetic doesn't obey normal algebra, so the result depends upon
the IEEE rounding mode in effect. If "round to nearest" (or
perhaps some other modes) is in effect, the result is close enough to
1 + DBL_EPSILON that the processor will round the result to
1 + DBL_EPSILON . C only promises accuracy to at best 1 ULP
("Unit in the Last Place"), so this isn't "wrong" (and what
exactly is right or wrong in such a case is arguable.)
--
"It is important to remember that when it comes to law, computers
never make copies, only human beings make copies. Computers are given
commands, not permission. Only people can be given permission."
-- Brad Templeton

Flash Gordon

Jun 29, 2007, 5:30:41 PM
to
jacob navia wrote, On 29/06/07 22:07:

> Eric Sosman wrote:
>> jacob navia wrote On 06/29/07 13:43,:
>>> Eric Sosman wrote:

<snip>

>>>> 1.0 + DBL_EPSILON * 3 / 4
>>>>
>>>> is greater than one, even though `DBL_EPSILON * 3 / 4' is
>>>> 25% smaller than DBL_EPSILON.
>>>>
>>>
>>> This is because your machine makes calculations in extended
>>> precision since DBL_EPSILON must by definition be the smallest
>>> quantity where 1+DBL_EPSILON != 1
>>
>> Wrong again, Jacob. Twice.
>>
>> First, the machine I'm using does not even have an
>> extended precision, much less use one. Its double is

<snip>

>> For IEEE double, there are about four and a half
>> thousand million million distinct values strictly less
>> than DBL_EPSILON which when added to 1 will produce the
>> sum 1+DBL_EPSILON in "round to nearest" mode, which is
>> the mode in effect when a C program starts.
>
> Great Eric, unnormalized numbers exist.
>
> What's your point?

His point was that the information you were putting in your tutorial was
incorrect, and your refutation of the correction was incorrect.

> Jacob is always wrong.
>
> Granted.
>
> I am always wrong. Happy?

No. People would be happy if you would gracefully accept corrections.
Why do you think that if someone finds an error in what you post they
are out to get you?
--
Flash Gordon.

Mark McIntyre

Jun 29, 2007, 6:03:27 PM
to
On Fri, 29 Jun 2007 10:40:58 +0000, in comp.lang.c , Richard
Heathfield <r...@see.sig.invalid> wrote:

>Richard Tobin said:
>
><snip>
>
>> You're not being reasonable.
>
>Shurely shome mishtake?

On your part, yes.

>> He's writing a C tutorial
>
>The internal evidence of his articles suggests otherwise.

And now you're being gratuitous too. Give Jacob his due - he's trying
to write a C tutorial, he's very sensibly asking for a review here,
and is prepared to take in comments even from people who he has had
run-ins with before. Stop being so childish.
--
Mark McIntyre

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it."
--Brian Kernighan

Eric Sosman

Jun 29, 2007, 6:28:16 PM
to
jacob navia wrote On 06/29/07 17:07,:

Not in IEEE float or double (extended formats are
less tightly specified, and might allow unnormalized
numbers; I'm not sure). IEEE float and double have
"denormal numbers," which are a different matter.

And for what it's worth, none of the four and a
half thousand million million values I mentioned is a
denormal. All are in the approximate range 1.1e-16
through 2.2e-16, while the largest IEEE double denormal
is about 2.2e-308.

Are you sure you're the right person to write a
tutorial on floating-point numbers? The mistakes you
have made in this thread are of an elementary nature
uncharacteristic of someone even moderately familiar
with the topic. You might want to consider offering
your readers a URL to someone else's writeup instead,
lest you commit self-Schildtification.

> What's your point?
>
> Jacob is always wrong.
>
> Granted.
>
> I am always wrong. Happy?

That is not my point; "always" overstates the case.
My point is that Jacob, right or wrong, doesn't appear
to read with much care, even when what he's reading is
a response to his own request for assistance. "If you
see any errors/ambiguities/etc please just answer in
this thread" so Jacob can feel he's being persecuted.

--
Eric....@sun.com

Keith Thompson

Jun 29, 2007, 8:23:36 PM
to
jacob navia <ja...@jacob.remcomp.fr> writes:
> Keith Thompson wrote:
>> jacob navia <ja...@jacob.remcomp.fr> writes:
[...]

>>> The C standard assumes IEEE 754 representation Chuck.
>> It most certainly does not, and it never has.
>
> In my copy of the standard there is a lengthy
> Annex F (normative) IEC 60559 floating-point arithmetic
>
> This annex specifies C language support for the IEC 60559
> floating-point standard. The IEC 60559 floating-point standard is
> specifically Binary floating-point arithmetic for microprocessor
> systems, second edition (IEC 60559:1989), previously designated IEC
> 559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic
> (ANSI/IEEE 754−1985). IEEE Standard for Radix-Independent
> Floating-Point Arithmetic (ANSI/IEEE 854−1987) generalizes the
> binary standard to remove dependencies on radix and word length. IEC
> 60559 generally refers to the floating-point standard, as in IEC
> 60559 operation, IEC 60559 format, etc. An implementation that
> defines __STDC_IEC_559__ shall conform to the specifications in
> this annex. Where a binding between the C language and IEC 60559 is
> indicated, the IEC 60559-specified behavior is adopted by reference,
> unless stated otherwise.

Right. The relevant sentence is:

An implementation that defines __STDC_IEC_559__ shall conform to
the specifications in this annex.

See also C99 6.10.8p2, "Predefined macro names":

The following macro names are conditionally defined by the
implementation:

__STDC_IEC_559__ The integer constant 1, intended to indicate
conformance to the specifications in annex F (IEC 60559
floating-point arithmetic).

[...]

An implementation is not required to conform to annex F. It's merely
required to do so *if* it defines __STDC_IEC_559__.

> So, obviously in some systems no standard floating
> point will be used, but that should be extremely rare.

Why should they be rare? Most new systems these days do implement
IEEE floating-point, or at least use the formats, but there are still
systems that don't (VAX, some older Crays, IBM mainframes). Annex F
merely provides a framework for implementations that do support IEEE
FP, but it's explicitly optional.

I can also imagine an implementation that uses the IEEE floating-point
formats, but doesn't meet the requirements of Annex F; such an
implementation could not legally define __STDC_IEC_559__. (I have no
idea whether such implementations exist, or how common they are.)

And of course the __STDC_IEC_559__ and annex F are new in C99, so they
don't apply to the majority of implementations that don't conform to
the C99 standard.

Keith Thompson

Jun 29, 2007, 8:30:31 PM
to
Here's *my* floating-point tutorial:

Read "What every computer scientist should know about floating-point
arithmetic", by David Goldberg.

jacob navia

Jun 30, 2007, 5:08:44 AM
to

This program produces on my machine (x86)
DBL_EPSILON is 0

jacob navia

Jun 30, 2007, 5:35:12 AM
to
Army1987 wrote:
>> I forgot about the rounding issue completely. The good "test" would be
>> DBL_EPSILON*1/4
>
> NO! The rounding mode needn't be to nearest!
>


According to the C standard:
(Annex F.7.3)

At program startup the floating-point environment is initialized as
prescribed by IEC 60559:
— All floating-point exception status flags are cleared.

— The rounding direction mode is rounding to nearest. (!!!!!)

— The dynamic rounding precision mode (if supported) is set so that
results are not shortened.

Harald van Dijk

Jun 30, 2007, 5:50:23 AM
to

As has been pointed out to you, annex F does not apply unless
__STDC_IEC_559__ is defined by the implementation.

JT

Jun 30, 2007, 6:00:12 AM
to
Army1987 wrote:
> NO! The rounding mode needn't be to nearest!

jacob navia <j...@jacob.remcomp.fr> wrote:
> According to the C standard:
> (Annex F.7.3)
>
> At program startup the floating-point environment
> is initialized as prescribed by IEC 60559:

> - All floating-point exception status flags are cleared.
> - The rounding direction mode is rounding to nearest. (!!!!!)

To Jacob: Eric Sosman already SAID, on June 29,
that "rounding to nearest" is the start-up default.
But the program can change the default after start-up.

So the FLAWED PARAPHRASE you keep insisting on is not
a universal fact. You should use the OFFICIAL DEFINITION
of epsilon, rather than the flawed paraphrase
you KEEP insisting on.


On Jun 29, 7:46 pm, Eric Sosman wrote:
> Wrong again, Jacob. Twice.
> First...
> Second...


> there are about four and a half
> thousand million million distinct values
> strictly less than DBL_EPSILON which
> when added to 1 will produce the
> sum 1+DBL_EPSILON in "round to nearest" mode,
> which is the mode in effect when a C program starts.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

