Continuing with my tutorial, here is an entry I have added recently. I 
hope it is not controversial. If you see any errors/ambiguities/etc please
just answer in this thread.
Thanks in advance for your help.
----------------------------------------------------------------------
The machine epsilon
The machine epsilon is the smallest number that changes the result of an 
addition operation at the point where the representation of the numbers 
is the densest. In IEEE754 representation this number has an exponent 
value of the bias, and a fraction of 1. If you add a number smaller than 
this to 1.0, the result will be 1.0. For the different representations 
we have in the standard header <float.h>:
#define FLT_EPSILON 1.19209290e-07F // float
#define DBL_EPSILON 2.2204460492503131e-16 // double
#define LDBL_EPSILON 1.084202172485504434007452e-19L //long double
// qfloat epsilon truncated so that it fits in this page...
#define QFLT_EPSILON 1.09003771904865842969737513593110651 ... E-106
This defines are part of the C99 ANSI standard. For the standard types 
(float, double and long double) this defines should always exist in 
other compilers.
Here is a program that will find out the machine epsilon for a given 
floating point representation.
#include <stdio.h>
int main(void)
{
        double float_radix=2.0;
        double inverse_radix = 1.0/float_radix;
        double machine_precision = 1.0;
        double temp = 1.0 + machine_precision;
        while (temp != 1.0) {
               machine_precision *= inverse_radix;
               temp = 1.0 + machine_precision ;
               printf("%.17g\n",machine_precision);
        }
        return 0;
}
<snip>
> For the different
> representations we have in the standard header <float.h>:
> 
> #define FLT_EPSILON 1.19209290e-07F // float
> #define DBL_EPSILON 2.2204460492503131e-16 // double
> #define LDBL_EPSILON 1.084202172485504434007452e-19L //long double
> // qfloat epsilon truncated so that it fits in this page...
> #define QFLT_EPSILON 1.09003771904865842969737513593110651 ... E-106
Conforming implementations must not define QFLT_EPSILON in <float.h>
> This defines are part of the C99 ANSI standard.
Well, three of them are.
Your text suffers from your usual confusion between lcc-win32 and C.
<snip>
-- 
Richard Heathfield <http://www.cpax.org.uk>
Email: -www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
The C standard paragraph J.5.6: Common extensions:
J.5.6 Other arithmetic types
Additional arithmetic types, such as _ _int128 or double double, and
their appropriate conversions are defined (6.2.5, 6.3.1). Additional
floating types may have more range or precision than long double, may be
  used for evaluating expressions of other floating types, and may be
used to define float_t or double_t.
> 
>> This defines are part of the C99 ANSI standard.
> 
> Well, three of them are.
> 
You cut the next sentence!
"For the standard types (float, double and long double) this defines 
should always exist in other compilers. "
This is a good example of BIAS when quoting.
>>> This defines are part of the C99 ANSI standard.
Presumably you meant "these", not "this".  And it would be less jargonish
to say "definitions" rather than "defines".
>> Well, three of them are.
>> 
>
>You cut the next sentence!
>
>"For the standard types (float, double and long double) this defines 
>should always exist in other compilers. "
>
>This is a good example of BIAS when quoting.
I don't think so.  There's no need to say something that's wrong here,
even if you clarify it in the next sentence.
-- Richard
-- 
"Consideration shall be given to the need for as many as 32 characters
in some alphabets" - X3.4, 1963.
OK. Now it is:
These definitions (except the qfloat part) are part of the C99 ANSI 
standard. For the standard types (float, double and long double) they
should always exist in other compilers. The type qfloat is an extension
of lcc-win32.
> Richard Heathfield wrote:
>> jacob navia said:
>> 
>> <snip>
>> 
>>> For the different
>>> representations we have in the standard header <float.h>:
>>>
>>> #define FLT_EPSILON 1.19209290e-07F // float
>>> #define DBL_EPSILON 2.2204460492503131e-16 // double
>>> #define LDBL_EPSILON 1.084202172485504434007452e-19L //long double
>>> // qfloat epsilon truncated so that it fits in this page...
>>> #define QFLT_EPSILON 1.09003771904865842969737513593110651 ... E-106
>> 
>> Conforming implementations must not define QFLT_EPSILON in <float.h>
> 
> The C standard paragraph J.5.6: Common extensions:
> J.5.6 Other arithmetic types
> Additional arithmetic types, such as _ _int128 or double double, and
> their appropriate conversions are defined (6.2.5, 6.3.1). Additional
> floating types may have more range or precision than long double, may
> be
>   used for evaluating expressions of other floating types, and may be
> used to define float_t or double_t.
What has that to do with what I said? I didn't say that implementations 
can't provide extra types. I said a conforming implementation must not 
define QFLT_EPSILON in <float.h> - that is not the same as saying that 
implementations cannot provide extra types.
>>> This defines are part of the C99 ANSI standard.
>> 
>> Well, three of them are.
>> 
> 
> You cut the next sentence!
Yes. It was not relevant to my point, which is that QFLT_EPSILON is not 
part of the C Standard.
> "For the standard types (float, double and long double) this defines
> should always exist in other compilers. "
> 
> This is a good example of BIAS when quoting.
No, it isn't. Your statement incorrectly claimed that QFLT_EPSILON was 
part of the C99 Standard. The fact that it was followed by another 
statement which didn't reiterate the claim does not make your original 
claim correct.
<snip>
 
> OK. Now it is:
> 
> These definitions (except the qfloat part) are part of the C99 ANSI
> standard. For the standard types (float, double and long double) they
> should always exist in other compilers. The type qfloat is an
> extension of lcc-win32.
Why mention lcc-win32 extensions in a C tutorial?
And if it's an lcc-win32 tutorial, why ask for comment in a C newsgroup? 
Is there not an lcc-win32 newsgroup where you can discuss lcc-win32 
extensions?
>Why mention lcc-win32 extensions in a C tutorial?
>
>And if it's an lcc-win32 tutorial, why ask for comment in a C newsgroup? 
>Is there not an lcc-win32 newsgroup where you can discuss lcc-win32 
>extensions?
You're not being reasonable.  He's writing a C tutorial intended for
lcc-win32 users.  Since most of it is standard C, I see no reason why
he shouldn't post about it here.
<snip>
 
> You're not being reasonable.
Shurely shome mishtake?
> He's writing a C tutorial
The internal evidence of his articles suggests otherwise.
<snip>
     "The densest" grates, first because you don't define what
"dense" means, and second because under the only reasonable
definition I can think of the statement is false.  (Distinct
FP values are "denser" at smaller magnitudes, and FP numbers
support magnitudes considerably smaller than unity.)
> In IEEE754 representation this number has an exponent 
> value of the bias, and a fraction of 1.
     If by "the bias" you mean the traditional offset in the
encoding of the exponent, then I think this statement is wrong.
An FP epsilon has to do with the precision of the fraction,
not with the span of possible exponents.
> If you add a number smaller than 
> this to 1.0, the result will be 1.0. For the different representations 
> we have in the standard header <float.h>:
     Finally, we get to an understandable definition: x is the
FP epsilon if 1+x is the smallest representable number greater
than 1 (when evaluated in the appropriate type).  Why not just
start with this "practitioner's definition" instead of with the
ambiguities and misstatements?  In addition to the virtue of
being a correct definition, it also captures the notion of the
"graininess" of FP numbers, and hence the reason why someone
would be interested in the value of epsilon.
-- 
Eric Sosman
eso...@acm-dot-org.invalid
>> In IEEE754 representation this number has an exponent 
>> value of the bias, and a fraction of 1.
>     If by "the bias" you mean the traditional offset in the
>encoding of the exponent, then I think this statement is wrong.
>An FP epsilon has to do with the precision of the fraction,
>not with the span of possible exponents.
The epsilon is such that if it were de-normalised to have the same
exponent as 1.0, only the lowest bit of the fraction would be set.  To
normalise it, you need to shift that lowest bit up into the hidden
bit, and adjust the exponent accordingly.  For 1.0, the exponent is
equal to the bias.  So the epsilon has an exponent equal to the bias
minus the number of fraction bits, and the fraction part is zero
(because of the hidden bit).
     Right: Not equal to the bias, as I said.  (And in any case
it's wrong to speak of the exponent *value* being equal to the
bias or to the bias minus delta: the exponent is 1-delta and
the bias crops up in the *encoding* of that value.)
-- 
Eric Sosman
eso...@acm-dot-org.invalid
Surely you mean 0-delta?
It does seem to be common to call the encoded value the exponent; I
suppose the context usually makes it clear.  But I agree that it's
inaccurate - I should have said the "exponent bits" or the "encoded
exponent" or something like that.
That is what I am trying to say. The epsilon is
sign: 0
exponent: zero, i.e. the bias.
mantissa: 0000000000 0000000000 0000000000 0000000000 0000000000 01
           <--10-->
Using printf:
printf("%a",DBL_EPSILON); 0x1.0000000000000p-052
This is wrong, excuse me. The mantissa is normalized, and the bits are:
100000000 etc
The value is 2^-52--> 2.2204460492503130808472633361816e-16
>That is what I am trying to say. The epsilon is
>sign: 0
>exponent: zero, i.e. the bias.
>mantissa: 0000000000 0000000000 0000000000 0000000000 0000000000 01
If you put those bits into a double, you don't get the epsilon (try
it).  You are writing it as a de-normalised representation, but if you
put those bits in a double they would be interpreted as a normalized
value, equal to 1+epsilon.
The IEEE representation of DBL_EPSILON is
sign bit: 0
exponent bits: 01111001011  (representing -52)
mantissa bits: 000....      (representing 1.0, because of the hidden bit)
    Now that I think of it, the two descriptions are
not the same.  Mine is correct, as far as I know, but
Jacob's is subtly wrong.  (Hint: Rounding modes.)
    I don't think so.  The fraction of a normalized, non-
zero, finite IEEE number has a value 0.5 <= f < 1, so
unity is represented as two to the first times one-half:
2^1 * .100...000(2).  The unbiased exponent value in the
representation of unity is therefore one, not zero.
    Of course, omitting any definition of "delta" leaves
quite a bit of wiggle room. ;-)
I think you forget the implicit bit Eric.
The standard says:
5.2.4.2.2:
DBL_EPSILON
the difference between 1 and the least value greater than 1 that is 
representable in the given floating point type, b^(1-p)
Since we have 53 bits in the mantissa we have
2^(1-53)--> 2.2204460492503131E-16
as shown by my program!
BY THE WAY I added an exercise:
Exercise
--------
Explain why the last number printed is NOT DBL_EPSILON
but the number before.
>    I don't think so.  The fraction of a normalized, non-
>zero, finite IEEE number has a value 0.5 <= f < 1, so
>unity is represented as two to the first times one-half:
>2^1 * .100...000(2).  The unbiased exponent value in the
>representation of unity is therefore one, not zero.
I was interpreting the hidden bit as representing 1 so that with the
fraction bits the mantissa is 1.f, i.e. in the range [1,2), and
interpreting an exponent field with bits 0111... as representing 0.
This seems to be the way it's usually described in the web pages I
just searched.  But it is equivalent to say that the mantissa is 0.1f
and the exponent is one greater.
         printf("%a\n",DBL_EPSILON);
         printf("0x%x 0x%x\n",p[0],p[1]);
}
The output is
0x1.0000000000000p-052
0x0 0x3cb00000
3cb --> 971.
971 - 1023 --> -52
The rest is zero, since there is a hidden bit.
The effective representation of DBL_EPSILON
is then:
sign:0
exponent: 971 (-52 since 971-1023 is -52)
Bias: 1023
mantissa: Zero (since we have a hidden bit of zero)
You will have to add the -ansic flag to your compilation flags
    Yes, but what you said in your tutorial was "If you add
a number smaller than this [epsilon] to 1.0, the result will
be 1.0," and that is not necessarily true.  For example, on
the machine in front of me at the moment,
1.0 + DBL_EPSILON * 3 / 4
is greater than one, even though `DBL_EPSILON * 3 / 4' is
25% smaller than DBL_EPSILON.
In some systems. Not necessarily C. Do try to stay on topic.
-- 
 <http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
 <http://www.securityfocus.com/columnists/423>
 <http://www.aaxnet.com/editor/edit043.html>
                        cbfalconer at maineline dot net
-- 
Posted via a free Usenet account from http://www.teranews.com
This is because your machine makes calculations in extended
precision since DBL_EPSILON must by definition be the smallest
quantity where 1+DBL_EPSILON != 1
>The C standard assumes IEEE 754 representation Chuck.
C89 doesn't.
-- 
  There are some ideas so wrong that only a very intelligent person
  could believe in them.                              -- George Orwell
It doesn't. It says that an implementation only can define
__STDC_IEC_559__ when it comforms to that standard. C doesn't even
require FLT_RADIX to be a power of 2.
(Hey, n1124.pdf has a blank between the two underscores...)
Or simplily 1.0 + DBL_EPSILON * 3 / 4 gets rounded up to
1.0 + DBL_EPSILON.
In C99 I would use nextafter(1.0, 2.0) - 1.0. 
That's true!
I forgot about the rounding issue completely. The good "test" would be
DBL_EPSILON*1/4
No it doesn't. It allows it.
I'm not sure whose point I'm taking but I offer the following:
Some time ago I wrote a utility, one of whose facilities is
to display binary representations of Floating Point numbers.
The following is the representation of DBL_EPSILON.
00111100 10110000 00000000 00000000 00000000 00000000 00000000 00000000
Exp = 971 (-51)
  111 11001101
Man =      .10000 00000000 00000000 00000000 00000000 00000000 00000000
2.2204460492503131e-16
Now this is DBL_EPSILON + 1.0
00111111 11110000 00000000 00000000 00000000 00000000 00000000 00000001
Exp = 1023 (1)
  000 00000001
Man =      .10000 00000000 00000000 00000000 00000000 00000000 00000001
1.0000000000000002e+00
-- 
Joe Wright
"Everything should be made as simple as possible, but not simpler."
                     --- Albert Einstein ---
NO! The rounding mode needn't be to nearest!
#include <stdio.h>
int main(void)
{
volatile double oneplus = 2, epsilon = 1;
while (1 + epsilon/2 > 1) {
epsilon /= 2;
oneplus = 1 + epsilon;
}
epsilon = oneplus - 1;
printf("DBL_EPSILON is %g\n", epsilon);
return 0;
}
And i'm not very sure wheter it'd work if FLT_RADIX were not a power of 2.
It most certainly does not, and it never has.
-- 
Keith Thompson (The_Other_Keith) ks...@mib.org  <http://www.ghoti.net/~kst>
San Diego Supercomputer Center             <*>  <http://users.sdsc.edu/~kst>
"We must do something.  This is something.  Therefore, we must do this."
    -- Antony Jay and Jonathan Lynn, "Yes Minister"
In my copy of the standard there is a lengthy
Annex F (normative) IEC 60559 floating-point arithmetic
This annex specifies C language support for the IEC 60559 floating-point 
standard. The
IEC 60559 floating-point standard is specifically Binary floating-point 
arithmetic for
microprocessor systems, second edition (IEC 60559:1989), previously 
designated
IEC 559:1989 and as IEEE Standard for Binary Floating-Point Arithmetic
(ANSI/IEEE 754−1985). IEEE Standard for Radix-Independent Floating-Point
Arithmetic (ANSI/IEEE 854−1987) generalizes the binary standard to remove
dependencies on radix and word length. IEC 60559 generally refers to the 
floating-point
standard, as in IEC 60559 operation, IEC 60559 format, etc. An 
implementation that
defines _ _STDC_IEC_559_ _ shall conform to the specifications in this 
annex. Where
a binding between the C language and IEC 60559 is indicated, the IEC 
60559-specified
behavior is adopted by reference, unless stated otherwise.
So, obviously in some systems no standard floating
point will be used, but that should be extremely rare.
And what if inexact results are always rounded up?  I don't believe
anything in the C standard disallows that.
Wrong again, Jacob. Twice.
    First, the machine I'm using does not even have an
extended precision, much less use one.  Its double is
a plain vanilla IEEE 64-bit double, and that's that.
What it does have, though, in common with other IEEE
implementations, is the ability to deliver a correctly
rounded answer.  1+DBL_EPSILON*3/4 is (mathematically)
closer to 1+DBL_EPSILON than it is to 1, so the sum
rounds up rather than down.  (Did you not read the hint
I wrote earlier, and that you have now quoted twice?
"Not," I guess.)
    Second, you have again mis-stated what the Standard
says (even after quoting the Standard's own text). The
Standard does *not* say that epsilon is the smallest
value which when added to 1 yields a sum greater than 1;
it says that epsilon is the difference between 1 and the
next larger representable number.  If you think the two
statements are equivalent, you have absolutely no call
to be writing a tutorial on floating-point arithmetic!
    For IEEE double, there are about four and a half
thousand million million distinct values strictly less
than DBL_EPSILON which when added to 1 will produce the
sum 1+DBL_EPSILON in "round to nearest" mode, which is
the mode in effect when a C program starts.
Great Eric, unnormalized numbers exist.
What's your point?
Jacob is always wrong.
Granted.
I am always wrong. Happy?
>>     For IEEE double, there are about four and a half
>> thousand million million distinct values strictly less
>> than DBL_EPSILON which when added to 1 will produce the
>> sum 1+DBL_EPSILON in "round to nearest" mode, which is
>> the mode in effect when a C program starts.
>Great Eric, unnormalized numbers exist.
>What's your point?
Eric isn't talking about unnormalized numbers.
Consider DBL_EPSILON . It is noticably less than 1/2 so in IEEE
754 format, it will be normalized as some negative exponent
followed by a hidden 1 followed by some 53 bit binary fraction.
Take that 53 bit binary fraction as an integer and subtract 1 from
it, and construct a double with the same exponent as DBL_EPSILON
but the reduced fraction. Call the result, say, NEARLY_DBL_EPSILON . 
Now, take 1 + NEARLY_DBL_EPSILON . What is the result?
Algebraicly, it isn't quite 1 + DBL_EPSILON, but floating point
arithmetic doesn't obey normal algebra, so the result depends upon
the IEEE rounding mode in effect. If "round to nearest" (or
perhaps some other modes) is in effect, the result is close enough to
1 + DBL_EPSILON that the processor will round the result to
1 + DBL_EPSILON . C only promises accuracy to at best 1 ULP
("Unit in the Last Place"), so this isn't "wrong" (and what
exactly is right or wrong in such a case is arguable.)
-- 
  "It is important to remember that when it comes to law, computers
  never make copies, only human beings make copies.  Computers are given
  commands, not permission. Only people can be given permission."
                                               -- Brad Templeton
<snip>
>>>>     1.0 + DBL_EPSILON * 3 / 4
>>>>
>>>> is greater than one, even though `DBL_EPSILON * 3 / 4' is
>>>> 25% smaller than DBL_EPSILON.
>>>>
>>>
>>> This is because your machine makes calculations in extended
>>> precision since DBL_EPSILON must by definition be the smallest
>>> quantity where 1+DBL_EPSILON != 1
>>
>>     Wrong again, Jacob.  Twice.
>>
>>     First, the machine I'm using does not even have an
>> extended precision, much less use one.  Its double is
<snip>
>>     For IEEE double, there are about four and a half
>> thousand million million distinct values strictly less
>> than DBL_EPSILON which when added to 1 will produce the
>> sum 1+DBL_EPSILON in "round to nearest" mode, which is
>> the mode in effect when a C program starts.
> 
> Great Eric, unnormalized numbers exist.
> 
> What's your point?
His point was that the information you were putting in your tutorial was 
incorrect, and your refutation of the correction was incorrect.
> Jacob is always wrong.
> 
> Granted.
> 
> I am always wrong. Happy?
No. People would be happy if you would gracefully accept corrections. 
Why do you think that if someone finds an error in what you post they 
are out to get you?
-- 
Flash Gordon.
>Richard Tobin said:
>
><snip>
> 
>> You're not being reasonable.
>
>Shurely shome mishtake?
On your part, yes.
>> He's writing a C tutorial
>
>The internal evidence of his articles suggests otherwise.
And now you're being gratuitous too. Give Jacob his due - he's trying
to write a C tutorial, he's very sensibly asking for a review here,
and is prepared to take in comments even from people who he has had
run-ins with before. Stop being so childish. 
-- 
Mark McIntyre
"Debugging is twice as hard as writing the code in the first place. 
 Therefore, if you write the code as cleverly as possible, you are, 
 by definition, not smart enough to debug it."
--Brian Kernighan
    Not in IEEE float or double (extended formats are
less tightly specified, and might allow unnormalized
numbers; I'm not sure).  IEEE float and double have
"denormal numbers," which are a different matter.
    And for what it's worth, none of the four and a
half thousand million million values I mentioned is a
denormal.  All are in the approximate range 1.1e-16
through 2.2e-16, while the largest IEEE double denormal
is about 2.2e-308.
    Are you sure you're the right person to write a
tutorial on floating-point numbers?  The mistakes you
have made in this thread are of an elementary nature
uncharacteristic of someone even moderately familiar
with the topic.  You might want to consider offering
your readers a URL to someone else's writeup instead,
lest you commit self-Schildtification.
> What's your point?
> 
> Jacob is always wrong.
> 
> Granted.
> 
> I am always wrong. Happy?
    That is not my point; "always" overstates the case.
My point is that Jacob, right or wrong, doesn't appear
to read with much care, even when what he's reading is
a response to his own request for assistance.  "If you
see any errors/ambiguities/etc please just answer in
this thread" so Jacob can feel he's being persecuted.
Right. The relevant sentence is:
An implementation that defines __STDC_IEC_559__ shall conform to
    the specifications in this annex.
See also C99 6.18.8p2, "Predefined macro names":
    The following macro names are conditionally defined by the
    implementation:
    __STDC_IEC_559__ The integer constant 1, intended to indicate
    conformance to the specifications in annex F (IEC 60559
    floating-point arithmetic).
[...]
An implementation is not required to conform to annex F.  It's merely
required to do so *if* it defines __STDC_IEC_559__.
> So, obviously in some systems no standard floating
> point will be used, but that should be extremely rare.
Why should they be rare?  Most new systems these days do implement
IEEE floating-point, or at least use the formats, but there are still
systems that don't (VAX, some older Crays, IBM mainframes).  Annex F
merely provides a framework for implementations that do support IEEE
FP, but it's explicitly optional.
I can also imagine an implementation that uses the IEEE floating-point
formats, but doesn't meet the requirements of Annex F; such an
implementation could not legally define __STDC_IEC_559__.  (I have no
idea whether such implementations exist, or how common they are.)
And of course the __STDC_IEC_559__ and annex F are new in C99, so they
don't apply to the majority of implementations that don't conform to
the C99 standard.
Read "What every computer scientist should know about floating-point
arithmetic", by David Goldberg.
This program produces in my machine (x86)
DBL_EPSILON is 0
According to the C standard:
(Annex F.7.3)
At program startup the floating-point environment is initialized as 
prescribed by IEC 60559:
— All floating-point exception status flags are cleared.
— The rounding direction mode is rounding to nearest. (!!!!!)
— The dynamic rounding precision mode (if supported) is set so that 
results are not shortened.
As has been pointed out to you, annex F does not apply unless
__STDC_IEC_559__ is defined by the implementation.
jacob navia <j...@jacob.remcomp.fr> wrote:
> According to the C standard:
> (Annex F.7.3)
>
> At program startup the floating-point environment
> is initialized as prescribed by IEC 60559:
> - All floating-point exception status flags are cleared.
> - The rounding direction mode is rounding to nearest. (!!!!!)
To Jacob:  Eric Sosman already SAID THAT on June 29
that "rounding to nearest" is the start-up default.
But the program can change the default after start up.
So the FLAWED PARAPHRASE you keep insisting is not
a universal fact. You should use the OFFICIAL DEFINITION
of epsilon, rather than the FLAWED paraphrase
you KEEP insisting.
On Jun 29, 7:46 pm, Eric Sosman wrote:
> Wrong again, Jacob.  Twice.
> First...
> Second...
> there are about four and a half
> thousand million million distinct values
> strictly less than DBL_EPSILON which
> when added to 1 will produce the
> sum 1+DBL_EPSILON in "round to nearest" mode,
> which is the mode in effect when a C program starts.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^