
Comparison to std::numeric_limits<double>::max()


Andrea Venturoli

Apr 27, 2018, 2:44:47 AM
Hello.

I'm aware of all the quirks with FP math.
However I thought comparing std::numeric_limits<double>::max() to
std::numeric_limits<double>::max() should yield a deterministic result.
Am I correct?



I've got code that works when compiled without optimizations, but fails
with clang 4.0's "-O" option.

By eliminating "if"s and irrelevant variables, it can be reduced to
something like the following:

> #include <limits>
> #include <iostream>
>
> double f() {return std::numeric_limits<double>::max();}
> double g() {return f();}
>
> int main(int,char**) {
> double A=g();
> if (A==std::numeric_limits<double>::max())
> std::cout<<true<<std::endl;
> else
> std::cout<<false<<std::endl;
> }

This snippet in fact works up to -O3, always outputting 1.
The original, more complex, code, however, only works without optimizations.

Before I start trying different compilers (or versions), is my original
assumption true or wrong?

bye & Thanks
av.

Alf P. Steinbach

Apr 27, 2018, 3:36:27 AM
I think the shown code should work, as it appears you say that it does,
but generally, for /arithmetic results/ it depends on the platform.

On the PC the original 1980's (actually 1979) architecture was a main
processor with a separate math co-processor. The co-processor calculated
internally with 80-bit floating point, for precision, while the main
program used 64 bit floating point, presumably to save memory (as a
16-bit processor it couldn't fit those values into registers anyway).
And the PC architecture's evolution is a study in backwards
compatibility: AFAIK to this day it works /as if/ it were like that.

This means that depending on the optimization level some operations can
be done in 80-bit format, some as-if in 64-bit format (when the result
is converted back down), and not always with exactly the same result at
the end of a sequence of operations even when you start with apparently
the same values.

Cheers & hth.,

- Alf

Andrea Venturoli

Apr 27, 2018, 5:48:21 AM
On 04/27/18 09:36, Alf P. Steinbach wrote:

> I think the shown code should work

No certainty, though :(





> On the PC the original 1980's (actually 1979) architecture was a main
> processor with a separate math co-processor. The co-processor calculated
> internally with 80-bit floating point, for precision, while the main
> program used 64 bit floating point, presumably to save memory (as a
> 16-bit processor it couldn't fit those values into registers anyway).
> And the PC architecture's evolution is a study in backwards
> compatibility: AFAIK to this day it works /as if/ it were like that.

Right, I studied this several years ago. However I'm not sure this still
holds today, with all other instruction sets (e.g. SSE) possibly being
used by the compiler.





> This means that depending on the optimization level some operations can
> be done in 80-bit format, some as-if in 64-bit format (when the result
> is converted back down), and not always with exactly the same result at
> the end of a sequence of operations even when you start with apparently
> the same values.

This shouldn't be the case, though.
We start with a 64b value (std::numeric_limits<double>::max()) and we
compare it to a 64b value without any intermediate math.
Even if it was converted to 80b and back it shouldn't lose precision,
should it?

I added:
>std::cout<<std::setprecision(20)<<std::fixed<<A<<"\n"<<std::numeric_limits<double>::max()<<std::endl;
and get:
> 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00000000000000000000
> 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00000000000000000000

The two numbers are the same to the last digit.

If the problem were an 80b intermediate format, the only explanation
would be that, when converted to 80b values, the least significant bits
are not zeroed, but left undefined.
This would surprise me a lot.



bye & Thanks
av.

Paavo Helde

Apr 27, 2018, 6:00:43 AM
On 27.04.2018 9:44, Andrea Venturoli wrote:
> Hello.
>
> I'm aware of all the quirks with FP math.
> However I thought comparing std::numeric_limits<double>::max() to
> std::numeric_limits<double>::max() should yield a deterministic result.
> Am I correct?
>
>
>
> I've got code that works when compiled without optimizations, but fails
> with clang 4.0's "-O" option.
>
> By eliminating "if"s and irrelevant variables, it can be reduced to
> something like the following:
>
>> #include <limits>
>> #include <iostream>
>>
>> double f() {return std::numeric_limits<double>::max();}
>> double g() {return f();}
>>
>> int main(int,char**) {
>> double A=g();
>> if (A==std::numeric_limits<double>::max())
>> std::cout<<true<<std::endl;
>> else
>> std::cout<<false<<std::endl;
>> }
>
> This snippet in fact works up to -O3, always outputting 1.
> The original, more complex, code, however, only works without
> optimizations.

The C++ standard says that if a value is exactly representable in the
destination floating-point type then it must be stored exactly. I
believe this means that if you store an integer 0 or 1 to an IEEE double
you can compare directly with 0 or 1, because all 32-bit integers are
exactly representable in an IEEE double.

Of course, any computations will ruin that result, e.g.

double x = 1;
assert(x==1); // guaranteed with IEEE floating-point
x += 0.0;
assert(x==1); // not guaranteed any more as far as I understand

One can argue that std::numeric_limits<double>::max() ought to be
exactly representable in double, by definition, and the above
considerations should hold. Nevertheless, comparing with
std::numeric_limits<double>::max() seems pretty fragile, I would use
std::numeric_limits<double>::quiet_NaN() instead if I needed a special
double value (together with a
static_assert(std::numeric_limits<double>::has_quiet_NaN)).
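
A minimal sketch of that idea (the names kUnset and is_unset are made up
for illustration, not taken from the code under discussion):

#include <cmath>
#include <iostream>
#include <limits>

static_assert(std::numeric_limits<double>::has_quiet_NaN, "need a quiet NaN");

const double kUnset = std::numeric_limits<double>::quiet_NaN();

// NaN never compares equal to anything (itself included), so the check
// must go through std::isnan() rather than ==.
bool is_unset(double x) { return std::isnan(x); }

int main() {
    double a = kUnset;
    std::cout << is_unset(a) << '\n';   // 1
    std::cout << (a == a) << '\n';      // 0: NaN != NaN
}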


Andrea Venturoli

Apr 27, 2018, 6:22:57 AM
On 04/27/18 12:00, Paavo Helde wrote:

Thanks for your answer.

> One can argue that std::numeric_limits<double>::max() ought to be
> exactly representable in double, by definition, and the above
> considerations should hold.

Do you think this is a bug in Clang/LLVM then?





> Nevertheless, comparing with
> std::numeric_limits<double>::max() seems pretty fragile

Why then?



> I would use
> std::numeric_limits<double>::quiet_NaN() instead if I needed a special
> double value

Well, max() has other advantages, like being able to be used with other
operators besides ==.

In my case the function is returning A=max(), so that x<A will hold.
Using NaN would mean first checking whether A is NaN, then acting as
normal, while returning max() does not require two comparisons in many
places (except of course in the case where I'm seeing the problem I
wrote about).

For now I changed the code to use a big enough number (e.g. 100 or 1000)
for the specific situation instead, but max() had looked like a
"sure" value.



bye & Thanks
av.

Manfred

Apr 27, 2018, 7:44:02 AM
On 4/27/2018 12:22 PM, Andrea Venturoli wrote:
> On 04/27/18 12:00, Paavo Helde wrote:
>
> Thanks for your answer.
>
>> One can argue that std::numeric_limits<double>::max() ought to be
>> exactly representable in double, by definition, and the above
>> considerations should hold.
>
> Do you think this is a bug in Clang/LLVM then?

I think the key point here is about optimizations: the values /should/
compare equal, but apparently something changes as they are propagated
through different pipelines. One thing I could imagine is that the final
comparison is performed as long double, and there a difference is
(wrongly) detected.
For the sake of curiosity, inspecting the binary representation of the
values (as a hex byte sequence) and/or the generated asm might give some
more insight.
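
For instance, a minimal sketch of dumping the raw bits (dump_bits is a
made-up helper; it assumes a 64-bit double):

#include <cstdint>
#include <cstdio>
#include <cstring>

void dump_bits(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "64-bit double assumed");
    std::uint64_t bits;
    std::memcpy(&bits, &d, sizeof bits);   // copy the object representation
    std::printf("%016llx\n", static_cast<unsigned long long>(bits));
}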

That's the theory; in practice, what is the goal of the code? The
question is about the best usage of FP arithmetic; IME this kind of
problem is usually solved by revising the rationale at the basis of
the code - more below.

>
>
>
>
>
>> Nevertheless, comparing with std::numeric_limits<double>::max() seems
>> pretty fragile
>
> Why then?

In general, /any/ comparison for exact equality between FP values is
inherently brittle.
As Paavo pointed out, in principle exact representation of any FP value
is lost after any math operations - this means any FP value actually
used in any real-world code.

>
>
>
>> I would use std::numeric_limits<double>::quiet_NaN() instead if I
>> needed a special double value
>
> Well, max() has other advantages, like being able to be used with other
> operators besides ==.

Back to the "in practice" point, I wonder what is the use of max() in
the actual code.
Usually when I need a large value for initialization I use HUGE_VAL
(which translates to infinity() in numeric_limits), which is guaranteed
to yield the correct result for < > comparison with any other value.
When I need to test for finiteness, is_bounded() (or the finite()
function where it is available) provides the desired result.
If the problem domain has some specific bounds, then I would use such
specific limits instead.
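
A minimal sketch of that approach (the names are made up for illustration,
and it assumes the sentinel is only ever used in ordering comparisons):

#include <cmath>
#include <limits>

const double kNoBound = std::numeric_limits<double>::infinity();

// Every finite x satisfies x < kNoBound, and finiteness can be tested directly.
bool is_bounded_value(double x) { return std::isfinite(x); }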

In short, it looks like it would be advisable to modify the code to
avoid the == comparison in the first place.

Andrea Venturoli

Apr 27, 2018, 7:58:09 AM
On 04/27/18 13:43, Manfred wrote:

> In general, /any/ comparison for exact equality between FP values is
> inherently brittle.
> As Paavo pointed out, in principle exact representation of any FP value
> is lost after any math operations - this means any FP value actually
> used in any real-world code.

Agree on that.
Except there are no math operations here: just assignment/function
return/comparison.





> Back to the "in practice" point, I wonder what is the use of max() in
> the actual code.

Return a value V such that, for each reasonable value of x, x<V will hold.



> If the problem domain has some specific bounds, then I would use such
> specific limits instead.

Agree.
That's how I've modified the code now (luckily I was able to bound the
domain).
Still... you know curiosity... :)



bye & Thanks
av.

Manfred

Apr 27, 2018, 8:02:38 AM
On 4/27/2018 1:57 PM, Andrea Venturoli wrote:
> On 04/27/18 13:43, Manfred wrote:
>
>> In general, /any/ comparison for exact equality between FP values is
>> inherently brittle.
>> As Paavo pointed out, in principle exact representation of any FP
>> value is lost after any math operations - this means any FP value
>> actually used in any real-world code.
>
> Agree on that.
> Except there are no math operations here: just assignment/function
> return/comparison.
>
>
>
>
>
>> Back to the "in practice" point, I wonder what is the use of max() in
>> the actual code.
>
> Return a value V such that, for each reasonable value of x, x<V will
> hold.

In general, infinity() is what has been designed for the purpose.

Ralf Goertz

Apr 27, 2018, 8:10:50 AM
On Fri, 27 Apr 2018 13:57:59 +0200,
Andrea Venturoli <ml.die...@netfence.it> wrote:

> On 04/27/18 13:43, Manfred wrote:
>
> > In general, /any/ comparison for exact equality between FP values
> > is inherently brittle.
> > As Paavo pointed out, in principle exact representation of any FP
> > value is lost after any math operations - this means any FP value
> > actually used in any real-world code.
>
> Agree on that.
> Except there are no math operations here: just assignment/function
> return/comparison.

Do I understand correctly: you assign a floating point value to a
variable, do nothing mathematically with it, and compare it later with a
similarly treated floating point value? If that is the case, can't those
values be translated to something more reliably comparable, like an int
via hashing or a string via decimal representation?

Manfred

Apr 27, 2018, 8:23:32 AM
On 4/27/2018 2:02 PM, Manfred wrote:
> On 4/27/2018 1:57 PM, Andrea Venturoli wrote:
>> On 04/27/18 13:43, Manfred wrote:
>>
>>
>>
>>> Back to the "in practice" point, I wonder what is the use of max() in
>>> the actual code.
>>
>> Return a value V such that, for each reasonable value of x, x<V will
>> hold.
>
> In general, infinity() is what has been designed for the purpose.

Forget this. This was dumb..

Andrea Venturoli

Apr 27, 2018, 9:15:25 AM
On 04/27/18 14:10, Ralf Goertz wrote:

> Do I understand correctly,

Partially.



> you assign a floating point value to a
> variable do nothing mathematically with it and compare it later with a
> similarly treated floating point value? If that is the case can't those
> values be translated to something more reliably comparable like int via
> hashing or string via decimal representation?

I've got:

> double f(...) {
> if (...)
> return std::numeric_limits<double>::max();
> else
> return [some computation];
> }

Then, in several places:

> double A=f(...);
> while (x<A) ...

or

> double A=f(...);
> if (x<A) ...

or

> double A=f(...),x=min(A,...);

etc...



Only in a few places I see:

> double A=f(...)
> if (A==std::numeric_limits<double>::max())
> ...;
> else
> ...



bye & Thanks
av.

Tim Rentsch

Apr 27, 2018, 10:26:33 AM
Andrea Venturoli <ml.die...@netfence.it> writes:

> [lightly edited]
> I'm aware of all the quirks with FP math.
> However I thought comparing std::numeric_limits<double>::max() to
> std::numeric_limits<double>::max() should yield a deterministic
> result.
>
> I've got code that works when compiled without optimizations, but
> fails with clang 4.0's "-O" option.
>
> By eliminating "if"s and irrelevant variables, it can be reduced
> to something like the following:
>
> #include <limits>
> #include <iostream>
>
> double f() {return std::numeric_limits<double>::max();}
> double g() {return f();}
>
> int main(int,char**) {
> double A=g();
> if (A==std::numeric_limits<double>::max())
> std::cout<<true<<std::endl;
> else
> std::cout<<false<<std::endl;
> }
>
> This snippet in fact works up to -O3, always outputting 1. The
> original, more complex, code, however, only works without
> optimizations.

Looking at the C and C++ standards, I don't see any wiggle room.
The function std::numeric_limits<double>::max() is of type
double, and must return a particular (finite) value. Any
particular value must compare equal to itself.

You should post code that shows the problem you're asking about.
By "reducing" the code what you've actually done is change it so
the problem is no longer there. Always post code that does in
fact exhibit the behavior you want to ask about.

Marcel Mueller

Apr 27, 2018, 10:59:11 AM
On 27.04.18 08.44, Andrea Venturoli wrote:
> Before I start trying different compilers (or versions), is my original
> assumption true or wrong?

The assumption is true. In fact I was not able to reproduce your problem
with different versions of gcc from 3.35 up to 6.3.0 on different
platforms (Linux amd64, Linux x86, OS/2, Linux ARMv6) with optimizations
even with -ffast-math.


Marcel

Andrea Venturoli

Apr 28, 2018, 3:52:37 AM
On 04/27/18 16:26, Tim Rentsch wrote:
> Looking at the C and C++ standards, I don't see any wiggle room.
> The function std::numeric_limits<double>::max() is of type
> double, and must return a particular (finite) value. Any
> particular value must compare equal to itself.

Thanks.





> You should post code that shows the problem you're asking about.
> By "reducing" the code what you've actually done is change it so
> the problem is no longer there. Always post code that does in
> fact exhibit the behavior you want to ask about.

You are absolutely right.
I was only asking for confirmation before I wasted hours "reducing" on a
false assumption.
In fact it looks like this was not the actual problem.
In any case, here we go now...





I'm working on FreeBSD 11.1/amd64 and tried clang 4.0.0, 5.0.1 and
6.0.0: all show the problem I'll describe when compiling with -O1.

Some warnings:
_ of course this snippet doesn't make sense, it's just a Proof Of Concept;
_ I know "feenableexcept" is FreeBSD specific and I don't know what a
Linux equivalent would be;
_ at -O3 the compiler probably realizes f() will always return max()
and optimizes everything away, so a more complicated example is
required; this can surely be done, as the original software fails with
this option.



The snippet:
> #include <iostream>
> #include <limits>
> #include <numeric>
> #include <fenv.h>
>
> double f(double A,double B) {
> if (A<B) return std::numeric_limits<double>::max();
> return A-B;
> }
>
> int main(int /*argc*/,char**/*argv*/) {
> feenableexcept(FE_OVERFLOW); //**** NOTE 1
> double A=f(.0001002773902563,1.);
> std::cout<<A<<std::endl;
> double B;
> if (A==std::numeric_limits<double>::max()) {
> std::cout<<"Good"<<std::endl; //**** NOTE 2
> B=A;
> } else {
> std::cout<<"Bad"<<std::endl; //**** NOTE 2
> B=A*2.;
> }
> std::cout<<B<<std::endl;
> }

This code as is will work fine, printing:
1.79769e+308
Good
1.79769e+308

Commenting the lines noted by NOTE 1 & 2 will still work fine (of course
you'll see no "Good" in the output).

Removing only NOTE 1 (but leaving NOTE 2) will of course still produce a
correct output.

Leaving NOTE 1, but removing NOTE 2 will output the first value, but
then generate an FP exception at "B=A*2".



Now some thoughts:

_ IMO adding or removing std::cout should not change a program's behaviour
(apart obviously from the output);

_ in any case the instruction "B=A*2" should never be executed;

_ perhaps, when there's no messing with std::cout, the compiler labels
this branch as worth executing speculatively? In that case I think any
side effects should be thrown away, so the FP exception should not be
raised.

Is this just my opinion, or is it a bug worth reporting?
Maybe it's feenableexcept that makes the whole system deviate from the
standard?

bye & Thanks
av.

Tim Rentsch

Apr 28, 2018, 5:42:48 AM
Andrea Venturoli <ml.die...@netfence.it> writes:

> On 04/27/18 16:26, Tim Rentsch wrote:
> [both parts trimmed]
>
>> You should post code that shows the problem you're asking about.
>> By "reducing" the code what you've actually done is change it so
>> the problem is no longer there. Always post code that does in
>> fact exhibit the behavior you want to ask about.
>
> You are absolutely right. [...]

You're darn tootin'. :)

> I'm working on FreeBSD 11.1/amd64 and tried clang 4.0.0, 5.0.1 and
> 6.0.0: all show the problem I'll describe when compiling with -O1.
>
> _ I know "feenableexcept" is FreeBSD specific and I don't know what a
> Linux equivalent would be;

Maybe feenableexcept() is POSIX? Anyway it worked fine on my
linux system.

> The snippet:
>
> #include <iostream>
> #include <numeric>
> #include <fenv.h>
>
> double f(double A,double B) {
> if (A<B) return std::numeric_limits<double>::max();
> return A-B;
> }
>
> int main(int /*argc*/,char**/*argv*/) {
> feenableexcept(FE_OVERFLOW); //**** NOTE 1
> double A=f(.0001002773902563,1.);
> std::cout<<A<<std::endl;
> double B;
> if (A==std::numeric_limits<double>::max()) {
> std::cout<<"Good"<<std::endl; //**** NOTE 2
> B=A;
> } else {
> std::cout<<"Bad"<<std::endl; //**** NOTE 2
> B=A*2.;
> }
> std::cout<<B<<std::endl;
> }

I recommend just indenting, or alternatively using some lead
character other than > to mark the code. If > is used it looks
like an extra level of quoting (ie, to a previous message) in a
news posting.

> Leaving NOTE 1, but removing NOTE 2 will output the first value, but
> then generate an FP exception at "B=A*2".

Here is my stripped down version (just the main() function):

int
main( int /*argc*/ , char** /*argv*/ ){
double A = f( .0001002773902563, 1. );
bool equals = A == std::numeric_limits<double>::max();

std::cout << equals << std::endl;

feenableexcept( FE_OVERFLOW );
std::cout << (equals ? A : A*2) << std::endl;

return 0;
}

Prints '1' and value of A on -O0.
Prints '1' and then fails on -O1.


> Now some thoughts:
>
> _ IMO adding or removing std::cout should not change a program
> behaviour (apart obviously from the output);
>
> _ in any case the instruction "B=A*2" should never be executed;
>
> _ perhaps, when there's no messing with std::cout, the compiler
> will label this branch as worth of speculatively being executed?
> In that case I think any collateral effect should be thrown away,
> so the FP exception should be handled.

First, the problem is not the comparison. The '1' being printed
shows the two values compare equal.

Most likely what has happened is that clang decided to calculate
both 'A' and 'A*2', and then use a conditional move (which in
your program would assign to 'B') to choose the appropriate
value. Unfortunately, calculating 'A*2' causes a floating
exception before the conditional move can take effect.

That is consistent with the change when removing the call to
cout. When the cout calls are there, a branch instruction
needs to be done anyway, so a conditional move isn't used.

> Is this just my opinion or a bug worth reporting?
> Maybe it's feenableexcept that makes the whole system deviate from
> the standard?

I think using feenableexcept() already puts you outside the realm
of the C and C++ standards. So I'm not sure where the finger
should be pointed.

On the other hand, if I were on the clang team, I would like to
know that this behavior occurs, because the result is awful
whether or not it is technically considered a bug. And the bad
behavior does not occur with g++.

Christiano

Apr 28, 2018, 7:14:26 AM
Don't use fenv.h, use cfenv instead.

The minimalist code that is generating the bug is:

// ------------ a.cpp ------------------
#include <iostream>
#include <numeric>
#include <limits>
#include <cfenv>

double f(double A,double B) {
if (A<B) return std::numeric_limits<double>::max();
return A-B;
}

int main(int , char**) {
feenableexcept(FE_OVERFLOW); //**** NOTE 1
double A=f(.0001002773902563,1.);
std::cout<<A<<std::endl;
double B;
if (A==std::numeric_limits<double>::max()) {
// std::cout<<"Good"<<std::endl; //**** NOTE 2
B=A;
} else {
// std::cout<<"Bad"<<std::endl; //**** NOTE 2
B=A*2.;
}
std::cout<<B<<std::endl;
}
// ---------------------------------------

The command:
$ clang++ -O1 a.cpp
$ ./a.out
1.79769e+308
Floating point exception (core dumped)

This seems like a bug with clang optimization. Please, go to:
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
And report the problem using the minimalist code above and the compilation command.

Andrea Venturoli

Apr 28, 2018, 8:35:17 AM
On 04/28/18 11:42, Tim Rentsch wrote:

> Maybe feenableexcept() is POSIX? Anyway it worked fine on my
> linux system.

Didn't know.
Thanks for trying this on Linux.





> I recommend just indenting, or alternatively using some lead
> character other than > to mark the code. If > is used it looks
> like an extra level of quoting (ie, to a previous message) in a
> news posting.

Sorry for that. I had been told that's the only way to avoid line breaks
in many clients.
At least it seems to be so in ThunderBird.





> First, the problem is not the comparison. The '1' being printed
> shows the two values compare equal.

Yes, as I said I originally thought this was the problem, but it isn't.
Strangely enough, however, just using 1000 instead of max(), with no
other changes to the program, made the problem go away!!!





> Most likely what has happened is that clang decided to calculate
> both 'A' and 'A*2', and then use a conditional move (which in
> your program would assign to 'B') to choose the appropriate
> value. Unfortunately, calculating 'A*2' causes a floating
> exception before the conditional move can take effect.

That's my hypothesis too.



>> Is this just my opinion or a bug worth reporting?
>> Maybe it's feenableexcept that makes the whole system deviate from
>> the standard?
>
> I think using feenableexcept() already puts you outside the realm
> of the C and C++ standards. So I'm not sure where the finger
> should be pointed.

I guess at least the compiler should know that exceptions were enabled.
How? I don't know.



> On the other hand, if I were on the clang team, I would like to
> know that this behavior occurs, because the result is awful
> whether or not it is technically considered a bug. And the bad
> behavior does not occur with g++.

I'll do this.

bye & Thanks
av.

K. Frank

Apr 28, 2018, 10:33:43 AM
Hi Andrea!

On Friday, April 27, 2018 at 2:44:47 AM UTC-4, Andrea Venturoli wrote:
> Hello.
>
> I'm aware of all the quirks with FP math.
> However I thought comparing std::numeric_limits<double>::max() to
> std::numeric_limits<double>::max() should yield a deterministic result.
> Am I correct?
> ...

Short answer:

The failure you see of std::numeric_limits<double>::max()
to compare equal to itself violates any reasonable de facto
standard, but (probably) does not violate the letter of the
c++ standard.

Even judged by the letter of the standard rather than the
de facto standard, it represents a significant quality-of-
implementation bug.

Some further explanation:

The c++ standard gives quite concrete guarantees about how
unsigned integral arithmetic works, and somewhat weaker
(rooted in giving flexibility to how sign bits are implemented)
guarantees for signed integral arithmetic.

Some time back -- probably the c++ 11 or 14 draft standard -- I
read carefully the standard's floating point verbiage. The c++
standard is SURPRISINGLY vacuous about floating-point arithmetic.
Basically it says that floating-point numbers represent thingys
that kind-of, sort-of behave like floating-point numbers, you can
use arithmetic operators on them, and that long double has at
least as much precision as double, which has at least as much
precision as float. (Hmm ...)

(Things may have been tightened up in the most recent standard,
but I doubt it. Also, an implementation has the option to return
true for std::numeric_limits<double>::is_iec559, in which case
it promises to meet (some level) of the ieee 754 floating-standard.)

In short, the c++ standard guarantees for floating-point arithmetic
are so vacuous as to be useless, so we are forced to rely on de
facto standards or quality-of-implementation criteria to get any
(floating-point) work done.

While we're on the subject, let me address some of the floating-point
FUD that infects many nooks and crannies of the internet (this news
group included).

Floating-point arithmetic definitely has its subtleties, but, no,
floating-point numbers are not mere shape-shifting specters of some
sort of actual numbers. They are all-together well defined (although
different implementations legitimately define them differently).

If your implementation permits

float x = 3.3;
x != x + 0.0;

your implementation has a bug and violates the (de facto) standard.

No, floating-point numbers are not permitted to grow some sort of
floating-point-FUD fuzz when they move along the data bus. If you
store a floating-point number in memory and read it back ten times,
your implementation has a bug if it gives you back ten different
values. (Just sayin'.)

Floating-point arithmetic is commutative:

x + y == y + x;

It cannot be (and is not) associative for all values:

(x + y) + z == x + (y + z);

need not (and does not) hold for all values.
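
A small example of that last point, assuming IEEE-754 doubles:

#include <iostream>

int main() {
    double big = 1e16;               // above 2^53, so adjacent doubles are 2 apart
    double a = (big + 1.0) + 1.0;    // each +1.0 is a tie that rounds back to big
    double b = big + (1.0 + 1.0);    // +2.0 lands exactly on a representable value
    std::cout << (a == b) << '\n';   // prints 0: the two groupings differ
}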

Floating-point arithmetic* should be exact to the precision of
the representation. Being off by the least-significant bit I
would call a quality-of-implementation imperfection, and being
off by two bits I would call a bug.

*) By this I mean a single floating-point operation. Round-off
error does -- unavoidably -- accumulate when performing a series
of operations, and managing this reality is an important part of
numerical analysis. Also, implementations, legitimately, give
themselves a couple of bits slop for mathematical functions, i.e.,
sqrt, sin, etc.

If someone tells you that floating-point arithmetic isn't precise
or that floating-point numbers start to get moldy when they sit
in memory for too long, they're spewing FUD.

If someone tells you that you can't meaningfully test for equality
between floating-point numbers, they're spewing FUD.

To be fair, floating-point arithmetic is not the arithmetic of real
numbers, and it has a number of subtleties. Everyone should check
out Goldberg's "What Every Computer Scientist Should Know About
Floating-Point Arithmetic," which can be found many places on the
internet, for example:

www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf

> Before I start trying different compilers (or versions), is my original
> assumption true or wrong?

You're right. Your compiler has a bug and violates the (de facto)
standard. (You might want to be nice, and file a bug report.)

> bye & Thanks
> av.


Happy Floating-Point Hacking!

K. Frank

Andrea Venturoli

Apr 28, 2018, 10:45:26 AM
On 04/28/18 14:35, Andrea Venturoli wrote:

> Yes, as I said I originally thought this was the problem, but it isn't.
> Strangely enough, however, just using 1000 instead of max(), with no
> other changes to the program, made the problem go away!!!

Sorry, forget this: wrong wording.
Of course using 1000 will make 1000*2 a legal statement.

What I meant to say: at first I came up with a hypothesis, which later
proved not to be the correct one.

Andrea Venturoli

Apr 28, 2018, 10:52:14 AM
On 04/28/18 16:33, K. Frank wrote:
> Hi Andrea!
>
> On Friday, April 27, 2018 at 2:44:47 AM UTC-4, Andrea Venturoli wrote:
>> Hello.
>>
>> I'm aware of all the quirks with FP math.
>> However I thought comparing std::numeric_limits<double>::max() to
>> std::numeric_limits<double>::max() should yield a deterministic result.
>> Am I correct?
>> ...
>
> Short answer:
>
> The failure you see of std::numeric_limits<double>::max()
> to compare equal to itself violates any reasonable de facto
> standard, but (probably) does not violate the letter of the
> c++ standard.

Thanks a lot for your post.

However, as I said, my first hypothesis (wrong comparison) was not the
correct one.
I apologize for this... sometimes, when dealing with optimizers, things
are not so linear.

The problem lies elsewhere, as described in my other posts in this thread.
I think it's still an interesting one, possibly a compiler bug; so, if
you are curious, read them :)

bye
av.

James Kuyper

Apr 28, 2018, 4:26:49 PM
On 04/28/2018 10:33 AM, K. Frank wrote:
...
> Some time back -- probably the c++ 11 or 14 draft standard -- I
> read carefully the standard's floating point verbiage. The c++
> standard is SURPRISINGLY vacuous about floating-point arithmetic.

"... this International Standard imposes no restrictions on the accuracy
of floating-point operations, ..." (5.20p6). While that is a "Note", and
therefore non-normative, it does seem to correctly describe the
normative text of the standard: I couldn't find any such restrictions.
> (Things may have been tightened up in the most recent standard,
> but I doubt it. ...

The above quotation is from n4567.pdf, the closest free thing to C++17.

> ... Also, an implementation has the option to return
> true for std::numeric_limits<double>::is_iec559, in which case
> it promises to meet (some level) of the ieee 754 floating-standard.)

"static constexpr bool is_iec559;
True if and only if the type adheres to IEC 559 standard." (18.3.2.4p56).

I don't see the term "adheres" as providing a whole lot of wiggle room -
an implementation either does or does not adhere to the standard. Is
there an IEC document somewhere which defines "adhere" in a way that
provides wiggle room?
The IEC 559 requirements on the precision of results are pretty nearly
as tight as it is practically possible to make them. I'm willing to rely
on that when is_iec559 is true for the relevant type(s).

...
> If someone tells you that floating-point arithmetic isn't precise
> or that floating-point numbers start to get moldy when they sit
> in memory for too long, they're spewing FUD.

If, on the other hand, they tell you that floating point arithmetic
isn't infinitely precise, they're telling you the exact truth.

> If someone tells you that you can't meaningfully test for equality
> between floating-point numbers, they're spewing FUD.

However, if someone tells you that two different calculations which, if
carried out with infinite precision, mathematically should produce
exactly identical results, might not produce exactly identical results
when performed using floating point arithmetic, they're telling you the
truth.

Paavo Helde

Apr 29, 2018, 4:12:56 PM
On 27.04.2018 13:22, Andrea Venturoli wrote:
> On 04/27/18 12:00, Paavo Helde wrote:

>> Nevertheless, comparing with std::numeric_limits<double>::max() seems
>> pretty fragile
>
> Why then?

Comparing floating-point numbers for exact equality is always fragile,
there is always a chance that somebody adds some computation like divide
by 10, multiply by 10, ruining the results. A deeply "floating-point"
value like std::numeric_limits<double>::max() is doubly suspect just
because it is far away from the normal and well-tested range of values.

I just checked how it is defined in MSVC. It appears the value is
defined by the macro

#define DBL_MAX 1.7976931348623158e+308

From here you can clearly see there might be problems. This constant is
specified in decimal and I believe there is a fair chance this number
does not have an exact representation in binary. It will probably yield
different results when loaded into either a 64-bit or an 80-bit register.
Add some minor optimizer bugs and one can easily imagine that there
might be problems when comparing this number with itself, even if it
should work by the letter of the standard.

Tim Rentsch

Apr 30, 2018, 1:16:54 AM
"K. Frank" <kfran...@gmail.com> writes:

> Hi Andrea!
>
> On Friday, April 27, 2018 at 2:44:47 AM UTC-4, Andrea Venturoli wrote:
>
>> Hello.
>>
>> I'm aware of all the quirks with FP math.
>> However I thought comparing std::numeric_limits<double>::max() to
>> std::numeric_limits<double>::max() should yield a deterministic result.
>> Am I correct?
>> ...
>
> Short answer:
>
> The failure you see of std::numeric_limits<double>::max()
> to compare equal to itself [...] (probably) does not violate
> the letter of the c++ standard.

What leads you to think the C++ standard allows this? Are
there any specific citations you can offer that support
this view? AFAICT the C and C++ standards admit no leeway,
and the comparison must give a result of equal.

Tim Rentsch

Apr 30, 2018, 1:41:35 AM
Paavo Helde <myfir...@osa.pri.ee> writes:

> On 27.04.2018 13:22, Andrea Venturoli wrote:
>
>> On 04/27/18 12:00, Paavo Helde wrote:
>>
>>> Nevertheless, comparing with std::numeric_limits<double>::max() seems
>>> pretty fragile
>>
>> Why then?
>
> Comparing floating-point numbers for exact equality is always fragile,
> there is always a chance that somebody adds some computation like
> divide by 10, multiply by 10, ruining the results. A deeply
> "floating-point" value like std::numeric_limits<double>::max() is
> doubly suspect just because it is far away from normal and well-tested
> range of values.
>
> I just checked how it is defined in MSVC. It appears the value is
> defined by the macro
>
> #define DBL_MAX 1.7976931348623158e+308
>
> From here you can clearly see there might be problems. This constant
> is specified in decimal and I believe there is a fair chance this
> number does not have an exact representation in binary. It will
> probably yield different results when loaded into either a 64-bit or a
> 80-bit register.

None of those things matter. The Standard requires a particular
value be returned, however the implementation chooses to do it.

> Add some minor optimizer bugs and one can easily
> imagine that there might be problems when comparing this number with
> itself, even if it should work by the letter of the standard.

If you don't trust your compiler, get a different compiler.

If you think it's important to run sanity checks to be sure the
compiler doesn't have bugs, by all means do so.

But don't give in to superstitious programming practices. Insist
on solid understanding and a rational decision process, not murky
justifications based on uncertainty and fear. Anyone promoting
voodoo programming principles should be encouraged to change
occupations from developer to witchdoctor.

Juha Nieminen

Apr 30, 2018, 1:57:25 AM
Andrea Venturoli <ml.die...@netfence.it> wrote:
> I added:
>>std::cout<<std::setprecision(20)<<std::fixed<<A<<"\n"<<std::numeric_limits<double>::max()<<std::endl;
> and get:
>> 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00000000000000000000
>> 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.00000000000000000000
>
> The two numbers are the same to the last digit.

If you want to print two floating point numbers in order to see if they
are bit-by-bit identical, you shouldn't print them in decimal. Conversion
to decimal may cause rounding errors (because floating point values are
internally represented in terms of powers of 2, while decimal is in
terms of powers of 10).

Either print the raw byte values of the floating point variable (eg.
in binary format), or use std::printf with the format specifier "%a"
which prints it in hexadecimal. (This is a bit-by-bit accurate
representation because converting from base-2 to base-16 can be
done losslessly, without need for rounding.)

There might have been an equivalent to "%a" for std::ostream, but
I don't remember now if there was.
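
A small sketch of the printf route, assuming an IEEE double:

#include <cstdio>
#include <limits>

int main() {
    double a = std::numeric_limits<double>::max();
    std::printf("%a\n", a);   // hex-float form, e.g. 0x1.fffffffffffffp+1023
}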

Andrea Venturoli

Apr 30, 2018, 4:05:12 AM
On 04/27/18 08:44, Andrea Venturoli wrote:
> Hello.
> ...

First off, thanks to anyone who got interested in the matter.
I wrote to the clang-dev mailing list and received a precise answer.




I'll try to summarize everything here:

_ my first assumption, that comparing std::numeric_limits<double>::max()
to itself was failing, was wrong; the problem was elsewhere;

_ BTW, this comparison must work (or it would be a bug in the
compiler/system/etc...);

_ my real problem was that:
a) I had enabled FP exceptions (in particular overflow);
b) with optimizations on, the compiler would speculatively execute a
branch that would not run under proper program flow and such a branch
generated an FP exception.

_ My code was deemed problematic because, in order to enable
exceptions (or use anything from <cfenv>), the compiler should be
informed (by using #pragma STDC FENV_ACCESS ON; see the sketch after
this list). Failing to do so lets the optimizer make wrong assumptions.

_ BTW, I found some sources which say the above is actually in the C++
standard, some say it's in the C standard (possibly inherited by C++ or
not), and some say it's an extension a compiler might support. I don't
have access to the C++ standard.

_ In any case, Clang does not support that #pragma, so right now there's
no way to get FP exceptions to play nicely with optimizations.
There's work going on, but no estimate of when it will be released.
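
A minimal sketch of the standard-only part of this (the pragma is the piece
that is implementation-defined and, per the above, not honored by Clang at
the time of writing; the volatile is just there to keep the multiply at run
time):

#include <cfenv>
#include <cstdio>

// Tell the compiler the program will inspect/modify the FP environment.
#pragma STDC FENV_ACCESS ON

int main() {
    volatile double big = 1.0e308;
    std::feclearexcept(FE_ALL_EXCEPT);
    double x = big * 10.0;                    // overflows to +inf, raises FE_OVERFLOW
    if (std::fetestexcept(FE_OVERFLOW))
        std::puts("FE_OVERFLOW was raised");
    return x > 0.0 ? 0 : 1;
}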



bye
av.

Paavo Helde

Apr 30, 2018, 4:10:10 AM
That's not about my compiler. I need the code to be compiled by
different compilers and we do not have time or resources to test them
all, especially those which have not yet been written.

> If you think it's important to run sanity checks to be sure the
> compiler doesn't have bugs, by all means do so.

Thanks, our software has a huge suite of automatic unit and integration
tests. With their help we recently located and eliminated a randomly
flipping bit in the physical memory of the testing server which the
memory diagnostic tools failed to diagnose.

>
> But don't give in to superstitious programming practices. Insist
> on solid understanding and a rational decision process, not murky
> justifications based on uncertainty and fear. Anyone promoting
> voodoo programming principles should be encouraged to change
> occupations from developer to witchdoctor.

What one man calls superstitious hunch, another man calls experience. I
would not have written a direct comparison with
std::numeric_limits<double>::max() because I have had some experience
with compiler/optimizer bugs and where the murky corners are. As it came
out else-thread my suspicions were justified, the problem indeed appears
to be a bug in the compiler, triggered indeed by the presence of
std::numeric_limits<double>::max() in the code (albeit the bug was a
different and more interesting one from what I had imagined).

I get paid for writing software that works as reliably as possible in the
real world. This has a lot to do with anticipating and avoiding or
working around any bugs or problems in the
standards/OS-es/compilers/toolchains/third-party libraries, etc.

Tim Rentsch

Apr 30, 2018, 7:42:55 AM
Andrea Venturoli <ml.die...@netfence.it> writes:

> On 04/27/18 08:44, Andrea Venturoli wrote:
>
> [trimmed and edited lightly]
>
> My code was deemed as problematic because, in order to enable
> exceptions (or use anything from <cenv>), the compiler should be
> informed (by using #pragma STDC FENV_ACCESS on). Failure to do
> this will let the optimizer take wrong assumptions.
>
> I found some sources which say the above actually is C++ standard,
> some say it's C standard (possibly inherited by C++ or not), some
> say it's an extension a compiler might support.

Support is required in ISO C, implementation-defined in C++.

> I don't have access to C++ standard.

Get free draft here:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf

James R. Kuyper

Apr 30, 2018, 8:58:33 AM
On 04/30/2018 01:57 AM, Juha Nieminen wrote:
...
> Either print the raw byte values of the floating point variable (eg.
> in binary format), or use std::printf with the format specifier "%a"
> which prints it in hexadecimal. (This is a bit-by-bit accurate
> representation because converting from base-2 to base-16 can be
> done losslessly, without need for rounding.)
>
> There might have been an equivalent to "%a" for std::ostream, but
> I don't remember now if there was.

Starting with C++2011, if str.flags() has both ios_base::fixed and
ios_base::scientific set at the same time, that's equivalent to %a or
%A, depending upon whether ios_base::uppercase is also set.
(24.4.2.2.2p5 - Table 76).
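
In other words, a small sketch using the C++11 manipulator that sets exactly
those flags:

#include <iostream>
#include <limits>

int main() {
    double a = std::numeric_limits<double>::max();
    std::cout << std::hexfloat << a << '\n';   // fixed|scientific, i.e. the %a form
    std::cout << std::defaultfloat;            // restore the default notation
}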

James R. Kuyper

Apr 30, 2018, 9:29:27 AM
On 04/29/2018 04:12 PM, Paavo Helde wrote:
...
> Comparing floating-point numbers for exact equality is always fragile,
> there is always a chance that somebody adds some computation like divide
> by 10, multiply by 10, ruining the results. A deeply "floating-point"
> value like std::numeric_limits<double>::max() is doubly suspect just
> because it is far away from normal and well-tested range of values.
>
> I just checked how it is defined in MSVC. It appears the value is
> defined by the macro
>
> #define DBL_MAX 1.7976931348623158e+308
>
> From here you can clearly see there might be problems. This constant is
> specified in decimal and I believe there is a fair chance this number
> does not have an exact representation in binary.

True. But keep in mind that this definition is intended to be used only
by a particular implementation of C++ - one that provides the <limits>
and <cfloat> headers that you're looking at. The implementor has a
responsibility for making sure that this particular floating point
constant will be converted BY THAT IMPLEMENTATION to the particular
floating point value which represents the maximum possible double value.
It can be proven, following the rules governing the interpretation of
floating point constants, that there do exist constants for which that
would be true. I would expect the implementor to choose the shortest
such constant.
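
A related property that is easy to check (a small sketch; it relies on the
implementation's conversions being correctly rounded): printing a double with
max_digits10 significant digits and reading it back recovers the same value,
max() included.

#include <iomanip>
#include <iostream>
#include <limits>
#include <sstream>

int main() {
    double a = std::numeric_limits<double>::max();
    std::ostringstream out;
    out << std::setprecision(std::numeric_limits<double>::max_digits10) << a;
    double b;
    std::istringstream in(out.str());
    in >> b;
    std::cout << (a == b) << '\n';   // 1: the text round-trips exactly
}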

This is complicated by the fact that some implementations of C++
(including, for some reason, some of the most popular ones on Windows)
consist of a compiler created by one vendor, combined with a C++
standard library created by a different vendor. However, if the compiler
and the library don't work together to interpret DBL_MAX correctly, then
that combination of compiler and library does not constitute a fully
conforming implementation of C++. Neither vendor should endorse using
their products together unless they've made sure that they do, together,
qualify as fully conforming (at least, when the appropriate options are
chosen). You shouldn't use them together unless at least one of the two
vendors has endorsed using them together.

If you don't have good reason to believe that an implementor has
bothered checking whether their implementation actually conforms (at
least, when you've chosen the options that are supposed to make it
conform), then whether or not DBL_MAX is exactly the maximum finite
value for a double is going to be the least of your problems.

James R. Kuyper

Apr 30, 2018, 10:02:52 AM
On 04/30/2018 04:05 AM, Andrea Venturoli wrote:
...
> _ My code was deemed as problematic because, in order to enable
> exceptions (or use anything from <cenv>), the compiler should be
> informed (by using #pragma STDC FENV_ACCESS on). Failure to do this will
> let the optimizer take wrong assumptions.
>
> _ BTW, I found some sources which say the above actually is C++
> standard, some say it's C standard (possibly inherited by C++ or not),

The STDC FENV_ACCESS pragma is defined by the C standard. The entire C
standard library was incorporated by reference into the C++ standard,
with precisely specified modifications, but for the rest of the C
language, it's incorporated into the C++ standard only if the C++
standard explicitly says so. What it says about this pragma is that
support for it is implementation-defined.

> some say it's an extension a compiler might support. ...

"Implementation-defined" is a marginally stronger requirement than
"supportable extension": an implementation's documentation is required
to specify whether or not it's supported, and the standard specifies
what it means if supported.

> ... I don't have access
> to C++ standard.

The final draft of C++2017 is almost identical to the final approved
standard, and is a LOT cheaper:
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/n4659.pdf>

Manfred

Apr 30, 2018, 10:17:15 AM
On 4/28/2018 2:35 PM, Andrea Venturoli wrote:
> On 04/28/18 11:42, Tim Rentsch wrote:
[...]
>
> Here is my stripped down version (just the main() function):
>
> int
> main( int /*argc*/ , char** /*argv*/ ){
> double A = f( .0001002773902563, 1. );
> bool equals = A == std::numeric_limits<double>::max();
>
> std::cout << equals << std::endl;
>
> feenableexcept( FE_OVERFLOW );
> std::cout << (equals ? A : A*2) << std::endl;
>
> return 0;
> }
>
> Prints '1' and value of A on -O0.
> Prints '1' and then fails on -O1.
[...]
>
>> Most likely what has happened is that clang decided to calculate
>> both 'A' and 'A*2', and then use a conditional move (which in
>> your program would assign to 'B') to choose the appropriate
>> value.  Unfortunately, calculating 'A*2' causes a floating
>> exception before the conditional move can take effect.
>
> That's my hypotesis too.
>
>
>
>>> Is this just my opinion or a bug worth reporting?

This is definitely a bug worth reporting:
n4659 sec. 8.16 (Conditional operator) p.1:

"... The first expression is contextually converted to bool (Clause 7).
It is evaluated and if it is true, the result of the conditional
expression is the value of the second expression,
otherwise that of the third expression. Only one of the second and third
expressions is evaluated. ..."

The relevant part is that "Only one of the second and third expressions
is evaluated.", so A*2 should /not/ be evaluated.
clang is definitely wrong here.

Manfred

Apr 30, 2018, 4:32:20 PM
On 4/30/2018 7:41 AM, Tim Rentsch wrote:
> But don't give in to superstitious programming practices. Insist
> on solid understanding and a rational decision process, not murky
> justifications based on uncertainty and fear. Anyone promoting
> voodoo programming principles should be encouraged to change
> occupations from developer to witchdoctor.

The fact that comparing floating point values with == is inherently
brittle is a *fact*, and there is nothing superstitious about it.

Floating point programming has its own peculiarities; knowing that ==
comparisons are to be avoided is one of them.

bol...@cylonhq.com

May 1, 2018, 4:49:23 AM
On 30 Apr 2018 22:15:08 GMT
r...@zedat.fu-berlin.de (Stefan Ram) wrote:
>Manfred <non...@invalid.add> writes:
>>The fact that comparing floating point values with == is inherently
>>brittle is a *fact*, and there is nothing superstitious with it.
>
> The following program will print whether under the
> implementation used »0.1 + 0.2« is equal to »0.3«.
>
> main.cpp
>
>#include <iostream>
>#include <ostream>
>
>int main()
>{ ::std::cout <<( 0.1 + 0.2 == 0.3 )<< '\n'; }
>
> transcript
>
>0
>
> The value representation of floating-point types is
> implementation-defined in C++, so the behavior of the above
> program might be implementation-defined (if this is what you
> call "brittle"). But the program will /reliably/ report,
> whether under the implementation used »0.1 + 0.2« is »0.3«.
> So, »==« does exactly what its job is.

I can't help thinking that perhaps the compiler and/or FPU designers should
make an effort to make this work. After all, if a computer can't reliably
compare numbers then that rather goes against its whole raison d'etre. Yes
I know there are ways around it but frankly one shouldn't have to bugger about
doing conversions just to carry out such a basic operation.

Paavo Helde

May 1, 2018, 6:57:23 AM
On 1.05.2018 11:49, bol...@cylonHQ.com wrote:
> On 30 Apr 2018 22:15:08 GMT
> r...@zedat.fu-berlin.de (Stefan Ram) wrote:
>> Manfred <non...@invalid.add> writes:
>>> The fact that comparing floating point values with == is inherently
>>> brittle is a *fact*, and there is nothing superstitious with it.
>>
>> The following program will print whether under the
>> implementation used »0.1 + 0.2« is equal to »0.3«.
>>
>> main.cpp
>>
>> #include <iostream>
>> #include <ostream>
>>
>> int main()
>> { ::std::cout <<( 0.1 + 0.2 == 0.3 )<< '\n'; }
>>
>> transcript
>>
>> 0
>>
>> The value representation of floating-point types is
>> implementation-defined in C++, so the behavior of the above
>> program might be implementation-defined (if this is what you
>> call "brittle"). But the program will /reliably/ report,
>> whether under the implementation used »0.1 + 0.2« is »0.3«.
>> So, »==« does exactly what it's job is.
>
> I can't help thinking that perhaps the compiler and/or FPU designers should
> make an effort to make this work.

It is simple to make 0.1+0.2 "work": one just needs to design and
build base 10 floating-point hardware, instead of base 2. It would
probably be several times more expensive than the current hardware.
Considering that one would not get rid of round-off errors on this
hardware either, it seems like a massive overkill which does not
actually solve anything.

There is nothing inherently valuable in decimal representation of
numbers. The idea of building hardware which would guarantee limited exact
arithmetic in the case determined by the number of our fingers seems a
bit preposterous to me.

Robert Wessel

May 1, 2018, 11:50:36 AM
On Tue, 01 May 2018 13:57:09 +0300, Paavo Helde
<myfir...@osa.pri.ee> wrote:

>On 1.05.2018 11:49, bol...@cylonHQ.com wrote:
>> On 30 Apr 2018 22:15:08 GMT
>> r...@zedat.fu-berlin.de (Stefan Ram) wrote:
>>> Manfred <non...@invalid.add> writes:
>>>> The fact that comparing floating point values with == is inherently
>>>> brittle is a *fact*, and there is nothing superstitious with it.
>>>
>>> The following program will print whether under the
>>> implementation used »0.1 + 0.2« is equal to »0.3«.
>>>
>>> main.cpp
>>>
>>> #include <iostream>
>>> #include <ostream>
>>>
>>> int main()
>>> { ::std::cout <<( 0.1 + 0.2 == 0.3 )<< '\n'; }
>>>
>>> transcript
>>>
>>> 0
>>>
>>> The value representation of floating-point types is
>>> implementation-defined in C++, so the behavior of the above
>>> program might be implementation-defined (if this is what you
>>> call "brittle"). But the program will /reliably/ report,
>>> whether under the implementation used »0.1 + 0.2« is »0.3«.
>>> So, »==« does exactly what its job is.
>>
>> I can't help thinking that perhaps the compiler and/or FPU designers should
>> make an effort to make this work.
>
>It is simple to make 0.1+0.2 to "work", one just needs to design and
>build base 10 floating-point hardware, instead of base 2. It would
>probably be several times more expensive than the current hardware.
>Considering that one would not get rid of round-off errors on this
>hardware either, it seems like a massive overkill which does not
>actually solve anything.


It seems unlikely that adding DFP would be a huge cost driver. IBM
and Fujitsu have added DFP to Z, POWER and SPARC, with minimal impact
on core size - a few percent at worst. I'm more familiar with IBM's
hardware, and they seem to have done several iterations where single
FPUs handle both IEEE binary and decimal FP, and on some versions of
Z, HFP as well.

So if Intel saw a purpose, I'm sure they could add it. And I suspect
it would be a lot less costly than something like TSX or AVX-512.


>There is nothing inherently valuable in decimal representation of
>numbers. The idea to build hardware which would guarantee limited exact
>arithmetics in the case determined by the number of our fingers seems a
>bit preposterous to me.


I mostly agree. There *are* places where the decimal nature of some
calculations needs to be considered, notably those involving currency,
but those can be done in binary, provided reasonable support exists
for conversion between decimal formats and binary, and for decimal
scaling. And currency calculations are usually served better by fixed
point formats anyway.
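
A trivial sketch of the fixed-point idea for currency (amounts held as
integer cents, so the "0.1 + 0.2" problem disappears):

#include <cstdint>
#include <iostream>

int main() {
    std::int64_t a = 10, b = 20, c = 30;   // 0.10, 0.20, 0.30 dollars, in cents
    std::cout << (a + b == c) << '\n';     // 1: exact, no binary fractions involved
}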

bol...@cylonhq.com

May 1, 2018, 11:57:20 AM
On Tue, 01 May 2018 13:57:09 +0300
Why would you even need to do that? I don't know much about the maths behind
floating point operations, but if the floating point was simply stored as 2
integers - 1 for the integer part, 1 for the fractional part - then any
floating point value up to the limits of each integer could be stored exactly
rather than the current system of using a mantissa and exponent.


Paavo Helde

May 1, 2018, 1:12:26 PM
On 1.05.2018 18:57, bol...@cylonHQ.com wrote:
> On Tue, 01 May 2018 13:57:09 +0300
> Paavo Helde <myfir...@osa.pri.ee> wrote:
>> On 1.05.2018 11:49, bol...@cylonHQ.com wrote:
>>> On 30 Apr 2018 22:15:08 GMT
>>> r...@zedat.fu-berlin.de (Stefan Ram) wrote:
>>>> Manfred <non...@invalid.add> writes:
>>>>> The fact that comparing floating point values with == is inherently
>>>>> brittle is a *fact*, and there is nothing superstitious with it.
>>>>
>>>> The following program will print whether under the
>>>> implementation used »0.1 + 0.2« is equal to »0.3«.
>>>>
>>>> main.cpp
>>>>
>>>> #include <iostream>
>>>> #include <ostream>
>>>>
>>>> int main()
>>>> { ::std::cout <<( 0.1 + 0.2 == 0.3 )<< '\n'; }
>>>>
>>>> transcript
>>>>
>>>> 0
>>>>
>>>> The value representation of floating-point types is
>>>> implementation-defined in C++, so the behavior of the above
>>>> program might be implementation-defined (if this is what you
>>>> call "brittle"). But the program will /reliably/ report,
>>>> whether under the implementation used »0.1 + 0.2« is »0.3«.
>>>> So, »==« does exactly what it's job is.
>>>
>>> I can't help thinking that perhaps the compiler and/or FPU designers should
>>> make an effort to make this work.
>>
>> It is simple to make 0.1+0.2 to "work", one just needs to design and
>> build base 10 floating-point hardware, instead of base 2. It would
>
> Why would you even need to do that? I don't know much about the maths behind
> floating point operations, but if the floating point was simply stored as 2
> integers - 1 for the integer part, 1 for the fractional part - then any
> floating point value up to the limits of each integer could be stored exactly
> rather than the current system of using a mantissa and exponent.

It could, but then the dynamic range of the values would be limited to
e.g. from 1/2^32 to 2^32 (i.e. 2E-10 .. 4E+9), which is pretty narrow
compared to current double range 2E-308 .. 2E+308 (assuming 2x32-bit
integers here for a fair comparison with a 64-bit double). A lot of
actually needed calculations involve things like gigahertzes or
picometers which would not be easily representable/convertible in that
system.
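
A toy illustration of that trade-off, assuming the unsigned 32.32 layout
described above packed into one 64-bit integer (the Fix32_32 struct is
made up purely for the demonstration):

#include <cstdint>
#include <iostream>
#include <limits>

// Hypothetical unsigned 32.32 fixed-point: value = raw / 2^32.
struct Fix32_32 {
    std::uint64_t raw;
    double to_double() const { return raw / 4294967296.0; }
};

int main() {
    Fix32_32 smallest{1};                                          // 1/2^32
    Fix32_32 largest{std::numeric_limits<std::uint64_t>::max()};   // just under 2^32
    std::cout << "smallest step: " << smallest.to_double() << "\n"   // ~2.3e-10
              << "largest value: " << largest.to_double()  << "\n"   // ~4.3e9
              << "double range:  " << std::numeric_limits<double>::min()
              << " .. " << std::numeric_limits<double>::max() << "\n";
}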

OTOH, there are rational number libraries out there doing such things.
Hardware support would be probably faster, but I believe no one has
bothered to implement it. Things like pi or sin(1) would still be
represented inexactly, as well as seemingly innocent things like
100000/100001 + 100001/100000.

There is also
"https://en.wikipedia.org/wiki/Arbitrary-precision_arithmetic" but this
is ultimately software-based as one cannot (does not want to?) make
CPU/FPU registers very large.

james...@alumni.caltech.edu

unread,
May 1, 2018, 1:19:29 PM5/1/18
to
On Tuesday, May 1, 2018 at 4:49:23 AM UTC-4, bol...@cylonhq.com wrote:
> On 30 Apr 2018 22:15:08 GMT
> r...@zedat.fu-berlin.de (Stefan Ram) wrote:
> >Manfred <non...@invalid.add> writes:
> >>The fact that comparing floating point values with == is inherently
> >>brittle is a *fact*, and there is nothing superstitious with it.
> >
> > The following program will print whether under the
> > implementation used »0.1 + 0.2« is equal to »0.3«.
> >
> > main.cpp
> >
> >#include <iostream>
> >#include <ostream>
> >
> >int main()
> >{ ::std::cout <<( 0.1 + 0.2 == 0.3 )<< '\n'; }
> >
> > transcript
> >
> >0
> >
> > The value representation of floating-point types is
> > implementation-defined in C++, so the behavior of the above
> > program might be implementation-defined (if this is what you
> > call "brittle"). But the program will /reliably/ report,
> > whether under the implementation used »0.1 + 0.2« is »0.3«.
> > So, »==« does exactly what it's job is.
>
> I can't help thinking that perhaps the compiler and/or FPU designers should
> make an effort to make this work. After all, if a computer can't reliably
> compare numbers then that rather goes against its whole raison d'etre.

Unreliable comparisons are the subject of this thread, but a conforming
implementation of C++ is not allowed to implement comparisons
unreliably. The key problem isn't unreliable comparisons, but the fact
that any type with a finite size can only represent exactly a finite set
of values. And between any two different consecutive representable
values lie a continuous infinity of values that could be the
mathematically correct result of calculations involving representable
values. How would you have them deal with that fact, so as to avoid this
problem?
IEEE 754 requires that most operations produce the same result as if
calculated with infinite precision, and then rounded to the nearest
representable value. I don't see how you could request more than that,
and even with that specification you still have comparisons such as
these failing: the representable number that is closest to 0.1, when
added to the representable number that is closest to 0.2, has a
mathematically exact result that is not necessarily closest to the same
representable value that 0.3 is closest to.
By using fixed-point math or decimal floating point, you can make
certain values representable, with the result that particular
comparisons become exact. However, all you're doing by that approach is
rearranging the deck chairs on the Titanic. There will always be
infinitely times as many unrepresentable values as representable ones,
and that fact will always result in some comparisons failing that
should, mathematically, have been true (and vice versa).
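
For what it's worth, the mismatch is easy to make visible just by printing
enough digits; a minimal sketch, assuming IEEE 754 doubles:

#include <iomanip>
#include <iostream>

int main() {
    double sum = 0.1 + 0.2;
    double lit = 0.3;
    std::cout << std::setprecision(17)
              << "0.1 + 0.2 -> " << sum << "\n"           // 0.30000000000000004
              << "0.3       -> " << lit << "\n"           // 0.29999999999999999
              << "equal?       " << (sum == lit) << "\n"; // 0
}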

bol...@cylonhq.com

unread,
May 2, 2018, 7:25:16 AM5/2/18
to
On Tue, 01 May 2018 20:12:14 +0300
Paavo Helde <myfir...@osa.pri.ee> wrote:
>On 1.05.2018 18:57, bol...@cylonHQ.com wrote:
>> Why would you even need to do that? I don't know much about the maths behind
>> floating point operations, but if the floating point was simply stored as 2
>> integers - 1 for the integer part, 1 for the fractional part - then any
>> floating point value up to the limits of each integer could be stored exactly
>> rather than the current system of using a mantissa and exponent.
>
>It could, but then the dynamic range of the values would be limited to
>e.g. from 1/2^32 to 2^32 (i.e. 2E-10 .. 4E+9), which is pretty narrow
>compared to current double range 2E-308 .. 2E+308 (assuming 2x32-bit
>integers here for a fair comparison with a 64-bit double). A lot of

The current range might be large, but it has huge gaps in it where the
number simply can't be represented accurately, and towards the top and
bottom of the range the current system is virtually useless.

>actually needed calculations involve things like gigahertzes or
>picometers which would not be easily representable/convertible in that
>system.

Probably not, OTOH how often do such small numbers get used in most
software? The GNU GMP library handles arbitrarily small values already and
if you really want floating point accuracy you'd use that anyway.

>bothered to implement it. Things like pi or sin(1) would still be
>represented inexactly, as well as seemingly innocent things like
>100000/100001 + 100001/100000.

Pi can't be represented accurately on any machine without an infinite
register size anyway. And plenty of division operations will exceed current
FPU limits. However I think trading apparent range for mathematical
accuracy in the CPU would be worth it.

bol...@cylonhq.com

unread,
May 2, 2018, 7:28:29 AM5/2/18
to
On Tue, 1 May 2018 10:19:17 -0700 (PDT)
james...@alumni.caltech.edu wrote:
>By using fixed-point math or decimal floating point, you can make
>certain values representable, with the result that particular
>comparisons become exact. However, all you're doing by that approach is
>rearranging the deck chairs on the Titanic. There will always be
>infinitely times as many unrepresentable values as representable ones,
>and that fact will always result in some comparisons failing that
>should, mathematically, have been true (and vice versa).

Yes, but at least with fixed point you know what the limits are and that
within those you will always get accurate results. With the current system
you never know whether a == comparison will work or not which means that you
can never use it if you want consistent code behaviour.

james...@alumni.caltech.edu

unread,
May 2, 2018, 9:22:42 AM5/2/18
to
On Wednesday, May 2, 2018 at 7:25:16 AM UTC-4, bol...@cylonhq.com wrote:
> On Tue, 01 May 2018 20:12:14 +0300
> Paavo Helde <myfir...@osa.pri.ee> wrote:
> >On 1.05.2018 18:57, bol...@cylonHQ.com wrote:
> >> Why would you even need to do that? I don't know much about the maths behind
> >> floating point operations, but if the floating point was simply stored as 2
> >> integers - 1 for the integer part, 1 for the fractional part - then any
> >> floating point value up to the limits of each integer could be stored exactly
> >> rather than the current system of using a mantissa and exponent.
> >
> >It could, but then the dynamic range of the values would be limited to
> >e.g. from 1/2^32 to 2^32 (i.e. 2E-10 .. 4E+9), which is pretty narrow
> >compared to current double range 2E-308 .. 2E+308 (assuming 2x32-bit
> >integers here for a fair comparison with a 64-bit double). A lot of
>
> The current range might be large, but it has huge gaps in it where the
> number simply can't be represented accurately

Huge gaps? Throughout the entire range between DBL_MIN and DBL_MAX, the
gaps bracketing x are never larger than x*DBL_EPSILON, and can be
smaller than that by a factor as large as FLT_RADIX. DBL_EPSILON is
defined in <cfloat>, with specifications incorporated by reference from
the C standard (C++ 21.3.6p1), which requires that DBL_EPSILON be no
larger than 1E-9 (C 5.2.4.2.2p13), and if
std::numeric_limits<double>::is_iec559 is true, then DBL_EPSILON would be
2.2204460492503131E-16. Those are pretty small gaps, as far as I'm
concerned. The fact that they get larger for larger values of x matches
the way numbers are typically used in real life: less absolute precision
is needed when working with large numbers than with small ones; the
relative precision needed tends to be roughly constant over the entire
range of representable numbers.
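
Those gaps are easy to inspect directly with std::nextafter; a small
sketch, assuming IEEE 754 doubles:

#include <cmath>
#include <iostream>

int main() {
    const double xs[] = {1.0, 1.0e6, 1.0e12};
    for (double x : xs) {
        double gap = std::nextafter(x, 2.0 * x) - x;   // distance to the next representable value above x
        std::cout << "x = " << x << "   gap = " << gap
                  << "   gap/x = " << gap / x << "\n";  // relative gap stays near DBL_EPSILON
    }
}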

> ... and towards the top and
> bottom of the range the current system is virtually useless.

Yes - and with floating point representations, unlike the one you're
proposing, the top and bottom of the range are well outside the range of
normal use. The top and bottom of the range of your system both fall
well within the range of many ordinary scientific and engineering
calculations - it doesn't have enough precision to handle calculations
with very small numbers well, and it overflows too easily for
calculations involving very large numbers.

> >actually needed calculations involve things like gigahertzes or
> >picometers which would not be easily representable/convertible in that
> >system.
>
> Probably not, OTOH how often do such small numbers get used in most
> software? The GNU GMP library handles arbitrarily small values already and
> if you really want floating point accuracy you'd use that anyway.

No, I wouldn't - floating point is much faster than GMP, and can handle
such cases with accuracy that is more than sufficient for typical uses
of such numbers.

> >bothered to implement it. Things like pi or sin(1) would still be
> >represented inexactly, as well as seemingly innocent things like
> >100000/100001 + 100001/100000.
>
> Pi can't be represented accurately on any machine without an infinite
> register size anyway. And plenty of division operations will exceed current
> FPU limits. However I think trading apparent range for mathematical
> accuracy in the CPU would be worth it.

The fact that you hold that belief suggests that you don't do number-
crunching for a living. Those who do tend to have a strong preference
for constant relative precision over a large range, rather than a
constant absolute precision over a very limited range. That's the reason
why floating point representations are popular.

james...@alumni.caltech.edu

unread,
May 2, 2018, 9:44:30 AM5/2/18
to
> you never know whether a == comparison will work or not ...

You might not know - but I do, and so do most people who do serious
number crunching for a living. The answer, of course, is that it almost
never makes sense to compare floating point values for exact equality,
for reasons that have more to do with the finite precision of
measurements than it does with the finite precision of floating point
representations. The main exceptions are flag values (with the case
under discussion in this thread being a prime example).
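
A sketch of that flag-value exception, assuming the flag is only ever
assigned and compared, never produced by arithmetic (the names are made
up for illustration):

#include <iostream>
#include <limits>

// max() used as a "not set" sentinel; an exact == test is fine here
// because the value is copied around, never computed.
constexpr double kUnset = std::numeric_limits<double>::max();

double read_setting(bool available) {
    return available ? 42.5 : kUnset;
}

int main() {
    double v = read_setting(false);
    if (v == kUnset)
        std::cout << "setting not provided\n";
}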

Fixed point and decimal floating point can make a lot of sense when most
of the numbers you're working with can be represented exactly as decimal
fractions, and have a small fixed maximum number of digits after the
decimal point - a situation that applies to many financial calculations.

However, fixed point has too limited a range, and too little precision
when working with small numbers, to be useful in most contexts where
you're doing calculations based upon measurements of physical quantities,
as is commonplace in scientific and engineering applications. Decimal
floating point has no advantage over binary floating point in such
contexts, and will generally make (marginally) less efficient use of
memory and/or time than binary floating point.

Paavo Helde

unread,
May 2, 2018, 3:01:28 PM5/2/18
to
Ah, good to see you are starting to understand the matters. Make that
last sentence "you can almost never use it" and I believe most people
would agree.

You know, there are a lot of things in C++ which one should almost never
use, starting from trigraphs and strtok() and up to std::list and
multiple inheritance. Adding a floating-point == comparison to this list
is no big deal.


Manfred

unread,
May 2, 2018, 5:53:35 PM5/2/18
to
On 5/1/2018 10:49 AM, bol...@cylonHQ.com wrote:
> On 30 Apr 2018 22:15:08 GMT
> r...@zedat.fu-berlin.de (Stefan Ram) wrote:
>> Manfred <non...@invalid.add> writes:
>>> The fact that comparing floating point values with == is inherently
>>> brittle is a *fact*, and there is nothing superstitious with it.
>>
>> The following program will print whether under the
>> implementation used »0.1 + 0.2« is equal to »0.3«.
>>
>> main.cpp
>>
>> #include <iostream>
>> #include <ostream>
>>
>> int main()
>> { ::std::cout <<( 0.1 + 0.2 == 0.3 )<< '\n'; }
>>
>> transcript
>>
>> 0
>>
[...]
>
> I can't help thinking that perhaps the compiler and/or FPU designers should
> make an effort to make this work. After all, if a computer can't reliably
> compare numbers then that rather goes against its whole raison d'etre. Yes
> I know there are ways around it but frankly one shouldn't have to bugger about
> doing conversions just to carry out such a basic operation.
>

This is obvious for integer arithmetic, but for floating point math it is a
totally different matter.

This behavior is the direct consequence of the finiteness of binary
number representation, combined with floating point technology, which is
one of the major features of computers.
Besides, this is in fact a non-problem in the main application field
where floating point math is required, which is scientific/engineering
computing where FP numbers typically represent physical quantities.
Another major computing application field is finance, but, as someone
else correctly pointed out, currencies are better handled in fixed point
anyway (and yet elsethread someone reported that IBM saw a business
opportunity in decimal number representation).

In physics and engineering the following makes no sense:
if (this_apple weighs /exactly/ 0.2kg) then { do something; }

This is to say that floating point arithmetic is not designed to yield
/exact/ results, but this is not a problem for the application domains
for which it is targeted.
A consequence of such approximate computing is that FP math requires
some specific skills.

From the gcc manpage:
-Wfloat-equal
Warn if floating-point values are used in equality
comparisons.

The idea behind this is that sometimes it is convenient (for
the programmer) to consider floating-point values as
approximations to infinitely precise real numbers. If you
are doing this, then you need to compute (by analyzing the
code, or in some other way) the maximum or likely maximum
error that the computation introduces, and allow for it when
performing comparisons (and when producing output, but that's
a different problem). In particular, instead of testing for
equality, you should check to see whether the two values have
ranges that overlap; and this is done with the relational
operators, so equality comparisons are probably mistaken.
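
A sketch of what such an overlap test might look like; the nearly_equal
helper and its default tolerances are invented for illustration and would
have to come from the error analysis the manpage describes:

#include <algorithm>
#include <cmath>
#include <iostream>

// Treat a and b as equal if they differ by no more than abs_tol, or by
// no more than rel_tol of the larger magnitude.
bool nearly_equal(double a, double b,
                  double rel_tol = 1e-12, double abs_tol = 1e-15)
{
    return std::fabs(a - b) <=
           std::max(abs_tol, rel_tol * std::max(std::fabs(a), std::fabs(b)));
}

int main() {
    std::cout << (0.1 + 0.2 == 0.3) << "\n";            // 0 on typical IEEE 754 implementations
    std::cout << nearly_equal(0.1 + 0.2, 0.3) << "\n";  // 1
}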



bol...@cylonhq.com

unread,
May 3, 2018, 4:51:30 AM5/3/18
to
On Wed, 2 May 2018 06:22:29 -0700 (PDT)
james...@alumni.caltech.edu wrote:
>Yes - and with floating point representations, unlike the one you're
>proposing, the top and bottom of the range are well outside the range of
>normal use. The top and bottom of the range of your system both fall
>well within the range of many ordinary scientific and engineering
>calculations - it doesn't have enough precision to handle calculations

Do stop going on about scientific use FFS. Thats probably a fraction of a
percent of the applications C/C++ gets used for and anyone serious about coding
in those arenas would use matlab or fortran anyway with python for odd jobs.
As for engineering - it doesn't generally require values down to 20 decimal
places, however it DOES require accurate comparisons.

FWIW I worked in fintech for years, and having to use integers instead of
floating point values in order to be able to do accurate comparisons (rather
important with monetary values don't you think?) was a PITA and I suspect the
amount of C/C++ code running in fintech is an order of magnitude more than for
science and engineering.

>with very small numbers well, and it overflows too easily for
>calculations involving very large numbers.

If 64 bits isn't enough to represent your fractional values then perhaps you
need to take a look at the problem again.

>> Pi can't be represented accurately on any machine without an infinite
>> register size anyway. And plenty of division operations will exceed current
>> FPU limits. However I think trading apparent range for mathematical
>> accuracy in the CPU would be worth it.
>
>The fact that you hold that belief suggests that you don't do number-
>crunching for a living. Those who do tend to have a strong preference

Well that depends on what number crunching doesn't it. You think banks don't
so much of it? You think they use floats?

>constant absolute precision over a very limited range. That's the reason
>why floating point representations are popular.

No, its because the current system was invented by scientists and is now the
standard regardless of its suitability for other areas.

james...@alumni.caltech.edu

unread,
May 3, 2018, 3:46:46 PM5/3/18
to
On Thursday, May 3, 2018 at 4:51:30 AM UTC-4, bol...@cylonhq.com wrote:
> On Wed, 2 May 2018 06:22:29 -0700 (PDT)
> james...@alumni.caltech.edu wrote:
> >Yes - and with floating point representations, unlike the one you're
> >proposing, the top and bottom of the range are well outside the range of
> >normal use. The top and bottom of the range of your system both fall
> >well within the range of many ordinary scientific and engineering
> >calculations - it doesn't have enough precision to handle calculations
>
> Do stop going on about scientific use FFS. Thats probably a fraction of a
> percent of the applications C/C++ gets used for and anyone serious about coding
> in those arenas would use matlab or fortran anyway with python for odd jobs.

Most of the programming I've been doing since 1980 has been scientific
programming, and most of it has been done in C. I'm very well aware that
there's a lot of people out there doing programming that's very
different from the kind that I do. Are you properly aware of the fact
that there's a lot of people out there doing programming that is very
different from the kind you do?

> As for engineering - it doesn't generally require values down to 20 decimal

It's hard to represent really small quantities with acceptable accuracy
using only 20 decimal places - for any number less than 1e-20 (and
engineering often involves numbers that small or smaller), it's
impossible. What does matter is the number of significant digits.
Double precision only has about 14 significant digits. While
calculations requiring all of those significant digits are rare,
calculations for which single precision is inadequate are pretty common.
Matrix operations, in particular, involve so many multiplies and adds
that single precision roundoff errors would make the results unusable
for even relatively small matrices, such as 10x10.
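
Not a matrix example, but a crude illustration of how fast single-precision
round-off accumulates compared to double (exact output varies by platform):

#include <iostream>

int main() {
    float  fsum = 0.0f;
    double dsum = 0.0;
    for (int i = 0; i < 10000000; ++i) {
        fsum += 0.1f;   // each add rounds to roughly 7 significant decimal digits
        dsum += 0.1;
    }
    std::cout << "float  sum: " << fsum << "\n"   // visibly far from 1e6
              << "double sum: " << dsum << "\n";  // close to 1e6 to about 10 digits
}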

> places, however it DOES require accurate comparisons.

Actually, it usually doesn't require accurate comparisons, not in the
sense of exact floating point equality comparisons. A large fraction of
all engineering calculations are based upon measurements with a finite
accuracy. You might have two different calculated quantities that
should, theoretically, be equal - but if they were derived from
different sets of measurements, you'll generally find that they don't
compare exactly equal, and no experienced engineer would write code that
depended upon them comparing exactly.

> FWIW I worked in fintech for years, and having to use integers instead of
> floating point values in order to be able to do accurate comparisons (rather
> important with monetary values don't you think?)

Did you miss the paragraph where I acknowledged the appropriateness of
using fixed-point or decimal floating point in financial context? What
you were doing with integers is essentially equivalent to fixed-point
math, but more complicated than would be possible with language-level
support for fixed-point math.

> ... was a PITA and I suspect the
> amount of C/C++ code running in fintech is an order of magnitude more than for
> science and engineering.

I have no idea how to locate solid data that would either support or
contradict that claim. Do you? However, I suspect that it reflects a
lack of familiarity with the scientific and engineering communities on
your part. Some of the biggest and most powerful machines in the world
crunch numbers 24 hours a day to perform tasks like weather prediction
and quantum field theory calculations, and a lot of that code is written
in C nowadays.

> >with very small numbers well, and it overflows too easily for
> >calculations involving very large numbers.
>
> If 64 bits isn't enough to represent your fractional values then perhaps you
> need to take a look at the problem again.

A 64 bit floating point type has all the precision I need for most of
the work I do. However, a 64 bit fixed point format like the one you
advocated wouldn't even come close to being adequate.

> >> Pi can't be represented accurately on any machine without an infinite
> >> register size anyway. And plenty of division operations will exceed current
> >> FPU limits. However I think trading apparent range for mathematical
> >> accuracy in the CPU would be worth it.
> >
> >The fact that you hold that belief suggests that you don't do number-
> >crunching for a living. Those who do tend to have a strong preference
>
> Well that depends on what number crunching doesn't it. You think banks don't
> so much of it? You think they use floats?

No, the banks don't do very much of it. Creating all of the financial
reports needed by anyone in the world on all of the financial
transactions performed annually world wide requires a much smaller
number of mathematical operations than a single day's worth of weather
forecasting simulations.

> >constant absolute precision over a very limited range. That's the reason
> >why floating point representations are popular.
>
> No, its because the current system was invented by scientists and is now the
> standard regardless of its suitability for other areas.

That may have been true of Fortran - after all the name comes from
"Formula Translation". However, while K&R could be accurately described
as scientists, that's only because they specialized in computer science.
Which is why integer math plays a fundamental role in C, while floating
point math is more peripheral. Many early C compilers had floating point
support turned off by default, which is the historical reason why you
need to use the -lm option on many compilers to load in the math
library. It's the only part of the C standard library that is treated
that way by most implementations.
If the language had been designed by physical scientists, it would have
had complex math from the beginning, rather than waiting until C99.

bol...@cylonhq.com

unread,
May 4, 2018, 4:34:36 AM5/4/18
to
On Thu, 3 May 2018 12:46:30 -0700 (PDT)
james...@alumni.caltech.edu wrote:
>On Thursday, May 3, 2018 at 4:51:30 AM UTC-4, bol...@cylonhq.com wrote:
>> As for engineering - it doesn't generally require values down to 20 decimal
>
>It's hard to represent really small quantities with acceptable accuracy
>using only 20 decimal places - for any number less than 1e-20 (and
>engineering often involves numbers that small or smaller), it's

Does it? Give a real world engineering example that requires accuracy to
20 decimal places. I suspect even space probe navigation isn't that accurate.

>Matrix operations, in particular, involve so many multiplies and adds
>that single precision roundoff errors would make the results unusable
>for even relatively small matrices, such as 10x10.

If you're going to suggest that floating point somehow alleviates the problem
of chaos in calculations that would otherwise occur with fixed point then
I think you're on to a bit of a loser.

>Did you miss the paragraph where I acknowledged the appropriateness of
>using fixed-point or decimal floating point in financial context? What
>you were doing with integers is essentially equivalent to fixed-point
>math, but more complicated than would be possible with language-level
>support for fixed-point math.

And thats my whole point about having fixed point floats being native to
a language instead of having to do calculations using integers.

>contradict that claim. Do you? However, I suspect that it reflects a
>lack of familiarity with the scientific and engineering communities on
>your part. Some of the biggest and most powerful machines in the world

I'll grant you that I don't work in that sphere, but I do know that C & C++
are a long way from being the most popular languages there.

>crunch numbers 24 hours a day to perform tasks like weather prediction
>and quantum field theory calculations, and a lot of that code is written
>in C nowadays.

And a lot isn't.

>> If 64 bits isn't enough to represent your fractional values then perhaps you
>> need to take a look at the problem again.
>
>A 64 bit floating point type has all the precision I need for most of
>the work I do. However, a 64 bit fixed point format like the one you
>advocated wouldn't even come close to being adequate.

So a 64 bit int + 64 bit fractional part isn't enough for you? Wtf are you
calculating? A 64 bit u_long has a max of 18446744073709551615 which is 20
digits , of which 19 would be usable as a fractional part. Are you seriously
suggesting 19 decimal places isn't enough for your job??

>> Well that depends on what number crunching doesn't it. You think banks don't
>> so much of it? You think they use floats?
>
>No, the banks don't do very much of it. Creating all of the financial
>reports needed by anyone in the world on all of the financial

Financial reports?? Thats the tip of the iceberg my friend. How the hell do you
think automated trading works not to mention all the other investment
instrument calculations done all the time?

>transactions performed annually world wide requires a much smaller
>number of mathematical operations than a single day's worth of weather
>forecasting simulations.

Utter BS.

>> No, its because the current system was invented by scientists and is now the
>> standard regardless of its suitability for other areas.
>
>That may have been true of Fortran - after all the name comes from
>"Formula Translation". However, while K&R could be accurately described
>as scientists, that's only because they specialized in computer science.
>Which is why integer math plays a fundamental role in C, while floating
>point math is more peripheral. Many early C compilers had floating point
>support turned off by default, which is the historical reason why you
>need to use the -lm option on many compilers to load in the math
>library. It's the only part of the C standard library that is treated
>that way by most implementations.
>If the language had been designed by physical scientists, it would have
>had complex math from the beginning, rather than waiting until C99.

I wasn't talking about C per se, I was talking about the standard representation
of floating point numbers.


Robert Wessel

unread,
May 4, 2018, 12:24:54 PM5/4/18
to
On Fri, 4 May 2018 08:34:18 +0000 (UTC), bol...@cylonHQ.com wrote:

>On Thu, 3 May 2018 12:46:30 -0700 (PDT)
>james...@alumni.caltech.edu wrote:
>>On Thursday, May 3, 2018 at 4:51:30 AM UTC-4, bol...@cylonhq.com wrote:
>>> As for engineering - it doesn't generally require values down to 20 decimal
>>
>>It's hard to represent really small quantities with acceptable accuracy
>>using only 20 decimal places - for any number less than 1e-20 (and
>>engineering often involves numbers that small or smaller), it's
>
>Does it? Give a real world engineering example that requires accuracy to
>20 decimal places. I suspect even space probe navigation isn't that accurate.

...

>>> If 64 bits isn't enough to represent your fractional values then perhaps you
>>> need to take a look at the problem again.
>>
>>A 64 bit floating point type has all the precision I need for most of
>>the work I do. However, a 64 bit fixed point format like the one you
>>advocated wouldn't even come close to being adequate.
>
>So a 64 bit int + 64 bit fractional part isn't enough for you? Wtf are you
>calculating? A 64 bit u_long has a max of 18446744073709551615 which is 20
>digits , of which 19 would be usable as a fractional part. Are you seriously
>suggesting 19 decimal places isn't enough for your job??


He didn't say that. He said that having to deal with inputs that are
20 orders of magnitude apart is common. The precision provided by ~64
bits of whatever format is usually adequate, but the range is not.

Dombo

unread,
May 4, 2018, 2:31:20 PM5/4/18
to
Op 04-May-18 om 10:34 schreef bol...@cylonHQ.com:
> On Thu, 3 May 2018 12:46:30 -0700 (PDT)
> james...@alumni.caltech.edu wrote:
>> On Thursday, May 3, 2018 at 4:51:30 AM UTC-4, bol...@cylonhq.com wrote:
>>> As for engineering - it doesn't generally require values down to 20 decimal
>>
>> It's hard to represent really small quantities with acceptable accuracy
>> using only 20 decimal places - for any number less than 1e-20 (and
>> engineering often involves numbers that small or smaller), it's
>
> Does it? Give a real world engineering example that requires accuracy to
> 20 decimal places. I suspect even space probe navigation isn't that accurate.

The reason for using floating point is to deal with a large dynamic
range, without having to sacrifice relative accuracy at one end of the
range.

<snip>
>>> If 64 bits isn't enough to represent your fractional values then perhaps you
>>> need to take a look at the problem again.
>>
>> A 64 bit floating point type has all the precision I need for most of
>> the work I do. However, a 64 bit fixed point format like the one you
>> advocated wouldn't even come close to being adequate.
>
> So a 64 bit int + 64 bit fractional part isn't enough for you? Wtf are you
> calculating? A 64 bit u_long has a max of 18446744073709551615 which is 20
> digits , of which 19 would be usable as a fractional part. Are you seriously
> suggesting 19 decimal places isn't enough for your job??

Here is a piece of code used in a real product for you:

const double h = 6.62607004e-34;
const double e = 1.60217662e-19;
const double m = 9.10938356e-31;
const double c = 2.99792458e8;

return h / std::sqrt(e * voltage * m * (e * voltage / (m * c*c) + 2.0));

I leave it as an exercise for you to figure out how many bits would be
required to do the same thing with fixed point numbers without losing
accuracy compared to the floating point implementation.


Christian Gollwitzer

unread,
May 5, 2018, 1:40:45 AM5/5/18
to
Am 04.05.18 um 10:34 schrieb bol...@cylonHQ.com:
>> Did you miss the paragraph where I acknowledged the appropriateness of
>> using fixed-point or decimal floating point in financial context? What
>> you were doing with integers is essentially equivalent to fixed-point
>> math, but more complicated than would be possible with language-level
>> support for fixed-point math.
>
> And thats my whole point about having fixed point floats being native to
> a language instead of having to do calculations using integers.

You keep repeating this, but I don't understand why you need fixed point
"native" to the langugage? In C++, surely you wouldn't do the
application logic using explicit integers, but write your own
fixed-point class and use that. I.e., instead of

int64_t euro = 1000; // have to remember that there is a scale of 1000
int64_t my_balance = 10860;
my_balance += 30*euro;
std::cout << "My balance has now "<< my_balance / euro << "." <<
std::setfill('0') << std::setw(3) << my_balance % euro;

you'd do

#include "fpmath.hpp"

fp64 my_balance = "10.86";
my_balance += 30;
std::cout << "My balance has now"<< my_balance <<"€";


where fpmath.hpp is written once and forever and used throughout the
program.
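
A minimal sketch of what such an fpmath.hpp could contain, hard-wired here
to three decimal places; the fp64 class and its interface are illustrative
only, not a real library:

#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>

class fp64 {
    std::int64_t milli_;   // amount stored in thousandths
public:
    fp64(const char* s)    // parsing via stod is a shortcut for the sketch,
        : milli_(static_cast<std::int64_t>(std::stod(s) * 1000.0 + 0.5)) {}   // not exact decimal parsing
    fp64& operator+=(std::int64_t whole) { milli_ += whole * 1000; return *this; }
    friend std::ostream& operator<<(std::ostream& os, const fp64& v) {
        return os << v.milli_ / 1000 << "."
                  << std::setfill('0') << std::setw(3) << v.milli_ % 1000;
    }
};

int main() {
    fp64 my_balance = "10.86";
    my_balance += 30;
    std::cout << "My balance has now " << my_balance << "\n";   // prints 40.860
}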

Christian

Rosario19

unread,
May 5, 2018, 5:55:22 AM5/5/18
to
On Fri, 4 May 2018 20:30:37 +0200, Dombo wrote:
>Op 04-May-18 om 10:34 schreef boltar:
you forgot to say:

1) the range that is considered a correct result, or something reflecting that
2) the value of voltage, or several values of voltage
3) the C++ compiler result

In Axiom with 200 digits of precision, for voltage=2.3, the result for

f(voltage)==h/sqrt(e* voltage *m*(e*voltage/(m*c*c)+2.0))

(19) -> f 2.3
Compiling function f with type Float -> Float
(19)
0.8086804205 7457036769 1013364305 2562414446 4716385376 9984367043
5205653260 0350183091 3518975795 1432982356 8746726538 0519062259
4038078110 9262350697 2465422215 3892230230 3051688329 3822066680
4248641341 6150608215 E -9

because c is 2.99e8 [and c*c is no better, around 1e16 it seems to me] and not
something like 6.6e-20 as the others are, I would say that the C++ IEEE
floating-point result ***could be meaningless*** or affected by errors in many
digits.

I'm for fixed-point floats.

bol...@cylonhq.com

unread,
May 7, 2018, 4:31:42 AM5/7/18
to
On Fri, 4 May 2018 20:30:37 +0200
Dombo <do...@disposable.invalid> wrote:
>Op 04-May-18 om 10:34 schreef bol...@cylonHQ.com:
>> On Thu, 3 May 2018 12:46:30 -0700 (PDT)
>> james...@alumni.caltech.edu wrote:
>>> On Thursday, May 3, 2018 at 4:51:30 AM UTC-4, bol...@cylonhq.com wrote:
>>>> As for engineering - it doesn't generally require values down to 20 decimal
>>>
>>> It's hard to represent really small quantities with acceptable accuracy
>>> using only 20 decimal places - for any number less than 1e-20 (and
>>> engineering often involves numbers that small or smaller), it's
>>
>> Does it? Give a real world engineering example that requires accuracy to
>> 20 decimal places. I suspect even space probe navigation isn't that accurate.
>
>The reason for using floating point is to deal with a large dynamic
>range, without having to sacrifice relative accuracy at one end of the
>range.

Look at those goalposts shift!

>> So a 64 bit int + 64 bit fractional part isn't enough for you? Wtf are you
>> calculating? A 64 bit u_long has a max of 18446744073709551615 which is 20
>> digits , of which 19 would be usable as a fractional part. Are you seriously
>> suggesting 19 decimal places isn't enough for your job??
>
>Here is a piece of code used in a real product for you:
>
>const double h = 6.62607004e-34;
>const double e = 1.60217662e-19;
>const double m = 9.10938356e-31;
>const double c = 2.99792458e8;
>
>return h / std::sqrt(e * voltage * m * (e * voltage / (m * c*c) + 2.0));

And the result of that would be accurate would it? I hope its not code for any
safety related control system.

>I leave it as an exercise for you to figure out how many bits would
>required to do the same thing with fixed point numbers without loosing
>accuracy compared to the floating point implementation.

I wasn't proposing either/or, a language could have both types.

bol...@cylonhq.com

unread,
May 7, 2018, 4:32:21 AM5/7/18
to
On Sat, 5 May 2018 07:40:28 +0200
Christian Gollwitzer <auri...@gmx.de> wrote:
>Am 04.05.18 um 10:34 schrieb bol...@cylonHQ.com:
>>> Did you miss the paragraph where I acknowledged the appropriateness of
>>> using fixed-point or decimal floating point in financial context? What
>>> you were doing with integers is essentially equivalent to fixed-point
>>> math, but more complicated than would be possible with language-level
>>> support for fixed-point math.
>>
>> And thats my whole point about having fixed point floats being native to
>> a language instead of having to do calculations using integers.
>
>You keep repeating this, but I don't understand why you need fixed point
>"native" to the langugage? In C++, surely you wouldn't do the

Why not have it?

>application logic using explicit integers, but write your own
>fixed-point class and use that. I.e., instead of

I was talking about C too. Good luck writing a number class in that.

Robert Wessel

unread,
May 7, 2018, 1:38:33 PM5/7/18
to
Why wouldn't it be? Assuming a suitable range of values for voltage
(within a few orders of magnitude of 10**-4), the various constants
being correct, and the formula being correct. Performing
multiplicative* operations on FP numbers of wildly differing
magnitudes does not lose accuracy. Additive operations *do*, which is
where the range limit mentioned for voltage comes from.


*Multiplication, division and square root in this case
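
A small illustration of that difference, assuming IEEE 754 doubles:

#include <iomanip>
#include <iostream>

int main() {
    double big = 1.0e20;   // exactly representable as a double
    double small = 3.0;
    std::cout << std::setprecision(17)
              << big * small / big << "\n"     // 3: multiply/divide preserves relative accuracy
              << (big + small) - big << "\n";  // 0: the addend is far below one ulp of big and vanishes
}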

Christian Gollwitzer

unread,
May 7, 2018, 1:46:03 PM5/7/18
to
Am 07.05.18 um 10:32 schrieb bol...@cylonHQ.com:
> On Sat, 5 May 2018 07:40:28 +0200
> Christian Gollwitzer <auri...@gmx.de> wrote:
>> Am 04.05.18 um 10:34 schrieb bol...@cylonHQ.com:
>>>> Did you miss the paragraph where I acknowledged the appropriateness of
>>>> using fixed-point or decimal floating point in financial context? What
>>>> you were doing with integers is essentially equivalent to fixed-point
>>>> math, but more complicated than would be possible with language-level
>>>> support for fixed-point math.
>>>
>>> And thats my whole point about having fixed point floats being native to
>>> a language instead of having to do calculations using integers.
>>
>> You keep repeating this, but I don't understand why you need fixed point
>> "native" to the langugage? In C++, surely you wouldn't do the
>
> Why not have it?

Because you can easily do it yourself, where "yourself" can be a 3rd
party library. Let's see:

https://app.cear.ufpb.br/~lucas.hartmann/2015/08/27/easy-fixed-point-math-with-c/
https://gist.github.com/dflemstr/294959/aa90ff5b1a66b45b9edb30a432a66f8383d368e6
https://embeddedartistry.com/blog/2017/8/25/c11-fixed-point-arithemetic-library
-> https://github.com/johnmcfarlane/fixed_point

As it seems, there is a number of options to choose from if you don't
want to reinvent the wheel.

>> application logic using explicit integers, but write your own
>> fixed-point class and use that. I.e., instead of
>
> I was talking about C too. Good luck writing a number class in that.
>

There is an easy solution, get a C++ compiler. They are free now.

Christian

bol...@cylonhq.com

unread,
May 8, 2018, 4:50:00 AM5/8/18
to
On Mon, 7 May 2018 19:45:42 +0200
Christian Gollwitzer <auri...@gmx.de> wrote:
>Am 07.05.18 um 10:32 schrieb bol...@cylonHQ.com:
>> Why not have it?
>
>Because you can easily do it yourself, where "yourself" can be a 3rd
>party library. Let's see:

No! Really?? Wow, who knew?

Congratulations on completely missing the point.

>> I was talking about C too. Good luck writing a number class in that.
>>
>
>There is an easy solution, get a C++ compiler. They are free now.

No, really?? Wow, who knew?

Congra... etc etc

bol...@cylonhq.com

unread,
May 8, 2018, 4:51:50 AM5/8/18
to
Safety systems like exact values, not "within a reasonable distance of what
we want" values. Get the flaps angle on an aircraft a fraction of a degree out
and you could find yourself in a stall or a dive PDQ. But hey, its close
enough, right?


Reinhardt Behm

unread,
May 8, 2018, 4:59:24 AM5/8/18
to
You really expect physical quantities to be exact to 10^-7?
When the pilot sets the flaps he does this at best to about +/-10 degrees.

You should better talk about things you know something about.

--
Reinhardt

David Brown

unread,
May 8, 2018, 5:34:39 AM5/8/18
to
Close enough is close enough - /that/ is what you aim for. The people
who think you should be able to get exact results, or that you should
even consider aiming for "as good as possible", are mostly found as
hated PHB's. They are certainly not engineers.



bol...@cylonhq.com

unread,
May 8, 2018, 5:53:43 AM5/8/18
to
On Tue, 08 May 2018 16:59:14 +0800
Reinhardt Behm <rb...@hushmail.com> wrote:
>AT Tuesday 08 May 2018 16:51, bol...@cylonHQ.com wrote:
>
>> On Mon, 07 May 2018 12:38:13 -0500
>> Robert Wessel <robert...@yahoo.com> wrote:
>>>On Mon, 7 May 2018 08:31:31 +0000 (UTC), bol...@cylonHQ.com wrote:
>>>>>return h / std::sqrt(e * voltage * m * (e * voltage / (m * c*c) + 2.0));
>>>>
>>>>And the result of that would be accurate would it? I hope its not code
>>>>for any safety related control system.
>>>
>>>
>>>Why wouldn't it be? Assuming a suitable range of values for voltage
>>>(within a few orders of magnitude of 10**-4), the various constants
>>>being correct, and the formula being correct. Performing
>>>multiplicative* operations on FP numbers of wildly differing
>>>magnitudes does not lose accuracy. Additive operations *do*, which is
>>>where the range limit mentioned for voltage comes from.
>>
>> Safety systems like exact values, not "within a reasonable distance of
>> what we want" values. Get the flaps angle on an aircraft a fraction of a
>> degree out and you could find yourself in a stall or a dive PDQ. But hey,
>> its close enough, right?
>
>You really expect an physical quantities to be exact to 10^-7?

No, but inaccuracies with floating point calcs soon add up.

>When the pilot sets the flaps he does this at best to about +/-10 degrees.

10 degrees?? Are you serious??

>You should better talk about things you know something about.

You should take your own advice mate!


bol...@cylonhq.com

unread,
May 8, 2018, 5:58:14 AM5/8/18
to
Don't ever get a job in aerospace if you think aiming for as good as possible
isn't something an engineer should be ever bothered with. Stick to designing
lamp posts or washing machines or whatever the don't-give-a-toss industry it is
you work in.

David Brown

unread,
May 8, 2018, 6:14:34 AM5/8/18
to
I have done safety work. Not for aerospace - but other safety work.
And a key point is to avoid over-engineering. If 0.1 degree accuracy is
good enough for your aeroplane flaps, then making a system for 0.01
degree accuracy means a system that is bigger, more complex, and with a
higher risk of failure. And someone who thinks that you should get it
/exactly/ right should not be involved in the process at all. "Exact"
is fine for mathematical number theory - it has no place in physical
engineering.

When prototyping and researching, then of course you will try to see how
good you can get it - that will allow more possibilities in the design,
or for future systems with tighter requirements. And of course you do
not make things unnecessarily inexact.

But your target for your design is to be within the specifications
needed - if there were any point in going beyond that, the
specifications are the problem, not the design.

Simplicity, reliability, testability, repeatability - these are all far
more important than getting a little closer to "perfect".

David Brown

unread,
May 8, 2018, 6:17:55 AM5/8/18
to
On 08/05/18 11:53, bol...@cylonHQ.com wrote:
> On Tue, 08 May 2018 16:59:14 +0800
> Reinhardt Behm <rb...@hushmail.com> wrote:
>> AT Tuesday 08 May 2018 16:51, bol...@cylonHQ.com wrote:
>>
>>> On Mon, 07 May 2018 12:38:13 -0500
>>> Robert Wessel <robert...@yahoo.com> wrote:
>>>> On Mon, 7 May 2018 08:31:31 +0000 (UTC), bol...@cylonHQ.com wrote:
>>>>>> return h / std::sqrt(e * voltage * m * (e * voltage / (m * c*c) + 2.0));
>>>>>
>>>>> And the result of that would be accurate would it? I hope its not code
>>>>> for any safety related control system.
>>>>
>>>>
>>>> Why wouldn't it be? Assuming a suitable range of values for voltage
>>>> (within a few orders of magnitude of 10**-4), the various constants
>>>> being correct, and the formula being correct. Performing
>>>> multiplicative* operations on FP numbers of wildly differing
>>>> magnitudes does not lose accuracy. Additive operations *do*, which is
>>>> where the range limit mentioned for voltage comes from.
>>>
>>> Safety systems like exact values, not "within a reasonable distance of
>>> what we want" values. Get the flaps angle on an aircraft a fraction of a
>>> degree out and you could find yourself in a stall or a dive PDQ. But hey,
>>> its close enough, right?
>>
>> You really expect an physical quantities to be exact to 10^-7?
>
> No, but inaccuracies with floating point calcs soon add up.

And that is why you have to /understand/ the inaccuracies and how they
build up - so that you know the final answer is good enough to use.
"Just do it exactly" or "as accurately as possible" are cover-ups for a
failure to understand the situation and the mathematics.

Reinhardt Behm

unread,
May 8, 2018, 9:20:39 AM5/8/18
to
AT Tuesday 08 May 2018 17:53, bol...@cylonHQ.com wrote:

> On Tue, 08 May 2018 16:59:14 +0800
> Reinhardt Behm <rb...@hushmail.com> wrote:
>>AT Tuesday 08 May 2018 16:51, bol...@cylonHQ.com wrote:
>>
>>> On Mon, 07 May 2018 12:38:13 -0500
>>> Robert Wessel <robert...@yahoo.com> wrote:
>>>>On Mon, 7 May 2018 08:31:31 +0000 (UTC), bol...@cylonHQ.com wrote:
>>>>>>return h / std::sqrt(e * voltage * m * (e * voltage / (m * c*c) +
>>>>>>2.0));
>>>>>
>>>>>And the result of that would be accurate would it? I hope its not code
>>>>>for any safety related control system.
>>>>
>>>>
>>>>Why wouldn't it be? Assuming a suitable range of values for voltage
>>>>(within a few orders of magnitude of 10**-4), the various constants
>>>>being correct, and the formula being correct. Performing
>>>>multiplicative* operations on FP numbers of wildly differing
>>>>magnitudes does not lose accuracy. Additive operations *do*, which is
>>>>where the range limit mentioned for voltage comes from.
>>>
>>> Safety systems like exact values, not "within a reasonable distance of
>>> what we want" values. Get the flaps angle on an aircraft a fraction of a
>>> degree out and you could find yourself in a stall or a dive PDQ. But
>>> hey, its close enough, right?
>>
>>You really expect an physical quantities to be exact to 10^-7?
>
> No, but inaccuracies with floating point calcs soon add up.

Only if you don't know how to do numerics.

>
>>When the pilot sets the flaps he does this at best to about +/-10 degrees.
>
> 10 degrees?? Are you serious??

Yes. I design avionics.

>
>>You should better talk about things you know something about.
>
> You should take your own advice mate!

I do.

--
Reinhardt

james...@alumni.caltech.edu

unread,
May 8, 2018, 10:22:14 AM5/8/18
to
On Tuesday, May 8, 2018 at 5:53:43 AM UTC-4, bol...@cylonhq.com wrote:
> On Tue, 08 May 2018 16:59:14 +0800
> Reinhardt Behm <rb...@hushmail.com> wrote:
> >AT Tuesday 08 May 2018 16:51, bol...@cylonHQ.com wrote:
> >
> >> On Mon, 07 May 2018 12:38:13 -0500
> >> Robert Wessel <robert...@yahoo.com> wrote:
> >>>On Mon, 7 May 2018 08:31:31 +0000 (UTC), bol...@cylonHQ.com wrote:
> >>>>>return h / std::sqrt(e * voltage * m * (e * voltage / (m * c*c) + 2.0));
> >>>>
> >>>>And the result of that would be accurate would it? I hope its not code
> >>>>for any safety related control system.
> >>>
> >>>
> >>>Why wouldn't it be? Assuming a suitable range of values for voltage
> >>>(within a few orders of magnitude of 10**-4), the various constants
> >>>being correct, and the formula being correct. Performing
> >>>multiplicative* operations on FP numbers of wildly differing
> >>>magnitudes does not lose accuracy. Additive operations *do*, which is
> >>>where the range limit mentioned for voltage comes from.
> >>
> >> Safety systems like exact values, not "within a reasonable distance of
> >> what we want" values. Get the flaps angle on an aircraft a fraction of a
> >> degree out and you could find yourself in a stall or a dive PDQ. But hey,
> >> its close enough, right?
> >
> >You really expect an physical quantities to be exact to 10^-7?
>
> No, but inaccuracies with floating point calcs soon add up.

Do they? Let's see.

I haven't had any practical use of my knowledge of elementary particle
physics since I was a research assistant in grad school, but I can look
at that formula and realize that it's related to acceleration of an
electron across a voltage drop. If that experience had been only two
decades more recent, I probably would even recognize that formula. Since
I don't, I'm not sure what range of values are reasonable for voltage,
and my knowledge of experimental physics was never sufficiently practical
for me to be sure with what accuracy such voltages could be measured.

Simple analysis indicates that any value for voltage between -2*m*c*c/e
and 0.0 would cause a domain error for std::sqrt(). The heaviest
elementary particle known so far is only about 172GeV, so I used an
upper limit of 1 TV. To give your thesis its best opportunity to be
valid, I'll assume that the voltage is measured so accurately that
double precision floating point limits the accuracy with which it can be
represented - which is absurd - that would make it one of the most
accurately measured physical quantities ever known.

#include <cmath>
#include <iomanip>
#include <iostream>
#include <limits>
// <https://physics.nist.gov/cuu/Constants/index.html>
const double h = 6.626070040e-34;
const double hp = std::nextafter(h,1);
const double hm = std::nextafter(h,0);
const double hmin = h - 0.000000081e-34;
const double hmax = h + 0.000000081e-34;

const double e = 1.6021766208e-19;
const double ep = std::nextafter(e,1);
const double em = std::nextafter(e,0);
const double emin = e-0.0000000098e-19;
const double emax = e+0.0000000098e-19;

const double m = 9.10938356e-31;
const double mp = std::nextafter(m,1);
const double mm = std::nextafter(m,0);
const double mmin = m - 0.00000011e-31;
const double mmax = m + 0.00000011e-31;
// Note that none of the physical constants listed above even comes close
// to being limited by the use of double precision floating point.

// The modern metric system defines this value of c to be exact:
const double c = 2.99792458e8;
const double cp = std::nextafter(c,std::numeric_limits<double>::max());
const double cm = std::nextafter(c,0);

const double twom = std::nextafter(2.0, 1.0);
const double twop = std::nextafter(2.0, 3.0);

void func(double voltage)
{
    const double vp = std::nextafter(voltage, std::numeric_limits<double>::max());
    const double vm = std::nextafter(voltage, -std::numeric_limits<double>::max());
    const double vs = voltage > 0 ? vm : vp;
    const double vl = voltage > 0 ? vp : vm;

    const double result = h/std::sqrt(e*voltage*m*(e*voltage/(m*c*c)+2.0));
    const double low_physical =
        hmin/std::sqrt(emax*vl*mmax*(emax*vl/(mmin*c*c)+2.0));
    const double high_physical =
        hmax/std::sqrt(emin*vs*mmin*(emin*vs/(mmax*c*c)+2.0));
    const double low_fp =
        hm/std::sqrt(ep*vl*mp*(ep*vl/(mm*cm*cm)+twop));
    const double high_fp =
        hp/std::sqrt(em*vs*mm*(em*vs/(mp*cp*cp)+twom));
    std::cout << std::setw(11) << voltage << "\t" << result << "\t"
              << high_physical-low_physical << "\t" << high_fp-low_fp << "\n";
}

int main(void)
{
    const double vcrit = 2.0*m*c*c/e;
    const double vmax = 1e15;
    std::cout << "Voltage\t\tResult\t\tPhysical\tFP\n" << std::setw(11);
    for(double voltage = std::numeric_limits<double>::epsilon()*vcrit;
        voltage < vmax; voltage *= 2.0)
        func(voltage);
    for(double voltage = -vcrit; voltage > -vmax; voltage *= 2.0)
        func(voltage);
    return 0;
}

Do you see any cases where the floating point round-off error even comes
close to the error due to the uncertainties in the values of the
physical constants?

Would you care to demonstrate how you could use a fixed point type with
only 64 bits before and after the decimal point to perform such
calculations with accuracy limited more by the accuracy of the physical
constants than by the very limited precision of that format?

bol...@cylonhq.com

unread,
May 8, 2018, 11:57:57 AM5/8/18
to
On Tue, 08 May 2018 21:20:24 +0800
Of course you do, in between standing in for superman and updating Stephen
Hawkings theories, right?


bol...@cylonhq.com

unread,
May 8, 2018, 11:59:49 AM5/8/18
to
Yes they do, just keep dividing any non even number by 2 for starters.

[rest of crap snipped, TL;DR]


Paavo Helde

unread,
May 8, 2018, 1:50:35 PM5/8/18
to
I'll take the bait. I'm not quite sure what you mean by a non-even
floating-point number but I guess 1.0 should qualify. Here is the test:

#include <iostream>
#include <iomanip>

int main() {

    double x = 1.0;
    double y = x;

    int n = 1000;
    for (int i = 0; i < n; ++i) {
        y = y / 2.0;
    }
    std::cout << "Divided by 2 " << n << " times: "
              << std::setprecision(40) << y << "\n";

    for (int i = 0; i < n; ++i) {
        y = y * 2.0;
    }
    std::cout << "Multiplied by 2 " << n << " times: " << std::fixed
              << std::setprecision(40) << y << "\n";
    double diff = x - y;

    std::cout << "Accumulated error: " << std::fixed
              << std::setprecision(40) << diff << "\n";

    // The dreaded equality test
    if (x == y) {
        std::cout << "The accumulated error is zilch\n";
    }

}

And here is the output:

Divided by 2 1000 times: 9.332636185032188789900895447238171696171e-302
Multiplied by 2 1000 times: 1.0000000000000000000000000000000000000000
Accumulated error: 0.0000000000000000000000000000000000000000
The accumulated error is zilch

From here you can see that not only do the inaccuracies not accumulate,
but there is no error at all!

Now please demonstrate to us your fixed-point number design with
comparable number of bits (64) where one can divide the number 1.0 1,000
times without losing precision.


james...@alumni.caltech.edu

unread,
May 8, 2018, 1:59:18 PM5/8/18
to
Do they add up enough to be important compared to the errors that are
unavoidable when taking measurements? That "crap", as you call it,
demonstrates that, in this particular case (which you expressed concern
about), floating point inaccuracies never even come close to
accumulating large enough to matter.

That's not particularly unusual. If you're careful, and know what to
look for, that's normally what happens. It's a waste of time, money, and
programming skills to calculate with significantly more accuracy than is
supported by the accuracy of your input data. A little more accuracy can
be helpful for coping with floating point round-off errors. But
insisting on getting the mathematically exact result, rather than the
nearest representible value, is a sign of your lack of numerical
sophistication.

The biggest thing to look out for is subtracting two numbers that are
very nearly equal - the formula shown has its largest relative
inaccuracies near the place where e*voltage/(m*c*c) is almost exactly
-2.0. Avoiding such problems is reason for expm1(), which might otherwise
seem like a very odd thing to put in the standard library; log1p()
solves the reverse of that problem. The next biggest thing is to avoid
evaluating transcendental functions near their singular points, because
such evaluations inherently involve such subtractions.
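
A small example of the expm1() point, assuming IEEE 754 doubles:

#include <cmath>
#include <iomanip>
#include <iostream>

int main() {
    double x = 1.0e-12;
    std::cout << std::setprecision(17)
              << std::exp(x) - 1.0 << "\n"   // cancellation: only about 4 significant digits survive
              << std::expm1(x) << "\n";      // about 1.0000000000005e-12, good to full precision
}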

Worry about the loss of accuracy in a long train of multiplies and
divides is usually misplaced - overflow and underflow usually become a
concern long before such a train is long enough for loss of accuracy to
become an issue.

Paavo Helde

unread,
May 8, 2018, 2:45:54 PM5/8/18
to
Maybe I should have mentioned that the absence of precision loss in the
above example is because of the excellent choice of the divisor 2.0. I
guess this might not be obvious to some... By using another divisor,
e.g. 1.99 the accumulated error becomes non-zero:

1 divided by 1.99 1000 times: 1.402566915437632038350848034701129107735e-299
Multiplied by 1.99 1000 times: 0.9999999999999029665076477613183669745922
Accumulated error: 0.0000000000000970334923522386816330254078

The relative error is in range 1E-14 which is still readily acceptable
for most engineering and scientific purposes.

james...@alumni.caltech.edu

unread,
May 8, 2018, 3:06:42 PM5/8/18
to
On Tuesday, May 8, 2018 at 1:50:35 PM UTC-4, Paavo Helde wrote:
> On 8.05.2018 18:59, bol...@cylonHQ.com wrote:
> > On Tue, 8 May 2018 07:22:01 -0700 (PDT)
> > james...@alumni.caltech.edu wrote:
> >> On Tuesday, May 8, 2018 at 5:53:43 AM UTC-4, bol...@cylonhq.com wrote:
> >>>
> >>> No, but inaccuracies with floating point calcs soon add up.
> >>
> >> Do they? Let's see.
> >
> > Yes they do, just keep dividing any non even number by 2 for starters.
> >
>
> I'll take the bait. I'm not quite sure what you mean by a non-even
> floating-point number but I guess 1.0 should qualify. Here is the test:

I have no idea why he'd choose to call the relevant numbers "non-even",
but there are numbers for which division by 2.0 does cause loss of
accuracy. This only happens when the result of the division has a value
less than DBL_MIN; you're only guaranteed to be able to divide such
numbers by 2.0 multiple times without producing a value exactly equal to
0.0 if std::numeric_limits<double>::has_denorm == std::denorm_present.
The case I could come up with that loses relative accuracy fastest is
0x1.5555555555555p-1022; a constant I can express using C, but not C++,
so I did my testing with C code. But the actual loss of accuracy is the
same using either language.
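
Not the poster's actual C program, but a rough C++ sketch of the same
effect; the starting value is rebuilt with std::ldexp instead of a
hex-float literal, and it assumes round-to-nearest and that subnormals
are supported (has_denorm == denorm_present):

    #include <cmath>
    #include <cstdio>

    int main() {
        // Roughly the 0x1.5555555555555p-1022 constant mentioned above, rebuilt
        // from its 53-bit integer significand so no hex-float literal is needed.
        double x = std::ldexp(static_cast<double>(0x15555555555555ULL), -1074);
        for (int i = 0; i < 4; ++i) {
            const double y = x / 2.0;   // below DBL_MIN the quotient is subnormal and may round
            if (y * 2.0 != x)           // doubling back is exact, so a mismatch means it rounded
                std::printf("halving lost low-order bits at step %d\n", i);
            x = y;
        }
    }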

Paavo Helde

unread,
May 8, 2018, 3:28:19 PM5/8/18
to
Sure, everything has limits. Even infinite precision libraries are
limited by the available memory amount. The art of programming (and
engineering in general) is to find a way to produce useful results even
in the presence of all kinds of limitations. That's what makes it so
different from maths.

The floating-point number format is just one example of that.




Vir Campestris

unread,
May 8, 2018, 4:51:11 PM5/8/18
to
That's a big surprise to me. I've seen painted flap position indicators
on airliners, and they often have 5 degree increments. And I suspect I'd
be seriously inconvenienced if there was a 10 degree discrepancy between
the flap settings on the two wings.

On the other hand, 1 degree sort of feels as if it should be OK. Given
the max flap settings are under 90 degrees, storing the position in a
byte should do just fine.

Feel free to toss in a reference disproving this.
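
For what it's worth, a throwaway sketch of the "byte should do just
fine" idea; the 0.5-degree step and the function names are made up for
illustration:

    #include <cstdint>
    #include <cstdio>

    constexpr double kStepDeg = 0.5;                // made-up resolution per count

    std::uint8_t encode_flap(double degrees) {      // 0..90 deg -> 0..180 counts
        return static_cast<std::uint8_t>(degrees / kStepDeg + 0.5);
    }

    double decode_flap(std::uint8_t counts) {
        return counts * kStepDeg;
    }

    int main() {
        std::printf("%.1f\n", decode_flap(encode_flap(37.2)));  // 37.0, the nearest step
    }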

Andy

Reinhardt Behm

unread,
May 9, 2018, 10:01:19 AM5/9/18
to
AT Tuesday 08 May 2018 23:57, bol...@cylonHQ.com wrote:

>>
>>Yes. I design avionics.
>
> Of course you do, in between standing in for superman and updating Stephen
> Hawkings theories, right

I am too old for such superman nonsense. And while I mostly understand
Hawking's work I have never been active in cosmology. My thesis many moons
ago was in high energy and heavy ion physics.

--
Reinhardt

Tim Rentsch

unread,
Jun 13, 2018, 9:12:04 PM6/13/18
to
Manfred <non...@invalid.add> writes:

> On 4/30/2018 7:41 AM, Tim Rentsch wrote:
>
>> But don't give in to superstitious programming practices. Insist
>> on solid understanding and a rational decision process, not murky
>> justifications based on uncertainty and fear. Anyone promoting
>> voodoo programming principles should be encouraged to change
>> occupations from developer to witchdoctor.
>
> The fact that comparing floating point values with == is inherently
> brittle is a *fact*, and there is nothing superstitious with it.

I'm sorry you didn't understand the point I was making.

Incidentally, to be a question of fact the question must be
objectively decidable. Whether using == with floating point
values is "inherently brittle" involves subjective judgment,
because it depends on what someone thinks "inherently brittle"
means. Rational people may reasonably disagree. In such cases
the question cannot be a question of fact.

Tim Rentsch

unread,
Jun 13, 2018, 9:14:24 PM6/13/18
to
Paavo Helde <myfir...@osa.pri.ee> writes:

> On 30.04.2018 8:41, Tim Rentsch wrote:
>
>> Paavo Helde <myfir...@osa.pri.ee> writes:
>>
>>> On 27.04.2018 13:22, Andrea Venturoli wrote:
>>>
>>>> On 04/27/18 12:00, Paavo Helde wrote:
>>>>
>>>>> Nevertheless, comparing with std::numeric_limits<double>::max() seems
>>>>> pretty fragile
>>>>
>>>> Why then?
>>>
>>> Comparing floating-point numbers for exact equality is always fragile,
>>> there is always a chance that somebody adds some computation like
>>> divide by 10, multiply by 10, ruining the results. A deeply
>>> "floating-point" value like std::numeric_limits<double>::max() is
>>> doubly suspect just because it is far away from normal and well-tested
>>> range of values.
>>>
>>> I just checked how it is defined in MSVC. It appears the value is
>>> defined by the macro
>>>
>>> #define DBL_MAX 1.7976931348623158e+308
>>>
>>> From here you can clearly see there might be problems. This constant
>>> is specified in decimal and I believe there is a fair chance this
>>> number does not have an exact representation in binary. It will
>>> probably yield different results when loaded into either a 64-bit or a
>>> 80-bit register.
>>
>> None of those things matter. The Standard requires a particular
>> value be returned, however the implementation chooses to do it.
>>
>>> Add some minor optimizer bugs and one can easily
>>> imagine that there might be problems when comparing this number with
>>> itself, even if it should work by the letter of the standard.
>>[...]
>> But don't give in to superstitious programming practices. Insist
>> on solid understanding and a rational decision process, not murky
>> justifications based on uncertainty and fear. Anyone promoting
>> voodoo programming principles should be encouraged to change
>> occupations from developer to witchdoctor.
>
> What one man calls superstitious hunch, another man calls
> experience. I would not have written a direct comparison with
> std::numeric_limits<double>::max() because I have had some experience
> with compiler/optimizer bugs and where are the murky corners. As it
> came out else-thread my suspicions were justified, the problem indeed
> appears to be a bug in the compiler, triggered indeed by the presence
> of std::numeric_limits<double>::max() in the code (albeit the bug was
> a different and more interesting one from what I had imagined).

Your experience seems to have led you to a superstitious and
erroneous conclusion. The problem has nothing to do with
using std::numeric_limits<double>::max(), or using equality
comparison.

Juha Nieminen

unread,
Jun 14, 2018, 4:57:08 AM6/14/18
to
Paavo Helde <myfir...@osa.pri.ee> wrote:
> It could, but then the dynamic range of the values would be limited to
> e.g. from 1/2^32 to 2^32 (i.e. 2E-10 .. 4E+9), which is pretty narrow
> compared to current double range 2E-308 .. 2E+308 (assuming 2x32-bit
> integers here for a fair comparison with a 64-bit double). A lot of
> actually needed calculations involve things like gigahertzes or
> picometers which would not be easily representable/convertible in that
> system.

The other advantage of floating point, besides the larger range, is that
the accuracy of the most-significant-digits doesn't depend on the magnitude
of the number. In other words, you will have about 15 (decimal) most
significant digits of accuracy regardless of whether your values are
in the magnitude of 1e1, 1e10, or 1e100, for instance.

(Floating point values should always be thought of as "the n most
significant digits of the actual value being represented", rather than an
absolutely exact value. This amount is always the same, regardless of the
magnitude, sans perhaps the absolute edge cases.)

As for comparing floating point values with operator ==, it depends.
In some situations it's completely reliable, such as:

double d = 1.0;
if(d == 1.0) ...

That will always evaluate to true. (I sometimes get the feeling that
some people have the misconception that floating point values operate under
quantum physics and the Heisenberg uncertainty principle, in that you can
never trust them to have an exact value at any given point. Obviously
that's not the case.)
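
A minimal, self-contained version of that point: on typical IEEE-754
doubles the first comparison always holds, while the second fails only
because 0.1, 0.2 and 0.3 are not exactly representable and the sum
rounds differently from the literal:

    #include <cstdio>

    int main() {
        double d = 1.0;
        std::printf("%d\n", d == 1.0);   // always 1: 1.0 is stored exactly, nothing rounds
        double e = 0.1 + 0.2;
        std::printf("%d\n", e == 0.3);   // 0 on IEEE doubles: both sides are rounded values
    }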