
A right alternative to IEEE-754's format


Rick C. Hodgin

Mar 27, 2018, 10:26:11 AM
I've been thinking about John Gustafson's unums, and something
occurred to me.  All of our IEEE-754 floating point values are stored
with either an explicit-1 or implicit-1, with the mantissa bits being
ascribed to the bits after that 1, moving away from the left.

What if we did it differently?

What if we stored the bits from the right (the way we write our own
numbers, with "125" being the equivalent of "125." rather than
"1.25e+2")?  This would require two additional pieces of information:
(1) the length of the "mantissa" portion in bytes, and (2) the
position of the period (which moves left from the fully stored bit
pattern).  But we would gain the ability to have far greater
precision, and to have precision out beyond our need to round.  It
would guarantee sufficient resolution, based on the needs of the
computation, so that rounding never produces invalid results in our
computations.

-----
It seems to me that IEEE-754 works from the perspective of a binary
1.xxxxxx.  The format I propose would be an "xxxxxxx." pattern.  The
period position would indicate how far from the right to move, such
that if it were 0 it would be "xxxxxxx.", and if it were 2 it would
be "xxxxx.xx", and so on.

Instead of the explicit-1 or implicit-1 mantissa:

[sign][exp][mantissa]
[sign][exp]1.[mantissa]

It would be one of these two layouts (depending on whether or not we
would even need to store the exponent portion using this methodology):

[sign][exp][period][length][bits]
[sign] [period][length][bits]

And if we wanted to use a fixed format, the bits [length] could be
removed and the assumed bit storage would be various bit sizes
based on resolution.

[sign][exp][period][bits]
[sign] [period][bits]
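
For illustration, a minimal C++ sketch of how a value in the fixed
[sign][period][bits] layout might be decoded.  The struct and field
widths here are hypothetical, chosen only to show the idea of a
right-anchored significand scaled down by the period:

    #include <cstdint>
    #include <cmath>
    #include <cstdio>

    // Hypothetical right-anchored value: [sign][period][bits]
    struct RightAnchored {
        bool     negative;  // [sign]
        uint8_t  period;    // how far the point sits from the right
        uint64_t bits;      // significand stored as "xxxxxxx."
    };

    // Decode to a double purely for demonstration; a real engine would
    // keep the exact integer form and scale it only when needed.
    double decode(const RightAnchored &v)
    {
        double m = std::ldexp(static_cast<double>(v.bits),
                              -static_cast<int>(v.period));
        return v.negative ? -m : m;
    }

    int main()
    {
        RightAnchored v{false, 1, 250};   // binary "1111101.0"
        std::printf("%g\n", decode(v));   // prints 125
        return 0;
    }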

Given the nature of numbers represented in binary, there could also
be a few bits reserved for special-case scenarios, such as repeating
bit patterns, so they can be worked out to whatever degree of
precision is required but are stored minimally, with small integers,
zero, and plus and minus infinity all stored with minimal bits, and
so on.

-----
It just makes sense to me to store the data you need for the thing,
and to leave guesswork to something other than the computational
math engine of your CPU.

I think it would be desirable to integrate this kind of design
alongside traditional IEEE-754 engines, so that you have their
traditional support for fast/semi-reliable computation, but then
to add this new engine which guarantees exact value computation
(out to a rounding level), no matter how long the computation
takes or how much memory it requires.

MPFR already does this same sort of thing in software.  I think
adding it to hardware would be desirable at this point, given our
available transistor budgets and mature design toolsets.
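
As a rough illustration of that software approach, a minimal sketch
using MPFR's C API (precision and values here are arbitrary):

    #include <mpfr.h>
    #include <cstdio>

    int main()
    {
        mpfr_t a, b, sum;
        mpfr_init2(a, 256);               // 256-bit significands
        mpfr_init2(b, 256);
        mpfr_init2(sum, 256);

        mpfr_set_str(a, "0.1", 10, MPFR_RNDN);
        mpfr_set_str(b, "0.2", 10, MPFR_RNDN);
        mpfr_add(sum, a, b, MPFR_RNDN);   // rounding mode is explicit per op

        mpfr_printf("0.1 + 0.2 = %.60Rf\n", sum);

        mpfr_clears(a, b, sum, (mpfr_ptr) 0);
        return 0;
    }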

--
Rick C. Hodgin

Mr Flibble

Mar 27, 2018, 2:08:05 PM
On 27/03/2018 15:26, Rick C. Hodgin wrote:
> I've been thinking about John Gustafson's unums, and something occur-
> red to me. All of our IEEE-754 floating point values are stored with
> either an explicit-1 or implicit-1, with the mantissa bits being
> ascribed to the bits after that 1, moving away from the left.
>
> What if we did it differently?

The only thing wrong with IEEE-754 is allowing a representation of
negative zero. There is no such thing as negative zero.

>
[snip tl;dr]
/Flibble

--
"Suppose it’s all true, and you walk up to the pearly gates, and are
confronted by God," Bryne asked on his show The Meaning of Life. "What
will Stephen Fry say to him, her, or it?"
"I’d say, bone cancer in children? What’s that about?" Fry replied.
"How dare you? How dare you create a world to which there is such misery
that is not our fault. It’s not right, it’s utterly, utterly evil."
"Why should I respect a capricious, mean-minded, stupid God who creates
a world that is so full of injustice and pain. That’s what I would say."

Rick C. Hodgin

Mar 27, 2018, 2:22:58 PM
On Tuesday, March 27, 2018 at 2:08:05 PM UTC-4, Mr Flibble wrote:
> On 27/03/2018 15:26, Rick C. Hodgin wrote:
> > I've been thinking about John Gustafson's unums, and something occur-
> > red to me. All of our IEEE-754 floating point values are stored with
> > either an explicit-1 or implicit-1, with the mantissa bits being
> > ascribed to the bits after that 1, moving away from the left.
> >
> > What if we did it differently?
>
> The only thing wrong with IEEE-754 is allowing a representation of
> negative zero. There is no such thing as negative zero.

There are many problems with IEEE-754.  Different architectures
aren't even required to produce the same result.  And even in our
C++ compilers, we have an option to choose between compliant
rounding and fast rounding; compliant rounding costs an extra
store-and-load on 80387-based FPUs, because internally they do not
round to the destination precision without a store.
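
A small sketch of that store-and-load effect (this assumes x87 code
generation, e.g. "g++ -m32 -mfpmath=387", where intermediates are
kept at 80-bit extended precision unless spilled to memory):

    #include <cstdio>

    int main()
    {
        // volatile so the compiler doesn't fold the sums at compile time
        volatile double a = 1.0e16, b = 1.0, c = -1.0e16;

        // May be evaluated entirely in 80-bit x87 registers, where
        // 1e16 + 1 is exact, so this can come out as 1.
        double kept = (a + b) + c;

        // The store to t rounds a + b to a 64-bit double (the + 1 is
        // lost), so this comes out as 0.
        volatile double t = a + b;
        double stored = t + c;

        std::printf("kept=%g  stored=%g\n", kept, stored);
        return 0;
    }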

Stanford Seminar: Beyond Floating Point
https://www.youtube.com/watch?v=aP0Y1uAA-2Y

Beating Floats At Their Own Game
https://www.youtube.com/watch?v=N05yYbUZMSQ

https://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868

--
Rick C. Hodgin

PS -- This is not "evangelistic" except in the context of getting
people to leave IEEE-754 standards. :-)

Öö Tiib

Mar 27, 2018, 2:26:30 PM
On Tuesday, 27 March 2018 21:08:05 UTC+3, Mr Flibble wrote:
> On 27/03/2018 15:26, Rick C. Hodgin wrote:
> > I've been thinking about John Gustafson's unums, and something occur-
> > red to me. All of our IEEE-754 floating point values are stored with
> > either an explicit-1 or implicit-1, with the mantissa bits being
> > ascribed to the bits after that 1, moving away from the left.
> >
> > What if we did it differently?
>
> The only thing wrong with IEEE-754 is allowing a representation of
> negative zero. There is no such thing as negative zero.

All these number formats are about the efficiency of certain
calculations, and efficiency of calculations means transistors.
Transistors, however, are somewhat off-topic here.

Bonita Montero

Mar 28, 2018, 8:30:37 AM

> There are many problems with IEEE-754. Different architectures
> aren't even required to produce the same result.
You can tune each compiler and FPU (through setting the control word)
so that you have identical results on different platforms.  On x87
this will result in slower code, because results must be chopped to
the narrower precision, but with SSE and AVX the code won't run
slower.

Rick C. Hodgin

Mar 28, 2018, 8:45:52 AM
The compiler enables you to overcome limitations and variations
allowed by the IEEE-754 standard so as to obtain reproducible
results across platforms ... in that compiler, and possibly to an
(explicit or implicit) C++ standard observed by various compiler
authors.

But you can write a Java-based compute engine and a C++-based
compute engine, and they may not produce the same results using
IEEE-754, because they have different mechanisms to process data
using the same IEEE-754 "compliant" FPU.

Some compilers also have a fused_multiply_add operation, and many
compilers will try to optimize a separate multiply and add into a
single multiply_add operation, and the results can be different in
each case because of the way different portions of the computation
are rounded and completed.

IEEE-754 has a lot of issues. It is unsuitable for high-precision
numerical work, but is good enough to get you in the ball-park.
But all truly detailed (often arbitrary precision) work will use
a different software-based engine that bypasses the shortcomings
of IEEE-754 in favor of accuracy at the sacrifice of much speed.

My goal in creating a new numeric processing unit is to overcome
those issues by allowing for arbitrary precision computation in
hardware, and to define a standard that does not allow for any
ambiguity in implementation. Corner cases and results will be
explicitly defined, for example, so that any compliant
implementation will produce identical results.

It's part of my long-term goal and Arxoda CPU project.

--
Rick C. Hodgin

Bonita Montero

Mar 28, 2018, 9:32:27 AM
> But you can write a Java-based compute engine, and a C++-base
> compute engine, and they may not produce the same results using
> IEEE-754 because they have different mechanisms to process data
> using the same IEEE-754 "compliant" FPU.

That's irrelevant because Java is a different language with a
different behaviour.  And Java doesn't even claim to be
IEEE-754-compliant.

> Some compilers also have a fused_multiply_add operation, and many
> compilers will try to optimize the multiple, add into a single
> multiply_add operation, and the results can be different on each
> because of the way different portions of the computation are
> completed.

FMA produces the same results as explicit instructions, with some
special behaviour when some operands are -0, Inf or NaN.

> It is unsuitable for high-precision numerical work, ...

That's not true. You can control how the FPU behaves through
instructing the compiler or the FPU control word.

David Brown

Mar 28, 2018, 10:38:30 AM
On 28/03/18 14:45, Rick C. Hodgin wrote:
> On Wednesday, March 28, 2018 at 8:30:37 AM UTC-4, Bonita Montero wrote:
>>> There are many problems with IEEE-754. Different architectures
>>> aren't even required to produce the same result.
>>
>> You can tune each compiler and FPU (through setting the control word)
>> so that you can have identical results on different platforms. On x87
>> this will result in slower code to chop results but with SSE and AVX
>> the code won't run slower.
>
> The compiler enables you to overcome limitations and variations
> allowed for by the IEEE-754 standard so as to obtain reproducible
> results across platforms ... in that compiler, and possibly to a
> (or an implicit) C++ standard observed by various compiler authors.
>
> But you can write a Java-based compute engine, and a C++-base
> compute engine, and they may not produce the same results using
> IEEE-754 because they have different mechanisms to process data
> using the same IEEE-754 "compliant" FPU.
>
> Some compilers also have a fused_multiply_add operation, and many
> compilers will try to optimize the multiple, add into a single
> multiply_add operation, and the results can be different on each
> because of the way different portions of the computation are
> completed.

For most floating point work, that is absolutely fine. Doubles give you
about 16 significant digits. That is high enough resolution to measure
the distance across the USA to the nearest /atom/. Who cares if an atom
or two gets lost in the rounding when you do some arithmetic with it?

Yes, there are some uses of floating point where people want replicable
results across different machines (or different implementations -
software, FPU, SIMD, etc. on the same machine). But these are rarities,
and IEEE and suitable compilers support them. It is not uncommon to
have to use slower modes and drop optimisation (x * y is no longer
always y * x) - for the best guarantees of repeatability, use a software
floating point library.

In most cases, however, it is fine to think that floating point
calculations give a close but imprecise result. Simply assume that you
will lose a bit of precision for each calculation, and order your code
appropriately. (Fused multiply-add, and other such optimisations,
reduce the precision loss.)
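
A minimal sketch of what "order your code appropriately" can mean in
practice (adding small terms before a huge one instead of after it):

    #include <cstdio>

    int main()
    {
        double big = 1.0e16;    // 1 ULP here is 2.0

        // Largest first: each 1.0 is only half a ULP of big, and the
        // round-to-even tie sends every sum straight back to big, so
        // the 1.0s vanish.
        double largest_first = big;
        for (int i = 0; i < 10000; ++i)
            largest_first += 1.0;

        // Smallest first: the 1.0s accumulate exactly, then survive the
        // final addition to big.
        double smallest_first = 0.0;
        for (int i = 0; i < 10000; ++i)
            smallest_first += 1.0;
        smallest_first += big;

        std::printf("largest first : %.1f\n", largest_first);
        std::printf("smallest first: %.1f\n", smallest_first);
        return 0;
    }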

Sometimes you need to do a lot of calculations, or you need to mix
wildly different sizes of numbers (such as for big numerical
calculations, if you can't order them appropriately) - there are
use-cases for 128-bit IEEE numbers. (256-bit IEEE is also defined, but
rarely used.)

And there are use-cases for arbitrary precision numbers too. These are
more typically integers than floating point.

>
> IEEE-754 has a lot of issues. It is unsuitable for high-precision
> numerical work, but is good enough to get you in the ball-park.
> But all truly detailed (often arbitrary precision) work will use
> a different software-based engine that bypasses the shortcomings
> of IEEE-754 in favor of accuracy at the sacrifice of much speed.

Do you have any references or statistics to back this up? For most
uses, AFAIK, strict IEEE-754 is used when you favour repeatability (not
accuracy) over speed - and something like "gcc -ffast-math" is for when
you favour speed. The IEEE formats are well established as a practical
and sensible format for floating point that cover a huge range of
use-cases. The situations where they are not good enough are rare.

Rick C. Hodgin

Mar 28, 2018, 11:16:24 AM
On Wednesday, March 28, 2018 at 9:32:27 AM UTC-4, Bonita Montero wrote:
> > But you can write a Java-based compute engine, and a C++-base
> > compute engine, and they may not produce the same results using
> > IEEE-754 because they have different mechanisms to process data
> > using the same IEEE-754 "compliant" FPU.
>
> That's irrelevant because Java is a different language with a
> different behaviour. And Java even doesn't claim to be IEEE-754
> -compliant.

Java running on the same computer using the same sequence of math
ops can produce a different total than the same sequence coded in C++.
It is not supposed to, but C++ will inject periodic stores and reloads
to ensure proper rounding at times.

This means the IEEE-754 "standard" allows for variability in
computation based on when certain steps are performed.  It is not
mathematically accurate, nor is it required to produce identical
results across architectures.

> > Some compilers also have a fused_multiply_add operation, and many
> > compilers will try to optimize the multiple, add into a single
> > multiply_add operation, and the results can be different on each
> > because of the way different portions of the computation are
> > completed.
>
> FMA produces the same results as expliticit instructoins with
> some special behaviour when some operands are -0, Inf or NaN.

See section 2.3.  FMA performs one rounding.  A separate multiply
and add performs two.  The results are different:

2.3. The Fused Multiply-Add (FMA)
http://docs.nvidia.com/cuda/floating-point/index.html
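
A small illustration of that single rounding, using std::fma from
<cmath> (compile with FP contraction disabled, e.g. -ffp-contract=off,
so the compiler doesn't fuse the plain expression itself):

    #include <cmath>
    #include <cstdio>

    int main()
    {
        double a = 1.0 + 1.0 / 3.0;    // a*a is not exactly representable
        double p = a * a;              // product rounded once

        double separate = a * a - p;          // multiply rounds first: exactly 0
        double fused    = std::fma(a, a, -p); // one rounding: the error of a*a

        std::printf("separate: %g\n", separate);   // 0
        std::printf("fused:    %g\n", fused);      // small non-zero value
        return 0;
    }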

> > It is unsuitable for high-precision numerical work, ...
>
> That's not true. You can control how the FPU behaves through
> instructing the compiler or the FPU control word.

But you cannot guarantee the same operation across platforms.

IEEE-754 is not a true standard as it leaves some wriggle room in
actual implementation. And compilers often introduce optimizations
which change the result:

https://www.nccs.nasa.gov/images/FloatingPoint_consistency.pdf

John Gustafson also talks about it in his Stanford talk posted above.

--
Rick C. Hodgin

Rick C. Hodgin

Mar 28, 2018, 11:27:33 AM
On Wednesday, March 28, 2018 at 10:38:30 AM UTC-4, David Brown wrote:
> On 28/03/18 14:45, Rick C. Hodgin wrote:
> > IEEE-754 has a lot of issues. It is unsuitable for high-precision
> > numerical work, but is good enough to get you in the ball-park.
> > But all truly detailed (often arbitrary precision) work will use
> > a different software-based engine that bypasses the shortcomings
> > of IEEE-754 in favor of accuracy at the sacrifice of much speed.
>
> Do you have any references or statistics to back this up? For most
> uses, AFAIK, strict IEEE-754 is used when you favour repeatability (not
> accuracy) over speed - and something like "gcc -ffast-math" is for when
> you favour speed.

Weather modeling uses the double-double and quad-double libraries,
which leverage x87 hardware to chain operations to create 128-bit,
and 256-bit computations.

http://crd-legacy.lbl.gov/~dhbailey/mpdist/

MPFR maintains a list of citations using their work:

http://www.mpfr.org/pub.html

And John Gustafson discusses the shortcomings in the Stanford video
I posted above, which you didn't watch.

--
Rick C. Hodgin

Bonita Montero

Mar 28, 2018, 11:59:25 AM
> Java running on the same computer using the same sequence of math
> ops can produce a different total than the same sequence coded in C++.

Where's the problem? Java is a different language and may have a
different arithmetic behaviour.

> It is not supposed to, but C++ will inject periodic stores and reloads
> to ensure proper rounding at times.

Only with the x87-FPU.  With SSE or AVX, there are explicit operations
with less precision.  And other architectures also have such
instructions.

> This means the IEEE-754 "standard" allows for variability
> in computationn based on when certain steps are performed.

But a consistent behaviour, as one would naively expect, can be
configured.

> It is not mathematically accurate, nor is it required to
> produce identical results across architectures.

You can easily configure different compilers of the same language
on different architectures to give identical binary results.

>> FMA produces the same results as expliticit instructoins with
>> some special behaviour when some operands are -0, Inf or NaN.

> See section 2.3. FMA performs one round. Separate multiply-add
> perform two. The results arwe different:
> 2.3. The Fused Multiply-Add (FMA)
> http://docs.nvidia.com/cuda/floating-point/index.html

NVidia GPUs aren't general purpose CPUs. And for the purpose these
GPUs are designed, this behaviour isn't a restriction.

>> That's not true. You can control how the FPU behaves through
>> instructing the compiler or the FPU control word.

> But you cannot guarantee the same operation across platforms.

It can be configured so.

Bonita Montero

Mar 28, 2018, 12:04:13 PM
> Weather modeling uses the double-double and quad-double libraries,
> which leverage x87 hardware to chain operations to create 128-bit,
> and 256-bit computations.

I'll bet my right hand that weather modelling usually isn't done
with 128- or 256-bit FP, as these operations are extremely slow
when done in software.
If there were noteworthy demand for 128- or 256-bit FP, there would
be hardware support in many CPUs.

Scott Lurndal

Mar 28, 2018, 12:13:29 PM
Bonita Montero <Bonita....@gmail.com> writes:
>> Weather modeling uses the double-double and quad-double libraries,
>> which leverage x87 hardware to chain operations to create 128-bit,
>> and 256-bit computations.
>
>I'll bet my right hand that weather modelling usually isn't done
>with 128- or 256-bit FPs as these operations are ultimatively slow
>when done in software.

A large fraction of it is either done using SIMD operations or GPU (OpenCL/Cuda).

Machine learning is leaning towards highly-parallel (SIMD) 16-bit FP.

Rick C. Hodgin

Mar 28, 2018, 12:27:14 PM
On Wednesday, March 28, 2018 at 11:59:25 AM UTC-4, Bonita Montero wrote:
> > Java running on the same computer using the same sequence of math
> > ops can produce a different total than the same sequence coded in C++.
>
> Where's the problem? Java is a different language and may have a
> different arithmetic behaviour.

That is the problem. It's using the same IEEE-754 engine under the
hood. The result of a calculation using the same source code copied
from a C++ app into a Java app (something like "c = a + b", but more
complex) can produce different results in the Java app than it can
in the C++ app, or vice-versa.

This is directly the result of the IEEE-754 standard being insufficient
in and of itself to perform math properly. It requires coddling to get
it to produce identical results across platforms (and even sometimes on
the same machine across processes which are scheduled on different
cores).

> > It is not supposed to, but C++ will inject periodic stores and reloads
> > to ensure proper rounding at times.
>
> Only with the x87-FPU. With SSE or AVX, there are explicit operations
> with less precsion. And other architectures behave also such instruc-
> tions.

Various SIMD extensions have improved upon the issue relative to
the x87 FPU, but the results still remain. You cannot guarantee
that an ARM-based CPU will produce the same result as an AMD64-
based CPU if they both use the same sequence of operations. You
have to manually validate it, and that's the issue being discussed
by math-based hardware people and IEEE-754, and specifically at
the present time, John Gustafson.

> > This means the IEEE-754 "standard" allows for variability
> > in computationn based on when certain steps are performed.
>
> But a consistent behaviour as one would naively would expect
> can be configured.

Not without manual effort. You can't take a CPU that sports a fully
IEEE-754 compliant FPU and guarantee the results will be the same.
They likely will be, or as David points out, will be close enough
that no one would ever see the variance ... but I'm talking about
the actual variance which is there.

In the last few bits, rounding at various levels enters in and it
does change the result. It may have no real impact on the final
result rounded to four base-10 decimal places, but if you look at
the result the hardware computed, it is (can be) different.

> > It is not mathematically accurate, nor is it required to
> > produce identical results across architectures.
>
> You can easily configure different compilers of the same language
> on different architectures to give identical binary results.

This has not been my experience. I have listened to many lectures
where they state this is one of the biggest problems with IEEE-754.
In fact, one of the authors of Java was absolutely stunned to learn
that IEEE-754 on one machine may produce different results than the
same series of calculations on another. It floored him that such a
result was possible in a "standard" such as IEEE-754:

https://people.eecs.berkeley.edu/~wkahan/JAVAhurt.pdf

> >> FMA produces the same results as expliticit instructoins with
> >> some special behaviour when some operands are -0, Inf or NaN.
>
> > See section 2.3. FMA performs one round. Separate multiply-add
> > perform two. The results arwe different:
> > 2.3. The Fused Multiply-Add (FMA)
> > http://docs.nvidia.com/cuda/floating-point/index.html
>
> NVidia GPUs aren't general purpose CPUs. And for the purpose these
> GPUs are designed, this behaviour isn't a restriction.

The document I posted outlines the issue. It is prevalent in any
implementation of the IEEE-754-2008 FMA extension. The operation
requires only one rounding, whereas separate multiply-add ops will
have two. This invariably produces different results, be it on a
GPU, CPU, or with trained mice mimicking FPU gates switching on
and off in a laboratory.

FMA is a different mathematical operation than two separate multiply
and add operations.  That's the end of that discussion.

The results from FMA are typically better than those of the two
separate multiply add operations, but the results are different.
One of the biggest arguments is whether or not a compiler should
be able to replace a multiply followed by an add with a single
fused_multiply_add operation. Most people think it shouldn't,
but to the compiler author, and in a pure mathematical sense,
it should not make any difference, so they optimize in that way.
But it does make a difference, and they shouldn't.

Some modern compilers are wising up to that realization. Intel's
compiler, for example, will only use FMA under certain flags for
optimization.

> >> That's not true. You can control how the FPU behaves through
> >> instructing the compiler or the FPU control word.
>
> > But you cannot guarantee the same operation across platforms.
>
> It can be configured so.

It cannot be guaranteed. You can work around it. You can wriggle
your way into it, but you cannot know with certainty if a particular
IEEE-754 compliant FPU will produce the same result on architectures
A, B, C, and D, without testing them.

IIRC, even some Pentium III-era and older CPUs produce different
results than modern Intel-based hardware (not due to the Pentium
FDIV bug, but due to the way Intel removed "legacy DOS support" in
the Pentium 4 and later).  You must now explicitly enable backward
compatibility flags in the Pentium 4's x87 FPU and later CPUs to get
the same results in your debugger as you would've observed previously.

Such things have been altered for performance, but they impact the
fundamental operations of the x87 FPU.

With all things floating point, you must rigorously test things to
see if they are the same across platforms, and even arguably revisions
within platforms.

--
Rick C. Hodgin

Rick C. Hodgin

Mar 28, 2018, 12:33:56 PM
On Wednesday, March 28, 2018 at 12:04:13 PM UTC-4, Bonita Montero wrote:
> > Weather modeling uses the double-double and quad-double libraries,
> > which leverage x87 hardware to chain operations to create 128-bit,
> > and 256-bit computations.
>
> I'll bet my right hand that weather modelling usually isn't done
> with 128- or 256-bit FPs as these operations are ultimatively slow
> when done in software.

The QD library I cited above was created to use x87-based hardware
to allow for fast 128-bit and 256-bit computation.

This information came to me first-hand during an interview with
John Stone who leases time on the supercomputers in Urbana, IL.

> If there would be a noteworthy demand for 128- or 256-bit FP, there
> would be a hardware-support by many CPUs.

In a discussion today on this subject on comp.arch, Terje Mathisen
described an aid that could be introduced into hardware to make
software-based 128-bit and 256-bit compute go faster:

https://groups.google.com/d/msg/comp.arch/igzzuO9cwwM/lzqZ5KROAAAJ

You should download and benchmark that 128-bit and 256-bit QD
library, and compare it to the same size operations in MPFR. You
will be amazed how fast it is.

In 2010 when I was doing extensive large FPU computation, I was
using MPFR for many months. I interviewed John Stone at that time
and mentioned my work (as I was using a widely parallel math system
that I commented would work really well on the supercomputers there).
He asked me what I was using and I told him. He said most people he
knows use the QD library. I downloaded it and tried it and it was
notably faster, something like 6x to 10x faster IIRC.

He said it was expressly developed or enhanced to work with the
weather modeling done on supercomputers because it uses the FPU
hardware but is more precise than even the 80-bit formats. It
uses the 64-bit double format and spreads the mantissa bits across
two 64-bit quantities for 128-bit, and across four for 256-bit.
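
A minimal sketch of the idea underneath double-double arithmetic (the
classic error-free TwoSum step; this is not the QD library's actual
code, and it assumes strict IEEE double arithmetic, i.e. no
-ffast-math):

    #include <cstdio>

    // Returns the rounded sum in 'hi' and the exact rounding error in
    // 'lo', so that hi + lo == a + b exactly (Knuth's TwoSum).
    static void two_sum(double a, double b, double &hi, double &lo)
    {
        hi = a + b;
        double v = hi - a;
        lo = (a - (hi - v)) + (b - v);
    }

    int main()
    {
        double hi, lo;
        two_sum(1.0e16, 3.14159, hi, lo);
        // hi holds the double-rounded sum; lo holds the low-order bits
        // that a plain double addition would have discarded.
        std::printf("hi = %.17g\nlo = %.17g\n", hi, lo);
        return 0;
    }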

Search for "QD":
http://crd-legacy.lbl.gov/~dhbailey/mpdist/

--
Rick C. Hodgin

asetof...@gmail.com

Mar 28, 2018, 1:24:12 PM
I like fixed-point, but I don't have much experience with whether it
is OK.  At first glance it seems better than any IEEE float
implementation.  To implement it, it is enough that the CPU can do
operations on unsigned integers, as the usual ones in x86 can.

Hergen Lehmann

Mar 28, 2018, 2:00:22 PM
Fixed point arithmetic is fine, if your application has a predictable
and rather limited value range.

However, proper overflow handling will be a pain in the ass (many CPUs
do not even generate an exception for integer overflows!). And more
complex operations like logarithms or trigonometric functions will be
slow without the help of the FPU.
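
For reference, a minimal Q16.16 fixed-point multiply sketch (the
format, names and overflow policy are illustrative only; it also
assumes arithmetic right shift of negative values, which C++20
guarantees):

    #include <cstdint>
    #include <cstdio>

    // Q16.16: 16 integer bits, 16 fractional bits, stored in int32_t.
    using q16_16 = int32_t;

    q16_16 from_double(double v) { return (q16_16)(v * 65536.0); }
    double to_double(q16_16 v)   { return v / 65536.0; }

    // Multiply via a 64-bit intermediate; returns false on overflow.
    bool mul(q16_16 a, q16_16 b, q16_16 &out)
    {
        int64_t wide = (int64_t)a * (int64_t)b;  // Q32.32 product
        wide >>= 16;                             // back to Q16.16 (truncates)
        if (wide > INT32_MAX || wide < INT32_MIN)
            return false;                        // result out of range
        out = (q16_16)wide;
        return true;
    }

    int main()
    {
        q16_16 r;
        if (mul(from_double(1.5), from_double(-2.25), r))
            std::printf("%f\n", to_double(r));   // prints -3.375000
        return 0;
    }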

Gareth Owen

Mar 28, 2018, 2:00:59 PM
Bonita Montero <Bonita....@gmail.com> writes:

>> Weather modeling uses the double-double and quad-double libraries,
>> which leverage x87 hardware to chain operations to create 128-bit,
>> and 256-bit computations.
>
> I'll bet my right hand that weather modelling usually isn't done
> with 128- or 256-bit FPs as these operations are ultimatively slow
> when done in software.

There's very little benefit in keeping precision that's that many
orders of magnitude more precise than the measurements driving the
model -- doubly so when the process you're modelling is the poster
child for sensitive dependence on those initial conditions.

Back when I did a bit of it professionally, we'd run the same models
on multiple architectures, and the variation in FP behaviour was just
another way to add to the ensembles.

asetof...@gmail.com

Mar 28, 2018, 2:08:22 PM
I don't agree with a word you say.
The real problem for error is IEEE floating point, not fixed point,
because in IEEE float the error depends on the size of the number too.

Mr Flibble

Mar 28, 2018, 4:13:03 PM
On 27/03/2018 19:22, Rick C. Hodgin wrote:
> On Tuesday, March 27, 2018 at 2:08:05 PM UTC-4, Mr Flibble wrote:
>> On 27/03/2018 15:26, Rick C. Hodgin wrote:
>>> I've been thinking about John Gustafson's unums, and something occur-
>>> red to me. All of our IEEE-754 floating point values are stored with
>>> either an explicit-1 or implicit-1, with the mantissa bits being
>>> ascribed to the bits after that 1, moving away from the left.
>>>
>>> What if we did it differently?
>>
>> The only thing wrong with IEEE-754 is allowing a representation of
>> negative zero. There is no such thing as negative zero.
>
> There are many problems with IEEE-754.

IEEE-754 serves its purpose quite adequately, as evidenced by its
successful pervasiveness.  If there were a significantly better way
of doing things we would be doing it that way by now, modulo
Gustafson's promising unums (rather than any amateurish God-bothering
alternative).

Rick C. Hodgin

Mar 28, 2018, 4:19:46 PM
On Wednesday, March 28, 2018 at 4:13:03 PM UTC-4, Mr Flibble wrote:
> On 27/03/2018 19:22, Rick C. Hodgin wrote:
> > On Tuesday, March 27, 2018 at 2:08:05 PM UTC-4, Mr Flibble wrote:
> >> The only thing wrong with IEEE-754 is allowing a representation of
> >> negative zero. There is no such thing as negative zero.
> >
> > There are many problems with IEEE-754.
>
> IEEE-754 serves its purpose quite adequately evidenced by its successful
> pervasiveness. If there was a significantly better way of doing things
> we would be doing it that way by now, modulo Gustafson's promising unums

There is inertia at this point, and the transformation from IEEE-754
to another format is a lot different than moving from FPUs to SIMD.
It will require a major overhaul of many apps to be able to handle
the new format.

It's not as easy a sell as you might think.

--
Rick C. Hodgin

Mr Flibble

Mar 28, 2018, 4:33:55 PM
Read my reply again: I used the word "pervasiveness", and what I
didn't say is that moving away from the status quo would be "an easy
sell".  Changing from something that is pervasively bedded in is
never easy.

Rick C. Hodgin

Mar 29, 2018, 8:46:24 AM
On Wednesday, March 28, 2018 at 2:00:59 PM UTC-4, gwowen wrote:
> Bonita Montero <Bonita....@gmail.com> writes:
>
> >> Weather modeling uses the double-double and quad-double libraries,
> >> which leverage x87 hardware to chain operations to create 128-bit,
> >> and 256-bit computations.
> >
> > I'll bet my right hand that weather modelling usually isn't done
> > with 128- or 256-bit FPs as these operations are ultimatively slow
> > when done in software.
>
> There's very little benefit in keeping precisions that's that many
> orders of magnitudes more precise than the measurements driving the
> model -- doubly so when the process your modelling is the poster child
> for sensitive dependence on those initial conditions.

The initial value will be whatever the source data provides, but
when you are dealing with calculations in series it is desirable
to carry your work out to many more decimal places, so that the bit
loss accumulated in each calculation (where you lose significant
bits) does not consume your desired significant bits at the level of
your original source data.

Your calculation shouldn't be a limiting factor in precision, even
if your source data may be the limiting factor.

128 to 256 bits can handle nearly everything we need.  But not in
all cases, which is why I think it's important to provide an
arbitrary precision numeric processing engine in a CPU, so that the
limitation on utility of use is not in the hardware.  And, if done
correctly, it will be faster than software libraries which do the
same.  And, as Terje Mathisen points out in the comp.arch thread,
a few new hardware features might let a software version drive the
compute ability, so the numeric processor would not have to do all
the work itself, but only be flexible enough to easily allow the
software to use it for arbitrary precision calculations.

> Back when I did a bit of it professionally, we'd run the same models and
> multiple architectures, and the variation in FP behaviour was just
> another way to add the ensembles.

What does this mean?

--
Rick C. Hodgin

David Brown

Mar 29, 2018, 1:38:03 PM
On 28/03/18 17:27, Rick C. Hodgin wrote:
> On Wednesday, March 28, 2018 at 10:38:30 AM UTC-4, David Brown wrote:
>> On 28/03/18 14:45, Rick C. Hodgin wrote:
>>> IEEE-754 has a lot of issues. It is unsuitable for high-precision
>>> numerical work, but is good enough to get you in the ball-park.
>>> But all truly detailed (often arbitrary precision) work will use
>>> a different software-based engine that bypasses the shortcomings
>>> of IEEE-754 in favor of accuracy at the sacrifice of much speed.
>>
>> Do you have any references or statistics to back this up? For most
>> uses, AFAIK, strict IEEE-754 is used when you favour repeatability (not
>> accuracy) over speed - and something like "gcc -ffast-math" is for when
>> you favour speed.
>
> Weather modeling uses the double-double and quad-double libraries,
> which leverage x87 hardware to chain operations to create 128-bit,
> and 256-bit computations.
>

That's fine - but weather modelling is not an example of "most uses".
As I said, some cases demand more from their numerics, but /most/ do
not.

> http://crd-legacy.lbl.gov/~dhbailey/mpdist/
>
> MPFR maintains a list of citations using their work:
>
> http://www.mpfr.org/pub.html
>
> And John Gustafson discusses the shortcomings in his Stanford video
> you didn't watch I posted above.
>

I haven't bothered watching the video. I know that some people like
them as a way of learning about a topic, but I usually find them
extremely inefficient compared to a webpage or a paper that I can read
at my own pace, spending more time on the parts I find difficult or
particularly interesting.

I have read a little of Gustafson's stuff on unums. I am not
particularly impressed. As far as I can tell, it seems to consist of
two ideas - making the number of significant bits variable, and allowing
for a limited form of ranges. A variable number of bits makes hardware
and software implementations painful and inefficient.  Until you are
getting to very large sizes, such as beyond octuple-precision IEEE
(256 bits, or 32 bytes), the simplicity of always dealing in a known
fixed size outweighs the extra calculations needed when a smaller bit
length would be sufficient.  The use of ranges sounds like a reasonable
idea, but is only partly handled, and again it is more efficient to know
that you always have a range pair rather than having to figure it out
for each operation.

There are definitely uses for other formats than standard floating
point, or where you want to track ranges, error sizes, etc., as part of
your number types. But any attempt at making a "universal number
format" is guaranteed to fail - anyone who thinks they have made one, is
wrong. IEEE floating point formats and calculation standards are a
solid compromise between working for a wide range of uses and being
efficient to implement, and have stood the test of time. They don't
cover all applications, but no alternate format would do so either.

Gareth Owen

Mar 29, 2018, 2:49:08 PM
"Rick C. Hodgin" <rick.c...@gmail.com> writes:

> On Wednesday, March 28, 2018 at 2:00:59 PM UTC-4, gwowen wrote:
>> Bonita Montero <Bonita....@gmail.com> writes:
>>
>> >> Weather modeling uses the double-double and quad-double libraries,
>> >> which leverage x87 hardware to chain operations to create 128-bit,
>> >> and 256-bit computations.
>> >
>> > I'll bet my right hand that weather modelling usually isn't done
>> > with 128- or 256-bit FPs as these operations are ultimatively slow
>> > when done in software.
>>
>> There's very little benefit in keeping precisions that's that many
>> orders of magnitudes more precise than the measurements driving the
>> model -- doubly so when the process your modelling is the poster child
>> for sensitive dependence on those initial conditions.
>
> The initial value will be whatever the source data provides, but
> when you are dealing with calculations in series it is desirable
> to conduct your work out to a much greater decimal place, so that
> in each calculation (where you lose significant bits), that bit
> loss accumulation does not consume your desired significant bits
> at the level of your original source data.
>
> Your calculation shouldn't be a limiting factor in precision, even
> if your source data may be the limiting factor.

That's why I said "many orders of magnitude".  You need more
precision, but you don't need 64 (or more) extra bits of it.

>> Back when I did a bit of it professionally, we'd run the same models and
>> multiple architectures, and the variation in FP behaviour was just
>> another way to add the ensembles.
>
> What does this mean?

Well the first bit means "I used to work running weather modelling programs".

So listen up.

Due to the non-linear nature of fluid dynamics, the equations have
sensitive dependence on initial conditions.

So, imagine you start with (say) a sea-surface temperature of 3.0
degrees at some point/area of your grid and run the code.  Of course
no-one measures temperatures to 3 decimal places - it's idiotic -
precision without accuracy.  And even if you had perfect information,
you're going to have to discretize that onto a fairly coarse grid to
model it, and now you've lost all the precision that no-one collects
in the first place.

Now imagine your discretisation/interpolation changes so you have an SST
of 3.05 degrees in that area (which would be just a different
interpolation of your temporally- and geographically-sparse
observation). You run the code again, and in just a few days or weeks of
time (in the model, i.e. predicting a few weeks ahead) the results have
diverged *qualitatively* from your former run. This is exactly what is
referred to as the butterfly effect, which was first described in
weather modelling.

But this sensitivity isn't just on initial conditions, it's also on
rounding. So you use the same inputs, and a different rounding mode --
or a processor that uses a weird 80-bit FPU -- and in a finite amount of
time, your results are quantitatively different.

So to get a good idea of what might happen you run the model multiple
times with similar-but-different initial conditions, and on
similar-but-different FPUs. This collection of models is called an
ensemble, and they are "averaged" in some sense to give an overall
estimate of what will happen, and bounds on how wrong that prediction is
likely to be.

Rick C. Hodgin

Mar 29, 2018, 3:13:23 PM
On Thursday, March 29, 2018 at 2:49:08 PM UTC-4, gwowen wrote:
> "Rick C. Hodgin" <rick.c...@gmail.com> writes:
>
> > On Wednesday, March 28, 2018 at 2:00:59 PM UTC-4, gwowen wrote:
> >> Bonita Montero <Bonita....@gmail.com> writes:
> >>
> >> >> Weather modeling uses the double-double and quad-double libraries,
> >> >> which leverage x87 hardware to chain operations to create 128-bit,
> >> >> and 256-bit computations.
> >> >
> >> > I'll bet my right hand that weather modelling usually isn't done
> >> > with 128- or 256-bit FPs as these operations are ultimatively slow
> >> > when done in software.
> >>
> >> There's very little benefit in keeping precisions that's that many
> >> orders of magnitudes more precise than the measurements driving the
> >> model -- doubly so when the process your modelling is the poster child
> >> for sensitive dependence on those initial conditions.
> >
> > The initial value will be whatever the source data provides, but
> > when you are dealing with calculations in series it is desirable
> > to conduct your work out to a much greater decimal place, so that
> > in each calculation (where you lose significant bits), that bit
> > loss accumulation does not consume your desired significant bits
> > at the level of your original source data.
> >
> > Your calculation shouldn't be a limiting factor in precision, even
> > if your source data may be the limiting factor.
>
> That's why I said "many orders of magnitudes". You need more precision,
> but you don't need about 64 (or more) bits of precision.

I think you need 1 bit per calculation, because every calculation
will potentially introduce new rounding and decrease your precision.

If you take a value through 64 separate calculation operations to
arrive at the final value, you need the extra 64 bits to start with.
Of course, as you say, it will greatly exceed the accuracy of your
starting values, but that's a separate issue. Were you to improve
your data set or modeling ability, then by having the extra precision
already in there, nothing else would have to be changed. Each of
the calculations would proceed as they did before, and now you'd have
a better model.

Understood.  I thought it was something like that from your use of
the word "ensemble," but I didn't see how you were generating those
ensembles.  I presume now there are many models which all get input
and weight-averaged to produce a "central result," which is the one
broadcast on the evening news that night. :-)

--
Rick C. Hodgin

Christian Gollwitzer

Mar 29, 2018, 5:13:06 PM
Am 29.03.18 um 21:13 schrieb Rick C. Hodgin:
> I think you need 1 bit per calculation, because every calculation
> will potentially introduce new rounding and decrease your precision.
>
> If you take a value through 64 separate calculation operations to
> arrive at the final value, you need the extra 64 bits to start with.

No, this is nonsense.  Look up "error propagation" or "uncertainty
propagation".  There is a simple textbook model for incorporating
uncertainties of values into computations: calculate the quadratic
sum of the input errors times the partial derivatives.  Simple
examples for demonstration are catastrophic cancellation and square
roots.
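
In symbols: for f(x_1, ..., x_n) with uncorrelated input
uncertainties \sigma_{x_i}, the standard first-order formula is

    \sigma_f^2 \approx \sum_i \left( \frac{\partial f}{\partial x_i} \right)^2 \sigma_{x_i}^2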

Catastrophic cancellation means that you subtract two close values and
get much less precision. E.g. if you know
a=3.05 +/- 0.01, b= 3.00 +/- 0.01
then
c = a - b = 0.05 +/- 0.014
So even if you knew a and b to 0.3% precision, a-b is known only to
roughly 30% precision.  That's a loss of about 6 bits of precision!
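
A direct numeric illustration of the cancellation in double precision
(the values are arbitrary):

    #include <cstdio>

    int main()
    {
        // Two values that agree in their leading digits...
        double a = 1.0000001;
        double b = 1.0000000;

        // ...cancel when subtracted.  The subtraction itself is exact,
        // but the representation error of 'a' (up to about 1e-16
        // absolute) is now huge relative to the ~1e-7 result, so only
        // roughly 9 of the printed digits still reflect the intended
        // decimal inputs.
        double c = a - b;
        std::printf("a - b = %.17g\n", c);
        return 0;
    }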

OTOH if you take a square root, the number of significant figures
increases by 1 bit. Yes - the square root is more precise than the
radicand, which can be seen by expanding (1+epsilon)^(1/2) into a series.

What Gareth describes is known as "Monte Carlo error propagation" in
metrology, and it is more accurate than the simple model, because it
doesn't assume which statistical distribution the errors follow.
Another way is to use the textbook model or special rounding rules to
propagate the uncertainty along with the data - this is known as
interval arithmetic.

Christian

Rick C. Hodgin

Mar 29, 2018, 5:33:47 PM
On Thursday, March 29, 2018 at 5:13:06 PM UTC-4, Christian Gollwitzer wrote:
> Am 29.03.18 um 21:13 schrieb Rick C. Hodgin:
> > I think you need 1 bit per calculation, because every calculation
> > will potentially introduce new rounding and decrease your precision.
> >
> > If you take a value through 64 separate calculation operations to
> > arrive at the final value, you need the extra 64 bits to start with.
>
> No, this is nonsense.

The result of a floating point calculation cannot always be
represented exactly.  You take two values of 53-bit mantissa
precision and perform some operation on them, and you then have a
result where the 53rd bit is no longer certain.  It has been rounded
to the nearest bit, resulting in at most half a ULP of error, but
that last bit is no longer reliable.  You're now down to 52.  Repeat
again, and that 53rd bit now affects the result in the 52nd position,
and both 52 and 53 are now uncertain, leaving you 51 bits.

And it continues.

It's a side-effect of the last digit rounding on an FPU. The hardware
designers have actually gone to great lengths to maintain greater bits
internally so that when you perform a few calculations in succession,
the rounded bits are beyond the ones that can be stored, therefore the
final bits written out are all correct.

Another such example is the fused_multiply_add operation introduced in
IEEE-754-2008. It allows a multiply and an add to take place without
intermediate rounding. This results in a value using FMA which will
nearly always be different than if multiply, then add operations were
performed in serial.

Floating point is a good approximation of real numbers, but it has
many flaws. If you want to maintain precision over many successive
calculations in series, you need to have high precision computation
so the ULPs being affected at each stage are way out there, and not
close to your target bit requirement for your required precision
when converted to base-10.
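
A simple way to see that accumulation (single vs. double precision;
the exact totals vary with platform and compiler flags):

    #include <cstdio>

    int main()
    {
        // Sum 0.1 ten million times.  The exact answer is 1,000,000.
        // Single precision drifts visibly; double precision stays much
        // closer (though still not exact, since 0.1 itself is not
        // exactly representable in binary).
        float  fsum = 0.0f;
        double dsum = 0.0;
        for (int i = 0; i < 10000000; ++i) {
            fsum += 0.1f;
            dsum += 0.1;
        }
        std::printf("float : %.3f\n", fsum);
        std::printf("double: %.3f\n", dsum);
        return 0;
    }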

> Lookup "error propagation" or "uncertainty
> propagation". There is a simple textbook model to incorporate
> uncertainties of values into computations, by calculating a quadratic
> sum of the input errors times the partial derivatives. Simple examples
> for demonstration are catastrophic cancellation or square roots.
>
> Catastrophic cancellation means that you subtract two close values and
> get much less precision. E.g. if you know
> a=3.05 +/- 0.01, b= 3.00 +/- 0.01
> then
> c = a - b = 0.05 +/- 0.014
> SO even if ou knew a and b to 0.3% precision, a-b is known only to 20%
> precision. That's a loss of 6 bits of precision!
>
> OTOH if you take a square root, the number of significant figures
> increases by 1 bit. Yes - the square root is more precise than the
> radicand, which can be seen by expanding (1+epsilon)^(1/2) into a series.
>
> What Gareth describes is known as "Monte-Carlo error propagation" in
> metrology, and it is more accurate than the simple model, because it
> doesn't assume which statistical distribution the errors follow. Another
> way is to use the textbook model or special rounding rules to propagate
> the uncertainty with the data - this is known as interval arithmetics.

I'm only speaking about FPU hardware losses of precision.  The last
bit in a computation is called the ULP (unit in the last place).
One half of a ULP is potentially lost with each calculation,
depending on whether the result can be represented exactly or not.

https://en.wikipedia.org/wiki/Unit_in_the_last_place
https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html

Once you lose the guarantee of reliability of the last bit, the
next calculation cannot be accurate to the bit before that, and
so on.

In many cases, however, the rounding will round up here, and down
there, resulting in a reasonable answer out to most mantissa bits.
But you can construct purposeful series that will demonstrate the
loss significantly.

--
Rick C. Hodgin