
A right alternative to IEEE-754's format


Rick C. Hodgin

Mar 27, 2018, 10:39:22 AM
I've been thinking about John Gustafson's unums, and something
occurred to me. All of our IEEE-754 floating point values are stored
with either an explicit-1 or implicit-1, with the mantissa bits being
ascribed to the bits after that 1, moving away from the left.

What if we did it differently?

What if we stored the bits from the right (the way we would write
our own numbers, with "125" being the equivalent of "125." rather
than "1.25e+2")? This would require two additional pieces of
information: (1) the length of the "mantissa" portion in bytes, and
(2) the position of the period (which moves left from the fully
stored bit pattern). But we would gain the ability to have far
greater precision, and to have precision out beyond our need to
round. It would guarantee sufficient resolution, based on the needs
of the computation, so that rounding never produces invalid results
in our computations.

-----
It seems to me that IEEE-754 works from the perspective of a
binary 1.xxxxxx. The format I propose would be an "xxxxxxx." pattern.
The period position would indicate how far from the right to move,
such that if it were 0 it would be "xxxxxxx.", and if it were 2 it
would be "xxxxx.xx", and so on.

Instead of the explicit-1 or implicit-1 mantissa:

[sign][exp][mantissa]
[sign][exp]1.[mantissa]

It would be one of these (depending on whether or not we would
even need to store the exponent portion using this methodology):

[sign][exp][period][length][bits]
[sign] [period][length][bits]

And if we wanted to use a fixed format, the [length] field could be
removed, with the assumed bit storage coming in various sizes based
on the required resolution.

[sign][exp][period][bits]
[sign] [period][bits]

Given the nature of numbers represented in binary, a few bits could
also be reserved for special-case scenarios: repeating bit patterns
could be worked out to whatever degree of precision is required but
stored minimally, and small integers, zero, and plus and minus
infinity could all be stored with minimal bits, and so on.
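
As a rough sketch in C of how such a value might decode (the field
names and layout here are illustrative only, not a settled design):

#include <stdint.h>
#include <math.h>

/* illustrative only: one possible [sign][period][length][bits] layout */
struct rj_float {
    uint8_t   sign;    /* 0 = positive, 1 = negative */
    uint32_t  period;  /* binary-point position, counted from the right */
    uint32_t  length;  /* number of 64-bit words in bits[] */
    uint64_t *bits;    /* right-justified mantissa, bits[0] least significant */
};

/* decode to a double (lossy; a real engine would compute exactly) */
static double rj_decode(const struct rj_float *v)
{
    double acc = 0.0;
    for (uint32_t i = v->length; i-- > 0; )   /* most significant word first */
        acc = acc * 18446744073709551616.0 /* 2^64 */ + (double)v->bits[i];
    acc = ldexp(acc, -(int)v->period);        /* period 0 => "xxxxxxx." */
    return v->sign ? -acc : acc;
}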

-----
It just makes sense to me to store the data you need for the thing,
and to leave guesswork to something other than the computational
math engine of your CPU.

I think it would be desirable to integrate this kind of design
alongside traditional IEEE-754 engines, so that you have their
traditional support for fast/semi-reliable computation, but then
to add this new engine which guarantees exact value computation
(out to a rounding level), no matter how long the computation
takes or how much memory it requires.

MPFR already does this same sort of thing in software. I think
adding it to hardware would be desirable at this point in our
available transistor budgets and mature design toolsets.

--
Rick C. Hodgin

Quadibloc

Mar 27, 2018, 11:49:50 AM
Burroughs stored the bits from the right to allow integer arithmetic to
be done as well as floating-point arithmetic with only one number format.

In general, floats start from the left because that makes
multiplication simpler.

Rick C. Hodgin

Mar 27, 2018, 1:05:55 PM
On Tuesday, March 27, 2018 at 11:49:50 AM UTC-4, Quadibloc wrote:
> Burroughs stored the bits from the right to allow integer arithmetic to
> be done as well as floating-point arithmetic with only one number format.

It's a good idea, and something approaching a true universal number.

> In general, floats start from the left because that makes
> multiplication simpler.

How much silicon real-estate and how long would an arbitrary shift
take on modern process technologies?

Part of the goal of this engine would be to allow arbitrary-
precision floating point values to be computed, so, like MPFR's
engine, you'd be down to "attacking" 64 bits at a time. If you
could create an arbitrary shift loader to deal with the first
shift, and then pull in the values after that, it shouldn't be
too difficult.

--
Rick C. Hodgin

Quadibloc

Mar 27, 2018, 2:00:46 PM
On Tuesday, March 27, 2018 at 11:05:55 AM UTC-6, Rick C. Hodgin wrote:

> How much silicon real-estate and how long would an arbitrary shift
> take on modern process technologies?

You need an arbitrary shift anyways to perform addition. A barrel shifter, the
fast and efficient way to do large arbitrary shifts, does take a fair amount of
area, but so does a Wallace (or Dadda) tree multiplier.

However, variable length floating-point formats so as to maintain complete
precision... are not something that I expect would be terribly useful. I would
not oppose, say, an interpreter supporting such types in software; arbitrary-
length strings didn't slow down BASIC; but as they're not relevant to the usual
things computers do, implementing them in fast hardware... is just an invitation
to programmers to come up with new bugs that use up a computer's whole memory in
seconds.

Perhaps I'm wrong, and such features would be useful, for example, in the
context of a hardware assist for programs like Mathematica or Macsyma or Maple
or MathCAD. As these programs tend to be based on LISP, other hardware features
that assist with the speed of execution of programs in that language would also
be relevant.

John Savard

Rick C. Hodgin

Mar 27, 2018, 2:18:27 PM
Many people use MPFR. If hardware-assisted arbitrary-precision fp
were more available, I'd be willing to bet more people would use it.
I know the Double-double and Quad-double libraries are used by
supercomputers to compute large floating point values more accurately
than their 64-bit or 80-bit hardware maximums allow. The DD/QD
library uses the FPU hardware to shift mantissa components into
larger values, resulting in 128-bit and 256-bit values allowing for
up to 35 digits of significance in base-10.
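
(The DD/QD trick is built on error-free transformations; a minimal
sketch of Knuth's two-sum, the usual building block, not the
library's actual code:)

/* Knuth's two-sum: s = fl(a+b) plus the exact rounding error e,
   so that s + e == a + b exactly. Chaining these error terms is
   how double-double/quad-double recover extra significand bits. */
static void two_sum(double a, double b, double *s, double *e)
{
    *s = a + b;
    double bb = *s - a;            /* portion of b captured in s */
    double aa = *s - bb;           /* portion of a captured in s */
    *e = (a - aa) + (b - bb);      /* what rounding threw away */
}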

MPFR has many citations showing where it's been found in use in the
wild:

http://www.mpfr.org/pub.html

I used it in 2010 for a project I was working on. I needed high
precision floating point values to do iterative calculations with
a low degree of error. The quad-double turned out to be sufficient
in the end, but I didn't know that at first. I initially used MPFR
on 1024-bit and 512-bit floats.
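
(For flavor, MPFR usage along those lines looks roughly like this;
a sketch, not the actual project code:)

#include <stdio.h>
#include <mpfr.h>

int main(void)
{
    mpfr_t x;
    mpfr_init2(x, 512);               /* 512-bit significand */
    mpfr_set_ui(x, 1, MPFR_RNDN);
    mpfr_div_ui(x, x, 3, MPFR_RNDN);  /* 1/3 to 512 bits */
    mpfr_printf("%.60Rf\n", x);       /* print 60 decimal digits */
    mpfr_clear(x);
    return 0;                         /* build: cc demo.c -lmpfr -lgmp */
}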

--
Rick C. Hodgin

Nick Maclaren

Mar 27, 2018, 2:23:04 PM
In article <0d4dc7f8-1819-43e5...@googlegroups.com>,
Rick C. Hodgin <rick.c...@gmail.com> wrote:
>
>I think it would be desirable to integrate this kind of design
>alongside traditional IEEE-754 engines, so that you have their
>traditional support for fast/semi-reliable computation, but then
>to add this new engine which guarantees exact value computation
>(out to a rounding level), no matter how long the computation
>takes or how much memory it requires.

I think that you are seriously confused about numerical computation.
That (and all its variants) have been well-known to be a near-total
waste of time for over 40 years. Exact floating-point is a delusion,
at best, and has been shown to be such (for fundamental reasons) ad
tedium.


Regards,
Nick Maclaren.

Rick C. Hodgin

Mar 27, 2018, 2:30:51 PM
On Tuesday, March 27, 2018 at 2:23:04 PM UTC-4, Nick Maclaren wrote:
> In article <0d4dc7f8-1819-43e5...@googlegroups.com>,
> Rick C. Hodgin <rick.c...@gmail.com> wrote:
> >
> >I think it would be desirable to integrate this kind of design
> >alongside traditional IEEE-754 engines, so that you have their
> >traditional support for fast/semi-reliable computation, but then
> >to add this new engine which guarantees exact value computation
> >(out to a rounding level), no matter how long the computation
> >takes or how much memory it requires.
>
> I think that you are seriously confused about numerical computation.
> That (and all its variants) have been well-known to be a near-total
> waste of time for over 40 years.

You've mentioned this every time any alternative to IEEE-754 comes
up. I presume you have sources you've read which reached the
conclusions you're recounting here.

Do you have any links for me to look at to figure out why it is
invalid?

And even if my knowledge of numerical computation is inadequate,
I look to someone like John Gustafson, who has a PhD in mathematics,
and recognize that he's still searching for a solid replacement.
He acknowledges that IEEE-754 is inadequate in a wide range of
cases. His goal is to produce something better.

> Exact floating-point is a delusion,
> at best, and has been shown to be such (for fundamental reasons) ad
> tedium.

To be clear, I don't expect to get "exact floating point," but I do
desire to get floating point representation to a point well beyond
the need of the thing. I can use NNN bits to store something, giving
me 50+ significant bits of precision beyond my need.

The ability to enter arbitrary precision allows us to reach a level
of precision where the trailing rounding will no longer be a factor
in our calculations. It will come at the expense of speed, but as I
say it should be done in parallel with an existing IEEE-754 engine,
one which has the Double-double and Quad-double extensions created
here added into hardware:

(Look for QD):

http://crd-legacy.lbl.gov/~dhbailey/mpdist/

--
Rick C. Hodgin

Nick Maclaren

Mar 27, 2018, 2:51:06 PM
In article <9e137c61-e375-4bea...@googlegroups.com>,
Rick C. Hodgin <rick.c...@gmail.com> wrote:
>> >
>> >I think it would be desirable to integrate this kind of design
>> >alongside traditional IEEE-754 engines, so that you have their
>> >traditional support for fast/semi-reliable computation, but then
>> >to add this new engine which guarantees exact value computation
>> >(out to a rounding level), no matter how long the computation
>> >takes or how much memory it requires.
>>
>> I think that you are seriously confused about numerical computation.
>> That (and all its variants) have been well-known to be a near-total
>> waste of time for over 40 years.
>
>You've mentioned this every time any alternative to IEEE-754 comes
>up. I presume you have sources you've read which reached the
>conclusions you're recounting here.

Of course but, equally importantly, I have confirmed the mathematics
myself.

>Do you have any links for me to look at to figure out why it is
>invalid?

Er, LINKS? From that far back? The reason that there aren't many
(or any?) is that it is so well-known. Just get any decent book on
numerical analysis and read about forward and backward methods. Then
consider any plausible algorithm (say, Cholesky and Newton-Raphson)
and work out how the errors in the BASIC calculations build up - i.e.
do it WITHOUT using the error analysis that shows they are limited,
because that's what such arithmetic would do.

It's a lot trickier to read up about the computational complexity
aspects, both because it needs much more mathematics and because most
modern CS doesn't address the numerical aspect.

>And even if my knowledge of numerical computation is inadequate,
>I look to someone like John Gustafson, who has a PhD in mathematics,
>and recognize that he's still searching for a solid replacement.
>He acknowledges that IEEE-754 is inadequate in a wide range of
>cases. His goal is to produce something better.

I am fully aware of that, and am also aware what his reputation is
among the leading people in these fields. I have also read some of
his justifications for unums, and am unconvinced.

>To be clear, I don't expect to get "exact floating point," but I do
>desire to get floating point representation to a point well beyond
>the need of the thing. I can use NNN bits to store something, giving
>me 50+ significant bits of precision beyond my need.

The days when a factor of two in memory usage was a deal-breaker have
long since gone. While such a format COULD be implemented in hardware,
it doesn't make any sense to do so. Just emulate 256-bit floating-
point in software - or longer, if you prefer.

There ARE things that the hardware could do to help this, quite a lot,
but it's NOT by implementing a new number format - it's by providing
the primitives that would help to implement one efficiently. Terje is
our current expert, I believe.

>The ability to enter arbitrary precision allows us to reach a level
>of precision where the trailing rounding will no longer be a factor
>in our calculations. ...

And that is what is not true, and is shown not to be true by good
numerical analysis textbooks. Unless you take careful action to stop
it, exponential build-up of error is common, and that can be in
either the size of the problem or the number of steps, or both.



Regards,
Nick Maclaren.

MitchAlsup

Mar 27, 2018, 9:09:12 PM
On Tuesday, March 27, 2018 at 1:00:46 PM UTC-5, Quadibloc wrote:
> On Tuesday, March 27, 2018 at 11:05:55 AM UTC-6, Rick C. Hodgin wrote:
>
> > How much silicon real-estate and how long would an arbitrary shift
> > take on modern process technologies?
>
> You need an arbitrary shift anyways to perform addition. A barrel shifter, the
> fast and efficient way to do large arbitrary shifts, does take a fair amount of
> area, but so does a Wallace (or Dadda) tree multiplier.

We quit using barrel shifters at the 1.0 micron level. IBM wrote a paper
on simply using multiplexers and showed that it was both faster and more
compact. A barrel shifter is necessarily 2*n wires wide (minimum), whereas
a multiplexer shifter is n+log2(n) bits wide; n+log2(n) is always smaller
than n*n except when n=1.

BGB

Mar 27, 2018, 11:18:14 PM
On 3/27/2018 1:00 PM, Quadibloc wrote:
> On Tuesday, March 27, 2018 at 11:05:55 AM UTC-6, Rick C. Hodgin wrote:
>
>> How much silicon real-estate and how long would an arbitrary shift
>> take on modern process technologies?
>
> You need an arbitrary shift anyways to perform addition. A barrel shifter, the
> fast and efficient way to do large arbitrary shifts, does take a fair amount of
> area, but so does a Wallace (or Dadda) tree multiplier.
>

yeah...

I was doing a smaller core design (64V) which does the funkiness of
using 16-bit multipliers for a 64-bit ISA.

ex:
16-bit instruction words;
16*16=>32b multiplier;
fixed shifts (1/2/4/8/16/32 bits).
currently only logical shift right though.

the fixed shifts can fake both arithmetic and logical shift-right by
using sign/zero extension, ex ("i>>21"):
EXTS.L R9 //sign extend input
SHLR16 R9 //R9=R9>>16
SHLR4 R9 //R9=R9>>4
SHLR1 R9 //R9=R9>>1
EXTS.L R9 //eliminate high-order zero bits
//technically optional if subsequent operations ignore high bits

for implementing variable-shifts, a jump table is used (to the
appropriate combination of piece-wise shift ops).

the result of this is that this 64-bit core doesn't cost significantly
more than an otherwise similar 32-bit core (~ 3.5 kLUT). (it is also
considerably cheaper than the cores implementing the 64A and 64C ISA's).


this particular core though will also lack an FPU, but I did go and
implement software floating-point emulation for this.

I did test, and Quake1 still "more or less" works on all this, albeit
the performance is pretty abysmal (basically a slide show), and there
are some unresolved bugs.

did realize a trick though (as part of this) of using an 8-bit lookup
table to implement floating point division (as an adjustment for the
fast-approximate value), which while still not able to directly generate
an accurate result, does at least get the result close enough to
significantly reduce the number of Newton-Raphson steps needed.
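
(presumably the standard Newton-Raphson reciprocal step; a sketch,
where recip_lut and top8bits are purely hypothetical names:)

/* one Newton-Raphson step for r ~= 1/d: the error roughly squares
   each step, so a seed good to ~8 bits reaches ~16 then ~32 bits */
static float nr_recip_step(float r, float d)
{
    return r * (2.0f - d * r);
}

/* hypothetical use with an 8-bit table seed:
     float r = recip_lut[top8bits(d)];
     r = nr_recip_step(r, d);
     r = nr_recip_step(r, d);   // now accurate enough for a float
*/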


when I went from my 64A ISA design to the 64C design, also dropped the
64*64=>128b multiplier with this ISA in favor of only offering
32*32=>64b (or, a 32-bit multiplier); though 64C retains the full 64-bit
barrel shifter as this doesn't seem to be nearly as slow or expensive as
a 64-bit multiplier.

I discovered mostly that cases which need 64-bit multiplication are rare
enough that performance doesn't seem to suffer significantly by faking it.


however, sadly, still don't have a fast/cheap double-precision FPU
design (partly a problem as it seems that Quake is a lot more dependent
on Double and FDIV performance than is ideal; it seems it actually does
some FDIV operations as part of some of its inner rasterization loops,
...). this may be partly because Quake1 sort of went "shiny new toy"
when it came to the use of floating-point stuff (using it for some
things I probably wouldn't use the FPU for even on modern HW).

though, could, technically, rewrite the renderer loops to not use
floating point math; but this is sort of a cop-out... (similarly would
go for switching from using double-precision Taylor-series for the
"math.h" functions to single-precision LUT+NR or CORDIC versions or
something).

while technically infrequent, the cost of the current trigonometric
functions is quite substantial (the Taylor series logic doing lots of
exponents and divisions, which in turn require the use of
Newton-Raphson to find the reciprocal, and using multiply operations
which rely on an integer multiply itself done as a large blob of 16-bit
multiply and addition ops; all using "who knows how many" clock cycles).


all this does basically mean though, that if I want Quake to perform
acceptably, will have to figure out how to make a cheap and fast
fully-working FPU for 64C.

it is kind of funny, in a way, that Quake's performance doesn't really
seem to suffer much when run on a 16-bit integer multiplier, or
a jump-table based integer shift, so long as it has a working FPU; which
seems somehow a bit backwards.


> However, variable length floating-point formats so as to maintain complete
> precision... are not something that I expect would be terribly useful. I would
> not oppose, say, an interpreter supporting such types in software; arbitrary-
> length strings didn't slow down BASIC; but as they're not relevant to the usual
> things computers do, implementing them in fast hardware... is just an invitation
> to programmers to come up with new bugs that use up a computer's whole memory in
> seconds.
>

yep.

doesn't seem worth it though.

however, OTOH, for my ISA I have made some use of half-float, as it
turns out a significant majority of typical floating constants can be
encoded exactly as a half (and loading the constants directly is better
than an indirect load from memory; and sometimes they are useful as a
way to "compress" things like lookup tables and similar).

for example, something like:
"const float foo_arr[256]={...};"
might be quietly demoted to:
"const short float foo_arr[256]={...};"
if the array is never taken as a pointer, and the values can all be
represented exactly.
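
(a rough sketch of the exactness test such a compiler pass might use;
this version handles only the normal binary16 range and conservatively
rejects values that would land in the subnormal range:)

#include <math.h>
#include <stdbool.h>

/* true if finite 'f' is exactly representable as a normal IEEE
   binary16 (5 exponent bits, 10 stored mantissa bits, hidden 1) */
static bool fits_half_exactly(float f)
{
    if (f == 0.0f) return true;
    int e;
    float m = frexpf(f, &e);              /* f = m * 2^e, 0.5 <= |m| < 1 */
    if (e < -13 || e > 16) return false;  /* outside normal half range */
    float scaled = ldexpf(m, 11);         /* 11 significand bits incl. hidden 1 */
    return scaled == truncf(scaled);      /* no bits below the half's ulp */
}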

and "f=3.5;" is very likely to see the constant's representation
silently demoted, even if the value is semantically a float or double.

however, unlike integer constants (which tend strongly towards a
bell-curve centered at zero); only a minority of literal values can be
represented exactly as an 8-bit microfloat.


however, all this is done statically by the compiler.

for dynamic values in memory, it seems like the overhead from trying to
deal with dynamic storage sizes would overshadow any gains.


> Perhaps I'm wrong, and such features would be useful, for example, in the
> context of a hardware assist for programs like Mathematica or Macsyma or Maple
> or MathCAD. As these programs tend to be based on LISP, other hardware features
> that assist with the speed of execution of programs in that language would also
> be relevant.
>

on the bigger end, it seems like it could be similarly effective to just
have several different sizes, and for a dynamic case, maybe select and
promote to the largest size.

dynamic can be used for cases where this works out cheaper.

for example, float128 and float256 can be emulated in software.

jim.bra...@ieee.org

Mar 27, 2018, 11:31:30 PM
]> You've mentioned this every time any alternative to IEEE-754 comes
]> up. I presume you have sources you've read which reached the
]> conclusions you're recounting here.

IMHO computer floating-point is whack-a-mole: every time you come up with what appears to be a better format or algorithm, some part of it doesn't work the way you want. Trying to patch it only continues the whack-a-mole.

That said, there are alternatives to IEEE-754 that improve the situation.

Jim Brakefield

EricP

Mar 27, 2018, 11:51:45 PM
Weren't barrel shifters always built with muxes
(either NMOS pass gates or T-gates)?
Surely no one would build one out of NAND and NOR gates.



Quadibloc

Mar 28, 2018, 4:25:45 AM
On Tuesday, March 27, 2018 at 12:30:51 PM UTC-6, Rick C. Hodgin wrote:

> You've mentioned this every time any alternative to IEEE-754 comes
> up.

Even IEEE-754 is overkill, involving extra complications that cause unnecessary
delays, for much scientific floating-point computation. So an even more elaborate
format is quite naturally questionable.

There is no substitute for programmers actually learning some numerical analysis.

John Savard

Rick C. Hodgin

Mar 28, 2018, 8:35:40 AM
I haven't started working on any logic or wiring for how this right-
justified system would work, but in thinking about it, the idea of
being able to have an arbitrary-precision floating point value in
hardware is more desirable than being limited to a certain size in
hardware, with software algorithms designed to cover the rest.

We know there are issues with IEEE-754. John Gustafson enumerates
some of them here:

"Stanford Seminar: Beyond Floating Point: Next Generation
Computer Arithmetic" by John Gustafson

https://www.youtube.com/watch?v=aP0Y1uAA-2Y

As John Savard points out, IEEE-754 has issues. If we are going to
re-think its design, I think the goal should be to design a new all-
time solution, rather than merely a patch (recognizing, of course,
that a true all-time solution doesn't exist; the aim is rather one
that's designed not as a temporary measure, but is well thought out
and engineered to be a real long-term solution).

Given where we are in process technology today, and where we're
heading in N years, it seems logical to create something that is
comprehensive and beneficial to developers.

Hardware is so fast today, and software toolsets are so prevalent
and in such high use by millions of developers, that we need to
shift some of the burden of doing the hard work away from their
shoulders, and onto our shoulders.

An arbitrary-precision numeric unit, one capable of handling both
integer and floating point, is a desirable addition to hardware.
It enables native software to be created without reliance upon
third party libraries. It could be created as a special engine
within the core designed only to handle those workloads which
require such a beast, but when such a beast is available people
will find a use for it.

For most applications, 32-bit floating point is enough. And
when it isn't, for most of the rest 64-bit or 80-bit floating
point is enough. But for those applications where it is not
enough, I believe the hardware group should provide facilities
to the software group which enable them to enter into the world
of arbitrary precision without any additional need.

Instead of encoding the "fld st0,tbyte ptr x" instruction, to
do a floating point load on a 10-byte (80-bit) source, you use
another instruction like "fld st0,x,512" to load a 512-bit
arbitrary precision value. It might even be such that it's
merely loaded as an alias or reference to the location in
memory, and not actually populating a register just yet, but
is loaded on-demand when a calculation requiring it is needed.

-----
For my purposes in designing a new architecture as an offering
unto Christ first, and people second, my goals are to re-think
all things without regard to money purposes, or timelines, or
schedules or anything apart from using the greatest talents and
gifts God first gave us to provide a desirable tool for other
people to use, to lighten the burden upon them by our additional
hard work, to make their lives easier across the board.

I view it as a very admirable and desirable goal ... to help
people have the tools they need to become better and do more.

--
Rick C. Hodgin

Terje Mathisen

Mar 28, 2018, 8:59:12 AM
Nick Maclaren wrote:
> The days when a factor of two in memory usage was a deal-breaker have
> long since gone. While such a format COULD be implemented in hardware,
> it doesn't make any sense to do so. Just emulate 256-bit floating-
> point in software - or longer, if you prefer.
>
> There ARE things that the hardware could do to help this, quite a lot,
> but it's NOT by implementing a new number format - it's by providing
> the primitives that would help to implement one efficiently. Terje is
> our current expert, I believe.

The main things I'd like in order to do 128 or 256-bit sw floating point
would be double-wide regular and sticky shifts, along with the usual
widening multiply.

(A sticky shift corresponds to rounding away from zero, i.e. the bottom
bit is OR'ed with all the bits that are shifted out.)
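
(A minimal C model of such a sticky shift, for a single 64-bit word:)

#include <stdint.h>

/* sticky right shift: the bits shifted out are OR-reduced into the
   lowest surviving bit, so later rounding logic can still see them */
static uint64_t sticky_shr(uint64_t x, unsigned count)
{
    if (count == 0) return x;
    if (count >= 64) return x != 0;                   /* all bits shifted out */
    uint64_t lost = x & ((UINT64_C(1) << count) - 1);
    return (x >> count) | (lost != 0);
}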

You also need a fast find-first-set-bit/count-leading-zeroes. If this
can work across pairs of registers it would be even better...

It also helps to have a fast classifier, i.e. something that can look at
sign+exponent+(mantissa != 0), with the exponent/mantissa boundary
either freely variable or along the correct FP128/FP256 locations.

The return value tells you if the input was
zero/denormal/normal/inf/nan, preferably ordered so that a single
test/compare can flag all special cases.
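
(A plain-C sketch of such a classifier for binary64:)

#include <stdint.h>
#include <string.h>

enum fp_class { CLS_ZERO, CLS_DENORMAL, CLS_NORMAL, CLS_INF, CLS_NAN };

/* classify from sign+exponent+(mantissa != 0); any value other than
   CLS_NORMAL marks a special case, so one compare filters them all */
static enum fp_class classify64(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    uint64_t exp  = (bits >> 52) & 0x7FF;
    uint64_t mant = bits & ((UINT64_C(1) << 52) - 1);
    if (exp == 0)     return mant ? CLS_DENORMAL : CLS_ZERO;
    if (exp == 0x7FF) return mant ? CLS_NAN : CLS_INF;
    return CLS_NORMAL;
}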

This way you can replace the missing hidden bit very quickly and emulate
a quad FMUL with 4 64x64->128 MULs, using ADD/ADC to combine the parts,
doing the core 112x112 bit multiplication in ~10 cycles.

Since the final normalization (in the case of one denorm input) can
require a shift of up to 112 (111?) bits for FP128 and 230+ bits for
FP256 it helps to be able to run multiple code paths in parallel and
merge them without branching.
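
(A rough C sketch of the 4-MUL widening multiply, using the GCC/Clang
unsigned __int128 extension where hardware would use ADD/ADC:)

#include <stdint.h>

typedef unsigned __int128 u128;

/* 128x128 -> 256-bit product from four widening 64x64 MULs;
   r[0] is the most significant 64-bit word */
static void mul128(uint64_t ah, uint64_t al, uint64_t bh, uint64_t bl,
                   uint64_t r[4])
{
    u128 ll = (u128)al * bl;
    u128 lh = (u128)al * bh;
    u128 hl = (u128)ah * bl;
    u128 hh = (u128)ah * bh;

    u128 mid = lh + hl;                        /* may carry out of 128 bits */
    u128 carry = (mid < lh) ? ((u128)1 << 64) : (u128)0;

    u128 lo = ll + (mid << 64);
    u128 hi = hh + (mid >> 64) + carry + (lo < ll);

    r[3] = (uint64_t)lo; r[2] = (uint64_t)(lo >> 64);
    r[1] = (uint64_t)hi; r[0] = (uint64_t)(hi >> 64);
}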

Terje

--
- <Terje.Mathisen at tmsw.no>
"almost all programming can be viewed as an exercise in caching"

Terje Mathisen

Mar 28, 2018, 9:11:21 AM
I've never seen a shifter (barrel or otherwise) but my mental model of
it always had log2(n) stages:

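/* one 2->1 mux stage per bit of the shift count */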
n = (count & 32) ? n >> 32 : n;
n = (count & 16) ? n >> 16 : n;
n = (count & 8) ? n >> 8 : n;
n = (count & 4) ? n >> 4 : n;
n = (count & 2) ? n >> 2 : n;
n = (count & 1) ? n >> 1 : n;

Is this just a stupid model?

Anton Ertl

Mar 28, 2018, 9:47:54 AM
Terje Mathisen <terje.m...@tmsw.no> writes:
>EricP wrote:
>> MitchAlsup wrote:
>>> We quit using barrel shifters at the 1.0 micron level. IBM wrote a
>>> paper on simply using multiplexers and showed that it was both
>>> faster and more compact. A barrel shifter is necessarily 2*n wires
>>> wide (minimum), whereas a multiplexer shifter is n+log2(n) bits wide;
>>> n+log2(n) is always smaller than n*n except when n=1.
>>
>> Weren't barrel shifters always built with muxes (either NMOS pass
>> gates or T-gates)? Surely no one would build one out of NAND and NOR
>> gates.
>
>I've never seen a shifter (barrel or otherwise) but my mental model of
>it always had log2(n) stages:

The Wikipedia picture shows a wired-or-implementation (subtitled
"Crossbar Barrel Shifter"):
<https://en.wikipedia.org/wiki/File:Crossbar_barrel_shifter.svg>

OTOH,
<https://www.researchgate.net/figure/Barrel-Shifter-design-1_fig1_281666651>
shows ld(n) levels of muxes.

There is also
<https://www.researchgate.net/figure/Shifter-architectures-a-Logarithmic-shifter-b-Barrel-shifter_fig20_269818702>,
which makes a distinction between logarithmic shifters and barrel
shifters, but both graphics are too incomplete to be sure about their
meaning; the "logarithmic shifter" might use 2->1 muxes, and the
"barrel shifter" 4-input muxes.

As far as I can see, a design using ld(n) levels of muxes also uses 2n
wires at each level (plus the control lines of the muxes): each mux
has 2 n-bit inputs.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

jim.bra...@ieee.org

Mar 28, 2018, 10:38:06 AM
]> Surely no one would build one out of NAND and NOR gates.

The CDC 6600 did exactly that: three-input NANDs and output "bundling", so that a pair of NANDs served as a 2:1 mux (plus inversion).

Jim Brakefield

MitchAlsup

Mar 28, 2018, 12:02:52 PM
On Tuesday, March 27, 2018 at 10:51:45 PM UTC-5, EricP wrote:
Well, yes and no. We build barrel shifters with n-channel pass gates.
You could call this a mux, but a real mux is built with a NAND gate
followed by a NAND gate.

A 32*32 barrel shifter uses 1024 n-channel pass gates, with 32 drivers
on the left, 32 drivers on the right, and 32 sense amps on whichever
side is convenient.

In the 2-layers-of-metal era, we could hide the barrel shifter under
the wires with the decoder on top. In the 3-metal era (and beyond)
we could hide both the shifter and decoder under the wires, thus
occupying less area; and they were faster too, mainly due to reduced
fan-in and fan-out (edge speed reductions).

MitchAlsup

Mar 28, 2018, 12:08:01 PM
On Wednesday, March 28, 2018 at 7:35:40 AM UTC-5, Rick C. Hodgin wrote:

> As John Savard points out, IEEE-754 has issues.

IEEE has known qualities.

Programmers have been so coddled by IEEE qualities that they forgot
how to do numerical analysis. This is the problem.

If you took a typical teenager and taught him how to drive a car on
city streets, and then put him in a 1000 HP supercar, he WILL wreck
it. The same goes for teaching programmers floating point arithmetic
and then assigning them a task suitable for a supercomputer to run.

They will both crash (and likely burn).

The problem is NOT the arithmetic; it is large-scale temperance of
the algorithm(s).

Nick Maclaren

Mar 28, 2018, 12:22:11 PM
In article <84ffe294-ef08-4c71...@googlegroups.com>,
<jim.bra...@ieee.org> wrote:
>
>IMHO computer floating-point is whack-a-mole: every time you come up with
>what appears to be a better format or algorithm, some part of it doesn't
>work the way you want. Trying to patch it only continues the whack-a-mole.

Yes. There are alternatives, plenty of them, but they are all better
in some respects and worse in others. And there are good theoretical
reasons to believe that is fundamental to the problem!

>That said, there are alternatives to IEEE-754 that improve the situation.

Yes. Like IEEE-754 with division by zero giving a NaN, and propagating
NaN reliably. But that's heresy ....


Regards,
Nick Maclaren.

Nick Maclaren

Mar 28, 2018, 12:25:44 PM
In article <p9g3it$hkp$1...@gioia.aioe.org>,
Thanks. That's pretty close to what I wanted when I looked at this
area, a long time back. Surprise, surprise :-)


Regards,
Nick Maclaren.

Rick C. Hodgin

Mar 28, 2018, 2:53:02 PM
Terje Mathisen, I wanted to ask your opinion of the idea of having
arbitrary-precision computation abilities in hardware. It could be
micro-coded to use a type of software algorithm, but the hardware
resources themselves could be made visible to developers in a native
way.

Also, have you any experience with right-aligned bit data for the
mantissa portion? Or have you ever considered it?

What are some of your thoughts on why it wouldn't work or why it's
a bad idea?

Thank you in advance.

--
Rick C. Hodgin

Quadibloc

Mar 28, 2018, 2:59:56 PM
I suppose that if that _isn't_ what the spec calls for, then it's "too late
now". But it certainly seems appropriate to me.

Myself, I would have (nonzero) divided by (zero) produce the NaN "infinity of
indeterminate sign" while (zero) divided by (zero) produces the NaN "totally
unknown", so that what little information one might have is not thrown away.

Because -5000 divided by 0 can't possibly be either 0 or 3, unlike 0 divided by
0.

John Savard

Quadibloc

Mar 28, 2018, 3:01:48 PM
However, I *am* willing to tolerate the fudge that 0^0, instead of being a NaN,
is 1, since that is required in the majority of cases. (However, if one can
control rounding direction - by the way, all the 8087 features ought to be
brought back - it should be possible to turn that *off*.)

John Savard

Quadibloc

Mar 28, 2018, 3:15:19 PM
On Wednesday, March 28, 2018 at 12:59:56 PM UTC-6, Quadibloc wrote:

> Myself, I would have (nonzero) divided by (zero) produce the NaN "infinity of
> indeterminate sign" while (zero) divided by (zero) produces the NaN "totally
> unknown", so that what little information one might have is not thrown away.

And infinity times zero would also produce "totally unknown", while *overflow*
times zero would produce zero. So a finite number too large to represent due to
an exponent overflow would be distinguished from a true infinity resulting from
division by zero.

The trouble with that is, what do you do at the small end? Making an underflow
into another kind of NaN would make sense if one only multiplied and never
added. Taking underflows to zero makes sense for the usual arithmetic
operations, but it does lose information.

One way out would be to reserve the exponent 00...001 for gradual underflow, and
the exponent 00...000 for zero (or maybe the other way around, to remove the
binary point misalignment), with the last few bits of zero containing a NaN
code, so that infinitesimal NaNs could be processed as plain arithmetical
zeroes, without delays to computation, for addition and subtraction. That would
be a new format that isn't IEEE 754 any more, of course.

But if you're going to have NaNs, then either you intended to be picky, or
what's the point?

John Savard

Rick C. Hodgin

Mar 28, 2018, 3:18:05 PM
On Wednesday, March 28, 2018 at 2:59:56 PM UTC-4, Quadibloc wrote:
> On Wednesday, March 28, 2018 at 10:22:11 AM UTC-6, Nick Maclaren wrote:
> > Yes. Like IEEE-754 with division by zero giving a NaN, and propagating
> > NaN reliably. But that's heresy ....
>
> I suppose that if that _isn't_ what the spec calls for, then it's "too late
> now". But it certainly seems appropriate to me.

John Gustafson's position is that anytime you receive a NaN, the
process should call a handler at that point which allows it to
either replace the NaN with something valid (like a 1 or 0) and
keep going, or to be able to terminate the operation completely
at that point and save the compute time and power due to the
error condition. This was stated in his Stanford video.

As I understand it today, that's the whole purpose of signaling
NaNs and quiet NaNs: to either signal the error state, or just
fall through each computation, polluting every result along the way.

Intel's Developer Manual states in 4.8.3.4:

Page 101
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-vol-1-manual.pdf

"QNaNs are allowed to propagate through most arithmetic
operations without signaling an exception."

It does say "most arithmetic operations."

Is there an example where NaNs do not propagate reliably?

--
Rick C. Hodgin

Terje Mathisen

Mar 28, 2018, 3:52:30 PM
One of the largest changes in the 2018 update of 754 is a fix to min/max
in order to make NaN propagation work reliably (as well as
commutatively, more or less).

Terje Mathisen

Mar 28, 2018, 3:58:39 PM
Quadibloc wrote:
> On Wednesday, March 28, 2018 at 10:22:11 AM UTC-6, Nick Maclaren
> wrote:
>> In article
>> <84ffe294-ef08-4c71...@googlegroups.com>,
>> <jim.bra...@ieee.org> wrote:
>
>>> That said, there are alternatives to IEEE-754 that improve the
>>> situation.
>
>> Yes. Like IEEE-754 with division by zero giving a NaN, and
>> propagating NaN reliably. But that's heresy ....
>
> I suppose that if that _isn't_ what the spec calls for, then it's
> "too late now". But it certainly seems appropriate to me.
>
> Myself, I would have (nonzero) divided by (zero) produce the NaN
> "infinity of indeterminate sign" while (zero) divided by (zero)
> produces the NaN "totally unknown", so that what little information
> one might have is not thrown away.

The idea is to throw away information as late as possible (ref denorm),
which means that a division by epsilon should return a large number, and
when epsilon is very small and the dividend too large, the result will
overflow to infinity.

I.e. (x / 0), with x nonzero, must return Inf.

>
> Because -5000 divided by 0 can't possibly be either 0 or 3, unlike 0
> divided by 0.

(0 / 0) is indeed NaN, just like (Inf/Inf).
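
(A tiny C demonstration of these default IEEE-754 results:)

#include <stdio.h>

int main(void)
{
    double zero = 0.0;
    printf("%g\n", -5000.0 / zero);  /* -inf: the zero's sign decides */
    printf("%g\n", zero / zero);     /* nan */
    printf("%g\n", 1.0 / -zero);     /* -inf: the sign bit of zero shows up */
    return 0;
}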

MitchAlsup

Mar 28, 2018, 4:13:33 PM
On Wednesday, March 28, 2018 at 7:59:12 AM UTC-5, Terje Mathisen wrote:
> Nick Maclaren wrote:
> > The days when a factor of two in memory usage was a deal-breaker have
> > long since gone. While such a format COULD be implemented in hardware,
> > it doesn't make any sense to do so. Just emulate 256-bit floating-
> > point in software - or longer, if you prefer.
> >
> > There ARE things that the hardware could do to help this, quite a lot,
> > but it's NOT by implementing a new number format - it's by providing
> > the primitives that would help to implement one efficiently. Terje is
> > our current expert, I believe.
>
> The main things I'd like in order to do 128 or 256-bit sw floating point
> would be double-wide regular and sticky shifts, along with the usual
> widening multiply.

Gate delay wise, a 256-bit shifter takes 4 layers of 4-input multiplexers
(or 8 gates of delay). Wire delay will completely swamp gate delay.
>
> (A sticky shift corresponds to rounding away from zero, i.e. the bottom
> bit is OR'ed with all the bits that are shifted out.)
>
> You also need a fast find-first-set-bit/count-leading-zeroes. If this
> can work across pairs of registers it would be even better...

I have shown, here in comp.arch, a FF1 circuit that can FF1 over 32-bits
in 3 gates of delay (4-input NAND and 3-input NOR), one of the outputs of
this is a "did not find any" so one can cascade 3 more levels (128-bits)
into 4 gates of delay. At this point wire is much more significant in the
delay than are the gates. In comparison, a 32-bit adder is 10-gates of delay
(Carry select Adder DEC Alpha style).

Count leading zeros is a much harder problem.
>
> It also helps to have a fast classifier, i.e. something that can look at
> sign+exponent+(mantissa != 0), with the exponent/mantissa boundary
> either freely variable or along the correct FP128/FP256 locations.

Exponent == 0 is 2 gates of delay (four 3-input NOR gates and one 4-input
NAND gate.) It is so fast we don't even bother inventing a hidden 1 bit,
we just use the !(exp==0) as the hidden bit.

Fraction == 0 is 4 gates of delay (about the speed of a flip-flop).

So f==0 is only 5 gates of delay, less than 1/2 cycle on the fastest
machines (12 gates per cycle +flop/jitter/skew) or 1/3rd on the also
ran (16-gates +flop/jitter/skew).

MitchAlsup

Mar 28, 2018, 4:19:11 PM
On Wednesday, March 28, 2018 at 2:15:19 PM UTC-5, Quadibloc wrote:
> On Wednesday, March 28, 2018 at 12:59:56 PM UTC-6, Quadibloc wrote:
>
> > Myself, I would have (nonzero) divided by (zero) produce the NaN "infinity of
> > indeterminate sign" while (zero) divided by (zero) produces the NaN "totally
> > unknown", so that what little information one might have is not thrown away.
>
> And infinity times zero would also produce "totally unknown", while *overflow*
> times zero would produce zero. So a finite number too large to represent due to
> an exponent overflow would be distinguished from a true infinity resulting from
> division by zero.
>
> The trouble with that is, what do you do at the small end? Making an underflow
> into another kind of NaN would make sense if one only multiplied and never
> added. Taking underflows to zero makes sense for the usual arithmetic
> operations, but it does lose information.

One could record WHY the NaN was produced in the NaN itself. An encoding space
of 4 bits would cover the vast majority of things programmers care about.
Since there are 52 bits available, one COULD likewise use the low bits of INFINITY
to record why that infinity was generated,...

But this would require sensible people making decisions in IEEE process.
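
(A hypothetical encoding along those lines, sketched for binary64;
the 4-bit reason code is invented for illustration:)

#include <stdint.h>
#include <string.h>

/* build a quiet NaN carrying a 4-bit "why" code in its low payload bits */
static double nan_with_reason(unsigned reason4)
{
    uint64_t bits = UINT64_C(0x7FF8000000000000)  /* quiet NaN pattern */
                  | (reason4 & 0xF);              /* payload: the reason */
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}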

MitchAlsup

Mar 28, 2018, 4:22:24 PM
On Wednesday, March 28, 2018 at 2:52:30 PM UTC-5, Terje Mathisen wrote:
> Nick Maclaren wrote:
> > In article <84ffe294-ef08-4c71...@googlegroups.com>,
> > <jim.bra...@ieee.org> wrote:
> >>
> >> IMHO computer floating-point is whack-a-mole: every time you come up
> >> with what appears to be a better format or algorithm, some part of
> >> it doesn't work the way you want. Trying to patch it only continues
> >> the whack-a-mole.
> >
> > Yes. There are alternatives, plenty of them, but they are all
> > better in some respects and worse in others. And there are good
> > theoretical reasons to believe that is fundamental to the problem!
> >
> >> That said, there are alternatives to IEEE-754 that improve the
> >> situation.
> >
> > Yes. Like IEEE-754 with division by zero giving a NaN, and
> > propagating NaN reliably. But that's heresy ....
>
> One of the largest changes in the 2018 update of 754 is a fix to min/max
> in order to make NaN propagation work reliably (as well as
> commutatively, more or less).

But MAX(x,NaN) = x. You have not preserved/propagated the NaN, you have
filtered it.

Terje Mathisen

Mar 28, 2018, 6:16:17 PM
MitchAlsup wrote:
> On Wednesday, March 28, 2018 at 2:15:19 PM UTC-5, Quadibloc wrote:
>> The trouble with that is, what do you do at the small end? Making
>> an underflow into another kind of NaN would make sense if one only
>> multiplied and never added. Taking underflows to zero makes sense
>> for the usual arithmetic operations, but it does lose information.
>
> One could record WHY the NaN was produced in the NaN itself. An encoding
> space of 4 bits would cover the vast majority of things programmers
> care about. Since there are 52 bits available, one COULD likewise use
> the low bits of INFINITY to record why that infinity was generated,...
>
> But this would require sensible people making decisions in IEEE
> process.

Mitch! :-)

There are a lot of sensible, I would even say very smart, people who
have been involved with 754 since its invention, but today the largest
(by far!) road block is the embedded base.

In hindsight it could have been nice to reserve one or two more exponent
values, i.e. 0 -> zero, 1 -> denorm, 2-fffd -> normal, fffe -> inf, ffff
-> NaN but that will definitely never happen now, and at the time it was
considered that halving or quartering the max range was unacceptable.

Terje Mathisen

Mar 28, 2018, 6:24:40 PM
No!

We replaced the old (completely broken imnsho) MIN/MAX which said that
MAX(x,SNaN) = QNaN and MAX(x,QNaN) = x with two new pairs of functions:

One that suppresses all NaNs, i.e. only considers non-NaN inputs; it is
there for those people who consider NaN a replacement for Mill's None,
i.e. an empty value.

The other new pair handles all NaNs properly, i.e. always propagate them.

I was one of the more vocal supporters of having this as the only sane
default behavior, but it is up to languages and their runtime libraries
which form they prefer.

jim.bra...@ieee.org

Mar 28, 2018, 7:23:27 PM
]> Myself, I would have (nonzero) divided by (zero) produce the NaN "infinity of
]> indeterminate sign" while (zero) divided by (zero) produces the NaN "totally
]> unknown", so that what little information one might have is not thrown away.

On saving info: one idea is to take two exponent or mantissa bits and encode four types of numbers:
0) IEEE 754 exacts
1) Inexacts centered on IEEE 754 exacts
2) Inexacts centered between IEEE 754 exacts: value is somewhere between adjacent IEEE 754 exacts
3) Inexact zero or infinity: an inexact centered on zero or unsigned infinity with the exponent giving the width of the inexactness

For some embedded applications it is known that zero or infinity or denorms will not happen, or that if they do happen it is a fatal error; so #2 has the best combination of circuit cost and circuit delay, and no two-bit encoding field is needed.

For short floats probably best to limit the encoding field to zero or one bit: codes 0 and/or 2.

For larger floats the two bit code field is affordable.

Jim Brakefield

matt...@gmail.com

Mar 28, 2018, 8:24:55 PM
On Wednesday, March 28, 2018 at 5:24:40 PM UTC-5, Terje Mathisen wrote:
> MitchAlsup wrote:
> > On Wednesday, March 28, 2018 at 2:52:30 PM UTC-5, Terje Mathisen
> > wrote:
> >> Nick Maclaren wrote:
> >>> Yes. Like IEEE-754 with division by zero giving a NaN, and
> >>> propagating NaN reliably. But that's heresy ....
> >>
> >> One of the largest changes in the 2018 update of 754 is a fix to
> >> min/max in order to make NaN propagation work reliably (as well as
> >> commutatively, more or less).
> >
> > But MAX(x,NaN) = x. You have not preserved/propagated the NaN, you
> > have filtered it.
> >
>
> No!
>
> We replaced the old (completely broken imnsho) MIN/MAX which said that
> MAX(x,SNaN) = QNaN and MAX(x,QNaN) = x with two new pairs of functions:

This was a poor choice even with any input SNaN requiring a trap and/or SNaN->QNaN result.

> One that suppresses all NaNs, i.e. only considers non-NaN inputs, it is
> there for those people who considers NaN a replacement for Mill's None,
> i.e. an empty value.

MAX(x,SNaN) = x and MAX(x,QNaN) = x

This choice is likely to be popular as it is closest to C99 fmax()/fmin() behavior. Fast FPU implementations like SIMD/vector FPUs are likely to favor this choice, but I expect most would like to do away with the concept and complexity of a SNaN. Isn't this a violation of SNaN handling, which requires converting an input SNaN to a QNaN output, with optional trapping?
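
(For reference, the C99 filtering behavior referred to here:)

#include <math.h>
#include <stdio.h>

int main(void)
{
    /* C99 fmax()/fmin() treat a quiet NaN as missing data
       and return the other operand */
    printf("%g\n", fmax(1.0, NAN));  /* prints 1 */
    printf("%g\n", fmin(NAN, 2.0));  /* prints 2 */
    return 0;
}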

> The other new pair handles all NaNs properly, i.e. always propagate them.

MAX(x,SNaN) = QNaN and MAX(x,QNaN) = QNaN

I believe this would have been the best choice from the beginning but it is probably too late to be popular unless the C standard adopts replacement functions for fmax()/fmin(). I would prefer to add FMAX/FMIN instructions to an FPU with this behavior (probably simplest to support in hardware) but I'm afraid it would receive little support in C. Are you aware of any communications or attempts to add C standard functions supporting this choice?

P.S. There are 51 bits double precision and 22 bits single precision of mantissa/fraction to define custom QNaN or SNaN values. They have been supported since at least the 1980's in FPUs but I haven't seen much use made of this feature.

Quadibloc

Mar 28, 2018, 8:30:05 PM
On Wednesday, March 28, 2018 at 1:15:19 PM UTC-6, Quadibloc wrote:
> (or maybe the other way around, to remove the
> binary point misalignment)

I wasn't thinking; that would make it worse - the exponents need to be the same, not
two different instead of one different.

John Savard

Nick Maclaren

Mar 29, 2018, 4:49:37 AM
In article <p9grpq$1u9c$1...@gioia.aioe.org>,
Terje Mathisen <terje.m...@tmsw.no> wrote:
>>
>>> That said, there are alternatives to IEEE-754 that improve the
>>> situation.
>>
>> Yes. Like IEEE-754 with division by zero giving a NaN, and
>> propagating NaN reliably. But that's heresy ....
>
>One of the largest changes in the 2018 update of 754 is a fix to min/max
>in order to make NaN propagation work reliably (as well as
>commutatively, more or less).

Interesting. I am afraid that is too little, too late, because the
damage is done.


Regards,
Nick Maclaren.

Nick Maclaren

Mar 29, 2018, 4:57:31 AM
In article <p9gs5b$1vb5$1...@gioia.aioe.org>,
Terje Mathisen <terje.m...@tmsw.no> wrote:
>Quadibloc wrote:
>>
>>>> That said, there are alternatives to IEEE-754 that improve the
>>>> situation.
>>
>>> Yes. Like IEEE-754 with division by zero giving a NaN, and
>>> propagating NaN reliably. But that's heresy ....
>>
>> I suppose that if that _isn't_ what the spec calls for, then it's
>> "too late now". But it certainly seems appropriate to me.
>>
>> Myself, I would have (nonzero) divided by (zero) produce the NaN
>> "infinity of indeterminate sign" while (zero) divided by (zero)
>> produces the NaN "totally unknown", so that what little information
>> one might have is not thrown away.
>
>The idea is to throw away information as late as possible (ref denorm),
>which means that a division by epsilon should return a large number, and
>when epsilon is very small and the dividend too large, the result will
>overflow to infinity.
>
>I.e. (x / 0), with x nonzero, must return Inf.

Yes, but they hadn't thought it through. That converts an essentially
random bit (the sign of zero) into information that is visible to
ordinary code, and makes it appear significant. That's nonsense.

Yes, I know that the facility is useful when the programmer is a skilled
arithmetician writing in assembler, but just HOW many such people are
there? Well, there's you, (once upon a time) me, and all of the others
I know of are now dead.


Regards,
Nick Maclaren.

Terje Mathisen

Mar 29, 2018, 5:48:08 AM
Nick Maclaren wrote:
> In article <p9gs5b$1vb5$1...@gioia.aioe.org>, Terje Mathisen
> <terje.m...@tmsw.no> wrote:
>> The idea is to throw away information as late as possible (ref
>> denorm), which means that a division by epsilon should return a
>> large number, and when epsilon is very small and the dividend too
>> large, the result will overflow to infinity.
>>
>> I.e. (x / 0), with x nonzero, must return Inf.
>
> Yes, but they hadn't thought it through. That converts an
> essentially random bit (the sign of zero) into information that is
> visible to ordinary code, and makes it appear significant. That's
> nonsense.

When first defined, 754 hadn't decided if they were going to treat the
number line as straight or as a circle wrapping around at Inf.

AFAIR the original 8087 had a control bit to choose between those two
behaviours?
>
> Yes, I know that the facility is useful when the programmer is a
> skilled arithmetician writing in assembler, but just HOW many such
> people are there? Well, there's you, (once upon a time) me, and all
> of the others I know of are now dead.
>

Thanks, but I don't have nearly enough theoretical background to call me
an arithmetician, I've just been thinking about these issues since high
school when I was calculating pi by hand using the classic arctan
formula. A couple of 2-hour physics classes resulted in 24 digits afair. :-)

As soon as I got access to real computers I started writing arbitrary
precision codes to improve on this, while using the same formulas.

1700 digits using base 1e10 math on a 36-bit Unisys 1100.
2500 using binary on an HP85 (or 87?) lab computer.

I stopped updating my own code some decades ago, realizing that I didn't
want to spend the time needed to implement FFT-based multiplication.

I do have this though:

http://tmsw.no/pi-search/

Plugging in today's date I get

Find 20180329
Found at 41,864,160: 243981776638931 20180329 205441429937165

Total time = 0.138973 seconds (9 suffix lookups)

With 1e9 digits most 8-digit numbers occur multiple times. :-)

Nick Maclaren

Mar 29, 2018, 6:56:24 AM
In article <p9icok$7lk$1...@gioia.aioe.org>,
Terje Mathisen <terje.m...@tmsw.no> wrote:
>>
>>> The idea is to throw away information as late as possible (ref
>>> denorm), which means that a division by epsilon should return a
>>> large number, and when epsilon is very small and the dividend too
>>> large, the result will overflow to infinity.
>>>
>>> I.e. (x / 0), with x nonzero, must return Inf.
>>
>> Yes, but they hadn't thought it through. That converts an
>> essentially random bit (the sign of zero) into information that is
>> visible to ordinary code, and makes it appear significant. That's
>> nonsense.
>
>When first defined, 754 hadn't decided if they were going to treat the
>number line as straight or as a circle wrapping around at Inf.

Not quite, as I understand it. The design always used the affine
extension for the real line (quite correctly), but they believed
or acceded to the myth that the same model could be made to cover
the complex case (where the correct extension is the Riemann sphere).
I was taught that was impossible (and how to prove that) in my second
year in university!

As I have posted before, that wouldn't be so catastrophic if complex
numbers used a polar representation rather than a Cartesian one - but
that battle was lost before modern computers were invented.


Regards,
Nick Maclaren.

MitchAlsup

Mar 29, 2018, 11:59:34 AM
On Wednesday, March 28, 2018 at 6:23:27 PM UTC-5, jim.bra...@ieee.org wrote:
> On Wednesday, March 28, 2018 at 1:59:56 PM UTC-5, Quadibloc wrote:
> > On Wednesday, March 28, 2018 at 10:22:11 AM UTC-6, Nick Maclaren wrote:
> > > In article <84ffe294-ef08-4c71...@googlegroups.com>,
> > > <jim.bra...@ieee.org> wrote:
> >
> > > >That said, there are alternatives to IEEE-754 that improve the situation.
> >
> > > Yes. Like IEEE-754 with division by zero giving a NaN, and propagating
> > > NaN reliably. But that's heresy ....
> >
> > I suppose that if that _isn't_ what the spec calls for, then it's "too late
> > now". But it certainly seems appropriate to me.
> >
> > Myself, I would have (nonzero) divided by (zero) produce the NaN "infinity of
> > indeterminate sign" while (zero) divided by (zero) produces the NaN "totally
> > unknown", so that what little information one might have is not thrown away.
> >
> > Because -5000 divided by 0 can't possibly be either 0 or 3, unlike 0 divided by
> > 0.
> >
> > John Savard
>
> ]> Myself, I would have (nonzero) divided by (zero) produce the NaN "infinity of
> ]> indeterminate sign" while (zero) divided by (zero) produces the NaN "totally
> ]> unknown", so that what little information one might have is not thrown away.
>
> On saving info: one idea is to take two exponent or mantissa bits and encode four types of numbers:

Take the mantissa bits for infinities/NaNs!

> 0) IEEE 754 exacts
> 1) Inexacts centered on IEEE 754 exacts
> 2) Inexacts centered between IEEE 754 exacts: value is somewhere between adjacent IEEE 754 exacts
> 3) Inexact zero or infinity: an inexact centered on zero or unsigned infinity with the exponent giving the width of the inexactness

Where does one put:
5) infinities from overflow
6) zeros from underflow
7) infinities from DIV 0
8) zeros from DIV infinity

MitchAlsup

Mar 29, 2018, 12:02:50 PM
On Thursday, March 29, 2018 at 5:56:24 AM UTC-5, Nick Maclaren wrote:
> In article <p9icok$7lk$1...@gioia.aioe.org>,
> Terje Mathisen <terje.m...@tmsw.no> wrote:
> >>
> >>> The idea is to throw away information as late as possible (ref
> >>> denorm), which means that a division by epsilon should return a
> >>> large number, and when epsilon is very small and the dividend too
> >>> large, the result will overflow to infinity.
> >>>
> >>> I.e. (x / 0), with x nonzero, must return Inf.
> >>
> >> Yes, but they hadn't thought it through. That converts an
> >> essentially random bit (the sign of zero) into information that is
> >> visible to ordinary code, and makes it appear significant. That's
> >> nonsense.
> >
> >When first defined, 754 hadn't decided if they were going to treat the
> >number line as straight or as a circle wrapping around at Inf.
>
> Not quite, as I understand it. The design always used the affine
> extension for the real line (quite correctly), but they believed
> or acceded to the myth that the same model could be made to cover
> the complex case (where the correct extension is the Riemann sphere).
> I was taught that was impossible (and how to prove that) in my second
> year in university!

Perhaps it is time for 754 to define arithmetic on complex numbers
(maybe even Quaternions) and bring the 5 major arithmetic operators
into first class citizens with defined semantics for these boundary
conditions.

Quadibloc

Mar 29, 2018, 12:41:56 PM
On Thursday, March 29, 2018 at 2:57:31 AM UTC-6, Nick Maclaren wrote:

> Yes, I know that the facility is useful when the programmer is a skilled
> arithmetician writing in assembler, but just HOW many such people are
> there? Well, there's you, (once upon a time) me, and all of the others
> I know of are now dead.

The knowledge is preserved in our libraries. And, in fact, the mathematics on
which it is based has not been lost. IEEE 754, for that matter, while it has
helped a bit, hasn't made numerical considerations irrelevant, and so those that
remain are still being taught.

Since it's impossible to devise a floating-point format that *takes the place
of* knowledge of numerical analysis, features that are useful to those with
such knowledge certainly do have a place.

Things are bad, but it is not time for despair.

John Savard

Quadibloc

unread,
Mar 29, 2018, 12:45:43 PM3/29/18
to
On Thursday, March 29, 2018 at 10:02:50 AM UTC-6, MitchAlsup wrote:

> Perhaps it is time for 754 to define arithmetic on complex numbers
> (maybe even Quaternions) and make the 5 major arithmetic operators
> first-class citizens with defined semantics for these boundary
> conditions.

I had thought that implementing complex numbers in hardware was... excessive...
but this discussion supplies the first *valid* reason for it that I had seen;
that the overflow, NaN, and so on conditions for them are not implementable in
terms of reasonable such conditions on their components.

John Savard

Quadibloc

unread,
Mar 29, 2018, 12:47:46 PM3/29/18
to
On Thursday, March 29, 2018 at 4:56:24 AM UTC-6, Nick Maclaren wrote:

> As I have posted before, that wouldn't be so catastrophic if complex
> numbers used a polar representation rather than a Cartesian one - but
> that battle was lost before modern computers were invented.

And a polar representation for complex numbers on a computer is like a
logarithmic representation for real numbers: suddenly, addition requires
evaluating an elementary transcendental function instead of being simple
arithmetic.

John Savard

MitchAlsup

unread,
Mar 29, 2018, 1:08:58 PM3/29/18
to
When you think about CRAY-1-like vectors, consider the ADD and SUB that
go along with complex MULtiplication, and the fact that the data are
side by side in memory but need to be loaded into different V registers
JUST BECAUSE one needs A*C-B*D while the other needs A*D+B*C. Whereas
if one HAD a complex CMUL instruction, the loads/stores are more regular
and the math is more regular, leading to more loops that can be vectorized.

All that is required is "a bit" of sequencing logic at the function units.
The algorithms are well established, 754 can argue about the boundary
conditions and special cases, and HW guys like me can go out and build them.

In addition: one could implement the FMA such that A*C-B*D was performed
with a single rounding, as is A*D+B*C !
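
In software, Kahan's compensated product-difference already gets most of
the way there with two fma() calls per component. A minimal C sketch,
assuming a correctly rounded fma() (the helper name is mine):

#include <math.h>

/* a*c - b*d to within a couple of ulps (Kahan's trick, sketch) */
double cmul_re(double a, double b, double c, double d)
{
    double w = b * d;
    double e = fma(b, d, -w);  /* exact rounding error of b*d     */
    double f = fma(a, c, -w);  /* a*c - w, with a single rounding */
    return f - e;              /* compensate for the error in w   */
}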

EricP

unread,
Mar 29, 2018, 1:26:05 PM3/29/18
to
jim.bra...@ieee.org wrote:
>
> On saving info: one idea is to take two exponent or mantissa bits and encode four types of numbers:
> 0) IEEE 754 exacts
> 1) Inexacts centered on IEEE 754 exacts
> 2) Inexacts centered between IEEE 754 exacts: value is somewhere between adjacent IEEE 754 exacts
> 3) Inexact zero or infinity: an inexact centered on zero or unsigned infinity with the exponent giving the width of the inexactness

The inexact bits hark back to unums and seem like an interesting idea
- biggest + inexact = bigger than biggest but not infinity
- zero + inexact = smaller than smallest but not zero

but for the other inexact values one thought I had at the time was
whether it would be a "sticky" bit and just get set and never reset.
And because the inexact bit would often somehow get set,
and would then be propagated in expressions afterwards,
the net effect would be to just remove one bit of precision.

Eric

Robert Wessel

unread,
Mar 29, 2018, 4:56:47 PM3/29/18
to
Doesn't all that just move the problem one step further along? Now
you have to figure out what an infinity-from-overflow times a
zero-from-underflow should result in (as just one example). I'm sure
any and all standardized choices are going to get at least some people
screaming.

OTOH, I see no problem with storing that kind of information in the
"extra" bits of a NaN, so you can determine why a particular operation
resulted in a NaN, but when a subsequent operation tries to consume
that NaN, it seems only sane that the reason is a somewhat generic
"one of the operands was a NaN".


MitchAlsup

unread,
Mar 29, 2018, 7:32:11 PM3/29/18
to
On Thursday, March 29, 2018 at 12:26:05 PM UTC-5, EricP wrote:
> jim.bra...@ieee.org wrote:
> >
> > On saving info: one idea is to take two exponent or mantissa bits and encode four types of numbers:
> > 0) IEEE 754 exacts
> > 1) Inexacts centered on IEEE 754 exacts
> > 2) Inexacts centered between IEEE 754 exacts: value is somewhere between adjacent IEEE 754 exacts
> > 3) Inexact zero or infinity: an inexact centered on zero or unsigned infinity with the exponent giving the width of the inexactness
>
> The inexact bits hark back to unums and seem like an interesting idea
> - biggest+inexact= bigger than biggest but not infinity
> - zero+inexact = smaller than smallest but not zero
>
That only works when the inexact bit gets stored along with the other bits of the number. That is, inexact should not be a flag, but a bit of the datum.

Nick Maclaren

unread,
Mar 30, 2018, 5:05:29 AM3/30/18
to
In article <89a5f8b0-5fa1-4424...@googlegroups.com>,
Quadibloc <jsa...@ecn.ab.ca> wrote:
>
>> Yes, I know that the facility is useful when the programmer is a skilled
>> arithmetician writing in assembler, but just HOW many such people are
>> there? Well, there's you, (once upon a time) me, and all of the others
>> I know of are now dead.
>
>The knowledge is preserved in our libraries. And, in fact, the mathematics on
>which it is based has not been lost. IEEE 754, for that matter, while it has
>helped a bit, hasn't made numerical considerations irrelevant, and so
>those that
>remain are still being taught.

Regrettably, that is the sort of skill that is NOT preserved in libraries,
because it is a 'workshop' skill at least as much as a theoretical one.
The same applies to the skills required to write high-quality exception
handling in run-time systems, where there are damn few around today, and
possibly none that do it for code that does not trap errors in advance.


Regards,
Nick Maclaren.

Nick Maclaren

unread,
Mar 30, 2018, 5:19:02 AM3/30/18
to
In article <9c28dea4-a6b4-477b...@googlegroups.com>,
Quadibloc <jsa...@ecn.ab.ca> wrote:
>On Thursday, March 29, 2018 at 10:02:50 AM UTC-6, MitchAlsup wrote:
>
>> Perhaps it is time for 754 to define arithmetic on complex numbers
>> (maybe even Quaternions) and make the 5 major arithmetic operators
>> first-class citizens with defined semantics for these boundary
>> conditions.
>
>I had thought that implementing complex numbers in hardware was... excessive...

>but this discussion supplies the first *valid* reason for it that I had seen;
>that the overflow, NaN, and so on conditions for them are not implementable in
>terms of reasonable such conditions on their components.

I suggest that you actually work out what needs to be done if starting
from a Cartesian representation. It's sufficiently foul that I don't
regard it as suitable for hardware implementation or within IEEE 754's
ability to specify correctly. Note that the latter is for political
reasons, especially with the pernicious influence of C99, not that the
IEEE people involved are technically incapable of it.

>> As I have posted before, that wouldn't be so catastrophic if complex
>> numbers used a polar representation rather than a Cartesian one - but
>> that battle was lost before modern computers were invented.
>
>And a polar representation for complex numbers on a computer is like a
>logarithmic representation for real numbers: suddenly, addition requires
>evaluating an elementary transcendental function instead of being simple
>arithmetic.

Really? :-) I can do it without, using a direction cosine representation
of the angle. That needs 50% more space, but is only a small factor
slower than Cartesian. Yes, those were killer factors in the 1950s,
but today?

What would help with that would be hardware for sqrt(A^2+B^2), which is
pretty trivial and would be useful in many contexts, and perhaps another
trivial scaling function.
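
In software, the overflow-safe version of that primitive is a small
scaling exercise (a minimal sketch):

#include <math.h>

/* sqrt(a^2 + b^2) without intermediate overflow or underflow */
static double hypot_sketch(double a, double b)
{
    a = fabs(a);
    b = fabs(b);
    double hi = a > b ? a : b;
    double lo = a > b ? b : a;
    if (hi == 0.0)
        return 0.0;
    double r = lo / hi;             /* r is in [0,1] */
    return hi * sqrt(1.0 + r * r);  /* scale back up */
}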

What I would actually do, of course, would be to make a complex number
three reals and an integer, with the last being the winding number.
That would solve the problem that IEEE signed zeroes were meant to
assist with, PROPERLY.


Regards,
Nick Maclaren.

Terje Mathisen

unread,
Mar 30, 2018, 5:35:27 AM3/30/18
to
Robert Wessel wrote:
> On Thu, 29 Mar 2018 08:59:30 -0700 (PDT), MitchAlsup
>> Where does one put::
>> 5) infinities from overflow
>> 6) zeros from underflow
>> 7) infinities from DIV 0
>> 8) zeros from DIV infinity
>
> Doesn't all that just move the problem one step further along? Now
> you have to figure out what an infinity-from-overflow times a
> zero-from-underflow should result in (as just one example). I'm sure
> any and all standardized choices are going to get at least some people
> screaming.
>
> OTOH, I see no problem with storing that kind of information in the
> "extra" bits of a NaN, so you can determine why a parituclar operation
> resulting in a NaN, but when a subsequent operation trys to consume
> that NaN, it seems only sane that the reason is a somewhat generic
> "one of the operands was a NaN".

NaN propagation is relatively straightforward:

Any operation that gets a NaN as one of the inputs will forward this,
unmodified except for possibly turning a SNaN into QNaN, i.e. all the
other bits should stay the same.

The exception to this rule is a very short list of functions, like the
new MIN/MAX version I mentioned, that silently skip them instead.

The only real problem with propagation occurs when you have more than
one NaN input, in which case the implementation is allowed to forward
either of them:

We have discussed various ways to improve this in order to make it
commutative, i.e. "always forward the NaN with the largest payload", but
this will always be a platform-specific behavior.
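
As a C sketch of that forwarding rule for binary64 (the helper and mask
names are illustrative):

#include <math.h>
#include <stdint.h>
#include <string.h>

#define MANT_MASK ((UINT64_C(1) << 52) - 1)

static uint64_t bits64(double x) { uint64_t u; memcpy(&u, &x, sizeof x); return u; }

/* forward the NaN with the larger payload so the choice is commutative;
   with a single NaN input, forward it unchanged */
static double propagate_nan(double a, double b)
{
    if (isnan(a) && isnan(b))
        return (bits64(a) & MANT_MASK) >= (bits64(b) & MANT_MASK) ? a : b;
    return isnan(a) ? a : b;
}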

There is another issue related to narrowing conversions, i.e. a double
NaN input to a function returning float: Which part(s) of the payload
should survive?

IBM has spent a lot of time and effort trying to make this portable
across both binary and decimal FP, and they ended up with bit-reversed
binary payloads.

The most serious binary NaN issue imho is the fact that the standard was
too late to determine how to differentiate between Signalling and Quiet
NaNs!

Which bit is the S/Q bit (normally the top mantissa bit)?

If you set that bit, is it a QNaN or a SNaN?

I strongly prefer to have the SNaN bit set, so that all SNaNs will sort
as larger than all other numbers, but that leaves a small issue when
turning a SNaN into a QNaN, something that happens to most NaNs:

If the SNaN bit was the only set bit in the mantissa, then clearing it
will turn the value into an Inf instead of a QNaN, i.e. you need at
least one other set bit and which bit should that be?

The only reasonable implementation is one that makes sure that no NaN
can ever be generated without at least one such extra bit set, typically
the second highest.

Doing it this way means that you can use the normal double-to-float
narrowing hardware on all numbers, including NaNs without having to
worry about accidentally turning them into Inf.
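
Under the bit convention argued for here (top mantissa bit set means
signalling, second-highest bit kept set in every NaN; note the usual
convention is the opposite), quieting becomes a sketch like this on the
raw binary64 bit image:

#include <stdint.h>

#define SNAN_BIT (UINT64_C(1) << 51)  /* top mantissa bit = S/Q flag      */
#define KEEP_BIT (UINT64_C(1) << 50)  /* guard bit, always set in any NaN */

/* turn an SNaN image into a QNaN without risking collapse to Inf */
static uint64_t quiet_nan_bits(uint64_t u)
{
    u &= ~SNAN_BIT;  /* clear the signalling flag          */
    u |= KEEP_BIT;   /* keep the mantissa provably nonzero */
    return u;
}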

This also works for FP16 (1:5:10) and FP8 (1:3:4).

The smallest ieee754-style fp format would be 1:2:2, which still supports
all types of values; dropping down to 1:1:2 means that all numbers are
denormal!

S:0:00 Zero
S:0:01 0.25
S:0:10 0.5
S:1:00 Inf
S:1:01 QNaN
S:1:10 SNaN should not occur, could be used as None/Empty placeholder
S:1:11 SNaN canonical form

already...@yahoo.com

unread,
Mar 30, 2018, 5:43:16 AM3/30/18
to
On Wednesday, March 28, 2018 at 3:59:12 PM UTC+3, Terje Mathisen wrote:
> Nick Maclaren wrote:
> > The days when a factor of two in memory usage was a deal-breaker have
> > long since gone. While such a format COULD be implemented in hardware,
> > it doesn't make any sense to so so. Just emulate 256-bit floating-
> > point in software - or longer, if you prefer.
> >
> > There ARE things that the hardware could do to help this, quite a lot,
> > but it's NOT by implementing a new number format - it's by providing
> > the primitives that would help to implement one efficiently. Terje is
> > our current expert, I believe.
>
> The main things I'd like in order to do 128 or 256-bit sw floating point
> would be double-wide regular and sticky shifts, along with the usual
> widening multiply.
>
> (A sticky shift corresponds to rounding away from zero, i.e. the bottom
> bit is OR'ed with all the bits that are shifted out.)

As you probably remember, I implemented a pretty fast 128-bit (128-bit mantissa, rather than IEEE binary128) floating-point package in software not so long ago.
I don't recollect a need for sticky shifts.
But maybe I have already forgotten; this sort of detail can occupy you for a day or two and be forgotten soon thereafter.

>
> You also need a fast find-first-set-bit/count-leading-zeroes. If this
> can work across pairs of registers it would be even better...

Better, but not measurably so. In practice, cases of full cancellation of a 64-bit word are rare, so the branch predictor works very well.

Things missing in x86 that could help, not necessarily by much but, IMO, more than sticky shift and dual-word CLZ:
1. Integer abs.
2. Integer equivalent of FP copy_sign, i.e. dst = src1 < 0 ? -src2 : src2 (see the sketch below).
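
For (2), a branchless two's-complement sketch (the arithmetic right
shift of a negative signed value is common practice but strictly
implementation-defined in C):

#include <stdint.h>

/* dst = src1 < 0 ? -src2 : src2, without a branch */
static int64_t icopysign(int64_t src2, int64_t src1)
{
    int64_t m = src1 >> 63;  /* all-ones if src1 < 0, else zero */
    return (src2 ^ m) - m;   /* conditional two's-complement negate */
}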

>
> It also helps to have a fast classifier, i.e. something that can look at
> sign+exponent+(mantissa != 0), with the exponent/mantissa boundary
> either freely variable or along the correct FP128/FP256 locations.
>
> The return value tells you if the input was
> zero/denormal/normal/inf/nan, preferably ordered so that a single
> test/compare can flag all special cases.
>

That, I'd guess, is helpful for packed formats like IEEE binary128,
less so when the exponent is in a separate word.
Again, rare cases are rare, so branch prediction does wonders.

> This way you can replace the missing hidden bit very quickly and emulate
> a quad FMUL with 4 64x64->128 MULs while ADD/ADC the parts together,
> doing the core 112x112 bit multiplication in ~10 cycles.
>

I specifically measured the advantage of ADC for fp128 multiplication.
It is small, certainly not 10%; maybe 5%, I don't remember the exact number.
Maybe on a less capable CPU (than Ivy Bridge) the advantage is bigger.

> Since the final normalization (in the case of one denorm input) can
> require a shift of up to 112 (111?) bits for FP128 and 230+ bits for
> FP256 it helps to be able to run multiple code paths in parallel and
> merge them without branching.
>

I don't know about the Mill, but on more conventional CPUs it sounds like
a bad idea, as are most non-short branch-avoidance ideas.

Nick Maclaren

unread,
Mar 30, 2018, 8:08:30 AM3/30/18
to
In article <p9l0cs$p2r$1...@gioia.aioe.org>,
Terje Mathisen <terje.m...@tmsw.no> wrote:
>Robert Wessel wrote:
>> On Thu, 29 Mar 2018 08:59:30 -0700 (PDT), MitchAlsup
>>> Where does one put::
>>> 5) infinities from overflow
>>> 6) zeros from underflow
>>> 7) infinities from DIV 0
>>> 8) zeros from DIV infinity
>>
>> Doesn't all that just move the problem one step further along? Now
>> you have to figure out what an infinity-from-overflow times a
>> zero-from-underflow should result in (as just one example). I'm sure
>> any and all standardized choices are going to get at least some people
>> screaming.
>>
>> OTOH, I see no problem with storing that kind of information in the
>> "extra" bits of a NaN, so you can determine why a parituclar operation
>> resulting in a NaN, but when a subsequent operation trys to consume
>> that NaN, it seems only sane that the reason is a somewhat generic
>> "one of the operands was a NaN".
>
>NaN propagation is relatively straightforward:
>
>Any operation that gets a NaN as one of the inputs will forward this,
>unmodified except for possibly turning a SNaN into QNaN, i.e. all the
>other bits should stay the same.
>
>The exception to this rule is a very short list of functions, like the
>new MIN/MAX version I mentioned, that silently skip them instead.

It's unlike you to make that sort of mistake! The list is NOT
'very short' and, worse, some of those functions are among the most
heavily used. They include the power functions, arc tangent and
copysign, as well as max/min, and (in the 1985 one) (in)equality
comparisons :-(

While it is somewhat unfair to blame IEEE 754 for the fact that it,
in practice, ALSO includes conversion to integer and all comparisons,
that is a direct consequence of its utterly ghastly exception flag
model, which was known to be a damn-fool idea by 1980.


Regards,
Nick Maclaren.

already...@yahoo.com

unread,
Mar 30, 2018, 9:12:06 AM3/30/18
to
What would be a right way to define a [default] comparison with QNAN?
Without trying to think too much, returning false and setting the sticky FE_INVALID flag sounds like A Right Thing.
Except I'd want the issue of the coverage of stickiness (per thread vs. per process) to be clearly standardized.

Terje Mathisen

unread,
Mar 30, 2018, 9:16:25 AM3/30/18
to
already...@yahoo.com wrote:
> On Wednesday, March 28, 2018 at 3:59:12 PM UTC+3, Terje Mathisen
>> The main things I'd like in order to do 128 or 256-bit sw floating
>> point would be double-wide regular and sticky shifts, along with
>> the usual widening multiply.
>>
>> (A sticky shift corresponds to rounding away from zero, i.e. the
>> bottom bit is OR'ed with all the bits that are shifted out.)
>
> As you probably remember, I implemented pretty fast 128-bit (128-bit
> mantissa, rather than IEEE binary128) floating point package in
> software not so long ago. I don't recollect a need for sticky
> shifts. But, may be, I already forgot. This sort of details can
> bother you in specific day or two and be forgotten soon thereafter.

Sticky shifts are absolutely required for correct rounding; if you don't
have them then you have to emulate them by isolating all the bits that are
about to be shifted out and testing for non-zero.
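
A minimal C sketch of that emulation, with the OR of the lost bits
landing in bit 0:

#include <stdint.h>

/* right shift with a sticky bottom bit */
static uint64_t sticky_shr(uint64_t x, unsigned n)
{
    if (n == 0)  return x;
    if (n >= 64) return x != 0;                    /* everything shifted out */
    uint64_t lost = x & ((UINT64_C(1) << n) - 1);  /* bits about to vanish */
    return (x >> n) | (lost != 0);
}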

When I did something similar to you back in 1995 during FDIV I used a
1:31:96 format to verify correct answers for our workaround code,
including the FPATAN2 replacement. I did not bother at all with correct
rounding, just did a half-assed "round up if >= 0.5".
>
>>
>> You also need a fast find-first-set-bit/count-leading-zeroes. If
>> this can work across pairs of registers it would be even better...
>
> Better, but not measurably so. In practice, cases of full
> cancellation of 64-bit word are rare, so branch predictor works very
> well.

Branches don't work at all if you want your code to work across SIMD
vectors, i.e. branchless algorithms required.
>
> Things missing in x86 that could help, not necessarily by much, but ,
> IMO, more than sticky shift and dual-word CLZ? 1. Integer abs. 2.
> Integer equivalent of FP copy_sign. I.e. dst = src1 < 0 ? -src2 :
> src2

In my emulation code all the variables are always unsigned, so I haven't
seen a need for that.
>
>>
>> It also helps to have a fast classifier, i.e. something that can
>> look at sign+exponent+(mantissa != 0), with the exponent/mantissa
>> boundary either freely variable or along the correct FP128/FP256
>> locations.
>>
>> The return value tells you if the input was
>> zero/denormal/normal/inf/nan, preferably ordered so that a single
>> test/compare can flag all special cases.
>>
>
> That, I'd guess, is helpful in packed formats, like IEEE binary128.
> Less so when exponent is in separate word. Again, rare cases are
> rare, so branch prediction does wonders.

See above: All the code should work without branching.
>
>> This way you can replace the missing hidden bit very quickly and
>> emulate a quad FMUL with 4 64x64->128 MULs while ADD/ADC the parts
>> together, doing the core 112x112 bit multiplication in ~10 cycles.
>>
>
> I specifically measured advantage of ADC for fp128 multiplication. It
> is small, certainly not 10%. May be, 5%, I don't remember an exact
> number. May be, on less capable CPU (than IvyBridge) the advantage is
> bigger.

With a wide enough integer core you can do the Alpha style compare/set
bit/add as a replacement for ADC, and mostly do it overlapped with the
other operations but I would guess that it would add at least a couple
of cycles.
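
That replacement, as a C sketch for one 128-bit add:

#include <stdint.h>

/* 128-bit add without ADC: compare/set-bit/add, Alpha style */
static void add128(uint64_t ah, uint64_t al, uint64_t bh, uint64_t bl,
                   uint64_t *rh, uint64_t *rl)
{
    uint64_t s = al + bl;
    uint64_t carry = s < al;  /* the compare/set-bit step */
    *rl = s;
    *rh = ah + bh + carry;
}
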
>
>> Since the final normalization (in the case of one denorm input) can
>> require a shift of up to 112 (111?) bits for FP128 and 230+ bits
>> for FP256 it helps to be able to run multiple code paths in
>> parallel and merge them without branching.
>>
>
> I don't know about Mill, but on more conventional CPUs it sounds like
> bad idea. As as are most of non-short branch avoidance ideas.

Again, see my need for branchless code.

Terje Mathisen

unread,
Mar 30, 2018, 9:23:35 AM3/30/18
to
Nick Maclaren wrote:
> In article <p9l0cs$p2r$1...@gioia.aioe.org>,
> Terje Mathisen <terje.m...@tmsw.no> wrote:
>> NaN propagation is relatively straightforward:
>>
>> Any operation that gets a NaN as one of the inputs will forward this,
>> unmodified except for possibly turning a SNaN into QNaN, i.e. all the
>> other bits should stay the same.
>>
>> The exception to this rule is a very short list of functions, like the
>> new MIN/MAX version I mentioned, that silently skip them instead.
>
> It's unlike you to make that sort of mistake! The list is NOT
> 'very short' and, worse, some of those functions are among the most
> heavily used. They include the power functions, arc tangent and
> copysign, as well as max/min, and (in the 1985 one) (in)equality
> comparisons :-(

Mea culpa:

I should have mentioned that we also worked on a sane power function,
but you are right, I did lose the vote on a few of the special cases of
some of these functions.

Copysign I agreed with though, it is defined to blindly copy the sign
bit, so that is what it should do, even if that sign comes from a NaN value.

Before unordered comparisons, the only feasible way to test for NaN was
to see if it would fail two opposite compares.
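
I.e., as a sketch:

/* pre-isnan() NaN detection: only a NaN fails both opposite compares */
static int looks_like_nan(double a)
{
    return !(a < 0.0) && !(a >= 0.0);
}
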
>
> While it is somewhat unfair to blame IEEE 754 for the fact that it,
> in practice, ALSO includes conversion to integer and all comparisons,
> that is a direct consequence of its utterly ghastly exception flag
> model, which was known to be a damn-fool idea by 1980.

Signed integer arithmetic would have been less error prone if MININT
instead had been defined as a sticky error value.

already...@yahoo.com

unread,
Mar 30, 2018, 9:40:26 AM3/30/18
to
It seems we are looking at different use cases.
You appear to be concerned with emulation of an ISA that has 2x or 4x binary128 SIMD on hardware that has integer SIMD of similar width.

I am concerned with something that can be described as a faster, less generic implementation of boost::multiprecision::cpp_bin_float for relatively low precisions, like 113, 128 and 256 binary digits, primarily on fat OoO cores.

Nick Maclaren

unread,
Mar 30, 2018, 10:28:52 AM3/30/18
to
In article <8c6b17b7-b857-4632...@googlegroups.com>,
<already...@yahoo.com> wrote:
>
>What would be a right way to define a [default] comparison with QNAN?
>Without trying to think too much, returning false and setting the sticky
>FE_INVALID flag sounds like A Right Thing.
>Except I'd want the issue of the coverage of stickiness (per thread vs. per
>process) to be clearly standardized.

Yes, within IEEE 754's model, but my point was that it DIDN'T raise
the exception in the first version!

My remarks about the utterly ghastly exception flag model (and the
word incompetent is not too strong) is that it was universally
agreed that future programming would be in high-level languages and
it was universally known among language experts that flags were
Bad News in programming languages, impossible to support in most, and
incompatible with serious optimisation!

One can also damn IEEE 754 for its gross and unreasonable prejudice
against the ONE known, implementable interrupt model, which had also
been made a preferred mode by ISO Language Independent Arithmetic.
I am referring to trap-diagnose-and-terminate, of course.


Regards,
Nick Maclaren.

Nick Maclaren

unread,
Mar 30, 2018, 10:38:06 AM3/30/18
to
In article <p9ldok$1h9j$1...@gioia.aioe.org>,
Terje Mathisen <terje.m...@tmsw.no> wrote:
>>
>
>Copysign I agreed with though, it is defined to blindly copy the sign
>bit, so that is what it should do, even if that sign comes from a NaN value.

I hope that you didn't agree that it should not signal an exception,
not even if copying the 'sign' from a signalling NaN!

>Before unordered comparisons, the only feasible way to test for NaN was
>to see if it would fail two opposite compares.

Yes. Which relies on no serious optimisation.

>> While it is somewhat unfair to blame IEEE 754 for the fact that it,
>> in practice, ALSO includes conversion to integer and all comparisons,
>> that is a direct consequence of its utterly ghastly exception flag
>> model, which was known to be a damn-fool idea by 1980.
>
>Signed integer arithmetic would have been less error prone if MININT
>instead had been defined as a sticky error value.

Yes.


Regards,
Nick Maclaren.

BGB

unread,
Mar 30, 2018, 11:36:20 AM3/30/18
to
hmm, FNI128:
S: Sign, 1-bit, MSB
E: Exponent, 15-bits
T: Tag, 3+ bits
N: Secondary bits, depends on tag
M: Mantissa, variable sized.

Tags(T):
00z, N=0: behave like Denormalized
Mantissa is 110 bits (no hidden bit).
With a special exponent, may be assumed to represent integers.
01z, N=0: behave like Normalized
Mantissa is 110 bits (with hidden bit).
100, N=7: Inexact Range Value
Mantissa encodes lower bound (102 bits).
N bits give the error magnitude in bits.
Allows computing an upper bound.
Values >= 102 are invalid/reserved.
101, N=7: Inexact Range Value
Mantissa gives center value.
...
111, N=5+: Escape to more tags

the NaN/Inf case (Exponent=7FFF) would be similar, but different:
000, N=13: Inf, N gives Reason
Mantissa (96 bits) is ignored, assumed to be zeroed.
001, N=13: Inf, Reserved
010, N=13: SNaN, N gives Reason
011, N=13: QNaN, N gives Reason
100, N=13: User Expansion Range
Treated as 13 bits for a user-defined tag.
Mantissa (96 bits) is treated as user-defined tagged payload.
...
111, N=5+: Escape

unlike IEEE, exponent of 0000 would not be treated as special.


probably not particularly suitable for hardware though.
possibly needlessly over-complicated, but oh well.

not defining narrower formats, but a 64-bit format could be sane.
much smaller and this is likely to adversely affect precision for not
much gain (usually IME 32-bit and 16-bit floats are needed for "fast/small",
with much less concern for precision).


or such...

MitchAlsup

unread,
Mar 30, 2018, 12:26:16 PM3/30/18
to
When faced with comparisons, and having already chosen a branch instruction
that looks at one bit in a register and takes or ignores the branch;

I had the comparison function generate a bit vector of all reasonable
results. When the operands are uncomparable, none of the typical comparison
result bits are set, but the uncomparable bit is set.

I also needed the inverse of the comparisons, so the compilers can flip
the then and else clauses and still leave the NaN in the correct clause.

So starting at bit 0 I use:
bit query
0 .... ne
1 .... eq
2 .... gt
3 .... ge
4 .... lt
5 .... le
6 .... or (ordered)
7 .... un (unordered)
8 .... not ne
9 .... not eq
10.... not gt
11.... not ge
12.... not lt
13.... not le

When the un bit is set, none of the first 6 are set and all of the second 6
are set.

Now, given that the compare HW has already had to determine the class of
the floating point operands, creating the following was straightforward:

16.... 0 <= S1 <= S2
17.... 0 < S1 <= S2
18.... 0 <= S1 < S2
19.... 0 < S1 < S2

32.... S1 is QNaN
33.... S1 is SNaN
34.... S1 is -Infinity
35.... S1 is -Normal
36.... S1 is -Denormal
37.... S1 is -Zero
38.... S1 is +zero
39.... S1 is +Denormal
40.... S1 is +Normal
41.... S1 is +Infinity

This last set is in direct support of the FPClass function in OpenGL. Their
use applies an (application-supplied) 10-bit mask (AND) and if any
bits remain set, the classification has been met and the branch determined.
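
As a usage sketch (fpcmp_bits() is a stand-in for the hypothetical
instruction; the bit numbers follow the table above):

#include <stdint.h>

uint64_t fpcmp_bits(double s1, double s2);  /* hypothetical intrinsic */

/* branch if s1 is infinite: AND an application-supplied class mask */
static int is_infinite_class(double s1, double s2)
{
    uint64_t mask = (UINT64_C(1) << 34) | (UINT64_C(1) << 41);  /* +/-Inf */
    return (fpcmp_bits(s1, s2) & mask) != 0;
}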

MitchAlsup

unread,
Mar 30, 2018, 12:35:17 PM3/30/18
to
On Friday, March 30, 2018 at 9:28:52 AM UTC-5, Nick Maclaren wrote:
> In article <8c6b17b7-b857-4632...@googlegroups.com>,
> <already...@yahoo.com> wrote:
> >
> >What would be a right way to define a [default] comparison with QNAN?
> >Without trying to think too much, returning false and setting sticky
> >FE_INALID flag sound like A Right Thing.
> >Except I'd want the issue of coverage of stickness (per thread vs per
> >process) to be clearly standardized.
>
> Yes, within IEEE 754's model, but my point was that it DIDN'T raise
> the exception in the first version!
>
> My remarks about the utterly ghastly exception flag model (and the
> word incompetent is not too strong) is that it was universally
> agreed that future programming would be in high-level languages and
> it was universally known among language experts that flags were all of
> Bad News in programming languages, impossible to support in most, and
> incompatible with serious optimisation!

The same can be said about that collection of bits one typically calls
the 'condition codes'.

Nick Maclaren

unread,
Mar 30, 2018, 12:51:49 PM3/30/18
to
In article <23d53024-9cc2-48bb...@googlegroups.com>,
MitchAlsup <Mitch...@aol.com> wrote:
>>
>> My remarks about the utterly ghastly exception flag model (and the
>> word incompetent is not too strong) is that it was universally
>> agreed that future programming would be in high-level languages and
>> it was universally known among language experts that flags were all of
>> Bad News in programming languages, impossible to support in most, and
>> incompatible with serious optimisation!
>
>The same can be said about that collection of bits one typically calls
>the 'condition codes'.

Where do you think the experience that it was 'universally known'
came from? :-)

Proper, genuinely sticky exceptional VALUES were known to work, but
did require some kind of a trap on the uses that could not propagate
them (mainly branches and relatives). Unless the language was like
SNOBOL (or COBOL, to some extent), and had constructs logically like:

IF ....
THEN
...
ELSE
...
EXCEPT
...
END IF

In this context, (C++-like) exceptions are a kind of trap.


Regards,
Nick Maclaren.

Terje Mathisen

unread,
Mar 30, 2018, 2:24:03 PM3/30/18
to
Nick Maclaren wrote:
> In article <p9ldok$1h9j$1...@gioia.aioe.org>, Terje Mathisen
> <terje.m...@tmsw.no> wrote:
>>>
>>
>> Copysign I agreed with though, it is defined to blindly copy the
>> sign bit, so that is what it should do, even if that sign comes
>> from a NaN value.
>
> I hope that you didn't agree that it should not signal an exception,
> not even if copying the 'sign' from a signalling NaN!

I decided to not worry about it, thinking that copysign() is typically
used late in an algorithm to get the sign right, in which case that
input value has already been used one or more times, and so any SNaN
has already trapped and been converted to a QNaN.

I'm willing to be persuaded otherwise though if you have a plausible
scenario!
>
>> Before unordered comparisons, the only feasible way to test for NaN
>> was to see if it would fail two opposite compares.
>
> Yes. Which relies on no serious optimisation.

Yeah, it becomes one of many patterns compilers have to recognize and
NOT optimize away. :-(

jim.bra...@ieee.org

unread,
Mar 30, 2018, 3:58:41 PM3/30/18
to
On Thursday, March 29, 2018 at 10:59:34 AM UTC-5, MitchAlsup wrote:
> On Wednesday, March 28, 2018 at 6:23:27 PM UTC-5, jim.bra...@ieee.org wrote:
> > On Wednesday, March 28, 2018 at 1:59:56 PM UTC-5, Quadibloc wrote:
> > > On Wednesday, March 28, 2018 at 10:22:11 AM UTC-6, Nick Maclaren wrote:
> > > > In article <84ffe294-ef08-4c71...@googlegroups.com>,
> > > > <jim.bra...@ieee.org> wrote:
> > >
> > > > >That said, there are alternatives to IEEE-754 that improve the situation.
> > >
> > > > Yes. Like IEEE-754 with division by zero giving a NaN, and propagating
> > > > NaN reliably. But that's heresy ....
> > >
> > > I suppose that if that _isn't_ what the spec calls for, then it's "too late
> > > now". But it certainly seems appropriate to me.
> > >
> > > Myself, I would have (nonzero) divided by (zero) produce the NaN "infinity of
> > > indeterminate sign" while (zero) divided by (zero) produces the NaN "totally
> > > unknown", so that what little information one might have is not thrown away.
> > >
> > > Because -5000 divided by 0 can't possibly be either 0 or 3, unlike 0 divided by
> > > 0.
> > >
> > > John Savard
> >
> > ]> Myself, I would have (nonzero) divided by (zero) produce the NaN "infinity of
> > ]> indeterminate sign" while (zero) divided by (zero) produces the NaN "totally
> > ]> unknown", so that what little information one might have is not thrown away.
> >
> > On saving info: one idea is to take two exponent or mantissa bits and encode four types of numbers:
>
> Take the mantissa bits for infinities/NaNs!
>
> > 0) IEEE 754 exacts
> > 1) Inexacts centered on IEEE 754 exacts
> > 2) Inexacts centered between IEEE 754 exacts: value is somewhere between adjacent IEEE 754 exacts
> > 3) Inexact zero or infinity: an inexact centered on zero or unsigned infinity with the exponent giving the width of the inexactness
>
> Where does one put::
> 5) infinities from overflow
> 6) zeros from underflow
> 7) infinities from DIV 0
> 8) zeros from DIV infinity
> >
> > For some embedded applications it is known that zero or infinity or denorms will not happen or if they do happen it is a fatal error, so #2 has the best combination of circuit cost and circuit delay. No two bit encoding field needed.
> >
> > For short floats probably best to limit the encoding field to zero or one bit: codes 0 and/or 2.
> >
> > For larger floats the two bit code field is affordable.
> >
> > Jim Brakefield
In brief:
]Where does one put::
] 5) infinities from overflow
A code (from the gradual overflows) for values between the largest representable value and infinity
] 6) zeros from underflow
ibid
] 7) infinities from DIV 0
zeros represent a range of values centered about zero, flip the exponent and you have a range centered about infinity
] 8) zeros from DIV infinity
ibid

Jim Brakefield

Quadibloc

unread,
Mar 30, 2018, 5:55:51 PM3/30/18
to
On Friday, March 30, 2018 at 3:05:29 AM UTC-6, Nick Maclaren wrote:

> Regrettably, that is the sort of skill that is NOT preserved in libraries,
> because it is a 'workshop' skill at least as much as a theoretical one.
> The same applies to the skills required to write high-quality exception
> handling in run-time systems, where there are damn few around today, and
> possibly none that do it for code that does not trap errors in advance.

I realize that knowledge can be lost: I've recently been reading up on how,
after Bergonzi and Guadagnini, nobody knew how to make a proper violin any
more, let alone a *really* good one like what Stradivari had been making
previously.

John Savard

Quadibloc

unread,
Mar 30, 2018, 6:09:41 PM3/30/18
to
On Friday, March 30, 2018 at 3:19:02 AM UTC-6, Nick Maclaren wrote:

> What I would actually do, of course, would be to make a complex number
> three reals and an integer, with the last being the winding number.

Oh. my. God.

My first reaction upon reading this was to note that if an engineer at Intel
were to go forwards to management to suggest that this sort of arithmetic
should be included in the next version of Streaming SIMD Extensions as a
standard feature on every new x86 microprocessor in everyone's microcomputer
out there... he would be laughed and/or thrown out of the office.

And the same would go for NVIDIA.

This, however, is not a comment on the intrinsic merit of your suggestion,
but rather an observation on the cold and cruel nature of the world in which
we live.

My second reaction is that for those problems in which winding number is
required to be taken into account, if the tool that one has is a FORTRAN
compiler with a software Cartesian implementation of complex numbers, one can
simply do a change of variables, so that while one's equations contain the
variable z, the corresponding variable in one's program contains ln(z).

The winding number (times two pi) is now in the complex part, and as a bonus
the representation is now polar - and multiplication is performed by mere
addition.

So if indeed what you suggest is what the world needs, I would be inclined to
suggest a gradual approach:

In the literature, has anyone written a set of FORTRAN libraries to do
complex number calculations on the logarithms of numbers so as to enable
keeping track of winding number?

Has someone using a supercomputer with FPGA assist used that to implement a
better complex arithmetic for a problem that required such precautions?

GRAPE even comes to mind as showing scientists are not above building
specialized hardware for their computational requirements if they are
desperate enough.

Then, with the results of said literature search in hand, one might go to the
offices of... Wolfram Research, rather less likely to dismiss this out of
hand with prejudice than Intel or Nvidia.

Since (in my opinion) Decimal Floating Point wound up being implemented
because it seemed like a way to idiot-proof spreadsheets, it is not at all
impossible that fancier complex numbers could see the light of day as well,
if a case for them could be made. It would have to be a good one, because
while transistors are cheap these days, they're still not so cheap that one
can include anything, however specialized or exotic, as a standard feature on
every mainstream microprocessor. (Actually, it's not so much the cost of the
transistors as the heat they generate, but anyways...)

John Savard

Terje Mathisen

unread,
Mar 31, 2018, 4:02:39 AM3/31/18
to
Quadibloc wrote:
> Since (in my opinion) Decimal Floating Point wound up being
> implemented because it seemed like a way to idiot-proof spreadsheets,
> it is not at all impossible that fancier complex numbers could see
> the light of day as well, if a case for them could be made. It would
> have to be a good one, because while transistors are cheap these
> days, they're still not so cheap that one can include anything,
> however specialized or exotic, as a standard feature on every
> mainstream microprocessor. (Actually, it's not so much the cost of
> the transistors as the heat they generate, but anyways...)

Isn't the real problem related to wires as well?

I.e. having native complex operations using SIMD register storage of
each value would require quite a bit of shuffling between the various
parts of the input registers.

Yes, there is a shuffle unit already, but the late arrival and
limitations of horizontal operations seems to indicate that they are costly.

Nick Maclaren

unread,
Mar 31, 2018, 4:28:13 AM3/31/18
to
In article <p9lvbt$fn0$1...@gioia.aioe.org>,
Terje Mathisen <terje.m...@tmsw.no> wrote:
>>>>
>>>
>>> Copysign I agreed with though, it is defined to blindly copy the
>>> sign bit, so that is what it should do, even if that sign comes
>>> from a NaN value.
>>
>> I hope that you didn't agree that it should not signal an exception,
>> not even if copying the 'sign' from a signalling NaN!
>
>I decided to not worry about it, thinking that copysign() is typically
>used late in an algorithm to get the sign right, in which case that
>input value has already been used one or more times, and so any SNaN
>have already trapped and been converted to QNaN.
>
>I'm willing to be persuaded otherwise though if you have a plausible
>scenario!

Unfortunately, I do :-( I can't remember the algorithms where I
have used it and seen it used, but it's also allowing for the play
in gears, slatting due to wind etc. Basically, the formulae often
go like this (with lots of variations, of course):

expr_A(...) + sign(expr_B(...))*expr_C(...)

MOST of the time, expr_B is the same as one of expr_A or expr_C, but
occasionally it isn't. If I recall a specific requirement that will
definitely fail, in the next day or so, I will post it.

>>> Before unordered comparisons, the only feasible way to test for NaN
>>> was to see if it would fail two opposite compares.
>>
>> Yes. Which relies on no serious optimisation.
>
>Yeah, it becomes one of many patterns compilers have to recognize and
>NOT optimize away. :-(

Many? Not in civilised languages. And it's Bad News for RAS. Not
merely do compilers not always recognise exactly the same pattern (so
you can test code on one and have it fail on another), but it also has
knock-on consequences (depending on the structure of the optimiser).

I hit the first problem when writing precision-extension examples in
C++. Intel's compiler, for example, recognises Kahan summation but
not my almost identical equivalent. The standard says that either
explicit casts or assignment and reloading stop that - like hell they
do! You have to do both of those AND have the variable volatile (and,
if I recall, with external linkage) :-(
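
For reference, the pattern in question, as a minimal Kahan summation
sketch with the volatile workaround described above:

#include <stddef.h>

double kahan_sum(const double *x, size_t n)
{
    double s = 0.0;
    volatile double c = 0.0;  /* compensation term; volatile blocks the
                                 "helpful" pattern recognition */
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;
        double t = s + y;
        c = (t - s) - y;      /* low-order bits lost when forming t */
        s = t;
    }
    return s;
}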


Regards,
Nick Maclaren.

Nick Maclaren

unread,
Mar 31, 2018, 4:36:34 AM3/31/18
to
In article <6832cfb3-dc55-4b12...@googlegroups.com>,
Quadibloc <jsa...@ecn.ab.ca> wrote:
>
>> What I would adtually do, of course, would be to make a complex number
>> three reals and an integer, with the last being the winding number.
>
>Oh. my. God.
>
>This, however, is not a comment on the intrinsic merit of your suggestion,
>but rather an observation on the cold and cruel nature of the world in which
>we live.

No dispute there :-(

>My second reaction is that for those problems in which winding number is
>required to be taken into account, if the tool that one has is a FORTRAN
>compiler with a software Cartesian implementation of complex numbers, one can
>simply do a change of variables, so that while one's equations contain the
>variable z, the corresponding variable in one's program contains ln(z).

That approach isn't great, because of the precision loss. I have never
had to use winding numbers - indeed, I have never had to use cut-points!
As an aside, that's where Kahan and IEEE 754 went wrong with signed
zeroes, in believing that most people used such methods. Well, most
numerical programmers don't - not even most ones that use complex numbers.
Contour integration is far rarer than its proponents make out.

The other advantage of my approach is that it doesn't have the problem
that addition is extremely expensive. Yes, it's slower than for
Cartesian, but not by a huge factor.


Regards,
Nick Maclaren.

Anton Ertl

unread,
Mar 31, 2018, 10:34:00 AM3/31/18
to
What you call "optimization" or "serious optimization" is not real
optimization, but miscompilation. Real optimization only replaces
code by code that gives the same result under all circumstances. And
since "a < 0 || a >= 0" does not produce "true" under all
circumstances, real optimization cannot replace it with "true"; it can
replace it with "isnan(a)", though.

There is no need to recognize things that you cannot optimize; just
limit yourself to transforming code that you can optimize into
equivalent code and you are good.

Whether it is a good idea to define FP comparison such that well-known
equivalences like "a < b <=> !(a>=b)" don't hold is a different
discussion, though.

IIRC Nick Maclaren suggested producing an exception if the result of an
operation cannot represent the NaN property (as is the case with
integers on most architectures). It would be interesting to see how
often existing programs would produce exceptions in that case, and if
the code that produced the exception would actually have worked as
intended if the code would actually have behaved as IEEE 754
specifies.

- anton
--
M. Anton Ertl Some things have to be seen to be believed
an...@mips.complang.tuwien.ac.at Most things have to be believed to be seen
http://www.complang.tuwien.ac.at/anton/home.html

Terje Mathisen

unread,
Mar 31, 2018, 11:24:00 AM3/31/18
to
Anton Ertl wrote:
> Terje Mathisen <terje.m...@tmsw.no> writes:
>> Nick Maclaren wrote:
>>> In article <p9ldok$1h9j$1...@gioia.aioe.org>, Terje Mathisen
>>> <terje.m...@tmsw.no> wrote:
>>>> Before unordered comparisons, the only feasible way to test for NaN
>>>> was to see if it would fail two opposite compares.
>>>
>>> Yes. Which relies on no serious optimisation.
>>
>> Yeah, it becomes one of many patterns compilers have to recognize and
>> NOT optimize away. :-(
>
> What you call "optimization" or "serious optimization" is not real
> optimization, but miscompilation. Real optimization only replaces
> code by code that gives the same result under all circumstances. And
> since "a < 0 || a >= 0" does not produce "true" under all
> circumstances, real optimization cannot replace it with "true"; it can
> replace it with "isnan(a)", though.

isnan() didn't exist when 754 was first defined, right?

Your example would be "!isnan(a)", something like
"((a < 0.0) == false) && ((a >= 0.0) == false)" could be used for isnan(a).

>
> There is no need to recognize things that you cannot optimize; just
> limit yourself to transforming code that you can optimize into
> equivalent code and you are good.

I think we both know that the real problem with modern C(++) compilers
is with the way some of them insist on taking any existence of
"undefined behavior" and use it to taint everything it touches.

The worst I've heard about is taking "implementation-defined" and
deciding to define it as "undefined", thereby allowing this to also be
removed.


>
> Whether it is a good idea to define FP comparison such that well-known
> equivalences like "a < b <=> !(a>=b)" don't hold is a different
> discussion, though.

If the compiler writers don't know enough about 754 arithmetic, they
could be excused for thinking that such equivalences always hold.
>
> IIRC Nick Maclaren suggested producing an exception if the result of an
> operation cannot represent the NaN property (as is the case with
> integers on most architectures). It would be interesting to see how
> often existing programs would produce exceptions in that case, and if
> the code that produced the exception would actually have worked as
> intended if the code would actually have behaved as IEEE 754
> specifies.

Integer code that includes proper/correct exception handling?

I think I'd rather wait for Godot...
:-)

Anton Ertl

unread,
Mar 31, 2018, 12:16:16 PM3/31/18
to
Terje Mathisen <terje.m...@tmsw.no> writes:
>Anton Ertl wrote:
>> Terje Mathisen <terje.m...@tmsw.no> writes:
>>> Nick Maclaren wrote:
>>>> In article <p9ldok$1h9j$1...@gioia.aioe.org>, Terje Mathisen
>>>> <terje.m...@tmsw.no> wrote:
>>>>> Before unordered comparisons, the only feasible way to test for NaN
>>>>> was to see if it would fail two opposite compares.
>>>>
>>>> Yes. Which relies on no serious optimisation.
>>>
>>> Yeah, it becomes one of many patterns compilers have to recognize and
>>> NOT optimize away. :-(
>>
>> What you call "optimization" or "serious optimization" is not real
>> optimization, but miscompilation. Real optimization only replaces
>> code by code that gives the same result under all circumstances. And
>> since "a < 0 || a >= 0" does not produce "true" under all
>> circumstances, real optimization cannot replace it with "true"; it can
>> replace it with "isnan(a)", though.
>
>isnan() didn't exist when 754 was first defined, right?

Yes, but nowadays it exists and a compiler could make the (corrected)
transformation (if the code is actually faster; gcc on Debian 8
produces a call to a library function).

>Your example would be "!isnan(a)"

Yes, sorry.

>> There is no need to recognize things that you cannot optimize; just
>> limit yourself to transforming code that you can optimize into
>> equivalent code and you are good.
>
>I think we both know that the real problem with modern C(++) compilers
>is with the way some of them insist on taking any existence of
>"undefined behavior" and use it to taint everything it touches.

That is certainly a problem. And this attitude and the advocacy
supporting it may help promote the idea that any change to a program
that improves the run-time can be called an "optimization" even if it
changes the outcome, but that idea is wrong. And it's worth pointing
out what an optimization is, and what is not.

>> Whether it is a good idea to define FP comparison such that well-known
>> equivalences like "a < b <=> !(a>=b)" don't hold is a different
>> discussion, though.
>
>If the compiler writers don't know enough about 754 arithmetic, they
>could be excused for thinking that such equivalences always hold.

That's one problem of such surprising rules.

>> IIRC Nick Maclaren suggested producing an exception if the result of an
>> operation cannot represent the NaN property (as is the case with
>> integers on most architectures). It would be interesting to see how
>> often existing programs would produce exceptions in that case, and if
>> the code that produced the exception would actually have worked as
>> intended if the code would actually have behaved as IEEE 754
>> specifies.
>
>Integer code that includes proper/correct exception handling?

I was actually thinking of no particular exception handling, leading
to the termination of the program, and a report to the developer.
However, in a production run it would probably be better to just
record the occurence, where it happened, and what the program input
was, and continue in the way specified by IEEE 754. And for real-time
programs, you don't want the reporting, lest we have another Ariane 5.

David Brown

unread,
Mar 31, 2018, 1:45:50 PM3/31/18
to
On 31/03/18 17:23, Terje Mathisen wrote:

>
> I think we both know that the real problem with modern C(++) compilers
> is with the way some of them insist on taking any existence of
> "undefined behavior" and use it to taint everything it touches.

I think a bigger problem is when some programmers think that particular
code has defined behaviour, and some compilers treat it in a consistent
manner, but the standards don't give a relevant definition. These
programmers are then surprised when a newer compiler, or one with
different settings, does not follow this same behaviour.

Programmers make mistakes, and they have misunderstandings - such as
misunderstanding what undefined behaviour actually means in C (and C++),
and the freedoms a compiler has for generating code. I don't think it
is a problem that compilers use those freedoms in general, in their
quest for more efficient code, but I think compilers could put more
emphasis into informing developers about what they are doing and thereby
aiding development. It is, IMHO, fine for a compiler to use the nature
of undefined behaviour in order to remove parts of the code - but I
would be a lot happier to see warning messages saying what was
happening. That is the path to satisfying everyone, I think.

>
> The worst I've heard about is taking "implementation-defined" and
> deciding to define it as "undefined", thereby allowing this to also be
> removed.
>

Do you have examples of this?

My biggest complaint about implementation defined behaviour is that
compilers are often quite lax in their documentation of it.

MitchAlsup

unread,
Mar 31, 2018, 1:57:00 PM3/31/18
to
On Saturday, March 31, 2018 at 11:16:16 AM UTC-5, Anton Ertl wrote:
>
> That is certainly a problem. And this attitude and the advocacy
> supporting it may help promote the idea that any change to a program
> that improves the run-time can be called an "optimization" even if it
> changes the outcome, but that idea is wrong. And it's worth pointing
> out what an optimization is, and what is not.

Can we start calling this "depessimization" instead of "Optimization" ?

Anton Ertl

unread,
Mar 31, 2018, 2:17:35 PM3/31/18
to
David Brown <david...@hesbynett.no> writes:
>On 31/03/18 17:23, Terje Mathisen wrote:
>
>>
>> I think we both know that the real problem with modern C(++) compilers
>> is with the way some of them insist on taking any existence of
>> "undefined behavior" and use it to taint everything it touches.
>
>I think a bigger problem is when some programmers think that particular
>code has defined behaviour, and some compilers treat it in a consistent
>manner, but the standards don't give a relevant definition.

I don't see a problem there.

>These
>programmers are then surprised when a newer compiler, or one with
>different settings, does not follow this same behaviour.

A responsible software maintainer does not change behaviour that users
make use of. See, e.g.,
<https://felipec.wordpress.com/2013/10/07/the-linux-way/>.
Unfortunately, there's an epidemic of irresponsibility among C
compiler maintainers.

>Programmers make mistakes, and they have misunderstandings - such as
>misunderstanding what undefined behaviour actually means in C (and C++),
>and the freedoms a compiler has for generating code. I don't think it
>is a problem that compilers use those freedoms in general, in their
>quest for more efficient code, but I think compilers could put more
>emphasis into informing developers about what they are doing and thereby
>aiding development. It is, IMHO, fine for a compiler to use the nature
>of undefined behaviour in order to remove parts of the code - but I
>would be a lot happier to see warning messages saying what was
>happening. That is the path to satisfying everyone, I think.

I would not be satisfied. My tested program distributed as source
code would still not work with the next version of the same compiler,
and it does not help much that the user who compiles it gets a warning
that tells him that the compiler is now miscompiling my program.
Sure, the user could then try to use the older, better version of the
compiler, but that may no longer be easily available. And it may
actually not be possible to compile the older version of the compiler
with the newer version.

Some people have suggested only compiling the source code with the
version of the compiler that it was tested on. That would mean to
distribute the program in binary form only; that's not a big burden
for proprietary software, but it's not an appropriate suggestion for
free software. So, through this irresponsibility of its maintainers,
gcc is becoming a compiler that's more appropriate for proprietary
software than for free software. Way to go!

Nick Maclaren

unread,
Mar 31, 2018, 2:31:18 PM3/31/18
to
In article <p9o96c$1r72$1...@gioia.aioe.org>,
Terje Mathisen <terje.m...@tmsw.no> wrote:
>Anton Ertl wrote:
>> Terje Mathisen <terje.m...@tmsw.no> writes:
>>
>> What you call "optimization" or "serious optimization" is not real
>> optimization, but miscompilation. Real optimization only replaces
>> code by code that gives the same result under all circumstances. And
>> since "a < 0 || a >= 0" does not produce "true" under all
>> circumstances, real optimization cannot replace it with "true"; it can
>> replace it with "isnan(a)", though.

"Under all circumstances"? Oh, dear. Including for timing-dependent
code? Multi-thread timing-dependent code? Code that is attempting
to check for (illegal) updates of a location or race-conditions? Code
that is attempting to check for bugs in the implementation?

In all of the above cases, been there, done that. No, that's not a
useful definition of optimisation, despite what the "HLLs are just
assemblers with fancy syntax" people say. Mitch is right.

>> There is no need to recognize things that you cannot optimize; just
>> limit yourself to transforming code that you can optimize into
>> equivalent code and you are good.

And damn-near pointless. Essentially all of the optimisations possible
under those circumstances have been used in NON-optimising compilers
since the 1960s. The main reason that HLLs specified things as
undefined (i.e. illegal) is to give the compiler freedom to assume
they didn't happen. And, in genuinely high-level languages (e.g.
Fortran but not C or the parts of C++ inherited from C), the compiler
is given liberty to change the code to one that has an equivalent
effect for the semantics the language defines.

>I think we both know that the real problem with modern C(++) compilers
>is with the way some of them insist on taking any existence of
>"undefined behavior" and use it to taint everything it touches.

As undefined behavior is SUPPOSED to mean "illegal" (i.e. your code is
broken - fix it), that's reasonable. But see below.

>The worst I've heard about is taking "implementation-defined" and
>deciding to define it as "undefined", thereby allowing this to also be
>removed.

There was a long debate about this in WG14, but the UK failed to get
it specified whether or not it is permitted. I agree that it's shoddy,
and see below.

One root cause of this problem is languages that abuse the original
concept of "undefined" to include "may be used to provide undocumented
implementation extensions" and even "legally defined but with totally
undefined semantics". Or even, "that's clearly critical but too much
like hard work to specify, so just leave it woolly".

But there is another: the political failure of Algol 68. Seriously.
That attempted to specify its semantics in a positive sense (i.e. what
the syntax should do) but, since then, languages have been mainly
specified by defining the INTENT of the syntax and a set of rules
saying what is forbidden. And that has loopholes the way a mediaeval
granary has mice.

>> IIRC Nick McLaren suggested producing an exception if the result of an
>> operation cannot represent the NaN property (as is the case with
>> integers on most architectures). It would be interesting to see how
>> often existing programs would produce exceptions in that case, and if
>> the code that produced the exception would actually have worked as
>> intended if the code would actually have behaved as IEEE 754
>> specifies.
>
>Integer code that includes proper/correct exception handling?
>
>I think I'd rather wait for Godot...
>:-)

You called? :-) More seriously, that has been done, but I agree not
in any of the languages being assumed here.

And back to the question. I have checked through a few such codes,
and attempted to write them myself. None had less than a 20% error
rate in the occurrences of such constructs I noted, INCLUDING my own
code. I came to the conclusion that mere mortals could not use that
aspect of IEEE 754 reliably in open code. I can (just) do so in
relatively small, clean algorithms.
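
To make the integer case concrete (an illustrative sketch, not from
the original post): converting a NaN to an integer has no representable
result; ISO C makes it undefined, and typical hardware substitutes a
value silently rather than raising a program-visible exception.

#include <math.h>
#include <stdio.h>

int main(void)
{
    double d = nan("");
    /* Undefined behaviour in ISO C. On x86-64 the cvttsd2si
       instruction typically yields the "integer indefinite" value
       INT_MIN, so the NaN-ness is silently lost. */
    int i = (int)d;
    printf("%d\n", i);
    return 0;
}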


Regards,
Nick Maclaren.

Niklas Holsti

unread,
Mar 31, 2018, 2:38:00 PM3/31/18
to
On 18-03-31 18:43, Anton Ertl wrote:
> Terje Mathisen <terje.m...@tmsw.no> writes:
...
>> Integer code that includes proper/correct exception handling?
>
> I was actually thinking of no particular exception handling, leading
> to the termination of the program, and a report to the developer.
> However, in a production run it would probably be better to just
> record the occurence, where it happened, and what the program input
> was, and continue in the way specified by IEEE 754. And for real-time
> programs, you don't want the reporting, lest we have another Ariane 5.

We certainly want the error *reporting*, but not the (unjustified)
program termination.

--
Niklas Holsti
Tidorum Ltd
niklas holsti tidorum fi
. @ .

David Brown

unread,
Mar 31, 2018, 4:50:54 PM3/31/18
to
On 31/03/18 19:57, Anton Ertl wrote:
> David Brown <david...@hesbynett.no> writes:
>> On 31/03/18 17:23, Terje Mathisen wrote:
>>
>>>
>>> I think we both know that the real problem with modern C(++) compilers
>>> is with the way some of them insist on taking any existence of
>>> "undefined behavior" and use it to taint everything it touches.
>>
>> I think a bigger problem is when some programmers think that particular
>> code has defined behaviour, and some compilers treat it in a consistent
>> manner, but the standards don't give a relevant definition.
>
> I don't see a problem there.
>
>> These
>> programmers are then surprised when a newer compiler, or one with
>> different settings, does not follow this same behaviour.
>
> A responsible software maintainer does not change behaviour that users
> make use of.

That's nice in theory. But what if you have a hundred thousand users,
of which a large proportion know what they are doing, and want to get
the highest quality results from their source code - while a certain
proportion blatantly refuses to learn and obey the rules for the
software? There is no way a compiler can progress and make improvements
while retaining the expected behaviour for all its users - it can't
happen. The best you can do, and the way gcc does it, is to make sure
that new optimisations are configurable. They provide extra flags, and
disabled optimisation modes, precisely to please people or programs that
get things wrong in their C coding. Disabled optimisation is even the
default mode! It is hard to comprehend why a few (but loud) people
complain about this - if you don't like the gcc optimisations because
they don't follow your personal assumptions and concepts of what C
should be, then don't use them. Problem solved, I would have thought.

(Note that I don't agree with all the ways C works. Some of its
undefined or implementation defined behaviour could have been fully
defined, and certain behaviours could have been defined in better ways.
That is not the issue here.)

> See, e.g.,
> <https://felipec.wordpress.com/2013/10/07/the-linux-way/>.

This is a /totally/ different situation - and you would make a better
argument if you understood that. The "Linux way" is that you don't
change the public interface, as that is the contract between the Linux
user (or programmer) and the Linux kernel (and kernel programmers). In
the C world, the equivalent is /not/ a compiler, but the C standards.
And the C standards committee are very careful about not changing the
public interface - the C standards - without very good reason, and
without very careful consideration to avoid breaking existing code. The
C world follows exactly the same philosophy here as the Linux world.

What you are asking for is that the C compiler makers (gcc in
particular, I believe) should understand what different programmers mean
by code that does not have a defined behaviour specified in either the C
standards, the compiler documentation, or other standards (like POSIX).
This is /not/ an easy task. Sometimes it looks like you could make a
good guess about what the programmers mean - often this is only an
illusion or will only apply in some circumstances. In particular, there
is a common mistake in thinking that "I write what I mean and I mean
what I write". In C, however, code is moved around, copied, deleted,
etc., due to macros, inlining, LTO, and other manipulation. The general
case is never quite as simple as it looks, and most people want the
compiler to be able to do this sort of manipulation in order to get more
efficient results.

Could compilers be more helpful here? Absolutely. Do they make
optimisations that can break common - but unwritten and invalid -
assumptions, even though the speed gains are very small? Absolutely.
Could they do a better job of informing people about these situations,
or being more careful about which optimisations are enabled in common
settings (up to -O2)? Absolutely.

But do compiler makers add such optimisations because they are evil?
Because they care more about benchmark results than real code? Because
they don't test on real code, or talk to real programmers? No, that is
conspiracy theory nonsense, and that kind of drivel is
counter-productive in the grand aim of all this, which must surely be to
help programmers get their source code correct and generate efficient
object code.

> Unfortunately, there's an epidemic of irresponsibility among C
> compiler maintainers.

See my comment about counter-productive conspiracy theories.

>
>> Programmers make mistakes, and they have misunderstandings - such as
>> misunderstanding what undefined behaviour actually means in C (and C++),
>> and the freedoms a compiler has for generating code. I don't think it
>> is a problem that compilers use those freedoms in general, in their
>> quest for more efficient code, but I think compilers could put more
>> emphasis into informing developers about what they are doing and thereby
>> aiding development. It is, IMHO, fine for a compiler to use the nature
>> of undefined behaviour in order to remove parts of the code - but I
>> would be a lot happier to see warning messages saying what was
>> happening. That is the path to satisfying everyone, I think.
>
> I would not be satisfied. My tested program distributed as source
> code would still not work with the next version of the same compiler,
> and it does not help much that the user who compiles it gets a warning
> that tells him that the compiler is now miscompiling my program.

You have three options. One is to write code that does not rely on
undefined behaviour or undocumented behaviour - stop writing code that
is based on assumptions that coincidentally happened to be true when you
tested on older tools. Two is to work with compiler developers towards
documenting the assumptions you want and making them a feature of the
tools - compiler writers are open to that. Three is to complain
endlessly about it to people who are not in a position to change things,
using wild exaggerations and unjustified claims about miscompliation.
(If it really /were/ a miscompilation, you'd file a bug with gcc and
they would work towards fixing it.)

> Sure, the user could then try to use the older, better version of the
> compiler, but that may no longer be easily available. And it may
> actually not be possible to compile the older version of the compiler
> with the newer version.
>
> Some people have suggested only compiling the source code with the
> version of the compiler that it was tested on.

That is appropriate in some circumstances (it is important for the kind
of embedded development work I do), and inappropriate in other cases.

> That would mean to
> distribute the program in binary form only; that's not a big burden
> for proprietary software, but it's not an appropriate suggestion for
> free software. So, through this irresponsibility of its maintainers,
> gcc is becoming a compiler that's more appropriate for proprietary
> software than for free software. Way to go!
>

Have you ever thought that constructive dialogue with the gcc developers
might be more helpful than accusations?

MitchAlsup

unread,
Mar 31, 2018, 5:57:02 PM3/31/18
to
On Saturday, March 31, 2018 at 1:17:35 PM UTC-5, Anton Ertl wrote:
> A responsible software maintainer does not change behaviour that users
> make use of.

And a responsible software environment has attached test cases to check
if the new compiler (or run time library) changes the meaning of the
maintained software.

MitchAlsup

unread,
Mar 31, 2018, 6:11:29 PM3/31/18
to
On Saturday, March 31, 2018 at 3:50:54 PM UTC-5, David Brown wrote:
> On 31/03/18 19:57, Anton Ertl wrote:
> > A responsible software maintainer does not change behaviour that users
> > make use of.
>
> That's nice in theory. But what if you have a hundred thousand users,
> of which a large proportion know what they are doing, and want to get
> the highest quality results from their source code - while a certain
> proportion blatantly refuses to learn and obey the rules for the
> software?

I don't think you have the proportions right, here. Let's say we have
a large software package and several hundred thousand users. My guess
is that fewer than 10% actually want to get the highest quality
result, and that 80% just want something that is not completely
farcical. The other 10% probably try to use the software a few times
and give up.

> There is no way a compiler can progress and make improvements
> while retaining the expected behaviour for all its users - it can't
> happen. The best you can do, and the way gcc does it, is to make sure
> that new optimisations are configurable. They provide extra flags, and
> disabled optimisation modes, precisely to please people or programs that
> get things wrong in their C coding.

All well and good. But what about documenting what the depessimization
flags do, using words the non-compiler aficionado can understand?

I wrote optimizing compilers for 7 years before switching into HW
design, and I still can't read the compiler optimization guide. I bet
the majority are worse off than I am, so flag choice boils down to
running hundreds (possibly thousands) of compilations and test runs to
figure out which flags make the code faster without breaking stuff and
which make the code slower or break stuff.

> Disabled optimisation is even the
> default mode! It is hard to comprehend why a few (but loud) people
> complain about this - if you don't like the gcc optimisations because
> they don't follow your personal assumptions and concepts of what C
> should be, then don't use them. Problem solved, I would have thought.

And then there is the case where one has produced a quality software
package, with robust arithmetic and careful selection of compiler
flags, being compared to another software package written with no
concern for numerics and concern only for speed. Way too often, the
latter gets more sales than the former. But this is not a compiler
problem.

Quadibloc

unread,
Mar 31, 2018, 10:53:37 PM3/31/18
to
If using numerics carefully with proper attention to flags and the
like has a performance penalty of more than about 10%, compared to code
written with no concern but speed, that is a technical problem.

Not, mind you, a _compiler_ problem. Rather, it's likely a problem with
the way floating-point has been implemented in the hardware.

I realize that your point concerned a lack of discerning buyers in the
marketplace. But one can only expect so much.

Anton Ertl

unread,
Apr 1, 2018, 8:47:22 AM4/1/18
to
MitchAlsup <Mitch...@aol.com> writes:
>Can we start calling this "depessimization" instead of "Optimization" ?

What do you want to call "depessimization": changes that change the
outcome, or changes that don't? And why do you want to call them that?

Nick Maclaren

unread,
Apr 1, 2018, 9:03:24 AM4/1/18
to
In article <2018Apr...@mips.complang.tuwien.ac.at>,
Anton Ertl <an...@mips.complang.tuwien.ac.at> wrote:
>MitchAlsup <Mitch...@aol.com> writes:
>>On Saturday, March 31, 2018 at 11:16:16 AM UTC-5, Anton Ertl wrote:
>>>
>>> That is certainly a problem. And this attitude and the advocacy
>>> supporting it may help promote the idea that any change to a program
>>> that improves the run-time can be called an "optimization" even if it
>>> changes the outcome, but that idea is wrong. And it's worth pointing
>>> out what an optimization is, and what is not.
>>
>>Can we start calling this "depessimization" instead of "Optimization" ?
>
>What do you want to call "depessimization": changes that change the
>outcome, or changes that don't? And why do you want to call them that?

As I understand Mitch, changes that do not change the outcome, even
for erroneous use. And the reason to call it that is simple: there
was (and probably still is) a massive difference between non-optimising
compilers (including optimising ones with optimisation switched off).
Some produced highly inefficient code, often because they used crude
macro-generation, and the main optimisation developments of the 1950s
and 1960s were to avoid that, and make best use of registers (both in
their allocation and in removing redundant stores), storage layout,
choice of instructions etc. But that was and is standard in the better
NON-optimising compilers.

Since then, of course, we have got shared-memory threading, which means
that removing redundant stores, storage layout and even the choice of
instruction CAN change the outcome. So the remaining optimisations that
never change the outcome are simply not generating gratuitously poor
code. Hence de-pessimisation. Mitch didn't invent the term, unless he
has been using it a lot longer than I think he has.
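
A sketch of the redundant-store case (an illustrative addition; x is
assumed to be visible to another thread, with no locks or atomics):

int x;   /* shared with another thread */

void f(void)
{
    x = 1;   /* a dead store under single-threaded reasoning, so a
                compiler may delete it - yet another thread may
                observe the intermediate value, so the outcome
                changes */
    x = 2;
}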



Regards,
Nick Maclaren.

Terje Mathisen

unread,
Apr 1, 2018, 9:45:01 AM4/1/18
to
Nick Maclaren wrote:
> As I understand Mitch, changes that do not change the outcome, even
> for erroneous use. And the reason to call it that is simple: there
> was (and probably still is) a massive difference between
> non-optimising compilers (including optimising ones with optimisation
> switched off). Some produced highly inefficient code, often because
> they used crude macro-generation, and the main optimisation
> developments of the 1950s and 1960s were to avoid that, and make best
> use of registers (both in their allocation and in removing redundant
> stores), storage layout, choice of instructions etc. But that was
> and is standard in the better NON-optimising compilers.

Right.
>
> Since then, of course, we have got shared-memory threading, which
> means that removing redundant stores, storage layout and even the
> choice of instruction CAN change the outcome. So the remaining
> optimisations that never change the outcome are simply not generating
> gratuitously poor code. Hence de-pessimisation. Mitch didn't invent
> the term, unless he has been using it a lot longer than I think he
> has.

The original Borland Pascal v1.0, as written by Anders H, is still one
of the most impressive pieces of sw I have ever seen (37 kB for a
full-screen text editor, compiler, linker, rtl and debugger).

It was very close to pessimal, using fixed macros for every language
construct and zero register variables, and it still ran faster than all
the available alternatives except pure asm.

Nick Maclaren

unread,
Apr 1, 2018, 10:14:56 AM4/1/18
to
In article <p9qnoq$1gt3$1...@gioia.aioe.org>,
Terje Mathisen <terje.m...@tmsw.no> wrote:
>
>The original Borland Pascal v1.0, as written by Anders H, is still one
>of the most impressive pieces of sw I have ever seen (37 kB for a
>full-screen text editor, compiler, linker, rtl and debugger).

That's definitely impressive - even by the code-compactness standards
of the pre-VLSI days. I never used it but, by all accounts, it was
one of the main reasons for the IBM PC's success. It was a pity that
it ignored the standard so badly.

>It was very close to pessimal, using fixed macros for every language
>construct and zero register variables, and it still ran faster than all
>the available alternatives except pure asm.

On the other hand, that was not a high hurdle! If it had had to compare
with (say) the KDF9 compilers, that would be very different. And fixed
macros worked better on 1960s mainframes (and similar microprocessors
like the 8086) than on faster and fancier machines. They didn't do
too badly even on the System/370 range, if done right (which was a
fairly big 'if').


Regards,
Nick Maclaren.

Tim Rentsch

unread,
Apr 1, 2018, 11:52:57 AM4/1/18
to
an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:

> A responsible software maintainer does not change behaviour that users
> make use of. See, e.g.,
> <https://felipec.wordpress.com/2013/10/07/the-linux-way/>.

Interesting article. Thank you for posting it.

> Unfortunately, there's an epidemic of irresponsibility among C
> compiler maintainers.

I can't completely agree with this reaction. In some ways, sure,
but for choices that are allowed because of undefined behavior the
question is not so black-and-white. Some of the responsibility
belongs to the ISO C standard (and the people who produce it).
Unfortunately it's a difficult problem; I know there is interest
in the ISO C group to find a middle ground, somewhere between
unspecified behavior and undefined behavior, but it isn't easy to
find that. For example, consider this reasonable-sounding rule:
no library interface should ever result in undefined behavior, not
counting things like bad pointer inputs (and null pointers should
never be in the set of bad inputs). But what about printf()? In
printf() we have an interface with large parts of the input domain
that give undefined behavior. POSIX takes advantage of this to
define the behavior of positional format specifications, which are
quite useful in some contexts. But, and here is the important
part, formats other than those allowed in the POSIX spec /are
still undefined behavior/. Moreover that freedom is important, to
allow further extensions to be added at some later date.
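
A small sketch of the positional forms in question (an illustrative
addition; the "%n$" numbered-argument syntax is POSIX, and runs as
shown under glibc):

#include <stdio.h>

int main(void)
{
    /* POSIX defines numbered arguments; ISO C leaves this part of
       printf's input domain undefined - precisely the room POSIX
       used to define them. */
    printf("%2$s %1$s\n", "world", "hello");  /* prints "hello world" */
    return 0;
}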

I should add that I am mostly on your side. I think what compiler
writers are doing with so-called "aggressive optimization" belongs
more to the problem set than the solution set. But solving the
problem has to include getting changes made to the ISO C standard,
so that compiler writers have no choice if they want their stuff
to be conforming. I know doing that is not an easy task;
ultimately though it seems unavoidable if we are to get things to
improve.

Anton Ertl

unread,
Apr 1, 2018, 11:53:09 AM4/1/18
to
David Brown <david...@hesbynett.no> writes:
>On 31/03/18 19:57, Anton Ertl wrote:
>> David Brown <david...@hesbynett.no> writes:
>>> On 31/03/18 17:23, Terje Mathisen wrote:
>>>
>>>>
>>>> I think we both know that the real problem with modern C(++) compilers
>>>> is with the way some of them insist on taking any existence of
>>>> "undefined behavior" and use it to taint everything it touches.
>>>
>>> I think a bigger problem is when some programmers think that particular
>>> code has defined behaviour, and some compilers treat it in a consistent
>>> manner, but the standards don't give a relevant definition.
>>
>> I don't see a problem there.
>>
>>> These
>>> programmers are then surprised when a newer compiler, or one with
>>> different settings, does not follow this same behaviour.
>>
>> A responsible software maintainer does not change behaviour that users
>> make use of.
>
>That's nice in theory. But what if you have a hundred thousand users,
>of which a large proportion know what they are doing, and want to get
>the highest quality results from their source code - while a certain
>proportion blatantly refuses to learn and obey the rules for the
>software?

Yes, C programmers generally know what they are doing; they write
programs that work acceptably with the C compilers that they have
during development. The problem is compiler maintainers who blatantly
refuse to accept this, and claim that most of these programs (even gcc
and llvm themselves, see <https://blog.regehr.org/archives/761>) are
buggy, and that they are therefore free to miscompile them in any way
they can conceive (aka they want to produce code that summons nasal
demons).

>There is no way a compiler can progress and make improvements
>while retaining the expected behaviour for all its users - it can't
>happen.

There are many ways, but they tend to be ignored by the current
generation of C compiler maintainers, because they are so in love with
their nasal demons that they prefer to introduce new ways to summon
them, and work around the bugs introduced by the ensuing complexity
(e.g., see
<https://pldi17.sigplan.org/event/pldi-2017-papers-taming-undefined-behavior-in-llvm>).

As an example of a possible improvement, consider the following
expression for x at the minimal value of its type:

x-1>=x

On AMD64, gcc-5.2.0 with -fwrapv compiles this (with long x) to

48 8d 47 ff    lea    -0x1(%rdi),%rax    # rax = x-1 (lea sets no flags)
48 39 c7       cmp    %rax,%rdi          # compare x with x-1
7f 06          jg     ...                # taken when x > x-1, i.e. the test is false

Nasal demon lovers would suggest that I change this to

x==LONG_MIN

This is not only type-specific, it also generates longer code:

48 b8 00 00 00 00 00 00 00 80 movabs $0x8000000000000000,%rax
48 39 c7 cmp %rax,%rdi
75 05 jne ...

A compiler could improve on that while retaining the intended
behaviour by compiling both variants to

48 ff cf       dec    %rdi    # x-1; sets OF exactly when x was the minimum
71 05          jno    ...     # no signed overflow, i.e. x != LONG_MIN

But unfortunately, the gcc maintainers ignored this optimization
opportunity, preferring to spend time on introducing nasal demons.
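
For reference, the two source-level variants above as compilable C (a
sketch; the function names are illustrative):

#include <limits.h>

/* Wrap-based test: well-defined only under -fwrapv (or a compiler
   that guarantees wrapping); without it the signed overflow is
   undefined and the test may be folded away. */
int is_min_wrap(long x)     { return x - 1 >= x; }

/* The type-specific rewrite suggested above: */
int is_min_portable(long x) { return x == LONG_MIN; }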

>The best you can do, and the way gcc does it, is to make sure
>that new optimisations are configurable. They provide extra flags, and
>disabled optimisation modes, precisely to please people or programs that
>get things wrong in their C coding. Disabled optimisation is even the
>default mode! It is hard to comprehend why a few (but loud) people
>complain about this - if you don't like the gcc optimisations because
>they don't follow your personal assumptions and concepts of what C
>should be, then don't use them. Problem solved, I would have thought.

The problem is not solved, for several reasons:

1) Some versions of gcc introduce nasal demons even at default
optimization. More generally, gcc does not give any better
guarantees about the language it supports for default optimization
than for any other optimization level.

2) Even if the behaviour was guaranteed not to change with disabled
optimization, gcc with disabled optimization produces code that is
several times slower than what an earlier gcc version produces at a
higher optimization level (and where the code behaves as intended).

Is the goal of all this work on nasal demon "optimizations" really
to make all programs several times slower? And yes, it would
affect all programs: Which real-world application is guaranteed not
to contain any undefined behaviours? So they would all have to be
compiled with optimization disabled in order to be safe against
future gcc versions. But actually, even with optimizations
disabled, they would not be safe, see 1).

What gcc does provide, and what is moderately workable is flags for
disabling individual nasal demon assumptions. E.g., in my work with
gcc-5.2.0 I used

-fno-aggressive-loop-optimizations -fno-unsafe-loop-optimizations
-fno-delete-null-pointer-checks -fno-strict-aliasing
-fno-strict-overflow -fno-isolate-erroneous-paths-dereference -fwrapv

I am sure that the number of flags has increased in the meantime.
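
As one concrete instance of what such a flag guards against,
-fno-strict-aliasing matters for type-punning code like the following
(an illustrative sketch, assuming unsigned and float are both 32
bits):

/* Reading a float's bits through a cast pointer violates the
   effective-type ("strict aliasing") rules of ISO C; without
   -fno-strict-aliasing the compiler may assume the two accesses
   never alias, and reorder or drop them. */
unsigned float_bits(float f)
{
    return *(unsigned *)&f;
}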

The problem with this is that a given application does not know which
such flags the gcc maintainers will introduce in the future, so they
cannot make their program proof against new nasal demons in future
versions of gcc. It is also not easy to see which of the large number
of -f flags that gcc provides protects against nasal demons.

So it would be nice to have a flag, say,
-fnasal-demons/-fno-nasal-demons that enables or disables all such
assumptions (ideally -fno-nasal-demons would be the default, making
programs that were written before the introduction of this flag
compile as intended).

Alternatively, define language levels: Say, if the program works as
intended on gcc-4.9, -std=gcc-4.9 would give me the behaviour
supported by gcc-4.9, while -std=gcc-6.3 would give me the smaller set
of behaviour that the gcc-6.3 language level supports.

>> See, e.g.,
>> <https://felipec.wordpress.com/2013/10/07/the-linux-way/>.
>
>This is a /totally/ different situation - and you would make a better
>argument if you understood that. The "Linux way" is that you don't
>change the public interface, as that is the contract between the Linux
>user (or programmer) and the Linux kernel (and kernel programmers). In
>the C world, the equivalent is /not/ a compiler, but the C standards.

In the Unix world, the equivalent of the C standard is the POSIX
standard: a specification of the intersection of various
implementations. And if the Linux maintainers were as irresponsible
as the gcc maintainers, they would declare all user-level programs
that do something beyond what POSIX guarantees as buggy, and would
feel free to break them with every new release.

But the Linux maintainers actually are responsible and do not take
this position. Instead, they take the position that a (real, not
theoretical) binary that worked on a previous version must continue to
work in future version. In consequence, if a binary worked in an
earler version of Linux, and is broken by a new version, and that is
reported as a bug, they don't try to talk the reporter into submission
with verbiage like "while a certain proportion blatantly refuses to
learn and obey the rules for the software" or "programs that get
things wrong". They just fix the bug!

And likewise, for a given C compiler, the standard that its
maintainers should apply is the behaviour of earlier versions of this
C compiler.

>What you are asking for is that the C compiler makers (gcc in
>particular, I believe) should understand what different programmers mean
>by code that does not have a defined behaviour specified in either the C
>standards, the compiler documentation, or other standards (like POSIX).
>This is /not/ an easy task.

On the contrary, it's easy: The compiler chose a particular behaviour
in its previous versions. That's the meaning.

>But do compiler makers add such optimisations because they are evil?
>Because they care more about benchmark results than real code?

In your words: Absolutely.

>Because
>they don't test on real code, or talk to real programmers?

The image I get after reading a lot of stuff by nasal demon advocates,
in particular
<http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html>,
but also Usenet postings by gcc maintainers and others, and seeing
some of the bug reports against gcc (and how the gcc maintainers
react), is this:

The gcc and LLVM compiler maintainers believe (often with little or no
evidence) in vast speedups possible by giving as few guarantees as
possible, and undefined behaviour in the C standard is therefore much
loved by them (and so important to them that it even gets its own
acronym: UB).

Benchmark code (even with undefined behaviour, as present in pretty
much every program) is necessary to be able to show off what one has
done. Real code from real programmers is, at best, seen as a
necessary evil. When the compiler maintainers break their programs,
that worked in a previous version, and this is reported, they find
some case of "undefined behaviour", and reject the bug as invalid.

There seem to be some factors that moderate their zest for summoning
nasal demons. I guess that the people who sign their paychecks get
annoyed when they break code that is important to those people, so,
from what I have read, the gcc maintainers compile many packages of a
Linux distribution before releasing a new version these days. And
maybe not all of the gcc and LLVM compiler maintainers believe in the
large power of nasal demons, but those that do have enough influence
that the overall outcome is as described above, and when communicating
with programmers, or discussing these issues, the belief system comes
through, e.g., in phrases like "code written by people who should
never have been allowed to touch a keyboard"
<https://www.sourceware.org/bugzilla/show_bug.cgi?id=12518#c4> (in
this case wrt a standard C library function by a library maintainer,
so it is not limited to compiler maintainers).

>No, that is
>conspiracy theory nonsense, and that kind of drivel is
>counter-productive in the grand aim of all this, which must surely be to
>help programmers get their source code correct and generate efficient
>object code.

Given that the nasal demon stuff helps with neither objective (except
for the notion of "correct" that nasal demon advocates often use), it
is obvious that these are not the primary objectives of the gcc and
LLVM maintainers.

Concerning "conspiracy theory": The belief system of the LLVM and gcc
maintainers has been publically outlined in
<http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html>,
and by many nasal demon advocacy posts, e.g., yours. There is no
conspiracy and no theory about it: It's publically documented.

>Two is to work with compiler developers towards
>documenting the assumptions you want and making them a feature of the
>tools - compiler writers are open to that.

I have yet to see evidence of that.

>(If it really /were/ a miscompilation, you'd file a bug with gcc and
>they would work towards fixing it.)

My experience is that they mark it as invalid for some reason or other
(but containing "undefined behaviour" is a favourite), and
consequently don't fix it.

>> That would mean to
>> distribute the program in binary form only; that's not a big burden
>> for proprietary software, but it's not an appropriate suggestion for
>> free software. So, through this irresponsibility of its maintainers,
>> gcc is becoming a compiler that's more appropriate for proprietary
>> software than for free software. Way to go!
>>
>
>Have you ever thought that constructive dialogue with the gcc developers
>might be more helpful than accusations?

Yes, once upon a time I thought that. Experience has proved
otherwise.

Actually, it's more complicated: In the beginning I had constructive
dialogue with the gcc maintainer. Maintainership changed, and the
dialogue aspect became less productive. At some point, it became
obvious that reporting bugs is pointless. I still had dialogue with a
gcc maintainer, but he pretended not to understand what the intended
behaviour of a program is. And various sources (including
<http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html>)
claimed significant performance improvements from nasal demons. So I
wrote

http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_2015_submission_29.pdf

to discuss all that. There were complaints that this did not explain
the intended behaviour clearly enough, so I wrote

http://www.kps2017.uni-jena.de/proceedings/kps2017_submission_5.pdf

in order to clarify that. If, in your filter bubble, that is all just
"accusations", "endless complaining", "wild exaggerations" and
"unjustified claims", then I see no way to engage in a constructive
dialogue with you.

Tim Rentsch

unread,
Apr 1, 2018, 11:57:18 AM4/1/18
to
n...@wheeler.UUCP (Nick Maclaren) writes:

> In article <p9o96c$1r72$1...@gioia.aioe.org>,
> Terje Mathisen <terje.m...@tmsw.no> wrote:
>
>> The worst I've heard about is taking "implementation-defined" and
>> deciding to define it as "undefined", thereby allowing this to also be
>> removed.
>
> There was a long debate about this in WG14, but the UK failed to get
> it specified whether or not it is permitted.

According to meeting notes, paragraph 3 of section 4 in C99 was
added specifically to express that unspecified behavior (which
includes implementation-defined behavior) could not be as
far-ranging as being undefined behavior.

Anton Ertl

unread,
Apr 1, 2018, 12:14:55 PM4/1/18
to
MitchAlsup <Mitch...@aol.com> writes:
>And a responsible software environment has attached test cases to check
>if the new compiler (or run time library) changes the meaning of the
>maintained software.

I assume you mean a software project. Yes, having a test suite is
good, but not a panacea:

* What should the user do when he sees that the package he is
compiling does not pass the test suite?

* There has been at least one case where a library behaved differently
depending on the hardware
(<https://www.sourceware.org/bugzilla/show_bug.cgi?id=12518#c10>).
One cannot catch that reliably when building the binary. Should the
test suite be run every time the binary is started?

* There are some things that cannot be tested by its nature. E.g.,
OpenSSH erased secret keys as a defense-in-depth measure against a
possible (but unknown at programming time) vulnerability that would
allow access to that memory. Given that no such vulnerability was
known, how would one test that the code generated by the compiler
actually does erase the secret key. (As it happened, the compiler
actually did not generate code for erasing the key, for nasal demon
reasons, and a vulnerability actually was found, and the secret keys
were compromised.) A sketch of the pattern in question follows
this list.

* And really, even if it was possible, testing against all kinds of
miscompilation in addition to testing against programming errors
appears to be a harder job than just testing against programming
errors.
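
The OpenSSH case above follows a well-known shape; a minimal sketch
(illustrative code, not OpenSSH's):

#include <stdlib.h>
#include <string.h>

void destroy_key(unsigned char *key, size_t len)
{
    /* From the compiler's point of view this memset is a dead store
       - key is never read again before free - so it may be removed,
       leaving the secret in memory. C11's memset_s and glibc's
       explicit_bzero exist to prevent exactly this. */
    memset(key, 0, len);
    free(key);
}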

Walter Banks

unread,
Apr 1, 2018, 12:51:06 PM4/1/18
to
As someone who has done a significant amount of compiler development,
there is really never enough testing. Once past sanity tests and detailed
test suites, a lot can be gained just by running regression tests through
the compiler even if the tests are not executed. Detailed metrics alone
can be a very revealing indication of significant compiler problems.

This can be especially true while developing optimizations: a surprising
number of new optimizations do not have the intended effect on old,
functioning programs.

w..

Anton Ertl

unread,
Apr 1, 2018, 1:19:40 PM4/1/18
to
Tim Rentsch <t...@alumni.caltech.edu> writes:
>an...@mips.complang.tuwien.ac.at (Anton Ertl) writes:
>
>> A responsible software maintainer does not change behaviour that users
>> make use of. See, e.g.,
>> <https://felipec.wordpress.com/2013/10/07/the-linux-way/>.
>
>Interesting article. Thank you for posting it.
>
>> Unfortunately, there's an epidemic of irresponsibility among C
>> compiler maintainers.
>
>I can't completely agree with this reaction. In some ways, sure,
>but for choices that are allowed because of undefined behavior the
>question is not so black-and-white. Some of the responsibility
>belongs to the ISO C standard (and the people who produce it).

Many people argue that, but I blame the compiler maintainers (compiler
maintainers obviously love to point to the standard committee for
responsibility).

Why the ISO C standard is not responsible: It does not require
compilers to change behaviour with every release, so it's completely
the choice of the compiler maintainers whether they want to keep
the behaviour or change it.

Just compare with the Linux case: The POSIX standard is just as lax as
ISO C, yet the Linux maintainers support all previously working
programs, not just those few that fit in the narrow confines of POSIX.

>Unfortunately it's a difficult problem;

Yes, tightening ISO C so much that a compiler maintainer is forced to
be responsible in order to be compliant is not going to happen; I
think that the necessary measures would go against the spirit of the
language.

Some people tried to tighten C somewhat, but there has been no success
yet. Here's a post-mortem of one
<https://blog.regehr.org/archives/1287>.

What might be possible, and may be enough to make responsible
behaviour by compiler maintainers attractive enough is to tighten the
standard just enough that nasal demon behaviour is not possible. I am
not an expert on the nuances of non-standard behaviour in C, but
changing all occurences of "undefined" to "implementation-defined"
might achieve that.

But as long as compiler maintainers love nasal demons, such a proposal
would have no chance in the standards committee. And once they turn
away from nasal demons, the battle is won and whether the standard
follows or not is not important.

>I should add that I am mostly on your side. I think what compiler
>writers are doing with so-called "aggressive optimization" belongs
>more to the problem set than the solution set. But solving the
>problem has to include getting changes made to the ISO C standard,
>so that compiler writers have no choice if they want their stuff
>to be conforming. I know doing that is not an easy task;
>ultimately though it seems unavoidable if we are to get things to
>improve.

I don't think that's possible. But fortunately, it's not necessary.

Nick Maclaren

unread,
Apr 1, 2018, 1:50:00 PM4/1/18
to
In article <kfnvada...@x-alumni2.alumni.caltech.edu>,
Tim Rentsch <t...@alumni.caltech.edu> wrote:
>
>>> The worst I've heard about is taking "implementation-defined" and
>>> deciding to define it as "undefined", thereby allowing this to also be
>>> removed.
>>
>> There was a long debate about this in WG14, but the UK failed to get
>> it specified whether or not it is permitted.
>
>According to meeting notes, paragraph 3 of section 4 in C99 was
>added specifically to express that unspecified behavior (which
>includes implementation-defined behavior) could not be as
>far-ranging as being undefined behavior.

Yes, it was :-( And, if you can work out what on earth it means,
you are wiser than the whole of WG14. Inter alia, it conflicts with
paragraph 2: "Undefined behavior is otherwise indicated ... by the
omission of any explicit definition of behavior."

Oh, yes, there ARE cases (plenty of them) where unspecified behaviour
is clearly not erroneous, and many more where unspecified values aren't,
but those words don't help. Not even all of the cases mentioned in
Annex J are clearly not erroneous.


Regards,
Nick Maclaren.

MitchAlsup

unread,
Apr 1, 2018, 3:02:07 PM4/1/18
to
On Sunday, April 1, 2018 at 8:03:24 AM UTC-5, Nick Maclaren wrote:
> In article <2018Apr...@mips.complang.tuwien.ac.at>,
> Anton Ertl <?????@mips.complang.tuwien.ac.at> wrote:
I have been using "depessimization" since about 1985.

Quadibloc

unread,
Apr 1, 2018, 3:06:21 PM4/1/18
to
It is legitimate to say that if a language specification states,
explicitly, that if you do such-and-so, what might happen is not
specified (undefined) or may vary from one machine and compiler to
another (implementation-dependent), then, as this is a thing that cannot
be relied upon, it may be dispensed with if that helps the compiler
generate faster code.

But it is also the case that many C programmers have, for a long time,
been using compilers which did not have advanced optimization
technology, so that certain now-undefined behaviors could be relied upon.

Not to mention that the language specification has been updated over the years.

Having to use a special compiler option to make old programs work...
is better than nothing, but I can understand why it is felt to be not good enough.

On the other hand, changing the language spec so that all the old
tricks still work - at the cost of reducing the available level of
optimization for programs that don't use any tricks - well, I can see why that's rejected too.

As I've pointed out before, it seems like both sides could only be made
happy if each side had its own language to define.

But that has its own problem; one side, at least, will lose the
benefits of its language being C, the universal language!

Well, here's an idea. It's unlikely to be adopted, because it will be
viewed as a victory for your side.

Define C as offering, explicitly in its definition, all those behaviors
that programmers of old relied upon. Define Subset C as a well-behaved
language without that stuff, one that is as easy to optimize as Pascal
or FORTRAN.

Turning up the optimization switch on a compiler should not change
which language it compiles. If you have a program in Subset C, you
have to use a different compiler switch to say so - and then the
various optimization levels will likely run faster.

There. Everyone happy.

gg...@yahoo.com

unread,
Apr 1, 2018, 10:26:18 PM4/1/18
to
> Some people tried to tighten C somewhat, but there has been no success
> yet. Here's a post-mortem of one
> <https://blog.regehr.org/archives/1287>.

You are tilting at windmills over an obsolete language.
Let C/C++ die, it had a time and a place, now it is time to move on.

20 years ago C was the top programming language; now it barely clings on at the bottom of the top 10.
https://insights.stackoverflow.com/survey/2017
http://pypl.github.io/PYPL.html?country=

Swift/Kotlin is the future.
Apple created Swift to replace Objective-C, then IBM got on board when they saw the compiler and the productivity improvement.
Swift uses the same memory layout so you can pass data structures back and forth with C code.

Now Google is pushing Kotlin hard as a replacement for Java.
http://steve-yegge.blogspot.com/2017/05/why-kotlin-is-better-than-whatever-dumb.html

Kotlin is nearly identical to Swift, with a nice IDE that corrects your mistakes like XCode does.
http://nilhcem.com/swift-is-like-kotlin/

Quit living in the past and learn a real programming language.

You will thank me after you have used IntelliJ IDEA to write code.
http://kotlinlang.org/docs/tutorials/getting-started.html

Terje Mathisen

unread,
Apr 2, 2018, 8:17:53 AM4/2/18
to
Walter Banks wrote:
> As someone who has done a significant amount of compiler development,
> there is really never enough testing. Once past sanity tests and detailed
> test suites, a lot can be gained just by running regression tests through
> the compiler even if the tests are not executed. Detailed metrics alone
> can be a very revealing indication of significant compiler problems.

Right.
>
> This can be especially true while developing optimizations: a surprising
> number of new optimizations do not have the intended effect on old,
> functioning programs.

As someone who has done a significant amount of optimization I can tell
you that for any already-optimized program, finding new significant
improvements is often a case of trying ten different things that can all
be shown to help in isolation (i.e. tested on micro benchmarks) and
realizing that 9 of them actually slow down the full program and
(hopefully!) the last one gives a percent or two.

BTDT. :-(