The last sentence of the Abstract of the Standard says:
A posit compliant system may be realized using software or hardware or any combination.
A few of the functions in Section 5 will almost always be supported in native hardware, but the rest will be math library calls. I know CORDIC algorithms have been put into microprocessors (more like a microcode sequence, presented to the user as a single instruction), but the Working Group certainly isn't suggesting that you must have a hardware instruction, say, for the inverse hyperbolic cosine to be posit compliant!
The fMM (fused multiply-multiply) instruction is envisioned as a library call, performed with integer instructions. A posit-posit multiply, like a float-float multiply, requires:
2 decodings into exponent and significand (as integers) (performed concurrently)
1 integer summation of exponents and 1 integer product of significands (performed concurrently)
1 encoding back to exponent-significand format
1 round-to-nearest, tie-to-even
With pipelined hardware, that can be done as quickly as 3 or 4 clock cycles. In SoftPosit, we see it taking about 20 clock cycles on a modern x86 processor. For 32-bit posits (up to 28-bit significands), a 32-by-32 integer multiplier is more than enough.
For fMM, the software would do 3 decodings, 2 summations of exponents, 2 or 3 integer multiplies (3 if the multiplier precision is not sufficient, so that a double-word product is needed), then 1 encoding and 1 rounding. Maybe 30 clock cycles.
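To make those steps concrete, here is a minimal Python sketch of the decode half of that datapath, with exact rational arithmetic standing in for the integer scale-add and significand-multiply, and with the encode and round steps omitted entirely. The field extraction follows the posit format with es = 2, but the function names are mine, not SoftPosit's or the Standard's.

```python
from fractions import Fraction

ES = 2  # exponent field size assumed fixed at 2

def posit16_decode(x):
    """Decode a 16-bit posit (es = 2) to an exact Fraction.
    Illustrative sketch only: NaR is rejected rather than propagated."""
    x &= 0xFFFF
    if x == 0:
        return Fraction(0)
    if x == 0x8000:
        raise ValueError("NaR")
    sign = -1 if (x >> 15) else 1
    if sign < 0:
        x = (-x) & 0xFFFF               # two's complement of negative posits
    bits = [(x >> i) & 1 for i in range(14, -1, -1)]  # the 15 bits after the sign
    # Regime: a run of identical bits, ended by the opposite bit (or the word's end)
    r0 = bits[0]
    run = 1
    while run < len(bits) and bits[run] == r0:
        run += 1
    k = run - 1 if r0 == 1 else -run
    i = run + 1                          # skip the terminating regime bit, if present
    # Exponent: the next ES bits, zero-padded if truncated off the end
    e = 0
    for _ in range(ES):
        e = (e << 1) | (bits[i] if i < len(bits) else 0)
        i += 1
    # Fraction: whatever bits remain, with a hidden leading 1
    f = Fraction(1)
    for j, b in enumerate(bits[i:], start=1):
        f += Fraction(b, 2 ** j)
    scale = (1 << ES) * k + e
    return sign * f * Fraction(2) ** scale

def posit16_mul_exact(a, b):
    # Two decodings, then the exact rational product stands in for the
    # one integer scale-add and one integer significand-multiply.
    return posit16_decode(a) * posit16_decode(b)
```

In hardware or SoftPosit, the Fraction product would instead be the integer sum of scales and integer product of significands listed above, followed by the encode and round steps.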
The quire allows a program to stay in the "math layer" for billions of summations or sums of products without rounding or overflow. That restores the associative and commutative properties of addition, until of course the sum is rounded to bring it back to the "compute layer". The fMM(a, b, c) operation provides a little of that same capability for multiplication, since it rounds the exact value of a × b × c only once. It's especially helpful when either individual multiply sends its product into the range where posits have low accuracy, yet the full triple product lies in the high-accuracy range.
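A toy numeric illustration of that effect, assuming 16-bit posits whose largest magnitude is 2^56; the clamp here models only saturation at maxpos, not genuine posit rounding:

```python
from fractions import Fraction

MAXPOS = Fraction(2) ** 56          # largest-magnitude 16-bit (es = 2) posit

def clamp(v):
    """Crude stand-in for posit rounding: models only saturation at maxpos."""
    return MAXPOS if v > MAXPOS else v

a, b, c = Fraction(2) ** 30, Fraction(2) ** 30, Fraction(2) ** -55

two_roundings = clamp(clamp(a * b) * c)   # a*b = 2^60 saturates to 2^56 first
fused         = clamp(a * b * c)          # exact triple product is 2^5

print(two_roundings, fused)               # 2 32
```

Rounding after each multiply loses the answer by a factor of 16 here; rounding the exact triple product once keeps it.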
Programmers aren't required to use fMM if their computer can do two rounded multiplies in hardware faster than a math library call can do the job exactly in software, and if they prefer speed to numerical safety. Speed is what people prefer, in my experience. Also, compilers are forbidden (in the Posit Standard) from automatically generating fused operations unless they are explicitly requested in the source code, so you won't get the fMM operation unless you ask for it, though there is discussion in the Working Group about this right now.
I should probably post the latest version of the Standard (4.9, the one I attached to a recent email) on posithub.org, but I have been holding off until we can resolve what seems to be the only remaining issue: how to specify when posit rounding happens in a computer language. For example, if a programmer writes something like q = q + p1 * p2, with q a quire and p1, p2 posits, should the product p1 * p2 be performed exactly and then added to q with no rounding, or do the productions of the language grammar not recognize that combination of precision and operator, so that p1 * p2 would be rounded to a posit and that posit then added to the quire? I had really hoped to resolve this, get the Working Group to ratify it, and then put it up on posithub.org, but it looks like it may be a while longer since we've all got our day jobs!
Yes, the original posits4.pdf paper and the "Beating Floating Point at its Own Game" paper made the guess that the
es value should be 0, 1, 2, 3, for precisions nbits = 8, 16, 32, 64. Since then, well over 100 papers have been written about posit arithmetic, in a wide range of applications and with various strategies for supporting them in
hardware. We learned a lot, and everything seemed to point to making the exponent size always 2, or perhaps always 3. Check out this controlled study of 16-bit float and posit variants for Deep Learning that was recently done by Dr. Himeshi De Silva here
at NUS:
For inference with 8-bit posits, again es = 2 or possibly 3 works best; certainly not
es = 0 because the dynamic range just can't cut it. And reducing the 64-bit posit exponent size to 2 means the quire need only be 1024 bits instead of the even more onerous 2048 (though still far better than the 4664 bits needed for IEEE 754
doubles to accumulate dot products exactly).
So, last May the Working Group agreed that it's not too late to make the change, and we rewrote everything to assume 2 exponent bits, which means "es" disappears from the Standard. Notice that the quire is always 16 times as many bits as the posit,
instead of growing as the square of the posit size. That's a huge simplification when you want to reallocate memory after changing precision.
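In code, the reallocation rule becomes a single constant factor (a one-line sketch; quire_bits is my own name, not anything from the Standard):

```python
def quire_bits(posit_bits):
    # With es fixed at 2, the quire is always 16x the posit width,
    # rather than growing as the square of the posit size.
    return 16 * posit_bits

# posit8 -> 128, posit16 -> 256, posit32 -> 512, posit64 -> 1024
```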
But the best part of all is that changing posit precision is now trivial because it requires no decoding. To increase precision, pad with zeros on the right. To decrease precision, round the binary directly; there is no need to take it apart into sign-regime-exponent-fraction fields. This makes practical a programming style where many different posit precisions can be mixed in a program with almost no performance penalty, no more than the penalty for converting a 16-bit integer to a 32-bit integer by padding with digits on the left. Zero time cost to raise precision, and a single clock cycle to round to a lower precision.
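Here is a sketch of what those two conversions look like on raw bit patterns (my own illustration, not SoftPosit code; a production version would also guard the extremes so that a nonzero posit never rounds to zero or to NaR):

```python
def posit_widen(x, from_bits, to_bits):
    # Increasing precision is just zero-padding on the right.
    return x << (to_bits - from_bits)

def posit_narrow(x, from_bits, to_bits):
    # Round-to-nearest, ties-to-even, applied directly to the bit pattern.
    # Caveat: the masked increment can wrap at the extremes (e.g. maxpos
    # toward NaR); a real implementation saturates in those cases.
    shift = from_bits - to_bits
    x &= (1 << from_bits) - 1
    kept = x >> shift
    discarded = x & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if discarded > half or (discarded == half and (kept & 1)):
        kept = (kept + 1) & ((1 << to_bits) - 1)
    return kept
```

For example, widening the posit8 pattern 0x40 (the value 1.0) to 16 bits gives 0x4000, and narrowing 0x4000 back gives 0x40; no field extraction is needed in either direction.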
John
On Nov 23, 2020, at 9:49 PM, rwatsicaa rwatsicaa <rwat...@ya.ru> wrote: