ULP Math

theo

Oct 20, 2022, 3:28:08 PM

to Unum Computing

With the University of New England, we are working on mixed-precision algorithm optimization using posits, and we needed to build intuition around ULPs at different scales for both IEEE-754 floats and posits.

We have added ULP math to both the standard posits and the generalized posits. Here is what that looks like for 32-bit floats and posits. We would love for people to work with the ULP math and give us feedback for their particular use cases.

posit ULP tests: report test cases
sw::universal::posit< 8, 2> at 1 : 0b0.10.00.000 : ULP : 0b0.01.01.000 : 0.125
sw::universal::posit< 16, 2> at 1 : 0b0.10.00.00000000000 : ULP : 0b0.0001.01.000000000 : 0.000488281
sw::universal::posit< 32, 2> at 1 : 0b0.10.00.000000000000000000000000000 : ULP : 0b0.00000001.01.000000000000000000000 : 7.45058e-09
sw::universal::posit< 64, 2> at 1 : 0b0.10.00.00000000000000000000000000000000000000000000000000000000000 : ULP : 0b0.0000000000000001.01.000000000000000000000000000000000000000000000 : 1.73472e-18
sw::universal::posit<128, 2> at 1 : 0b0.10.00.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 : ULP : 0b0.00000000000000000000000000000001.01.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 : 9.40395e-38
sw::universal::posit<256, 2> at 1 : 0b0.10.00.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 : ULP : 0b0.0000000000000000000000000000000000000000000000000000000000000001.01.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 : 2.76357e-76

32-bit standard posit ULPs as baseline
posit epsilon : 0b0.00000001.01.000000000000000000000 : 7.45058e-09
sw::universal::posit< 32, 2> at 1 : 0b0.10.00.000000000000000000000000000 : ULP : 0b0.00000001.01.000000000000000000000 : 7.45058e-09
sw::universal::posit< 32, 2> at 1000 : 0b0.1110.01.1111010000000000000000000 : ULP : 0b0.00001.00.000000000000000000000000 : 1.52588e-05
sw::universal::posit< 32, 2> at 1e+06 : 0b0.111110.11.11101000010010000000000 : ULP : 0b0.01.00.000000000000000000000000000 : 0.0625
sw::universal::posit< 32, 2> at 1e+09 : 0b0.111111110.01.11011100110101100101 : ULP : 0b0.1110.01.0000000000000000000000000 : 512
sw::universal::posit< 32, 2> at 1e+12 : 0b0.11111111110.11.110100011010100101 : ULP : 0b0.1111110.01.0000000000000000000000 : 2.09715e+06
sw::universal::posit< 32, 2> at 1.00001e+15 : 0b0.11111111111110.01.110001101100000 : ULP : 0b0.1111111110.10.0000000000000000000 : 1.71799e+10
sw::universal::posit< 32, 2> at 1.00001e+18 : 0b0.1111111111111110.11.1011110000011 : ULP : 0b0.1111111111110.10.0000000000000000 : 7.03687e+13
sw::universal::posit< 32, 2> at 1.00016e+21 : 0b0.1111111111111111110.01.1011000111 : ULP : 0b0.1111111111111110.11.0000000000000 : 5.76461e+17
sw::universal::posit< 32, 2> at 1.00114e+24 : 0b0.111111111111111111110.11.10101000 : ULP : 0b0.1111111111111111110.11.0000000000 : 2.36118e+21
sw::universal::posit< 32, 2> at 1.00583e+27 : 0b0.111111111111111111111110.01.10100 : ULP : 0b0.11111111111111111111110.00.000000 : 1.93428e+25
sw::universal::posit< 32, 2> at 1.02997e+30 : 0b0.11111111111111111111111110.11.101 : ULP : 0b0.11111111111111111111111110.00.000 : 7.92282e+28
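The ULP values in the posit tables above can be cross-checked from the posit field layout alone: sign bit, regime run plus terminating bit, es exponent bits, and whatever remains as fraction. The sketch below is a hypothetical helper in plain standard C++, not the Universal API; the name posit_ulp_estimate is made up for illustration. It assumes a positive x whose regime and exponent fields fit in the word with at least one fraction bit left over, and it returns 0 outside that range.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of Universal): estimate the ULP of a
// posit<nbits, es> at a positive value x from the field layout alone.
// Assumption: regime and exponent fit in the word with at least one
// fraction bit remaining; otherwise the sketch gives up and returns 0.
double posit_ulp_estimate(int nbits, int es, double x) {
    assert(x > 0.0 && std::isfinite(x));
    int scale = std::ilogb(x);  // floor(log2(x))
    // Regime value k = floor(scale / 2^es), using floor division for negative scales.
    int k = static_cast<int>(std::floor(scale / static_cast<double>(1 << es)));
    // k >= 0: (k + 1) ones and a terminating zero; k < 0: (-k) zeros and a terminating one.
    int regime_bits = (k >= 0) ? k + 2 : -k + 1;
    int fbits = nbits - 1 - regime_bits - es;  // bits left for the fraction
    if (fbits < 1) return 0.0;                 // tapered tail is not modelled here
    return std::ldexp(1.0, scale - fbits);     // weight of the last fraction bit
}
```

For posit<32, 2> this gives 27 fraction bits at 1, 25 at 1000, and only 3 at 1e30, which is the tapering visible in the table: the ULP grows faster than the value as the regime lengthens.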

Native IEEE-754 single precision float ULPs to reference
float epsilon : 0b0.01101000.00000000000000000000000 : 1.19209e-07
float at 1 : 0b0.01111111.00000000000000000000000 : ULP : 0b0.01101000.00000000000000000000000 : 1.19209e-07
float at 1000 : 0b0.10001000.11110100000000000000000 : ULP : 0b0.01110001.00000000000000000000000 : 6.10352e-05
float at 1e+06 : 0b0.10010010.11101000010010000000000 : ULP : 0b0.01111011.00000000000000000000000 : 0.0625
float at 1e+09 : 0b0.10011100.11011100110101100101000 : ULP : 0b0.10000101.00000000000000000000000 : 64
float at 1e+12 : 0b0.10100110.11010001101010010100101 : ULP : 0b0.10001111.00000000000000000000000 : 65536
float at 1e+15 : 0b0.10110000.11000110101111110101001 : ULP : 0b0.10011001.00000000000000000000000 : 6.71089e+07
float at 1e+18 : 0b0.10111010.10111100000101101101011 : ULP : 0b0.10100011.00000000000000000000000 : 6.87195e+10
float at 1e+21 : 0b0.11000100.10110001101011100100110 : ULP : 0b0.10101101.00000000000000000000000 : 7.03687e+13
float at 1e+24 : 0b0.11001110.10100111100001000011011 : ULP : 0b0.10110111.00000000000000000000000 : 7.20576e+16
float at 1e+27 : 0b0.11011000.10011101100101110001110 : ULP : 0b0.11000001.00000000000000000000000 : 7.3787e+19
float at 1e+30 : 0b0.11100010.10010011111001011001001 : ULP : 0b0.11001011.00000000000000000000000 : 7.55579e+22
posit ULP tests: PASS

Theodore Omtzigt

Oct 20, 2022, 4:11:24 PM

to Unum Computing

We have implemented this across all number systems in Universal, so here is an example for the classic floats, which support IEEE-754 and any of the other DL floating-point number systems that have been proposed. The usefulness of the proposed IEEE FP8 is questionable even for DL, but we would love to hear what others are finding.

classic floating-point ULP tests: report test cases
cfloat< 8, 2, unsigned char, hasSubnormals, noSupernormals, notSaturating> at 1 : 0b0.01.00000 : ULP : 0b0.00.00001 : 0.03125
cfloat< 16, 5, unsigned short, hasSubnormals, noSupernormals, notSaturating> at 1 : 0b0.01111.0000000000 : ULP : 0b0.00101.0000000000 : 0.000976562
cfloat< 32, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1 : 0b0.01111111.00000000000000000000000 : ULP : 0b0.01101000.00000000000000000000000 : 1.19209e-07
cfloat< 64, 11, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1 : 0b0.01111111111.0000000000000000000000000000000000000000000000000000 : ULP : 0b0.01111001011.0000000000000000000000000000000000000000000000000000 : 2.22045e-16
cfloat<128, 15, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1 : 0b0.011111111111111.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 : ULP : 0b0.011111110001111.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 : 1.92593e-34
cfloat<256, 19, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1 : 0b0.0111111111111111111.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 : ULP : 0b0.0111111111100010011.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 : 9.05568e-72

FP8 classic floating-point ULPs
FP8 epsilon : 0b0.00.00001 : 0.03125
cfloat< 8, 2, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 0.09375 : 0b0.00.00011 : ULP : 0b0.00.00001 : 0.03125
cfloat< 8, 2, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 0.1875 : 0b0.00.00110 : ULP : 0b0.00.00001 : 0.03125
cfloat< 8, 2, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 0.40625 : 0b0.00.01101 : ULP : 0b0.00.00001 : 0.03125
cfloat< 8, 2, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 0.8125 : 0b0.00.11010 : ULP : 0b0.00.00001 : 0.03125
cfloat< 8, 2, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1.59375 : 0b0.01.10011 : ULP : 0b0.00.00001 : 0.03125
cfloat< 8, 2, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 3.1875 : 0b0.10.10011 : ULP : 0b0.00.00010 : 0.0625

half-precision FP16 classic floating-point ULPs
FP16 epsilon : 0b0.00101.0000000000 : 0.000976562
cfloat< 16, 5, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1 : 0b0.01111.0000000000 : ULP : 0b0.00101.0000000000 : 0.000976562
cfloat< 16, 5, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 10 : 0b0.10010.0100000000 : ULP : 0b0.01000.0000000000 : 0.0078125
cfloat< 16, 5, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 100 : 0b0.10101.1001000000 : ULP : 0b0.01011.0000000000 : 0.0625
cfloat< 16, 5, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1000 : 0b0.11000.1111010000 : ULP : 0b0.01110.0000000000 : 0.5

BFLOAT16: Brain floating-point ULPs
bfloat16 epsilon : 0b0.01111000.0000000 : 0.0078125
cfloat< 16, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1 : 0b0.01111111.0000000 : ULP : 0b0.01111000.0000000 : 0.0078125
cfloat< 16, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 10 : 0b0.10000010.0100000 : ULP : 0b0.01111011.0000000 : 0.0625
cfloat< 16, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 100 : 0b0.10000101.1001000 : ULP : 0b0.01111110.0000000 : 0.5
cfloat< 16, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1000 : 0b0.10001000.1111010 : ULP : 0b0.10000001.0000000 : 4
cfloat< 16, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 9984 : 0b0.10001100.0011100 : ULP : 0b0.10000101.0000000 : 64
cfloat< 16, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 99840 : 0b0.10001111.1000011 : ULP : 0b0.10001000.0000000 : 512
cfloat< 16, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 999424 : 0b0.10010010.1110100 : ULP : 0b0.10001011.0000000 : 4096
cfloat< 16, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1.0027e+07 : 0b0.10010110.0011001 : ULP : 0b0.10001111.0000000 : 65536
cfloat< 16, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1.00139e+08 : 0b0.10011001.0111111 : ULP : 0b0.10010010.0000000 : 524288
cfloat< 16, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 9.98244e+08 : 0b0.10011100.1101110 : ULP : 0b0.10010101.0000000 : 4.1943e+06

32-bit classic floating-point ULPs as baseline
cfloat epsilon : 0b0.01101000.00000000000000000000000 : 1.19209e-07
cfloat< 32, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1 : 0b0.01111111.00000000000000000000000 : ULP : 0b0.01101000.00000000000000000000000 : 1.19209e-07
cfloat< 32, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1000 : 0b0.10001000.11110100000000000000000 : ULP : 0b0.01110001.00000000000000000000000 : 6.10352e-05
cfloat< 32, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1e+06 : 0b0.10010010.11101000010010000000000 : ULP : 0b0.01111011.00000000000000000000000 : 0.0625
cfloat< 32, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1e+09 : 0b0.10011100.11011100110101100101000 : ULP : 0b0.10000101.00000000000000000000000 : 64
cfloat< 32, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1e+12 : 0b0.10100110.11010001101010010100101 : ULP : 0b0.10001111.00000000000000000000000 : 65536
cfloat< 32, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1e+15 : 0b0.10110000.11000110101111110101001 : ULP : 0b0.10011001.00000000000000000000000 : 6.71089e+07
cfloat< 32, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1e+18 : 0b0.10111010.10111100000101101101011 : ULP : 0b0.10100011.00000000000000000000000 : 6.87195e+10
cfloat< 32, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1e+21 : 0b0.11000100.10110001101011100100110 : ULP : 0b0.10101101.00000000000000000000000 : 7.03687e+13
cfloat< 32, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1e+24 : 0b0.11001110.10100111100001000011011 : ULP : 0b0.10110111.00000000000000000000000 : 7.20576e+16
cfloat< 32, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1e+27 : 0b0.11011000.10011101100101110001110 : ULP : 0b0.11000001.00000000000000000000000 : 7.3787e+19
cfloat< 32, 8, unsigned int, hasSubnormals, noSupernormals, notSaturating> at 1e+30 : 0b0.11100010.10010011111001011001001 : ULP : 0b0.11001011.00000000000000000000000 : 7.55579e+22

Native IEEE-754 single precision float ULPs to reference
float epsilon : 0b0.01101000.00000000000000000000000 : 1.19209e-07
float at 1 : 0b0.01111111.00000000000000000000000 : ULP : 0b0.01101000.00000000000000000000000 : 1.19209e-07
float at 1000 : 0b0.10001000.11110100000000000000000 : ULP : 0b0.01110001.00000000000000000000000 : 6.10352e-05
float at 1e+06 : 0b0.10010010.11101000010010000000000 : ULP : 0b0.01111011.00000000000000000000000 : 0.0625
float at 1e+09 : 0b0.10011100.11011100110101100101000 : ULP : 0b0.10000101.00000000000000000000000 : 64
float at 1e+12 : 0b0.10100110.11010001101010010100101 : ULP : 0b0.10001111.00000000000000000000000 : 65536
float at 1e+15 : 0b0.10110000.11000110101111110101001 : ULP : 0b0.10011001.00000000000000000000000 : 6.71089e+07
float at 1e+18 : 0b0.10111010.10111100000101101101011 : ULP : 0b0.10100011.00000000000000000000000 : 6.87195e+10
float at 1e+21 : 0b0.11000100.10110001101011100100110 : ULP : 0b0.10101101.00000000000000000000000 : 7.03687e+13
float at 1e+24 : 0b0.11001110.10100111100001000011011 : ULP : 0b0.10110111.00000000000000000000000 : 7.20576e+16
float at 1e+27 : 0b0.11011000.10011101100101110001110 : ULP : 0b0.11000001.00000000000000000000000 : 7.3787e+19
float at 1e+30 : 0b0.11100010.10010011111001011001001 : ULP : 0b0.11001011.00000000000000000000000 : 7.55579e+22

classic floating-point ULP tests: PASS
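The classic floating-point ULPs reported above follow one rule: the ULP is the weight of the last fraction bit in the binade that contains the value, with subnormals sharing the binade of the smallest normal exponent. The sketch below is a hypothetical helper in plain standard C++, not the Universal cfloat API; float_ulp_estimate is a made-up name. It assumes an IEEE-style bias of 2^(es-1) - 1, gradual underflow, and a positive x inside the format's range.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper (not part of Universal): ULP of an IEEE-style
// binary format with nbits total bits and es exponent bits at x > 0.
// Assumptions: bias = 2^(es-1) - 1, subnormals enabled, x in range.
double float_ulp_estimate(int nbits, int es, double x) {
    assert(x > 0.0 && std::isfinite(x));
    int fbits = nbits - 1 - es;                 // fraction bits after sign and exponent
    int emin  = 1 - ((1 << (es - 1)) - 1);      // smallest normal exponent
    int scale = std::max(std::ilogb(x), emin);  // subnormals use the emin binade
    return std::ldexp(1.0, scale - fbits);
}
```

With nbits = 8 and es = 2 the smallest normal exponent is 0, so every value below 2 has the same ULP of 0.03125, which matches the FP8 rows above and shows how little dynamic resolution that format has.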

John Gustafson

Oct 20, 2022, 4:13:20 PM

to Theodore Omtzigt, Unum Computing

The Standard for Posit™ Arithmetic (2022) specifies two math library functions, next(posit) and prior(posit), which return the posit values of the lexicographic successor and predecessor of posit's representation. Subtracting posit from next(posit), or prior(posit) from posit, gives the ULP above or below a particular posit. If posit is a (signed) integer power of 2, then the ULP above and the ULP below can differ in size, so be careful.
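The asymmetry at powers of 2 is easy to see with native IEEE floats, which behave the same way at a binade boundary. The sketch below uses std::nextafter as a stand-in for next(posit) and prior(posit); that substitution is an assumption made so the example needs only the C++ standard library, and the names ulp_above and ulp_below are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Distance from x to its successor (the "ULP above").
float ulp_above(float x) {
    assert(std::isfinite(x));
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}

// Distance from x to its predecessor (the "ULP below").
float ulp_below(float x) {
    assert(std::isfinite(x));
    return x - std::nextafter(x, -std::numeric_limits<float>::infinity());
}
```

At 1.0f the ULP above is 2^-23 while the ULP below is 2^-24, because the predecessor lies in the next lower binade. At 1000.0f, which is not a power of 2, the two are equal.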

There is certainly a need for the work you are doing. Some of the better float software out there analyzes itself for relative error, both to make the code more portable and to provide a basis for convergence testing and the like. I hope we eventually accumulate theorems about posit accuracy like the ones you see about floats in textbooks.

John
