They need posit tm version...Fwd: [clash-language] Re: Synthesis with floating point arithmetic fails to find blackbox


Art Scott

Feb 21, 2021, 10:18:54 AM
to unum-co...@googlegroups.com

---------- Forwarded message ---------
From: peter.t...@gmail.com <peter.t...@gmail.com>
Date: Sun, Feb 21, 2021, 6:58 AM
Subject: Re: [clash-language] Re: Synthesis with floating point arithmetic fails to find blackbox
To: Clash - Hardware Description Language <clash-l...@googlegroups.com>


I checked the rounding mode I used (truncate towards zero) against the IEEE standard, and it's fine by them (I hate it). As one might guess, the standard was set up to be inclusive and it okays whatever any vendor did at the time, in order to offend nobody. The object is formalization, not policing. Since truncation is simplest, it was done, and is allowed. The standard speaks of "rounding modes", so I presume the big fancy FPUs can be switched from one rounding mode to another. In practice I suppose they stay in what I call rounding (the standard calls it rounding to nearest, with ties broken to even by default; ties away from zero is also defined). If you want to implement rounding like that, you have to add a half of whatever may be lost before you throw away those bits of precision. So if one throws away 9 bits of precision during the calculation, by shifting 9 right, then one must first add a "bit 8". A "bit 8" on its own would be lost by a shift right 9, but there may already be a bit 8 set in what it is added to, which would make a bit 9, and be preserved by the shift. So if you want to change the multiplication I gave, for example, from truncation to rounding, you would change the calculation that drops the 23 lower bits of the 48-bit intermediate result (and also drops the 2 excess highest bits):

m = truncateB ((m1' * m2') .>>. 23) :: Word23

to

m = truncateB ((m1' * m2' + bit 22) .>>. 23) :: Word23

and that would make me, if nobody else, happier.
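
To make the "bit 8" example above concrete, here is a tiny, hedged plain-Haskell sketch (the helper names are hypothetical, not part of the design):

import Data.Bits (shiftR, bit)

-- dropping 9 bits by truncation versus by "add a bit 8 first" rounding
drop9trunc, drop9round :: Int -> Int
drop9trunc x = x `shiftR` 9
drop9round x = (x + bit 8) `shiftR` 9

-- drop9trunc 511 should give 0: the 511/512 fraction is simply thrown away
-- drop9round 511 should give 1: adding 256 carries into bit 9, which survives the shift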

With respect to errors, the general principle is to not error. Instead one should produce a "NaN", which is (_,-1,*) as an IEEE triple (Bool,Int8,Word23) (i.e., sign, exponent, mantissa). The -1 is the exponent part (all bits set, i.e. the all-ones exponent field). The * for the mantissa part should be non-zero. That's because (_,-1,0) is infinity. NaNs should ideally be preserved through calculations, so most of the calculations I gave should have an extra first line or two to pick out the situation when one of the arguments to the floating point operation is a NaN, and make the result a NaN. I would suggest for multiplication, for example, a guard line like

myMulFloat (s1,-1,m1) (s2,e2,m2) = (s1/=s2,-1,m1)            -- NaN * _ = NaN

Sticklers may wish to pick out the infinity case just before that, so infinity times zero can be detected, and NaN produced. Otherwise one will get some ordinary value, and I have no clue what it will be. Maybe zero. Surely zero. I've kept the sign calculation as-is above, because it's info.
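
A hedged, untested sketch of those stickler guards, using only the encodings stated in this thread ((_,0,0) is zero, (_,-1,0) is infinity, (_,-1,nonzero) is a NaN). Ordering against the NaN guard above needs care, e.g. a NaN operand should still beat an infinity:

myMulFloat (s1,-1,0)  (s2,0,0)   = (s1/=s2,-1,1)   -- inf * 0 = NaN
myMulFloat (s1,0,0)   (s2,-1,0)  = (s1/=s2,-1,1)   -- 0 * inf = NaN
myMulFloat (s1,-1,0)  (s2,e2,m2) = (s1/=s2,-1,0)   -- inf * x = +/- inf
myMulFloat (s1,e1,m1) (s2,-1,0)  = (s1/=s2,-1,0)   -- x * inf = +/- inf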

To build a complete floating point operation one does (for example)

mul_f :: Float -> Float -> Float
mul_f x y = myencodeFloat (myMulFloat (mydecodeFloat x) (mydecodeFloat y))
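
The other operations wrap up the same way (a hedged sketch; the names are hypothetical):

div_f, add_f, sub_f :: Float -> Float -> Float
div_f x y = myencodeFloat (myDivFloat (mydecodeFloat x) (mydecodeFloat y))
add_f x y = myencodeFloat (myAddFloat (mydecodeFloat x) (mydecodeFloat y))
sub_f x y = myencodeFloat (mySubFloat (mydecodeFloat x) (mydecodeFloat y))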

The simplest encoding/decoding function expressions are probably:

myencodeFloat :: (Bool,Int8,Word23) -> Float
myencodeFloat (zSign,zExp,zFrac) = unpack (pack zSign ++# pack zExp ++# pack zFrac)

                   
mydecodeFloat :: Float -> (Bool,Int8,Word23) -- sign, exponent, mantissa (significand without leading bit)
mydecodeFloat x = (zSign,zExp,zFrac)
                where x' = pack x :: BitVector 32
                      zSign = testBit x' 31                  :: Bool
                      zExp  = unpack( truncateB(x' .>>. 23)) :: Int8
                      zFrac = unpack( truncateB x')          :: Word23

         
You should be able to do much better with more access to and knowledge of Clash internals. This assumes the Haskell internal representation is the standard IEEE binary interchange format, and I believe it is. But on an other-endian machine that story needs checking.
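
A quick way to check that layout assumption is a round-trip property (a hedged sketch with a hypothetical name; NaNs excepted, since NaN /= NaN):

-- should hold for every non-NaN Float if encode/decode are bit-for-bit inverses
prop_roundTrip :: Float -> Bool
prop_roundTrip x = myencodeFloat (mydecodeFloat x) == x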

I would really appreciate it if somebody would produce a new backend primitive operation from this. I have not succeeded in making head nor tail of the tutorial exposition, I am afraid - it fails to tell me what you want, so I don't know what to supply, let alone how. I can dump a SystemVerilog "module" for the mul_f above, for example. Here it is! Please modify as required, and maybe I can spot what is needed from that.

/* AUTOMATICALLY GENERATED SYSTEMVERILOG-2005 SOURCE CODE.
** GENERATED BY CLASH 1.2.4. DO NOT MODIFY.
*/
`timescale 100fs/100fs
module FPUa_mul_f
    ( // Inputs
      input logic [31:0] x
    , input logic [31:0] y
      // Outputs
    , output logic [31:0] result
    );
  // KPU/Float.hs:120:1-13
  logic zSign;
  // KPU/Float.hs:120:1-13
  logic signed [7:0] zExp;
  // KPU/Float.hs:120:1-13
  logic [22:0] zFrac;
  logic [0:0] c$app_arg;
  FPUa_types::Tup3 c$case_alt_0;
  FPUa_types::Tup3 c$case_alt_1;
  FPUa_types::Tup3 result_0;
  FPUa_types::Tup3 c$case_alt_2;
  logic c$app_arg_0;
  logic c$case_alt_3;
  logic [47:0] c$app_arg_1;
  logic [47:0] c$app_arg_2;
  logic c$app_arg_3;
  logic c$case_alt_4;
  FPUa_types::Tup3 result_1;
  FPUa_types::Tup3 result_2;
  FPUa_types::Tup3 c$case_alt_5;
  logic c$app_arg_4;
  logic c$case_alt_6;
  // KPU/Float.hs:136:1-10
  logic [22:0] m2;
  // KPU/Float.hs:136:1-10
  logic s2;
  // KPU/Float.hs:136:1-10
  logic [22:0] m1;
  // KPU/Float.hs:136:1-10
  logic s1;
  // KPU/Float.hs:126:1-13
  logic [31:0] \x' ;
  FPUa_types::Tup3 result_3;
  // KPU/Float.hs:126:1-13
  logic [31:0] \c$x'_0 ;
  FPUa_types::Tup3 result_4;
  logic signed [7:0] result_0_selection;
  logic [47:0] c$bv;
  logic signed [7:0] result_2_selection;
  logic [31:0] c$bv_0;
  logic [31:0] c$bv_1;
  assign zSign = c$case_alt_0.Tup3_sel0;
  assign zExp = $signed(c$case_alt_0.Tup3_sel1);
  assign zFrac = c$case_alt_0.Tup3_sel2;
  assign c$app_arg = zSign ? 1'b1 : 1'b0;
  assign c$case_alt_0 = ($signed(result_4.Tup3_sel1) == -8'sd1) ? {c$app_arg_0
                                                                  ,-8'sd1
                                                                  ,m1} : c$case_alt_1;
  assign c$case_alt_1 = ($signed(result_3.Tup3_sel1) == -8'sd1) ? {c$app_arg_0
                                                                  ,-8'sd1
                                                                  ,m2} : result_0;
  assign result_0_selection = $signed(result_4.Tup3_sel1);
  always_comb begin
    case(result_0_selection)
      8'sd0 : result_0 = c$case_alt_2;
      default : result_0 = result_2;
    endcase
  end
  always_comb begin
    case(m1)
      23'd0 : c$case_alt_2 = {c$app_arg_0
                             ,8'sd0
                             ,23'd0};
      default : c$case_alt_2 = result_2;
    endcase
  end
  assign c$app_arg_0 = s1 ? c$case_alt_3 : s2;
  assign c$case_alt_3 = s2 ? 1'b0 : 1'b1;
  // replaceBit start
  always_comb begin
    c$app_arg_1 = (({{(48-23) {1'b0}},m2}));
    c$app_arg_1[(64'sd23)] = (1'b1);
  end
  // replaceBit end
  // replaceBit start
  always_comb begin
    c$app_arg_2 = (({{(48-23) {1'b0}},m1}));
    c$app_arg_2[(64'sd23)] = (1'b1);
  end
  // replaceBit end
  assign c$app_arg_3 = s1 ? c$case_alt_4 : s2;
  assign c$case_alt_4 = s2 ? 1'b0 : 1'b1;
  assign c$bv = ((((c$app_arg_2) * (c$app_arg_1)) + 48'd4194304) >> (64'sd23));
  assign result_1 = {c$app_arg_3
                    ,(($signed(result_4.Tup3_sel1) - 8'sd127) + ($signed(result_3.Tup3_sel1) - 8'sd127)) + 8'sd127
                    ,c$bv[0+:23]};
  assign result_2_selection = $signed(result_3.Tup3_sel1);
  always_comb begin
    case(result_2_selection)
      8'sd0 : result_2 = c$case_alt_5;
      default : result_2 = result_1;
    endcase
  end
  always_comb begin
    case(m2)
      23'd0 : c$case_alt_5 = {c$app_arg_4
                             ,8'sd0
                             ,23'd0};
      default : c$case_alt_5 = result_1;
    endcase
  end
  assign c$app_arg_4 = s1 ? c$case_alt_6 : s2;
  assign c$case_alt_6 = s2 ? 1'b0 : 1'b1;
  assign m2 = result_3.Tup3_sel2;
  assign s2 = result_3.Tup3_sel0;
  assign m1 = result_4.Tup3_sel2;
  assign s1 = result_4.Tup3_sel0;
  assign \x'  = __VOID__;
  assign c$bv_0 = (\x'  >> (64'sd23));
  assign result_3 = {(\x' [(64'sd31)]) == (1'b1)
                    ,$signed((c$bv_0[0+:8]))
                    ,(\x' [0+:23])};
  assign \c$x'_0  = __VOID__;
  assign c$bv_1 = (\c$x'_0  >> (64'sd23));
  assign result_4 = {(\c$x'_0 [(64'sd31)]) == (1'b1)
                    ,$signed((c$bv_1[0+:8]))
                    ,(\c$x'_0 [0+:23])};
  assign result = (c$case_alt);
endmodule






On Friday, February 19, 2021 at 5:02:20 PM UTC peter.t...@gmail.com wrote:
I'll carry on to do (single precision) floating point addition and subtraction while I still have strength ... I've been leaving them because I know there'll be lots of cases and that's a headache and I'm sure to make a mistake ...

First let me correct my implementation of division. I can't count. I meant to shift the dividend left by 24 bits, not 23 bits. That way it clears the divisor, itself extended to 24 bits from the 23 bits in the IEEE representation by putting back the missing leading 1. That's how one gets maximal possible accuracy in a/b via integer division. Write it as a0000000/b and scale the extra 0s off again afterwards. I'll also put in an extra trap for 0 as dividend, else one gets a VSN ("very small number" - trademark) as result instead of 0. Shrug.

Recall that the IEEE arithmetic representation is a triple (Bool,Int8,Word23) representing sign, exponent, mantissa respectively, and the value it represents is (informally) sign * 1.mantissa * 2^exponent. The binary "interchange" format for that is a straight bitwise map of the triple, elements taken in order left to right, numbers bigendian (I think! I haven't done anything that would check the internal arrangement) and 2s complement. A nuance is that 127 (maxPos in Int8) as literal exponent really means 0, so I will write exponent-127 whenever I want the value (for example, 1.0 is stored with a raw exponent field of 127 and a zero mantissa). I'm not confused.

myDivFloat :: (Bool,Int8,Word23) -> (Bool,Int8,Word23) -> (Bool,Int8,Word23)
myDivFloat (s1,0,0) (s2,e2,m2) = (s1/=s2,0,0)              -- (new trap for 0/..)

myDivFloat (s1,e1,m1) (s2,e2,m2) = (s,e,m)
    where s = s1 /= s2                           :: Bool
          e = (e1 - 127) - (e2 - 127) + 128      :: Int8   -- (was +127)

          m1' = setBit (zeroExtend m1) 23        :: Word48 -- calculate in 48b
          m2' = setBit (zeroExtend m2) 23        :: Word48
          m = truncateB ((m1' .<<. 24) `div` m2'):: Word23 -- (was .<<. 23)

I should trap for 0 as divisor too, but I don't recall the NaN format (one of them!) so I won't. Please somebody look it up and trap for ../0.
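
For what it's worth, a hedged, untested sketch of that missing guard, using the NaN and infinity encodings given elsewhere in the thread ((_,-1,0) is infinity, (_,-1,nonzero) is a NaN); the 0/0 clause has to come before the existing trap for a 0 dividend:

myDivFloat (s1,0,0)   (s2,0,0) = (s1/=s2,-1,1)   -- 0/0 = NaN
myDivFloat (s1,e1,m1) (s2,0,0) = (s1/=s2,-1,0)   -- x/0 = +/- infinity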

Hmm. I also forgot the trap for 0 as multiplicand in the multiplication operator. Please rescue as follows:

myMulFloat :: (Bool,Int8,Word23) -> (Bool,Int8,Word23) -> (Bool,Int8,Word23)
myMulFloat (s1,0,0) (s2,e2,m2) = (s1/=s2,0,0)
myMulFloat (s1,e1,m1) (s2,0,0) = (s1/=s2,0,0)
myMulFloat (s1,e1,m1) (s2,e2,m2) = ...


The s1/=s2 business is so one can have the thrill of seeing an occasional "-0.0" pop out as result. Who am I to say no.

So, apologies, mea culpa, and errata apart, take a deep breath and on to addition/subtraction. I'll get rid of having to consider the sign bit by reducing everything to the sum of two positive numbers, or the difference of two positive numbers, before even thinking:

myAddFloat :: (Bool,Int8,Word23) -> (Bool,Int8,Word23) -> (Bool,Int8,Word23)
myAddFloat (s1,e1,m1) (s2,e2,m2) =
   case (s1,s2) of
     (False,False) -> myAddFloatPos (e1,m1) (e2,m2)               -- pos + pos
     (False,True)  -> mySubFloatPos (e1,m1) (e2,m2)               -- pos + neg
     (True,False)  -> let (s,e,m) = mySubFloatPos (e1,m1) (e2,m2) -- neg + pos
                      in  (not s,e,m)
     (True,True)   -> let (s,e,m) = myAddFloatPos (e1,m1) (e2,m2) -- neg + neg
                      in  (not s,e,m)

mySubFloat :: (Bool,Int8,Word23) -> (Bool,Int8,Word23) -> (Bool,Int8,Word23)
mySubFloat (s1,e1,m1) (s2,e2,m2) = myAddFloat (s1,e1,m1) (not s2,e2,m2)

I set subtraction to be just addition of a negative, since all I have to do to negate a number is change its sign bit in the IEEE triple ... and hope. (You can just see that I ought to trap for zero here too, but let's go on without ...).
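
Spelled out, that negation is just (a hedged sketch, hypothetical name):

myNegFloat :: (Bool,Int8,Word23) -> (Bool,Int8,Word23)
myNegFloat (s,e,m) = (not s,e,m)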

All I have to do to add up two positive numbers is put back the leading 1 on the mantissas and add as integers, then take the leading 1 back off to get the mantissa of the result.

I'll have to watch for the leading 1 moving left one place (when one does large + large), in order to hit it when aiming to knock it off, which is the "testBit m 24" below. The addends have their leading 1 in position 23, so after the addition the leading 1 either stays in position 23 or moves one place left to position 24. In the latter case I have to drop the 0th bit of the result, losing 1 bit of precision, but keeping to the allotted 23 bits for the mantissa. That means a shift right by 1 (".>>. 1"). The leading 1 gets dropped in any case by the obligatory truncation to 23 bits for the mantissa.

myAddFloatPos :: (Int8,Word23) -> (Int8,Word23) -> (Bool,Int8,Word23)
myAddFloatPos (e1,m1) (e2,m2) =
                    if testBit m 24
                    then (False,e+1,truncateB (m .>>. 1) :: Word23) -- grew!
                    else (False,e,  truncateB m          :: Word23)
  where
    m1' = setBit (zeroExtend m1) 23     :: Word25 -- calculate in 25b
    m2' = setBit (zeroExtend m2) 23     :: Word25
    (e,m) = case undefined of
               _ | e1-e2 > 0 -> (e1, m1' + (m2' .>>. fromEnum(e1-e2)))
               _ | e1-e2 < 0 -> (e2, (m1' .>>. fromEnum(e2-e1)) + m2')
               _ | e1 == e2  -> (e1, m1' + m2')


I had to shift the smaller addend to the right to match up its digits in the correct places with the digits of the larger addend. That loses some precision. The more caring should round rather than truncate when doing the shift right.
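
For the more caring, a hedged, untested sketch of a rounding alignment shift in the same spirit as the multiplier fix (add half of whatever will be dropped before shifting); the caller should allow one extra bit of headroom for the possible carry:

roundShiftR :: KnownNat n => Unsigned n -> Int -> Unsigned n
roundShiftR m k
  | k <= 0    = m
  | otherwise = (m + bit (k-1)) `shiftR` k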

I yet have to handle subtraction of two positive numbers (below). This splits into several cases according to whether to expect a non-negative or a negative result. Those cases are large - small and small - large, plus a couple in which it is harder to tell. One can primarily tell by which has the larger exponent. If the exponents are equal, one can tell by which has the larger mantissa. If the mantissas are equal too, then the answer for the subtraction is 0. I will use "ff1" to return the position of the leading 1:

mySubFloatPos :: (Int8,Word23) -> (Int8,Word23) -> (Bool,Int8,Word23)
mySubFloatPos (e1,m1) (e2,m2) = (s,e',m')
      where
          m1' = setBit (zeroExtend m1) 23     :: Word25                          -- calculate in 25b
          m2' = setBit (zeroExtend m2) 23     :: Word25
          -- shift left or right if leading 1 not in expected posn 23
          (e',m') = let n = if e1==e2 && m1==m2 then 23 else ff1 m
                    in
                    case n of
                      23 -> (e,             truncateB m               :: Word23)
                      _ | n > 23                                                 -- impossible
                         -> (e-toEnum(23-n),truncateB (m .>>. (n-23)) :: Word23)
                      _ | n < 23                                                 -- implausible
                         -> (e-toEnum(23-n),truncateB (m .<<. (23-n)) :: Word23)
          (s,e,m) = case undefined of
                      _ | e1-e2 > 0 -> (False,e1,m1'-(m2' .>>. fromEnum(e1-e2))) -- big minus small
                      _ | e1-e2 < 0 -> (True,e2, m2'-(m1' .>>. fromEnum(e2-e1))) -- small minus big
                      _ | m2 < m1   -> (False,e1,m1'-m2')                        -- big minus small
                      _ | m1 < m2   -> (True,e2, m2'-m1')                        -- small minus big
                      _ -> (False,0,0)                                           -- x - x = 0

In the x-x case, I have sneaked a (_,0,0) result by setting an imaginary "leading 1 position" on the intermediate result at 23, the expected default position, which will stop the code attempting to further mutate it. (That I have to explain it means it is wrong code, in software engineering terms -- just trap for a zero result at top level and skip the subterfuge).
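
That top-level trap could be a hedged sketch like this (hypothetical name), leaving mySubFloatPos itself free of the subterfuge:

mySubFloatPos' :: (Int8,Word23) -> (Int8,Word23) -> (Bool,Int8,Word23)
mySubFloatPos' (e1,m1) (e2,m2)
  | e1 == e2 && m1 == m2 = (False,0,0)                 -- x - x = 0, caught up front
  | otherwise            = mySubFloatPos (e1,m1) (e2,m2)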

All that needs much more testing than I have given it. The subtraction is the more delicate code. It invokes the addition. So here is a test:

> myencodeFloat(mySubFloat (mydecodeFloat 17.3) (mydecodeFloat 9.3))
8.0
> myencodeFloat(mySubFloat (mydecodeFloat 17.3) (mydecodeFloat 19.3))
-2.0


Yes, I have tried subtracting and adding negative numbers. Please try more and let me know what I got wrong.

I forgot to give a function that maps from float to integer. This one truncates towards zero (which I hate ... but may even be right in some universe, maybe the IEEE one too!). Here it is. Recall that cast = unpack . pack maps equal-sized objects onto each other bitwise. Again, I'll remove the sign bit first before even thinking about it, in order to simplify this problem. That already guarantees symmetrical behavior around zero, for better or worse:

myFloatToInt :: (Bool,Int8,Word23) -> Int
myFloatToInt (_,0,0)     = 0
myFloatToInt (True,e,m)  = - myFloatToIntPos (e,m)
myFloatToInt (False,e,m) =   myFloatToIntPos (e,m)

-- needs Word/Int to have more than 24 bits
myFloatToIntPos :: (Int8,Word23) -> Int
myFloatToIntPos (e,m) = if e-127 > 23
                        then m' .<<. fromEnum ((e-127)-23)
                        else m' .>>. fromEnum (23-(e-127))
     where m' = cast(setBit (zeroExtend m) 23) :: Int


(A late reminder that .<<. is shiftL and .>>. is shiftR). Test:

> myFloatToInt (mydecodeFloat 98.7)
98
> myFloatToInt (mydecodeFloat (-101.1))
-101


That's what I mean by symmetrical truncation towards zero. One might have expected -101.1 to truncate to -102 as an integer, instead. I certainly do. It's definitely the case that rounding is preferable to truncation here, in terms of creating least surprise. I'll do that if requested.
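
If requested, the rounded variant would look something like this hedged, untested sketch (add half of what the right shift drops, as in the multiplier fix; very small exponents would need the same care as the truncating version):

myFloatToIntPosRound :: (Int8,Word23) -> Int
myFloatToIntPosRound (e,m) = if e-127 >= 23
                             then m' .<<. fromEnum ((e-127)-23)
                             else (m' + bit (fromEnum (23-(e-127)) - 1)) .>>. fromEnum (23-(e-127))
     where m' = cast(setBit (zeroExtend m) 23) :: Int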

I can't think of anything much else one might need as floating point ops. Modulus and remainder, perhaps. Please do translate to primitive operation "templates" in the backend languages ... and add traps for zero etc to taste. It's not going to be terribly wrong whatever you do, because IEEE as a standard was originally intended to be inclusive. At worst some people might tut-tut until a correction.

That I've done only single precision floating point operations is not significant, other than that I intended to leave no space for the argument that floating point logic cannot fit in one cycle.

For double precision, change some of the bitsize numbers.
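
For instance, a hedged sketch of the double-precision decode, assuming the analogous widths (11-bit exponent with bias 1023, 52-bit stored mantissa) and that Clash's BitPack instance for Double exposes the same layout:

mydecodeDouble :: Double -> (Bool,Signed 11,Unsigned 52)
mydecodeDouble x = (zSign,zExp,zFrac)
                where x' = pack x :: BitVector 64
                      zSign = testBit x' 63                  :: Bool
                      zExp  = unpack( truncateB(x' .>>. 52)) :: Signed 11
                      zFrac = unpack( truncateB x')          :: Unsigned 52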

I don't think we're far off being able to handle even 64-bit floating point logic in one cycle. Addition and subtraction should fit.

Regards

Peter


On Friday, February 19, 2021 at 4:36:32 AM UTC peter.t...@gmail.com wrote:
Let me streamline the "test" encoding of an IEEE triple as a Haskell float. I'll also do division and one or two more. Assuming Haskell uses the standard IEEE binary layout internally (apparently yes, since this works), just map bit to bit from the triple bits to the Haskell float bits:

myencodeFloatPos :: (Int8,Word23) -> Float
myencodeFloatPos (zExp,zFrac) = result
     where result = cast((zeroExtend(cast zExp) .<<. 23) .|. zeroExtend zFrac :: Word32)

By ".<<." I mean shiftL.

That will synthesize. Haskell float to/from whatever transforms probably won't, since they will have been written as recursive functions.

Sorry about the unnecessary complication originally in the encode function.

Further, one can avoid the Haskell negation of floats that I used in the top level encoding function by setting the sign bit in the underlying IEEE layout:

myencodeFloat :: (Bool,Int8,Word23) -> Float
myencodeFloat (True,zExp,zFrac)  =
               unpack(setBit (pack(myencodeFloatPos(zExp,zFrac))) 31)        -- was "- stuff"
myencodeFloat (False,zExp,zFrac) = myencodeFloatPos(zExp,zFrac)


or words to that effect.

I should emphasize that in principle the internal layout in terms of bits of an IEEE float is not really specified. It is just that _for interchange_ the format is specified. That's called the interchange or binary format. One can't rely on it being adhered to inside one single platform/program. But, yes, in practice, to avoid surprises, the interchange format is the one generally used internally. Maybe one can come up with examples in which it isn't. It looks like Haskell uses it internally, at least in broad outline. That would be expected if they used libraries, or if they just read the docs and did what they say. I didn't bother investigating ... I don't really care outside of for testing. There may still be differences ... I would be suspicious about the nonstandard form (subnormal, also called denormal) that IEEE permits in order to get better precision for things below the usual exponent limits. Probably it's not catered for. One doesn't have to care. And there are lots of kinds of NaN ... and please don't mention endianness. I seem to recall the interchange format says bigendian, but don't quote me. Maybe it doesn't.

Here is division, modeled on the multiplication I did already in the post above:

myDivFloat :: (Bool,Int8,Word23) -> (Bool,Int8,Word23) -> (Bool,Int8,Word23)
myDivFloat (s1,e1,m1) (s2,e2,m2) = (s,e,m)

    where s = s1 /= s2                             :: Bool
          e = (e1 - 127) - (e2 - 127) + 127        :: Int8

          m1' = setBit (zeroExtend m1) 23          :: Word48 -- calculate in 48b
          m2' = setBit (zeroExtend m2) 23          :: Word48
          m = truncateB ((m1' .<<. 23) `div` m2')  :: Word23
Test:
> myencodeFloat(myDivFloat (mydecodeFloat 1.3) (mydecodeFloat 4.5))
0.28888887
> 1.3 / 4.5 :: Float
0.28888887

I'm truncating, not rounding. Boo hoo. IEEE probably allows either. Somebody else can read the standard and tell me. I'll rely on supposing the standard accommodates all existing implementations at the time it was drawn up, and at that time some vendors did it the least complex way and saved on adding a half-something somewhere in there. So sue me. I'll fix if reqd.

For help in converting to/fro, here's how to turn an Int into an IEEE float. Recall that cast = unpack . pack maps equal bitsize things onto each other, such as Signed onto Unsigned (Int onto Word) and back again.

-- works on up to 128 bit ints (could use Integer instead of Int)
myIntToFloat :: Int -> (Bool,Int8,Word23)
myIntToFloat x = (x < 0,e,m)
             where (e,m) = myIntToFloatPos (abs x)

-- only on positive ints, up to 127 bit posints at least
myIntToFloatPos :: Int -> (Int8,Word23)
myIntToFloatPos 0 = (0,0)
myIntToFloatPos x = if n < 23
                    then (127+toEnum n,truncateB(cast x .<<. (23-n)))
                    else (127+toEnum n,truncateB(cast x .>>. (n-23)))
                      where n = ff1 x -- index of highest set bit


I used this helper function to find the position of the most significant bit in an Int, or whatever:

-- highest set bit number
ff1 x =  case findIndex (/=0) (bv2v (pack x)) of
           Just i  -> finiteBitSize x - 1 - fromEnum i
           Nothing -> -1 -- for x == 0 !!
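
For concreteness, a hedged check of ff1 at the REPL (assuming a 64-bit Int; expected results, not actually run here):

> ff1 (5 :: Int)   -- binary 101, highest set bit is bit 2
2
> ff1 (0 :: Int)
-1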

Test:

> myencodeFloat(myIntToFloat (-8000001))
-8000001.0


That looks OK.

I'll need to compare floats. Comparing with zero would do if I had done addition and subtraction already, but I haven't, so I'll do the float against float compare. Only one comparison is needed, as one can swap sides and use not and && to get the others. This is basically lexicographic ordering on the IEEE triples, with the sign thrown in. I'll delegate the comparison to positive against positive, after dealing with the signs:

myLtFloat (s1,e1,m1) (s2,e2,m2) =
    case undefined of
      _ | s1 && not s2     -> True                          -- neg < pos
      _ | not s1 && s2     -> False                         -- pos < neg
      _ | not s1 && not s2 -> myLtFloatPos (e1,m1) (e2,m2)  -- pos < pos
      _ | s1 && s2         -> myLtFloatPos (e2,m2) (e1,m1)  -- neg < neg

myLtFloatPos (e1,m1) (e2,m2) =
    case undefined of
      _ | e2 - e1 > 0      -> True
      _ | e2 - e1 < 0      -> False
      _ | m2 > m1          -> True
      _ | m2 <= m1         -> False

Test:

> myLtFloat (mydecodeFloat 1.3) (mydecodeFloat 4.5)
True


And a few more like that which I won't show.
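
A hedged sketch of what those "few more" presumably look like, derived from myLtFloat by swapping sides and negating, as described:

myGtFloat a b = myLtFloat b a
myLeFloat a b = not (myLtFloat b a)
myGeFloat a b = not (myLtFloat a b)
myEqFloat a b = not (myLtFloat a b) && not (myLtFloat b a)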

I do insist that the (single precision) descriptions above do not invoke complex logic, and I can see no reason why they should not be done in one cycle. The multiplication and division were 48 bit. You will say that double precision floats (64 bit) are more complex, and yes, they would require roughly 106-bit logic (two 53-bit significands multiplied; I haven't checked the exact intermediate width) and maybe at present need two or more cycles for multiplication and division, but that does not derive from some frightening (hypothetical) fact of them being fundamentally different and mysterious by virtue of their floating-point-ness. The operations and logic are just the same as integer, in all truth, plus a trivial unpacking and packing into/from integers fore and aft, and have that complexity and no more.

One can do better than the way I have set them out above, but I didn't wish to invoke any exotic techniques.

Regards

Peter




Jeffrey Denison

Feb 21, 2021, 1:35:29 PM
to Art Scott, unum-co...@googlegroups.com
Hi, I'm a newbie to this, working on learning Haskell for scientific computation, and I see this code looks like Haskell. Is this a package?
Thanks, Jeffrey