Major FP overhaul almost finished


The Beez

Mar 14, 2012, 3:34:40 AM
to 4tH-compiler
Hi 4tH-ers!

I've almost finished the ANS FP overhaul. Lots of - sometimes tiny -
optimizations have been implemented. Most notably, FLNFLOG.4tH has
been changed a lot:
- Replacing 2 s>f F/ by F2/
- Replacing 1 s>f by 1 u>f (unsigned conversion)
- Replacing constants with more optimal calculations

The last one requires some explanation. There is no way to quickly
initialize an FP number. S>FLOAT is very slow. The source code shows
why:

IF 0 ?DO 10 S>F F/ LOOP \ positive exponent
ELSE 0 ?DO 10 S>F F* LOOP \ negative exponent
THEN

This is the way an exponent is scaled. For example, converting 1e-30
takes some 30 divisions. The same goes for fractions:

BEGIN fdigit?
WHILE 10 S>F F/ ( F: num digitmul )
FDUP S>F F* ( F: num digitmul delta )
FROT F+ FSWAP
REPEAT FDROP DROP DUP \ more to convert?

That's both conservative AND expensive. Imagine a constant embedded in
a loop..

That conservative approach by the author made me feel a bit unsure
when contemplating a more direct approach to initializing a constant.
Another solution had to be found. This seemed a viable solution:

create f10** 10 , 100 , 10000 , 100000000 ,
does> 1 u>f 4 bounds do dup 1 and if i @c u>f f* then 2/ loop drop ;

Although this one is only of use when n > 9 (see MAX-N) it scales
pretty quickly, requiring only 2-4 FP operations for the range 1e10 -
1e15. F**2 extends this range by squaring (a single operation). 1/F
does the same with one additional operation for 1e-10 to 1e-15. This
is even better than FPOW:

: fpow ( n -- ) ( F: f -- f**n )
  dup if
    dup 1 = if
      drop                              \ n = 1: f is the result
    else                                \ f**n = (f*f)**(n/2) * f**(n mod 2)
      2 /mod fdup fdup f* recurse fswap recurse f*
    then
  else
    drop fdrop 1 u>f                    \ n = 0: result is 1
  then
;

For 1e30 this one requires 8 FP operations, as opposed to 5 for the
F10** / F**2 combo. There is, however, one drawback: it does not
improve readability. Still, I think it is a viable solution for
libraries. I think that a readable but very slow library is of less
use than a harder-to-read library with good performance.
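
As a rough illustration of the table-driven idea above, here is a sketch in standard Forth rather than in 4tH's own dialect. It assumes a Forth 2012 system with S>F and a separate FP stack (Gforth, for instance). The name pow10-table is made up for this example, and the 4tH library source may well differ.

```forth
\ Table of 10^(2^k) for k = 0..3; every entry fits in a 32-bit cell.
create pow10-table  10 , 100 , 10000 , 100000000 ,

: f10**  ( n -- ) ( F: -- 10^n )  \ assumes 0 <= n <= 15
  1 s>f                                \ running product starts at 1
  pow10-table 4 cells over + swap      \ loop limits: table end, table start
  do
    dup 1 and if i @ s>f f* then       \ bit set: multiply by 10^(2^k)
    2/                                 \ move on to the next bit of n
  1 cells +loop
  drop ;

: f**2  ( F: r -- r*r )  fdup f* ;

15 f10** f**2 fs.    \ 1e30: four F* inside F10**, then one squaring
```

The loop walks the four low bits of n, so the number of multiplications equals the number of set bits: two for n = 10, four for n = 15.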

You can see the impact of this all in SVN. Comments are appreciated.

Hans Bezemer

田明

Mar 14, 2012, 4:53:08 AM
to 4th-co...@googlegroups.com
great




ll...@writersglen.com

Mar 14, 2012, 11:57:31 AM
to 4th-co...@googlegroups.com
Sounds like you're doing great work, Hans.

Now we know why Chuck Moore always preferred scaled fixed point to floating point.

Best wishes,

LRP


Whiteley

Mar 14, 2012, 11:30:16 AM
to 4th-co...@googlegroups.com
Just to let you know I'm reading this 4th stuff and even saving some of
it for future (if unlikely) study. In between times I'm having to print
out music for Jennie's Chickens and Dill Pickle Rag for the upcoming
practice session of the Barley Shakers celtic band. Meanwhile
downstairs I'm trying to cut out a wooden plaque with my home-brew CNC
router.

Did you ever wonder what you might do in retirement?

Regards,

David Whiteley

The Beez

Mar 14, 2012, 2:13:17 PM
to 4tH-compiler


On Mar 14, 4:30 pm, Whiteley <skidd...@rogers.com> wrote:
> Did you ever wonder what you might do in retirement?
I always wanted to write and maintain a compiler.. ;-)

Hans

The Beez

Mar 14, 2012, 2:27:04 PM
to 4tH-compiler
On Mar 14, 4:57 pm, ll...@writersglen.com wrote:
> Sounds like you're doing great work, Hans.
> Now we know why Chuck Moore always preferred scaled fixed point to floating point.
So do I. As a matter of fact, I NEVER wanted "native" support for
double or floating point numbers, because:
(a) Nine out of ten times you won't need it;
(b) It's quite complex, especially for beginners;
(c) I didn't want to drag the overhead around. Note that due to 4tH's
(Harvard) design I would have to assign a whole new segment to FP and
would have major trouble breaking it down into an architecture-independent
format (HX). Theoretically, HX files should be portable to little- or
big-endian machines, sign-magnitude or one's complement.

Drawback of it all is that FP is "emulated" and hence slow. So
avoiding FP operations as much as possible speeds up performance. The
most dramatic example is in ZENFSQRT.4tH where the integer square root
is calculated up to five digits of precision, before it is passed to
the FP routine. There the exact square root is done in two or three
iterations.
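
To make the "integer seed first, FP refinement second" idea concrete, here is a minimal sketch in standard Forth (Forth 2012 with S>F and a separate FP stack assumed, as in Gforth). It is not the ZENFSQRT.4tH source: the words isqrt and fsqrt-seeded are hypothetical names, the input is a plain integer, and no scaling is done to guarantee a five-digit seed.

```forth
: isqrt  ( u -- root )  \ floor of the square root, integer Newton iteration
  dup 2 < if exit then                 \ 0 and 1 are their own roots
  dup                                  ( u x )  \ first guess x = u
  begin
    2dup / over + 2/                   ( u x x' )  \ x' = (u/x + x)/2
    2dup >                             \ still decreasing?
  while
    nip                                ( u x' )
  repeat
  drop nip ;                           \ keep the last stable guess
  \ note: u close to MAX-N would overflow in "over +"

: fsqrt-seeded  ( u -- ) ( F: -- r )
  dup 0= if drop 0e exit then          \ avoid 0/0 in the FP loop
  dup s>f  isqrt s>f                   ( F: a x )
  3 0 do
    fover fover f/ f+ 0.5e f*          \ x = (x + a/x)/2
  loop
  fswap fdrop ;

2000000000 fsqrt-seeded fs.
```

With a seed that is already good to about five digits, each FP step roughly doubles the number of correct digits, which is why two or three steps are enough. For small inputs such as 2 the integer seed is coarse, so a real routine would scale the argument first.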

The integer part is so fast that the output is counted in thousands of
FP square roots per second. Another note: the fixed point routines of
Brodie are part of 4tH too. You can see an example of this in
QUADCALC.4tH, the classic solution of Ax^2 + Bx + C = 0. It interfaces
quite nicely with MATH.4TH.

Hans Bezemer

The Beez

Mar 15, 2012, 12:51:14 PM
to 4tH-compiler
On Mar 14, 8:34 am, The Beez <the.beez.spe...@gmail.com> wrote:
I've uploaded an improved version of F10**, extending the range to
1e24, knocking off an FP operation on average and making the DO..LOOP
a bit tighter as well. The trick is to use the "unused" space between
1 and 1e9 by starting off at 1e9 from the beginning and covering the
range from 1e10 thru 1e24 by FP and the range between 1 and 1e9 by
integer calculation.
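
One way to read that description, sketched in standard Forth (Forth 2012 S>F and a separate FP stack assumed, as in Gforth). This is a guess at the approach, not the code in SVN; pow10-table and 10** are names invented for the example.

```forth
create pow10-table  10 , 100 , 10000 , 100000000 ,

: 10**  ( n -- 10^n )  \ integer power, assumes 0 <= n <= 9
  1 swap 0 ?do 10 * loop ;

: f10**  ( n -- ) ( F: -- 10^n )  \ assumes 0 <= n <= 24
  dup 10 < if 10** s>f exit then       \ 1 .. 1e9: integer arithmetic only
  9 -  1000000000 s>f                  \ start at 1e9; 1..15 powers remain
  pow10-table 4 cells over + swap      \ loop limits: table end, table start
  do
    dup 1 and if i @ s>f f* then       \ bit set: multiply by 10^(2^k)
    2/
  1 cells +loop
  drop ;

24 f10** fs.
```

Starting from 1e9 instead of 1 means the running product never needs a multiplication by a value the integer unit could have produced, and the four-entry table now reaches 9 + 15 = 24.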

Code in SVN

Hans Bezemer

The Beez

Mar 16, 2012, 3:51:23 AM
to 4tH-compiler
On Mar 15, 5:51 pm, The Beez <the.beez.spe...@gmail.com> wrote:
OK, I'm done. All "S>FLOAT" constructs have been removed from the
libraries, unless they're used to initialize tables. Those are called
only once, so there is no need to fix that. Yes, readability suffered
in some areas, but even if I come up with a better solution I don't
think it will change that much:

- Even an ME>F word (Mantissa/Exponent to Float) won't help where
the precision exceeds 1e9;
- The current solution F10** is reasonably optimized, so rewriting
that one won't give performance much more "oomph".

I also weeded out all FALSE [IF] .. [THEN] testing programs and moved
them to /demo. Why not, while I'm at it ;-)

Hans Bezemer

The Beez

Mar 16, 2012, 8:16:51 AM
to 4tH-compiler


On Mar 16, 8:51 am, The Beez <the.beez.spe...@gmail.com> wrote:
Don't expect an "ME>F" anytime soon. As I expected the ANS Float FP
format is rather bizarre. Here is 60e0 for you (mantissa (bin),
exponent (bin), float):
1111000000000000000000000000000000000000000000000000000000000000
1111111111111111111111111000110 60.0000000000000000

The exponent seems to be in format mantissa*2^n rather than
mantissa*10^n. I found out that much. But I haven't succeeded in
poking in constants where exponent > 0.
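
For what it's worth, the dump above can be checked by hand. The mantissa is 1111 followed by sixty zeros, i.e. 15 * 2^60, and the exponent, read as a two's complement number ending in ...1000110, is -58. A small sketch in standard Forth (assuming the optional F** word from the floating-point extension set, as found in Gforth) redoes that arithmetic; it only reads the printed bits and does not touch the internal format itself:

```forth
15e 2e 60e f** f*        ( F: mantissa )        \ 15 * 2^60
2e -58e f** f*           ( F: mantissa*2^-58 )  \ 15 * 2^2, i.e. 60
f.
```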

Hans Bezemer

The Beez

Mar 17, 2012, 6:20:15 AM
to 4th-co...@googlegroups.com


On Friday, March 16, 2012 1:16:51 PM UTC+1, The Beez wrote:

Ok, I found it. It's called "binary scientific notation". It looks so bizarre, because it is "normalized", that is: scaled so the mantissa is as big as possible (means: topmost digit is set). Converting a decimal mantissa/exponent to this format isn't entirely trivial (see: http://www.technical-recipes.com/2011/ieee-754-floating-point-to-binary-conversion/). Another drawback of this approach is known as well: you're fiddling with the internal format of the FP number.

So I came up with this solution: a wrapper around F10**:

  : me>f dup 0< if 1 u>f abs f10** f/ else f10** then s>f f* ;

  314159265 -8 me>f fs.

That takes only 2 FP operations, believe it or not! The same number with a -24 exponent takes just 7, which saves us at least 17 FP operations. Best of all: doing it manually doesn't make it any more efficient. On top of that, it's something I can explain to a newbie, so this is 4tH manual material!!
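
For readers who want to try the wrapper outside 4tH, here is a sketch in standard Forth (Forth 2012 S>F, the optional F** word and a separate FP stack assumed, as in Gforth). The F10** below is only a stand-in built on F** so that the example runs on its own; it is not the table-driven library word and says nothing about its speed.

```forth
\ Stand-in for the library's F10**, for illustration only.
: f10**  ( n -- ) ( F: -- 10^n )  10e s>f f** ;

: me>f  ( mantissa exponent -- ) ( F: -- r )
  dup 0< if
    1e abs f10** f/          \ negative exponent: 1 / 10^|exponent|
  else
    f10**                    \ zero or positive exponent: 10^exponent
  then
  s>f f* ;                   \ scale by the integer mantissa

314159265 -8 me>f fs.
5 3 me>f fs.
```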

Lemme know what you think.

Hans Bezemer

The Beez

Mar 17, 2012, 7:16:34 AM
to 4th-co...@googlegroups.com


On Saturday, March 17, 2012 11:20:15 UTC+1, The Beez wrote:
I also added a dummy ZenFP definition to "fpow10.4th" for ME>F. It will create a little overhead, but may be a life saver if you want to create programs that run under both ZenFP and ANS FP.

Hans Bezemer

The Beez

Mar 17, 2012, 5:54:12 PM
to 4th-co...@googlegroups.com
> I also added a dummy ZenFP definition to "fpow10.4th" for ME>F. It will create a little overhead, but may be a life saver if you want to create programs that run under both ZenFP and ANS FP.

Well, since the introduction of [IGNORE] the overhead has gone; no penalty at all. Amazing how fast I came to it - I just analysed the options one by one by clearly stating the design objectives and it came to me almost automatically..

Anyway, I was leafing through the manual when I bumped into FSL-UTIL.4tH and all the warnings that came with it. That may have been required at some moment in time, but not anymore. I simply added an [ABORT] directive to the library and that was it: no more warnings.

We're not there yet, but it's getting cleaner and better designed with each release.

Hans Bezemer