I've been quiet on the RC2014 front for a few months, but it is finally a reasonable time to write a little about what has been occupying me.
When I started with the z80, one of the first things that interested me was hooking up the Am9511A Arithmetic Processing Unit (APU) to the z180 CPU for its floating point capabilities. When I did that, I realised that none of the existing floating point libraries use the hardware multiply capabilities of the z180 or, incidentally, of the z80-zxn CPU in the Spectrum Next (coming soon). To be fair, I also needed a real z180 floating point library for comparison.
So, after procrastinating on the z180 and z80-zxn floating point library situation for nearly two years, I decided to write an IEEE 754 32-bit floating point library for those two platforms, which could also be used on the normal z80.
I started out with the Digi International floating point library for the Rabbit R2000 and R3000 CPUs. The Rabbit CPU has a signed 32_16x16 hardware multiply (16-bit operands, 32-bit result), and this made a good starting point. But after months of working on the problem, the process turned out to be much more difficult than I expected:
- Rewrite Rabbit code to z80 code -> realise that this doesn't quite work so easily.
- Translate 32_16x16 Rabbit multiply algorithms to 16_8x8 z80 multiply algorithms -> accuracy is ugly.
- Learn about Newton-Raphson (a month passes).
- Learn about Horner's Method.
- Rewrite derived assembly functions to use compact IEEE intrinsic functions -> accuracy is middling but getting better.
- Rewrite intrinsic functions to use an expanded 32-bit mantissa -> accuracy is as good as IEEE 754 gets.
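To illustrate the multiply translation step in the list above: a 16x16 multiply with a 32-bit result can be assembled from four 8x8 partial products, which is the general shape of the problem when only a 16_8x8 hardware multiply is available. The C below is only a sketch of that decomposition, not the library's assembly. `mul8x8` is a hypothetical stand-in for the hardware instruction, and signed operands and mantissa rounding are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical stand-in for an unsigned 8x8 -> 16 hardware multiply. */
static uint16_t mul8x8(uint8_t a, uint8_t b)
{
    return (uint16_t)((uint16_t)a * (uint16_t)b);
}

/* Unsigned 16x16 -> 32 multiply built from four 8x8 -> 16 partial products:
 * x * y = xl*yl + ((xl*yh + xh*yl) << 8) + ((xh*yh) << 16)               */
uint32_t mul16x16(uint16_t x, uint16_t y)
{
    uint8_t xl = (uint8_t)(x & 0xFF), xh = (uint8_t)(x >> 8);
    uint8_t yl = (uint8_t)(y & 0xFF), yh = (uint8_t)(y >> 8);

    uint32_t ll = mul8x8(xl, yl);   /* low  x low   */
    uint32_t lh = mul8x8(xl, yh);   /* low  x high  */
    uint32_t hl = mul8x8(xh, yl);   /* high x low   */
    uint32_t hh = mul8x8(xh, yh);   /* high x high  */

    /* The two cross terms share the same weight (2^8), so add them first. */
    return ll + ((lh + hl) << 8) + (hh << 16);
}
```

Four multiplies plus the carry handling between partial products is where much of the extra work comes from compared with a single 32_16x16 instruction.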
Getting accuracy in the intrinsic functions required rewriting the code three times. If I had known what I was doing, it would have been easier. But, as a learning experience, rewrites are the only way to make progress. Now, after the rewrites, I'm pretty happy with the resulting code.
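For readers who have not met the two techniques named in the list, here is a minimal C sketch of each. It is not taken from the library: the function names are hypothetical, the starting estimate `48/17 - 32/17 * m` is a common textbook choice for a mantissa in [0.5, 1), and plain `float` arithmetic stands in for the hand-written mantissa operations.

```c
/* Newton-Raphson reciprocal of a normalised mantissa m, 0.5 <= m < 1.
 * Each iteration x = x * (2 - m * x) roughly doubles the number of
 * correct bits, so three steps from the linear estimate are enough
 * for a 24-bit float mantissa.                                        */
float recip_mantissa(float m)
{
    float x = 48.0f / 17.0f - (32.0f / 17.0f) * m;  /* initial estimate */
    for (int i = 0; i < 3; i++)
        x = x * (2.0f - m * x);
    return x;
}

/* Horner's method: evaluate c[0]*x^(n-1) + ... + c[n-1] using one
 * multiply and one add per coefficient, with no explicit powers of x. */
float horner(const float *c, int n, float x)
{
    float r = c[0];
    for (int i = 1; i < n; i++)
        r = r * x + c[i];
    return r;
}
```

For example, `horner((const float[]){2.0f, -3.0f, 1.0f}, 3, 2.0f)` evaluates 2x^2 - 3x + 1 at x = 2. Both techniques lean heavily on multiplication, which is why a hardware multiply matters so much for this kind of code.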
Finally, with a solid intrinsic function library, I extracted the derived functions (trigonometric, hyperbolic, and power) from the Hi-Tech C library. These C functions are known as the fastest floating point implementations in z88dk benchmarking, but they are not particularly accurate. Rewriting them to reach an acceptable compromise between performance and accuracy is still a work in progress.
Benchmark performance is still being examined, but for the two targets of z180 and the Spectrum Next z80-zxn the library is about four times (4x) faster for arithmetic-intensive benchmarks (n-body). A 4x improvement is in line with the kind of improvement seen when I previously wrote the z180 and z80-zxn integer math library. There is less benefit for the z80, as there is no hardware multiply capability to exploit. Any z80 performance improvement would come directly from reducing the number of bytes shuffled compared with the existing 48-bit math libraries, and it is still to be quantified.
Hopefully, this library will become quite useful.
For me, it has been a very educational few months. Glad that it is (nearly) done - done.
Cheers, Phillip