Planetary Coordinates with (multiple) APU Modules.


Phillip Stevens

Jul 3, 2021, 8:31:08 AM
to RC2014-Z80
A question was raised elsewhere regarding the possibility of plotting planetary motion on the RC2014, using the various floating point libraries available through z88dk.

The question was intriguing, not because it is a unique application, but because it is a real test of the performance of the APU Module. And, the application can relatively easily be parallelised to effectively use multiple (up to 4x) APU Modules.

[Attached photos: IMG_1488.jpg, IMG_1489.jpg]

So over the past few days I've been building some new tools to make effective use of multiple APU Modules from C (rather than from assembly), and also getting my head around the planetary motion equations.

The program is very rough, and doesn't account for perturbations of Jupiter, Saturn, etc, but it is sufficiently accurate to compare with real life at CosineKitty.

solar day 7855.501000
Geocentric Coordinates
Sun x -0.207707 y 0.995251 z 0.000000
Moon x 0.002348 y 0.001304 z -0.000156
Mercury x 0.133501 y 0.801797 z -0.047167
Venus x -0.903895 y 1.170886 z 0.042585
Mars x -1.682168 y 1.768174 z 0.052462
Jupiter x -4.230069 y 4.028960 z 0.077058
Saturn x 6.109470 y -6.722128 z -0.115441
Uranus x 14.586440 y 14.054760 z -0.143821
Neptune x -29.764930 y 5.489574 z 0.584961

The initial results have one APU Module being about twice as fast as using the math32 soft floating point library.

Over the next while I'll do the parallelisation, and post the results and code.
This is lots of fun.
P.

Jay Cotton

Jul 3, 2021, 12:19:02 PM
to RC2014-Z80
Look out Cray, we are coming for you.

Phillip Stevens

Jul 4, 2021, 10:37:53 AM
to RC2014-Z80


Drum roll... where did I get to?

Using very rough calculations (serving more as a benchmark than an accurate calculator), and doing 40 calculations (iterations) for 9 bodies on RC2014 (CPM-IDE) with reduced printing (-DPRINTF not defined), we have
  • math48 - 103.1 seconds
  • math32 - 34.4 seconds
  • am9511 - 17.2 seconds
  • am9511 4x - 12.5 seconds
So starting at math48, the default z88dk floating point library, we are about 3x faster with math32. This aligns with other benchmarks pretty well.

And, adding an APU Module with am9511 doubles the performance again to 17.2 seconds for this test, which was also well established.

The final step was to see how well I could parallelise the application to make good use of 4x APU Modules. The result to date is not that great, to be honest.
I've been able to get the multiple over math32 software from 2x to nearly 3x (34.4 seconds vs 12.5 seconds) in my first pass of parallelising the important functions (vs the normal functions).

Suggestions on further improvements?

Cheers, Phillip
 

Phillip Stevens

Jul 8, 2021, 6:36:13 AM
to RC2014-Z80
Some fun over the last few days straightening out some kinks with the atan2(y,x) functions in the various math libraries in z88dk, but that's all done.
And, with a bit more work on parallelising the slow functions, I've been able to pull another 3.4 seconds (25%) out of the 4x APU Module solution.

40 calculations for 9 bodies on z88dk-ticks - no printing (-DPRINTF not defined, no other printf()).
  • sccz80/classic/genmath  Ticks: 967407074
  • sccz80/new/math48       Ticks: 769092327
  • sccz80/new/math32       Ticks: 252826233
  • sdcc/new/math48         Ticks: 735457135
  • sdcc/new/math32         Ticks: 244811563
40 calculations for 9 bodies on RC2014 (CPM-IDE) with reduced printing (-DPRINTF not defined).
  • sccz80/new/math48       105.5 seconds
  • sdcc/new/math48         101.4 seconds
  • sdcc/new/math32          34.3 seconds
  • sdcc/new/am9511          17.1 seconds
  • sdcc/new/am9511 4x        9.1 seconds
This equates to effectively using about 2x APU Modules (17.1 seconds down to 9.1 seconds). I think it is going to be difficult to do better whilst remaining mainly in single-threaded C functions.
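One way to read the 1x versus 4x result is through Amdahl's law. This is an interpretation added here, not something measured: it assumes the run time splits cleanly into a serial part and a part that scales perfectly across N = 4 modules, with p the parallel fraction.

```latex
S = \frac{T_{1}}{T_{4}} = \frac{17.1}{9.1} \approx 1.88

S = \frac{1}{(1 - p) + \dfrac{p}{N}}
\quad\Longrightarrow\quad
p = \frac{1 - 1/S}{1 - 1/N}
  = \frac{1 - 9.1/17.1}{1 - 1/4}
  \approx 0.62

\lim_{N \to \infty} S = \frac{1}{1 - p} \approx 2.7
```

Under that model roughly 62% of the single-APU run time is being overlapped, and even an unlimited number of modules would top out at around 2.7x, which fits the remark about single-threaded C functions being the limit.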

And I think this is the last update on this thread, before I move on to visualisation of the planet motion using ReGIS.

Bill Shen

Jul 8, 2021, 12:30:06 PM
to RC2014-Z80
Phillip,
I'm interested in your ReGIS implementation.  I have overclocked a 6502 to 25.175MHz so the 6502 can drive a 640x480 VGA graphic monitor directly without a graphic controller.  However, the only time it has for other tasks is during the vertical retrace period, which is too little for timely transfer of images, so I thought about having it receive ReGIS commands serially and execute them during the vertical retrace period.  By my calculation it has roughly the performance of a 5MHz Z80 and 16K of program memory, so it may not be enough for ReGIS tasks.
  Bill

Phillip Stevens

Jul 8, 2021, 9:21:12 PM
to RC2014-Z80
Hi Bill,
I'm interested in ReGIS as the easier option, rather than building GSX support and then a device driver for it.
I'll open a new thread with some background.
 