Calculating PI

Ed

unread,

Dec 4, 2014, 3:49:54 AM12/4/14

to

Wanting to do a PI calculator in Forth, I figured it would be easier to
start with existing program source, even if written in a different language.

My first surprise was not finding anything on the CP/M CD-ROM.
I eventually settled on a 32-bit C program found on the internet which I
converted to 16-bit operation and then to Forth.

The second surprise was finding the DX-Forth version ran twice as fast
as the HITECH-C version. I was expecting the opposite as HITECH-C
compiles native code (?). OTOH I'm running MYZ80 - not a real Z80,
and DX-Forth uses only 8080 instructions which may run/emulate faster (?)

If anyone is interested in PI generation or trying out various compilers,
the sources and executables are here:
http://www.netbay.com.au/~dxforth/pi_cpm.zip

Charles Richmond

unread,

Dec 4, 2014, 8:42:54 AM12/4/14

to

"Ed" <inv...@nospam.com> wrote in message
news:m5p77d$khd$1...@speranza.aioe.org...

There is a Bob Bishop version of an Apple II BASIC Pi calculator at:

http://bob-bishop.awardspace.com/ApplePi/index.html

This also uses Machin's formula. This version was published in the Aug/Sept
1978 issue of Micro magazine. It might be instructive to compare this to the
C version you found.

--

numerist at aquaporin4 dot com

Mr. Emmanuel Roche, France

unread,

Dec 4, 2014, 11:28:20 AM12/4/14

to

Is anybody out there

who has either an Apple-II or an AII emulator?

I have retyped the article, but cannot check that it is working correctly.

(What does mean "CALL -936" ?)

Yours Sincerely,
Mr. Emmanuel Roche, France

APPLEPI.WS4
-----------

- "Apple Pi"
Robert Bishop
"Micro", Aug/Sept 1978, p.15

(Retyped by Emmanuel ROCHE.)

Everyone knows that the value of Pi is about 3.1416. In fact, its value was
known this accurately as far back as 150 A.D. But it was not until the 16th
century that Francisco Vieta succeeded in calculating Pi to 10 decimal places.

Around the end of the 16th century, the German mathematician Ludolph van
Ceulen worked on calculating the value of Pi until he died at the age of 70.
His efforts produced Pi to 35 decimal places.

During the next several centuries, a great deal of effort was spent in
computing the value of Pi to ever greater precision. In 1699, Abraham Sharp
calculated Pi to 71 decimal places. By the mid-1800's, its value was known to
several hundred decimal places. Finally, in 1873, an English mathematician,
William Shanks, determined Pi to 707 decimal places, an accuracy which
remained unchallenged for many years.

I was recently re-reading my old copy of Kasner & Newman's "Mathematics and
Imagination" (Simon & Schuster, 1940), where I found the series expansion:

Inf 16(-1)^K+1 Inf 4(-1)^K+1
Pi = > ------------ - > --------------
k=1 (2K-1)5^2K-1 k=1 (2K-1)239^2K-1

The book indicated that this series converged rather quickly but... "it would
require 10 years of calculation to determine Pi to 1000 decimal places."
Clearly, this statement was made before modern digital computers were
available. Since then, Pi has been computed to many thousands of decimal
places. But Kasner & Newman's conjecture of a 10-year calculation for Pi
aroused my curiosity to see just how long it would take my little Apple-II
computer to perform the task.

Program description
-------------------

My program to compute the value of Pi is shown in Figure 1. It was written
using the Apple-II computer's Integer BASIC, and requires a 16K system (2K for
the program itself; 12K for data storage). The program is fairly
straightforward, but a brief discussion may be helpful.

The main calculation loop consists of lines 100 through 300; the results are
printed in lines 400 through 600. The second half of the listing contains the
multiple-precision arithmetic subroutines. The division, addition, and
subtraction routines start at lines 1000, 2000, and 3000, respectively.

In order to use memory more efficiently, PEEK and POKE statements were used
for arrays, instead of DIM statements. Three such arrays are used by the
programs: POWER, TERM, and RESULT. Each are up to 4K bytes long, and start at
the memory locations specified in line 50 of the program.

The 3 arrays mentioned above each store partial and intermediate results of
the calculations. Each byte of an array contains either 1 or 2 digits,
depending on the value of the variable TEN. If the number of requested digits
for Pi is less than about 200, it is possible to store 2 digits per byte;
otherwise, each byte must contain no more than 1 digit. (The reason for this
distinction occurs in line 1070, where an arithmetic overflow can occur when
trying to evaluate higher order terms of the series, if too many digits are
packed into each byte.)

The program evaluates the series expansion for Pi until the next term of the
series results in a value less than the requested precision. Line 1055
computes the variable ZERO, which can be tested to see if an underflow in
precision has occurred. This value is then passed back to the main program,
where (in line 270) it determines whether or not the next term of the series
is needed.

Results
-------

Figure 2 shows the calculated value of Pi to 1000 decimal places. Running the
program to get these results took longer than it did to write the program!
(The program ran for almost 40 hours before it spit out the answer.) However,
it took less than 2 minutes to produce Pi to 35 decimal places, the same
accuracy to which Ludolph van Ceulen spent his whole life striving for!

Since the program is written entirely in BASIC, it is understandably slow. By
rewriting all or part of it in machine language, its performance could be
vastly improved. However, I will leave this implementation as an exercise for
anyone who is interested in pursuing it.

Figure 1. Program listing

2 REM "Apple-Pi", Robert Bishop, "Micro", p.15
3 :
5 CALL -936: VTAB 10 : PRINT "How many digits do you want? "
10 INPUT size
15 CALL -936
20 ten = 10 : IF size > 200 THEN GOTO 50
30 ten = 100 : size = (size + 1) / 2
50 power = 4096 : term = 8192 : result = 12288
60 div = 1000 : add = 2000 : sub = 3000 : init = 4000 : copy = 5000
70 DIM constant (2) : constant (1) = 25 : constant (2) = 239
90 :
100 ' Main loop
125 FOR pass = 1 TO 2
150 GOSUB 4000 ' Init
200 GOSUB 5000 ' Copy
210 point = term : divide = expo : GOSUB 1000 ' Div
220 IF sign > 0 THEN GOSUB 2000 ' Add
230 IF sign < 0 THEN GOSUB 3000 ' Sub
240 expo = expo + 2 : sign = - sign
250 point = power : divide = constant (pass) : GOSUB 1000 ' Div
260 IF pass = 2 THEN GOSUB 1000 ' Div
270 IF zero <> 0 THEN GOTO 200
300 NEXT pass
400 ' Print the result
500 PRINT : PRINT
510 PRINT "The value of Pi to " (ten / 100 + 1) * size " decimal places:" : PRINT
520 PRINT PEEK (result) "." ;
530 FOR place = result + 1 TO result + size
540 IF ten = 10 THEN GOTO 570
560 IF PEEK (place) < 10 THEN PRINT "0" ;
570 PRINT PEEK (place) ;
580 NEXT place
590 PRINT
600 END
990 :
1000 ' Division subroutine
1010 digit = 0 : zero = 0
1020 FOR place = point TO point + size
1030 digit = digit + PEEK (place)
1040 quotient = digit / divide
1050 residue = digit MOD divide
1055 zero = zero OR (quotient + residue)
1060 POKE place, quotient
1070 digit = ten * residue
1080 NEXT place
1090 RETURN
1990 :
2000 ' Addition subroutine
2010 carry = 0
2020 FOR place = size TO 0 STEP -1
2030 sum = PEEK (result + place) + PEEK (term + place) + carry
2040 carry = 0
2050 IF sum < ten THEN GOTO 2080
2060 sum = sum - ten
2070 carry = 1
2080 POKE result + place, sum
2090 NEXT place
2100 RETURN
2990 :
3000 ' Subtraction subroutine
3010 loan = 0
3020 FOR place = size TO 0 STEP -1
3030 difference = PEEK (result + place) - PEEK (term + place) - loan
3040 loan = 0
3050 IF difference >= 0 THEN GOTO 3080
3060 difference = difference + ten
3070 loan = 1
3080 POKE result + place, difference
3090 NEXT place
3100 RETURN
3990 :
4000 ' Initialize registers
4010 FOR place = 0 TO size
4020 POKE power + place, 0
4030 POKE term + place, 0
4040 IF pass = 1 THEN POKE result + place, 0
4050 NEXT place
4060 POKE power, 16 / pass ^ 2
4070 IF pass = 1 THEN divide = 5
4080 IF pass = 2 THEN divide = 239
4090 point = power : GOSUB 1000 ' Div
4100 expo = 1 : sign = 3 - 2 * pass
4110 RETURN
4990 :
5000 ' Copy "power" into "term"
5010 FOR place = 0 TO size
5020 POKE term + place, PEEK (power + place)
5030 NEXT place
5040 RETURN

Figure 2. Pi to 1000 decimal places

The value of Pi to 1000 decimal places:

3.14159265358979323846264338327950288419
7169399375105820974944592307816406286208
9986280348253421170679821480865132823066
4709384460955058223172535940812848111745
0284102701938521105559644622948954930381
9644288109756659334461284756482337867831
6527120190914564856692346034861045432664
8213393607260249141273724587006606315588
1748815209209628292540917153643678925903
6001133053054882046652138414695194151160
9433057270365759591953092186117381932611
7931051185480744623799627495673518857527
2489122793818301194912983367336244065664
3086021394946395224737190702179860943702
7705392171762931767523846748184676694051
3200056812714526356082778577134275778960
9173637178721468440901224953430146549585
3710507922796892589235420199561121290219
6086403441815981362977477130996051870721
1349999998372978049951059731732816096318
5950244594553469083026425223082533446850
3526193118817101000313783875288658753320
8381420617177669147303598253490428755468
7311595628638823537875937519577818577805
3217122680661300192787661119590921642019
96

EOF

David Schmidt

unread,

Dec 4, 2014, 12:05:33 PM12/4/14

to

On 12/4/2014 11:28 AM, Mr. Emmanuel Roche, France wrote:
> I have retyped the article, but cannot check that it is working correctly.

No need to retype the program - it's already on virtual disks. An example:
ftp://ftp.apple.asimov.net/pub/apple_II/images/misc/Bob%20Bishop%20Apple%20Pi.dsk
And lowercase letters would not work as expected on the target machine
(Integer-equipped Apple II non-plus). Here is a corrected copy:

0 REM *** APPLE-PI *** WRITTEN BY: BOB BISHOP
5 CALL -936: VTAB 10: TAB 5: PRINT "HOW MANY DIGITS DO YOU WANT";
10 INPUT SIZE
15 CALL -936
20 TEN=10: IF SIZE>200 THEN 50
30 TEN=100:SIZE=(SIZE+1)/2
50 POWER=4096:TERM=8192:RESULT=12288
60 DIV=1000:ADD=2000:SUB=3000:INIT=4000:COPY=5000
70 DIM CONSTANT(2):CONSTANT(1)=25:CONSTANT(2)=239
100 REM MAIN LOOP
125 FOR PASS=1 TO 2
150 GOSUB INIT
200 GOSUB COPY
210 POINT=TERM:DIVIDE=EXP: GOSUB DIV
220 IF SIGN>0 THEN GOSUB ADD
230 IF SIGN<0 THEN GOSUB SUB
240 EXP=EXP+2:SIGN=-SIGN
250 POINT=POWER:DIVIDE=CONSTANT(PASS): GOSUB DIV
260 IF PASS=2 THEN GOSUB DIV
270 IF ZERO<>0 THEN 200
300 NEXT PASS
400 REM PRINT THE RESULT
500 PRINT : PRINT
510 PRINT "THE VALUE OF PI TO ";(TEN/100+1)*SIZE;" DECIMAL PLACES:":
PRINT
520 PRINT PEEK (RESULT);" ";
530 FOR PLACE=RESULT+1 TO RESULT+SIZE
540 IF TEN=10 THEN 570
560 IF PEEK (PLACE)<10 THEN PRINT "0";
570 PRINT PEEK (PLACE);
580 NEXT PLACE
590 PRINT
600 END
1000 REM DIVISION SUBROUTINE
1010 DIGIT=0:ZERO=0
1020 FOR PLACE=POINT TO POINT+SIZE
1030 DIGIT=DIGIT+ PEEK (PLACE)
1040 QUOTIENT=DIGIT/DIVIDE
1050 RESIDUE=DIGIT MOD DIVIDE
1055 ZERO=ZERO OR (QUOTIENT+RESIDUE)
1060 POKE PLACE,QUOTIENT
1070 DIGIT=TEN*RESIDUE
1080 NEXT PLACE
1090 RETURN
2000 REM ADDITION SUBROUTINE
2010 CARRY=0
2020 FOR PLACE=SIZE TO 0 STEP -1
2030 SUM= PEEK (RESULT+PLACE)+ PEEK (TERM+PLACE)+CARRY
2040 CARRY=0
2050 IF SUM<TEN THEN 2080
2060 SUM=SUM-TEN
2070 CARRY=1
2080 POKE RESULT+PLACE,SUM
2090 NEXT PLACE
2100 RETURN
3000 REM SUBTRACTION SUBROUTINE
3010 LOAN=0
3020 FOR PLACE=SIZE TO 0 STEP -1
3030 DIFFERENCE= PEEK (RESULT+PLACE)- PEEK (TERM+PLACE)-LOAN
3040 LOAN=0
3050 IF DIFFERENCE>=0 THEN 3080
3060 DIFFERENCE=DIFFERENCE+TEN
3070 LOAN=1
3080 POKE RESULT+PLACE,DIFFERENCE
3090 NEXT PLACE
3100 RETURN
4000 REM INITIALIZE REGISTERS
4010 FOR PLACE=0 TO SIZE
4020 POKE POWER+PLACE,0
4030 POKE TERM+PLACE,0
4040 IF PASS=1 THEN POKE RESULT+PLACE,0
4050 NEXT PLACE
4060 POKE POWER,16/PASS ^ 2
4070 IF PASS=1 THEN DIVIDE=5
4080 IF PASS=2 THEN DIVIDE=239
4090 POINT=POWER: GOSUB DIV
4100 EXP=1:SIGN=3-2*PASS
4110 RETURN
5000 REM COPY "POWER" INTO "TERM"
5010 FOR PLACE=0 TO SIZE
5020 POKE TERM+PLACE, PEEK (POWER+PLACE)
5030 NEXT PLACE
5040 RETURN

> (What does mean "CALL -936" ?)

Clears the screen.

Ed

unread,

Dec 5, 2014, 5:37:13 AM12/5/14

to

Charles Richmond wrote:
>
> There is a Bob Bishop version of an Apple II BASIC Pi calculator at:
>
> http://bob-bishop.awardspace.com/ApplePi/index.html
>
> This also uses Machin's formula. This version was published in the Aug/Sept
> 1978 issue of Micro magazine. It might be instructive to compare this to the
> C version you found.

The test would be whether it is capable of generating several thousand digits
without error. Programs looked at thus far suggest some 32-bit math will be
required.

Peter Dassow

unread,

Dec 5, 2014, 1:46:15 PM12/5/14

to

On 04.12.2014 at 17:28 Mr. Emmanuel Roche, France wrote:
> Is anybody out there
>
> who has either an Apple-II or an AII emulator?
>
> I have retyped the article, but cannot check that it is working correctly.
>
> (What does mean "CALL -936" ?)

CALL -936 means Cursor back to "HOME"

Regards
Peter

Charles Richmond

unread,

Dec 5, 2014, 3:51:43 PM12/5/14

to

"Peter Dassow" <z8...@arcor.de> wrote in message
news:m5suh5$ubr$1...@dont-email.me...

ISTM that if you take -936 as a signed 16-bit number, then that 16-bit
number will be used as the machine address to "CALL". I do *not* know what
the machine language routine at that address will do...

Charles Richmond

unread,

Dec 5, 2014, 4:11:25 PM12/5/14

to

"Ed" <inv...@nospam.com> wrote in message

news:m5s1sk$c5r$1...@speranza.aioe.org...

Several years ago I re-coded Bob Bishop's Apple II BASIC Pi calculator in C
directly from the BASIC, and then used the resulting program to calculate Pi
to 100,000 digits on a Linux box. I verified the answer using the Pi
constant then available at Gutenburg.org, which gave Pi to a million digits.

David Schmidt

unread,

Dec 5, 2014, 4:36:43 PM12/5/14

to

On 12/5/2014 3:51 PM, Charles Richmond wrote:
> "Peter Dassow" <z8...@arcor.de> wrote in message
> news:m5suh5$ubr$1...@dont-email.me...
>> On 04.12.2014 at 17:28 Mr. Emmanuel Roche, France wrote:
>>> Is anybody out there
>>>
>>> who has either an Apple-II or an AII emulator?
>>>
>>> I have retyped the article, but cannot check that it is working
>>> correctly.
>>>
>>> (What does mean "CALL -936" ?)
>>
>> CALL -936 means Cursor back to "HOME"
>>
>
> ISTM that if you take -936 as a signed 16-bit number, then that 16-bit
> number will be used as the machine address to "CALL".

Right. Woz's Integer BASIC interpreter only knows how to count to 16
bits (signed), so to get into the code at 64600 ($fc58) you have to go
into negative territory. If you tried to CALL 64600 you would be
greeted with "*** >32767 ERR" as a message.

> I do *not* know
> what the machine language routine at that address will do...

Well, it's been identified twice now in previous messages... it clears
the screen and homes the cursor. There is no HOME command in Integer
the way there is in Applesoft BASIC.

Ed

unread,

Dec 5, 2014, 6:55:26 PM12/5/14

to

Charles Richmond wrote:
> "Ed" <inv...@nospam.com> wrote in message

> ...

> > The test would be whether it is capable of generating several thousand
> > digits
> > without error. Programs looked at thus far suggest some 32-bit math will
> > be
> > required.
> >
>
> Several years ago I re-coded Bob Bishop's Apple II BASIC Pi calculator in C
> directly from the BASIC, and then used the resulting program to calculate Pi
> to 100,000 digits on a Linux box. I verified the answer using the Pi
> constant then available at Gutenburg.org, which gave Pi to a million digits.

But a Linux box isn't limited to 16-bit math. If you still have the C translation
handy, test it out with 16-bit compiler.

Egan Ford

unread,

Dec 5, 2014, 8:42:19 PM12/5/14

to

On 12/4/14, 1:50 AM, Ed wrote:
> Wanting to do a PI calculator in Forth, I figured it would be easier to
> start with existing program source, even if written in a different language.

Hi, writing Pi programs for '70s and '80s era systems is a hobby of
mine. I've got hand optimized versions in assembly for about 10
different CPUs. I've posted my CP/M 8080 and z80 versions here:
https://github.com/datajerk/pies. I also included a 16bit forth version.

The forth version is not optimized like the assembly versions (e.g. loop
unrolling, Duff's device, tables, etc...).

Pi to 1000 digits for the 8080 and z80 @ 2 MHz takes 143 and 107 seconds
respectfully. The code is not the same since z80 has some additional
features, e.g. block move. Both were written for the ASL cross-assembler.

Some info on how to implement Pi:
http://jerkwerks.com/pi-day-rematch-apple-ii-vs-hp-41c/. There's
another 16-bit Forth version there too (same as above but with minor
changes for the Apple II MAF Forth).

Egan Ford

unread,

Dec 5, 2014, 8:48:55 PM12/5/14

to

On 12/4/14, 1:50 AM, Ed wrote:

> Wanting to do a PI calculator in Forth, I figured it would be easier to
> start with existing program source, even if written in a different language.

Hi, writing Pi programs for '70s and '80s era systems is a hobby of
mine. I've got hand optimized versions in assembly for about 10
different CPUs. I've posted my CP/M 8080 and z80 versions here:

https://github.com/datajerk/pies. I also included 16bit forth and CP/M
C versions (you'll want to remove the ASM code that I used to query the
time from the Apple II clock while benchmarking on a CP/M card).

The Forth and C versions are not optimized like the assembly versions

(e.g. loop unrolling, Duff's device, tables, etc...).

Assembly Pi to 1000 digits for the 8080 and z80 @ 2 MHz takes 143 and
107 seconds respectively. The code is not the same since z80 has some

Egan Ford

unread,

Dec 5, 2014, 8:50:59 PM12/5/14

to

On 12/5/14, 6:34 PM, Egan Ford wrote:

Sorry for the double post. My news server hung, so I killed the post,
and then added C versions, then reposted with edits, but I guess my hung
news server still got the post.

Egan Ford

unread,

Dec 5, 2014, 10:57:17 PM12/5/14

to

On 12/5/14, 3:37 AM, Ed wrote:
> The test would be whether it is capable of generating several thousand digits
> without error. Programs looked at thus far suggest some 32-bit math will be
> required.

The Gregory series for computing arctan required for Machin is
independent of the number of bits your math routines can support,
however the number of bits does limit how far you can traverse the
series. Each iteration divides +2 (e.g. 3,5,7,etc...). Eventually
you'll exhaust your number of bits.

For Machin each arctan(5) iteration produces ~1.4 decimal digits
(log10(5^2)). 1000 digits would need ~716 iterations with the largest
divisor being ~2x716+1. Clearly 16-bit territory. Assuming 32/16
division code, you could compute about 23440 digits.

If your programs can do 64/32 division, then your at ~1.5e9 digits.
However a classic 8-bit system supporting 64KB of RAM limits the number
of decimal digits packed into a base 256 array to ~157800.

Ed

unread,

Dec 8, 2014, 12:31:45 AM12/8/14

to

Egan Ford wrote:
> ..

> Hi, writing Pi programs for '70s and '80s era systems is a hobby of
> mine. I've got hand optimized versions in assembly for about 10
> different CPUs. I've posted my CP/M 8080 and z80 versions here:
> https://github.com/datajerk/pies. I also included 16bit forth and CP/M
> C versions (you'll want to remove the ASM code that I used to query the
> time from the Apple II clock while benchmarking on a CP/M card).

Thanks. In fact I only discovered your work and an earlier version of the 8080
asm a few days ago!

PI.C from the above site looks promising. Some quick tests suggest it requires
less memory (allowing more digits) than the C algorithm used for mine. It also
runs about twice as fast. I did a 20,000 digit run and the output was correct up to
the last digit or so.

Your 8080 asm is very speedy. Well done! (Perhaps someone will convert the
source to DRI MAC :)

'piatan.16bit.forth' however has problems. It goes wrong after about 1750 digits
using a 16-bit Forth.

I'll take a look at converting your PI.C.

For anyone interested in a 'PI data reformatter', here's one for MS-DOS and CP/M.
http://www.netbay.com.au/~dxforth/pifmt10.zip
I find it indispensible when comparing output of PI generators against a reference
data set.

Egan Ford

unread,

Dec 8, 2014, 11:38:10 AM12/8/14

to

On 12/7/14, 10:31 PM, Ed wrote:
> PI.C from the above site looks promising. Some quick tests suggest it requires
> less memory (allowing more digits) than the C algorithm used for mine. It also
> runs about twice as fast. I did a 20,000 digit run and the output was correct up to
> the last digit or so.

I didn't look at your code. Are you computing with base 10 or base 256?
Base 256 will consume less than 1/2 the RAM and run about 2x faster
(including conversion to base 10 for output).

> Your 8080 asm is very speedy. Well done! (Perhaps someone will convert the
> source to DRI MAC :)

Follow the instructions at the top and change dec_len/bin_len if you
want to test 20000+ digits. It scales at O(n^2). Estimating time will
be easy.

> 'piatan.16bit.forth' however has problems. It goes wrong after about 1750 digits
> using a 16-bit Forth.

I only tested up to 1000 digits. Since I've switched my focus to
assembly only for now, I'm probably not going to work on Forth versions.

Ed

unread,

Dec 10, 2014, 5:21:06 AM12/10/14

to

Egan Ford wrote:
> ...

> I didn't look at your code. Are you computing with base 10 or base 256?
> Base 256 will consume less than 1/2 the RAM and run about 2x faster
> (including conversion to base 10 for output).

I use Base 100 (the original 32-bit C implementation used Base 10000).

I've since simplified my Forth implementation which boosted the speed by 60% !
I'm truly puzzled why the CP/M C binaries run so slow. Here are my timings
for a 1000 digit run using MYZ80 and Pentium 200.

DX-Forth 4.09 61s
Hi-Tech C 3.09 200s
Aztec-C 1.06d 420s

I'm not a C programmer so it's possible my conversion to 16-bit isn't optimum.
The updated source/binaries are here if anyone wants to look into it.
http://www.netbay.com.au/~dxforth/pi2_cpm.zip

> > 'piatan.16bit.forth' however has problems. It goes wrong after about 1750 digits
> > using a 16-bit Forth.
>
> I only tested up to 1000 digits. Since I've switched my focus to
> assembly only for now, I'm probably not going to work on Forth versions.

I'll have a go and report back when I have some results.

Egan Ford

unread,

Dec 10, 2014, 1:48:08 PM12/10/14

to

On 12/10/14, 3:21 AM, Ed wrote:
> I've since simplified my Forth implementation which boosted the speed by 60% !
> I'm truly puzzled why the CP/M C binaries run so slow.

My experience with Forth vs. C on the 6502 is that Forth is faster,
however modern C cross compilers have been producing code almost as fast
as Forth.

IMHO, Forth is closer to the metal. As you write your Forth program you
can mentally see how it would convert to assembly. I speculate that C
and other compiled higher level languages have additional overhead
optimized for code quality and programmer productivity.

I have a love/hate relationship with Forth. I love how its like RPN and
the use of the stack. However I hate reading and debugging my own Forth
code--I'm a poor Forth programmer. Hate won out and I switched to assembly.

My Aztec C version using base 65536 on a 2 MHz z80 takes 5142 sec for
1000 digits vs. 107 sec using assembly. The C version took me a few
minutes to find online and convert and run. The assembly version took
me a very long day to learn z80 assembly and to develop, test, and
optimize the code.

There are a few great articles in Byte magazine that test various
compilers and languages with a Sieve benchmark. It's worth reading.
Specifically BYTE 1981/09 and 1983/01. I've got the articles handy if
you have trouble finding them.

Jon Saxton

unread,

Dec 10, 2014, 9:35:16 PM12/10/14

to

Ed:

Coincidentally just yesterday I got Egan's assembly language Pi
calculator working. It was written for some assembler which I don't
have so it required some manual tweaking. (I also converted it to Zilog
mnemonics but I did not introduce any Z80-specific instructions so it
should still run on an 8080.) Under yaze-ag running on a 2.3 GHz host I
get timings of roughly 1 to 1.5 seconds for 1000 digits (including
display time).

If I remember to do so, I'll try it on the old Tesseract computer next
week when I get home. That has a 4 MHz Z80 and the timings on that will
probably be more relevant.

--
Jon Saxton
Former sysop of Tesseract RCPM+ which operated in the 1980s from Australia.
http://triton.vg/TesseractRCPM+Catalog.html

Egan Ford

unread,

Dec 11, 2014, 10:48:19 AM12/11/14

to

On 12/10/14, 7:35 PM, Jon Saxton wrote:
> I also converted it to Zilog mnemonics but I did not introduce any
> Z80-specific instructions so it should still run on an 8080

Did you convert the 8080 or z80 version? The z80 version does have z80
specific instructions, e.g. block move.

My 4MHz z80 benchmarks:

8080 code on z80: ~65 sec
z80 code on z80: ~53 sec

If you send me your source converted versions I can add to my github
repo for others.

Ed

unread,

Dec 12, 2014, 7:17:30 PM12/12/14

to

Egan Ford wrote:
> ...

> I have a love/hate relationship with Forth. I love how its like RPN and
> the use of the stack. However I hate reading and debugging my own Forth
> code--I'm a poor Forth programmer. Hate won out and I switched to assembly.

As with anything new, it just takes practice. At some point Forth 'clicks' after
which it becomes effortless. IMO anyone who can do assembler and enjoy it,
will have no trouble with Forth. They're very similar in that they both deal with
basic operations on addresses.

> There are a few great articles in Byte magazine that test various
> compilers and languages with a Sieve benchmark. It's worth reading.
> Specifically BYTE 1981/09 and 1983/01. I've got the articles handy if
> you have trouble finding them.

I'll look them up.

Ed

unread,

Dec 12, 2014, 7:17:32 PM12/12/14

to

Ed wrote:
> ...

> I'll have a go and report back when I have some results.

I've converted my Forth code to Base 2^16 per the C version. This eliminated
some divisions and reduced array sizes. No wonder it runs faster! I didn't bother
with 'skip zero' optimization. Now for the results (which excludes conversion/
print time):

1000 digit run

DX-Forth 4.09 17s
Hi-Tech C 3.09 81s
Aztec C 1.06d 67s

This time Aztec performed faster than Hi-Tech compared with Base 100.
I'm beginning to suspect the speed difference between the C and Forth versions
may be due to the multiply/divide primitives each employ...

Instead of issuing new binaries, I'll just post the revised DX-Forth source (below).
Source for the C version may be found on the website Egan mentioned earlier.

\ PI.F
\
\ Revision 4, 2014-12-13 es
\
\ Compute Pi to an arbitrary precision. Uses Machin's
\ formula: pi/4 = 4 arctan(1/5) - arctan(1/239)
\
\ Compile with 16-bit DX-Forth: FORTH - INCLUDE PI.F BYE
\
\ Tested to 35,000 digits
\
\ Acknowledgements:
\
\ Roy Williams, Feb 1994
\ J. W. Stumpel, May 1991
\ E. Ford, Aug 2009
\
\ This code is PUBLIC DOMAIN. Use at your own risk.

empty forth definitions decimal application

0 value nblock
0 value tot
0 value t1
0 value t2
0 value t3

: copy ( from to -- )
nblock cells move ;

\ : zero? ( result -- f )
\ 0 begin
\ 2dup cells + @ if 2drop 0 exit then
\ 1+ nblock over =
\ until 2drop 1 ;

: zero? ( result -- f )
nblock cells 0 skip nip 0= ;

: set ( rhs result -- )
dup nblock cells erase ! ;

0 value tmp
0 value res
variable carry

: add ( result increm -- )
to tmp
0 carry !
0 nblock 1- do
i cells over + dup to res @ 0
i cells tmp + @ 0 d+ carry @ m+
( hi) carry ! ( lo) res !
-1 +loop drop
;

: sub ( result decrem -- )
to tmp
0 carry !
0 nblock 1- do
i cells over + dup to res @ 0
i cells tmp + @ 0 d- carry @ m+
( hi) carry ! ( lo) res !
-1 +loop drop
;

: mult ( result factor -- )
to tmp
0 carry !
0 nblock 1- do
i cells over +
dup >r @ tmp um* carry @ m+
carry ! r> !
-1 +loop drop
;

: div ( result denom -- )
to tmp
0 carry !
nblock 0 do
i cells over +
dup >r @ carry @ tmp um/mod
r> ! carry !
loop drop
;

0 value onestep
0 value denom
0 value denom2
0 value w1
0 value w2
0 value result

: arctan ( result w1 w2 denom onestep -- )
to onestep to denom to w2 to w1 to result

denom dup * to denom2

1 result set

result denom div
result w1 copy

1 ( k)

begin
onestep if
w1 denom2 div
else
w1 denom div
w1 denom div
then
w1 w2 copy
w2 over ( k) 2* 1+ div
( k) dup 1 and if ( odd)
result w2 sub
else
result w2 add
then
( k) 1+
w2 zero?
until drop
;

0 value ndigit

: print ( result -- )
dos-io ( allow output redirect )
dup @ 0 .r [char] . emit cr
ndigit 0 do
0 over !
dup 10000 mult
dup @ 0 <# # # # # #> type
out @ 63 > if cr then
4
+loop drop cr
;

: allocate ( -- )
nblock cells allot ;

: pi ( u -- )
dup to ndigit
\ calc array size: (ndigit / log10(2^16)) + 1
109 um* 525 um/mod nip 1+
\ plus extra for accurate last digits
2 + to nblock

\ allocate work arrays
here to tot allocate
here to t1 allocate
here to t2 allocate
here to t3 allocate

\ compute PI
tot t1 t2 5 1 arctan
tot 4 mult
t3 t1 t2 239 0 arctan
tot t3 sub
tot 4 mult

\ show result
tot print
;

: main ( -- )
\ get #digits from command-line
cmdtail ( adr n ) dup if
0. 2swap >number 2drop drop
else
2drop 100
." Usage: PI #digits" cr
." (Assuming " dup . ." digits)" cr
then
( n ) pi
;

cr .( Save to disk? ) y/n [if]
turnkey main pi
[then]

\ end

Egan Ford

unread,

Dec 13, 2014, 9:07:19 AM12/13/14

to

On 12/12/14, 5:17 PM, Ed wrote:
> I'm beginning to suspect the speed difference between the C and Forth versions
> may be due to the multiply/divide primitives each employ...

Disassemble both and take a look. ~80% of the clock cycles are just divide.

Ed

unread,

Dec 17, 2014, 9:34:29 AM12/17/14

to

That's more than I expected but it would explain the results - namely that
Forth's mixed operators (32x16) are better suited to the PI program than
C's long operators (32x32).

retrogear

unread,

Dec 18, 2014, 12:36:08 AM12/18/14

to

Well, here's some kind of simpler pi calculations from Don Maslin's Imsai Imdos collection. I'm not a math whiz to know what it means:

56K IMDOS VERS 2.02
A>type pi.for

DOUBLE PRECISION PILOW,SLNGTH,PIUP,FACT
DOUBLE PRECISION K,SIDES,SUM,SSQ,TERM,TEMP
WRITE(5,198)
198 FORMAT(/1X,'BOUNDS ON PI - DOUBLE PRECISION BINOMIAL THEOREM VER
1SION'//' N SIDES SIDE LENGTH PI - LOWER BOUND PI-
2 UPPER BOUND')
SIDES=4.0
SUM=2.0
N=3
1 SIDES=2.0*SIDES
SSQ=SUM
SUM=0.0
TERM=.25*SSQ
K=1
2 TEMP=TERM+SUM
IF(TEMP.LE.SUM) GOTO 4
SUM=TEMP
FACT=(2.0*K-1.00)/(K+1.0)
TERM=FACT*SSQ*TERM/8.0
K=K+1.0
GOTO 2
4 SLNGTH=DSQRT(SUM)
PILOW=0.5*SIDES*SLNGTH
PIUP=SIDES*SLNGTH/(2.0-SLNGTH)
WRITE(5,200) N,SIDES,SLNGTH,PILOW,PIUP
IF(N.EQ.20) STOP
N=N+1
GOTO 1
200 FORMAT(1X,I3,F9.0,F15.6,2F19.12)
END

A>f80 =pi/m

$MAIN

A>l80 /p:100/d:2100,pi,forlib/s,pi.com/n/e

[0100 231E 35]

A>pi

BOUNDS ON PI - DOUBLE PRECISION BINOMIAL THEOREM VERSION

N SIDES SIDE LENGTH PI - LOWER BOUND PI- UPPER BOUND
3 8. .765367 3.061467458921 4.959315235374
4 16. .390181 3.121445152258 3.878006734963
5 32. .196034 3.136548490546 3.477392565633
6 64. .098135 3.140331156955 3.302370812490
7 128. .049082 3.141277250933 3.220307554543
8 256. .024543 3.141513801144 3.180543968220
9 512. .012272 3.141572940367 3.160968277095
10 1024. .006136 3.141587725277 3.151255641334
11 2048. .003068 3.141591421511 3.146417964326
12 4096. .001534 3.141592345570 3.144003766021
13 8192. .000767 3.141592576585 3.142797824426
14 16384. .000383 3.141592634339 3.142195142707
15 32768. .000192 3.141592648777 3.141893874079
16 65536. .000096 3.141592652387 3.141743257818
17 131072. .000048 3.141592653289 3.141667954200
18 262144. .000024 3.141592653515 3.141630303519
19 524288. .000012 3.141592653571 3.141611478460
20 1048576. .000006 3.141592653585 3.141602066002 STOP

A>

Larry G

Egan Ford

unread,

Dec 18, 2014, 1:33:05 PM12/18/14

to

On 12/17/14, 10:36 PM, retrogear wrote:
> I'm not a math whiz to know what it means

It's the same method Archimedes used in c. 250 BC with 96 sides to
accurately compute Pi to 2 decimal places.

Archimedes' Method of Exhaustion:

http://personal.bgsu.edu/~carother/pi/Pi3a.html

retrogear

unread,

Dec 18, 2014, 5:01:54 PM12/18/14

to

Looks like with this algorithm it took 2,048 sides for the bounds to match
to 2 decimal places. I'll try jacking up the value of N in IF(N.EQ.20) and see how accurate it can get before it implodes on itself :)

Larry

Ed

unread,

Dec 21, 2014, 6:17:24 PM12/21/14

to

Curious, I translated Bishop's 'Apple Pi' program verbatim to 16-bit Forth as I didn't
have a machine fast enough to run the original. On testing it failed at 2365 digits -
the same place where a similar generator, PI.XPL, also failed when compiled with
16-bits. According to the latter's author, the remainder of division 'can accumulate
a value up to 3277, which when multiplied by 10 exceeds the range of 16-bit signed
arithmetic.'

While Bishop's program allowed for 4096 digits of Pi, he generates only 1000
digits avoiding the limitation inherent in his implementation. He certainly knew
limitations existed ...

"If the number of requested digits for Pi is less than about 200, it is possible to

store two digits per byte; otherwise, each byte must contain no more than one digit.
(The reason for this distinction occurs in line 1070 [the division routine] where

an arithmetic overflow can occur when trying to evaluate higher order terms of the

series if too many digits are packed into each byte.)"

Presumably he originally tried base 100 arithmetic (for speed) but found it failed
after 250 digits and switches to base 10.

Charles Richmond

unread,

Dec 23, 2014, 11:29:36 AM12/23/14

to

"Ed" <inv...@nospam.com> wrote in message

news:m77ke1$2rr$1...@speranza.aioe.org...

I do *not* understand the problem with storing 2 digits per byte. The
program I wrote in C was written several years ago. I'm *not* sure what I
did to get it "wrong", but my program will calculate pi to 100,000 digits.
I know because I did that calculation and compared it to a known good value
for Pi. If you run the program, you should ask for 100,020 digits because
the last 2 or 3 digits are usually bad.

Perhaps you can take a look at my C code and tell me why it works. I
thought it was a straight translation of Bob Bishop's Apple Pi program. You
can find my program at:

http://www.aquaporin4.com/applepi

Charles Richmond

unread,

Dec 23, 2014, 11:29:36 AM12/23/14

to

"Ed" <inv...@nospam.com> wrote in message

news:m77ke1$2rr$1...@speranza.aioe.org...

I do *not* understand the problem with storing 2 digits per byte. The
program I wrote in C was written several years ago. I'm *not* sure what I
did to get it "wrong", but my program will calculate pi to 100,000 digits.
I know because I did that calculation and compared it to a known good value
for Pi. If you run the program, you should ask for 100,020 digits because
the last 2 or 3 digits are usually bad.

Perhaps you can take a look at my C code and tell me why it works. I
thought it was a straight translation of Bob Bishop's Apple Pi program. You
can find my program at:

http://www.aquaporin4.com/applepi

Ed

unread,

Dec 24, 2014, 3:54:06 AM12/24/14

to

Charles Richmond wrote:
> ...

> Perhaps you can take a look at my C code and tell me why it works.

It doesn't. Below is the output when compiled with Borland C++ 3.00
The only change made was MEMSIZE 8000 to ensure code and data
fit into 64K.

Output of 'applepi.c' using a 16-bit compiler:

3.
14159265358979323846264338327950288419716939937510
58209749445923078164062862089986280348253421170679
82148086513282306647093844609550582231725359408128
48111745028410270193852110555964462294895493038196
44288109756659334461284756482337867831652712019090
75670577504020389446149551452421011368786235269502
...

PI reference data:

3.
14159265358979323846264338327950288419716939937510
58209749445923078164062862089986280348253421170679
82148086513282306647093844609550582231725359408128
48111745028410270193852110555964462294895493038196
44288109756659334461284756482337867831652712019091
45648566923460348610454326648213393607260249141273
...

Charles Richmond

unread,

Dec 24, 2014, 4:49:02 PM12/24/14

to

"Ed" <inv...@nospam.com> wrote in message

news:m7duva$6ig$1...@speranza.aioe.org...

You can *not* with any honesty *change* what I have in the code... and then
declare that my code does *not* work!!! Use a better machine with more
memory. Of course you have to have large enough arrays to support
calculating more digits!!!

If you directly compile my code on Linux and run it as is... it *will*
calculate Pi correctly to many digits!

Ed

unread,

Dec 26, 2014, 1:52:00 AM12/26/14

to

Charles Richmond wrote:
> "Ed" <inv...@nospam.com> wrote in message
> news:m7duva$6ig$1...@speranza.aioe.org...
> > Charles Richmond wrote:
> >> ...
> >> Perhaps you can take a look at my C code and tell me why it works.
> >
> > It doesn't. Below is the output when compiled with Borland C++ 3.00
> > The only change made was MEMSIZE 8000 to ensure code and data
> > fit into 64K.

> > ...
>
> You can *not* with any honesty *change* what I have in the code... and then
> declare that my code does *not* work!!!

I can if the change does not materially affect the outcome. The problem
is the 16-bit math - not the amount of memory.

> ...

> If you directly compile my code on Linux and run it as is... it *will*
> calculate Pi correctly to many digits!

Which doesn't help anyone who needs a Pi generator algorithm that
will work in a 16-bit environment such as CP/M.

In such an environment Bishop's program/algorithm is limited to 2315
digits (250 digits for your C version).

Charles Richmond

unread,

Dec 26, 2014, 4:26:39 AM12/26/14

to

"Ed" <inv...@nospam.com> wrote in message

news:m7j0ib$7qv$1...@speranza.aioe.org...

> Charles Richmond wrote:
>> "Ed" <inv...@nospam.com> wrote in message
>> news:m7duva$6ig$1...@speranza.aioe.org...
>> > Charles Richmond wrote:
>> >> ...
>> >> Perhaps you can take a look at my C code and tell me why it works.
>> >
>> > It doesn't. Below is the output when compiled with Borland C++ 3.00
>> > The only change made was MEMSIZE 8000 to ensure code and data
>> > fit into 64K.
>> > ...
>>
>> You can *not* with any honesty *change* what I have in the code... and
>> then
>> declare that my code does *not* work!!!
>
> I can if the change does not materially affect the outcome. The problem
> is the 16-bit math - not the amount of memory.
>

Ed, I have *used* the code I posted to calculate 100,000 decimal digits of
Pi. Your problem *is* the reduced memory. Okay, if you want a program to
calculate Pi on a machine with only 64k of RAM... perhaps you need a
different algorithm. But my code *does* work with the increased memory!
The problem is *not* the algorithm.

Ed

unread,

Dec 29, 2014, 6:15:43 PM12/29/14

to

Charles Richmond wrote:
> ...

> Ed, I have *used* the code I posted to calculate 100,000 decimal digits of
> Pi. Your problem *is* the reduced memory. Okay, if you want a program to
> calculate Pi on a machine with only 64k of RAM... perhaps you need a
> different algorithm. But my code *does* work with the increased memory!
> The problem is *not* the algorithm.

Sorry but in my estimation an algorithm which requires a 32-bit compiler
in order to be able to generate more than 250 digits of PI, is not worth
promoting.

Charles Richmond

unread,

Dec 30, 2014, 12:49:37 PM12/30/14

to

"Ed" <inv...@nospam.com> wrote in message

news:m7snar$dra$1...@speranza.aioe.org...

In my opinion, you are misled. I reduced the array size to 8000 bytes and
still could get 15,000 digits of Pi. I do *not* agree that this requires a
32-bit compiler. YMMV.

Ed

unread,

Jan 4, 2015, 9:31:54 AM1/4/15

to

Charles Richmond wrote:
> "Ed" <inv...@nospam.com> wrote in message

> ...

> > Sorry but in my estimation an algorithm which requires a 32-bit compiler
> > in order to be able to generate more than 250 digits of PI, is not worth
> > promoting.
> >
>
> In my opinion, you are misled. I reduced the array size to 8000 bytes and
> still could get 15,000 digits of Pi. I do *not* agree that this requires a
> 32-bit compiler. YMMV.

By all means try a 16-bit compiler. Do a run (300 digits is enough) and
post a dump of the output.

Egan Ford

unread,

Jan 6, 2015, 10:44:49 PM1/6/15

to

On 1/4/15 7:01 AM, Ed wrote:
> By all means try a 16-bit compiler.

I did. Or at least I think I did. Unsure what you mean by "16-bit
compiler". The 8080/z80 are 8-bit processors with 16-bits for
addressing. IIRC, every C compiler I've used on retro 8-bit systems has
16-bit ints and 32-bit longs.

I used Aztec C Vers. 1.06B 8080 (C) 1982 1983 1984 by Manx Software
Systems on a CP/M 2.2 system.

I had to rewrite memcpy and memset and remove the timer code and include
files to get it to build. It does fail as you have observed around 350
digits. However changing TEN=10 and MEMSIZE=2000 and then running
APPLEPI 2002 will produce 1000 decimal digits correctly. I had to
change the output code a bit to display properly.

The algorithm is sound.

Putting the code back to TEN=100 MEMSIZE=1000 and changing the following
two lines

from:

int digit = 0;
int quotient, residue;

to:

unsigned long int digit = 0;
unsigned long int quotient, residue;

Solves the problem of the original code failing around 350, at least for
1000 digits.

What about 2000 digits? Changed to MEMSIZE=2000. Works too.

I'll let you test 10,000 digits.

Ed

unread,

Jan 9, 2015, 11:51:52 PM1/9/15

to

Egan Ford wrote:
> ...

> I used Aztec C Vers. 1.06B 8080 (C) 1982 1983 1984 by Manx Software
> Systems on a CP/M 2.2 system.
>
> I had to rewrite memcpy and memset and remove the timer code and include
> files to get it to build. It does fail as you have observed around 350
> digits.
>
> However changing TEN=10 and MEMSIZE=2000 and then running
> APPLEPI 2002 will produce 1000 decimal digits correctly.

After 2315 digits it will begin to fail again. Only so much can be done
with 16-bit arithmetic.

> Putting the code back to TEN=100 MEMSIZE=1000 and changing the following
> two lines
>
> from:
>
> int digit = 0;
> int quotient, residue;
>
> to:
>
> unsigned long int digit = 0;
> unsigned long int quotient, residue;

Yes, though switching to 32-bit arithmetic necessarily slows things down.
At this point I'd ditch the idea of using decimal and switch to base 2^16 per
your own PI code.

To be fair to Bob Bishop, he was limited by the Apple 16-bit Basic. There is
one feature I've taken from his code - his use of 3 arrays rather than the 4 one
sees elsewhere. This allowed me to generate in excess of 45,000 digits. It
craps out at 45,808 digits - being the limit of the 32/16 bit arithmetic used.