Am9511A APU Module for RC2014


Phillip Stevens

Aug 11, 2020, 7:12:31 AM
to RC2014-Z80
For a while I've been promising myself to make an Am9511A APU Module for the RC2014.

To recap, the Am9511 was the first Arithmetic Processing Unit ever manufactured, from 1977, and the Am9511A came in 1979 (I think). It was licenced from AMD by Intel while they were getting their act together on floating point, and preceded their 8087 by a few years.

It is a 16 bit processor, with internal 16 bit data paths. It has an external 8 bit data bus. It was designed to work effectively with 8080 processors, but with a little bit of glue could work with any CPU from that era.

As I've noted elsewhere, the Am9511A device runs hot. Really hot. But, I don't think adding a heat sink makes any difference. There's certainly no clock throttling going on. And, the devices are designed and produced to military reliability standards (100% inspection), since that was the major market in the day. Running hot doesn't seem to bother them at all.

And so, it seems like now is the right time to have an Am9511A APU Module. The design needed to take care of a few issues.
  • The Am9511A runs at less than 3MHz, so to work directly off the RC2014 clock a 1/3 division circuit was required.
  • Also, since it needs a 12V supply, a tiny Pololu 12V boost converter was used. Faster and cheaper than building my own.
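As a rough check on the clock point in the list above, the choice of a divide-by-3 can be sketched in a few lines of Python. This assumes the standard 7.3728MHz RC2014 clock and treats "less than 3MHz" as a hard limit of 3,000,000 Hz; the function name is made up for this illustration and is not part of any library.

```python
RC2014_CLOCK_HZ = 7_372_800  # assumed: standard RC2014 system clock
APU_LIMIT_HZ = 3_000_000     # assumed: "less than 3MHz" taken as a hard limit


def smallest_divider(source_hz, limit_hz):
    """Return the smallest integer divider giving a clock below limit_hz."""
    divider = 1
    while source_hz / divider >= limit_hz:
        divider += 1
    return divider


divider = smallest_divider(RC2014_CLOCK_HZ, APU_LIMIT_HZ)
apu_clock_hz = RC2014_CLOCK_HZ / divider
print(f"divide by {divider}: APU clock = {apu_clock_hz / 1e6:.4f} MHz")
```

Under these assumptions a divide-by-2 would give 3.6864MHz, which is over the limit, so 3 is the smallest divider that works.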
I laid out the board a few weekends ago, and sent it off to OSH Park, whilst getting on with some software.

Am9511_Module.png





And today, I've finished the PR for the Am9511 maths library in z88dk, for the Am9511A APU Module.
Using the APU is now completely seamless from C (and assembly).

From your build command simply choose --am9511 instead of --math32 to link the Am9511A APU accelerated floating point library.

IMG_0841.jpg


So, how does it perform?
Well, glad you asked. In first tests using the Whetstone benchmark (hand timed, because emulators don't do the Am9511A), I'm seeing about a 3x speedup.
The benchmark takes about 91 seconds with a standard RC2014 (using math32), and about 28 seconds using the Am9511A.
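For anyone wanting to check the arithmetic, here is a minimal sketch of the speedup from those two hand-timed figures. The one second timing error either way is an assumption for illustration, and the function name is hypothetical.

```python
def speedup_range(slow_s, fast_s, error_s=1.0):
    """Return (lowest, nominal, highest) speedup for two hand-timed runs."""
    lowest = (slow_s - error_s) / (fast_s + error_s)
    nominal = slow_s / fast_s
    highest = (slow_s + error_s) / (fast_s - error_s)
    return lowest, nominal, highest


# 91 s with math32 versus 28 s with the Am9511A (figures from this post).
low, nominal, high = speedup_range(91, 28)
print(f"Whetstone speedup: {nominal:.2f}x (between {low:.2f}x and {high:.2f}x)")
```

Even with a second of timing error on each run, the result stays a little above 3x.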

I'll share the Module Eagle files shortly (once I've done some tidying).

And, I also committed as a stretch goal to integrate support for MS Basic 4.7.
What was I thinking?

Cheers, Phillip
RC2014_AM9511.pdf

karlab

Aug 11, 2020, 7:33:46 AM
to RC2014-Z80
Phillip
It looks nice!
Are you selling as a kit?
The chip is supported by Turbo Pascal, so that should be interesting to try out.
Maybe it is possible to water-cool the chip ;)
Cheers
Karl

Phillip Stevens

Aug 11, 2020, 7:42:52 AM
to RC2014-Z80
karlab wrote:
It looks nice!

Thank you ;-)
 
Are you selling as a kit?

Unsure. Postage from down-under is a killer for most kits. So may not be practical. But, I'm looking into it.
If I get the stretch goal completed, an alternative UK supply may become available.

In any case the Gerbers, and design files will be available very shortly.
 
The chip is supported by Turbo Pascal, so that should be interesting to try out.

Hmm. It would be interesting to see how the Am9511A is addressed in Turbo Pascal. Any hints?

I've written the z88dk library to minimise the hardware dependencies into a few routines, so that we could (relatively easily) support both the RC2014, and the DAI, and other retro machines with the same library.

Maybe it is possible to water-cool the chip ;)

Yes. A new RC2014 kit idea...

p.

Alan Cox

Aug 11, 2020, 7:46:29 AM
to rc201...@googlegroups.com
And so, it seems like now is the right time to have an Am9511A APU Module. The design needed to take care of a few issues.
  • The Am9511A runs at less than 3MHz, so to work directly off the RC2014 clock a 1/3 division circuit was required.
  • Also, since it needs a 12V supply, a tiny Pololu 12V boost converter was used. Faster and cheaper than building my own.

Nice - I'd prototyped my own but the 12v was a bit of an issue - keeping it on the card with the module is a really elegant solution and avoids having 12v floating around the boards too much.

Alan

Phillip Stevens

Aug 12, 2020, 12:21:23 AM
to RC2014-Z80
Phillip Stevens wrote:
For a while I've been promising myself to make an Am9511A APU Module for the RC2014.
And so, it seems like now is the right time to have an Am9511A APU Module. The design needed to take care of a few issues.
  • The Am9511A runs at less than 3MHz, so to work directly off the RC2014 clock a 1/3 division circuit was required.
I'll share the Module Eagle files shortly (once I've done some tidying).

OK. I've done some tidying up.
  • Added diodes on the /WAIT and /NMI lines to enable multi-APU configurations. (The /NMI diode should be omitted for normal use, disconnecting the NMI interrupt).
  • Use the spare NOR gate to better qualify /CS with /MREQ.
  • Pushed some things around to tidy up.
The Eagle (and Gerber) files are attached, so that other suppliers' PCB services can be tested.

RC2014_AM9511A.png 13e9653657fe804b935ede2369168139.png


Enjoy,
Phillip
RC2014_AM9511_v2.zip
RC2014_AM9511A.pdf

Mark T

Aug 12, 2020, 12:55:51 AM
to RC2014-Z80
Hi Phillip,

Why did you use the nor gate to qualify mreq, rather than to eliminate the diode logic?

Mark

Phillip Stevens

Aug 12, 2020, 1:16:19 AM
to RC2014-Z80
Mark T wrote:
Why did you use the nor gate to qualify mreq, rather than to eliminate the diode logic?

The /CS is pretty noisy and, as it can’t be qualified with /IORQ, I thought to use the NOR gate on /MREQ to quiet it. Perhaps reducing unnecessary selection will help it to run cooler? It doesn’t really matter though. Works without the tighter qualification.

Using the diodes on /WAIT and /NMI is optional. 99% of the time a wire link would be a better solution. The /WAIT diode is just to enable a multi-APU system option. I was just joking about an 8 APU Cray-like system as a concept, following Spencer’s 3D flex motherboard. The /NMI diode should be omitted anyway, unless someone wants to write an interrupt driver.

TLDR, just putting logic where it is more useful more often.

Cheers, Phillip


Randy Mongenel

Aug 12, 2020, 1:38:38 AM
to rc201...@googlegroups.com
Hi Phillip,

I checked on the OSH Park link, and it's describing the board as a 4-layer board, which costs $77. Was the board uploaded with incorrect information?

-Randy



--
You received this message because you are subscribed to the Google Groups "RC2014-Z80" group.
To unsubscribe from this group and stop receiving emails from it, send an email to rc2014-z80+...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/rc2014-z80/1f18fada-4659-4042-894e-5257214096a7o%40googlegroups.com.

Phillip Stevens

Aug 12, 2020, 2:34:09 AM
to RC2014-Z80
Randy wrote:
I checked on the OSH Park link, and it's describing the board as a 4-layer board, which costs $77. Was the board uploaded with incorrect information?

Yes. I did it as a four layer board. The upload is correct.
OSH Park prices make me wince every time. Especially now USPost is unreliable. Perhaps Seeed or Elecrow would be cheaper?

No reason why it couldn’t be redone as a two layer board, if you wanted to do that.

Cheers, Phillip
 

Phillip Stevens

Aug 12, 2020, 3:53:21 AM
to RC2014-Z80
Just to note that it is 3x for $77. Still expensive, if you’re not sharing.

P.

Randy Mongenel

Aug 12, 2020, 4:06:02 AM
to rc201...@googlegroups.com
Yeah, I am aware it is for 3 boards. The 2-layer version of a board that size is half the price. $10/sq.in. vs $5/sq.in. Was there a specific reason you went with 4-layer? The schematic is pretty simple and the board doesn't seem like it would have RF issues.

I think the board is fantastic, and it's nice to see your 9511 work come together this way.

-R


mik...@houseofmyrrh.org

Aug 12, 2020, 4:28:16 AM
to rc201...@googlegroups.com

Bulk is usually less expensive. So, if several (enough) of us pre-ordered and you had the boards made…

 

A bare board wouldn’t cost that much to mail, though it’d probably take a week to get here in the US.

 

Us state-siders can get the parts directly given a good parts list.

 

So, that is an idea other than several of us trying to replicate what you’ve done.

 

Also, I was wondering if anyone has done any work with the 8087?

 

Are there any patches for CP/M or Basic to make use of the APU?

I saw a video that showed that with the 8087 a system actually slowed slightly, except for math. 😊


Mike Sr.


Phillip Stevens

Aug 12, 2020, 4:41:01 AM
to RC2014-Z80
Randy Mongenel wrote:
Yeah, I am aware it is for 3 boards.

Ok, just checking. I don't think OSH Park mentions that on the first link.
 
Was there a specific reason you went with 4-layer? The schematic is pretty simple and the board doesn't seem like it would have RF issues.

Yes, you're right that the schematic is pretty simple, and I did consider redoing it as a 2-layer board.

When I'm building something my philosophy is to keep it simple (from my point of view) and to that end using 4-layers makes routing much easier.
It also saves me hours of my time, because I can't help trying to make the routing "beautiful". Fewer traces means less OCD.

I think the board is fantastic, and it's nice to see your 9511 work come together this way.

Thanks. I'm most happy now that using it with the RC2014 is seamless.

Getting a 3x speedup on the Whetstone result is a great outcome imho, and that is ignoring that the Am9511A can also do integer maths.

I'm planning on adding integer maths to the library, shortly, so that will be an interesting test for the rest of the benchmarks.

P.

Phillip Stevens

Aug 12, 2020, 4:57:34 AM
to RC2014-Z80
mike_sr wrote:

Bulk is usually less expensive. So, if several (enough) of us pre-ordered and you had the boards made…

A bare board wouldn’t cost that much to mail, though it’d probably take a week to get here in the US.


Yes, it could be dropped into a normal letter sized envelope. That's a good thought (not quite with it today).
 

 Us state-siders can get the parts directly given a good parts list.

So, that is an idea other than several of us trying to replicate what you’ve done.


There may be someone in the USA who's prepared to coordinate an order? That would also reduce postage and save time.
I'm very worried about the political situation with regard to the US Postal Service currently, as that seems to be affecting international deliveries.

 Also, I was wondering if anyone has done any work with the 8087?


I saw Marten did a MC68882 FPU. I'm quite interested in that too, as it would suit the Z180 clock frequencies much better.
 

 Are there any patches for CP/M


Yes, it works already. No need to patch. CP/M doesn't use floating point itself, so nothing to add there.
I was doing my testing for the RC2014 using CP/M, so from that point of view it is already "made for CP/M".

An example command line looks like this...

zcc +rc2014 -subtype=cpm -v --list -m -SO3 --am9511 --max-allocs-per-node200000 @ftest.lst -o ftest -create-app

The resulting ftest.bin is then uploaded using XMODEM, or the ftest.ihx uploaded using PIP and converted using LOAD or MLOAD.
 

or Basic to make use of the APU?


Stretch goal. Converting MS Basic to talk to the Am9511A.
Soon, I hope.

Cheers, Phillip

mik...@houseofmyrrh.org

Aug 12, 2020, 5:10:23 AM
to rc201...@googlegroups.com

*ALL*

How many, contingent upon final bare board costs, would be interested in being part of a bulk order?

 

We’d need someone to coordinate the payment process because I don’t have any way to do that.

 

Also, risk or no risk, I’d say pay before the boards are ordered to reduce Phillip’s risk. He can provide shipping information for a heavy First Class letter to N.Y.

Also, some “tip” to Phillip for his efforts and time involved.

 

As for the political situation, that could go “south” on either side of the “big pond” in a heartbeat, so anything we do has to be done accepting the risk without recrimination to Phillip if something goes wrong.

 

Right now 1 US Dollar is worth 1.4 Aussie Dollars (if I remember correctly where Phillip is from). My drop dead value is $25.00 US.

 

So, we give it a week and see who responds, and whether we are in WW3 or not.

 

Phillip, I know this was a labor of love, but my Bible tells me that a Laborer is worthy of their hire. You need to be “meditating” on what compensation (“tip”) you feel is adequate for your efforts.

 

0509 and typing without my glasses. 😊

 

Mike Sr.

 


Phillip Stevens

Aug 12, 2020, 5:21:36 AM
to RC2014-Z80
mike_sr wrote:

*ALL*

How many, contingent upon final bare board costs, would be interested in being part of a bulk order?

 
I'd say wait until I get the revised design built up, at least, before ordering.
Just to remove all risk that I've tidied in a new error...

My drop dead value is $25.00 US.


Should be easy to meet. Even the most expensive OSH Park option is only that much per board.
 

So, we give it a week and see who responds, and whether we are in WW3 or not.


Yes, be good to collect a list of interested parties.
There may be another option (kits with blue boards) which opens up if the interested parties list is long enough.
 

0509 and typing without my glasses. 😊


:-) know that feeling.

P.

Phillip Stevens

Aug 13, 2020, 10:00:45 AM
to RC2014-Z80
Phillip Stevens wrote:
For a while I've been promising myself to make an Am9511A APU Module for the RC2014.
And so, it seems like now is the right time to have an Am9511A APU Module.

IMG_0841.jpg

 
Been playing with my new shiny toy, and running a few benchmarks.

It is difficult to get precise timing, because the z88dk-ticks emulator doesn't (yet) emulate the Am9511.
But, given the size of the differences, hand timing is good enough; plus or minus a second on any result doesn't really matter.

There are three results, all using the sdcc compiler and the z88dk new C library for consistency, with a standard 7.3728MHz Z80 RC2014.
  1. using the math48 library, which is the z88dk default maths library and was for some time the only option. math48 is proven, accurate, and has been used for a long time. When using SDCC, the additional accuracy of the 40-bit mantissa is discarded in the conversion to IEEE-754 24-bit mantissa, being the native floating point format.
  2. using the math32 library, which is a new IEEE-754 based maths library. Generally this library is substantially faster than math48 because it calculates fewer mantissa bytes, but not in every case. It also uses the native z180 and z80n hardware multiply instructions when they are available for a big performance bump.
  3. using the am9511 library, the library in question which is closely related to math32, but the back end is using the APU rather than soft floating point functions.
TLDR: generally, using the APU results in about a 4x speedup over math48 and a 3x speedup over math32 (e.g. Whetstone & n-body), and at the least about a 2x speedup over math48 (e.g. fasta).
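To put the mantissa sizes from item 1 into perspective, here is a small sketch of how many decimal digits a 40-bit and a 24-bit mantissa can carry, and how the 24-bit significand sits inside an IEEE-754 single. It uses only Python's standard `math` and `struct` modules; the function names are hypothetical, and it says nothing about how math48, math32, or the APU store their values internally.

```python
import math
import struct


def decimal_digits(mantissa_bits):
    """Approximate decimal digits of precision for a binary mantissa."""
    return mantissa_bits * math.log10(2)


def float32_fields(value):
    """Split a normal IEEE-754 single into (sign, biased exponent, 24-bit significand)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", value))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    # Normal numbers carry an implicit leading 1, giving 24 significant bits.
    significand = (1 << 23) | fraction
    return sign, exponent, significand


print(f"40-bit mantissa: about {decimal_digits(40):.1f} decimal digits")
print(f"24-bit mantissa: about {decimal_digits(24):.1f} decimal digits")
print(float32_fields(1.5))
```

So the conversion to the 24-bit format gives up roughly five decimal digits of the precision that a 40-bit mantissa could hold.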

Some additional polishing in the classic library to do (to support the other machine types), but we're nearly there.

Cheers, Phillip

karlab

Aug 13, 2020, 10:10:05 AM
to RC2014-Z80
Hi Phillip
From the schematics I see that you use some logic to clock down the APU.
Is this required? Could the board be designed with its own clock signal?
Karl

Bill Shen

Aug 13, 2020, 10:16:15 AM
to RC2014-Z80
Very cool board. The plastic 74 series parts probably disqualify it as a MIL-STD-883 board, although you can always argue that ceramic 54 series parts can be used instead.
Bill

Phillip Stevens

Aug 13, 2020, 10:22:00 AM
to RC2014-Z80
karlab wrote:
From the schematics I see that you use some logic to clock down the APU.
Is this required?

Yes, the Am9511A-1 can only run at less than 3MHz. I've tried overclocking it, but it simply doesn't work above 3MHz (on any examples I tried).
There is an Am9511A-4 version that runs up to 4MHz, but they seem to be made of unobtainium. I've never been able to source one, so I gave up.

So, yes. It is necessary to clock the APU down to stay within spec.
 
could the board be designed with its own clock signal?

Absolutely, you could do that. It is asynchronous to the CPU, and it automatically hangs on the CPU /WAIT line until it is ready to move on.
You could run it just shy of 3MHz and get a bit more performance (500k cycles/second) if you wanted to.

I chose to do a 1/3 clock off the standard RC2014 clock, because it is an interesting circuit, and it does well enough at getting close to 3MHz.
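To put numbers on that trade-off, here is a minimal sketch comparing the 1/3 clock against the 3MHz ceiling. It assumes the standard 7.3728MHz RC2014 clock and a hard 3,000,000 Hz limit, and it assumes APU throughput scales linearly with its clock, which ignores any time the CPU spends feeding it. The function name is made up for this example.

```python
RC2014_CLOCK_HZ = 7_372_800  # assumed: standard RC2014 system clock
APU_LIMIT_HZ = 3_000_000     # assumed: upper limit for the Am9511A-1


def clock_headroom(source_hz, divider, limit_hz):
    """Return (divided clock, unused headroom in Hz, headroom as % of divided clock)."""
    divided_hz = source_hz / divider
    headroom_hz = limit_hz - divided_hz
    return divided_hz, headroom_hz, 100.0 * headroom_hz / divided_hz


divided_hz, headroom_hz, gain_pct = clock_headroom(RC2014_CLOCK_HZ, 3, APU_LIMIT_HZ)
print(f"APU clock {divided_hz / 1e6:.4f} MHz, headroom {headroom_hz / 1e3:.1f} kHz "
      f"(about {gain_pct:.0f}% faster at the limit)")
```

The headroom comes out a little over 500k cycles/second, in line with the figure mentioned above.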

Cheers, Phillip


Phillip Stevens

Aug 13, 2020, 10:37:30 AM
to RC2014-Z80
 Bill Shen wrote:
The plastic 74 series parts probably disqualify it as a MIL-STD-883 board, although you can always argue that ceramic 54 series parts can be used instead.

I found it interesting when researching the Am9511, that the MIL-STD-883 appeared all over the datasheet. I think it just means that the production is 100% tested, rather than being sample tested. Right?

But, now you mention it, it would be really cool to try to build the Module with ceramic logic, to match the Am9511A.

Do you know whether there is a supplier who would source that kind of stuff? Or just the usual suspects?
P.

karlab

Aug 13, 2020, 11:29:18 AM
to RC2014-Z80
So basically the chip needs to run at 3MHz or slower.
My question was more in this direction: you are using two chips, the 74 and the 107, to clock down the chip.
Would it not be easier and cheaper to put a crystal oscillator (<3MHz) on the board to run the APU?
Karl

Bill Shen

Aug 13, 2020, 11:58:31 AM
to RC2014-Z80
Mil 883 parts must be hermetically sealed, so epoxy encapsulated 74xxx are not acceptable; it involves 168 hours of burn-in and lots of paperwork. Chain of custody must be enforced, so eBay is obviously unacceptable. It's so expensive, even the military has waived the requirements. You can buy ceramic parts from the grey market for the mil883 look, but it is not mil883 without traceable chain of custody.
Bill

Spencer Owen

Aug 13, 2020, 12:12:30 PM
to RC201...@googlegroups.com
I think the biggest hindrance to getting this up to mil spec is the silkscreen saying "not for use in military, aerospace or nuclear research". No matter how good the chips are, how much burn in time, and paper trail, if something goes wrong and the PCB has that written on it, you have a lot of explaining to do! :)

Although I am now wondering if there's a market for an upgrade pack, which is a sticker saying "only" which can be placed over the "not" on the PCB! 

Spencer 


Greg Holdren

Aug 13, 2020, 12:43:47 PM
to RC2014-Z80
 
A mil spec assembled board would never use sockets for dip. 5400 parts soldered in. Yes.

Greg

Greg Holdren

Aug 13, 2020, 1:15:37 PM
to RC2014-Z80


All parts are 100% tested. Commercial and Mil. Maybe I don't understand what you mean by "...production is 100%"

Greg

Phillip Stevens

Aug 14, 2020, 8:36:17 AM
to RC2014-Z80
Greg Holdren wrote:
All parts are 100% tested. Commercial and Mil. Maybe I don't understand what you mean by "...production is 100%"

Phillip Stevens wrote:
I found it interesting when researching the Am9511, that the MIL-STD-883 appeared all over the datasheet. I think it just means that the production is 100% tested, rather than being sample tested. Right?

Reading the datasheet, it says "100% MIL-STD-883 reliability assurance testing". I took that to mean that on the contrary, other products might not be 100% tested, and rather be only sample tested. Perhaps a bad assumption?

Reminds me (somewhat off topic) of 6-sigma training courses I did back in the early 90's, where the story was told of the USA contractor specifying "shall contain no more than 6 parts per 1000 faulty components", and the Japanese supplier delivered 994 working components with a plastic bag stapled on top of the package containing the 6 faulty components (as per contract).

Wildly off topic, but are today's 74xxx commercial grade components 100% tested?
It would seem that the manufacturing technology is so simple that 100% testing would be an unnecessary cost?
Am9511 Arithmetic Processor.pdf

mik...@houseofmyrrh.org

Aug 14, 2020, 8:48:13 AM
to rc201...@googlegroups.com

Testing “on die” might be an inconsequential cost towards 100%. Microscopy being  possible.

 

But, once the die is sliced, mounted and encapsulated we have the additional issue of grabbing each and every individual part, orienting it properly, inserting into a ZIF socket, running a test routine and then removing from the socket. Some of this happens anyway, but I am not sure how much slow down would occur even if they went wildly parallel in testing.

 

As for Mil-883, at this point with the obsolescence issues I’m not sure it really matters. Unless someone is planning on using the design in a military, aerospace or nuclear product? 😊

 

From my Cal Lab days I do know that repairing them can be a real challenge as some of the conformal coating can be hard to probe through and then getting it off to unsolder the part is fun, too.

And, after it’s tested good you have to restore the conformal coating. Yech.

 

Mike sr.

 


Greg Holdren

Aug 14, 2020, 6:01:52 PM
to RC2014-Z80

It appears that all Am9511 parts are "100% MIL-STD-883 reliability assurance testing" as the datasheet specifies. AMD probably decided to use the same flow for commercial too since the volume was low. I've seen other parts tested as MIL sold commercially too. No, MIL-STD-883 is the standard parts need to conform to for sale to military contractors. Has nothing to do with 100% testing in general or sampling. Here is a brief overview of the standard which covers, reliability, electrical/mechanical tests, documentation, test flow and more. https://en.wikipedia.org/wiki/MIL-STD-883

My exposure to MIL-STD-883 was through a peripheral view as a line maintenance tech at Intel and Signetics in the 80s, fixing and calibrating wafer probers, final test handlers and testers. At Intel I worked in the programmable memories division, which tested EPROMs for commercial and military use. At Signetics I worked in the military test division, which tested all of their mil spec ICs (linear op amps, comparators, 5400 series logic (probably 80% of the volume), up to 8031/68K and 8x305 CPUs on the VLSI testers). Every single part from the lot was tested. Bin 8 parts (failed category) got rejected and analyzed. Parts dropped on the floor were rejected, etc. Docs with the lot were written in black ink only, with certain characters written a certain way to remove ambiguity: '5', 'S', '0', 'O' etc. Of course the mil parts got tested at a wider temperature and voltage range than commercial devices.

Yes, all parts are tested based on my experience and I don't think that has changed even for peanut parts. How can the semi company determine if there are no die mounting/damage during the process, bonding, or lead frame issues? This is how the semi manufacturers track trends and issues with the process, from wafer fab to final test, in handling of the part. Most of the failures are caught in wafer sort, where the die can be discarded from the lot so no more cost goes into handling bad parts. Most failures at final test are AC/DC failures rather than the part not functioning. Speed grade sorting is done in the final test stage too. I would not want to buy a 74HCT138, put it in my board, and scratch my head because the part was a catastrophic failure before leaving the factory door.

As far as the contract issue example with the Japanese company. Wow, last time I'd work with those shysters. That company took the "no more than 6 bad parts out of 1000" to the next level. No, it is: you send 100% good passing parts to us and yes, there will be a certain number of devices that will not meet spec or be just bad. This must not exceed 6/1000. It's not: send 994 100% passing parts and add 6 bad parts. This is silly anyway since the 994 part sample will eventually have bad parts, which then would violate the contract.

Greg

Phillip Stevens

Aug 15, 2020, 8:26:19 AM
to RC2014-Z80
Phillip Stevens wrote:
Been playing with my new shiny toy, and running a few benchmarks.

It is difficult to get precise timing, because the z88dk-ticks emulator doesn't (yet) emulate the Am9511.
But, given the size of the differences, hand timing is good enough; plus or minus a second on any result doesn't really matter.

TLDR: generally, using the APU results in about a 4x speedup over math48 and a 3x speedup over math32 (e.g. Whetstone & n-body), and at the least about a 2x speedup over math48 (e.g. fasta).


Added support for APU long and integer types today.
So that, when using the sdcc compiler, long and integer multiplies, divides and modulus calculations are done by the APU.

Ran spectral-norm benchmark (yawn)... with the integer & long support added.
Only the integer multiply is additionally activated by the new code (along with the floating point code of course).

Using sdcc/newlib - RC2014 Z80 @ 7.3728MHz

math32
28 minutes 57 seconds
1.2742190

am9511
6 min 35 seconds
1.2742140

Admittedly math32 doesn't do this particular benchmark very well; nevertheless, pretty happy with the outcome.

P.

Phillip Stevens

Aug 16, 2020, 10:04:44 AM
to RC2014-Z80
TLDR: generally, using the APU results in about a 4x speedup over math48 and a 3x speedup over math32 (e.g. Whetstone & n-body), and at the least about a 2x speedup over math48 (e.g. fasta).

Added support for APU long and integer types today.
So that, when using the sdcc compiler, long and integer multiplies, divides and modulus calculations are done by the APU.

Ran spectral-norm benchmark (yawn)... with the integer & long support added.
Only the integer multiply is additionally activated by the new code (along with the floating point code of course).

Using sdcc/newlib - RC2014 Z80 @ 7.3728MHz

math32
28 minutes 57 seconds

am9511
6 min 35 seconds

Optimisation is iteration, and sometimes you can just go too far.
Integer support using the APU is actually noticeably slower than using the normal z88dk library fast integer maths routines.
So, I've backed that out, and @suborb has added some things to sccz80 compiler to allow support for long using the APU.

APU goodness now looks like this...

spectral-norm
RC2014 Z80 @ 7.3728MHz

sdcc - am9511
6 min 10 seconds

sccz80 - am9511
5 min 28 seconds
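For comparison, the spectral-norm timings quoted in this thread can be turned into speedups over math32 with a short sketch. The run labels are my own shorthand for the configurations described above, and the times are the hand-timed figures as posted.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds timing to whole seconds."""
    return minutes * 60 + seconds


# Hand-timed spectral-norm results quoted in this thread.
runs = {
    "sdcc, math32": to_seconds(28, 57),
    "sdcc, am9511 (APU integer multiply)": to_seconds(6, 35),
    "sdcc, am9511": to_seconds(6, 10),
    "sccz80, am9511": to_seconds(5, 28),
}

baseline_s = runs["sdcc, math32"]
speedups = {name: baseline_s / secs for name, secs in runs.items()}
for name, secs in runs.items():
    print(f"{name:<38} {secs:>5d} s  {speedups[name]:.2f}x")
```

On these figures, backing out the APU integer multiply moves the sdcc result from roughly 4.4x to 4.7x, and the sccz80 build reaches about 5.3x over math32.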

dean.ne...@gmail.com

Aug 20, 2020, 11:17:48 PM
to RC2014-Z80
Nice one - this is interesting and cool - I will add it to my ever-growing list of things I want to build/do for the RC2014.

Phillip Stevens

Aug 25, 2020, 5:43:02 AM
to RC2014-Z80
Phillip Stevens wrote:
For a while I've been promising myself to make an Am9511A APU Module for the RC2014.

To recap, the Am9511 was the first Arithmetic Processing Unit ever manufactured, from 1977, and the Am9511A came in 1979 (I think). It was licenced from AMD by Intel while they were getting their act together on floating point, and preceded their 8087 by a few years.
It is a 16 bit processor, with internal 16 bit data paths. It has an external 8 bit data bus. It was designed to work effectively with 8080 processors, but with a little bit of glue could work with any CPU from that era.

Well the second board version is back from OSH Park, and I've built them up. Working as expected.


IMG_0864.jpg IMG_0866.jpg



I'll share the Module Eagle files shortly (once I've done some tidying).

The APU Module Eagle files and a set of Gerbers were attached, but are too big for Groups. Please email me at my GitHub (@feilipu) address for a copy.
If you only want one board, perhaps hold fire because there may be a way to get a partial kit from a UK source shortly.
 
And, I also committed as a stretch goal to integrate support for MS Basic 4.7.
What was I thinking?

Well the stretch goal to have the APU working in MS Basic is now 90% done. All the standard maths calculations are now complete and working (+), (-), (*), (/), and (^ power).
As part of this process I've been learning how to drive MS Basic, and have also removed some of the cruft related to non RC2014 stuff, and added an integral HLOAD function to support Intel HEX uploading.

Uploading programs using the new HLOAD function manages all of the messy things previously necessary (like: adjusting the top of Basic memory, writing the correct program origin into the correct USR(x) jump location).
So using assembly/C, generated with z88dk or other source, from MS Basic is now just two steps...

HLOAD
ok

? USR(0)
Hello World
0

I'll write more on this in another post, but the link for NON-APU 32k RC2014 is here.

Now, back to the final 10% of fixing transcendental functions...

Cheers, Phillip

Phillip Stevens

Aug 28, 2020, 8:23:02 PM
to RC2014-Z80
Phillip Stevens wrote:
The APU Module Eagle files and a set of Gerbers were attached, but are too big for Groups. Please email me at my GitHub (@feilipu) address for a copy.
If you only want one board, perhaps hold fire because there may be a way to get a partial kit from a UK source shortly.
 
And, I also committed as a stretch goal to integrate support for MS Basic 4.7.
What was I thinking?

Well the stretch goal to have the APU working in MS Basic is now 90% done. All the standard maths calculations are now complete and working (+), (-), (*), (/), and (^ power).
Now, back to the final 10% of fixing transcendental functions...

And, they're fixed. A small oversight, that had me searching for a week. And somebody else had to see it.
Wood and trees.

And now it is done. 

With its companion for those without an APU Module, but with integrated HEX Upload, and many Z80 optimisations.

P.

Phillip Stevens

Sep 12, 2020, 8:12:57 AM9/12/20
to RC2014-Z80

Phillip Stevens wrote:
For a while I've been promising myself to make an Am9511A APU Module for the RC2014.

To recap, the AMD Am9511 was the first Arithmetic Processing Unit ever commercially produced, from 1977, and the Am9511A came in 1979 (I think). It was licensed by Intel as the 8231A while they were getting their act together on floating point, and preceded their 8087 by a few years. It is a 16 bit processor, with internal 16 bit data paths, with an external 8 bit data bus. It was designed to work effectively with 8080 processors, but with a little bit of glue could work with any CPU from that era.


IMG_0864.jpg IMG_0866.jpg


Enjoy, Phillip

Spencer Owen

Sep 25, 2020, 12:56:52 PM9/25/20
to rc201...@googlegroups.com
If you've been following along with this thread, but haven't read the RC2014 Newsletter yet, you'll be pleased to know that the AM9511A APU is now available on Tindie; https://www.tindie.com/products/semachthemonkey/rc2014-am9511a-apu-arithmetic-processor/

A big thank you to Phillip for all the work he has put in designing this module and for the help in bringing it to market.

Spencer


Phillip Stevens

Sep 25, 2020, 9:15:56 PM9/25/20
to RC2014-Z80
Spencer, the Tindie price point at $11 for the PCB is well within the budget discussed above ($25), so well done.

In fact, since the price is half the budget, why not have two?

apu_multi_config.jpg


I've been writing multi-APU routines for z88dk, and they're now available (for assembly language only) for up to 4 APUs in one RC2014 system.

For the Mandelbrot calculation, two APUs are fairly optimal, as the real and imaginary components can be calculated simultaneously.
But for 2D vector calculations, I think 4 APUs will be preferred for the matrix operations required.
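
To make the real/imaginary split concrete, here is a plain C sketch of one fixed-point Mandelbrot step, following the formulas given as comments in the assembly listing. It runs on a single CPU and only marks which half of the work could go to which APU. The SCALE value of 256 and the function names are assumptions for illustration, and both halves are computed from the old z values, which is what lets them run side by side.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE 256   /* assumed fixed-point scale: 1.0 is represented as 256 */

/* One iteration of z = z*z + c in scaled integer arithmetic.
 * z0/z1 are the real/imaginary parts of z, x/y those of c.
 * Returns 1 if |z|^2 (taken before the update) exceeds 4.0. */
static int mandel_step(int32_t *z0, int32_t *z1, int32_t x, int32_t y)
{
    /* Work for the first APU: the two squares and the real part. */
    int32_t z0_sq = *z0 * *z0;
    int32_t z1_sq = *z1 * *z1;
    int32_t re = (z0_sq - z1_sq) / SCALE + x;

    /* Work for the second APU: the imaginary part.
     * It depends only on the old z0 and z1, so it does not have to
     * wait for the real part to finish. */
    int32_t im = (2 * *z0 * *z1) / SCALE + y;

    *z0 = re;
    *z1 = im;

    /* Escape test, reusing the squares already computed. */
    return (z0_sq + z1_sq) / SCALE > 4 * SCALE;
}

/* Count iterations until escape, up to max_iter. */
int mandel_iterations(int32_t x, int32_t y, int max_iter)
{
    int32_t z0 = 0, z1 = 0;
    int i;

    assert(max_iter >= 0);
    for (i = 0; i < max_iter; ++i) {
        if (mandel_step(&z0, &z1, x, y))
            break;
    }
    return i;
}
```

With these assumptions, the origin (x = 0, y = 0) never escapes and returns max_iter, while c = 2.0 (x = 512, y = 0) escapes after two iterations.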

Some Mandelbrot code below as an example of how to use two APUs.

;;;;;;; double calc

        ld hl,z_0                       ; Extend 16 bit z_0 to 32 bit
        inc hl
        ld a,(hl)
        add a,a                         ; Put sign bit into carry
        sbc a,a                         ; A = 0 if carry == 0, $FF otherwise
        inc hl
        ld (hl),a
        inc hl
        ld (hl),a

        ld hl,z_1                       ; Extend 16 bit z_1 to 32 bit
        inc hl
        ld a,(hl)
        add a,a                         ; Put sign bit into carry
        sbc a,a                         ; A = 0 if carry == 0, $FF otherwise
        inc hl
        ld (hl),a
        inc hl
        ld (hl),a


;       z_2 = (z_0 * z_0 - z_1 * z_1) / scale;
;       z_0 = z_2 + x;


;       z_3 = 2 * z_0 * z_1 / scale;
;       z_1 = z_3 + y;


        ld hl,z_0
        call asm_am9511_0_pushl_hl      ; POINTER TO OPERAND IN HL

        ld a,__IO_APU_OP_PTOD           ; COMMAND for PTOD (push double)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        ld a,__IO_APU_OP_DMUL           ; COMMAND for DMUL (multiply lower)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        ld hl,z_0
        call asm_am9511_1_pushl_hl      ; POINTER TO OPERAND IN HL

        ld a,__IO_APU_OP_PTOD           ; COMMAND for PTOD (push double)
        out (__IO_APU1_CONTROL),a       ; ENTER a COMMAND

        ld a,__IO_APU_OP_DADD           ; COMMAND for DADD (add double)
        out (__IO_APU1_CONTROL),a       ; ENTER a COMMAND

        ld hl,z_1
        call asm_am9511_1_pushl_hl      ; POINTER TO OPERAND IN HL

        ld a,__IO_APU_OP_DMUL           ; COMMAND for DMUL (multiply lower)
        out (__IO_APU1_CONTROL),a       ; ENTER a COMMAND

        ld a,__IO_APU_OP_PTOD           ; COMMAND for PTOD (push double)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        ld hl,z_0_square
        call asm_am9511_0_popl_hl       ; POINTER TO RESULT IN HL

        ld hl,z_1
        call asm_am9511_0_pushl_hl      ; POINTER TO OPERAND IN HL

        ld a,__IO_APU_OP_PTOD           ; COMMAND for PTOD (push double)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        ld a,__IO_APU_OP_DMUL           ; COMMAND for DMUL (multiply lower)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        ld hl,scale
        call asm_am9511_1_pushl_hl      ; POINTER TO OPERAND IN HL

        ld a,__IO_APU_OP_DDIV           ; COMMAND for DDIV (divide double)
        out (__IO_APU1_CONTROL),a       ; ENTER a COMMAND

        ld a,__IO_APU_OP_PTOD           ; COMMAND for PTOD (push double)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        ld hl,z_1_square
        call asm_am9511_0_popl_hl       ; POINTER TO RESULT IN HL

        ld a,__IO_APU_OP_DSUB           ; COMMAND for DSUB (subtract double)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        ld hl,scale
        call asm_am9511_0_pushl_hl      ; POINTER TO OPERAND IN HL

        ld a,__IO_APU_OP_DDIV           ; COMMAND for DDIV (divide double)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        ld a,__IO_APU_OP_POPS           ; COMMAND for POPS (pop single)
        out (__IO_APU1_CONTROL),a       ; ENTER a COMMAND

        ld hl,y
        call asm_am9511_1_pushi_hl      ; POINTER TO OPERAND IN HL

        ld a,__IO_APU_OP_SADD           ; COMMAND for SADD (add single)
        out (__IO_APU1_CONTROL),a       ; ENTER a COMMAND

        call asm_am9511_1_popi          ; RESULT IN HL
        ld (z_1),hl

        ld a,__IO_APU_OP_POPS           ; COMMAND for POPS (pop single)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        ld hl,x
        call asm_am9511_0_pushi_hl      ; POINTER TO OPERAND IN HL

        ld a,__IO_APU_OP_SADD           ; COMMAND for SADD (add single)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        call asm_am9511_0_popi          ; RESULT IN HL
        ld (z_0),hl

;       if (z_0 * z_0 / scale + z_1 * z_1 / scale > 4 * scale)

        ld hl,z_0_square
        call asm_am9511_0_pushl_hl      ; POINTER TO OPERAND IN HL

        ld hl,z_1_square
        call asm_am9511_0_pushl_hl      ; POINTER TO OPERAND IN HL

        ld a,__IO_APU_OP_DADD           ; COMMAND for DADD (add double)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        ld hl,scale
        call asm_am9511_0_pushl_hl      ; POINTER TO OPERAND IN HL

        ld a,__IO_APU_OP_DDIV           ; COMMAND for DDIV (divide double)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        ld a,__IO_APU_OP_POPS           ; COMMAND for POPS (pop single)
        out (__IO_APU0_CONTROL),a       ; ENTER a COMMAND

        call asm_am9511_0_popi          ; RESULT IN HL



Phillip Stevens

Dec 8, 2023, 8:27:44 PM12/8/23
to RC2014-Z80
Just to update this Am9511A APU Module thread: I've ordered another set of 10 PCBs for the module. I'm expecting the new PCBs to arrive around the 12th of December, and will ship them as soon as possible.

Recently, I made some minor modifications to the PCB to remove support for the NMI interrupt (using NMI would conflict with CP/M, and I never implemented code for it, so it was redundant), and used the reclaimed space to add a high frequency bypass capacitor on the /WAIT line to speed up release of the CPU. The specification for the 12V boost converter was also updated, as Pololu released a new (slightly better) product. The hardware documentation was updated to reflect this.

See the rest of this thread on APU support from the z88dk C compiler, assembler, floating point library and development tools, and benchmarks vs software floating point on the z80 CPU.

There is an implementation of MS Basic for APU+Z80 available with identical characteristics to the MS Basic for Classic ][ or Mini ][, so this APU Module can be added to any RC2014 Kit as needed. If a standard backplane is used, then the CPU Module /WAIT line will need to be connected to one of the USER pins; a matching bridging option is provided on the APU Module.

Cheers, Phillip