multiprocessor Zx80 on RC2014


Bill Shen

Oct 22, 2019, 8:24:36 AM
to retro-comp
Having found a cheap source of 4K dual port RAM (IDT71342 with hardware semaphores, about $2 from UTSource) and successfully built a video display board with it, I'm thinking of a multiprocessor Z80 system on the RC2014 bus using the dual port RAM as the building block.  Architecturally it is a master Z80 controlling a cluster of slave Zx80s (other processor families are possible, but let's stick with Zx80 for now), all sharing the same RC2014 backplane.  The number of slave processors is modest, on the order of 8 or fewer.  An 8-slot RC2014 backplane can have one master Z80 and 7 Zx80 slaves.  This means every board is a single-board Zx80 computer.  Before going to the trouble of multiprocessing, the single processor should run as fast as possible, which is 20MHz for Z80.  Yet we have a number of 7.37MHz Z80s around, so a collective of Zx80 processors running at different speeds for different applications is desirable.  Z80 has no test-and-set instruction, so additional hardware support is needed for resource sharing among different processors.  Being a single-board computer in the RC2014 form factor also means the design needs to be compact, but this is for the RC2014 hobbyist community so it should use through-hole components.  So small, fast, flexible, hobbyist-friendly and cheap; that is not too much to ask. ;-)

Dual port RAM can realize some of the requirements. 
  *  It minimizes interface between processors; there is no need for arbitration and data/address buffers; each side of the RAM appears as an independent memory.
  *  Different processors running at different clocks can share the dual port RAM.
  *  Some dual port RAMs, specifically the IDT71342, have hardware-supported semaphores, which provide the synchronization mechanism between processors.  This is especially important because the Z80 has no test-and-set instruction.
  *  The slave processor can boot off the dual port RAM by locating the 4K RAM at the base of memory map.  This saves the cost, space, and hassle of separate EPROM for each processor.
  *  The master Z80 accesses the dual port RAM as I/O, using A8-A15 for additional address decoding of up to 64K of I/O space.  Theoretically 16 processors, each with 4K of I/O, can coexist on one RC2014 bus.
  *  Dual port RAM is fast; 35ns access time is readily available, so it can handle 20MHz Z80 bus timing and a Z180 at higher clock frequencies.
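
The I/O-mapped access in the list above relies on the Z80 driving register B onto A8-A15 and register C onto A0-A7 during `IN r,(C)` and `OUT (C),r`. Here is a minimal sketch of the address arithmetic, assuming slave N's 4K window starts at N x 0x1000. That window layout and the function name `io_registers` are illustrative assumptions, not the actual decode on the board.

```python
WINDOW_SIZE = 0x1000          # 4K of dual port RAM per slave (from the post)
IO_SPACE = 0x10000            # 64K of I/O space once A8-A15 are decoded

def io_registers(slave, offset):
    """Return the (B, C) register values that address one dual port RAM byte.

    Assumes slave N owns I/O addresses N*0x1000 .. N*0x1000+0x0FFF;
    this layout is a guess made for illustration only.
    """
    if not 0 <= slave < IO_SPACE // WINDOW_SIZE:
        raise ValueError("only 16 windows of 4K fit in 64K of I/O space")
    if not 0 <= offset < WINDOW_SIZE:
        raise ValueError("offset must be inside the 4K window")
    address = slave * WINDOW_SIZE + offset
    return address >> 8, address & 0xFF   # B drives A8-A15, C drives A0-A7

print(IO_SPACE // WINDOW_SIZE)                    # 16 windows
print([hex(v) for v in io_registers(3, 0x234)])   # ['0x32', '0x34']
```

With this layout the top four bits of B select the slave, so the per-board decode would only need to compare A12-A15 against a board number.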

A test case for such a slave processor is a music player based on the YM2149.  It'll be nice to have music playing in the background without interruption while we putter around with the Z80 doing other things.  We already have a working design running at 7.37MHz, so shrink it to one board and we are there.  Such a slave Z80 music player will have a dual port RAM, Z80, YM2149A, CPLD, passive components and an audio connector.  Seems doable in the RC2014 form factor.

Another Zx80 slave is a Z180 dedicated to Ethernet operation; yet another Zx80 slave is a diagnostic monitor that watches the RC2014 bus; yet another Zx80 slave is an 8-channel serial port server.

Thoughts?

  Bill

Alan Cox

Oct 22, 2019, 4:49:55 PM
to retro-comp
You don't need hardware support for semaphores. It's useful and faster but as they are all in-order machines you can use Dekker's algorithm for serializing or a messaging protocol.
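
As a rough illustration of the software-only option, here is a small simulation of Dekker's algorithm for two CPUs. It is a sketch, not Z80 code: each `yield` marks the end of one shared-memory access, and a seeded random scheduler interleaves the two CPUs at that granularity. The names `dekker_cpu`, `run` and the `mem` layout are invented for the example, and the `inside` counters are instrumentation only.

```python
import random

def dekker_cpu(me, mem, rounds):
    """One CPU's side of Dekker's algorithm; yields after each shared access."""
    other = 1 - me
    for _ in range(rounds):
        mem["want"][me] = True
        yield
        while True:
            other_wants = mem["want"][other]
            yield
            if not other_wants:
                break
            turn = mem["turn"]
            yield
            if turn != me:
                mem["want"][me] = False      # back off while it is the other CPU's turn
                yield
                while True:
                    turn = mem["turn"]
                    yield
                    if turn == me:
                        break
                mem["want"][me] = True
                yield
        # Critical section: a deliberately non-atomic read-modify-write.
        mem["inside"] += 1                   # instrumentation, not part of the algorithm
        mem["max_inside"] = max(mem["max_inside"], mem["inside"])
        value = mem["counter"]
        yield
        mem["counter"] = value + 1
        yield
        mem["inside"] -= 1
        mem["turn"] = other                  # hand priority to the other CPU
        yield
        mem["want"][me] = False
        yield

def run(rounds=200, seed=1):
    mem = {"want": [False, False], "turn": 0,
           "counter": 0, "inside": 0, "max_inside": 0}
    cpus = [dekker_cpu(0, mem, rounds), dekker_cpu(1, mem, rounds)]
    rng = random.Random(seed)
    while cpus:
        cpu = rng.choice(cpus)               # pick which CPU makes the next access
        try:
            next(cpu)
        except StopIteration:
            cpus.remove(cpu)
    return mem

result = run()
print(result["counter"], result["max_inside"])   # 400 1
```

The counter reaching 400 with never more than one CPU inside shows the exclusion holding for this interleaving; it relies on both CPUs seeing memory accesses in program order, which is the "in-order machines" condition above.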

The S100 systems didn't use dual port RAM (too expensive and fancy I guess) but some of them could be kicked off their RAM (busreq) and then the host could map their RAM and do stuff and then unmap it and let it go. Some also had a write only register that enabled the relevant card so they used the same I/O or memory space otherwise to avoid losing a lot of space.

You ended up with something like

ld a,boardid  ; 1 2 4 8 16 .. etc
out (SELECT),a  ; selects one board - all latch the 8bits and a jumper determines what it responds to
; then does stuff.
; and for write only with care you could write FF and stuff all the boards at once 8)
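
A toy model of that select scheme, assuming each board latches the whole byte and a jumper picks which bit enables it. The `Board` class and the board names are hypothetical and only illustrate the one-hot and broadcast cases.

```python
class Board:
    """A card that latches every write to the shared select port."""

    def __init__(self, name, jumper_bit):
        self.name = name
        self.mask = 1 << jumper_bit   # the jumper picks which latched bit enables this card
        self.latch = 0

    def on_select_write(self, value):
        self.latch = value & 0xFF     # every board latches all 8 bits

    @property
    def enabled(self):
        return bool(self.latch & self.mask)

def select(boards, value):
    """Model `out (SELECT),a` and report which boards now respond."""
    for board in boards:
        board.on_select_write(value)
    return [board.name for board in boards if board.enabled]

boards = [Board("cpu%d" % n, n) for n in range(4)]
print(select(boards, 0x04))   # ['cpu2']
print(select(boards, 0xFF))   # ['cpu0', 'cpu1', 'cpu2', 'cpu3']
```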

The late-era ones mapped the 64K of each card into different parts of the 24-bit address space, but that's a bit trickier with RC2014.


I would also turn it around slightly. I/O processors at least really want to be on an RC2014 bus so you can add cards to them. That would also help with the form factor problem, but I guess it might need bus drivers and cables. The other way of doing the I/O processors, if it's two or fewer, might be to do what I'm doing with the pending I/O extender: it's a horizontal card that plugs into the end slots of two backplanes, one behind the other. I'm just recognizing a port and using buffers etc. to forward I/O transactions, so that any access to that port becomes a transaction on the other bus, with the upper 8 address bits used as the low 8 bits of the I/O address there. (I figured 30 slots might be ambitious otherwise ;-)). However, it ought to be possible to build a secondary CPU card that has its own bus and sits on the main bus?


I'm also not sure you need the YM2149A. A fast dedicated Z80 should be capable of doing sound better than YM2149A chips, with nothing but a raw DAC. The Russian hackers took it to the extreme with the General Sound and then NeoGS (its audio is pretty much Amiga standard!)

The code is not even that scary


http://nedopc.com/gs/gs_prog.pdf (Google translate will convert it nicely if like me your Russian is a bit rusty ;-))

It's a bit more limited as without shared memory it was slow to upload data so you generally had to upload your .MOD files and bits into its local RAM before you got going.

Other thing you IMHO need is control of the secondary CPU reset, and a way to interrupt in both directions.

Multiprocessing Fuzix is still a concept only but I could certainly make use of a second I/O CPU to do low speed I/O (floppy, tape, serial - including line editing offload, parallel, maybe sound needs its own) and also one to do video acceleration given a suitable video system. Less sure about networking, that would require a lot of thinking to get right.

Alan

Mark T

Oct 22, 2019, 7:15:11 PM
to retro-comp
I think Bill’s choice of dual port RAM is to reduce the chip count compared to buffers. Also because he has some he wants to use :) That they have built-in semaphores is an interesting extra feature to use.

A brief look at the IDT spec shows they don't generate wait states for matching address access, which might allow it to run at 20MHz. Z80 wait states are tricky at 20MHz; decoding from /MREQ to /WAIT leaves only 2.5ns using worst-case timing. It might be better to treat /WAIT as DTACK instead, so that if it does miss the setup time it just runs a little slow instead of failing to wait when it’s required.

I think it would probably need a separate bus for each processor, at least as an option, so that the slaves could have their own input/output and offload slower or more complicated I/O from the master.

If the aim is to support multiple slaves then an alternate bus could be used between the master and the slaves. Each slave could have its own short RC2014 bus, and then maybe something more suited to ribbon cable to connect to the master. Maybe the Z50 bus is possible on ribbon cable, or use the Z80 pinout on 40-pin IDC connectors.

Mark

Bill Shen

Oct 22, 2019, 9:56:42 PM
to retro-comp
The main reasons for using dual port RAM are that it is cheap, reduces chip count, and serves as an elastic buffer between processors of different speeds.   If the I/O processor needs more expansion, I'm thinking of a daughter board that plugs into the I/O processor board.  The main board can also be taller than the standard RC2014, something like 75mm x 100mm, for more components and connectors.  To start off I'm thinking of a prototype I/O processor consisting of a CPLD in PLCC44, dual port RAM in PLCC52, a Z80 in PLCC44, and a breadboard area.  It can also serve as a bridge to a solderless breadboard, similar to the RC2014-to-breadboard prototype in the attached photo.  It is certainly possible to drive an RC2014 bus as the I/O processor expansion bus.

The I/O processor will power up with its RESET asserted since it has no valid code in the dual port RAM, so the main processor will need to load the dual port RAM with valid software and then release the RESET.  I'll need to think more about interrupts since the main processor (Z80) only has one interrupt.
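
The boot sequence described here can be sketched as a toy model. `SlaveCard`, `boot_slave` and the reset latch interface are hypothetical stand-ins for whatever the CPLD ends up providing; the only things taken from the post are the 4K dual port RAM at the base of the slave's memory map and RESET being asserted at power-up.

```python
class SlaveCard:
    """Toy model of a slave board: 4K dual port RAM plus a reset latch."""
    DPRAM_SIZE = 0x1000

    def __init__(self):
        self.dpram = bytearray(self.DPRAM_SIZE)   # contents are undefined on real hardware
        self.in_reset = True                      # powers up with RESET asserted

    # --- master-side port of the dual port RAM ---
    def master_write(self, offset, data):
        if offset < 0 or offset + len(data) > self.DPRAM_SIZE:
            raise ValueError("image does not fit in the 4K dual port RAM")
        self.dpram[offset:offset + len(data)] = data

    def set_reset(self, asserted):
        self.in_reset = asserted

    # --- slave-side view ---
    def first_opcode(self):
        """The slave Z80 fetches from 0x0000 once RESET is released."""
        if self.in_reset:
            return None
        return self.dpram[0]

def boot_slave(card, image):
    card.set_reset(True)          # keep the slave halted while its RAM changes
    card.master_write(0, image)   # copy the bootstrap to the base of the slave's memory
    card.set_reset(False)         # release RESET; execution starts at 0x0000

card = SlaveCard()
boot_slave(card, bytes([0xC3, 0x00, 0x00]))   # JP 0x0000: a do-nothing loop
print(hex(card.first_opcode()))               # 0xc3
```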
  Bill


CPLD_bridge_yamaha_sound.jpg

Bill Shen

Oct 29, 2019, 8:30:13 AM
to retro-comp
Another new design I sent off to JLCPCB last night is a prototype board of a Z80 slave processor for the RC2014 bus.  It consists of a Z80, CPLD, 4Kx8 dual port RAM and breadboard space on a 3"x4" PC board.  The decoding is simple enough that an EPM7064S in PLCC44 should be sufficient.  I added a compact flash port because program and data for the slave Z80 can come from the CF disk.  The CF disk is also a nice expansion port.
  Bill
jlcpcb_10-28-19c.png
DPRAM_scm_r0.pdf

Tadeusz Pycio

Oct 29, 2019, 6:40:16 PM
to retro-comp
Hi Bill!

Have a look at the Zilog Z8038 FIO; it is readily available and works with the Z80 with little glue logic. It is an ideal solution for building small multiprocessor systems with their own bus. The 74HCT646 is also suitable for simple connections, but more logic is needed.
If I deal with the backlog after my long break, I will follow this path with my Z8002 SBC.

Bill Shen

Oct 29, 2019, 9:49:23 PM
to retro-comp
Z8038 looks interesting.  I do like dual port RAM partly because you can boot the slave Z80 from the dual port RAM; it offers greater programmability and saves an EPROM chip.
  Bill

Tadeusz Pycio

Oct 30, 2019, 3:48:56 AM
to retro-comp
True, these advantages are not in dispute. All that remains is the problem of how the slave reports completion of processing or requests data. Round-robin polling?

Bill Shen

Oct 30, 2019, 9:38:40 AM
to retro-comp
Robust interprocessor communications are critical; Z80 has no test-and-set instruction, so I plan to use the hardware semaphores on the dual port RAM (IDT71342).  Frankly I don't know how well that'll work, which is why I'm doing a prototype board first.
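
For readers unfamiliar with this style of semaphore, here is a toy model of one flag as I understand the protocol on IDT dual port RAMs: a side writes 0 to request the token, reads the flag back, and owns the token only if it reads 0; writing 1 releases it, and a request left pending by the other side is then granted. The class, the helper names and the exact grant behaviour are assumptions to be checked against the datasheet, not a description of the real silicon.

```python
class Semaphore:
    """Toy model of one semaphore flag shared by a left ("L") and right ("R") port."""

    def __init__(self):
        self.request = {"L": False, "R": False}   # each side's request latch
        self.owner = None

    def write(self, side, bit):
        if bit == 0:                        # writing 0 requests the token
            self.request[side] = True
            if self.owner is None:
                self.owner = side
        else:                               # writing 1 withdraws the request or releases
            self.request[side] = False
            if self.owner == side:
                other = "R" if side == "L" else "L"
                self.owner = other if self.request[other] else None

    def read(self, side):
        return 0 if self.owner == side else 1   # 0 means "you hold the token"

def try_acquire(sem, side):
    sem.write(side, 0)
    return sem.read(side) == 0

def release(sem, side):
    sem.write(side, 1)

sem = Semaphore()
print(try_acquire(sem, "L"))   # True: the left port gets the token
print(try_acquire(sem, "R"))   # False: the right port must wait
release(sem, "L")
print(sem.read("R"))           # 0: the pending right-side request is now granted
```

The write-then-read pair stands in for the missing test-and-set: the arbitration happens inside the RAM, so neither Z80 needs an atomic read-modify-write of its own.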
  Bill

Alan Cox

Oct 30, 2019, 10:10:51 AM
to retro-comp
On Wed, 30 Oct 2019 at 13:38, Bill Shen  wrote:
>
> Robust interprocessor communications are critical; Z80 has no test-and-set instruction

For reader/writer queues you don't need one with a simple CPU because
everything has one owner and everything has one order.

You end up something like

ld a,(hl)                         ; read so far (I own and update)
inc hl                            ; move to the queued ptr (sender owns and updates)
cp (hl)                           ; nothing has been queued
ret z
; read byte
dec hl
inc (hl)                          ; we used a byte


and the writer is similar. It's a shade more complicated because you
need to disambiguate 256 bytes from 0 so the
writer has to stop one before.

Because only one person ever updates each pointer and the 8bits are
updated together you never need a lock. To get speed you also don't
put data in the queue, just indexing. So you might write a command one
way of say 'write to disk' and get a reply of 'wrote to disk' each
maybe only a byte long, because you have a fixed location buffer for
the data that you can just ldir/otir/inir with.
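
A minimal sketch of the lock-free queue described above, written as a model rather than Z80 code. The class name `ByteQueue` and the 256-entry size are assumptions; the points taken from the post are that each index has exactly one owner, that each index is a single byte updated in one write, and that the writer stops one slot short so that equal indices can only mean "empty".

```python
class ByteQueue:
    """Single-reader, single-writer queue with one-byte indices.

    `rd` is written only by the reader and `wr` only by the writer, so
    neither side ever needs a lock.
    """
    SIZE = 256

    def __init__(self):
        self.buf = bytearray(self.SIZE)
        self.rd = 0     # owned and updated by the reader
        self.wr = 0     # owned and updated by the writer

    def put(self, value):
        nxt = (self.wr + 1) & 0xFF
        if nxt == self.rd:
            return False            # full: stop one slot before the reader
        self.buf[self.wr] = value   # store the data first...
        self.wr = nxt               # ...then publish it with a single byte write
        return True

    def get(self):
        if self.rd == self.wr:
            return None             # nothing has been queued
        value = self.buf[self.rd]
        self.rd = (self.rd + 1) & 0xFF
        return value

q = ByteQueue()
accepted = sum(q.put(i & 0xFF) for i in range(300))
print(accepted)            # 255: one slot always stays empty
print(q.get(), q.get())    # 0 1
```

In the scheme above the queued bytes would be short commands or buffer indices, with the bulk data sitting in a fixed buffer elsewhere in the shared RAM.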

Bill Shen

Nov 8, 2019, 1:07:52 PM
to retro-comp
Out of the two batches of boards I received this week, the one I'm most interested in is the dual-port RAM prototype.  It is a deceptively simple-looking board with a slave Z80, 4K dual port RAM, CPLD, CF interface and prototype area.  The slave Z80 is started by writing bootstrap code into the dual port RAM and releasing its reset; communication with the slave is over the dual port RAM.  The slave is basically an intelligent I/O processor offloading the main Z80.   The slave may help the main Z80 with calculations; it may also be a diagnostic monitor like an angel watching over the main Z80.  The slave CPU does not need to be Z80/Z180/Z280.  I think there are many interesting possibilities.
  Bill
DSC_50961108.jpg

Bill Shen

Nov 21, 2019, 9:40:11 PM
to retro-comp
This is the first project with the slave Z80.  It is a YM2149 player with a dedicated Z80.  The Z80 slave is initially held in reset, waiting for a program from the main processor.  Once the program & data are loaded via the dual port RAM, the slave Z80 is commanded to run.  The processor clocks are independent of each other; the main processor is running at 22MHz, but this Z80 slave is running at 7.37MHz.  I'm still waiting on an LM386 amplifier to drive a pair of speakers.
  Bill

DSC_54101121.jpg
DSC_54111121.jpg

Bill Shen

Nov 23, 2019, 11:52:10 PM
to retro-comp
Following the discussion about I2C with PCA9665 distracted me from finishing the YM2149 player.  It reminded me of the Game of Life program running on Zuno a month ago that took 188ms for each generation of Life on a 22MHz Z80.  About 50ms of that is bit-banging the I2C bus.  So if I figure out how to split the program across two processors, I should speed up each generation by 50ms, possibly more if I can offload some of the calculation to the second Z80.

So I prototyped another dual-ported Z80 with a bit-bang register for I2C.  Its job is to drive the 128x64 OLED screen via I2C.  It will receive graphic data from the main processor over the dual port RAM and bit-bang it out to the I2C bus.  This should free up the main Z80 to do calculation-intensive tasks.

This is a good exercise for multiprocessing architecture; exploring how to divide a job into parallel tasks and what hardware resources are needed to facilitate that.

  Bill
DSC_54161123.jpg
DSC_54171123.jpg

jopil

Nov 24, 2019, 8:12:06 AM
to retro-comp
Brilliant & Exceptional, out-of-the-box, approach Bill. Well done.
John

Bill Shen

Nov 24, 2019, 9:34:49 AM
to retro-comp
Thank you, a fan of multiprocessor!

The trick is how to write a relocatable program for the coprocessor to be loaded by the main processor.  I use zmac to write assembly for the main processor and use zmac's "phase" and "dephase" pseudo-directives to write the portion of code for the coprocessor.  Maybe there is a better way, but at least this seems to work.

In the attached program, the main processor loads the coprocessor's code into dual port RAM and releases the reset to run it.  The coprocessor initializes the 128x64 OLED, loads the ASCII table, and scrolls the screen.  It all seems to work.

  Bill

PS: The picture is the dual-port coprocessor with 128x64 OLED plugged into the Z80MB64's RC2014 expansion slot.  The attached code is executed by the Z80MB64.
DSC_54161124.jpg
i2c.asm

Alan Cox

Nov 25, 2019, 8:21:49 AM
to retro-comp


On Sunday, 24 November 2019 14:34:49 UTC, Bill Shen wrote:
> Thank you, a fan of multiprocessor!
>
> The trick is how to write a relocatable program for the coprocessor to be loaded by the main processor.  I use zmac to write assembly for the main processor and use zmac's "phase" and "dephase" pseudo-directives to write the portion of code for the coprocessor.  Maybe there is a better way, but at least this seems to work.

If you use Z88DK you can produce self-relocating binaries that you just pass the load address to as an argument (one of the few interesting architectural 'can't do' cases in Z80 is finding PC without knowing any fixed routines). For Fuzix I simply linked the program twice, at 0x0000 and 0x0100, and wrote a tool that compares the binaries and deduces the relocations. I've since discovered it's how loads of 1970s/1980s stuff did it.
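
The link-twice trick can be sketched in a few lines. This is not the Fuzix tool, just an illustration under two assumptions: the program is always loaded on a 256-byte page boundary, and every byte that differs between the two links is the high byte of an absolute address. The function names and the hand-assembled seven-byte program are made up for the example.

```python
def find_relocations(linked_at_0000, linked_at_0100):
    """Return offsets of bytes that hold the high byte of an absolute address."""
    if len(linked_at_0000) != len(linked_at_0100):
        raise ValueError("the two links must produce images of equal length")
    relocations = []
    for offset, (byte_a, byte_b) in enumerate(zip(linked_at_0000, linked_at_0100)):
        if byte_a == byte_b:
            continue
        if (byte_b - byte_a) & 0xFF != 0x01:
            raise ValueError("unexpected difference at offset %d" % offset)
        relocations.append(offset)
    return relocations

def relocate(linked_at_0000, relocations, load_page):
    """Patch an image linked at 0x0000 so it runs at load_page * 256."""
    image = bytearray(linked_at_0000)
    for offset in relocations:
        image[offset] = (image[offset] + load_page) & 0xFF
    return bytes(image)

# JP 0x0006 / LD HL,0x0003 / RET, hand-assembled at both link addresses
at_0000 = bytes([0xC3, 0x06, 0x00, 0x21, 0x03, 0x00, 0xC9])
at_0100 = bytes([0xC3, 0x06, 0x01, 0x21, 0x03, 0x01, 0xC9])

relocs = find_relocations(at_0000, at_0100)
print(relocs)                                    # [2, 5]
print(relocate(at_0000, relocs, 0x80).hex())     # c30680210380c9
```

Because the two link addresses differ only in the high byte, the low bytes of every address stay identical, which is why a one-byte-per-relocation table is enough for page-aligned loads.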


Alan

Bill Shen

Nov 26, 2019, 9:10:02 AM
to retro-comp
Haha, it is working.  For each generation of Life, the main processor scans the 128x64 array for birth & death and passes the results over dual port RAM for the coprocessor to bit-bang over the I2C bus to the OLED display.  The dual port RAM is not big enough to hold all the graphic data, so the main processor passes 1/4 of the previous generation of data, processes 1/4 of the current generation of data, increments, and repeats until all of the previous generation's data is passed on and the current generation is processed; then the process starts all over again.
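
The quarter-at-a-time hand-off can be modelled as below. This is only a sketch of the data flow, not the attached program: the cell layout (one list entry per cell), the dead border outside the array and the function names are all assumptions, and Python runs the two halves of each step one after the other where the real boards overlap them.

```python
def next_rows(prev, first_row, last_row):
    """Compute rows first_row..last_row-1 of the next generation from `prev`."""
    height, width = len(prev), len(prev[0])
    rows = []
    for y in range(first_row, last_row):
        row = []
        for x in range(width):
            neighbours = sum(
                prev[yy][xx]
                for yy in range(max(0, y - 1), min(height, y + 2))
                for xx in range(max(0, x - 1), min(width, x + 2))
                if (yy, xx) != (y, x)
            )
            alive = neighbours == 3 or (neighbours == 2 and prev[y][x])
            row.append(1 if alive else 0)
        rows.append(row)
    return rows

def one_generation(prev, display, quarters=4):
    """Hand the previous generation to the display in quarters while computing the next."""
    band = len(prev) // quarters
    nxt = []
    for q in range(quarters):
        lo, hi = q * band, (q + 1) * band
        display.extend(list(row) for row in prev[lo:hi])   # quarter q goes out via the shared RAM
        nxt.extend(next_rows(prev, lo, hi))                # main CPU computes quarter q meanwhile
    return nxt

HEIGHT, WIDTH = 8, 8
prev = [[0] * WIDTH for _ in range(HEIGHT)]
for x in (2, 3, 4):
    prev[3][x] = 1            # a horizontal blinker

shown = []
nxt = one_generation(prev, shown)
print(shown == prev)                                  # True: the display saw the whole old frame
print([y for y in range(HEIGHT) if nxt[y][3]])        # [2, 3, 4]: the blinker turned vertical
```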

It takes 34 seconds for the 2nd gun to reach the edge.  With just one processor it takes 50 seconds, not quite cutting the time in half.  Since scanning the 128x64 array for birth & death takes the longest time, it should be possible to add another coprocessor to share that task, possibly cutting the time to 1/3.  Hmmm...that's the wonderful thing about multiprocessing--you add another dimension of variables to twiddle endlessly...
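
For a sense of scale, the figures quoted above work out as follows. The only inputs are the 50 s and 34 s from the post; reading "1/3" as one third of the single-processor time is an interpretation.

```python
single_cpu_s = 50.0    # one processor, from the post
dual_cpu_s = 34.0      # main Z80 plus display coprocessor, from the post

speedup = single_cpu_s / dual_cpu_s
efficiency = speedup / 2            # fraction of the ideal 2x actually achieved
target_s = single_cpu_s / 3         # the hoped-for time with a third Z80

print(round(speedup, 2), round(efficiency, 2), round(target_s, 1))   # 1.47 0.74 16.7
```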
  Bill

DSC_54181126.jpg

Bill Shen

Nov 27, 2019, 12:50:39 PM
to retro-comp
This is rather cool, if I do say so myself.  Here are three Z80s running in parallel.  The motherboard is Z80MB64, the board in front is the OLED display coprocessor, and the board at the back is the math coprocessor.  Each coprocessor has a separate block of addresses for its dual port RAM and its own reset & NMI registers.  All three are running off the 22MHz CPU clock.  The 3 processors split the Game of Life tasks so that the OLED display coprocessor is bit-banging graphic data over the I2C bus; the algorithm for determining birth & death of each cell is computationally intensive, so that task is split up between the main processor and the math coprocessor.  The main processor does all the shuffling of data between the math coprocessor and the display coprocessor.  I have it working between two processors; now I'm trying to get all three processors to work together.  The current consumption with everything running is around 350mA at 5V.
  Bill

DSC_54221127.jpg
DSC_54241127.jpg

Bill Shen

Dec 4, 2019, 9:04:25 AM
to retro-comp
Adapter board for OX16C954, quad UART with 128-byte FIFO.  The lead pitch is 0.5mm; I wasn't sure I could hand-solder it anymore, but I managed OK this morning.  It is for a serial port co-processor that controls quad or octal serial ports.
  Bill

DSC_54461204.jpg

Bill Shen

Dec 11, 2019, 10:03:35 AM
to retro-comp
Eight serial ports controlled by a Z80 slave over dual port RAM.  I've only prototyped one quad-channel OX16C954.  The other one is just a "prop".  So what should I call it, OctalSer, PorcuUART,  Gone8pe? 8peripheral-enhanced-Self-Hosted-Integrated-Transceivers?
  Bill
DSC_54571211.jpg

Colin MacArthur

Dec 11, 2019, 1:16:33 PM
to retro-comp
I think "PorcuUART" is an accurate description. 
Reminds me of the old "DIGI" / "Computone" boards I used for UNIX on a 286...

Computone.jpg


One is a QUAD port (dual SIO/2) and the other is an OCTAL port (quad SSC)...
CM