You don't need hardware support for semaphores. It's useful and faster, but as these are all in-order machines you can use Dekker's algorithm for serializing, or a messaging protocol.
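For anyone who hasn't met it, here is a minimal sketch of Dekker's algorithm for two CPUs, written in C rather than Z80 so it is easy to read. The struct and function names are made up for illustration. It uses C11 atomics so it is also correct on a modern host; on two in-order Z80s sharing RAM, plain byte loads and stores to the shared locations would play the same role.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Shared state for two CPUs, numbered 0 and 1 (names are illustrative). */
struct dekker {
    atomic_bool wants[2];   /* wants[n] is true while CPU n wants the lock */
    atomic_int turn;        /* which CPU wins when both want it */
};

void dekker_init(struct dekker *d)
{
    atomic_init(&d->wants[0], false);
    atomic_init(&d->wants[1], false);
    atomic_init(&d->turn, 0);
}

/* Enter the critical section as CPU 'me' (0 or 1). */
void dekker_lock(struct dekker *d, int me)
{
    int other = 1 - me;

    atomic_store(&d->wants[me], true);
    while (atomic_load(&d->wants[other])) {
        if (atomic_load(&d->turn) != me) {
            /* Not our turn: back off until the other side is done. */
            atomic_store(&d->wants[me], false);
            while (atomic_load(&d->turn) != me)
                ;   /* spin */
            atomic_store(&d->wants[me], true);
        }
    }
}

/* Leave the critical section and hand priority to the other CPU. */
void dekker_unlock(struct dekker *d, int me)
{
    atomic_store(&d->turn, 1 - me);
    atomic_store(&d->wants[me], false);
}
```

It only handles two parties, so with several I/O processors you would want one such lock per host/processor pair, or a messaging protocol instead.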
The S100 systems didn't use dual port RAM (too expensive and fancy I guess), but some of the cards could be kicked off their own RAM (BUSREQ), and then the host could map that RAM, do stuff, then unmap it and let the card go. Some also had a write-only register that enabled the relevant card, so the cards otherwise all used the same I/O or memory space to avoid losing a lot of space.
You ended up with something like
    ld a,boardid      ; 1 2 4 8 16 .. etc
    out (SELECT),a    ; selects one board - all latch the 8 bits and a jumper determines which bit each responds to
    ; then does stuff.
    ; and for write only, with care, you could write FF and stuff all the boards at once 8)
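To make the one-hot select a bit more concrete, here is a small C model of it. It is only a simulation of the idea, not any particular S100 card: the function names and the single shared latch variable are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value every board latched from the last write to the SELECT port. */
static uint8_t select_latch;

/* Model of "out (SELECT),a": all boards latch the same 8 bits. */
void out_select(uint8_t value)
{
    select_latch = value;
}

/* One-hot board id for board n (0..7): 1, 2, 4, 8, 16, ... */
uint8_t board_id(unsigned n)
{
    return (uint8_t)(1u << n);
}

/* A board responds only if the bit picked by its jumper is set. */
bool board_enabled(unsigned jumper_bit)
{
    return ((select_latch >> jumper_bit) & 1u) != 0;
}
```

Writing `board_id(2)` enables only the board jumpered to bit 2, while writing 0xFF enables all eight at once, which is only safe for writes since reads would have every board driving the bus.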
The late-era ones mapped the 64K of each card into a different part of the 24-bit address space, but that's a bit trickier with RC2014.
I would also turn it around slightly. I/O processors at least really want to be on an RC2014 bus so you can add cards to them. That would also help with the form factor problem, but I guess it might need bus drivers and cables. The other way of doing the I/O processors, if it's two or fewer, might be to do what I'm doing with the pending I/O extender - it's a horizontal card that plugs into the end slots of two backplanes, one behind the other. I'm just recognizing a port and using buffers etc to do I/O transactions, so that any I/O transaction for that port becomes a transaction on the other bus, with the upper 8 bits of the address used as the low 8 bits of the I/O address there. (I figured 30 slots might be ambitious otherwise ;-)). However it ought to be possible to build a secondary CPU card that has its own bus and sits on the main bus?
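The address translation the extender does can be sketched in a few lines of C. This is only a model of the decode logic described above: the port number 0x80 and the function name are placeholders, not the real card's values. It relies on the Z80 putting B on A8-A15 during "out (c),a" and "in a,(c)".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical port the extender recognizes on the main bus. */
#define EXTENDER_PORT 0x80

/*
 * Given the full 16-bit address the Z80 drives during an I/O cycle,
 * decide whether the extender claims it. If so, the upper 8 bits
 * (register B for "out (c),a") become the port number on the far bus.
 */
bool extender_translate(uint16_t io_addr, uint8_t *remote_port)
{
    if ((io_addr & 0xFF) != EXTENDER_PORT)
        return false;           /* not ours, leave the cycle alone */
    *remote_port = (uint8_t)(io_addr >> 8);
    return true;
}
```

So with BC = 0x4280 an "out (c),a" on the main bus would turn into a write to port 0x42 on the second backplane, under those assumed values.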
I'm also not sure you need the YM2149. A fast dedicated Z80 should be capable of doing sound better than YM2149 chips - with nothing but a raw DAC. The Russian hackers took it to the extreme with the General Sound and then NeoGS (its audio is pretty much Amiga standard!)
The code is not even that scary
It's a bit more limited as without shared memory it was slow to upload data so you generally had to upload your .MOD files and bits into its local RAM before you got going.
The other thing you IMHO need is control of the secondary CPU reset, and a way to interrupt in both directions.
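As a rough illustration of what that might look like from software, here is a C model of a control/status register pair. Everything in it is hypothetical: the bit assignments, the struct, and the function names are invented for the sketch and don't describe any existing card.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments for the link between host and secondary. */
#define CTRL_RESET    0x01u  /* host holds the secondary CPU in reset */
#define CTRL_IRQ_SEC  0x02u  /* host -> secondary interrupt pending */
#define STAT_IRQ_HOST 0x01u  /* secondary -> host interrupt pending */

struct link_regs {
    uint8_t ctrl;    /* written by host, acked by secondary */
    uint8_t status;  /* written by secondary, acked by host */
};

/* Host side: reset control, so new code can be loaded before starting. */
void host_hold_reset(struct link_regs *r)    { r->ctrl |= CTRL_RESET; }
void host_release_reset(struct link_regs *r) { r->ctrl &= (uint8_t)~CTRL_RESET; }
bool secondary_running(const struct link_regs *r) { return !(r->ctrl & CTRL_RESET); }

/* Host interrupts the secondary; the secondary acknowledges. */
void host_kick_secondary(struct link_regs *r) { r->ctrl |= CTRL_IRQ_SEC; }
void secondary_ack(struct link_regs *r)       { r->ctrl &= (uint8_t)~CTRL_IRQ_SEC; }

/* Secondary interrupts the host; the host acknowledges. */
void secondary_kick_host(struct link_regs *r) { r->status |= STAT_IRQ_HOST; }
void host_ack(struct link_regs *r)            { r->status &= (uint8_t)~STAT_IRQ_HOST; }
```

The point is just that the host can hold the secondary in reset while loading it, and that each side has a bit it can set to wake the other and a way to clear the request it received.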
Multiprocessing Fuzix is still a concept only but I could certainly make use of a second I/O CPU to do low speed I/O (floppy, tape, serial - including line editing offload, parallel, maybe sound needs its own) and also one to do video acceleration given a suitable video system. Less sure about networking, that would require a lot of thinking to get right.
Alan