I was trying to see whether the chip count could be reduced enough to fit on a standard-size RC2014 board. I don't think I've achieved that, but I thought some of the ideas I tried might be interesting.
I modified the EPROM enable so that while the EPROM is enabled after reset, reads come only from EPROM and writes go only to RAM. It should then be possible to copy the entire EPROM to RAM, disable the EPROM and continue running from RAM. This uses only half of the 74HCT139.
The second half of the 74HCT139 is used to separate the IO addresses of the data and control registers.
The control register is changed to a 74HCT175, as this is a smaller package and provides inverted outputs.
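The boot copy described above can be modelled roughly as follows. This is only a Python sketch of the intended behaviour, not tested against hardware. The class and function names are made up for illustration, and the model assumes EPROM and RAM overlay the same addresses, so each byte is read from and written back to the same address.

```python
class BootMemory:
    """Toy model of the boot-time memory map (hypothetical names)."""

    def __init__(self, eprom_image):
        self.eprom = bytes(eprom_image)
        self.ram = bytearray(len(self.eprom))
        self.eprom_enabled = True  # assumed state after reset

    def read(self, addr):
        # While the EPROM is enabled, reads see only EPROM.
        return self.eprom[addr] if self.eprom_enabled else self.ram[addr]

    def write(self, addr, value):
        # Writes always land in RAM, even while the EPROM is enabled.
        self.ram[addr] = value & 0xFF


def copy_eprom_to_ram(mem):
    # Read each byte (from EPROM) and write it back to the same address (RAM).
    for addr in range(len(mem.eprom)):
        mem.write(addr, mem.read(addr))


image = bytes((i * 7) & 0xFF for i in range(256))  # made-up EPROM contents
mem = BootMemory(image)
copy_eprom_to_ram(mem)
mem.eprom_enabled = False  # switch over to RAM
ram_matches_eprom = all(mem.read(a) == image[a] for a in range(len(image)))
```

On the Z80 itself this would presumably be a block copy with source and destination set to the same address, followed by the write to the control register that disables the EPROM.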
As with Alan's design, there is no IO address decoding on the slave side. To separate EPROM and RAM page control, I'm using the refresh cycle to latch the I register from A8 to A11, but only when A7, the most significant bit of the refresh register, is low. This allows page control to be disabled when IM2 is in use and the I register holds the interrupt vector table base address. I'm using a 74HCT173, as this has two gate controls for the clock input. After the LD I,A instruction that sets the memory page there should probably be a NOP, as the page register will not be updated until the following instruction's refresh cycle is performed.
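A small Python model of the latch condition may make this easier to follow. It is a sketch under the assumptions stated in the paragraph above (I appears on A8 to A15 and R on A0 to A7 during refresh, and the latch takes A8 to A11 only when A7 is low). The function names are hypothetical.

```python
def refresh_address(i_reg, r_reg):
    """Address driven during a refresh cycle: I on A15..A8, R on A7..A0."""
    return ((i_reg & 0xFF) << 8) | (r_reg & 0xFF)


def update_page_latch(page, i_reg, r_reg):
    """Return the new page register value after one refresh cycle."""
    addr = refresh_address(i_reg, r_reg)
    a7 = (addr >> 7) & 1
    if a7 == 0:
        # Gate open: latch A11..A8, i.e. the low nibble of I.
        return (addr >> 8) & 0x0F
    # Gate closed: I is free to hold the IM2 table base, page is unchanged.
    return page


page = 0
# Page select: I = 0x05, bit 7 of R clear, so the page becomes 5.
page = update_page_latch(page, 0x05, 0x12)
page_after_select = page
# IM2 in use: bit 7 of R set, I = 0x3A is a table base, page stays 5.
page = update_page_latch(page, 0x3A, 0x92)
page_after_im2 = page
```

In this model, software would set bit 7 of R (with LD R,A) before loading an IM2 table base into I, and clear it again before selecting a page.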
The interface between the two data buses is a single 74HCT245, with wait controlled by the 74LS156. The idea is to insert wait states for whichever processor, master or slave, attempts an IO read from the interface, until the other processor performs an IO write; the 74HCT245 is then enabled and the wait is released. Wait states are not applied to IO writes, so the processor reading the data has to be set up to read before the other processor starts writing. This is partly a limitation of using the 74LS156, but it also depends on Z80 IO timing, because of the wait state automatically added to all IO operations.
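The intended handshake can be written out as a truth table. This Python sketch only models the logic as described above for one direction of transfer. It ignores all timing, and the function name and signal names are made up for illustration.

```python
def interface_signals(reader_io_read, writer_io_write):
    """Return (wait_asserted, buffer_enabled) for one reader/writer pair."""
    # The reader is held in wait states until the other side writes.
    wait_asserted = reader_io_read and not writer_io_write
    # The 74HCT245 is enabled only while a read and a write overlap.
    buffer_enabled = reader_io_read and writer_io_write
    return wait_asserted, buffer_enabled


cases = [
    (False, False),  # bus idle
    (True, False),   # reader waiting for data
    (True, True),    # write arrives: buffer enabled, wait released
    (False, True),   # write with no reader waiting: nothing is transferred
]
trace = [interface_signals(r, w) for r, w in cases]
```

The last case is the reason the reader has to be waiting first: a write with no read in progress is never held up, so the byte is lost.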
I think using wait in this way, with no latch for the data, would still require both processors to run from the same clock source, although because of the wait state the Z80 adds to IO cycles it may be possible to tolerate up to a 2x clock difference.
Decoding for the slave processor reading from the interface uses /SIORQ low and /SWR high, so the slave processor will also wait for data to be written by the master during an interrupt acknowledge. I thought the slave could perhaps use IM2, so that the vector written by the master selects the function to be performed by the slave.
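For reference, this is how the Z80 turns an IM2 vector byte into a handler address, which is what would let the master pick the slave's function. It is a Python sketch. The table base, vector value and handler address are invented for the example.

```python
def im2_handler_address(memory, i_reg, vector):
    """IM2 lookup: I supplies the high byte, the bus supplies the low byte."""
    pointer = ((i_reg & 0xFF) << 8) | (vector & 0xFF)
    low = memory[pointer]
    high = memory[(pointer + 1) & 0xFFFF]
    return (high << 8) | low  # handler address is stored little-endian


memory = bytearray(0x10000)
# Hypothetical table at 0x8000; entry 0x02 points at a handler at 0x1234.
memory[0x8002] = 0x34
memory[0x8003] = 0x12
handler = im2_handler_address(memory, i_reg=0x80, vector=0x02)
```

Vector bytes are normally even, so each function the master can request takes one two-byte table entry.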
I didn't include a way for the slave to generate an interrupt to the master CPU; maybe a Schottky diode from the slave /HALT to the master /INT could be used.
None of this is tested, and I may well have missed some unintended consequences. I'll probably never complete the layout and build this thing.

Mark