Hi Kevin, any chance you have a circuit schematic you can share? I've been tinkering along similar lines and I'm keen to compare approaches :)
A couple of things I've found that might save you some time:
The Pico has a thing called an Input Synchroniser (a pair of back-to-back flip-flops on every GPIO pin) that adds two clock cycles of latency, meaning the state of any input as seen by the chip is always 2 cycles behind reality. Not a big issue, but a little confusing when looking at traces on a scope. More details are in the Pico docs. In the C-SDK you can disable these, but I strongly advise against it. For lack of a more technical description, turning them off causes "weird stuff" to happen in the state machines. There's a warning in the docs about this that I shouldn't have ignored!
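To put a rough number on that delay when you're lining things up on a scope, here's a back-of-envelope calculation in Python. It assumes the Pico's default 125 MHz system clock and the two-stage synchroniser described above; the function name is made up purely for illustration.

```python
def sync_delay_ns(sys_clk_hz=125_000_000, stages=2):
    """Approximate input synchroniser latency in nanoseconds.

    Assumption: each flip-flop stage costs one system clock cycle.
    """
    return stages * 1e9 / sys_clk_hz


print(sync_delay_ns())             # stock 125 MHz clock
print(sync_delay_ns(250_000_000))  # hypothetical 250 MHz overclock
```

At the stock clock that works out to about 16 ns, which is small but visible if you're comparing the bus signals against what the PIO appears to react to.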
I did some very rough benchmarking of MicroPython on the Pico and, if I recall correctly, it was around 50-100 clock cycles per "line of Python". This doesn't seem to be an issue if you just want to read or write values to the bus, but if you want something more interactive then it's quite slow and you absolutely need the wait pin. An example of "interactive" would be the PIO reading the lower address pins and passing them up to the Python interpreter, which has a think and then pushes a result back to the PIO to write to the data bus. By contrast, it is possible to have an interactive response using the C-SDK, but you still only get a handful of clock cycles to respond before you need the wait pin. Overclocking the Pico buys you some more cycles :)
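Here's a rough sketch of why the budget is so tight, again just as arithmetic in Python. The assumptions are mine and not measured: a standard RC2014 Z80 clock of 7.3728 MHz, a Z80 I/O cycle of 4 T-states (including the automatic wait state), and the Pico at its default 125 MHz. Treat the result as an upper bound, since the I/O request only goes active partway through the cycle and the data has to be valid before the end of it.

```python
PICO_HZ = 125_000_000  # default Pico system clock
Z80_HZ = 7_372_800     # assumption: standard RC2014 clock
IO_T_STATES = 4        # assumption: Z80 I/O cycle incl. automatic wait state


def pico_cycles_per_io(pico_hz=PICO_HZ, z80_hz=Z80_HZ, t_states=IO_T_STATES):
    """Upper bound on Pico clock cycles available during one Z80 I/O cycle."""
    return t_states * pico_hz / z80_hz


def python_lines_that_fit(budget_cycles, cycles_per_line):
    """Whole 'lines of Python' that fit in the budget at a given cost per line."""
    return int(budget_cycles // cycles_per_line)


budget = pico_cycles_per_io()
print(f"Pico cycles per I/O cycle: {budget:.1f}")
print("Lines at 50 cycles/line: ", python_lines_that_fit(budget, 50))
print("Lines at 100 cycles/line:", python_lines_that_fit(budget, 100))
print(f"Budget at 250 MHz: {pico_cycles_per_io(pico_hz=250_000_000):.1f}")
```

With those numbers the whole I/O cycle is under 70 Pico cycles, so a single line of MicroPython at my rough 50-100 cycle estimate already uses up most or all of it. That's why the wait pin is unavoidable for anything interactive.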
Using the C-SDK I've seen the occasional I/O request take longer than the rest. I suspect this is the USB controller generating interrupts which the CPU has to deal with. If one coincides with an I/O request, that request ends up taking longer. Running the code that deals with the PIO state machines on the second core seems to fix the issue (I'd guess USB interrupts are tied to the first core). I'm not sure if something similar will apply under MicroPython.
I had the bright idea of replacing the 74LVC245 that talks to the data bus with a 74LVC4245, which does up-translation to 5V. This caused me lots of issues with the circuit, as that chip is quite bouncy when driving the long tracks on the RC2014 bus. I don't recommend going down this path. I wasted a lot of hours staring at my scope trying to understand what was happening.
I've just recently replaced all the LVC parts in my circuit with AHC. They appear to be drop-in replacements but operate at about half the speed. That might sound like an odd thing to want, but I've had a couple of timing issues where the very fast LVC parts react too quickly, making them glitch (the Pico sees an I/O request where none occurred). This could well be a side-effect of how I've designed my circuit.
Dylan