Hi Dan,
For HDMI, I share memory with the ARM core, and a program running in userspace interprets the video memory (I use SDL for output).
I suppose the feasibility of your idea depends on the clock: with the 1 MHz clock of the Apple ][, I had to capture 3 bytes in 489 ns (half a cycle), and selecting each byte is slow.
In theory there is more than enough time, but there are a few delays:
- "Pin to register" read delay (signal on the pin --> read from __R31)
- "Register to pin" write delay (write to __R30 --> signal on the pin)
- Propagation of latches and demuxes
- Write to shared memory
So, after experimenting with timings, I concluded that switching from one 74HC245 to the next takes at least 75 ns for reading. I haven't tested the delays for writing to pins yet, but I think they are higher (>100 ns).
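
To put numbers on that budget, here is a rough sketch in C using only the figures from this post (489 ns window, 3 bytes, at least 75 ns per switch-and-read). The function names are made up for illustration, and the result is optimistic: 75 ns is a lower bound, and the shared-memory write and latch propagation are not counted.

```c
#include <assert.h>
#include <stdio.h>

/* Figures quoted in this post. */
#define HALF_CYCLE_NS  489  /* half a cycle of the ~1 MHz Apple ][ clock */
#define BYTES_NEEDED   3    /* bytes to capture in that half cycle */
#define SWITCH_READ_NS 75   /* measured minimum per buffer switch + read */

/* Time left in the window after reading n_bytes, in ns.
 * A negative result means the reads do not fit. */
int slack_ns(int window_ns, int n_bytes, int per_byte_ns)
{
    assert(n_bytes >= 0 && per_byte_ns >= 0);
    return window_ns - n_bytes * per_byte_ns;
}

/* Largest number of sequential byte reads that fit in the window. */
int max_bytes(int window_ns, int per_byte_ns)
{
    assert(per_byte_ns > 0);
    return window_ns / per_byte_ns;
}

/* Hypothetical helper: print the budget for the figures above. */
void print_budget(void)
{
    printf("slack after %d bytes: %d ns\n", BYTES_NEEDED,
           slack_ns(HALF_CYCLE_NS, BYTES_NEEDED, SWITCH_READ_NS));
    printf("max bytes per half cycle: %d\n",
           max_bytes(HALF_CYCLE_NS, SWITCH_READ_NS));
}
```

With these numbers, 3 reads take 225 ns and leave 264 ns for everything else. If each step really costs 100 ns or more, only 4 reads fit in the window, which is why a faster clock or a wider bus breaks this approach quickly.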
If you have more bits in less time (higher frequency), this approach is unfeasible. But if the clock is slow enough and you don't need HDMI, you can play with PRU1 and use 16 bits plus some control bits.