Hi Henk,
To follow up on Mark's comments - I have played a lot with the TN20K module. I think the FPGA code base we have both used is sourced from the same upstream projects. I can't recall precisely the differences between the 9K and 20K implementations, but for the most part they are the same. (I think the biggest change in the 20K is the support for HDMI audio output.) My schematic can be found here:
https://www.dinoboards.com.au/assets/hdmi-for-rc/schematic.pdf
So let me share some of my experience, as I think it relates to what you are exploring. Please note - I'm not an actual electronics engineer, just a hobbyist who barely knows what he is doing.

> 1. The HDMI port seems to be backpowering the main system. I added a diode to the TN 9K VCC. Would this be sufficient?
I assume you have the "backpowering" issue when you are reprogramming the Tang Nano module. (I certainly avoid having both the TN and the backplane powered at the same time.) For my board, I created jumpers that let me easily disconnect the power line, and also pull the enable pin of the 74LVC245 chips high, to avoid any chance that the buffers and the Tang Nano both drive pins at the same time. My solution may be overkill and unnecessary.
I suspect the diode would be fine, but I would check a couple of things. The diode will cause a voltage drop to the Tang Nano when it is powered by the backplane - not sure if that will be an issue. Also, although less of an issue, power can always leak through the data lines. Full isolation is the least risky option, but I have reprogrammed my unit in circuit countless times using just the jumpers mentioned.
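To put rough numbers on the diode drop, here is a quick sketch. It assumes a 5V backplane rail and ballpark forward voltages of about 0.3V for a Schottky diode and 0.7V for a silicon one. These are my assumptions, not measurements, and the real drop depends on the diode and the current drawn.

```python
# Rough estimate of the voltage reaching the Tang Nano through a series diode.
# All values are assumptions for illustration, not measurements.
SUPPLY_V = 5.0  # assumed backplane rail

# Ballpark forward voltages; check the datasheet of the actual diode.
TYPICAL_DROP_V = {"schottky": 0.3, "silicon": 0.7}


def voltage_after_diode(supply_v, drop_v):
    """Voltage left at the module after the series diode."""
    return supply_v - drop_v


for kind, drop in TYPICAL_DROP_V.items():
    print(f"{kind}: {voltage_after_diode(SUPPLY_V, drop):.1f} V at the module")
```

If the result sits close to the minimum input voltage of the module's regulator, a Schottky diode would likely be the safer choice.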
> 2. The HDMI signal doesn't work with all monitors...
Hmm. That's interesting... Not sure why you might experience that. I did have an issue with my Tang Nano 20K version - when the audio signal is active, some monitors and passive HDMI-to-DVI converters will not work. But the TN9K does not have enough power to do the audio, so I'm not sure why you might have issues with some monitors. Could it be a signal integrity/strength issue? Have you played with different/shorter cables? I wonder if this might be an issue with the TN9K HDMI circuit itself.
> 2. ....and the 80 columns mode is a bit disappointing i.e. in a smaller window.
As Mark mentioned, I discussed the 'smaller' window issue in his thread. For the TN20K version, there are 2 HDMI resolutions used: 576p @ 50Hz and 480p @ 60Hz. It's quite a challenge to upscale the 'lower resolutions' of the V9958 to these HDMI modes. Pixels are doubled, but that still means you get a border. Are you using a 4:3 monitor or a 16:9 widescreen monitor?
I think there are/were versions of the V9958 emulation that mapped to 640x480. This is the absolute lowest resolution that HDMI can support (the V9958's largest progressive resolution is 512x212, so it still falls short). What resolution does your monitor indicate it is displaying?
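To show where the border comes from, here is a small back-of-envelope calculation. It assumes a 512x212 source (the 80 column case), each line doubled vertically and horizontal pixels left 1:1, with the standard 720x576 and 720x480 frame sizes for 576p and 480p. The actual scaler in the FPGA code may do something different, so treat it as a sketch only.

```python
# Border left over when a V9958 frame is line-doubled into an HDMI mode.
# Assumed: 512x212 source, lines doubled vertically, pixels 1:1 horizontally.
# The real FPGA scaler may differ.
HDMI_MODES = {"576p": (720, 576), "480p": (720, 480), "640x480": (640, 480)}


def borders(src_w, src_h, mode_w, mode_h, x_scale=1, y_scale=2):
    """Return (side, top_bottom) border in pixels on each edge."""
    used_w = src_w * x_scale
    used_h = src_h * y_scale
    return (mode_w - used_w) // 2, (mode_h - used_h) // 2


for name, (w, h) in HDMI_MODES.items():
    side, top = borders(512, 212, w, h)
    print(f"{name}: {side}px each side, {top}px top and bottom")
```

Under these assumptions even 640x480 leaves a border on every edge, because 512 and 424 (212 doubled) are both smaller than the frame.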
> 3. The original design used three 74LVC245AN ic's, I reduced it to two but maybe there's a reason why three are used?
Had a look at your 2 schematics. I do a similar thing for the data bus buffering - driving the DIR from the /RD signal. I have not observed any issues.
I also wondered why others had done it with 2 chips - were they concerned there could be contention (eg: the 74LVC245 driving a line at the same time as the Z80 or the Tang Nano)? I can't recall the details now, but I am sure I studied the Z80 timing diagrams, considered how the IORQ/RD lines become active, and confirmed that there is sufficient time and the right sequence to ensure the 74LVC245 does not end up driving its outputs where it would create a conflict.
One key difference with my design is that I have on-boarded the I/O address decoding to the FPGA chip (that is, the 74HCT138 logic is implemented in the FPGA). I have a pinout on my Tang Nano called VDP_CS - this signal is an output, and drives the enable of the buffers. Moving the address decoding/chip-select to the FPGA gives me 3 key features:
* I can make sure the buffers are only enabled after the data direction has settled (if this proved to be an issue)
* I can change the address the system responds to, by just a quick change of the FPGA code.
* I can later add additional emulated hardware (eg audio chips) - without any hardware updates.
(You will see my design actually uses 3 buffers, despite the consolidation of the data buffer - I need the three buffers as I now need to map all 8 of the Z80's lower address lines).
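To illustrate the idea of the chip-select decoding described above, here is a small software model in Python (the real thing is of course FPGA logic, and this is not my actual code). The port base 0x98, the mask, and the function names are hypothetical, chosen only for the example. Gating on /M1 is also an assumption, used to exclude the Z80 interrupt acknowledge cycle.

```python
# Software model of I/O chip-select decoding done inside the FPGA.
# Active-low signals are modelled as ints: 0 = asserted, 1 = idle.
# VDP_BASE and VDP_MASK are hypothetical values for illustration only.
VDP_BASE = 0x98  # assumed first I/O port of the VDP
VDP_MASK = 0xFC  # compare A7..A2, so four consecutive ports match


def vdp_cs_n(addr_low, iorq_n, m1_n):
    """Return 0 (selected) for an I/O cycle addressed to the VDP, else 1."""
    # /IORQ low together with /M1 low is an interrupt acknowledge, not I/O.
    is_io_cycle = iorq_n == 0 and m1_n == 1
    addr_match = (addr_low & VDP_MASK) == (VDP_BASE & VDP_MASK)
    return 0 if (is_io_cycle and addr_match) else 1


def data_buffer_drives_z80(cs_n, rd_n):
    """The data buffer drives towards the Z80 only on a selected read."""
    return cs_n == 0 and rd_n == 0


print(vdp_cs_n(0x99, iorq_n=0, m1_n=1))  # port inside the decoded range
print(vdp_cs_n(0x40, iorq_n=0, m1_n=1))  # port belonging to another device
```

Changing the address the board responds to then comes down to editing the two constants, which is the sort of quick change I meant.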
Hope my notes help. If there are specific questions as to how/why I did something on my board, I'm happy to try and answer.
Cheers
Dean