I find the completion pulse logic a bit hard to understand but the emulator should have it implemented correctly:
iot_pulse() has an argument nac that holds whether we Need A Completion pulse for this IOT, and it's up to the device
to implement it correctly. When the device is done and the per-device NAC flip-flop is set, IOS should be set, which tells the cpu
to stop waiting for IO and resume execution. The logic for the latter is all in the cpu and should work as expected. You only have to set
IOS under the right conditions (IO done and NAC was set).
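
To make the device side concrete, here is a minimal sketch in C. The `Cpu` and `Device` structs and the names `dev_iot_pulse` and `dev_done` are made up for illustration and are not the emulator's real types or functions. Clearing NAC after the completion pulse is also my assumption.

```c
#include <stdbool.h>

/* Hypothetical, simplified state; not the emulator's real structs. */
typedef struct {
    bool ios;   /* CPU side: set means "stop waiting for IO" */
} Cpu;

typedef struct {
    bool nac;   /* per-device Need-A-Completion flip-flop */
    bool busy;  /* device is still working on the last IOT */
} Device;

/* Called when an IOT for this device is issued.
 * nac says whether this IOT wants a completion pulse. */
void dev_iot_pulse(Device *d, bool nac)
{
    d->busy = true;
    d->nac = nac;
}

/* Called when the device has finished its operation. */
void dev_done(Cpu *c, Device *d)
{
    d->busy = false;
    if (d->nac) {
        d->nac = false;   /* assumption: one pulse per request */
        c->ios = true;    /* IO done and NAC was set: set IOS */
    }
}
```

The only rule the device has to follow is the condition in `dev_done`: IOS is touched only when the operation has finished and NAC was set for it.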
NAC is indeed set in two cases, which has to do with non-waiting IO. This is the more confusing part.
If you don't wait, you may still want to synchronize with the device. In this case you do something like `tyo-i` to
type out a character asynchronously while still expecting a completion pulse. `iot i` then does the waiting for the
completion pulse. I'm pretty sure this `iot i` does not work if the completion pulse has already happened. In fact,
due to a timing bug in my emulator, the tyo-completion pulse came earlier than MACRO expected, which caused it
to hang (now fixed, of course).
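
The hang can be illustrated with a toy timing model in C. This only models my suspicion that a completion pulse arriving before the wait starts is lost; it is not verified hardware behavior, and `wait_sees_pulse` and its tick parameters are invented for the example.

```c
#include <stdbool.h>

/* Toy model: the completion pulse is a one-shot event at done_tick,
 * and the waiting `iot i` starts at wait_tick.
 * Returns true if the wait finishes within max_ticks,
 * false if it would still be hanging. */
bool wait_sees_pulse(int done_tick, int wait_tick, int max_ticks)
{
    bool waiting = false;
    for (int t = 0; t < max_ticks; t++) {
        if (t == wait_tick)
            waiting = true;      /* `iot i` begins waiting */
        if (t == done_tick && waiting)
            return true;         /* pulse arrives while we wait */
        /* a pulse that arrives while nobody waits is simply lost */
    }
    return false;
}
```

With the device finishing after the wait starts, e.g. `wait_sees_pulse(5, 2, 100)`, the wait completes. With the pulse arriving too early, e.g. `wait_sees_pulse(1, 2, 100)`, the function returns false, which corresponds to the hang described above.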