Before I get into the nuances of MTS non-realtime tuning changes and SysEx, I want to state very clearly the one simple requirement synth manufacturers need to follow to avoid the artifacts mentioned earlier.
That requirement is: when a tuning change comes in, the synth needs to zone the retuned note properly.
That means synth manufacturers need to ensure that, no matter what frequency a MIDI note number is retuned to, things like the choice of sample, filter settings, keyboard tracking, and so on depend only on the target frequency, not on the 12-EDO default frequency for that MIDI note.
An obvious example: if we're working with a sampler, and we retune the entire tuning table to 31-EDO, MIDI note 0 needs to get the appropriate sample for "two octaves below middle C," not the default sample for MIDI note 0 naively being resampled up like three octaves or whatever, which would sound ridiculous.
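To make the sampler example concrete, here's a minimal sketch of what "depends only on the target frequency" might look like. The zone table, sample names, and `pick_sample` helper are hypothetical illustrations, not any particular synth's API; the only point is that the lookup key is the tuned frequency.

```python
# Hypothetical key zones for a sampled piano: (low_hz, high_hz, sample file).
# The lookup key is the *tuned* frequency, never the 12-EDO default frequency
# of the incoming MIDI note number.
ZONES = [
    (0.0,     60.0,    "piano_low.wav"),
    (60.0,    240.0,   "piano_mid_low.wav"),
    (240.0,   960.0,   "piano_mid_high.wav"),
    (960.0,   20000.0, "piano_high.wav"),
]

def pick_sample(target_hz: float) -> str:
    """Choose the sample by the retuned frequency, not by the note number."""
    for low, high, sample in ZONES:
        if low <= target_hz < high:
            return sample
    raise ValueError(f"no zone covers {target_hz} Hz")

# MIDI note 0 retuned to roughly two octaves below middle C (~65.4 Hz)
# lands in the appropriate zone, instead of note 0's default sample being
# resampled up several octaves.
print(pick_sample(65.4))  # -> "piano_mid_low.wav"
```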
This is obvious to us, but it's one of those things that really needs to be stated explicitly. Most manufacturers, for instance, don't treat the existing channel pitch bend this way: if you load up a piano sample and pitch bend it up two octaves, you get timbre deformation from the naive resampling that's typically used. Since synth manufacturers don't know a priori how to handle MTS, they need to be told not to treat these realtime tuning changes as polyphonic pitch bends from a 12-EDO skeleton, which they may well do if they think we only care about 12-note scales. It needs to be spelled out.
This rather unambitious requirement turns out to have remarkable consequences. If you can take any note and tune it to anything, with the guarantee that the synth won't cling to that note's former 12-EDO interpretation in any way at all, then you can take note 127 and tune it down into the bass, or take note 0 and tune it up to middle C, or take anything and tune it to anything, and the synth always handles it properly.
This really defines a new paradigm for handling MIDI note numbers, since they no longer carry any inherent tuning information at all. In effect, they simply become 128 "voices," tuned to whatever you like. Following this one simple requirement means that MIDI becomes a protocol offering 128-voice polyphony with notes of arbitrary frequency.
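For reference, here's a sketch of how an arbitrary frequency gets expressed as MTS frequency data: a base MIDI note plus a 14-bit fraction of a semitone (roughly 0.0061-cent resolution). The helper name and the assumption of an A4 = 440 Hz reference are mine; the encoding itself is the one defined by the MIDI Tuning Standard.

```python
import math

A4_HZ = 440.0  # assuming the conventional A4 = 440 Hz reference

def hz_to_mts(freq_hz: float) -> tuple[int, int, int]:
    """Encode a frequency as MTS frequency data: the 12-EDO MIDI note nearest
    below the target, plus a 14-bit fraction of a semitone split into two
    7-bit bytes (MSB, LSB)."""
    semitones = 69.0 + 12.0 * math.log2(freq_hz / A4_HZ)
    base = int(math.floor(semitones))
    frac = int(round((semitones - base) * 0x4000))  # units of 2^-14 semitone (~0.0061 cents)
    if frac == 0x4000:          # rounding carried into the next semitone
        base, frac = base + 1, 0
    if not 0 <= base <= 127:
        raise ValueError(f"{freq_hz} Hz is outside the representable range")
    return base, (frac >> 7) & 0x7F, frac & 0x7F

# Middle C encodes the same way no matter which note number carries it:
print(hz_to_mts(261.6256))  # -> (60, 0, 0), give or take one 0.0061-cent step
```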
Retuners can do a tremendous amount of stuff under this scheme. One of those things is a particularly powerful algorithm that I call "freestyle" retuning, which works as follows:
1) In general, new notes are sounded with combination "tuning change + note on" messages.
2) All 128 voices are put in a queue.
3) Every time a new note comes on, the retuner just uses the note that was last "note offed" in the queue (see the sketch after this list).
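Here's a rough sketch of a freestyle retuner under those three rules. I'm assuming a FIFO reading of the queue (reuse the note number whose note-off is oldest, so any release tail has long since decayed), a hypothetical `send` callback that takes raw MIDI bytes, and the `hz_to_mts` encoder from the earlier sketch; the SysEx is the MTS real-time single-note tuning change.

```python
from collections import deque

class FreestyleRetuner:
    """Treat every MIDI note number as an anonymous voice, per rules 1-3."""

    def __init__(self, send, channel=0, device_id=0x7F, tuning_program=0):
        self.send = send                  # hypothetical callback: send(list_of_midi_bytes)
        self.channel = channel
        self.device_id = device_id
        self.tuning_program = tuning_program
        self.free = deque(range(128))     # rule 2: all 128 voices start in the queue
        self.sounding = {}                # caller's note id -> assigned MIDI note number

    def note_on(self, note_id, freq_hz, velocity=100):
        slot = self.free.popleft()        # rule 3: reuse the longest-silent note number
        self.sounding[note_id] = slot
        base, msb, lsb = hz_to_mts(freq_hz)
        # Rule 1: combination "tuning change + note on".
        # MTS real-time single-note tuning change (one key), then the note-on.
        self.send([0xF0, 0x7F, self.device_id, 0x08, 0x02,
                   self.tuning_program, 0x01, slot, base, msb, lsb, 0xF7])
        self.send([0x90 | self.channel, slot, velocity])

    def note_off(self, note_id):
        slot = self.sounding.pop(note_id)
        self.send([0x80 | self.channel, slot, 0])
        self.free.append(slot)            # back of the queue; only retuned again much later
```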
That's it. I find this simple idea very useful, because it means we don't need to require synth manufacturers to implement things like multitimbral channel linking. Instead, we can have them keep one-instrument-per-channel, as most do now, and have retuners do the job. In fact, you can use only realtime tuning messages with it and it still works flawlessly, with no artifacts. You get virtually zero chance of dopplering, because a note's tuning is only changed the next time it comes up in the queue.
I think that stuff like this will lead to the retuner becoming an integral part of the modern tuning workflow, so that the architecture ends up being controller->retuner->synth. We can simply ask synths to implement this one requirement, ask controllers to implement whatever is easiest, and let retuners bridge the gap.
I'll leave it there for now; there's lots more I could go into, but that's long enough.