Thanks for the detailed description. That all makes sense.
1) There is interpolation for missing notes currently, but the work of pitch-shifting happens in the WEB UI, and then each pitch is saved separately in its corresponding note slot in memory. Pitch-shifting on the fly is unlikely to be possible with much polyphony on an ESP any time soon. There are some high-power RISC-V cores (T-Head C906/7/8 etc.) coming down the pipeline, and Espressif seems to be on that bandwagon, so who knows!
2) This seems very possible. On a per-voice basis, it would only be a matter of setting fade_factor, implementing the same thing at the start of playback, and then choosing the right MIDI CC command for it. No WEB UI changes would be needed, or anything fancy.
3) Same as 1: DSP with polyphony is unlikely to be possible on an ESP any time soon; the FPU is too slow.
4) Batch uploading is possible now. There are a few ways to do it, all outlined in the docs, plus one new (undocumented) way that uses .sf2 files to upload. There is a special binary (currently in beta) to play with that.
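To make point 1 a bit more concrete, here is a rough sketch of the kind of offline pitch-shifting that can fill a missing note slot: resample the source note by the equal-temperament ratio 2^(semitones / 12) with linear interpolation. This is a generic illustration, not the actual WEB UI code; the function name `pitch_shift` and the plain-list sample format are assumptions made for the example.

```python
def pitch_shift(samples, semitones):
    """Resample `samples` so that playback at the original rate sounds
    `semitones` higher (negative values shift down).

    Hypothetical helper for illustration only; not taken from WVR.
    """
    # Equal-temperament frequency ratio for the requested interval.
    ratio = 2.0 ** (semitones / 12.0)

    # Shifting up reads the source faster, so the result is shorter.
    out_len = int(len(samples) / ratio)

    out = []
    for i in range(out_len):
        pos = i * ratio                      # fractional read position
        i0 = int(pos)                        # sample just before pos
        i1 = min(i0 + 1, len(samples) - 1)   # sample just after, clamped
        frac = pos - i0
        # Linear interpolation between the two neighbouring samples.
        out.append(samples[i0] * (1.0 - frac) + samples[i1] * frac)
    return out
```

Doing this once per missing note ahead of time and storing the result is cheap at playback time, which is the trade-off described in point 1: the per-sample interpolation above is exactly the work that would otherwise have to run for every active voice in the audio loop.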
The math for the fade-out is fairly straightforward, although I know looking at another person's code can be an incredibly brain-melting task, especially when that person is me, and my code is probably chaos. Basically the loop is guaranteed to run at 44.1 kHz, so the fade-out takes ((1 / 88.2) * fade_factor * 128) ms, with the 88.2 in kHz ... so about 5.8 ms for the minimum of fade_factor = 4 (the fastest).
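As a quick sanity check, the expression above can be evaluated directly. This sketch reads the 88.2 as kHz, so the result comes out in milliseconds; the function name `fade_out_ms` is made up for the example, and it only reproduces the formula from this post, not the firmware code itself.

```python
def fade_out_ms(fade_factor):
    """Fade-out time in milliseconds, per the formula in the post.

    Assumption: 88.2 is a rate in kHz, so 1 / 88.2 is a period in ms.
    """
    return (1 / 88.2) * fade_factor * 128


# The post gives 4 as the minimum (fastest) fade_factor.
print(round(fade_out_ms(4), 2))  # 5.8
```

The time scales linearly with fade_factor, so doubling fade_factor doubles the fade length.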
Good luck with the Disting module, that looks AMAZING! They have a $30 high performance PIC in there, and I am getting very envious looking at its specs. 😅
I do believe that a RISC-V MCU will come along some day soon with specs like that, at a price that can suit WVR, and I will be ready and waiting :)