Hi Anthony,
Your project sounds great. It reminds me of apps like Waterlogue that take a photo and turn it into a watercolor (correct me if this is not how you are thinking about it). In other words, the human creates the musical content - the bass, melody, etc. - and the model handles the "details" of orchestrating it (I do not mean to dismiss the beautiful and refined art of orchestration!).
In a way my interest is the inverse, but related: I want the model to compose new, original melodies, harmonies, and counterpoint based on custom training data. At least at first, the orchestration aspect could remain fairly basic; for example, it could be limited to producing keyboard music. Ultimately my hope would be that it could do this on a level similar to GPT-4's current ability to write original poems, stories, etc. But from what I understand so far, music generation seems to be quite far behind text and image generation in this regard, especially in symbolic formats like MIDI and MusicXML (audio generation is further along). I don't know whether that's because there is less interest in it or because it's technically more challenging.
I appreciate your suggestion to try Robby's PerformanceRNN. I don't have the skills to tackle it on my own, though I've found GPT-4 to be a very helpful tutor on some other projects I've tried, so I'll give it a shot. I would want to use my own dataset instead of the one it was trained on, which (as I understand it) was encoded in Magenta's NoteSequence format. So my next step is to try to convert my MIDI files to NoteSequence.
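Here is a rough sketch of what I think that conversion step looks like, using Magenta's open-source note_seq library (pip install note-seq); the directory path is just a placeholder for wherever my files actually live, so tell me if I've misunderstood the workflow:

import glob
import note_seq

sequences = []
for path in glob.glob("my_midi_dataset/*.mid"):  # placeholder path
    # midi_file_to_note_sequence parses one MIDI file into a
    # NoteSequence protocol buffer, the format Magenta models expect.
    sequences.append(note_seq.midi_file_to_note_sequence(path))

print(f"Converted {len(sequences)} MIDI files to NoteSequence protos.")

From what I've read, Magenta's training pipeline actually wants these bundled into a TFRecord file (there's a convert_dir_to_note_sequences script for that), so the above may just be the first step.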
I would be interested to hear your thoughts on GAN vs RNN in this case.
Yes, let's definitely keep in touch and share findings as we go along.
Best,
Robin