Hi everyone,
I am new to Magenta and am currently following the CADL course on Kadenze. I was wondering: is there a monophonic audio-to-MIDI model that I can train on my own dataset? I could only find a polyphonic one for piano.
If there is none, does anyone know where to start, i.e. what kind of network would work best for this?
The input would be audio (speech, < 20 seconds) and the output a MIDI transcription of that audio.
I have time to build a model myself, but I don't know where to start. Should I look into Onsets and Frames, or would monophonic MIDI transcription require a different set of layers than the polyphonic piano case?
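To make the question more concrete, here is a rough, untested sketch of the kind of per-frame pitch classifier I have in mind. The layer sizes, the 16 kHz / mel-spectrogram front end, and the 88-keys-plus-silence output are all just my assumptions, not anything from Magenta:

```python
# Rough sketch only: classify each spectrogram frame into one of 88 pitches
# or a "no note" class. All sizes below are my guesses, not from any paper.
import tensorflow as tf

N_MELS = 229        # mel bins (the value I believe Onsets and Frames uses)
N_PITCHES = 88 + 1  # 88 piano keys plus one "no note" class (monophonic)

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(None, N_MELS)),  # (frames, mel bins)
    tf.keras.layers.Conv1D(128, 3, padding="same", activation="relu"),
    tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(128, return_sequences=True)),
    # One softmax over pitches per frame, so at most one note sounds at a time.
    tf.keras.layers.Dense(N_PITCHES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```

As far as I understand, the single softmax per frame is what would enforce monophony here, whereas Onsets and Frames uses independent per-pitch sigmoids to allow polyphony. Is that roughly the right way to think about the difference?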
Thank you, Casper